MongoDB: Find and remove duplicates without MapReduce

The following command finds all emails that exist more than once in a collection called users:

db.users.aggregate([
 {$group: {_id: "$email", count: {$sum: 1}}},
 {$match: {count: {$gt: 1} }} 
])

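The following command creates a unique index on email and deletes the duplicates. The dropDups option only has this effect on MongoDB versions before 3.0, the release in which it was removed; on later versions nothing is deleted, and the unique index cannot be built while duplicates remain.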

db.users.createIndex( {email: 1}, {unique: true, dropDups: true} )
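Where dropDups is not available, the duplicates have to be deleted before the unique index can be built. Below is a minimal sketch of one way to pick the documents to delete, written as plain JavaScript over an in-memory array so it can be tried without a database. The helper name duplicateIdsToRemove, the sample data, and the keep-the-first-seen policy are assumptions made for illustration, not part of the original recipe.

```javascript
// Hypothetical helper: returns the _id of every document whose email
// was already seen earlier in the array (the first occurrence is kept).
function duplicateIdsToRemove(docs) {
  const seen = new Set();
  const toRemove = [];
  for (const doc of docs) {
    if (seen.has(doc.email)) {
      toRemove.push(doc._id);
    } else {
      seen.add(doc.email);
    }
  }
  return toRemove;
}

// Made-up sample data standing in for the users collection.
const sampleUsers = [
  { _id: 1, email: "a@example.com" },
  { _id: 2, email: "b@example.com" },
  { _id: 3, email: "a@example.com" },
];

console.log(duplicateIdsToRemove(sampleUsers)); // [ 3 ]
```

In the shell, the same function could be fed db.users.find({}, {email: 1}).toArray(), and the returned ids passed to db.users.deleteMany({_id: {$in: ids}}) before creating the unique index. This loads every document into memory, so it is probably only suitable for small collections.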

MongoDB: How to aggregate the lengths of nested arrays

Let's say you have a collection (mycollection) whose documents have a field (tags) that holds an array.

> db.mycollection.find()
{ "_id" : ObjectId("515c8ab8e4b06d8f844ac0bd"), "tags" : [ "a", "b", "c" ] }
{ "_id" : ObjectId("515c8ab8e4b06d8f844ac0bc"), "tags" : [ "b", "c" ] }

Suppose you need to count the total number of tags across all documents in the collection. Here are a couple of ways to accomplish this.

1. MapReduce

db.mycollection.mapReduce(
    function() { emit('tags', { count: this.tags.length }); },
    function(key, values) {
        var result = { count: 0 };
        values.forEach(function(value) {
            result.count += value.count;
        });
        return result;
    },
    { out: { inline: 1 }}
);

The first argument above is the map function. It runs once for every document in the collection and emits the number of tags in that document under a constant key.

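The second argument above is the reduce function. It examines the emitted values and consolidates (literally reduces) them into a single result.

The third argument asks for the result to be returned inline instead of being written to a collection.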
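To see how the map and reduce functions cooperate, here is a rough in-memory simulation in plain JavaScript. It is only a sketch: simulateMapReduce is a hypothetical stand-in for what the server does, and the map function receives the document and emit as parameters instead of relying on this and the shell's global emit.

```javascript
// Same logic as the map function above, but the document and emit
// are passed in explicitly (an assumption made for this simulation).
function mapTags(doc, emit) {
  emit("tags", { count: doc.tags.length });
}

// Same logic as the reduce function above.
function reduceCounts(key, values) {
  const result = { count: 0 };
  values.forEach(function (value) {
    result.count += value.count;
  });
  return result;
}

// Hypothetical stand-in for the server: collect emitted values per key,
// then reduce each key's values to a single value.
function simulateMapReduce(docs, map, reduce) {
  const emitted = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!emitted.has(key)) emitted.set(key, []);
      emitted.get(key).push(value);
    });
  }
  const results = [];
  for (const [key, values] of emitted) {
    // MongoDB does not call reduce for a key with a single value.
    const value = values.length > 1 ? reduce(key, values) : values[0];
    results.push({ _id: key, value: value });
  }
  return results;
}

const taggedDocs = [{ tags: ["a", "b", "c"] }, { tags: ["b", "c"] }];
const mapReduceResult = simulateMapReduce(taggedDocs, mapTags, reduceCounts);
console.log(mapReduceResult[0].value.count); // 5
```

Because every document emits under the same key, all the values end up in one group and the reduce step produces a single total.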

2. Aggregation Framework

db.mycollection.aggregate([
    { $project: { tags: 1 }},
    { $unwind: "$tags" },
    { $group: { _id: "result", count: { $sum: 1 }}}
]);

The first stage specifies that the field of interest is tags.

The second stage unwinds the array, producing one document per array element.

The third stage counts those documents under a single bucket called “result”.
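The three steps above can be mimicked on an in-memory array to make the data flow visible. This is a simplified sketch in plain JavaScript, not how MongoDB implements the pipeline; the function names project, unwind and groupCount are made up for illustration, and the field is assumed to be either an array or missing.

```javascript
// $project: keep only the listed field (plus _id, which MongoDB keeps by default).
function project(docs, field) {
  return docs.map((doc) => ({ _id: doc._id, [field]: doc[field] }));
}

// $unwind: output one document per array element.
// Documents with a missing or empty array produce no output.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((element) => ({ ...doc, [field]: element }))
  );
}

// $group with a constant _id and { $sum: 1 }: count the incoming documents.
function groupCount(docs, bucket) {
  return docs.length === 0 ? [] : [{ _id: bucket, count: docs.length }];
}

const tagDocs = [
  { _id: 1, tags: ["a", "b", "c"] },
  { _id: 2, tags: ["b", "c"] },
];

const unwound = unwind(project(tagDocs, "tags"), "tags");
console.log(unwound.length); // 5
console.log(groupCount(unwound, "result")[0].count); // 5
```

Since unwinding creates one intermediate document per tag, it may be worth comparing against a single $group stage that sums { $size: "$tags" } per document; note that $size raises an error when the field is not an array.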