MongoDB: How to aggregate the lengths of nested arrays

Let's say you have a collection (mycollection) whose documents contain an embedded field (tags) that happens to be an array.

> db.mycollection.find()
{ "_id" : ObjectId("515c8ab8e4b06d8f844ac0bd"), "tags" : [ "a", "b", "c" ] }
{ "_id" : ObjectId("515c8ab8e4b06d8f844ac0bc"), "tags" : [ "b", "c" ] }

You have to count the total number of tags across all documents in the collection. Here are a couple of ways to accomplish this.

1. MapReduce

db.mycollection.mapReduce(
    function() { emit('tags', { count: this.tags.length }); },
    function(key, values) {
        var result = { count: 0 };
        values.forEach(function(value) {
            result.count += value.count;
        });
        return result;
    },
    { out: { inline: 1 }}
);

The first argument above is the map function. It is run against every document in the collection and emits the number of tags in that document under a constant key.

The second argument above is the reduce function. It examines the emitted values and consolidates (literally reduces) them into a single result.
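To see what the two functions do without a running server, here is a minimal plain-JavaScript sketch that mimics the map and reduce steps on the two sample documents. The docs array and the emitted list are stand-ins invented for illustration and are not part of the MongoDB API.

```javascript
// Stand-in for the two sample documents in mycollection (ObjectIds omitted).
var docs = [
    { tags: ["a", "b", "c"] },
    { tags: ["b", "c"] }
];

// Map step: each document emits its tag count under the constant key "tags".
var emitted = docs.map(function (doc) {
    return { key: "tags", value: { count: doc.tags.length } };
});

// Reduce step: same logic as the reduce function passed to mapReduce.
function reduce(key, values) {
    var result = { count: 0 };
    values.forEach(function (value) {
        result.count += value.count;
    });
    return result;
}

var total = reduce("tags", emitted.map(function (e) { return e.value; }));
console.log(total.count); // 5
```

Note that reduce returns an object of the same shape as the emitted values. That matters because MongoDB may call reduce more than once for the same key, feeding earlier results back in as input.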

2. Aggregation Framework

db.mycollection.aggregate(
    { $project: { tags: 1 }},
    { $unwind: "$tags" },
    { $group: { _id: "result", count: { $sum: 1 }}}
);

The first stage ($project) specifies that the field of interest is tags.

The second stage ($unwind) unwinds the array, producing one document per array element.

The third stage ($group) puts all of those documents into a single bucket called “result” and adds 1 for each, which gives the total.
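The effect of each stage can be sketched in plain JavaScript on the two sample documents. This is only an illustration of the data flow, not how the server implements the pipeline; the docs array and the numeric _id values are made up for the example.

```javascript
// Stand-in for the two sample documents in mycollection (simplified _id values).
var docs = [
    { _id: 1, tags: ["a", "b", "c"] },
    { _id: 2, tags: ["b", "c"] }
];

// $project: keep only _id and tags.
var projected = docs.map(function (doc) {
    return { _id: doc._id, tags: doc.tags };
});

// $unwind: one output document per array element.
var unwound = [];
projected.forEach(function (doc) {
    doc.tags.forEach(function (tag) {
        unwound.push({ _id: doc._id, tags: tag });
    });
});

// $group with $sum: 1, adding 1 per unwound document to the "result" bucket.
var grouped = { _id: "result", count: 0 };
unwound.forEach(function () {
    grouped.count += 1;
});

console.log(unwound.length); // 5
console.log(grouped.count);  // 5
```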


MongoDB: How to check for duplicates in a collection

This technique for finding dupes in a MongoDB collection uses MapReduce. The steps involved are simple:

Create a script called, say, checkdupes.js and add the following code to it. The script runs on a collection called myCollection and examines the values of the field called myField. For each distinct value of myField, it inserts a document into a new collection called myDupesCollection, with the number of occurrences of that value stored in the document's value field.

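// Map: emit a 1 for every occurrence of a myField value.
var m = function () {
    emit(this.myField, 1);
};

// Reduce: add up the 1s emitted for each distinct value.
var r = function (k, vals) {
    return Array.sum(vals);
};

// Write one { _id: <myField value>, value: <count> } document per distinct value.
var res = db.myCollection.mapReduce(m, r, { out : "myDupesCollection" });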

The script above can be run from the command line as follows:

mongo myDB checkdupes.js

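Now check for the dupes in the newly created collection by running the following command in the mongo shell. Any document whose value is greater than 1 represents a myField value that occurs more than once: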

db.myDupesCollection.find({value: {$gt: 1}});
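As a rough illustration of what ends up in myDupesCollection and what the final query returns, the same counting can be mimicked in plain JavaScript. The sample documents below are hypothetical, and the counts object is just a stand-in for the work MapReduce does on the server.

```javascript
// Hypothetical sample data standing in for myCollection.
var docs = [
    { myField: "x" },
    { myField: "y" },
    { myField: "x" },
    { myField: "z" },
    { myField: "x" }
];

// Map + reduce combined: count occurrences per distinct myField value.
var counts = {};
docs.forEach(function (doc) {
    counts[doc.myField] = (counts[doc.myField] || 0) + 1;
});

// Shape the results like mapReduce output documents: { _id: key, value: count }.
var dupesCollection = Object.keys(counts).map(function (key) {
    return { _id: key, value: counts[key] };
});

// Equivalent of find({ value: { $gt: 1 } }): keep values seen more than once.
var dupes = dupesCollection.filter(function (doc) {
    return doc.value > 1;
});

console.log(dupes); // only the "x" entry, which has value 3
```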