MongoDB: How to check for duplicates in a collection

This technique for finding duplicates in a MongoDB collection uses MapReduce. The steps involved are simple:

Create a script called, say, checkdupes.js and add the following code to it. The script runs over a collection called myCollection and examines the values of the field called myField. For each distinct value of myField, it inserts a document into a new collection called myDupesCollection, with _id set to that value and value set to the number of documents in myCollection that contain it.

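// Map: emit each document's myField value with a count of 1
var m = function () {
    emit(this.myField, 1);
};

// Reduce: add up the counts emitted for the same myField value
var r = function (k, vals) {
    return Array.sum(vals);
};

// Write one document per distinct value ({ _id: <value>, value: <count> })
// to myDupesCollection, replacing any previous contents of that collection
var res = db.myCollection.mapReduce(m, r, { out: "myDupesCollection" });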
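To make the roles of m and r concrete, here is a minimal plain-JavaScript sketch, runnable in Node.js without MongoDB, that imitates what mapReduce does with them: call the map function once per document with `this` bound to that document, group the emitted values by key, and reduce each group. The helper `simulateMapReduce` and the sample documents are hypothetical, and the sketch is a simplification: the real server may also call reduce again on partial results, which is why a reduce function like r has to give the same answer when re-applied.

```javascript
// Stand-in for the mongo shell's Array.sum helper, which plain Node.js lacks.
if (typeof Array.sum !== "function") {
  Array.sum = function (vals) {
    return vals.reduce(function (a, b) { return a + b; }, 0);
  };
}

// Hypothetical helper: imitates db.collection.mapReduce(map, reduce) in memory.
// Assumes primitive keys (strings, numbers); it is not the server's implementation.
function simulateMapReduce(docs, map, reduce) {
  const groups = new Map(); // emitted key -> array of emitted values
  // Shell map functions call a global emit(key, value), so provide one temporarily.
  globalThis.emit = function (key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    // Inside map, `this` is the current document.
    for (const doc of docs) map.call(doc);
  } finally {
    delete globalThis.emit;
  }
  const out = [];
  for (const [key, vals] of groups) {
    // Like MongoDB, only call reduce when a key emitted more than one value.
    out.push({ _id: key, value: vals.length > 1 ? reduce(key, vals) : vals[0] });
  }
  return out;
}

// The map and reduce functions from checkdupes.js, unchanged.
const m = function () {
  emit(this.myField, 1);
};
const r = function (k, vals) {
  return Array.sum(vals);
};

// Hypothetical sample documents.
const docs = [
  { myField: "a" }, { myField: "b" }, { myField: "a" },
  { myField: "c" }, { myField: "a" }, { myField: "b" },
];

const result = simulateMapReduce(docs, m, r);
// Same filter as db.myDupesCollection.find({value: {$gt: 1}})
const dupes = result.filter(function (d) { return d.value > 1; });
console.log(dupes); // [ { _id: 'a', value: 3 }, { _id: 'b', value: 2 } ]
```

The value "c" occurs once, so its group is never reduced and keeps the emitted 1, which is why the final filter on value > 1 drops it.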

The script above can be run from the command line as follows, where myDB is the database that contains myCollection:

mongo myDB checkdupes.js

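Now check for duplicates in the newly created collection by running the following query in the mongo shell, connected to myDB. Any document whose value is greater than 1 is a myField value that occurs more than once in myCollection: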

db.myDupesCollection.find({value: {$gt: 1}});
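The same check can also be done without an intermediate collection by using the aggregation framework; MongoDB deprecated map-reduce in version 5.0 in favour of aggregation pipelines. Below is a minimal sketch assuming the same myCollection and myField names. `duplicatesPipeline` is a hypothetical helper that only builds the pipeline array, so it runs anywhere and can be reused for other fields.

```javascript
// Hypothetical helper: builds an aggregation pipeline that finds duplicated values.
function duplicatesPipeline(field) {
  return [
    // One output document per distinct value of the field, with how many
    // documents share it and the _ids of those documents.
    { $group: { _id: "$" + field, count: { $sum: 1 }, ids: { $push: "$_id" } } },
    // Keep only values that occur more than once.
    { $match: { count: { $gt: 1 } } },
  ];
}
```

In the mongo shell it would be used as db.myCollection.aggregate(duplicatesPipeline("myField")), which returns one document per duplicated value, shaped like { _id: <value>, count: <n>, ids: [ ... ] }. Listing the ids makes it easy to decide which copies to keep or remove.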