First experiments with map-reduce


In the previous post I have shown how to create / update / query a map function for creating a view using jcouchdb Java library.

In this one I’m going to show how to use reduce functions.

Create a View in CouchDB

In that  post I created a helper function that allowed me to create / update a view but this function did not define a reduce function. So, I’m going to introduce a slight change for setting reduce function as well as map.

public class HelperFunctions {
    private static final String ViewsPath = "views";
...
    // Define view with map and reduce function
    public static void defineView(Database db,
                                  String name,
                                  String mapFn,
                                  String reduceFn) {
        DesignDocument doc = new DesignDocument(ViewsPath);

        // Check if the documents exists...
        try {
            DesignDocument old = db.getDesignDocument(doc.getId());
            doc.setRevision(old.getRevision());
            doc.setViews(old.getViews());
        } catch (NotFoundException e) {
            // Do nothing, it is enough knowing that it does not exist
        }
        View view = new View();
        view.setMap(mapFn);
        view.setReduce(reduceFn);
        doc.addView(name, view);
        db.createOrUpdateDocument(doc);
    }

    // Define view only with map function
    public static void defineView(Database db,
                                  String name,
                                  String mapFn) {
        defineView(db, name, mapFn, null);
    }
}

I created a new defineView with an extra argument (reduceFn) and moved the code to this implementation. Previous interface is kept and invokes the new with null value for reduceFn argument indicating that no reduce function.

In the previous code we add an extra instruction (line 21) for setting reduce function (view.setReduce(reduceFn)).

And that’s it! Pretty easy, isn’t it?

Using map – reduce for building a histogram

For showing how to use map and reduce, we are going to count the number of documents that we created in the previous post for each date.

The map function is:

function(doc) {
    emit(doc.date, 1);
}

and the reduce function:

function(keys, values) {
    return sum(values);
}

Basically the view emits a pair keyvalue where key is the date and the value is 1 for each date. Then reduce function uses internal sumfunction for counting the values (always 1).

But, we want the counter grouped per date (that’s why we set the key equal to document’s date). This is controlled programmatically with the third argument of queryView (Options).

        // Define map and reduce functions
        HelperFunctions.defineView(db,
                "dateHistogram",
                "function(doc) { emit(doc.date, 1); }",
                "function(keys, values) { return sum(values); }");
        // Execute query and group results by key
        ViewResult<Object> result = db.queryView("views/dateHistogram",
                Object.class,
                new Options().group(true),
                null);
        // Display results
        List<ValueRow<Object>> rows = result.getRows();
        for (ValueRow<Object> row : rows) {
            System.out.printf("%s - %s\n", row.getKey(), row.getValue());
        }

In previous examples of queryViews I didn’t set any option, here I set group to true for grouping results by key, this is simply done with new Options().group(true).

NOTE: queryViews by default executes reduce function, if defined, but you can disable it invoking the method reduce(false) from Options.
The output is:

2012-01-01 - 351
2012-01-02 - 328
2012-01-03 - 339
2012-01-04 - 347
2012-01-05 - 359
2012-01-06 - 311
2012-01-07 - 327
2012-01-08 - 326
2012-01-09 - 356
2012-01-10 - 361
2012-01-11 - 337
2012-01-12 - 320
2012-01-13 - 309
2012-01-14 - 336
2012-01-15 - 334
2012-01-16 - 302
2012-01-17 - 325
2012-01-18 - 297
2012-01-19 - 323
2012-01-20 - 356
2012-01-21 - 347
2012-01-22 - 304
2012-01-23 - 312
2012-01-24 - 332
2012-01-25 - 347
2012-01-26 - 329
2012-01-27 - 332
2012-01-28 - 384
2012-01-29 - 333
2012-01-30 - 336

If we would like to count the number of values that are even and odd, it’s enough defining a map function as:

function(doc) {
    emit(doc.even, 1);
}

The output is:

no - 5000
yes - 5000
Advertisements

One thought on “First experiments with map-reduce

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s