First experiments with map function


In a relational database you write SELECT statements to choose a subset of your data; in CouchDB you write map-reduce functions that let you extract the part of the data you want to work with.

Views are called maps because the extracted data consists of {key, value} pairs, where both the key and the value can be any JSON data structure. Views are sorted by key.

Some characteristics of map functions are:

  • Filter documents from your database to let you work with some of them.
  • Extract part of the original document.
  • Sort (filtered) documents.
  • Do calculations on the data of the document (using reduce).

In addition, CouchDB builds an index on the extracted data, which makes querying it very efficient.
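
As a rough mental model of those characteristics, the sketch below mimics a view in plain Java, without touching CouchDB: it filters in-memory documents, extracts a key and a value from each, and keeps the rows sorted by key. The class ViewSketch and its methods are hypothetical names invented for this illustration, and a TreeMap is only an approximation, since a real view can hold several rows with the same key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: this is not how CouchDB implements views.
final class ViewSketch {

    /** Creates an in-memory stand-in for a document (hypothetical helper). */
    static Map<String, Object> doc(String id, int value) {
        Map<String, Object> d = new HashMap<>();
        d.put("_id", id);
        d.put("name", "name." + value);
        d.put("value", value);
        d.put("even", (value % 2) == 0 ? "yes" : "no");
        return d;
    }

    /** Filters, extracts and sorts: the jobs a map function does for a view. */
    static TreeMap<String, String> buildView(List<Map<String, Object>> docs) {
        // TreeMap keeps its entries ordered by key, like a view.
        // Unlike a real view, it cannot hold two rows with the same key.
        TreeMap<String, String> view = new TreeMap<>();
        for (Map<String, Object> d : docs) {
            if ("yes".equals(d.get("even"))) {        // filter
                view.put((String) d.get("name"),      // key
                         (String) d.get("_id"));      // value (extracted part)
            }
        }
        return view;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(doc("c", 4));
        docs.add(doc("a", 3));
        docs.add(doc("d", 0));
        docs.add(doc("b", 2));
        System.out.println(buildView(docs));
    }
}
```

Running main prints {name.0=d, name.2=b, name.4=c}: the odd document is gone and the rows come out ordered by key, whatever order the documents were added in.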

Write documents

Let's start by writing a batch of documents to our database, which we will then use for playing with map and reduce.

        Random random = new Random(System.currentTimeMillis());
        int totalSize = 10000;
        int bulkSize = 1000;

        // Create totalSize documents
        int records = 0;
        while (records < totalSize) {
            // Create a list of documents that are going to be created using 'bulk'
            List<BaseDocument> docList = new ArrayList<BaseDocument>();
            Date firstDate = new GregorianCalendar(2012, GregorianCalendar.JANUARY, 1).getTime();
            Calendar calendar = GregorianCalendar.getInstance();
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");

            for (int i = 0; i < bulkSize; i++) {
                int value = (records + i);
                calendar.setTime(firstDate);
                calendar.add(Calendar.DATE, random.nextInt(30));

                BaseDocument doc = new BaseDocument();
                doc.setProperty("name", "name." + value);
                doc.setProperty("value", value);
                doc.setProperty("even", (value % 2) == 0 ? "yes" : "no");
                doc.setProperty("date", sdf.format(calendar.getTime()));
                docList.add(doc);
            }
            List<DocumentInfo> infoList = db.bulkCreateDocuments(docList);
            assertEquals(infoList.size(), docList.size());
            for (DocumentInfo info : infoList) {
                assertNotNull(info.getId());
                assertNotNull(info.getRevision());
            }
            records += bulkSize;
        }

Each document has the following fields:

  • name: The string name, a dot and a sequence number (example: name.1234).
  • value: The sequence number used in name.
  • even: yes or no, depending on whether the number is even or odd.
  • date: January 1st, 2012 plus a random number of days between 0 and 29 (random.nextInt(30) excludes 30), formatted as yyyy-MM-dd (4-digit year, 2-digit month, 2-digit day). I chose this year-month-day format because the strings then sort in chronological order, which will allow us to select dates in a range.
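
The claim about the date format can be checked without CouchDB. The sketch below uses java.time.LocalDate instead of the SimpleDateFormat used above (an assumption that you are on Java 8 or later); DateKeySketch, dateKey and inRange are hypothetical names for this illustration. It builds yyyy-MM-dd strings and selects a range with plain string comparison.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class DateKeySketch {

    /** Formats 2012-01-01 plus dayOffset days; LocalDate.toString() yields yyyy-MM-dd here. */
    static String dateKey(int dayOffset) {
        return LocalDate.of(2012, 1, 1).plusDays(dayOffset).toString();
    }

    /** Returns the keys within [from, to], sorted, using only string comparison. */
    static List<String> inRange(List<String> keys, String from, String to) {
        List<String> selected = new ArrayList<>();
        for (String key : keys) {
            if (key.compareTo(from) >= 0 && key.compareTo(to) <= 0) {
                selected.add(key);
            }
        }
        Collections.sort(selected); // string order equals chronological order
        return selected;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(dateKey(19), dateKey(2), dateKey(10), dateKey(8));
        // Select everything between January 5th and January 15th, 2012
        System.out.println(inRange(keys, "2012-01-05", "2012-01-15"));
    }
}
```

The comparison never parses the strings back into dates; the fixed-width, most-significant-first layout is what makes that safe.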

To check that we actually created bulkSize documents on each iteration, we compare the sizes of the two lists, then iterate over the list of DocumentInfo returned by bulkCreateDocuments and assert that each entry has an identifier and a revision.
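
The loop above writes exactly bulkSize documents per request, which works because totalSize is a multiple of bulkSize. If it were not, the last request would need to be smaller. A minimal sketch of that calculation in plain Java (BatchSketch and batchSizes are hypothetical names, not part of any CouchDB client):

```java
import java.util.ArrayList;
import java.util.List;

final class BatchSketch {

    /** Returns the size of each bulk request needed to write totalSize documents. */
    static List<Integer> batchSizes(int totalSize, int bulkSize) {
        if (bulkSize <= 0) {
            throw new IllegalArgumentException("bulkSize must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int written = 0; written < totalSize; written += bulkSize) {
            // The last batch only holds whatever is left over
            sizes.add(Math.min(bulkSize, totalSize - written));
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(batchSizes(10000, 1000).size()); // number of requests
        System.out.println(batchSizes(2500, 1000));         // sizes of each request
    }
}
```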

These documents are pretty small, but they will allow me to show some basics of map-reduce.

Define a view

Views are defined using JavaScript functions that decide whether each document has to be emitted (that is, become part of the final view).

In the following example we choose the documents whose even field is yes and whose value is less than 10. The view uses the name as key and the document identifier as value.

function(doc) {
    if (doc.even == 'yes' && doc.value < 10) {
        emit (doc.name, doc._id);
    }
}

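For the documents created above, the resulting view (as displayed in CouchDB Futon) contains five rows with the keys name.0, name.2, name.4, name.6 and name.8, each paired with the identifier of its document.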

The emit function takes two arguments, which become the key and the value of a row in the resulting map.

Keys can be null (useful if you need all results in no particular order) or complex structures such as arrays if you need more than one item (used in views that you want to query by more than one key).
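
As an illustration of a complex key, a map function could emit an array such as [doc.date, doc.value]; CouchDB compares array keys element by element. The plain-Java sketch below imitates that ordering for a two-part key. CompoundKeySketch and Key are hypothetical names, and the sketch ignores CouchDB's full collation rules (for example, mixed types inside a key, or its Unicode-aware string ordering).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class CompoundKeySketch {

    /** A two-part key, standing in for a JSON array key such as [date, value]. */
    static final class Key implements Comparable<Key> {
        final String date;
        final int value;

        Key(String date, int value) {
            this.date = date;
            this.value = value;
        }

        @Override
        public int compareTo(Key other) {
            // First element decides; the second only breaks ties
            int byDate = date.compareTo(other.date);
            return byDate != 0 ? byDate : Integer.compare(value, other.value);
        }

        @Override
        public String toString() {
            return "[" + date + ", " + value + "]";
        }
    }

    /** Returns a sorted copy of the keys, leaving the input untouched. */
    static List<Key> sorted(List<Key> keys) {
        List<Key> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.asList(
                new Key("2012-01-02", 7),
                new Key("2012-01-01", 9),
                new Key("2012-01-02", 3))));
    }
}
```

Rows that share a date end up next to each other, ordered by value within that date, which is what makes such keys convenient for selecting by more than one field.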

Create a View in CouchDB

Views are stored in design documents, and I have defined a helper function for defining and updating a view.

public class HelperFunctions {
    private static final String ViewsPath = "views";

    public static void defineView(Database db,
                                  String name,
                                  String mapFn) {
        DesignDocument doc = new DesignDocument(ViewsPath);

        // Check whether the design document already exists...
        try {
            DesignDocument old = db.getDesignDocument(doc.getId());
            doc.setRevision(old.getRevision());
            doc.setViews(old.getViews());
        } catch (NotFoundException e) {
            // Do nothing, it is enough knowing that it does not exist
        }
        View view = new View();
        view.setMap(mapFn);
        doc.addView(name, view);
        db.createOrUpdateDocument(doc);
    }
}

See this for more details on this function.

Basic query view in CouchDB

Pretty simple…

        ViewResult result = db.queryView(map, BaseDocument.class, null, null);

We only need two arguments here: the name of the view and the class of the output documents (here we used BaseDocument); the other two are passed as null. The result is a ViewResult<BaseDocument> that includes the list of rows emitted by the map function.

        // Define a view called lessThan10
        // that emits as key a field named value and as value the id of the document
        HelperFunctions.defineView(db, "lessThan10",
                "function(doc) { if (doc.value < 10) emit(doc.value, doc._id); }");
        // Query view with no options and no keys
        ViewResult result = db.queryView("views/lessThan10", BaseDocument.class, null, null);

        // Check that we get 10 results as expected (see DB creation above)
        assertEquals(result.getRows().size(), 10);

        // Display results
        List<ValueRow> rows = result.getRows();
        for (ValueRow row : rows) {
            System.out.printf("%3d - %s\n", ((Long) row.getKey()).intValue(), row.getValue());
        }

This code displays:

  0 - b4956242da33280c47337b61254c7d68
  1 - b4956242da33280c47337b61254c839a
  2 - b4956242da33280c47337b61254c8c88
  3 - b4956242da33280c47337b61254c8eed
  4 - b4956242da33280c47337b61254c9e1b
  5 - b4956242da33280c47337b61254ca6c5
  6 - b4956242da33280c47337b61254cb10e
  7 - b4956242da33280c47337b61254cb14e
  8 - b4956242da33280c47337b61254cc069
  9 - b4956242da33280c47337b61254cce23

NOTE: We were able to cast the key to a Long because we know from our map function that the key is the numeric value field.
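
If you would rather not depend on the exact boxed type the JSON parser produces, a slightly more defensive conversion goes through Number. This is only a sketch; KeyCastSketch and keyAsInt are hypothetical names, not part of the client library.

```java
final class KeyCastSketch {

    /** Converts a numeric view key to int without assuming its concrete boxed type. */
    static int keyAsInt(Object key) {
        if (!(key instanceof Number)) {
            throw new IllegalArgumentException("Key is not numeric: " + key);
        }
        // Works for Long, Integer, etc.; fractional values would be truncated
        return ((Number) key).intValue();
    }

    public static void main(String[] args) {
        Object asLong = Long.valueOf(7L);
        Object asInteger = Integer.valueOf(7);
        System.out.println(keyAsInt(asLong) + " " + keyAsInt(asInteger));
    }
}
```

A direct (Long) cast would throw a ClassCastException if the key arrived as an Integer; going through Number accepts both.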
