CouchDB poll!!!


I’d like to know how people are using CouchDB, and I couldn’t find a better way than asking!

The very first question is whether you are actually using it, evaluating it, playing with it, or just curious about it… It might also be interesting to know whether Damien Katz's recent move (leaving Apache CouchDB) might affect your decision.

The second question is whether you use it from a programming language, deploy your application inside CouchDB, or have an HTML/JavaScript application that uses AJAX to interact with CouchDB…

The third question is for people using it from Java: which API / library are you using?

Please leave any comment or question that you would like to have included in this poll.


CouchDB beam uses 100% of CPU and network


While playing around with CouchDB I observed that my system was running really slowly (actually, the first thing I noticed was that it was heating up), and the System Monitor showed that CPU usage was at 100% and network usage was very high. The process responsible for this was beam.

Doing some Google research I found that beam is the Erlang virtual machine process that runs CouchDB. But all I was trying to do was connect to a CouchDB server and, after getting a DB connector using a wrong user/password combination, create a document.

Debugging my Java code, I found that the system was running some code in Apache HttpClient that, for some reason, never finished (by reporting an error or throwing an Exception) but kept retrying over and over.

Trying to find the problem, I downloaded the sources of the latest version (4.2) of httpclient, httpclient-cache and httpcore to debug them and… surprise! The problem disappeared. Then I downloaded the sources of version 4.1.1 (the one I was using before) and it failed again.

Conclusion: if you have problems with beam and you are using an older Apache HttpClient version (4.1.1 in my case), try upgrading it.

NOTE:

CouchDB, jcouchdb and map (in more detail)


I have already written a first tutorial about jcouchdb and map functions. But for those used to writing SELECT statements in SQL, there is a lot more than those simple queries.

Nevertheless, you need to change your mindset: working with CouchDB map/reduce functions is a completely new paradigm, and there are no magic recipes for converting SQL tables into CouchDB documents and SELECTs into map/reduce functions.

Example database and creating, updating and querying views in CouchDB

First, I will refer you to the example database that I created here, where I inserted some small documents with a few properties, and to the helper functions for creating / updating / querying views introduced here.

Limiting the number of results in jcouchdb

HelperFunctions.defineView(db, "sortedById",
                           "function(doc) { emit(doc._id, doc); }");
ViewResult<BaseDocument> result = db.queryView("views/sortedById",
                                               BaseDocument.class,
                                               new Options().limit(100),
                                               null);
// Check that we get 100 results (JUnit: expected value first)
assertEquals(100, result.getRows().size());

In lines 1-2 we define a map function that emits the identifier of the document and the document itself. Then we call the queryView method (line 3), asking for values of type BaseDocument (line 4), and we introduce a new Options method called limit (line 5) that lets you specify the maximum number of documents to retrieve.

Line 8 verifies that we actually get 100 documents.

NOTE: ViewResult contains a method named getTotalRows that returns the total number of rows in the view (those that meet the condition), NOT the number of rows actually retrieved.

One more thing: a view returns its results ordered by key (the first argument of the emit function), so in the previous example we got the results ordered by identifier.
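The difference between limit and getTotalRows can be seen in a small plain-Java simulation. This is only a sketch under assumptions: it does not use jcouchdb or contact CouchDB, the Row and Result classes and the query method are hypothetical, and keys are compared with String.compareTo, a simplification of CouchDB's Unicode-based collation. Rows are sorted by key, only the first limit rows are returned, and totalRows still counts every emitted row.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of a view query with a limit (hypothetical, not the jcouchdb API).
class ViewLimitSketch {

    // One emitted row, as produced by emit(key, value).
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // The returned rows plus the total number of rows in the view.
    static final class Result {
        final List<Row> rows;
        final int totalRows;

        Result(List<Row> rows, int totalRows) {
            this.rows = rows;
            this.totalRows = totalRows;
        }
    }

    // Sorts all emitted rows by key and keeps only the first 'limit' of them.
    static Result query(List<Row> emitted, int limit) {
        List<Row> sorted = new ArrayList<>(emitted);
        sorted.sort(Comparator.comparing((Row r) -> r.key));
        List<Row> page = new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
        return new Result(page, sorted.size());
    }

    public static void main(String[] args) {
        List<Row> emitted = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            String id = String.format("doc-%03d", 249 - i); // emitted in reverse order
            emitted.add(new Row(id, "document " + id));
        }
        Result result = query(emitted, 100);
        System.out.println(result.rows.size() + " rows returned, totalRows = " + result.totalRows);
        System.out.println("first key: " + result.rows.get(0).key);
    }
}
```

Running main prints `100 rows returned, totalRows = 250` and `first key: doc-000`: the page is limited, while the total still reflects the whole view.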

Getting results ordered with jcouchdb

Consider the following code that defines a view for getting results ordered by date.

HelperFunctions.defineView(db,
                           "sortedByDate",
                           "function(doc) { emit(doc.date, doc); }");
ViewResult<BaseDocument> result = db.queryView("views/sortedByDate",
                                               BaseDocument.class,
                                               new Options().limit(100),
                                               null);
List<ValueRow<BaseDocument>> rows = result.getRows();

// Check that we get 100 results
assertEquals(100, rows.size());

// Verify that they are actually sorted by date (key): each key <= the next one
for (int i = 1; i < rows.size(); i++) {
    assertTrue(((String) rows.get(i - 1).getKey()).compareTo((String) rows.get(i).getKey()) <= 0);
}

As in the previous example, we define a view (using our HelperFunctions) and then invoke Database.queryView. We limit the number of results to 100 (line 6), and then in lines 14-16 we iterate over the results, checking that each date is equal to or greater than the previous one.

NOTE: we can do this because we saved dates as YYYY-MM-DD strings, whose alphabetical order matches their chronological order.
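Why the format matters can be checked in plain Java (java.time, independent of CouchDB): for fixed-width year-month-day strings, sorting alphabetically gives the same order as sorting by date, while a day-first format does not. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Shows why yyyy-MM-dd strings can be compared as plain text.
class DateKeyOrder {

    // Formats the dates with the given pattern in chronological order, sorts the
    // strings alphabetically and reports whether the order is unchanged.
    static boolean textOrderIsChronological(List<LocalDate> dates, String pattern) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern);
        List<LocalDate> chronological = new ArrayList<>(dates);
        Collections.sort(chronological);
        List<String> expected = new ArrayList<>();
        for (LocalDate d : chronological) {
            expected.add(fmt.format(d));
        }
        List<String> alphabetical = new ArrayList<>(expected);
        Collections.sort(alphabetical);
        return alphabetical.equals(expected);
    }

    public static void main(String[] args) {
        List<LocalDate> dates = Arrays.asList(
                LocalDate.of(2012, 1, 30),
                LocalDate.of(2011, 12, 31),
                LocalDate.of(2012, 1, 2));
        System.out.println("yyyy-MM-dd: " + textOrderIsChronological(dates, "yyyy-MM-dd"));
        System.out.println("dd/MM/yyyy: " + textOrderIsChronological(dates, "dd/MM/yyyy"));
    }
}
```

This prints `yyyy-MM-dd: true` and `dd/MM/yyyy: false`, since "02/01/2012" sorts before "31/12/2011" as text.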

Querying in jcouchdb for a key value that is in a set of values

Consider the same map function (sortedByDate) as in the previous section. Now we are going to use queryViewByKeys instead of queryView; it takes an extra argument: the list of values that the key must match.

HelperFunctions.defineView(db, "sortedByDate", "function(doc) { emit(doc.date, doc); }");
List<String> list = Arrays.asList("2012-01-01", "2012-01-02", "2012-01-03");

ViewResult<BaseDocument> result = db.queryViewByKeys("views/sortedByDate",
                                                     BaseDocument.class,
                                                     list,
                                                     null,
                                                     null);
List<ValueRow<BaseDocument>> rows = result.getRows();
assertTrue(rows.size() != 0);

// Check that results are actually one of the desired
for (ValueRow<BaseDocument> row : rows) {
    String date = (String) row.getValue().getProperty("date");
    assertTrue(list.contains(date));
}

In line 2 we define a list of Strings containing those dates that we want to match.

In line 4 we call queryViewByKeys, whose extra argument (line 6) is the list of keys to match.

In lines 13 through 16 we check that the dates of the retrieved documents are actually included in the list.

NOTE: the getValue method used in line 14 returns the second argument passed to emit in the map function (the whole document for this sortedByDate view).
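Querying by keys selects rows whose key is exactly one of the given values, which can be modelled in plain Java as a set-membership filter over the emitted rows. This sketch is illustrative only: the class name is hypothetical, rows are {key, value} string pairs with made-up values, and it does not model the order in which CouchDB returns rows for multiple keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of querying a view by a list of keys (not the jcouchdb API).
class KeysLookupSketch {

    // Keeps only the rows whose key (row[0]) is one of the requested keys.
    static List<String[]> byKeys(List<String[]> rows, List<String> keys) {
        Set<String> wanted = new HashSet<>(keys);
        List<String[]> matches = new ArrayList<>();
        for (String[] row : rows) {
            if (wanted.contains(row[0])) {
                matches.add(row);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Each row is {key, value}: the date emitted by sortedByDate and a document name.
        List<String[]> rows = Arrays.asList(
                new String[] {"2012-01-01", "name.7"},
                new String[] {"2012-01-02", "name.3"},
                new String[] {"2012-01-05", "name.9"},
                new String[] {"2012-01-03", "name.1"});
        List<String> keys = Arrays.asList("2012-01-01", "2012-01-02", "2012-01-03");
        for (String[] row : byKeys(rows, keys)) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```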

What do you think so far?

First experiments with map-reduce


In the previous post I showed how to create / update / query a map function for building a view using the jcouchdb Java library.

In this one I’m going to show how to use reduce functions.

Create a View in CouchDB

In that post I created a helper function that let me create / update a view, but it did not define a reduce function. So I’m going to introduce a slight change to set the reduce function as well as the map function.

public class HelperFunctions {
    private static final String ViewsPath = "views";
...
    // Define view with map and reduce function
    public static void defineView(Database db,
                                  String name,
                                  String mapFn,
                                  String reduceFn) {
        DesignDocument doc = new DesignDocument(ViewsPath);

        // Check if the documents exists...
        try {
            DesignDocument old = db.getDesignDocument(doc.getId());
            doc.setRevision(old.getRevision());
            doc.setViews(old.getViews());
        } catch (NotFoundException e) {
            // Do nothing, it is enough knowing that it does not exist
        }
        View view = new View();
        view.setMap(mapFn);
        view.setReduce(reduceFn);
        doc.addView(name, view);
        db.createOrUpdateDocument(doc);
    }

    // Define view only with map function
    public static void defineView(Database db,
                                  String name,
                                  String mapFn) {
        defineView(db, name, mapFn, null);
    }
}

I created a new defineView with an extra argument (reduceFn) and moved the code into this implementation. The previous signature is kept and now invokes the new one with null as reduceFn, indicating that there is no reduce function.

The only new instruction in the previous code is on line 21, which sets the reduce function (view.setReduce(reduceFn)).

And that’s it! Pretty easy, isn’t it?

Using map – reduce for building a histogram

To show how to use map and reduce together, we are going to count, for each date, the number of documents that we created in the previous post.

The map function is:

function(doc) {
    emit(doc.date, 1);
}

and the reduce function:

function(keys, values) {
    return sum(values);
}

The map function emits a key/value pair per document, where the key is the date and the value is 1. The reduce function then uses CouchDB's built-in sum function to add up the values (always 1), which gives the number of documents.
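One property makes sum(values) a safe choice: CouchDB may apply a reduce function to partial results and then reduce those results again (the full JavaScript signature is function(keys, values, rereduce)), so the reduce must give the same answer either way. Summing does; simply counting the values received does not. The plain-Java sketch below illustrates this with an arbitrary chunk size of 64; CouchDB decides internally how values are split, and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.ToLongFunction;

// Why sum(values) survives a "rereduce" while a naive count does not.
class RereduceSketch {

    // sum(values): adds up the emitted values (all 1s in the date histogram).
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // A naive "count" that returns how many values it received.
    static long naiveCount(List<Long> values) {
        return values.size();
    }

    // Applies the reduce to fixed-size chunks, then applies it again to the partial results.
    static long reduceInChunks(List<Long> values, int chunkSize, ToLongFunction<List<Long>> reduce) {
        List<Long> partials = new ArrayList<>();
        for (int start = 0; start < values.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, values.size());
            partials.add(reduce.applyAsLong(values.subList(start, end)));
        }
        return reduce.applyAsLong(partials);
    }

    public static void main(String[] args) {
        // 351 ones: the count reported for 2012-01-01 in the histogram output.
        List<Long> ones = Collections.nCopies(351, 1L);
        System.out.println("sum:   " + sum(ones) + " vs " + reduceInChunks(ones, 64, RereduceSketch::sum));
        System.out.println("count: " + naiveCount(ones) + " vs " + reduceInChunks(ones, 64, RereduceSketch::naiveCount));
    }
}
```

This prints `sum:   351 vs 351` but `count: 351 vs 6`: the naive count ends up counting the 6 partial results instead of the documents.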

But we want one counter per date (that’s why we set the key to the document’s date). This is controlled programmatically with the third argument of queryView (Options).

        // Define map and reduce functions
        HelperFunctions.defineView(db,
                "dateHistogram",
                "function(doc) { emit(doc.date, 1); }",
                "function(keys, values) { return sum(values); }");
        // Execute query and group results by key
        ViewResult<Object> result = db.queryView("views/dateHistogram",
                Object.class,
                new Options().group(true),
                null);
        // Display results
        List<ValueRow<Object>> rows = result.getRows();
        for (ValueRow<Object> row : rows) {
            System.out.printf("%s - %s\n", row.getKey(), row.getValue());
        }

In the previous queryView examples I didn’t set any options; here I set group to true to group the results by key, which is simply done with new Options().group(true).

NOTE: by default queryView executes the reduce function, if one is defined, but you can disable it by invoking reduce(false) on Options.

The output is:

2012-01-01 - 351
2012-01-02 - 328
2012-01-03 - 339
2012-01-04 - 347
2012-01-05 - 359
2012-01-06 - 311
2012-01-07 - 327
2012-01-08 - 326
2012-01-09 - 356
2012-01-10 - 361
2012-01-11 - 337
2012-01-12 - 320
2012-01-13 - 309
2012-01-14 - 336
2012-01-15 - 334
2012-01-16 - 302
2012-01-17 - 325
2012-01-18 - 297
2012-01-19 - 323
2012-01-20 - 356
2012-01-21 - 347
2012-01-22 - 304
2012-01-23 - 312
2012-01-24 - 332
2012-01-25 - 347
2012-01-26 - 329
2012-01-27 - 332
2012-01-28 - 384
2012-01-29 - 333
2012-01-30 - 336

If we want to count how many values are even and how many are odd, it is enough to define a map function such as:

function(doc) {
    emit(doc.even, 1);
}

The output is:

no - 5000
yes - 5000
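The three ways of reading such a view (reduce with group(true), reduce without grouping, and reduce(false)) can be mimicked in plain Java. This sketch models CouchDB's behaviour rather than calling jcouchdb; the class and method names are hypothetical, and it uses the even/odd map above for documents whose value goes from 0 to 9999.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of reading a reduce view in three ways (not the jcouchdb API).
class GroupingSketch {

    // Map step: emit(doc.even, 1) for values 0..n-1; only the keys are kept since every value is 1.
    static List<String> emitEvenKeys(int n) {
        List<String> keys = new ArrayList<>();
        for (int value = 0; value < n; value++) {
            keys.add(value % 2 == 0 ? "yes" : "no");
        }
        return keys;
    }

    // Reduce without grouping: a single sum over every emitted value.
    static long reduceAll(List<String> keys) {
        long total = 0;
        for (String ignored : keys) {
            total += 1; // each emitted value is 1
        }
        return total;
    }

    // group(true): one sum per distinct key, sorted by key.
    static Map<String, Long> reduceGrouped(List<String> keys) {
        Map<String, Long> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = emitEvenKeys(10000);
        System.out.println("reduce(false): " + keys.size() + " raw rows");
        System.out.println("no grouping:   " + reduceAll(keys));
        reduceGrouped(keys).forEach((k, v) -> System.out.println("group(true):   " + k + " - " + v));
    }
}
```

The grouped lines match the output above (`no - 5000`, `yes - 5000`), while the ungrouped reduce collapses everything into a single total of 10000.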

First experiments with map function


While in relational databases you write SELECT statements to choose a subset of your data, in CouchDB you write map/reduce functions that let you extract the part of the data that you want to work with.

Views are called maps because the extracted data are {key, value} pairs, where both key and value can be any JSON data structure. Views are sorted by key.

With map functions you can:

  • Filter documents from your database to let you work with some of them.
  • Extract part of the original document.
  • Sort (filtered) documents.
  • Do calculations on the data of the document (using reduce).

In addition, CouchDB builds indexes on the extracted data, which makes querying it very efficient.

Write documents

Let’s start by writing a bunch of documents to our database that we will then use to play with map and reduce.

        Random random = new Random(System.currentTimeMillis());
        int totalSize = 10000;
        int bulkSize = 1000;

        // Create totalSize documents
        int records = 0;
        while (records < totalSize) {
            // Create a list of documents that are going to be created using 'bulk'
            List<BaseDocument> docList = new ArrayList<BaseDocument>();
            Date firstDate = new GregorianCalendar(2012, GregorianCalendar.JANUARY, 1).getTime();
            Calendar calendar = GregorianCalendar.getInstance();
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");

            for (int i = 0; i < bulkSize; i++) {
                int value = (records + i);
                calendar.setTime(firstDate);
                calendar.add(Calendar.DATE, random.nextInt(30));

                BaseDocument doc = new BaseDocument();
                doc.setProperty("name", "name." + value);
                doc.setProperty("value", value);
                doc.setProperty("even", (value % 2) == 0 ? "yes" : "no");
                doc.setProperty("date", sdf.format(calendar.getTime()));
                docList.add(doc);
            }
            List<DocumentInfo> infoList = db.bulkCreateDocuments(docList);
            assertEquals(docList.size(), infoList.size());
            for (DocumentInfo info : infoList) {
                assertNotNull(info.getId());
                assertNotNull(info.getRevision());
            }
            records += bulkSize;
        }

Each document has the following fields:

  • name: The string name plus a dot and a sequence number (example: name.1234).
  • value: The sequence number used in name.
  • even: yes or no depending on whether the number is even or odd.
  • date: January 1st, 2012 plus a random number of days between 0 and 29 (random.nextInt(30)), formatted as yyyy-MM-dd (4-digit year, 2-digit month and 2-digit day). I chose this year-month-day format because such strings sort in chronological order, which will let us select dates in a range.

To check that we actually created bulkSize documents on each iteration, we compare the size of the DocumentInfo list returned by bulkCreateDocuments with the number of documents sent, and then iterate over it asserting that each entry has an identifier and a revision.

These are pretty small documents, but they will allow me to show some map/reduce basics.
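Since random.nextInt(30) returns a value from 0 to 29, the generated dates run from 2012-01-01 to 2012-01-30, which matches the last date in the histogram output. A quick check of that arithmetic with java.time (not part of the original code, which uses GregorianCalendar; the class name is hypothetical):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Computes the earliest and latest dates the document generator can produce.
class DateRangeCheck {

    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Same rule as the generator: January 1st, 2012 plus offsetDays.
    static String dateFor(int offsetDays) {
        return LocalDate.of(2012, 1, 1).plusDays(offsetDays).format(FORMAT);
    }

    public static void main(String[] args) {
        int bound = 30; // argument of random.nextInt(bound); results are 0..bound-1
        System.out.println("earliest: " + dateFor(0));
        System.out.println("latest:   " + dateFor(bound - 1));
    }
}
```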

Define a view

Views are defined using JavaScript functions that decide whether a document has to be emitted (that is, become part of the final view).

In the following example we choose the documents whose even field is yes and whose value is less than 10. The view uses the name as key and the document identifier as value.

function(doc) {
    if (doc.even == 'yes' && doc.value < 10) {
        emit (doc.name, doc._id);
    }
}

The result, as displayed in CouchDB Futon, is:

The emit function takes two arguments: the key and the value of each row in the resulting view.
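To make the filtering concrete, here is the same map logic written in plain Java for the 10,000 generated documents. It is only an illustration: CouchDB runs the JavaScript version itself, the id-N identifiers are made up (CouchDB generates real ones), and a TreeMap stands in for the sorted view, which works here only because names are unique (real views may contain duplicate keys).

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java equivalent of the even && value < 10 map function (illustration only).
class EvenLessThan10Sketch {

    // Applies the map logic to documents with value 0..n-1 and returns the view sorted by key.
    static Map<String, String> buildView(int n) {
        Map<String, String> view = new TreeMap<>();
        for (int value = 0; value < n; value++) {
            String even = (value % 2 == 0) ? "yes" : "no"; // same rule as the generator
            String name = "name." + value;
            String id = "id-" + value;                      // made-up identifier
            if (even.equals("yes") && value < 10) {
                view.put(name, id);                         // emit(doc.name, doc._id)
            }
        }
        return view;
    }

    public static void main(String[] args) {
        buildView(10000).forEach((key, id) -> System.out.println(key + " -> " + id));
    }
}
```

Only five rows survive the filter: name.0, name.2, name.4, name.6 and name.8.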

Keys can be null (useful if you need all results in no particular order) or complex structures when you need more than one item (used in views where you want to select by more than one key).
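Complex keys are usually arrays, for example a hypothetical emit([doc.even, doc.date], 1). CouchDB sorts array keys element by element, and an array that is a prefix of another sorts first. The comparator below is a sketch of that rule for arrays of strings only; it compares strings with String.compareTo, a simplification of CouchDB's Unicode collation, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch of how array keys sort in a view (strings only, simplified collation).
class ComplexKeySketch {

    // Element-by-element comparison; a shorter array that is a prefix sorts first.
    static final Comparator<List<String>> ARRAY_KEY_ORDER = (a, b) -> {
        for (int i = 0; i < Math.min(a.size(), b.size()); i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    };

    public static void main(String[] args) {
        List<List<String>> keys = new ArrayList<>(Arrays.asList(
                Arrays.asList("yes", "2012-01-02"),
                Arrays.asList("no", "2012-01-03"),
                Arrays.asList("yes"),
                Arrays.asList("no", "2012-01-01")));
        keys.sort(ARRAY_KEY_ORDER);
        keys.forEach(System.out::println);
    }
}
```

The sorted order is `[no, 2012-01-01]`, `[no, 2012-01-03]`, `[yes]`, `[yes, 2012-01-02]`, which is what lets a view select all rows for one first element and then a range of the second.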

Create a View in CouchDB

Views are stored in design documents, and I have defined a helper function for defining and updating a view.

public class HelperFunctions {
    private static final String ViewsPath = "views";

    public static void defineView(Database db,
                                  String name,
                                  String mapFn) {
        DesignDocument doc = new DesignDocument(ViewsPath);

        // Check if the documents exists...
        try {
            DesignDocument old = db.getDesignDocument(doc.getId());
            doc.setRevision(old.getRevision());
            doc.setViews(old.getViews());
        } catch (NotFoundException e) {
            // Do nothing, it is enough knowing that it does not exist
        }
        View view = new View();
        view.setMap(mapFn);
        doc.addView(name, view);
        db.createOrUpdateDocument(doc);
    }
}

See this post for more details on this function.

Basic query view in CouchDB

Pretty simple…

        ViewResult result = db.queryView(map, BaseDocument.class, null, null);

We only need to supply two of the four arguments: the name of the view and the class of the values (here we used BaseDocument); the other two can be null. The result is a ViewResult<BaseDocument> that includes the rows emitted by the map function.

        // Define a view called lessThan10
        // that emits as key a field named value and as value the id of the document
        HelperFunctions.defineView(db, "lessThan10",
                "function(doc) { if (doc.value < 10) emit(doc.value, doc._id); }");
        // Query view with default options (both optional arguments set to null)
        ViewResult result = db.queryView("views/lessThan10", BaseDocument.class, null, null);

        // Check that we get 10 results as expected (see DB creation above)
        assertEquals(10, result.getRows().size());

        // Display results
        List<ValueRow> rows = result.getRows();
        for (ValueRow row : rows) {
            System.out.printf("%3d - %s\n", ((Long) row.getKey()).intValue(), row.getValue());
        }

This code displays:

  0 - b4956242da33280c47337b61254c7d68
  1 - b4956242da33280c47337b61254c839a
  2 - b4956242da33280c47337b61254c8c88
  3 - b4956242da33280c47337b61254c8eed
  4 - b4956242da33280c47337b61254c9e1b
  5 - b4956242da33280c47337b61254ca6c5
  6 - b4956242da33280c47337b61254cb10e
  7 - b4956242da33280c47337b61254cb14e
  8 - b4956242da33280c47337b61254cc069
  9 - b4956242da33280c47337b61254cce23

NOTE: We are able to cast the key to a Long because we know from our map function that the key is the numeric value field.

Design Documents


Design documents are a special kind of CouchDB document. These documents do not contain data but functions.

These documents have an identifier starting with _design/. Some examples of what design documents contain are:

  • Views, a.k.a. map / reduce functions.
  • Validation functions.
  • Show functions.
  • List transforms.

Other than the fact that these functions are executed inside CouchDB, they are stored in a very similar way to other documents: they have identifiers and revisions and these need to be managed properly.

Storing Design Documents

In the following piece of code, I am going to show how to store and update a view.

    DesignDocument doc = new DesignDocument("my_views");
    String name = "evenLessThan10"; // hypothetical view name, used in addView below
    String mapFn = "function(doc) { if (doc.even == 'yes' && doc.value < 10) { emit(doc.name, doc._id); } }";
    // Check if the documents exists...
    try {
        DesignDocument old = db.getDesignDocument(doc.getId());
        doc.setRevision(old.getRevision());
        doc.setViews(old.getViews());
    } catch (NotFoundException e) {
        // Do nothing, it is enough knowing that it does not exist
    }
    View view = new View();
    view.setMap(mapFn);
    doc.addView(name, view);
    db.createOrUpdateDocument(doc);

The first thing to note is that views are stored in a document whose identifier is _design/XXX, where XXX is something you choose. In my case I chose my_views, so I create a new DesignDocument with my_views as argument.

Then, since design documents have identifiers and revisions, we need to check whether the design document already exists before adding a new view. We do this by directly trying to get the document with identifier _design/my_views; if it does not exist we get an exception (but that’s OK since I catch it!). If a document with the same identifier already exists, we copy its revision (doc.setRevision(old.getRevision())) and its existing views (doc.setViews(old.getViews())).

Finally, we create a new View object, assign the map function that is defined as a String and then add this view to the current document (addView(name, view)).

The last statement (createOrUpdateDocument(doc)) is the same one shown in previous posts.
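Why copy the revision at all? CouchDB rejects an update to an existing document unless it carries the document's current revision (the HTTP API answers with 409 Conflict). The following in-memory sketch mimics that check so the fetch / copy / update sequence can be run without a server. RevisionSketch, its save and get methods and the N-rev revision strings are hypothetical, not jcouchdb classes, and the view names are illustrative.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory store that mimics CouchDB's revision check (not jcouchdb).
class RevisionSketch {

    // A stored document: its current revision and its views (view name -> map function).
    static final class Doc {
        final String rev;
        final Map<String, String> views;

        Doc(String rev, Map<String, String> views) {
            this.rev = rev;
            this.views = views;
        }
    }

    private final Map<String, Doc> store = new HashMap<>();
    private int counter = 0;

    // In this sketch, rev must be null for a new id and equal to the current revision otherwise.
    String save(String id, String rev, Map<String, String> views) {
        Doc current = store.get(id);
        String expected = (current == null) ? null : current.rev;
        boolean matches = (expected == null) ? rev == null : expected.equals(rev);
        if (!matches) {
            throw new IllegalStateException("conflict: stale or missing revision for " + id);
        }
        String newRev = (++counter) + "-rev"; // simplified; real revisions look like N-<hash>
        store.put(id, new Doc(newRev, new LinkedHashMap<>(views)));
        return newRev;
    }

    Doc get(String id) {
        return store.get(id);
    }

    public static void main(String[] args) {
        RevisionSketch db = new RevisionSketch();
        Map<String, String> first = new LinkedHashMap<>();
        first.put("evenLessThan10", "function(doc) { ... }");
        db.save("_design/my_views", null, first);

        // Saving again without the current revision is rejected...
        Map<String, String> second = new LinkedHashMap<>();
        second.put("lessThan10", "function(doc) { ... }");
        try {
            db.save("_design/my_views", null, second);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        // ...so copy the old revision and views, add the new view, then save.
        Doc old = db.get("_design/my_views");
        Map<String, String> merged = new LinkedHashMap<>(old.views);
        merged.putAll(second);
        db.save("_design/my_views", old.rev, merged);
        System.out.println(db.get("_design/my_views").views.keySet());
    }
}
```

The first save without a revision prints the conflict message; the second, with the copied revision and views, succeeds and prints `[evenLessThan10, lessThan10]`. Copying the old views is what keeps the first view from being lost.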

NOTE: You can store multiple views under the same identifier. They are stored in the design document’s views field, under the name that you pass as argument to addView. In the following capture we can see that we have two views stored under the same identifier _design/my_views (they are part of the same document): the first lists documents whose value is even and less than 10, while the second lists all documents whose value is less than 10 (even and odd).
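For reference, the stored design document is roughly this JSON: both views live under the views field, keyed by the name given to addView. The sketch builds that JSON by hand for illustration only (the library does the real serialization); _rev is omitted, escaping is minimal, and the view names and class name are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Renders the JSON layout of a design document holding several views (illustration only).
class DesignDocJsonSketch {

    // Minimal JSON string escaping: backslash and double quote only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // views: view name -> map function source
    static String toJson(String id, Map<String, String> views) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"_id\":").append(quote(id)).append(",\"views\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : views.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append(quote(e.getKey())).append(":{\"map\":").append(quote(e.getValue())).append('}');
        }
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> views = new LinkedHashMap<>();
        views.put("evenLessThan10",
                "function(doc) { if (doc.even == 'yes' && doc.value < 10) { emit(doc.name, doc._id); } }");
        views.put("lessThan10",
                "function(doc) { if (doc.value < 10) emit(doc.value, doc._id); }");
        System.out.println(toJson("_design/my_views", views));
    }
}
```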

 

Installing CouchDB 1.2 in Ubuntu 12.04


Sadly, it seems that there is no CouchDB 1.2 package in the Ubuntu Software Sources (you will find 1.0.1, but it is pretty old), so you have to download and compile it!

Installing Ubuntu Desktop 12.04 LTS

The very first thing is to go to the Ubuntu site and download the ISO image (I chose the 64-bit Desktop edition, for easier administration).

Since I run it in a virtual machine (with VMware Fusion), and after a couple of unsuccessful Easy Install attempts, I decided to go with a manual installation.

My (non-default) Settings are:

  • Processor & RAM: 1 CPU, 3072 MB.
  • Hard Disks: SCSI Disk, 40 GB (Split in 2 GB files).
  • Network: NAT (Generated MAC address – click on Advanced Options and then Generate).

Doing it manually went perfectly well, except for the screen resolution set during the installation (5120 x 3200), which I switched to 1024×768 by clicking the gear icon in the top right corner and choosing System Settings… > Displays.

During the installation I chose NOT to download upgrades / updates, so I did it right after the reboot. The easiest way is to open a terminal and run:

$ sudo apt-get update
$ sudo apt-get dist-upgrade
$ sudo reboot

The first command downloads the descriptions of the available updates from your repositories, and the second one upgrades the installed packages that are outdated.

NOTE: I rebooted the system since the kernel was updated too.

Installing CouchDB 1.2

  • Start by installing some packages that are going to be needed:
$ sudo apt-get install g++
$ sudo apt-get install erlang-base erlang-dev erlang-eunit erlang-nox
$ sudo apt-get install libmozjs185-dev libicu-dev libcurl4-gnutls-dev libtool
  • Then go to the CouchDB site and download the sources.
  • In a terminal, go to the folder where you downloaded the file, extract it, enter the resulting directory and run:
$ ./configure
$ make
$ sudo make install
  • Start CouchDB:
$ sudo couchdb
Apache CouchDB 1.2.0 (LogLevel=info) is starting.
Apache CouchDB has started. Time to relax.
[info] [<0.32.0>] Apache CouchDB has started on http://127.0.0.1:5984/
  • Check that it is actually running using curl (install it if needed):
$ sudo apt-get install curl
$ curl -X GET http://localhost:5984
[info] [<0.361.0>] 127.0.0.1 - - GET / 200
{"couchdb":"Welcome","version":"1.2.0"} $

And, hopefully, that’s it!
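The same check can also be done from Java 11+ with the standard java.net.http client. This is a sketch, not part of the original setup: it assumes CouchDB is listening on the default http://localhost:5984/, the class name is hypothetical, and isWelcome is a simple substring check that is sufficient for the welcome response shown above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Java 11+ equivalent of "curl -X GET http://localhost:5984".
class CouchDbPing {

    // True if the body looks like CouchDB's welcome message.
    static boolean isWelcome(String body) {
        return body.contains("\"couchdb\":\"Welcome\"");
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:5984/")).GET().build();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
            System.out.println(isWelcome(response.body()) ? "CouchDB is running" : "Unexpected answer");
        } catch (IOException e) {
            // Connection refused or similar: nothing is listening on port 5984
            System.out.println("CouchDB is not reachable: " + e.getMessage());
        }
    }
}
```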

Setting CouchDB as service

Maybe you don’t want to start CouchDB manually each time you reboot your machine.

In that case, there are a few things to do:

  • Add a couchdb user and group:
$ sudo adduser --disabled-login --disabled-password --no-create-home couchdb
Adding user `couchdb' ...
Adding new group `couchdb' (1001) ...
Adding new user `couchdb' (1001) with group `couchdb' ...
Not creating home directory `/home/couchdb'.
Changing the user information for couchdb
Enter the new value, or press ENTER for the default
 Full Name []: CouchDB Admin
 Room Number []:
 Work Phone []:
 Home Phone []:
 Other []:
Is the information correct? [Y/n] Y
$
  • Check/set the right owner for files and folders (depending on your installation, they might be under /var instead of /usr/local/var):
$ sudo chown -R couchdb:couchdb /usr/local/var/log/couchdb
$ sudo chown -R couchdb:couchdb /usr/local/var/lib/couchdb
$ sudo chown -R couchdb:couchdb /usr/local/var/run/couchdb
$ 
  • Link the couchdb service script into /etc/init.d:
$ sudo ln -s /usr/local/etc/init.d/couchdb  /etc/init.d
$ 
  • Configure the service to start and stop when the system enters or changes runlevels:
$ sudo update-rc.d couchdb defaults
 Adding system startup for /etc/init.d/couchdb ...
   /etc/rc0.d/K20couchdb -> ../init.d/couchdb
   /etc/rc1.d/K20couchdb -> ../init.d/couchdb
   /etc/rc6.d/K20couchdb -> ../init.d/couchdb
   /etc/rc2.d/S20couchdb -> ../init.d/couchdb
   /etc/rc3.d/S20couchdb -> ../init.d/couchdb
   /etc/rc4.d/S20couchdb -> ../init.d/couchdb
   /etc/rc5.d/S20couchdb -> ../init.d/couchdb
$ 

The next time you reboot your machine, you should be able to access CouchDB with no additional actions.

Possible errors / problems

This is a list of errors that I’ve seen when not following these steps:

  • When Erlang is not installed you get:
checking for erl... no
configure: error: Could not find the `erl' executable. Is Erlang installed?
  • The original documentation says to install libmozjs-dev, but it is not in the Ubuntu Software Sources: install libmozjs185-dev instead.
  • When g++ is not installed, the configure phase fails complaining that SpiderMonkey is not installed, even though its headers are there! Install g++ and try again. This is the error that I got:
checking for JS185... yes
checking for JS185... yes
checking for JS185... yes
checking for jsapi.h... no
checking js/jsapi usability... no
checking js/jsapi.h presence... no
checking for js/jsapi.h... no
configure: error: Could not find the jsapi header

Are the Mozilla SpiderMonkey headers installed?
$ ls /usr/include/js/jsapi.h
-rw-r--r-- 1 root root 136254 Mar 31  2011 /usr/include/js/jsapi.h
$
  • When erlang-eunit is not installed you get errors at compile time (even though configure finishes correctly). Install erlang-eunit to fix it! This is the error that I got:
$ make
make  all-recursive
make[1]: Entering directory `/home/onabai/Downloads/apache-couchdb-1.2.0'
Making all in bin
make[2]: Entering directory `/home/onabai/Downloads/apache-couchdb-1.2.0/bin'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/home/onabai/Downloads/apache-couchdb-1.2.0/bin'
...
Making all in mochiweb
make[3]: Entering directory `/home/onabai/Downloads/apache-couchdb-1.2.0/src/mochiweb'
/usr/bin/erlc  mochifmt.erl
./mochifmt.erl:none: error in parse transform 'eunit_autoexport': {undef,
                                              [{eunit_autoexport,
                                                parse_transform,
                                                [[{attribute,1,file,
                                                   {"./mochifmt.erl",1}},
                                                  {attribute,7,module,
                                                   mochifmt},
                                                  {attribute,8,author,
                                                   'bob@mochimedia.com'},
                                                  {attribute,9,export,
                                                   [{format,2},
                                                    {format_field,2},
                                                    {convert_field,2},
                                                    {get_value,2},
                                                    {get_field,2}]},
                                                  {attribute,10,export,
                                                   [{tokenize,1},
                                                    {format,3},
                                                    {get_field,3},
                                                    {format_field,3}]},
                                                  {attribute,11,export,
                                                   [{bformat,2},{bformat,3}]},
...
                                               {compile,
                                                '-foldl_transform/2-anonymous-2-',
                                                2},
                                               {compile,foldl_transform,2},
                                               {compile,
                                                '-internal_comp/4-anonymous-1-',
                                                2},
                                               {compile,fold_comp,3},
                                               {compile,internal_comp,4},
                                               {compile,internal,3}]}
make[3]: *** [mochifmt.beam] Error 1
make[3]: Leaving directory `/home/onabai/Downloads/apache-couchdb-1.2.0/src/mochiweb'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/onabai/Downloads/apache-couchdb-1.2.0/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/onabai/Downloads/apache-couchdb-1.2.0'
make: *** [all] Error 2
$
  • If you did not install the erlang-nox package, you get an error when you try to run CouchDB. Install it and re-run CouchDB. The error looks something like this:
$ sudo couchdb
{"init terminating in do_boot",{{badmatch,{error,{"no such file or directory","inets.app"}}},[{couch,start,0},{init,start_it,1},{init,start_em,1}]}}

Crash dump was written to: erl_crash.dump
init terminating in do_boot ()
$