JSON floats become BigDecimal using jcouchdb…


When converting types from one representation into another, we know there will be some issues with serialization and deserialization. Of course, JSON is no different!

Type representation

Assumptions:

  1. We are using Java for interfacing with CouchDB
  2. We have chosen jcouchdb.
  3. jcouchdb uses svenson for converting from Java into JSON and vice versa.
  4. We have a CouchDB BaseDocument and we want to store an Integer, a Long, a Float and a Double.

Our code is something like:

// Create a document
BaseDocument doc1 = new BaseDocument();
doc1.setId("test_0001");
doc1.setProperty("Integer", new Integer(10));
doc1.setProperty("Long", new Long(10));
doc1.setProperty("Float", new Float(10));
doc1.setProperty("Double", new Double(10));

// Persist it into the database
db.createDocument(doc1);

Let's see what we get when we read that document back from the database and print its values.

// Retrieve it
BaseDocument doc2 = db.getDocument(BaseDocument.class, doc1.getId());

// Print values
System.out.println("Values...");
System.out.println("Integer      " +
        doc1.getProperty("Integer") + " / " +
        doc2.getProperty("Integer"));
System.out.println("Long         " +
        doc1.getProperty("Long") + " / " +
        doc2.getProperty("Long"));
System.out.println("Float        " +
        doc1.getProperty("Float") + " / " +
        doc2.getProperty("Float"));
System.out.println("Double       " +
        doc1.getProperty("Double") + " / " +
        doc2.getProperty("Double"));

And the output is…

Values...
Integer      10 / 10
Long         10 / 10
Float        10.0 / 10.0
Double       10.0 / 10.0

Seems fine! But… what happens if we print the types of the arguments…

// Print types
System.out.println("Types...");
System.out.println("Integer      " +
        doc1.getProperty("Integer").getClass().getSimpleName() + " / " +
        doc2.getProperty("Integer").getClass().getSimpleName());
System.out.println("Long         " +
        doc1.getProperty("Long").getClass().getSimpleName() + " / " +
        doc2.getProperty("Long").getClass().getSimpleName());
System.out.println("Float        " +
        doc1.getProperty("Float").getClass().getSimpleName() + " / " +
        doc2.getProperty("Float").getClass().getSimpleName());
System.out.println("Double       " +
        doc1.getProperty("Double").getClass().getSimpleName() + " / " +
        doc2.getProperty("Double").getClass().getSimpleName());

Now we get as output…

Types...
Integer      Integer / Long
Long         Long / Long
Float        Float / BigDecimal
Double       Double / BigDecimal

We see that the types have changed: integer numbers always come back as Long, while floating point numbers come back as BigDecimal.
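Since an Integer comes back as a Long and a Double as a BigDecimal, a direct cast to the original type fails. A minimal sketch of a more defensive read, assuming only that the retrieved value is some java.lang.Number; the class and method names are hypothetical (they are not part of jcouchdb or svenson) and the Map merely stands in for the properties of the retrieved document:

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of jcouchdb or svenson.
final class NumberCoercion {
    private NumberCoercion() {}

    // Read a numeric property as int, whatever Number subtype the parser produced.
    static int asInt(Object value) {
        return ((Number) value).intValue();
    }

    // Read a numeric property as double, whatever Number subtype the parser produced.
    static double asDouble(Object value) {
        return ((Number) value).doubleValue();
    }

    public static void main(String[] args) {
        // Stand-in for the properties of doc2 after the round trip shown above.
        Map<String, Object> retrieved = new HashMap<String, Object>();
        retrieved.put("Integer", Long.valueOf(10));
        retrieved.put("Double", new BigDecimal("10.0"));

        // A cast such as (Integer) retrieved.get("Integer") would throw ClassCastException.
        System.out.println(asInt(retrieved.get("Integer")));
        System.out.println(asDouble(retrieved.get("Double")));
    }
}
```

Going through Number keeps the reading code independent of which concrete type the JSON parser decides to create.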

Problems with conversion and precision

The real issue here is precision.
The following example creates a BaseDocument with a Float and a Double, but now instead of 10 we use a value of 123.456.

// Create a document
BaseDocument doc1 = new BaseDocument();
doc1.setId("test_0002");
doc1.setProperty("Float", new Float(123.456));
doc1.setProperty("Double", new Double(123.456));

// Persist it into the database
db.createDocument(doc1);

// Retrieve it
BaseDocument doc2 = db.getDocument(BaseDocument.class, doc1.getId());

And when we print the value using:

// Print values
System.out.println("Values...");
System.out.println("Float        " +
        doc1.getProperty("Float") + " / " +
        doc2.getProperty("Float"));
System.out.println("Double       " +
        doc1.getProperty("Double") + " / " +
        doc2.getProperty("Double"));

we get:

Values...
Float        123.456 / 123.45600000000000307
Double       123.456 / 123.45600000000000307

It is not a big difference, but if you compare the numbers they will, of course, not be equal…
The problem lies in how 123.456 is represented by the different data types.

System.out.println("Float      : " + new Float(123.456));
System.out.println("Double     : " + new Double(123.456));
System.out.println("BigDecimal : " + new BigDecimal(123.456));

shows

Float      : 123.456
Double     : 123.456
BigDecimal : 123.4560000000000030695446184836328029632568359375
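
One possible way of living with this, sketched below under the assumption that the original value was a double: compare the retrieved BigDecimal at double precision instead of comparing decimal digits. The class and method names are mine, for illustration only.

```java
import java.math.BigDecimal;

// Sketch: comparing a value read back as BigDecimal with the original double.
final class DecimalComparison {
    private DecimalComparison() {}

    // Compare at double precision: the BigDecimal is rounded to the nearest double first.
    static boolean sameDouble(BigDecimal retrieved, double original) {
        return retrieved.doubleValue() == original;
    }

    public static void main(String[] args) {
        // The value printed for the retrieved document above.
        BigDecimal retrieved = new BigDecimal("123.45600000000000307");

        // Exact decimal comparison: the retrieved value carries extra digits.
        System.out.println(retrieved.compareTo(new BigDecimal("123.456")) == 0);
        // Comparison at double precision.
        System.out.println(sameDouble(retrieved, 123.456));
        // BigDecimal.valueOf goes through Double.toString instead of the exact binary value.
        System.out.println(BigDecimal.valueOf(123.456));
    }
}
```

The first comparison is false and the second is true, while BigDecimal.valueOf(123.456) prints the short form 123.456, unlike new BigDecimal(123.456) shown above.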

CouchDB poll!!!


I’d like to know how people are using CouchDB and I couldn’t find a better way than asking!

The very first question is whether you are actually using it, evaluating it, playing with it, or just curious about it… It might also be interesting to know whether the recent move of Damien Katz (leaving Apache CouchDB) might affect your decision.

Then, whether you use it from a programming language, deploy your application inside CouchDB, or have an HTML/JavaScript application that uses AJAX for interacting with CouchDB…

The third question is for people using it from Java: which API / library are you using?

Please, leave any comment or question that you would like to have included in this poll.

CouchDB, jcouchdb and map (in more detail)


I have already written a first tutorial about jcouchdb and map. But for those used to writing SELECT in SQL, there are a lot of things beyond those simple queries.

Nevertheless, you should change your mindset, since working with CouchDB map/reduce functions is a completely new paradigm and there are no magic recipes for converting SQL tables into CouchDB documents and SELECTs into map/reduce functions.

Example Database and create, update, query Views in CouchDB

First, I will refer you to the database that I created as an example here, where I inserted some small documents with a few properties, and to the helper functions for creating / updating / querying views introduced here.

Limiting the number of results in jcouchdb

HelperFunctions.defineView(db, "sortedById",
                           "function(doc) { emit(doc._id, doc); }");
ViewResult result = db.queryView("views/sortedById",
                                 BaseDocument.class,
                                 new Options().limit(100),
                                 null);
// Check that we get 100 results
assertEquals(result.getRows().size(), 100);

In lines 1-2 we define a map function that emits the identifier of the document and the document itself. Then we call the queryView method (line 3), asking for documents of type BaseDocument (line 4), and we use a method of the Options class called limit that lets you specify the maximum number of documents to retrieve.

Line 8 verifies that we actually get 100 documents.

NOTE: ViewResult contains a method named getTotalRows that returns the total number of rows in the view, NOT the number of rows that have been retrieved.

One additional point is that the view returns its results ordered by key (the first argument of the emit function), so in the previous example we got the results ordered by identifier.

Getting results ordered with jcouchdb

Consider the following code that defines a view for getting results ordered by date.

HelperFunctions.defineView(db,
                           "sortedByDate",
                           "function(doc) { emit(doc.date, doc); }");
ViewResult<BaseDocument> result = db.queryView("views/sortedByDate",
                                               BaseDocument.class,
                                               new Options().limit(100),
                                               null);
List<ValueRow<BaseDocument>> rows = result.getRows();

// Check that we get 100 results
assertEquals(rows.size(), 100);

// Verify that they are actually sorted by date (key)
for (int i = 1; i < rows.size(); i++) {
    assertTrue(((String) rows.get(i - 1).getKey()).compareTo((String) rows.get(i).getKey()) <= 0);
}

As in the previous example, we define a view (using our HelperFunctions) and then invoke the Database queryView method. We limit the number of results to 100 (line 6) and then, in lines 14-16, we iterate over the results checking that each date is equal to or greater than the previous one.

NOTE: we can do this because we saved dates as YYYY-MM-DD, so comparing them as Strings gives chronological order.
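
To see why the format matters, here is a small stand-alone sketch (plain Java, no CouchDB involved; the class name is hypothetical) that applies the same String comparison to the same three dates written in two different formats:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: String order equals chronological order only for year-month-day formats.
final class IsoDateOrder {
    private IsoDateOrder() {}

    // True when the keys are in non-decreasing String order.
    static boolean isSorted(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1).compareTo(keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The same three dates, listed chronologically, in two formats.
        List<String> yearFirst = Arrays.asList("2012-01-02", "2012-01-10", "2012-02-01");
        List<String> dayFirst = Arrays.asList("02-01-2012", "10-01-2012", "01-02-2012");

        System.out.println(isSorted(yearFirst)); // year-month-day
        System.out.println(isSorted(dayFirst));  // day-month-year
    }
}
```

The year-first list passes the check while the day-first list does not, even though both describe the same chronological sequence.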

Querying in jcouchdb for a key value that is in a set of values

Consider the same map function (sortedByDate) as in the previous section. Now we are going to use queryViewByKeys instead of queryView; it takes an extra argument, the list of key values that we want to match.

HelperFunctions.defineView(db, "sortedByDate", "function(doc) { emit(doc.date, doc); }");
List<String> list = Arrays.asList("2012-01-01", "2012-01-02", "2012-01-03");

ViewResult<BaseDocument> result = db.queryViewByKeys("views/sortedByDate",
                                                     BaseDocument.class,
                                                     list,
                                                     null,
                                                     null);
List<ValueRow<BaseDocument>> rows = result.getRows();
assertTrue(rows.size() != 0);

// Check that results are actually one of the desired
for (ValueRow<BaseDocument> row : rows) {
    String date = (String) row.getValue().getProperty("date");
    assertTrue(list.contains(date));
}

In line 2 we define a list of Strings containing the dates that we want to match.

In line 4 we call queryViewByKeys, whose third argument is the extra one (the list of keys to match).

In lines 13 through 16 we check that the dates of the retrieved documents are actually included in the list.

NOTE: the method getValue used in line 14 returns the second argument passed to emit in the map function (the document, for this sortedByDate map function).

How do you see it so far?

First experiments with map-reduce


In the previous post I showed how to create / update / query a map function for creating a view using the jcouchdb Java library.

In this one I’m going to show how to use reduce functions.

Create a View in CouchDB

In that post I created a helper function that allowed me to create / update a view, but this function did not define a reduce function. So, I’m going to introduce a slight change for setting the reduce function as well as the map function.

public class HelperFunctions {
    private static final String ViewsPath = "views";
...
    // Define view with map and reduce function
    public static void defineView(Database db,
                                  String name,
                                  String mapFn,
                                  String reduceFn) {
        DesignDocument doc = new DesignDocument(ViewsPath);

        // Check if the documents exists...
        try {
            DesignDocument old = db.getDesignDocument(doc.getId());
            doc.setRevision(old.getRevision());
            doc.setViews(old.getViews());
        } catch (NotFoundException e) {
            // Do nothing, it is enough knowing that it does not exist
        }
        View view = new View();
        view.setMap(mapFn);
        view.setReduce(reduceFn);
        doc.addView(name, view);
        db.createOrUpdateDocument(doc);
    }

    // Define view only with map function
    public static void defineView(Database db,
                                  String name,
                                  String mapFn) {
        defineView(db, name, mapFn, null);
    }
}

I created a new defineView with an extra argument (reduceFn) and moved the code into this implementation. The previous signature is kept and invokes the new one with a null value for the reduceFn argument, indicating that there is no reduce function.

Compared with the previous code, we add one extra instruction (line 21) for setting the reduce function (view.setReduce(reduceFn)).

And that’s it! Pretty easy, isn’t it?

Using map – reduce for building a histogram

For showing how to use map and reduce, we are going to count the number of documents that we created in the previous post for each date.

The map function is:

function(doc) {
    emit(doc.date, 1);
}

and the reduce function:

function(keys, values) {
    return sum(values);
}

Basically, the view emits a key/value pair for each document, where the key is the date and the value is 1. Then the reduce function uses the built-in sum function to add up the values (always 1), which gives the count.

But we want the counter grouped per date (that’s why we set the key equal to the document’s date). This is controlled programmatically with the third argument of queryView (Options).

        // Define map and reduce functions
        HelperFunctions.defineView(db,
                "dateHistogram",
                "function(doc) { emit(doc.date, 1); }",
                "function(keys, values) { return sum(values); }");
        // Execute query and group results by key
        ViewResult<Object> result = db.queryView("views/dateHistogram",
                Object.class,
                new Options().group(true),
                null);
        // Display results
        List<ValueRow<Object>> rows = result.getRows();
        for (ValueRow<Object> row : rows) {
            System.out.printf("%s - %s\n", row.getKey(), row.getValue());
        }

In previous examples of queryView I didn’t set any options; here I set group to true for grouping results by key, which is simply done with new Options().group(true).

NOTE: queryView executes the reduce function by default, if one is defined, but you can disable it by invoking the method reduce(false) on Options.

The output is:

2012-01-01 - 351
2012-01-02 - 328
2012-01-03 - 339
2012-01-04 - 347
2012-01-05 - 359
2012-01-06 - 311
2012-01-07 - 327
2012-01-08 - 326
2012-01-09 - 356
2012-01-10 - 361
2012-01-11 - 337
2012-01-12 - 320
2012-01-13 - 309
2012-01-14 - 336
2012-01-15 - 334
2012-01-16 - 302
2012-01-17 - 325
2012-01-18 - 297
2012-01-19 - 323
2012-01-20 - 356
2012-01-21 - 347
2012-01-22 - 304
2012-01-23 - 312
2012-01-24 - 332
2012-01-25 - 347
2012-01-26 - 329
2012-01-27 - 332
2012-01-28 - 384
2012-01-29 - 333
2012-01-30 - 336
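
To make the grouping more concrete, here is a toy simulation in plain Java of what the map function, the reduce function and group=true do together. It is only a sketch of the idea under my own naming: CouchDB runs the real functions in JavaScript, and nothing here is jcouchdb API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy simulation; not how CouchDB implements views internally.
final class HistogramSketch {
    private HistogramSketch() {}

    // Plays the role of emit(doc.date, 1) followed by sum(values) grouped by key.
    static Map<String, Integer> countByDate(List<String> dates) {
        // A TreeMap keeps the entries sorted by key, as a view does.
        Map<String, Integer> histogram = new TreeMap<String, Integer>();
        for (String date : dates) {
            Integer current = histogram.get(date);
            histogram.put(date, current == null ? 1 : current + 1);
        }
        return histogram;
    }

    public static void main(String[] args) {
        List<String> dates = Arrays.asList("2012-01-02", "2012-01-01", "2012-01-02");
        for (Map.Entry<String, Integer> row : countByDate(dates).entrySet()) {
            System.out.printf("%s - %s%n", row.getKey(), row.getValue());
        }
    }
}
```

Without grouping, the equivalent would be a single total for all the dates, which is why the group option matters for a histogram.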

If we would like to count how many values are even and how many are odd, it is enough to define the map function as:

function(doc) {
    emit(doc.even, 1);
}

The output is:

no - 5000
yes - 5000

First experiments with map function


While in relational databases you write SELECT statements for choosing a subset of your original database, in CouchDB you write map-reduce functions that let you extract the part of the data you want to work with.

The views are called maps because the extracted data are {key, value} pairs, where both key and value can be any JSON data structure. Views are sorted by key.

Some characteristics of map functions are:

  • Filter documents from your database to let you work with some of them.
  • Extract part of the original document.
  • Sort (filtered) documents.
  • Do calculations on the data of the document (using reduce).

In addition, CouchDB builds indexes on the extracted data, making its usage very efficient.

Write documents

Let’s start by writing a bunch of documents into our database, which we will then use for playing with map and reduce.

        Random random = new Random(System.currentTimeMillis());
        int totalSize = 10000;
        int bulkSize = 1000;

        // Create totalSize documents
        int records = 0;
        while (records < totalSize) {
            // Create a list of documents that are going to be created using 'bulk'
            List<BaseDocument> docList = new ArrayList<BaseDocument>();
            Date firstDate = new GregorianCalendar(2012, GregorianCalendar.JANUARY, 1).getTime();
            Calendar calendar = GregorianCalendar.getInstance();
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");

            for (int i = 0; i < bulkSize; i++) {
                int value = (records + i);
                calendar.setTime(firstDate);
                calendar.add(Calendar.DATE, random.nextInt(30));

                BaseDocument doc = new BaseDocument();
                doc.setProperty("name", "name." + value);
                doc.setProperty("value", value);
                doc.setProperty("even", (value % 2) == 0 ? "yes" : "no");
                doc.setProperty("date", sdf.format(calendar.getTime()));
                docList.add(doc);
            }
            List<DocumentInfo> infoList = db.bulkCreateDocuments(docList);
            assertEquals(infoList.size(), docList.size());
            for (DocumentInfo info : infoList) {
                assertNotNull(info.getId());
                assertNotNull(info.getRevision());
            }
            records += bulkSize;
        }

Each document has the following fields:

  • name: The string name plus a dot and a sequence number (example: name.1234).
  • value: The sequence number used in name.
  • even: yes or no, depending on whether the number is even or odd.
  • date: January 1st, 2012 plus a random number of days between 0 and 29, formatted as yyyy-MM-dd (year with 4 digits, month with 2 digits and day with 2 digits). The reason for choosing this year-month-day format is that the resulting strings sort chronologically, which will allow us to choose dates in a range.

In order to check that we actually created bulkSize documents on each iteration, we compare the sizes of both lists and then iterate over the list of DocumentInfo returned by bulkCreateDocuments, asserting that each entry actually has an identifier and a revision.
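
As a quick check of the date range, the sketch below computes the first and last date that random.nextInt(30) can produce (offsets 0 and 29). It uses java.time, which assumes Java 8 or later, instead of the Calendar code above; the class and method names are hypothetical.

```java
import java.time.LocalDate;

// Sketch: the dates reachable from January 1st, 2012 with an offset of 0..29 days.
final class DateOffsets {
    private static final LocalDate FIRST_DATE = LocalDate.of(2012, 1, 1);

    private DateOffsets() {}

    // LocalDate.toString() uses the ISO yyyy-MM-dd format.
    static String dateFor(int offsetDays) {
        return FIRST_DATE.plusDays(offsetDays).toString();
    }

    public static void main(String[] args) {
        System.out.println(dateFor(0));  // smallest offset returned by nextInt(30)
        System.out.println(dateFor(29)); // largest offset returned by nextInt(30)
    }
}
```

So the documents are spread over the 30 days from 2012-01-01 to 2012-01-30.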

This is a pretty small document but will allow me to show some basics on map-reduce.

Define a view

Views are defined using JavaScript functions in which you decide whether a document has to be emitted (be part of the final view).

In the following example we choose those documents where the value of the even field is yes and value is less than 10. The view uses the name as key and the identifier of the document as value.

function(doc) {
    if (doc.even == 'yes' && doc.value < 10) {
        emit (doc.name, doc._id);
    }
}

The resulting view can be displayed using CouchDB Futon.

The emit function has two arguments, which become the key and the value of each entry in the resulting map.

Keys might be null (useful if you need all results in no special order) or complex structures if you need more than one item (used in views where you want to choose by more than one key).

Create a View in CouchDB

Views live in Design Documents, and I have defined a helper function for defining and updating a view.

public class HelperFunctions {
    private static final String ViewsPath = "views";

    public static void defineView(Database db,
                                  String name,
                                  String mapFn) {
        DesignDocument doc = new DesignDocument(ViewsPath);

        // Check if the documents exists...
        try {
            DesignDocument old = db.getDesignDocument(doc.getId());
            doc.setRevision(old.getRevision());
            doc.setViews(old.getViews());
        } catch (NotFoundException e) {
            // Do nothing, it is enough knowing that it does not exist
        }
        View view = new View();
        view.setMap(mapFn);
        doc.addView(name, view);
        db.createOrUpdateDocument(doc);
    }
}

See this for more details on this function.

Basic query view in CouchDB

Pretty simple…

        ViewResult result = db.queryView(map, BaseDocument.class, null, null);

We just need two arguments: the name of the view and the class of the output documents (here we used BaseDocument); the other two are left as null. The result is a ViewResult<BaseDocument> that includes the list of rows emitted by the map function.

        // Define a view called lessThan10
        // that emits as key a field named value and as value the id of the document
        HelperFunctions.defineView(db, "lessThan10",
                "function(doc) { if (doc.value < 10) emit(doc.value, doc._id); }");
        // Query view with no options and no keys
        ViewResult result = db.queryView("views/lessThan10", BaseDocument.class, null, null);

        // Check that we get 10 results as expected (see DB creation above)
        assertEquals(result.getRows().size(), 10);

        // Display results
        List<ValueRow> rows = result.getRows();
        for (ValueRow row : rows) {
            System.out.printf("%3d - %s\n", ((Long) row.getKey()).intValue(), row.getValue());
        }

This code displays:

  0 - b4956242da33280c47337b61254c7d68
  1 - b4956242da33280c47337b61254c839a
  2 - b4956242da33280c47337b61254c8c88
  3 - b4956242da33280c47337b61254c8eed
  4 - b4956242da33280c47337b61254c9e1b
  5 - b4956242da33280c47337b61254ca6c5
  6 - b4956242da33280c47337b61254cb10e
  7 - b4956242da33280c47337b61254cb14e
  8 - b4956242da33280c47337b61254cc069
  9 - b4956242da33280c47337b61254cce23

NOTE: We have been able to cast the key of each row to a Long since we know from our map function that this is the case.

Design Documents


Design documents are a special kind of CouchDB document. These documents do not contain data but functions.

These documents have an identifier starting with _design/. Some examples of what design documents contain are:

  • Views, a.k.a. map / reduce functions.
  • Validation functions.
  • Show functions.
  • List transforms.

Other than the fact that these functions are executed inside CouchDB, they are stored in a very similar way to other documents: they have identifiers and revisions and these need to be managed properly.

Storing Design Documents

In the following piece of code, I am going to show how to store and update a view.

    DesignDocument doc = new DesignDocument("my_views");
    String mapFn = "function(doc) { if (doc.even == 'yes' && doc.value < 10) { emit(doc.name, doc._id); } }";
    // Check if the document exists...
    try {
        DesignDocument old = db.getDesignDocument(doc.getId());
        doc.setRevision(old.getRevision());
        doc.setViews(old.getViews());
    } catch (NotFoundException e) {
        // Do nothing, it is enough knowing that it does not exist
    }
    View view = new View();
    view.setMap(mapFn);
    // name is a String holding the name under which the view is stored
    doc.addView(name, view);
    db.createOrUpdateDocument(doc);

The first thing to note is that views are stored with an identifier equal to _design/XXX, where XXX is something that you choose. In my case I set it to my_views, so I create a new DesignDocument with my_views as argument.

Then, it is important to note that, since design documents have identifiers and revisions, we should check whether the design document already exists before trying to add a new view. We do it by directly trying to get the document with identifier _design/my_views; if it does not exist we get an exception (but that’s OK since I catch it!). If a document with the same identifier already exists, we copy its revision (doc.setRevision(old.getRevision())) and its existing views (doc.setViews(old.getViews())).

Finally, we create a new View object, assign the map function, which is defined as a String, and then add this view to the current document (addView(name, view)).

The last statement (createOrUpdateDocument(doc)) is the same one shown in previous posts.

NOTE: You can store multiple views under the same identifier. They are stored in the design document inside the views field, under the name that you provide as argument in the addView call. For example, we can have two views stored under the same identifier _design/my_views (they are part of the same document): the first lists documents with an even value less than 10, while the second lists all those with a value less than 10 (even and odd).

Working with attachments


Attachments in CouchDB are like attachments in an e-mail message.

Attachments can be thought of as files with a content type (their MIME type), and they can be of any type: images (png, jpeg, gif, svg…), plain text, PDF, MS Word, Excel…

Attachments get special treatment and are not handled as “regular” properties of the document: they can be updated and retrieved but they cannot be queried. So far, it does not make sense to query for images containing people smiling or music containing a C# minor played on a Stradivarius.

The easiest way to create an attachment is to create a document, as I posted here, and then invoke the createAttachment method of the Database class.

Creating an attachment from a file

This example starts by creating a BaseDocument with two properties called site and leitmotif and then creates an attachment called logo of type image/svg+xml. The method createAttachment has two overloads; here we use the one that takes an InputStream containing the attachment, and its size, as arguments.

// Create a BaseDocument
BaseDocument doc = new BaseDocument();
doc.setProperty("site", "www.onabai.com");
doc.setProperty("leitmotif", "Working on a cloud of documents!");
db.createDocument(doc);
System.out.printf("Id: %s rev.: %s\n", doc.getId(), doc.getRevision());

// Add attachment
File file = new File("resources/jcouchdb.svg");
InputStream is = new FileInputStream(file);
String rev = db.createAttachment(doc.getId(), doc.getRevision(),
                                 "logo", "image/svg+xml",
                                 is, file.length());
System.out.printf("New rev.: %s\n", rev);

The value returned by createAttachment is the new revision of the Document and the console output of the previous code is:

Id: 7f18cfd54dfa7372d0a7f79ae9002082 rev.: 1-f7d6c9a930283da1916cd9623bde99ab 
New rev.: 2-aecba701f8b643b71d28c50f83414705

and if we go to the CouchDB Futon interface we will see:

[Screenshot 2012-05-08 at 10.57.23]

and the (latest version of the) logo can be browsed at /db/document_id/attachment_name, in our case: /onabai/7f18cfd54dfa7372d0a7f79ae9002082/logo.

[Screenshot 2012-05-08 at 11.16.31]

Attachment storage in CouchDB

For a better understanding of what CouchDB Futon is displaying and how attachments work, it is worth going into a little detail on how attachments are saved in CouchDB…

We can see that the information about the attachment that we have created (logo) is inside a JSON field called _attachments. For each attachment, we have one entry in _attachments containing the following fields:

  • content_type: the MIME type of the corresponding attachment.
  • revpos: revision (not the complete revision of the document but just the sequential part of the revision).
  • length: length / size of the attachment.
  • stub: denotes that this is not the complete attachment.
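
The “sequential part” of a revision is the number before the dash, so it can be extracted from the revision strings printed above with a few lines of plain Java. This is just a sketch with a hypothetical class name; it assumes the revision has the number-dash-hash shape seen in the console output.

```java
// Sketch: extract the sequential part of a revision such as "2-aecba701...".
final class RevisionParts {
    private RevisionParts() {}

    // Returns the number in front of the first dash.
    static int sequence(String revision) {
        int dash = revision.indexOf('-');
        if (dash <= 0) {
            throw new IllegalArgumentException("Unexpected revision format: " + revision);
        }
        return Integer.parseInt(revision.substring(0, dash));
    }

    public static void main(String[] args) {
        // Revision returned by createAttachment in the example above.
        System.out.println(sequence("2-aecba701f8b643b71d28c50f83414705"));
    }
}
```

For the revision returned when the logo was attached, this gives 2, which is the value to compare with the revpos of the attachment.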

Creating an attachment from a byte[]

But attachments (as I said before) can be of any MIME type… Here I create one of type text/plain and use the second overload of createAttachment, where the attachment is passed as a byte[] argument.

// Add another attachment
String quote = "It's all about the cloud. The fact of being always connected opens a whole new universe of possibilities. But whenever you design software for the cloud don't forget the sunny days where clouds are not visible.";
rev = db.createAttachment(doc.getId(), rev,
                          "quote", "text/plain", 
                          quote.getBytes());
System.out.printf("Newer rev.: %s\n", rev);

And the output is:

Newer rev.: 3-08fb27a94de3081c6b61d9171c0aa077

where we can see the new revision and in CouchDB Futon we get:

[Screenshot 2012-05-08 at 11.21.11]

Here we have two attachments: quote and logo. Pay attention to the logo revpos, which is 2, indicating the revision in which it was created / updated, while the quote revpos is 3, since this was the revision of the document when it was created / updated.

Updating an attachment

Let’s change the logo image attached to our document.

// Update logo attachment
file = new File("resources/onabai-logo.gif");
is = new FileInputStream(file);
rev = db.updateAttachment(doc.getId(), rev,
                          "logo", "image/gif",
                          is, file.length());
System.out.printf("New Logo rev.: %s\n", rev);

The output…

New Logo rev.: 4-a9c0fa53220eddd27b0b6d740219b961

and CouchDB Futon…

[Screenshot 2012-05-08 at 11.32.37]

showing that the logo revpos is now 4 while the quote revpos is still 3.

The logo in the browser is

[Screenshot 2012-05-08 at 11.34.42]

where we can see the new attachment. But since CouchDB is multi-version, we can still get the previous version at /db/document_id/attachment_name?rev=revision
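
The two path patterns mentioned so far can be put together in a small helper. This is only a sketch with hypothetical names, and it assumes that the database, document and attachment names need no URL escaping:

```java
// Sketch: build the path of an attachment, optionally for a given document revision.
final class AttachmentPath {
    private AttachmentPath() {}

    // rev may be null, meaning "latest version of the attachment".
    static String of(String db, String documentId, String attachmentName, String rev) {
        String path = "/" + db + "/" + documentId + "/" + attachmentName;
        return rev == null ? path : path + "?rev=" + rev;
    }

    public static void main(String[] args) {
        String id = "7f18cfd54dfa7372d0a7f79ae9002082";
        // Latest logo.
        System.out.println(of("onabai", id, "logo", null));
        // Logo as it was in the revision where it was first attached.
        System.out.println(of("onabai", id, "logo", "2-aecba701f8b643b71d28c50f83414705"));
    }
}
```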

If we update the document as in this code:

// Retrieve the document with the desired id
BaseDocument doc = db.getDocument(BaseDocument.class, 
                                  "7f18cfd54dfa7372d0a7f79ae9002082");
// Add extra property
doc.setProperty("author", "OnaBai");
db.updateDocument(doc);
System.out.printf("New Document rev.: %s\n", doc.getRevision());

then CouchDB Futon displays the document with the additional property author.

[Screenshot 2012-05-08 at 11.48.00]

IMPORTANT: Do not forget to read the document before updating it; otherwise you will remove its content. CouchDB document storage is multi-version, but the content of an updated document is not merged with the previous content.

This code:

// Create an empty document that reuses the existing id and revision
BaseDocument doc = new BaseDocument(); 
doc.setId("7f18cfd54dfa7372d0a7f79ae9002082"); 
doc.setRevision("5-e613b877d4ce0382104c5cf1c1f078fa"); 
// Assign value to property 
doc.setProperty("author", "OnaBai"); 
db.updateDocument(doc); 
System.out.printf("New Document rev.: %s\n", doc.getRevision());

makes all properties but author disappear.

[Screenshot 2012-05-08 at 11.51.06]
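
The difference between the two updates can be reproduced with a toy in-memory model in plain Java. It is not CouchDB or jcouchdb code (there are no revisions here, and all names are hypothetical); it only illustrates that an update replaces the stored content instead of merging it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of "an update replaces the whole document"; not CouchDB or jcouchdb code.
final class ReplaceNotMerge {
    private final Map<String, Map<String, Object>> store =
            new HashMap<String, Map<String, Object>>();

    // Like an update: the stored content becomes exactly what is passed in.
    void put(String id, Map<String, Object> content) {
        store.put(id, new HashMap<String, Object>(content));
    }

    // Like a read: returns a copy of the stored content.
    Map<String, Object> get(String id) {
        return new HashMap<String, Object>(store.get(id));
    }

    public static void main(String[] args) {
        ReplaceNotMerge db = new ReplaceNotMerge();
        Map<String, Object> original = new HashMap<String, Object>();
        original.put("site", "www.onabai.com");
        db.put("doc", original);

        // Read, modify, write: the existing property survives.
        Map<String, Object> read = db.get("doc");
        read.put("author", "OnaBai");
        db.put("doc", read);
        System.out.println(db.get("doc").keySet());

        // Write a fresh document with only the new property: the old one is gone.
        Map<String, Object> fresh = new HashMap<String, Object>();
        fresh.put("author", "OnaBai");
        db.put("doc", fresh);
        System.out.println(db.get("doc").keySet());
    }
}
```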