What is CouchDB?

The first sentence of CouchDB’s definition (as defined by http://couchdb.apache.org/) is as follows:

CouchDB is a document database server, accessible through the RESTful JSON API.

Let’s dissect this sentence to fully understand what it means. Let’s start with the term database server.

Database server

CouchDB is a document-oriented database management system: it serves a flat collection of documents with no schema, grouping, or hierarchy. This approach, popularized by NoSQL databases, is a big departure from relational databases (such as MySQL), where you would expect to see tables, relationships, and foreign keys. Many developers have worked on a project where they had to force a relational schema onto data that really didn’t require the rigidity of tables and complex relationships. This is where CouchDB does things differently; it stores all of the data for an object in a self-contained document with no set schema. The following diagram will help to illustrate this:

To let a user belong to one or more groups in a relational database (such as MySQL), we would create a users table, a groups table, and a link table called users_groups. This practice is common to most web applications.

Now look at the CouchDB documents. There are no tables or link tables, just documents. These documents contain all of the data pertaining to a single object.

This diagram is very simplified. If we wanted to create more logic around the groups in CouchDB, we would have to create group documents, with a simple relationship between the user documents and group documents.
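To make the comparison concrete, here is a minimal Python sketch of the same idea. The rows and names are made up for illustration, and the function only shows the shape of the data; it is not how CouchDB stores anything internally.

```python
# Hypothetical rows, as they might come out of three relational tables.
users = [{"id": 1, "name": "Tim"}, {"id": 2, "name": "Ana"}]
groups = [{"id": 10, "name": "admins"}, {"id": 20, "name": "editors"}]
users_groups = [
    {"user_id": 1, "group_id": 10},
    {"user_id": 1, "group_id": 20},
    {"user_id": 2, "group_id": 20},
]


def to_documents(users, groups, users_groups):
    """Fold the link table into one self-contained document per user."""
    group_names = {group["id"]: group["name"] for group in groups}
    documents = []
    for user in users:
        member_of = [
            group_names[link["group_id"]]
            for link in users_groups
            if link["user_id"] == user["id"]
        ]
        documents.append({"name": user["name"], "groups": member_of})
    return documents


user_documents = to_documents(users, groups, users_groups)
```

Each resulting document carries its own list of groups, so reading a user no longer requires joining three tables.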

Let’s dig into what documents are and how CouchDB uses them.

Documents

To illustrate how you might use documents, first imagine that you are physically filling out the paper form of a job application. This form has information about you, your address, and past addresses. It also has information about many of your past jobs, education, certifications, and much more. A document would save all of this data exactly in the way you would see it in the physical form – all in one place, without any unnecessary complexity.

In CouchDB, documents are stored as JSON objects that contain key-value pairs. Each document has reserved fields for metadata, such as _id (the identifier), _rev (the revision), and _deleted. Besides the reserved fields, documents are 100 percent schema-less, meaning that each document can be formatted and treated independently with as many different variations as you might need.

Example of a CouchDB document

Let’s take a look at an example of what a CouchDB document might look like for a blog post:

    {
        "_id": "431f956fa44b3629ba924eab05000553",
        "_rev": "1-c46916a8efe63fb8fec6d097007bd1c6",
        "title": "Why I like Chicken",
        "author": "Tim Juravich",
        "tags": [
            "Chicken",
            "Grilled",
            "Tasty"
        ],
        "body": "I like chicken, especially when it's grilled."
    }

JSON format

The first thing you might notice is the unfamiliar notation of the document, which is JavaScript Object Notation (JSON). JSON is a lightweight data-interchange format based on JavaScript syntax and is extremely portable. CouchDB uses JSON for all of its communication.

Key-value storage

The next thing that you might notice is that there is a lot of information in this document. There are key-value pairs that are simple to understand, such as “title”, “author”, and “body”, but you’ll also notice that “tags” is an array of strings. CouchDB lets you embed as much information as you want directly into a document. This is a concept that might be new to relational database users who are used to normalized and structured databases.
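If you want to poke at this structure yourself, the following Python sketch parses the example blog post with the standard json module and separates the underscore-prefixed metadata from the rest. The variable names are my own; nothing here talks to CouchDB.

```python
import json

# The example blog post document from this article, as raw JSON text.
raw = """{ "_id": "431f956fa44b3629ba924eab05000553",
  "_rev": "1-c46916a8efe63fb8fec6d097007bd1c6",
  "title": "Why I like Chicken", "author": "Tim Juravich",
  "tags": [ "Chicken", "Grilled", "Tasty" ],
  "body": "I like chicken, especially when it's grilled." }"""

post = json.loads(raw)

# Top-level fields that start with an underscore are CouchDB metadata.
reserved = {key: value for key, value in post.items() if key.startswith("_")}
# Everything else is data we chose to store.
content = {key: value for key, value in post.items() if not key.startswith("_")}

# Simple values arrive as strings, while "tags" arrives as a list of strings.
tag_count = len(post["tags"])
```

After parsing, post["title"] is a plain string and post["tags"] is a list, which is exactly the embedded structure described above.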

Reserved fields

Let’s look at the two reserved fields: _id and _rev.

_id is the unique identifier of the document. This means that _id is mandatory, and no two documents can have the same value. If you don’t define an _id on creation of a document, CouchDB will choose a unique one for you.

_rev is the revision of the document and is the field that drives CouchDB’s version control system. Each time you update an existing document, you must supply its current revision so that CouchDB can tell whether you are working from the newest version. This is required because CouchDB does not use a locking mechanism: if two people update the same document at the same time, the first one to save wins, and the second save is rejected as a conflict because its revision is out of date. One of the distinctive things about CouchDB’s revision system is that saving a document does not overwrite the original; CouchDB writes a new revision with the new data and keeps the previous revisions in their original form. Old revisions remain available only until the database is compacted or some other cleanup occurs, so they should not be treated as a permanent history.
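The "first one to save wins" rule is easier to see in a toy model. The ToyStore class below is a hypothetical in-memory stand-in written for this illustration; it is not CouchDB code, and its revision strings (such as 1-toy) are simplified placeholders for the real number-plus-hash format.

```python
class ToyStore:
    """In-memory stand-in that mimics the revision check; not CouchDB code."""

    def __init__(self):
        self._docs = {}

    def save(self, doc):
        current = self._docs.get(doc["_id"])
        # An update must carry the revision that is currently stored.
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise ValueError("conflict: the document has a newer revision")
        if current is None:
            generation = 1
        else:
            generation = int(current["_rev"].split("-")[0]) + 1
        stored = dict(doc, _rev=f"{generation}-toy")
        self._docs[doc["_id"]] = stored
        return stored["_rev"]


store = ToyStore()
first_rev = store.save({"_id": "post-1", "title": "Why I like Chicken"})

# Two editors both start from the same revision.
alice = {"_id": "post-1", "_rev": first_rev, "title": "Alice's title"}
bob = {"_id": "post-1", "_rev": first_rev, "title": "Bob's title"}

second_rev = store.save(alice)  # Alice saves first, so she wins.
try:
    store.save(bob)  # Bob's revision is now out of date.
    bob_saved = True
except ValueError:
    bob_saved = False
```

Bob would have to fetch the latest revision, reapply his change, and save again.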

The last piece of the definition sentence is the RESTful JSON API. So, let’s cover that next.

RESTful JSON API

In order to understand REST, let’s first define HyperText Transfer Protocol (HTTP). HTTP is the underlying protocol of the World Wide Web; it defines how messages are formatted and transmitted and how services should respond to a variety of methods. The four methods we will use most are GET, PUT, POST, and DELETE. In order to fully understand how these HTTP methods function, let’s next define REST.

Representational State Transfer (REST) is a stateless architectural style for accessing addressable resources through HTTP methods. Stateless means that each request contains all of the information necessary to understand and process it, and addressable resources means that you can access an object via a URL.

That might not mean a lot in itself, but, by putting all of these ideas together, it becomes a powerful concept. Let’s illustrate the power of REST by looking at two examples:

| Resource | GET | PUT | POST | DELETE |
| --- | --- | --- | --- | --- |
| http://localhost/collection | **Read** a list of all of the items inside of collection | **Update** the collection with another collection | **Create** a new collection | **Delete** the collection |
| http://localhost/collection/abc123 | **Read** the details of the abc123 item inside of collection | **Update** the details of abc123 inside of collection | **Create** a new object abc123 inside of a collection | **Delete** abc123 from collection |

By looking at the table, you can see that each resource is in the form of a URL. The first resource is collection, and the second resource is abc123, which lives inside of collection. Each of these resources responds differently when you pass different methods to them. This is the beauty of REST and HTTP working together.

Notice the bold words I used in the table: Read, Update, Create, and Delete. Together, these words form another concept with its own term: CRUD. The unflattering term CRUD stands for Create, Read, Update, and Delete, and REST uses it to define what happens when an HTTP method is combined with a resource in the form of a URL. So, if you were to boil all of this down, you would come to the following diagram:

This diagram means:

  1. In order to CREATE a resource, you can use either the POST or PUT method.
  2. In order to READ a resource, you need to use the GET method.
  3. In order to UPDATE a resource, you need to use the PUT method.
  4. In order to DELETE a resource, you need to use the DELETE method.

As you can see, the concept of CRUD makes it clear which method you need to use when you want to perform a specific action.
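The four rules above can also be written down as a small lookup table. This Python sketch is only a restatement of the list; the names CRUD_TO_HTTP and actions_for are my own.

```python
# The CRUD-to-HTTP mapping from the list above.
CRUD_TO_HTTP = {
    "create": ("POST", "PUT"),
    "read": ("GET",),
    "update": ("PUT",),
    "delete": ("DELETE",),
}


def actions_for(method):
    """Return the CRUD actions that a given HTTP method can perform."""
    return [
        action
        for action, methods in CRUD_TO_HTTP.items()
        if method.upper() in methods
    ]


put_actions = actions_for("put")
```

Looking up PUT returns both create and update, which is why the same method appears twice in the list.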

Now that we’ve looked at what REST means, let’s move on to the term API, which means Application Programming Interface. While there are a lot of different use cases and concepts of APIs, an API is what we’ll use to programmatically interact with CouchDB.

Now that we have defined all of the terms, the RESTful JSON API could be defined as follows: we have the ability to interact with CouchDB by issuing an HTTP request to the CouchDB API with a defined resource, HTTP method, and any additional data. Combining all of these things means that we are using REST. After CouchDB processes our REST request, it will return with a JSON-formatted response with the result of the request.

All of this background knowledge will start to make sense as we play with CouchDB’s RESTful JSON API, by going through each of the HTTP methods, one at a time.

We will use curl to explore each of the HTTP methods by issuing raw HTTP requests.

Time for action – getting a list of all databases in CouchDB

Let’s issue a GET request to access CouchDB and get a list of all of the databases on the server.

  1. Run the following command in Terminal:

    curl -X GET http://localhost:5984/_all_dbs

  2. Terminal will respond with the following:

    ["_users"]

What just happened?

We used Terminal to send a GET request to CouchDB’s RESTful JSON API. We used curl’s -X option to define the HTTP method; in this instance, GET. GET is the default method, so technically you could omit -X if you wanted to. Once CouchDB processes the request, it sends back a list of the databases on the CouchDB server. Currently, there is only the _users database, which is a default database that CouchDB uses to authenticate users.
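If you would rather issue the same request from code, here is a rough Python equivalent using only the standard library. It assumes the same unauthenticated server at localhost:5984 as the curl command; build_request and send are helper names I made up for this sketch.

```python
import json
import urllib.request

COUCHDB_URL = "http://localhost:5984"  # same host and port as the curl command


def build_request(method, path):
    """Create (but do not send) an HTTP request for a CouchDB resource."""
    return urllib.request.Request(f"{COUCHDB_URL}/{path}", method=method)


def send(request):
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Equivalent of: curl -X GET http://localhost:5984/_all_dbs
list_dbs = build_request("GET", "_all_dbs")
```

With CouchDB running locally, calling send(list_dbs) should return the same list of database names that curl printed.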

Time for action – creating new databases in CouchDB

In this exercise, we’ll issue a PUT request, which will create a new database in CouchDB.

  1. Create a new database by running the following command in Terminal:

    curl -X PUT http://localhost:5984/test-db

  2. Terminal will respond with the following:

    {"ok":true}

  3. Try creating another database with the same name by running the following command in Terminal:

    curl -X PUT http://localhost:5984/test-db

  4. Terminal will respond with the following:

    {"error":"file_exists","reason":"The database could not be created, the file already exists."}

  5. Okay, that didn’t work. So let’s try to create a database with a different name by running the following command in Terminal:

    curl -X PUT http://localhost:5984/another-db

  6. Terminal will respond with the following:

    {"ok":true}

  7. Let’s check the details of the test-db database quickly and see more detailed information about it. To do that, run the following command in Terminal:

    curl -X GET http://localhost:5984/test-db

  8. Terminal will respond with something similar to this (I re-formatted mine for readability):

    {
    "committed_update_seq": 1,
    "compact_running": false,
    "db_name": "test-db",
    "disk_format_version": 5,
    "disk_size": 4182,
    "doc_count": 0,
    "doc_del_count": 0,
    "instance_start_time": "1308863484343052",
    "purge_seq": 0,
    "update_seq": 1
    }

What just happened?

We just used Terminal to send a PUT request that created a database through CouchDB’s RESTful JSON API, by passing test-db, the name of the database that we wanted to create, at the end of the CouchDB root URL. When the database was successfully created, we received a message that everything went okay.

Next, we issued a PUT request to create another database with the same name, test-db. Because there can’t be more than one database with the same name, we received an error message.

We then used a PUT request to create a new database again, named another-db. When the database was successfully created, we received a message that everything went okay.

Finally, we issued a GET request to our test-db database to find out more information on the database. It’s not important to know exactly what each of these statistics means, but it’s a useful way to get an overview of a database.

It’s worth noting that the URL that was called in the final GET request was the same URL we called when we first created the database. The only difference is that we changed the HTTP method from PUT to GET. This is REST in action!
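As a recap of this exercise, the following Python sketch builds the PUT and GET requests for the same URL without sending them, and interprets the two kinds of response bodies we saw. The describe helper is my own illustration; it handles only the success and error shapes shown in this exercise.

```python
import json
import urllib.request

DB_URL = "http://localhost:5984/test-db"

# Same resource, different HTTP method: create versus read.
create_db = urllib.request.Request(DB_URL, method="PUT")
read_db = urllib.request.Request(DB_URL, method="GET")


def describe(body):
    """Summarize a JSON response body from a database PUT request."""
    result = json.loads(body)
    if "error" in result:
        return f"failed ({result['error']}): {result['reason']}"
    return "succeeded"


# The two response bodies shown in the steps above.
first_attempt = describe('{"ok":true}')
second_attempt = describe(
    '{"error":"file_exists","reason":"The database could not be '
    'created, the file already exists."}'
)
```

Checking for the error key rather than comparing whole strings keeps the helper tolerant of small differences in wording.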
