Comparison
Choosing a technology involves more than a technical comparison. Documentation, maintainability, stability and maturity, vendor support, developer community, license, price, and the future of the product or the organization behind it also play important roles. That said, the technical comparison should continue to play a pivotal role.
We will start with a deep technical comparison of the previously mentioned products and then look at their semi-technical and non-technical aspects.
Technical comparison
From a technical perspective, we compare on the following parameters:
- Implementation language
- Engine types
- Speed
Implementation language
One of the more important factors is how the product can be extended, if required; the programming language in which the product itself is written largely determines this. Some databases provide a different language for writing plugins, but this is not always the case:
- Amazon SimpleDB: It is available in the cloud and has client SDKs for Java, .NET, PHP, and Ruby. There are libraries for Android and iOS as well.
- BaseX: Written in Java. To extend it, one must code in Java.
- Cassandra: Everything in Java.
- CouchDB: Written in Erlang. To extend it, use Erlang.
- Google Datastore: It is available in the cloud and has SDKs for Java, Python, and Go.
- HBase: It is Java all the way.
- MemcacheDB: Written in C. Uses the same language to extend.
- MongoDB: Written in C++. Client drivers are available in several languages including, but not limited to, JavaScript, Java, PHP, Python, and Ruby.
- Neo4j: Like several others, it is Java all the way.
- Redis: Written in C, so you can extend it using C.
Great, so the first parameter itself may have helped you shortlist the products that you may be interested in using, based on the developers available in your team or for hire. You may still be tempted to get smart people on board and then build competency around the choice that you make based on the subsequent dimensions.
Note that for the databases written in high-level languages like Java, it may still be possible to write extensions in languages like C or C++ by using interfaces such as JNI.
Amazon SimpleDB provides access via the HTTP protocol and has SDKs in multiple languages. If you do not find an SDK for your language, say JavaScript for use with Node.js, you can write one yourself.
However, Google Datastore is not as open: it allows access only via its cloud platform, App Engine, and has SDKs only in Java, Python, and Go. Since the access is provided natively from the cloud servers, you cannot do much about it. In fact, the top requested feature of Google App Engine is support for PHP (see http://code.google.com/p/googleappengine/issues/list).
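To give a feel for what writing your own SimpleDB client over HTTP involves, here is a minimal sketch in Python of the one step a hand-rolled client cannot skip: signing the request. It assumes the service accepts query requests signed with AWS Signature Version 2; the endpoint `sdb.amazonaws.com`, the API version `2009-04-15`, and the parameter names reflect my understanding of the public API rather than anything stated in this article, and the domain, item, and key values are placeholders. The same steps would apply to a JavaScript client for Node.js.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

SIMPLEDB_HOST = "sdb.amazonaws.com"  # assumption: default SimpleDB endpoint
API_VERSION = "2009-04-15"           # assumption: SimpleDB API version


def _encode(value):
    # Percent-encode everything except the RFC 3986 unreserved characters.
    return quote(str(value), safe="-_.~")


def signed_query_url(action, params, access_key, secret_key, timestamp):
    """Build a signed GET URL for one SimpleDB action (Signature Version 2)."""
    query = {
        "Action": action,
        "AWSAccessKeyId": access_key,
        "SignatureMethod": "HmacSHA256",
        "SignatureVersion": "2",
        "Timestamp": timestamp,
        "Version": API_VERSION,
    }
    query.update(params)

    # Canonical query string: parameters sorted by name, each part encoded.
    canonical = "&".join(
        f"{_encode(name)}={_encode(value)}" for name, value in sorted(query.items())
    )

    # String to sign: HTTP verb, host, path, and canonical query string.
    string_to_sign = "\n".join(["GET", SIMPLEDB_HOST, "/", canonical])
    digest = hmac.new(
        secret_key.encode("utf-8"), string_to_sign.encode("utf-8"), hashlib.sha256
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")

    return f"https://{SIMPLEDB_HOST}/?{canonical}&Signature={_encode(signature)}"


if __name__ == "__main__":
    # Placeholder credentials and names, for illustration only.
    print(
        signed_query_url(
            "GetAttributes",
            {"DomainName": "mydomain", "ItemName": "item1"},
            access_key="AKIDEXAMPLE",
            secret_key="not-a-real-secret",
            timestamp="2013-01-01T00:00:00Z",
        )
    )
```

The sketch only builds the URL; sending it with any HTTP library and parsing the XML response would be the rest of such a client.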
Engine types
Engine types define how you will structure the data and what data design expertise your team will need. NoSQL provides multiple options to choose from.
| Database | Column oriented | Document store | Key value store | Graph |
| --- | --- | --- | --- | --- |
| Amazon SimpleDB | No | No | Yes | No |
| BaseX | No | Yes | No | No |
| Cassandra | Yes | Yes | No | No |
| CouchDB | No | Yes | No | No |
| Google Datastore | Yes | No | No | No |
| HBase | Yes | No | No | No |
| MemcacheDB | No | No | Yes | No |
| MongoDB | No | Yes | No | No |
| Neo4j | No | No | No | Yes |
| Redis | No | Yes | Yes | No |
You may notice two aspects of this table: a lot of No entries, and multiple Yes entries against some databases. I expect the table to be populated with a lot more Yes entries over the next couple of years. Specifically, I expect the open source databases written in Java to be actively developed and enhanced, providing multiple options to developers.
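If you want to use the engine-type table for shortlisting, it can be captured as a small lookup. The following Python sketch copies the Yes entries from the table above; the names `ENGINE_TYPES` and `shortlist` are my own and purely illustrative.

```python
# Engine types per database, copied from the Yes entries in the table above.
ENGINE_TYPES = {
    "Amazon SimpleDB": {"key value store"},
    "BaseX": {"document store"},
    "Cassandra": {"column oriented", "document store"},
    "CouchDB": {"document store"},
    "Google Datastore": {"column oriented"},
    "HBase": {"column oriented"},
    "MemcacheDB": {"key value store"},
    "MongoDB": {"document store"},
    "Neo4j": {"graph"},
    "Redis": {"document store", "key value store"},
}


def shortlist(required):
    """Return the databases (sorted by name) that support every required engine type."""
    required = set(required)
    return sorted(name for name, types in ENGINE_TYPES.items() if required <= types)


if __name__ == "__main__":
    print(shortlist(["document store"]))
    # ['BaseX', 'Cassandra', 'CouchDB', 'MongoDB', 'Redis']
    print(shortlist(["document store", "key value store"]))
    # ['Redis']
```

Keeping the table as data makes it easy to update as the databases gain more engine types.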
Speed
One of the primary reasons for choosing a NoSQL solution is speed. Comparing and benchmarking the databases is a non-trivial task, considering that each database has its own set of hardware and other configuration requirements. Having said that, you can find a whole gamut of benchmark results comparing one NoSQL database against another, with details of how the tests were executed.
Of all that is available, my personal choice is the Yahoo! Cloud Serving Benchmark (YCSB) tool. It is open source and available on GitHub at https://github.com/brianfrankcooper/YCSB. It is written in Java, and clients are available for Cassandra, DynamoDB, HBase, HyperTable, MongoDB, and Redis, apart from several others that we have not discussed in this book.
Before showing some results from the YCSB, I did a quick run on a couple of easy-to-set-up databases myself. I executed them without any optimizations, just to get a feel for how easy it is to incorporate each database into software without needing any expert help.
I ran it against MongoDB on my personal box (server and client on the same machine), against DynamoDB connecting from a High-CPU Medium (c1.medium) box, and against MySQL on the same High-CPU Medium box with both server and client on the same machine. Detailed configurations and results are shown as follows:
Server configuration:
| Parameter | MongoDB | DynamoDB | MySQL |
| --- | --- | --- | --- |
| Processor | 5 EC2 Compute Units | N/A | 5 EC2 Compute Units |
| RAM | 1.7 GB with Apache HTTP server running (effective free: 200 MB, after database is up and running) | N/A | 1.7 GB with Apache HTTP server running (effective free: 500 MB, after database is up and running) |
| Hard disk | Non-SSD | N/A | Non-SSD |
| Network configuration | N/A | US-East-1 | N/A |
| Operating system | Ubuntu 10.04, 64 bit | N/A | Ubuntu 10.04, 64 bit |
| Database version | 1.2.2 | N/A | 5.1.41 |
| Configuration | Default | Max write: 500, Max read: 500 | Default |
Client configuration:
| Parameter | MongoDB | DynamoDB | MySQL |
| --- | --- | --- | --- |
| Processor | 5 EC2 Compute Units | 5 EC2 Compute Units | 5 EC2 Compute Units |
| RAM | 1.7 GB with Apache HTTP server running (effective free: 200 MB, after database is up and running) | 1.7 GB with Apache HTTP server running (effective free: 500 MB, after database is up and running) | 1.7 GB with Apache HTTP server running (effective free: 500 MB, after database is up and running) |
| Hard disk | Non-SSD | Non-SSD | Non-SSD |
| Network configuration | Same machine as server | US-East-1 | Same machine as server |
| Operating system | Ubuntu 10.04, 64 bit | Ubuntu 10.04, 64 bit | Ubuntu 10.04, 64 bit |
| Record count | 1,000,000 | 1,000 | 1,000,000 |
| Max connections | 1 | 5 | 1 |
| Operation count (workload a) | 1,000,000 | 1,000 | 1,000,000 |
| Operation count (workload f) | 1,000,000 | 100,000 | 1,000,000 |
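For readers who want to repeat a run with the client settings above, the following Python sketch assembles a YCSB command line. It is not the exact invocation used for these results: it assumes a YCSB checkout that ships the `bin/ycsb` launcher with the `-P`, `-p`, `-s`, and `-threads` options, that the workload files live under `workloads/`, and that "Max connections" corresponds to the client thread count. The helper name `ycsb_command` is hypothetical.

```python
def ycsb_command(phase, binding, workload, record_count, operation_count, threads=1):
    """Build the argument list for one YCSB phase ("load" or "run").

    Assumes the bin/ycsb launcher and its -P/-p/-s/-threads options.
    """
    if phase not in ("load", "run"):
        raise ValueError("phase must be 'load' or 'run'")
    return [
        "bin/ycsb", phase, binding,
        "-s",                                    # print status while running
        "-P", f"workloads/{workload}",           # workload property file
        "-p", f"recordcount={record_count}",
        "-p", f"operationcount={operation_count}",
        "-threads", str(threads),                # assumed to map to "Max connections"
    ]


if __name__ == "__main__":
    # Workload a against MongoDB with the record and operation counts from the table.
    for phase in ("load", "run"):
        cmd = ycsb_command(phase, "mongodb", "workloada", 1_000_000, 1_000_000, threads=1)
        print(" ".join(cmd))
```

The list can be passed to `subprocess.run` from inside the YCSB directory; connection details for each database are supplied through further `-p` properties, which are omitted here.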
Results:
| Workload | Parameter | MongoDB | DynamoDB | MySQL |
| --- | --- | --- | --- | --- |
| Workload-a (load) | Total time | 290 seconds | 16 seconds | 300 seconds |
| | Speed (operations/second) | 2363 to 4180 (approximately 3700), bump at 1278 | 50 to 82 | 3135 to 3517 (approximately 3300) |
| | Insert latency | 245 to 416 microseconds (approximately 260), bump at 875 microseconds | 12 to 19 milliseconds | 275 to 300 microseconds (approximately 290) |
| Workload-a (run) | Total time | 428 seconds | 17 seconds | 240 seconds |
| | Speed (operations/second) | 324 to 4653 | 42 to 78 | 3970 to 4212 |
| | Update latency | 272 to 2946 microseconds | 13 to 23.7 microseconds | 219 to 225.5 microseconds |
| | Read latency | 112 to 5358 microseconds | 12.4 to 22.48 microseconds | 240.6 to 248.9 microseconds |
| Workload-f (load) | Total time | 286 seconds | Did not execute | 295 seconds |
| | Speed (operations/second) | 3708 to 4200 | | 3254 to 3529 |
| | Insert latency | 228 to 265 microseconds | | 275 to 299 microseconds |
| Workload-f (run) | Total time | 412 seconds | Did not execute | 1022 seconds |
| | Speed (operations/second) | 192 to 4146 | | 224 to 2096 |
| | Update latency | 219 to 336 microseconds | | 216 to 233 microseconds, with two bursts at 600 and 2303 microseconds |
| | Read latency | 119 to 5701 microseconds | | 1360 to 8246 microseconds |
| | Read Modify Write (RMW) latency | 346 to 9170 microseconds | | 1417 to 14648 microseconds |
Do not read too much into these numbers, as they come from a default, out-of-the-box setup without any optimizations.
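As a quick sanity check on the table, the average throughput of a phase is simply the operation count divided by the total time, and for a single connection issuing requests back to back the mean latency is roughly the inverse of that. The Python sketch below applies this to the load phase of workload a, using the record counts and total times reported above; the function names are illustrative, and the single-connection latency estimate is a simplifying assumption that ignores client overhead.

```python
def average_throughput(operations, total_seconds):
    """Average operations per second over a whole phase."""
    return operations / total_seconds


def mean_latency_us(operations, total_seconds):
    """Rough mean latency in microseconds, assuming one connection working back to back."""
    return total_seconds * 1_000_000 / operations


if __name__ == "__main__":
    # (records loaded, total seconds) for the load phase of workload a.
    load_phase = {
        "MongoDB": (1_000_000, 290),
        "DynamoDB": (1_000, 16),
        "MySQL": (1_000_000, 300),
    }
    for name, (ops, seconds) in load_phase.items():
        print(f"{name}: {average_throughput(ops, seconds):.1f} operations/second")

    # Single-connection databases only (MongoDB and MySQL used one connection).
    print(f"MongoDB: about {mean_latency_us(1_000_000, 290):.0f} microseconds per insert")
    print(f"MySQL: about {mean_latency_us(1_000_000, 300):.0f} microseconds per insert")
```

The averages come out at roughly 3448, 62.5, and 3333 operations per second, which fall within the speed ranges reported in the table, and the estimated insert latencies of about 290 and 300 microseconds are close to the reported ones.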
Some of the results from YCSB published by Brian F. Cooper (http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf) are shown next.
For update-heavy, 50-50 read-update:
For read-heavy, under varying hardware:
There are more results from Sergey Sverchkov at Altoros (http://altoros.com/nosql-research), who recently published a white paper.
Summary
In this article, we did a detailed comparative study of ten NoSQL databases on a few parameters, both technical and non-technical.