Comparison

Choosing a technology involves more than a technical comparison. Documentation, maintainability, stability and maturity, vendor support, developer community, license, price, and the future of the product or the organization behind it also play important roles. Having said that, I must add that technical comparison should continue to play a pivotal role.

We will start with a deep technical comparison of the previously mentioned products and then look at their semi-technical and non-technical aspects.

Technical comparison

From a technical perspective, we compare on the following parameters:

  • Implementation language

  • Engine types

  • Speed

Implementation language

One of the more important factors is how the product can be extended, if required; the programming language in which the product itself is written largely determines this. Some databases may provide a different language for writing plugins, but that is not always the case:

  • Amazon SimpleDB: It is available in the cloud and has client SDKs for Java, .NET, PHP, and Ruby. There are libraries for Android and iOS as well.

  • BaseX: Written in Java. To extend, one must code in Java.

  • Cassandra: Everything in Java.

  • CouchDB: Written in Erlang. To extend, use Erlang.

  • Google Datastore: It is available in the cloud and has SDKs for Java, Python, and Go.

  • HBase: It is Java all the way.

  • MemcacheDB: Written in C. Uses the same language to extend.

  • MongoDB: Written in C++. Client drivers are available in several languages including but not limited to JavaScript, Java, PHP, Python, and Ruby.

  • Neo4j: Like several others, it is Java all the way.

  • Redis: Written in C. So you can extend using C.

Great, so the first parameter alone may have helped you shortlist the products you are interested in, based on the developers available in your team or for hire. You may still prefer to get smart people on board and build competency around the choice you make on the subsequent dimensions.

Note that for the databases written in high-level languages like Java, it may still be possible to write extensions in languages like C or C++ by using interfaces like JNI or otherwise.

Amazon SimpleDB provides access via the HTTP protocol and has SDKs in multiple languages. If you do not find an SDK for your platform, say JavaScript for use with Node.js, you can write one.
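To give a feel for what writing such a client involves, here is a minimal sketch of signing a SimpleDB query request using only the Python standard library. It assumes the AWS Signature Version 2 scheme with HMAC-SHA256; the function name, access key, secret, and timestamp are placeholders for illustration, and the request is built but never sent.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_simpledb_request(params, secret_key, host="sdb.amazonaws.com", path="/"):
    """Return a signed query string for a GET request (hypothetical helper).

    Assumes Signature Version 2: parameters sorted by name, percent-encoded,
    and signed together with the HTTP verb, host, and path.
    """
    # Percent-encode everything except the unreserved characters A-Z a-z 0-9 - _ . ~
    canonical = "&".join(
        f"{quote(name, safe='-_.~')}={quote(value, safe='-_.~')}"
        for name, value in sorted(params.items())
    )
    string_to_sign = "\n".join(["GET", host.lower(), path, canonical])
    digest = hmac.new(
        secret_key.encode("utf-8"), string_to_sign.encode("utf-8"), hashlib.sha256
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"{canonical}&Signature={quote(signature, safe='-_.~')}"


# Placeholder credentials and timestamp, for illustration only.
example_params = {
    "Action": "ListDomains",
    "AWSAccessKeyId": "AKIDEXAMPLE",
    "SignatureMethod": "HmacSHA256",
    "SignatureVersion": "2",
    "Timestamp": "2013-01-01T00:00:00Z",
    "Version": "2009-04-15",
}
query = sign_simpledb_request(example_params, "not-a-real-secret")
print(f"https://sdb.amazonaws.com/?{query}")
```

A real client would also need to send the request, parse the XML response, and handle errors and retries, which is where most of the effort of writing an SDK goes.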

However, Google Datastore is less open: it allows access only via its cloud platform, App Engine, and has SDKs only in Java, Python, and Go. Since access is provided natively from the cloud servers, you cannot do much about it. In fact, the top requested feature of Google App Engine is support for PHP (see http://code.google.com/p/googleappengine/issues/list).

Engine types

Engine types define how you will structure the data and what data design expertise your team will need. NoSQL provides multiple options to choose from.

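| Database | Column oriented | Document store | Key value store | Graph |
|---|---|---|---|---|
| Amazon SimpleDB | No | No | Yes | No |
| BaseX | No | Yes | No | No |
| Cassandra | Yes | Yes | No | No |
| CouchDB | No | Yes | No | No |
| Google Datastore | Yes | No | No | No |
| HBase | Yes | No | No | No |
| MemcacheDB | No | No | Yes | No |
| MongoDB | No | Yes | No | No |
| Neo4j | No | No | No | Yes |
| Redis | No | Yes | Yes | No |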

You may notice two aspects of this table: most entries are No, and a few databases have more than one Yes. I expect the table to gain many more Yes entries over the next couple of years. Specifically, I expect the open source databases written in Java to be actively developed and enhanced, giving developers multiple options.
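To make the four engine types in the table more concrete, the following sketch shows one hypothetical user record laid out in each model using plain Python structures. The record, its field names, and the followers_of helper are invented for illustration and do not reflect any particular product's storage format.

```python
import json

# Key-value store: an opaque value looked up by a single key.
key_value = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Document store: the value is a structured document whose fields can be queried.
document = {
    "_id": 42,
    "name": "Ada",
    "address": {"city": "London"},
    "tags": ["admin"],
}

# Column-oriented store: row key -> column family -> column -> value.
column_oriented = {
    "42": {
        "profile": {"name": "Ada"},
        "address": {"city": "London"},
    }
}

# Graph: nodes plus typed edges between them.
nodes = {42: {"name": "Ada"}, 7: {"name": "Grace"}}
edges = [(42, "FOLLOWS", 7)]


def followers_of(node_id):
    """Return the ids of nodes that have a FOLLOWS edge pointing at node_id."""
    return [src for src, rel, dst in edges if rel == "FOLLOWS" and dst == node_id]


print(json.loads(key_value["user:42"])["city"])  # the value must be decoded by the client
print(document["address"]["city"])               # nested field access
print(column_oriented["42"]["address"]["city"])  # row, family, column
print(followers_of(7))                           # relationship traversal
```

The data design expertise your team needs differs accordingly: key design for key-value stores, document shape for document stores, column family layout for column-oriented stores, and relationship modelling for graphs.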

Speed

One of the primary reasons for choosing a NoSQL solution is speed. Comparing and benchmarking the databases is a non-trivial task, considering that each database has its own set of hardware and other configuration requirements. Having said that, you can definitely find a whole gamut of benchmark results comparing one NoSQL database against another, with details of how the tests were executed.

Of all that is available, my personal choice is the Yahoo! Cloud Serving Benchmark (YCSB) tool. It is open source and available on GitHub at https://github.com/brianfrankcooper/YCSB. It is written in Java, and clients are available for Cassandra, DynamoDB, HBase, HyperTable, MongoDB, and Redis, apart from several others that we have not discussed in this book.

Before showing some results from the YCSB, I did a quick run on a couple of easy-to-set-up databases myself. I ran them without any optimizations, just to get a feel for how easily a database can be incorporated into software without any expert help.
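For readers unfamiliar with this style of benchmark, the sketch below imitates the shape of an update-heavy, 50-50 read-update workload against an in-memory Python dict. It is not YCSB and does not use its code; the function name, record layout, and uniform key selection are simplifying assumptions made here to show how throughput and per-operation latency are typically measured.

```python
import random
import time


def run_workload(store, record_count, operation_count, read_proportion=0.5, seed=1):
    """Load records into `store`, run a read/update mix, and report basic metrics."""
    rng = random.Random(seed)

    # Load phase: insert the records that the run phase will touch.
    for i in range(record_count):
        store[f"user{i}"] = {"field0": "x" * 100}

    # Run phase: pick a key uniformly at random (an assumption of this sketch),
    # then either read it or overwrite it, timing each operation.
    latencies = {"read": [], "update": []}
    started = time.perf_counter()
    for _ in range(operation_count):
        key = f"user{rng.randrange(record_count)}"
        op = "read" if rng.random() < read_proportion else "update"
        t0 = time.perf_counter()
        if op == "read":
            store.get(key)
        else:
            store[key] = {"field0": "y" * 100}
        latencies[op].append((time.perf_counter() - t0) * 1e6)  # microseconds
    elapsed = time.perf_counter() - started

    return {
        "ops_per_second": operation_count / elapsed,
        "mean_latency_us": {op: sum(v) / len(v) for op, v in latencies.items() if v},
        "counts": {op: len(v) for op, v in latencies.items()},
    }


result = run_workload({}, record_count=1_000, operation_count=10_000)
print(result)
```

Replacing the dict with a thin wrapper around a database client would turn this into a crude single-connection benchmark, though a real tool also controls concurrency, warm-up, and request distribution.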

I ran it on MongoDB on my personal box (server as well as the client on the same machine), DynamoDB connecting from a High-CPU Medium (c1.medium) box, and MySQL on the same High-CPU Medium box with both server and client on the same machine. Detailed configurations with the results are shown as follows:

Server configuration:

| Parameter | MongoDB | DynamoDB | MySQL |
|---|---|---|---|
| Processor | 5 EC2 Compute Units | N/A | 5 EC2 Compute Units |
| RAM | 1.7 GB with Apache HTTP server running (effective free: 200 MB, after database is up and running) | N/A | 1.7 GB with Apache HTTP server running (effective free: 500 MB, after database is up and running) |
| Hard disk | Non-SSD | N/A | Non-SSD |
| Network configuration | N/A | US-East-1 | N/A |
| Operating system | Ubuntu 10.04, 64 bit | N/A | Ubuntu 10.04, 64 bit |
| Database version | 1.2.2 | N/A | 5.1.41 |
| Configuration | Default | Max write: 500, Max read: 500 | Default |

Client configuration:

| Parameter | MongoDB | DynamoDB | MySQL |
|---|---|---|---|
| Processor | 5 EC2 Compute Units | 5 EC2 Compute Units | 5 EC2 Compute Units |
| RAM | 1.7 GB with Apache HTTP server running (effective free: 200 MB, after database is up and running) | 1.7 GB with Apache HTTP server running (effective free: 500 MB, after database is up and running) | 1.7 GB with Apache HTTP server running (effective free: 500 MB, after database is up and running) |
| Hard disk | Non-SSD | Non-SSD | Non-SSD |
| Network configuration | Same machine as server | US-East-1 | Same machine as server |
| Operating system | Ubuntu 10.04, 64 bit | Ubuntu 10.04, 64 bit | Ubuntu 10.04, 64 bit |
| Record count | 1,000,000 | 1,000 | 1,000,000 |
| Max connections | 1 | 5 | 1 |
| Operation count (workload a) | 1,000,000 | 1,000 | 1,000,000 |
| Operation count (workload f) | 1,000,000 | 100,000 | 1,000,000 |

Results:

| Workload | Parameter | MongoDB | DynamoDB | MySQL |
|---|---|---|---|---|
| Workload-a (load) | Total time | 290 seconds | 16 seconds | 300 seconds |
| | Speed (operations/second) | 2363 to 4180 (approximately 3700); bump at 1278 | 50 to 82 | 3135 to 3517 (approximately 3300) |
| | Insert latency | 245 to 416 microseconds (approximately 260); bump at 875 microseconds | 12 to 19 milliseconds | 275 to 300 microseconds (approximately 290) |
| Workload-a (run) | Total time | 428 seconds | 17 seconds | 240 seconds |
| | Speed (operations/second) | 324 to 4653 | 42 to 78 | 3970 to 4212 |
| | Update latency | 272 to 2946 microseconds | 13 to 23.7 microseconds | 219 to 225.5 microseconds |
| | Read latency | 112 to 5358 microseconds | 12.4 to 22.48 microseconds | 240.6 to 248.9 microseconds |
| Workload-f (load) | Total time | 286 seconds | Did not execute | 295 seconds |
| | Speed (operations/second) | 3708 to 4200 | | 3254 to 3529 |
| | Insert latency | 228 to 265 microseconds | | 275 to 299 microseconds |
| Workload-f (run) | Total time | 412 seconds | Did not execute | 1022 seconds |
| | Speed (operations/second) | 192 to 4146 | | 224 to 2096 |
| | Update latency | 219 to 336 microseconds | | 216 to 233 microseconds, with two bursts at 600 and 2303 microseconds |
| | Read latency | 119 to 5701 microseconds | | 1360 to 8246 microseconds |
| | Read Modify Write (RMW) latency | 346 to 9170 microseconds | | 1417 to 14648 microseconds |

Do not read too much into these numbers, as they come from a default, out-of-the-box setup without any optimizations.
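As a quick sanity check on the workload-a (load) figures above, average throughput can be derived from the total time, and for a single connection the reported latency gives a rough upper bound on throughput. The sketch below does this arithmetic; the helper names are made up here, and the latency-based bound assumes strictly serial requests with no client-side overhead.

```python
def throughput_from_time(operations, seconds):
    """Average operations per second over the whole phase."""
    return operations / seconds


def throughput_from_latency(latency_us, connections=1):
    """Rough ceiling on operations per second if each connection issues requests serially."""
    return connections * 1_000_000 / latency_us


# Average throughput implied by record count and total load time.
mongodb_load = throughput_from_time(1_000_000, 290)   # about 3448 ops/s
mysql_load = throughput_from_time(1_000_000, 300)     # about 3333 ops/s
dynamodb_load = throughput_from_time(1_000, 16)       # 62.5 ops/s

# Ceiling implied by the approximate insert latency with one connection.
mongodb_ceiling = throughput_from_latency(260)        # about 3846 ops/s
mysql_ceiling = throughput_from_latency(290)          # about 3448 ops/s

for name, value in [
    ("MongoDB load (from time)", mongodb_load),
    ("MySQL load (from time)", mysql_load),
    ("DynamoDB load (from time)", dynamodb_load),
    ("MongoDB ceiling (from latency)", mongodb_ceiling),
    ("MySQL ceiling (from latency)", mysql_ceiling),
]:
    print(f"{name}: {value:.1f} ops/s")
```

The time-based averages fall inside the speed ranges reported in the table, which suggests the figures are internally consistent even though the setup was not tuned.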

Some of the results from YCSB published by Brian F. Cooper (http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf) are shown next.

[Figure: YCSB results for the update-heavy workload, 50-50 read-update]

[Figure: YCSB results for the read-heavy workload, under varying hardware]

There are more results from Sergey Sverchkov at Altoros (http://altoros.com/nosql-research), who recently published a white paper.

Summary

In this article, we did a detailed comparative study of ten NoSQL databases on a few parameters, both technical and non-technical.
