(For more resources related to this topic, see here.)

What is a document?

While it may vary for various implementations of different Document Oriented Databases available, as far as MongoDB is concerned it is a BSON document, which stands for Binary JSON. JSON (JavaScript Object Notation) is an open standard developed for human readable data exchange. Though a thorough knowledge of JSON is not really important to understand MongoDB, for keen readers the URL to its RFC is http://tools.ietf.org/html/rfc4627. Also, the BSON specification can be found at http://bsonspec.org/. Since MongoDB stores the data as BSON documents, it is a Document Oriented Database.

What does a document look like?

Consider the following example where we represent a person using JSON:

{
"firstName":"Jack",
"secondName":"Jones",
"age":30,
"phoneNumbers":[
{fixedLine:"1234"},
{mobile:"5678"}
],
"residentialAddress":{
lineOne:"…",
lineTwo:"…",
city:"…",
state:"…",
zip:"…",
country:"…"
}
}

As we can see, a JSON document always starts and ends with curly braces and has all the content within these braces. Multiple fields and values are separated by commas, with a field name always being a string value and the value being of any type ranging from string, numbers, date, array, another JSON document, and so on. For example in "firstName":"Jack", the firstName is the name of the field whereas Jack is the value of the field.

Need for MongoDB

Many of you would probably be wondering why we need another database when we already have good old relational databases. We will try to see a few drivers from its introduction back in 2009.

Relational databases are extremely rich in features. But these features don't come for free; there is a price to pay and it is done by compromising on the scalability and flexibility. Let us see these one by one.

Scalability

It is a factor used to measure the ease with which a system can accommodate the growing amount of work or data. There are two ways in which you can scale your system: scale up, also known as scale vertically or scale out, also known as scale horizontally. Vertical scalability can simply be put up as an approach where we say "Need more processing capabilities? Upgrade to a bigger machine with more cores and memory". Unfortunately, with this approach we hit a wall as it is expensive and technically we cannot upgrade the hardware beyond a certain level. You are then left with an option to optimize your application, which might not be a very feasible approach for some systems which are running in production for years.

On the other hand, Horizontal scalability can be described as an approach where we say "Need more processing capabilities? Simple, just add more servers and multiply the processing capabilities". Theoretically this approach gives us unlimited processing power but we have more challenges in practice. For many machines to work together, there would be a communication overhead between them and the probability of any one of these machines being down at a given point of time is much higher.

MongoDB enables us to scale horizontally easily, and at the same time addresses the problems related to scaling horizontally to a great extent. The end result is that it is very easy to scale MongoDB with increasing data as compared to relational databases.

Ease of development

MongoDB doesn't have the concept of creation of schema as we have in relational databases. The document that we just saw can have an arbitrary structure when we store them in the database. This feature makes it very easy for us to model and store relatively unstructured/ complex data, which becomes difficult to model in a relational database. For example, product catalogues of an e-commerce application containing various items and each having different attributes. Also, it is more natural to use JSON in application development than tables from relational world.

Ok, it looks good, but what is the catch? Where not to use MongoDB?

To achieve the goal of letting MongoDB scale out easily, it had to do away with features like joins and multi document/distributed transactions. Now, you must be wondering it is pretty useless as we have taken away two of the most important features of the relational database. However, to mitigate the problems of joins is one of the reasons why MongoDB is document oriented. If you look at the preceding JSON document for the person, we have the address and the phone number as a part of the document. In relational database, these would have been in separate tables and retrieved by joining these tables together.

Distributed/Multi document transactions inhibit MongoDB to scale out and hence are not supported and nor there is a way to mitigate it. MongoDB still is atomic but the atomicity for inserts and updates is guaranteed at document level and not across multiple documents. Hence, MongoDB is not a good fit for scenarios where complex transactions are needed, such as in an OLTP banking applications. This is an area where good old relational database still rules.

To conclude, let us take a look at the following image. This graph is pretty interesting and was presented by Dwight Merriman, Founder and CEO of 10gen, the MongoDB company in one of his online courses.

so-what-mongodb-img-0

As we can see, we have on one side some products like Memcached which is very low on functionality but high on scalability and performance. On the other end we have RDBMS (Relational Database Management System) which is very rich in features but not that scalable. According to the research done while developing MongoDB, this graph is not linear and there is a point in it after which the scalability and performance fall steeply on adding more features to the product. MongoDB sits on this point where it gives maximum possible features without compromising too much on the scalability and performance.