NoSQL has seen a sharp rise in adoption, with many teams migrating away from tried and tested relational database management systems. The open source world has welcomed it with open arms, whereas many large enterprise organisations still prefer or require ACID-compliant databases. With so many NoSQL databases available, it’s difficult to keep track of them all! Let’s explore some of the most popular and most distinctive ones available to us:
1 – Apache Cassandra
Apache Cassandra is an open source NoSQL database: a distributed database management system that is massively scalable. An advantage of using Cassandra is its ability to manage large amounts of structured, semi-structured, and unstructured data. What makes Cassandra more appealing as a database system is its ability to scale horizontally, and it is designed to serve reads and writes in real time while maintaining high performance and high availability. Its data model blends a column-oriented database with a key-value store: rows do not all need to have the same columns, but the columns are grouped together, which is what makes the result look like a table. Cassandra is well suited to ‘mission critical’ big data projects, as it offers ‘no single point of failure’: if a data node goes down, the remaining nodes keep serving data.
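To make the "not every row has every column" idea concrete, here is a minimal in-memory sketch. It is a toy model only, not Cassandra's storage engine or driver API; the `users` table and the `put` and `get` helpers are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Toy wide-column table: row key -> {column name -> value}.
# Illustration only; this is NOT how you talk to a real Cassandra cluster.
users = defaultdict(dict)

def put(table, row_key, column, value):
    """Set one column on one row; columns never written stay absent."""
    table[row_key][column] = value

def get(table, row_key, column, default=None):
    """Read one column; a missing column is simply absent, not an error."""
    return table.get(row_key, {}).get(column, default)

put(users, "alice", "email", "alice@example.com")
put(users, "alice", "city", "Leeds")
put(users, "bob", "email", "bob@example.com")  # bob has no "city" column

print(get(users, "alice", "city"))  # a value that was stored
print(get(users, "bob", "city"))    # falls back to the default
```

Each row stores only the columns that were actually written, which is the sparse, table-like shape described above.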
2 – MongoDB
MongoDB is an open source, schemaless NoSQL database system; its appeal is that it’s a ‘document database’ as opposed to a relational database. This basically means it’s a free-for-all ‘data dumpster’: documents stored side by side don’t have to share the same structure. The added benefit in using MongoDB is that it provides high performance, high availability, and easy scalability (auto-sharding) for large sets of unstructured data stored as JSON-like documents. MongoDB is in many ways the opposite of the popular MySQL, where data has to be organised into rows and columns, which has its own set of benefits with smaller sets of data.
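A rough sketch of what "schemaless documents" means in practice, using plain Python dictionaries. This is an assumption-laden toy, not the MongoDB driver: the `articles` list and the `find` helper are hypothetical, and real MongoDB queries are far richer than simple equality on top-level fields.

```python
import json

# A "collection" modelled as a list of dicts. The two documents have
# different fields, which a fixed relational schema would not allow.
articles = [
    {"_id": 1, "title": "Intro to NoSQL", "tags": ["nosql", "databases"]},
    {"_id": 2, "title": "Graph basics", "author": {"name": "Sam"}, "views": 120},
]

def find(collection, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]

# Documents serialise naturally to JSON-like text.
print(json.dumps(find(articles, title="Graph basics"), indent=2))
```

Note that a query on a field one document lacks (such as `views`) simply skips that document rather than failing.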
3 – Neo4j
Neo4j is an open source NoSQL graph database and the frontrunner of the graph-based model. As a graph database, it manages and queries highly connected data reliably and efficiently. It allows developers to model data from domains such as social networks and recommendation engines more naturally. Data collected from sites and applications is stored as nodes and the relationships between them, which together form the graph.
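As an illustration of why connected data suits a graph model, the sketch below stores "follows" relationships and walks two hops to suggest new people to follow. It is a plain-Python toy under stated assumptions, not Neo4j or its query language; the names `follows`, `add_follow`, and `suggest` are hypothetical.

```python
from collections import defaultdict

# Nodes are people; each directed edge is a FOLLOWS relationship.
follows = defaultdict(set)

def add_follow(follower, followee):
    """Record that `follower` follows `followee`."""
    follows[follower].add(followee)

for a, b in [("ann", "bob"), ("bob", "cat"), ("bob", "dan"), ("ann", "dan")]:
    add_follow(a, b)

def suggest(person):
    """People followed by those `person` follows, minus existing follows."""
    direct = follows.get(person, set())
    candidates = set()
    for friend in direct:
        candidates |= follows.get(friend, set())
    return candidates - direct - {person}

print(sorted(suggest("ann")))  # ['cat']
```

The recommendation is just a short traversal along relationships, which is the kind of query a graph database is built to answer without joining tables.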
4 – Hadoop
Hadoop is easy to overlook in a list of NoSQL databases because it is better known as an ecosystem of tools for big data. It is a framework for distributed data storage and processing, designed to help with huge amounts of data while limiting financial and processing-time overheads. The Hadoop ecosystem includes a database known as HBase, a distributed, column-oriented data store that runs on top of HDFS. HBase acts as a distributed storage system across Hadoop nodes, and the stored data can then be analysed with MapReduce jobs running on YARN (a combination often referred to as MapReduce v2).
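The MapReduce model mentioned above can be sketched in a few lines of single-process Python. This is only a conceptual illustration of the map, shuffle, and reduce steps using the classic word count; it does not use Hadoop, HDFS, or YARN, and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["big data big", "data store"]))
# {'big': 2, 'data': 2, 'store': 1}
```

In a real cluster the map and reduce steps run in parallel on many nodes, close to where the data is stored; here they run in sequence purely to show the data flow.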
5 – OrientDB
OrientDB has been included as a wildcard! It’s a very interesting database and one that has everything going for it, but it has always been in the shadow of Neo4j. OrientDB is an open source NoSQL hybrid graph-document database that was developed to combine the flexibility of a document database with a graph database’s ability to model complex relationships (Mongo and Neo4j all in one!). With the growth of complex and unstructured data (such as social media), relational databases struggled to handle the demands of storing and querying this type of data. Document databases were developed as one solution, and graph databases, which model data as nodes and relationships, were another. OrientDB has combined both into one, which sounds awesome in theory but might be very different in practice! Whether the hybrid approach works and is adopted remains to be seen.