RethinkDB is a relatively new, fully open-source NoSQL database, featuring: ridiculously easy sharding, replicating, & database management, table joins (that’s right!), geospatial & time-series support, and real-time monitoring of complicated queries. I think the feature list alone makes this a piece of tech worth looking further into, to say nothing of the fact that we’ll likely be seeing an explosion of apps that use RethinkDB as their fundamental database–so developers, get ready to have to learn about yet another database. That said, like any tool, you should consult your doctor when deciding if RethinkDB is right for you.
Like most NoSQL offerings, RethinkDB has a few conscience trade-offs in its design, most notably when it comes to ACID compliance, and the CAP-theorem.
Also, because of how queries are performed and returned, “big data” use cases are probably not a great fit for this database–specifically if you want to handle results larger than 64 MB, or are performing computationally intensive work on your stored data.
The web console is insanely easy to use, and gives you all of the control you need for administrating your data-center–even if it is only a data-center of one database.
Setting up a data-center is just a matter of pointing your new database to an existing node in a cluster. Once that’s done, you can use the web console to shard (and re-shard) your data, as well as determine how many replicas you want floating around.
You can also run queries (and profile those queries) against your databases straight form the web console, giving you quick access to your data and performance.
One of the best pieces of syntatic sugar that RethinkDB provides, in my opinion, is the ability to do table joins. While, certainly, this isn’t that magical–what we’re doing is essentially a nested query via a specified field to be used as the nested lookup’s primary key–it really does make queries easy to read and compose.
r.table("table1").eq_join("doc_field_as_table2_primary_key", r.table("table2")).zip().run()
Even more awesomely, the JavaScript ORM Thinky allows for very slick, seamless query-level joins, based on the same principal.
Given that location aware queries are becoming more and more popular, if not downright necessary, it’s great to see that RethinkDB comes with support for the following geometric primitives:point, line, polygon (at least 3 sided), circle, and polygonSub (subtract one polygon from the larger, enclosing polygon).
It allows for the following types of queries: distance, intersects, includes, getIntersecting, and getNearest. For example, you can find all of the documents within 5 km of Greenwich, England.
r.table("table1").getNearest(r.point(0,0), {index: "table1_geo_index", maxDist: 5, unit: "km"}).run()
Official drivers do native conversions for you, which means timezone-aware context driven queries can be made that allow you to find documents that occurred at a given time on a given day in a given timezone.
Some other cool features:
Take, for example, the desire to figure out how many customer support tickets were coming in between 9 am, and 5 pm, every day. We don’t want to have to figure out how to offset the time-stamp on each document, given that the timezones could each be different. Thankfully, RethinkDB will do this accounting, and spread out the computation across the cluster without asking us for a thing.
r.table('customer-support-tickets').filter(function (ticket) {
// ticket.hours() is automatically dealt with in its own timezone
return ticket('time').hours().lt(9).or(
ticket('time').hours().ge(17));
}).count().run();
Probably by far and away the most impressive feature of RethinkDB has to be change-feeds. You can turn almost every practical query that you would want to monitor into a live stream of changes just by chaining the function call changes() to the end.
For example, monitor the changes to a given table:
r.table("table1").changes().run()
or to a given query (the ordering of a table, for instance):
r.table("table1").orderBy("key").changes().run()
And of course, the queries can be made more complicated, but these examples above should blow your mind. No more pulling, no more having to come up with the data diffs yourself before pushing them to the client. RethinkDB will do the diff for you, and push the results straight to your server.
There is one caveat here, however; while this is decent for order-of-magnitude: 10 clients, it is more efficient to couple your change-feeds to a pub-sub service when pushing to many clients.
RethinkDB has a lot of cool things to be excited about: ReQL (it’s readable, highly functional syntax), cluster management, primitives for 21st century applications, and change-feeds. And you know what, if RethinkDB only had change-feeds, I would still be extremely excited about it–think of all that time you no longer have to spend banging your head against the wall trying to deal with consistence and concurrency issues!
If you are thinking about starting a new project, or are tired of fighting with your current NoSQL database, and don’t have any requirements in the “avoid camp”, you should highly consider using RethinkDB.
Jonathan Pollack is a full stack developer living in Berlin. He previously worked as a web developer at a public shoe company, and prior to that, worked at a start up that’s trying to build the world’s best pan-cloud virtualization layer. He can be found on Twitter @murphydanger.
I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…
Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…
Once we learn how to deploy an Ubuntu server, how to manage users, and how…
Key-takeaways: Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…
While developing a web application, or setting dynamic pages and meta tags we need to deal with…
Software architecture is one of the most discussed topics in the software industry today, and…