At the MongoDB.local London event in September this year, Eliot Horowitz, the CTO and co-founder of MongoDB, took to the stage to talk about the latest features in MongoDB 4.2. He also discussed the updates to Ops Manager and MongoDB Atlas, and new cloud services including integrated full-text search, the Realm development platform, and MongoDB Data Lake.
MongoDB.local is a one-day educational conference that brings together the people who develop MongoDB and its ecosystem with fellow MongoDB users. This is where you can get deeper knowledge of the latest MongoDB features, tools, and best practices directly from the MongoDB experts.
This article lists the various features that have landed in MongoDB 4.2. To get a practical understanding of administering database applications both on-premises and on the cloud, check out our book Mastering MongoDB 4.x – Second Edition by Alex Giamas.
Exciting features in MongoDB 4.2
MongoDB 4.0 came with support for multi-document transactions on replica sets. This support was extended in MongoDB 4.2 by introducing distributed transactions. These add support for multi-document transactions on sharded clusters and also include the existing support for multi-document transactions on replica sets.
Distributed transactions have the same syntax and semantics as the replica set transactions. They are fully ACID compliant and have conversational syntax. Another important update is that there is now no limit to how big a transaction can be. “It is just a matter of how much hardware you have and what the hardware can handle,” Horowitz adds.
Previously, the sharding system did not allow changing the value of a document’s shard key, as doing so often meant moving the document from one shard to another. Starting with MongoDB 4.2, you can change a document’s shard key value. If the change requires the document to move from one shard to another, MongoDB automatically wraps that update in a transaction behind the scenes.
This is one step towards ensuring that there is no “difference between a sharded MongoDB cluster and a replica set,” Horowitz shared.
Another function that Horowitz talked about was global cluster locale reassignment. For instance, suppose you have geo zone sharding with some data residing in Europe and some other data in the US. When the users move, you can just change the value of their location field and that data will be automatically moved from Europe to the US using a transaction.
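The relocation scenario above can be sketched in Python. This is a minimal illustration rather than MongoDB's own example: it assumes `client` behaves like a pymongo `MongoClient` connected to a sharded cluster, and the `app` database, `users` collection, `region` field, and a shard key of `{region: 1, _id: 1}` are all hypothetical.

```python
def relocate_user(client, user_id, old_region, new_region):
    """Change a user's shard key value inside a multi-document transaction.

    `client` is expected to behave like pymongo.MongoClient. The database,
    collection, and field names are assumptions made for this sketch.
    """
    users = client["app"]["users"]
    with client.start_session() as session:
        # The transaction commits when the block exits without an exception.
        with session.start_transaction():
            result = users.update_one(
                # The filter matches the full (assumed) shard key by equality.
                {"_id": user_id, "region": old_region},
                {"$set": {"region": new_region}},
                session=session,
            )
    return result.modified_count
```

Calling `relocate_user(client, 42, "EU", "US")` would update the `region` field, and with zone sharding configured on that field the document would be expected to move to a shard in the new zone.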
Retryable reads and writes
Retryable reads and writes enable the MongoDB drivers to automatically retry certain operations if they encounter network errors or cannot find a healthy primary in the replica set or sharded cluster. Starting with MongoDB 4.2, this feature is enabled by default.
One of the main goals of this feature is to ensure that whenever the infrastructure changes, whether because of planned maintenance or unexpected crashes, the application code doesn’t have to care and isn’t affected.
Explaining through an example, he shared, “You have got a web page that does 20 different database operations. Rather than having to reload the entire thing, rather than having to wrap the entire web page in some sort of loop the driver under the covers can just say this I am going to retry this operation.” He adds, “So if a write fails it will retry that write automatically and will have a contract with the server to guarantee that every write happens once and only once.”
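Although retries are on by default, they can also be stated explicitly through the `retryWrites` and `retryReads` connection string options. The sketch below only builds such a connection string; the host names and the replica set name `rs0` are placeholders.

```python
from urllib.parse import urlencode


def build_mongodb_uri(hosts, replica_set, retry_writes=True, retry_reads=True):
    """Build a MongoDB connection string with explicit retry options."""
    options = {
        "replicaSet": replica_set,
        "retryWrites": str(retry_writes).lower(),
        "retryReads": str(retry_reads).lower(),
    }
    return "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)


# Hypothetical hosts and replica set name, for illustration only.
uri = build_mongodb_uri(["db1.example.com:27017", "db2.example.com:27017"], "rs0")
print(uri)
```

Passing `retry_writes=False` produces a URI that opts out of retryable writes, which may be useful when comparing behavior during a failover test.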
Much more expressive updates
MongoDB’s query language is now much richer and more expressive, with support for aggregations and other modern use cases including geo-based search, graph search, and text search. You can compute sums, manipulate arrays, and perform other math directly in an update statement.
“Let’s imagine you’ve got a document and all you want to do is to set the value of A to the value of B+C in every document. Previously, you couldn’t do that and now you can do very simple arithmetic in MongoDB.”
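The case Horowitz describes can be written as an update that takes an aggregation pipeline instead of a plain update document. The following is a small sketch: the field names `a`, `b`, and `c` follow the quote, and `collection` is assumed to behave like a pymongo `Collection`.

```python
def sum_fields_update(target, left, right):
    """Return an update pipeline that sets `target` to `left` + `right`."""
    return [{"$set": {target: {"$add": ["$" + left, "$" + right]}}}]


def apply_sum_update(collection):
    """Run the update on every document in the collection.

    `collection` is assumed to be a pymongo Collection; passing a list as
    the update argument tells the server to treat it as a pipeline.
    """
    return collection.update_many({}, sum_fields_update("a", "b", "c"))


print(sum_fields_update("a", "b", "c"))
```

Because the expression is evaluated on the server for each document, no documents need to be read into the application to compute the new value.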
On-demand materialized views
The MongoDB aggregation pipeline, a framework for data aggregation, consists of stages. Each stage transforms the documents as they pass through the pipeline. MongoDB 4.2 introduces a new stage called ‘$merge’ that allows you to create collections based on an aggregation and update those collections efficiently.
The $out stage already allows creating collections based on an aggregation. It takes the results of an aggregation and puts them into a new collection. However, it replaces the collection’s entire contents with the new results. Because it regenerates the entire collection every time, it ends up consuming a lot of CPU and IO.
The new $merge stage can incorporate the pipeline results into an existing output collection rather than fully replacing the collection. This enables users to create on-demand materialized views, where the content of the output collection is refreshed regularly, “maybe every minute, every hour, or maybe every day depending on the use case.”
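As an illustration of an on-demand materialized view, the sketch below builds a pipeline that totals orders per day and merges the results into an output collection. The `orders` source collection, its `created_at` and `amount` fields, and the `daily_sales` output collection are hypothetical names chosen for this example.

```python
from datetime import datetime


def daily_totals_pipeline(since):
    """Build a pipeline that refreshes per-day totals from `since` onward."""
    return [
        # Only reprocess recent documents instead of the whole collection.
        {"$match": {"created_at": {"$gte": since}}},
        {"$group": {
            "_id": {"$dateToString": {"format": "%Y-%m-%d", "date": "$created_at"}},
            "total": {"$sum": "$amount"},
        }},
        # Upsert each day's total into the existing output collection.
        {"$merge": {
            "into": "daily_sales",
            "on": "_id",
            "whenMatched": "replace",
            "whenNotMatched": "insert",
        }},
    ]


def refresh_daily_sales(orders, since):
    """`orders` is assumed to behave like a pymongo Collection."""
    orders.aggregate(daily_totals_pipeline(since))


pipeline = daily_totals_pipeline(datetime(2019, 9, 1))
```

Since matched days are replaced wholesale, `since` should fall on a day boundary so that a partially scanned day does not overwrite a complete total. Scheduling `refresh_daily_sales` every minute, hour, or day gives the refresh cadence described above.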
Wildcard indexes
MongoDB 4.2 adds wildcard indexes, which let you index an entire document or a subset of a document. They were introduced to support queries against unknown or arbitrary fields. Horowitz explains, “Previously, you were required to either add an index for every attribute you care about or put these into an array…With wild card indexes, you can actually just say ‘hey index the entire document or index this entire subset of the document.’ What will happen is we will actually index everything in there so you can just do any query that you want.”
However, keep in mind that wildcard indexes are not designed to replace workload-based index planning. They are suitable for cases where your data has polymorphic patterns. Examples of data containing polymorphic patterns include product catalogs, e-commerce, social data, and IoT applications.
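A wildcard index is declared with the special `$**` field name, either on its own for the whole document or after a field path for a subset. The helper below is a sketch: the `attributes` subdocument is a hypothetical example, and `collection` is assumed to behave like a pymongo `Collection`.

```python
def wildcard_index_keys(subtree=None):
    """Return index keys for the whole document or for one subdocument."""
    field = "$**" if subtree is None else subtree + ".$**"
    return [(field, 1)]


def create_wildcard_index(collection, subtree=None):
    """`collection` is assumed to be a pymongo Collection."""
    return collection.create_index(wildcard_index_keys(subtree))


print(wildcard_index_keys())              # index every field
print(wildcard_index_keys("attributes"))  # index only the `attributes` subtree
```

For a product catalog where each product carries different attributes, indexing only the `attributes` subtree keeps the index narrower than indexing every field.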
Along with offering such great features, it is also important for a database to provide developers a great operational experience. It should have great availability, a powerful monitoring and alerting system, backup, self-service, and APIs. To manage MongoDB we have two options: MongoDB Ops Manager and MongoDB Atlas.
MongoDB Ops Manager
MongoDB Ops Manager is the “best way to run MongoDB on-premises.” Its backup system offers great features such as point-in-time restore and queryable snapshots. In previous versions, however, it was a complex system and in many cases expensive to run. Starting with MongoDB 4.2, it was completely overhauled to be much simpler. Now, there is no concept of “heads.”
This release also introduces a new Kubernetes operator for Ops Manager. On-premises users are moving to private clouds, and for that they mainly rely on Kubernetes. The operator enables you to control Ops Manager directly through your Kubernetes interfaces.
MongoDB Atlas
MongoDB Atlas is a fully managed MongoDB as a service. It now integrates with Terraform, a tool used for building, changing, and versioning infrastructure. There is also a new feature called Atlas Auto Scaling for fully automated capacity management. Once you enable the feature, Atlas monitors resource utilization metrics in real time and automatically scales your VM up or down.
In terms of security, MongoDB Atlas is now ISO 27001 certified and PCI compliant. It also supports field-level encryption (FLE), currently in beta. This enables applications to encrypt fields in documents before transmitting data to the server. The encryption happens on the client side and is completely transparent to developers.
Another key update in this release is the introduction of MongoDB Atlas Full-Text Search (Beta). Atlas now has a rich-text search functionality against your fully managed MongoDB databases. Horowitz explains, “Today, you typically have to take in MongoDB and synchronize it to some other system (such as Elasticsearch) and under those systems is Apache Lucene.” The team decided to remove this “middleman” to let users go “straight from MongoDB to Lucene.”
Horowitz also talked about MongoDB Atlas Data Lake, which lets you run regular MongoDB Query Language (MQL) queries against data stored in Amazon S3. It supports several file formats, including JSON, BSON, CSV, TSV, Avro, and Parquet.
In May this year, MongoDB acquired Realm, a database for mobile applications. Horowitz gave some insight into what future plans he has for Realm. “MongoDB is investing in a lot of the things that Realm users have been asking for a long time or taking a lot of the resources we have and making sure that we can accelerate the core realm roadmap as fast as possible.”
Among the new features that RealmDB will get are new data types for unstructured data, such as Dicts, Sets, and an Any/Mixed type for polymorphic data. It will also have cascading deletes, inheritance, analytics and transformational queries, and support for more platforms. Horowitz plans to integrate Realm more tightly with MongoDB, and together they will be called MongoDB Realm. It will be “the best way to build data-intensive applications anywhere.”
This article walked you through the new features in MongoDB 4.2, Ops Manager, Atlas, and much more presented by Eliot Horowitz in his MongoDB.local talk. Check out our book Mastering MongoDB 4.x – Second Edition by Alex Giamas to become a successful MongoDB expert.
This book dives into niche areas of managing databases (such as modeling and querying databases) along with various administration techniques in MongoDB, and much more.