
Introducing Origin

Origin is a gem that provides the DSL for Mongoid queries. At first glance, we may wonder why Mongoid queries need a DSL at all: if the query is eventually converted to a MongoDB-compliant hash, what does a DSL add?

Origin was extracted from the Mongoid gem into a gem of its own, so that there is a standard for querying. It has no dependency on any other gem and is a standalone, pure query builder. The idea was that this could be a generic DSL that can be used even without Mongoid!

So, now we have a very generic and standard querying pattern. For example, in Mongoid 2.x we had the criteria any_in and any_of, and no direct support for the and, or, and nor operations. The only way we could fire a $or or a $and query was to spell out the operator ourselves and pass it an array of conditions, like this:

Author.where("$or" => [{'language' => 'English'}, {'address.city' => 'London'}])

And now in Mongoid 3, we have a cleaner approach.

Author.or({:language => 'English'}, {'address.city' => 'London'})

Origin also provides good selectors directly in our models. So, this is now much more readable:

Book.gte(published_at: Date.parse('2012/11/11'))
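
To make the idea of a query DSL concrete, here is a minimal sketch in plain Ruby of how calls such as gte and or could be turned into a MongoDB-style selector hash. TinyCriteria and its methods are hypothetical names made up for this illustration; this is not Origin's actual implementation, only the general shape of the technique.

```ruby
require 'date'

# Hypothetical, simplified query builder; NOT Origin's real implementation.
class TinyCriteria
  attr_reader :selector

  def initialize(selector = {})
    @selector = selector
  end

  # where(name: 'x') adds plain equality conditions.
  def where(conditions)
    TinyCriteria.new(selector.merge(stringify(conditions)))
  end

  # gte(field: value) becomes { "field" => { "$gte" => value } }.
  def gte(conditions)
    wrapped = stringify(conditions).map { |field, value| [field, { '$gte' => value }] }.to_h
    TinyCriteria.new(selector.merge(wrapped))
  end

  # or(a, b) becomes { "$or" => [a, b] }.
  def or(*alternatives)
    TinyCriteria.new(selector.merge('$or' => alternatives.map { |alt| stringify(alt) }))
  end

  private

  # Field names are stored as strings, the way they appear in the final hash.
  def stringify(hash)
    hash.map { |key, value| [key.to_s, value] }.to_h
  end
end

criteria = TinyCriteria.new
  .gte(published_at: Date.parse('2012/11/11'))
  .or({ language: 'English' }, { 'address.city' => 'London' })
p criteria.selector
```

Each call returns a new object, so calls can be chained, and the final selector is just a hash that a driver could send to the database.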

Memory maps, delayed sync, and journals

As we have seen earlier, MongoDB stores data in memory-mapped files of at most 2 GB each. After the data is loaded for the first time into the memory mapped files, we now get almost memory-like speeds for access instead of disk I/O, which is much slower. These memory-mapped files are preallocated to ensure that there is no delay of the file generation while saving data.

However, to ensure that the data is not lost, it needs to be persisted to the disk. This is achieved by journaling. With journaling, every database operation is written to the journal, which is flushed to disk every 100 ms. Journaling is turned on by default in the MongoDB configuration. The journal does not hold the actual data but the operation itself. This helps in better recovery (in case of any crash) and also ensures the consistency of writes. The data that is written to the various collections is flushed to the disk every 60 seconds. This ensures that the data is persisted periodically while keeping data access almost as fast as memory. MongoDB relies on the operating system for the memory management of its memory-mapped files. This has the advantage that MongoDB inherits improvements as the OS improves. The disadvantage is that MongoDB has little control over how memory is managed.

However, what happens if something goes wrong (the server crashes, the database stops, or the disk is corrupted)? To ensure durability, whenever data is saved in files, the action is logged to a file in chronological order. This is the journal entry, which is also a memory-mapped file but is synced with the disk every 100 ms. Using the journal, the database can be easily recovered in case of any crash. So, in the worst-case scenario, we could potentially lose 100 ms of information. This is a fair price to pay for the benefits of using MongoDB.
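
The interplay between the frequent journal flush and the infrequent data flush can be sketched with a toy model in plain Ruby. ToyStore and all of its methods are hypothetical and heavily simplified; real MongoDB journaling is far more involved. The sketch only shows why replaying the journal recovers writes that had not yet reached the data files, and why writes that never reached the journal on disk are lost.

```ruby
# Toy model of journaling; all class and method names here are hypothetical.
class ToyStore
  attr_reader :memory, :disk_data, :disk_journal

  def initialize
    @memory = {}          # stands in for the memory-mapped data files
    @pending_journal = [] # journal entries not yet flushed to disk
    @disk_journal = []    # journal entries that survive a crash
    @disk_data = {}       # data files as last flushed to disk
  end

  # Apply a write in memory and log the operation itself.
  def write(key, value)
    @memory[key] = value
    @pending_journal << [:set, key, value]
  end

  # Frequent, cheap flush (every 100 ms in the text above).
  def flush_journal
    @disk_journal.concat(@pending_journal)
    @pending_journal.clear
  end

  # Infrequent flush of the data itself (every 60 seconds in the text above).
  def flush_data
    flush_journal
    @disk_data = @memory.dup
    @disk_journal.clear # journaled operations are now part of the data files
  end

  # A crash loses everything that was only in memory.
  def crash!
    @memory = {}
    @pending_journal = []
  end

  # Recovery: start from the data files and replay the journal in order.
  def recover
    @memory = @disk_data.dup
    @disk_journal.each { |_op, key, value| @memory[key] = value }
  end
end

store = ToyStore.new
store.write('title', 'Great Expectations')
store.flush_data
store.write('reserved', true)
store.flush_journal
store.write('reserved_by', 'Gautam') # never reaches the journal on disk
store.crash!
store.recover
p store.memory
```

After recovery, the title and the reserved flag are back, while the last write, made after the final journal flush, is gone. That is the window of loss described above.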

MongoDB journaling makes it a very robust and durable database. However, it also helps us decide when to use MongoDB and when not to use it. 100 ms is a long time for some services, such as financial core banking or maybe stock price updates. In such applications, MongoDB is not recommended.

For most cases that do not involve heavy multi-table transactions, which most financial applications do, MongoDB can be suitable.

All these things are handled seamlessly, and we don't usually need to change anything. We can control this behavior via the configuration of MongoDB but usually it's not recommended. Let's now see how we save data using Mongoid.

Updating documents and attributes

As with the ActiveModel specifications, save will update the changed attributes and return true on success; otherwise, it will return false on failure. The save! function will raise an exception on error instead. In both cases, if we pass validate: false as a parameter, it will bypass the validations.
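
The difference between the two calls is only in how failure is reported. The following plain Ruby sketch mimics that contract with a hypothetical ToyBook class; it is not Mongoid code, and the Invalid exception class is made up for this illustration.

```ruby
# Hypothetical stand-in for a model; it only mimics the save / save! contract.
class ToyBook
  class Invalid < StandardError; end

  attr_accessor :title
  attr_reader :persisted_title

  def initialize(title)
    @title = title
  end

  # A single toy validation: the title must not be blank.
  def valid?
    !title.to_s.strip.empty?
  end

  # Returns true on success and false when validation fails.
  def save(options = {})
    return false if options.fetch(:validate, true) && !valid?
    @persisted_title = title
    true
  end

  # Same as save, but raises instead of returning false.
  def save!(options = {})
    save(options) || raise(Invalid, 'Validation failed')
  end
end

book = ToyBook.new('')
p book.save                  # validation fails, nothing is stored
p book.save(validate: false) # validation is skipped
begin
  book.save!
rescue ToyBook::Invalid => e
  puts "save! raised: #{e.message}"
end
```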

A lesser-known persistence option is the upsert action. An upsert action creates a new document if it does not find it and overwrites the object if it finds it. A good reason to use upsert is in the find_and_modify action.
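
The upsert behavior itself is easy to picture with an in-memory stand-in. The sketch below uses a hypothetical ToyCollection class keyed by _id; it does not talk to MongoDB and only illustrates the insert-or-overwrite rule.

```ruby
# Toy in-memory collection keyed by _id; names are hypothetical.
class ToyCollection
  def initialize
    @docs = {}
  end

  # Insert the document if its _id is unknown, otherwise overwrite it.
  # Returns :inserted or :replaced so the caller can see which path was taken.
  def upsert(doc)
    id = doc.fetch('_id')
    outcome = @docs.key?(id) ? :replaced : :inserted
    @docs[id] = doc.dup
    outcome
  end

  def find(id)
    @docs[id]
  end

  def count
    @docs.size
  end
end

books = ToyCollection.new
p books.upsert('_id' => 1, 'title' => 'Dracula')
p books.upsert('_id' => 1, 'title' => 'Dracula (2nd edition)')
p books.count
```

The second call does not create a second document; it replaces the first one, so the count stays at one.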

For example, suppose we want to reserve a book in our Sodibee system, and we want to ensure that at any one point, there can be only one reservation for a book. In a traditional scenario:

t1: Request-1 searches for a book which is not reserved and finds it
t2: Now, it saves the book with the reservation information
t3: Request-2 searches for a reservation for the same book and finds that the book is reserved
t4: Request-2 handles the situation with either error or waits for reservation to be freed

So far so good! However, in a concurrent model, especially for web applications, this creates problems.

t1: Request-1 searches for a book which is not reserved and finds it
t2: Request-2 searches for a reservation for the same book and also gets back that book since it's not yet reserved
t3: Request-1 saves the book with its reservation information
t4: Request-2 now overwrites previous update and saves the book with its reservation information

Now we have a situation where two requests both think that their reservation for the book was successful, which is against our expectations. This is a typical problem that plagues most web applications. The various ways in which we can solve this are discussed in the subsequent sections.
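
The two timelines above can be reproduced with a small plain Ruby sketch. ToyShelf is a hypothetical in-memory stand-in for the books collection. Its reserve_atomically method only mimics the idea behind find_and_modify, that is, checking and updating in one indivisible step; here a Mutex plays the role that the database plays in a real deployment.

```ruby
# Toy shelf of books; reserve_atomically mimics the idea behind find_and_modify.
class ToyShelf
  def initialize
    @books = { 'b1' => { 'title' => 'Dracula', 'reserved_by' => nil } }
    @lock = Mutex.new
  end

  # Separate read and write steps, as in the timelines above.
  def find_unreserved(id)
    book = @books[id]
    book if book && book['reserved_by'].nil?
  end

  def save_reservation(id, user)
    @books[id]['reserved_by'] = user
  end

  # Check and update in one indivisible step; returns the book or nil.
  def reserve_atomically(id, user)
    @lock.synchronize do
      book = @books[id]
      next nil unless book && book['reserved_by'].nil?
      book['reserved_by'] = user
      book.dup
    end
  end

  def reserved_by(id)
    @books[id]['reserved_by']
  end
end

# Naive flow: both requests read before either one writes.
naive = ToyShelf.new
seen_by_1 = naive.find_unreserved('b1')
seen_by_2 = naive.find_unreserved('b1')
naive.save_reservation('b1', 'request-1') if seen_by_1
naive.save_reservation('b1', 'request-2') if seen_by_2
puts "naive: reserved by #{naive.reserved_by('b1')}, but both requests saw success"

# Atomic flow: many threads compete, exactly one wins.
atomic = ToyShelf.new
threads = 10.times.map { |i| Thread.new { atomic.reserve_atomically('b1', "request-#{i}") } }
results = threads.map(&:value)
puts "atomic: #{results.compact.size} request(s) succeeded"
```

In the naive flow the second save silently overwrites the first. In the atomic flow every losing request gets nil back and can report the failure to the user.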

Write concern

MongoDB helps us ensure write consistency. This means that when we write something to MongoDB, it now guarantees the success of the write operation. Interestingly, this is a configurable option and is set to acknowledged by default. This means that the write is guaranteed because it waits for an acknowledgement before returning success.

In earlier versions of Mongoid, safe mode (safe: true) was turned off by default. This meant that the success of the write operation was not guaranteed. The write concern is configured in mongoid.yml as follows:

development:
  sessions:
    default:
      hosts:
        - localhost:27017
      options:
        write:
          w: 1

The default write concern in Mongoid is configured with w: 1, which means that the success of a write operation is guaranteed. Let's see an example:

class Author
  include Mongoid::Document
  field :name, type: String
  index({ name: 1 }, { unique: true, background: true })
end

Building an index in the foreground blocks read and write operations. Hence, it's recommended to configure indexing in the background in a Rails application.

We shall now start a Rails console and see how this reacts to a duplicate key index by creating two Author objects with the same name.

irb> Author.create(name: "Gautam")
=> #<Author _id: 5143678345db7ca255000001, name: "Gautam">
irb> Author.create(name: "Gautam")
Moped::Errors::OperationFailure: The operation:
#<Moped::Protocol::Command
@length=83
@request_id=3
@response_to=0
@op_code=2004
@flags=[]
@full_collection_name="sodibee_development.$cmd"
@skip=0
@limit=-1
@selector={:getlasterror=>1, :w=>1}
@fields=nil>
failed with error 11000: "E11000 duplicate key error index: sodibee_development.authors.$name_1 dup key: { : "Gautam" }"

As we can see, it has raised a duplicate key error and the document is not saved. Now, let's have some fun. Let's change the write concern to unacknowledged:

development:
  sessions:
    default:
      hosts:
        - localhost:27017
      options:
        write:
          w: 0

The write concern is now set to unacknowledged writes. That means we do not wait for the MongoDB write to eventually succeed, but assume that it will. Now let's see what happens with the same command that had failed earlier.

irb > Author.where(name: "Gautam").count
=> 1
irb > Author.create(name: "Gautam")
=> #<Author _id: 5287cba54761755624000000, name: "Gautam">
irb > Author.where(name: "Gautam").count
=> 1

There seems to be a discrepancy here. Though Mongoid's create returned successfully, the data was not saved to the database. The unique index on name caused MongoDB to reject the duplicate document, but because we did not wait for an acknowledgement of the write, the error was never reported back to us. So there is no way to figure this out on the console or in our Rails application, and this leads to an inconsistent result.
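
The difference between acknowledged and unacknowledged writes can be sketched in plain Ruby. In the toy model below, the server enforces a unique index on name at insert time, and the client either asks for the outcome (w: 1) or fires and forgets (w: 0). ToyServer and ToyClient are hypothetical names; this is not how Moped or MongoDB are implemented, only an illustration of why the error goes unnoticed.

```ruby
# Toy server with a unique index on "name"; hypothetical, not MongoDB or Moped.
class ToyServer
  class DuplicateKey < StandardError; end

  attr_reader :last_error

  def initialize
    @names = {}
    @last_error = nil
  end

  # Apply the insert and remember any error, like a server-side "last error".
  def insert(doc)
    if @names.key?(doc['name'])
      @last_error = "E11000 duplicate key error dup key: #{doc['name'].inspect}"
    else
      @names[doc['name']] = doc
      @last_error = nil
    end
  end

  def count(name)
    @names.key?(name) ? 1 : 0
  end
end

class ToyClient
  def initialize(server, w)
    @server = server
    @w = w
  end

  # With w: 1 we ask for the outcome and raise on failure.
  # With w: 0 we return right away and never learn about the error.
  def create(doc)
    @server.insert(doc)
    if @w > 0 && @server.last_error
      raise ToyServer::DuplicateKey, @server.last_error
    end
    doc
  end
end

server = ToyServer.new
acknowledged = ToyClient.new(server, 1)
unacknowledged = ToyClient.new(server, 0)

acknowledged.create('name' => 'Gautam')
unacknowledged.create('name' => 'Gautam') # looks successful, but nothing is stored
puts "count after unacknowledged duplicate: #{server.count('Gautam')}"

begin
  acknowledged.create('name' => 'Gautam')
rescue ToyServer::DuplicateKey => e
  puts "acknowledged write raised: #{e.message}"
end
```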

So, how can we solve this problem? There are a few options:

We leave the Mongoid default write concern configuration alone. By default, it is w: 1 and it will raise an exception. This is the recommended approach as prevention is better than cure!
Do not specify the background: true option. This will create indexes in the foreground. However, this approach is not recommended as it can cause a drop in performance because index creations block read and write access.
Add drop_dups: true. This deletes data, so you have to be really careful when using this option.

Other options to the index command create different types of indexes as shown in the following table:

| Index Type | Example | Description |
|---|---|---|
| sparse | index({twitter_name: 1}, {sparse: true}) | This creates sparse indexes, that is, only the documents containing the indexed fields are indexed. Use this with care as you can get incomplete results. |
| 2dsphere | index({:location => "2dsphere"}) | This creates a two-dimensional spherical index. |
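
Why a sparse index can give incomplete results is easiest to see with a few sample documents. The following plain Ruby sketch models the index as a sorted list of only those documents that contain the field. The sample authors and their twitter_name values are made up for this illustration, and no MongoDB is involved.

```ruby
authors = [
  { 'name' => 'Charles Dickens', 'twitter_name' => 'cdickens' },
  { 'name' => 'Mark Twain' }, # no twitter_name, so a sparse index skips it
  { 'name' => 'Jane Austen', 'twitter_name' => 'jausten' }
]

# A sparse index only holds entries for documents that contain the field.
sparse_index = authors
  .select { |doc| doc.key?('twitter_name') }
  .sort_by { |doc| doc['twitter_name'] }

# Walking the index (for example, to sort by twitter_name) never sees Mark Twain.
names_via_index = sparse_index.map { |doc| doc['name'] }
p names_via_index
puts "documents in collection: #{authors.size}, reachable via index: #{sparse_index.size}"
```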

Text index

MongoDB 2.4 introduced text indexes that are as close to free text search indexes as it gets. However, it does only basic text indexing—that is, it supports stop words and stemming. It also assigns a relevance score with each search.

Text indexes are still an experimental feature in MongoDB, and they are not recommended for extensive use. Use ElasticSearch, Solr (Sunspot), or ThinkingSphinx instead.

The following code snippet shows how we can specify a text index with weights:

index(
  { "name" => 'text', "last_name" => 'text' },
  {
    weights: {
      'name' => 10,
      'last_name' => 5,
    },
    name: 'author_text_index'
  }
)

There is no direct search support in Mongoid (as yet). So, if you want to invoke a text search, you need to hack around a little.

irb> srch = Mongoid::Contextual::TextSearch.new(Author.collection,
Author.all, 'john')
=> #<Mongoid::Contextual::TextSearch
selector: {}
class: Author
search: john
filter: {}
project: N/A
limit: N/A
language: default>
irb> srch.execute
=> {"queryDebugString"=>"john||||||", "language"=>"english",
"results"=>[{"score"=>7.5, "obj"=>{"_id"=>BSON::ObjectId('51fc058345d
b7c843f00030b'), "name"=>"Bettye Johns"}}, {"score"=>7.5, "obj"=>{"_id
"=>BSON::ObjectId('51fc058345db7c843f00046d'), "name"=>"John Pagac"}},
{"score"=>7.5, "obj"=>{"_id"=>BSON::ObjectId('51fc058345db7c843f000578'),
"name"=>"Jeanie Johns"}}, {"score"=>7.5, "obj"=>{"_id"=>BSON::ObjectId('5
1fc058445db7c843f0007e7')
...
{"score"=>7.5, "obj"=>{"_id"=>BSON::ObjectId('51fc058a45db7c843
f0025f1'), "name"=>"Alford Johns"}}], "stats"=>{"nscanned"=>25,
"nscannedObjects"=>0, "n"=>25, "nfound"=>25, "timeMicros"=>31103},
"ok"=>1.0}

By default, text search is disabled in MongoDB configuration. We need to turn it on by adding setParameter = textSearchEnabled=true in the MongoDB configuration file, typically /usr/local/mongo.conf.

This returns a result with statistical data as well as the documents and their relevance scores. Interestingly, it also specifies the language. There are a few more things we can do with the search result. For example, we can see the statistical information as follows:

irb> srch.stats
=> {"nscanned"=>25, "nscannedObjects"=>0, "n"=>25, "nfound"=>25,
"timeMicros"=>31103}

We can also convert the data into our Mongoid model objects by using project, as shown in the following command:

irb> srch.project(:name).to_a
=> [#<Author _id: 51fc058345db7c843f00030b, name: "Bettye Johns",
last_name: nil, password: nil>, #<Author _id: 51fc058345db7c843f00046d,
name: "John Pagac", last_name: nil, password: nil>, #<Author _id:
51fc058345db7c843f000578, name: "Jeanie Johns", last_name: nil, password:
nil> ...

Some of the important things to remember are as follows:

Text indexes can be very heavy in memory.
They can return the documents, so the result can be large.
We can use multiple keys (or filters) along with a text search. For example, an index defined as index({ state: 1, name: 'text' }) will mandate the use of state for every text search that is specified.
A search for "john doe" will result in a search for "john" or "doe" or both.
A search for "john" and "doe" will search for all "john" and "doe" in any order.
A search for "\"john doe\"", that is, with escaped quotes, will search for documents containing the exact phrase "john doe".
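
The term and phrase rules above can be imitated with a toy matcher in plain Ruby. The toy_text_match? function is hypothetical and deliberately naive: it has no stemming (so "john" would not match "Johns" here, unlike in the MongoDB output shown earlier), no stop words, and no scoring, and the way it combines phrases with bare terms is a simplification. It only illustrates the difference between OR-ed terms and a quoted phrase.

```ruby
# Toy matcher for the search rules listed above; no stemming, stop words, or scoring.
def toy_text_match?(text, search)
  haystack = text.downcase
  phrases = search.downcase.scan(/"([^"]+)"/).flatten
  terms = search.downcase.gsub(/"[^"]+"/, ' ').split

  # Every quoted phrase must appear verbatim.
  return false unless phrases.all? { |phrase| haystack.include?(phrase) }
  # Bare terms are OR-ed: one matching word is enough.
  return true if terms.empty?
  words = haystack.scan(/\w+/)
  terms.any? { |term| words.include?(term) }
end

names = ['John Doe', 'Doe John', 'John Pagac', 'Jane Roe']
p names.select { |n| toy_text_match?(n, 'john doe') }
p names.select { |n| toy_text_match?(n, '"john doe"') }
```

The first search keeps every name containing either word, while the quoted search keeps only the name that contains the exact phrase.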

A lot more data can be found at http://docs.mongodb.org/manual/tutorial/search-for-text/

Summary

This article provides an excellent reference for using Mongoid. The article has examples with code samples and explanations that help in understanding the various features of Mongoid.