Home Data Tutorials Implementing RethinkDB Query Language

Implementing RethinkDB Query Language

0
826
5 min read

In this article by Shahid Shaikh, the author of the book Mastering RethinkDB, we will cover how you will perform geospatial queries (such as finding all the documents with locations within 5km of a given point).

(For more resources related to this topic, see here.)

Performing MapReduce operations

Learn Programming & Development with a Packt Subscription

MapReduce is the programming model to perform operations (mainly aggregation) on distributed set of data across various clusters in different servers. This concept was coined by Google and been used in Google file system initially and later been adopted in open source Hadoop project.

MapReduce works by processing the data on each server and then combine it together to form a result set. It actually divides into two operations namely map and reduce.

  • Map: It performs the transformation of the elements in the group or individual sequence
  • Reduce: It performs the aggregation and combine results from Map into meaningful result set

In RethinkDB, MapReduce queries operate in three steps:

  • Group operation: To process the data into groups. This step is optional
  • Map operation: To transform the data or group of data into sequence
  • Reduce operation: To aggregate the sequence data to form resultset

So mainly it is Group Map Reduce (GMR) operation. RethinkDB spread the mapreduce query across various clusters in order to improve efficiency. There is specific command to perform this GMR operation; however RethinkDB already integrated them internally to some aggregate functions in order to simplify the process.

Let us perform some aggregation operation in RethinkDB.

Grouping the data

To group the data on basis of field we can use group() ReQL function. Here is sample query on our users table to group the data on the basis of name:

rethinkdb.table("users").group("name").run(connection,function(err,cursor) {

if(err) {

throw new Error(err);

}

cursor.toArray(function(err,data) {

console.log(JSON.stringify(data));

});

}); 

Here is the output for the same:

[

   {

     "group":"John",

     "reduction":[

         {

           "age":24,

           "id":"664fced5-c7d3-4f75-8086-7d6b6171dedb",

           "name":"John"

         },

         {

           "address":{

               "address1":"suite 300",

               "address2":"Broadway",

               "map":{

                 "latitude":"116.4194W",

                 "longitude":"38.8026N"

               },

               "state":"Navada",

               "street":"51/A"

           },

           "age":24,

           "id":"f6f1f0ce-32dd-4bc6-885d-97fe07310845",

           "name":"John"

         }

     ]

   },

   {

     "group":"Mary",

     "reduction":[

         {

           "age":32,

           "id":"c8e12a8c-a717-4d3a-a057-dc90caa7cfcb",

           "name":"Mary"

         }

     ]

   },

   {

     "group":"Michael",

     "reduction":[

         {

           "age":28,

           "id":"4228f95d-8ee4-4cbd-a4a7-a503648d2170",

           "name":"Michael"

         }

     ]

   }

] 

If you observe the query response, data is group by the name and each group is associated with document. Every matching data for the group resides under reductionarray. In order to work on each reductionarray, you can use ungroup() ReQL function which in turns takes grouped streams of data and convert it into array of object. It’s useful to perform the operations such as sorting and so on, on grouped values.

Counting the data

We can count the number of documents present in the table or a sub document of a document using count() method. Here is simple example:

rethinkdb.table("users").count().run(connection,function(err,data) {

if(err) {

throw new Error(err);

}

console.log(data);

});

 It should return the number of documents present in the table. You can also use it count the sub document by nesting the fields and running count() function at the end.

Sum

We can perform the addition of the sequence of data. If value is passed as an expression then sums it up else searches in the field provided in the query.

For example, find out total number of ages of users:

rethinkdb.table("users")("age").sum().run(connection,function(err,data) {

if(err) {

throw new Error(err);

}

console.log(data);

});

 You can of course use an expression to perform math operation like this:

rethinkdb.expr([1,3,4,8]).sum().run(connection,function(err,data) {

if(err) {

throw new Error(err);

}

console.log(data);

});

 Should return 16.

Avg

Performs the average of the given number or searches for the value provided as field in query. For example:

rethinkdb.expr([1,3,4,8]).avg().run(connection,function(err,data) {

if(err) {

throw new Error(err);

}

console.log(data);

}); 

Min and Max

Finds out the maximum and minimum number provided as an expression or as field.

For example, find out the oldest users in database:

rethinkdb.table("users")("age").max().run(connection,function(err,data) {

if(err) {

throw new Error(err);

}

console.log(data);

}); 

Same way of finding out the youngest user:

rethinkdb.table("users")("age").min().run(connection,function(err,data) {

if(err) {

throw new Error(err);

}

console.log(data);

}); 

Distinct

Distinct finds and removes the duplicate element from the sequence, just like SQL one.

For example, find user with unique name:

rethinkdb.table("users")("name").distinct().run(connection,function(err,data) {

if(err) {

throw new Error(err);

}

console.log(data);

});

 

It should return an array containing the names:

[ 'John', 'Mary', 'Michael' ]

Contains

Contains look for the value in the field and if found return boolean response, true in case if it contains the value, false otherwise.

For example, find the user whose name contains John.

rethinkdb.table("users")("name").contains("John").run(connection,function(err,data) {

if(err) {

throw new Error(err);

}

console.log(data);

}); 

Should return true.

Map and reduce

Aggregate functions such as count(), sum() already makes use of map and reduce internally, if required then group() too. You can of course use them explicitly in order to perform various functions.

Summary

From this article we learned various RethinkDB query language as it will help the readers to know much more basic concept of RethinkDB.

Resources for Article:


Further resources on this subject:


NO COMMENTS

LEAVE A REPLY

Please enter your comment!
Please enter your name here