Memory and cache

You can find this instruction in the OGlobalConfiguration.java file in the autoConfig() method. Furthermore, you can enable/disable level 1 cache, level 2 cache, or both. You can also set the number of records that will be stored in each level as follows:

cache.level1.size: This sets the number of records to be stored in the level 1 cache (default -1, no limit)
cache.level2.size: This sets the number of records to be stored in the level 2 cache (default -1, no limit)
cache.level1.enabled: This is a Boolean value that enables/disables the level 1 cache (default true)
cache.level2.enabled: This is a Boolean value that enables/disables the level 2 cache (default true)
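
As a rough illustration, the properties above can be collected and rendered as JVM -D flags. This is only a sketch: it assumes that OrientDB picks up these keys from JVM system properties at startup, and the class name CacheSettings and the chosen values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CacheSettings {
    // Renders each setting as a "-Dkey=value" flag, keeping insertion order.
    static String toJvmFlags(Map<String, String> settings) {
        StringJoiner flags = new StringJoiner(" ");
        for (Map.Entry<String, String> entry : settings.entrySet()) {
            flags.add("-D" + entry.getKey() + "=" + entry.getValue());
        }
        return flags.toString();
    }

    // Hypothetical values: keep level 1 bounded, switch level 2 off.
    static Map<String, String> example() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("cache.level1.enabled", "true");
        settings.put("cache.level1.size", "10000");
        settings.put("cache.level2.enabled", "false");
        return settings;
    }

    public static void main(String[] args) {
        System.out.println(toJvmFlags(example()));
    }
}
```

The resulting string would be appended to the java command line that starts the server or the embedding application.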

Mapping files

OrientDB uses NIO to map data files in memory. However, you can change the way this mapping is performed. This is achieved by modifying the file access strategy.

Mode 0: It uses the memory mapping for all the operations.
Mode 1 (default): It uses the memory mapping, but new reads are performed only if there is enough memory, otherwise the regular Java NIO file read/write is used.
Mode 2: It uses the memory mapping only if the data has been previously loaded.
Mode 3: It uses memory mapping while space is available, then falls back to the regular Java NIO file read/write.
Mode 4: It disables all the memory mapping techniques.

To set the strategy mode, you must use the file.mmap.strategy configuration property.
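
To make the two access paths concrete, the following sketch reads the same file once through a memory mapping and once through a regular NIO channel read. It is plain Java NIO, not OrientDB code; the class and method names are hypothetical, and it only illustrates the difference that the strategy modes choose between.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

final class MmapVsRead {
    // Reads the whole file through a read-only memory mapping.
    static byte[] readMapped(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] out = new byte[map.remaining()];
            map.get(out);
            return out;
        }
    }

    // Reads the whole file with explicit channel reads into a heap buffer.
    static byte[] readRegular(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // keep reading until the buffer is full or end of file
            }
            return buffer.array();
        }
    }

    // Writes data to a temporary file and checks that both paths return it unchanged.
    static boolean sameBytesBothWays(byte[] data) {
        try {
            Path file = Files.createTempFile("mmap-demo", ".dat");
            try {
                Files.write(file, data);
                return Arrays.equals(readMapped(file), data)
                        && Arrays.equals(readRegular(file), data);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "record-data".getBytes(StandardCharsets.UTF_8);
        System.out.println(sameBytesBothWays(data));
    }
}
```

Both methods return the same bytes; what differs is whether the operating system pages the file into the process address space or the data is copied into a buffer on each read.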

Connections

When you have to connect to a remote database, you have some options to improve your application's performance. You can use connection pools and define the timeout value to acquire a new connection. The pool has two attributes:

minPool: It is the minimum number of opened connections
maxPool: It is the maximum number of opened connections

When the first connection is requested from the pool, a number of connections corresponding to the minPool attribute are opened against the server. If a thread requires a new connection, the request is satisfied by using a connection from the pool. If all the connections are busy, a new one is created until the value of maxPool is reached. After that, the thread waits until a connection is freed. Minimum and maximum connections are defined by using the client.channel.minPool (default value 1) and client.channel.maxPool (default value 5) properties. However, you can override these values in the client code by using the setProperty() method of the connection class. For example:

database = new ODatabaseDocumentTx("remote:localhost/demo");
database.setProperty("minPool", 10);
database.setProperty("maxPool", 50);
database.open("admin", "admin");
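
The minPool/maxPool behavior described above can be sketched with a small stand-alone pool. This is not OrientDB's pool implementation: the class SimplePool is hypothetical, and connections are represented by plain integer ids.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SimplePool {
    private final int minPool;
    private final int maxPool;
    private final Deque<Integer> idle = new ArrayDeque<>();
    private int created = 0;

    SimplePool(int minPool, int maxPool) {
        if (minPool < 0 || maxPool < 1 || minPool > maxPool) {
            throw new IllegalArgumentException("need 0 <= minPool <= maxPool and maxPool >= 1");
        }
        this.minPool = minPool;
        this.maxPool = maxPool;
    }

    // Hands out a connection id; blocks while all maxPool connections are busy.
    synchronized int acquire() {
        if (created == 0) {
            // First request: open minPool connections up front.
            while (created < minPool) {
                idle.push(created++);
            }
        }
        while (idle.isEmpty() && created >= maxPool) {
            try {
                wait(); // all connections busy: wait for a release
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        if (!idle.isEmpty()) {
            return idle.pop(); // reuse an idle connection
        }
        return created++; // grow the pool, still below maxPool
    }

    // Returns a connection to the pool and wakes one waiting thread.
    synchronized void release(int id) {
        idle.push(id);
        notify();
    }

    synchronized int created() {
        return created;
    }

    public static void main(String[] args) {
        SimplePool pool = new SimplePool(2, 3);
        int first = pool.acquire();
        System.out.println("opened after first acquire: " + pool.created());
        pool.release(first);
    }
}
```

A larger minPool moves the connection cost to the first request, while maxPool caps the load that one client can put on the server.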

You can also change the connection timeout values. You may experience problems if there is network latency or if some server-side operations take longer to complete. Generally, these kinds of problems show up in the logfile as warnings:

WARNING: Connection re-acquired transparently after XXXms and Y
retries: no errors will be thrown at application level

You can try to change the network.lockTimeout and the network.socketTimeout values. The first one indicates the timeout in milliseconds to acquire a lock against a channel (default is 15000), the second one indicates the TCP/IP socket timeout in milliseconds (default is 10000). There are some other properties you can try to modify to resolve network issues. These are as follows:

network.socketBufferSize: This is the TCP/IP socket buffer size in bytes (default 32 KB)
network.retry: This indicates the number of retries a client should do to establish a connection against a server (default is 5)
network.retryDelay: This indicates the number of milliseconds a client will wait before retrying to establish a new connection (default is 500)
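
The interaction between a retry count and a retry delay can be sketched as a generic loop. The helper below is hypothetical and is not OrientDB's client code; it assumes that the retry value is the total number of attempts and that the delay is applied between failed attempts.

```java
import java.util.function.Supplier;

final class RetryConnect {
    // Runs the attempt up to 'retries' times, sleeping retryDelayMs between failures.
    static <T> T withRetry(int retries, long retryDelayMs, Supplier<T> attempt) {
        if (retries < 1) {
            throw new IllegalArgumentException("retries must be >= 1");
        }
        RuntimeException last = null;
        for (int i = 1; i <= retries; i++) {
            try {
                return attempt.get();
            } catch (RuntimeException e) {
                last = e;
                if (i < retries) {
                    sleep(retryDelayMs); // wait before the next attempt
                }
            }
        }
        throw last; // every attempt failed: surface the last error
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during retry delay", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated connection that fails twice before succeeding.
        String result = withRetry(5, 500, () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("server not reachable");
            }
            return "connected";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With the defaults quoted above (5 retries, 500 ms), a client under these assumptions would spend at most about two seconds in delays before giving up.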

Transactions

If your primary objective is performance, avoid using transactions. However, if you need transactions to group operations, you can increase overall performance by disabling the transaction log. To do so, set the tx.useLog property to false.

If you disable the transaction log, OrientDB cannot roll back operations if the JVM crashes.

Other transaction parameters are as follows:

tx.log.synch: It is a Boolean value. If set, OrientDB executes a synch against the filesystem for each transaction log entry. This slows down the transactions, but provides reliability on unreliable devices. Default value is false.
tx.commit.synch: It is a Boolean value. If set, it performs a storage synch after a commit. Default value is true.

Massive insertions

If you want to do a massive insertion, there are some tricks to speed up the operation. First of all, do it via the Java API. This is the fastest way to communicate with OrientDB. Second, tell the server about your intention:

db.declareIntent( new OIntentMassiveInsert() );
//your code here....
db.declareIntent( null );

Here db is an opened database connection.

By declaring the OIntentMassiveInsert() intent, you are instructing OrientDB to reconfigure itself (that is, it applies a set of preconfigured configuration values) because a massive insert operation will begin. During the massive insert, avoid creating a new ODocument instance for each record to insert. Instead, create a single instance the first time, and then clean it using the reset() method:

ODocument doc = new ODocument();
for (int i = 0; i < 9999999; i++) {
    doc.reset(); // reuse the same ODocument instance for the next record
    doc.setClassName("Author");
    doc.field("id", i);
    doc.field("name", "John");
    doc.save();
}

This trick works only in a non-transactional context.

Finally, avoid transactions if you can. If you are using a graph database and you have to perform a massive insertion of vertices, you can still reset just one vertex:

ODocument doc = db.createVertex();
...
doc.reset();
...

Moreover, since a graph database caches the most used elements, you may disable this:

db.setRetainObjects(false);

Datafile fragmentation

Each time a record is updated or deleted, a hole is created in the datafile structure. OrientDB tracks these holes and tries to reuse them. However, many updates and deletes can fragment the datafiles, just as in a filesystem. To limit this problem, set the oversize attribute of the classes you create. The oversize attribute allocates more space for records when they are created, so as to avoid fragmentation upon updates. It is a multiplying factor where 1.0 or 0 means no oversize. The default values are 0 for documents and 2 for vertices. OrientDB has a defrag algorithm that starts automatically when certain conditions are met. You can set some of these conditions by using the following configuration parameter:

file.defrag.holeMaxDistance: It defines the maximum distance in bytes between two holes that triggers the defrag procedure. The default is 32 KB, -1 means dynamic size. The dynamic size is computed in the ODataLocal class in the getCloserHole() method, as Math.max(32768 * (int) (size / 10000000), 32768), where size is the current size of the file.
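
To see how the dynamic size scales, the expression quoted above can be evaluated for a few file sizes. The wrapper class and method name are hypothetical, and the size parameter is assumed to be a long number of bytes; only the expression itself comes from the text.

```java
final class HoleDistance {
    // Same expression as quoted for the dynamic size (-1) setting.
    static int dynamicHoleMaxDistance(long size) {
        return Math.max(32768 * (int) (size / 10000000), 32768);
    }

    public static void main(String[] args) {
        long[] sizes = {5_000_000L, 50_000_000L, 1_000_000_000L};
        for (long size : sizes) {
            System.out.println(size + " bytes -> " + dynamicHoleMaxDistance(size));
        }
    }
}
```

For files below 20,000,000 bytes the result stays at the 32 KB floor; above that it grows by 32 KB for every further 10,000,000 bytes, so a 50,000,000-byte file gives 163840.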

The profiler

OrientDB has an embedded profiler that you can use to analyze the behavior of the server. The configuration parameters that act on the profiler are as follows:

profiler.enabled: This is a boolean value (enable/disable the profiler), the default value is false.
profiler.autoDump.interval: It is the number of seconds between profiler dumps. The default value is 0, which means no dump.
profiler.autoDump.reset: This is a Boolean value that resets the profiler at every dump. The default is true.

The dump is a JSON string structured in sections. The first one is a huge collection of information gathered at runtime related to the configuration and resources used by each object in the database. The keys are structured as follows:

db.<db-name>: They are database-related metrics
db.<db-name>.cache: They are metrics about databases’ caching
db.<db-name>.data: They are metrics about databases’ datafiles, mainly data holes
db.<db-name>.index: They are metrics about databases’ indexes
system.disk: They are filesystem-related metrics
system.memory: They are RAM-related metrics
system.config.cpus: It is the number of CPU cores
process.network: They are network metrics
process.runtime: They provide process runtime information and metrics
server.connections.actives: It is the number of active connections

The second part of the dump is a collection of chronos. A chrono is a log of an operation, for example, a create operation, an update operation, and so on. Each chrono has the following attributes:

last: It is the last time recorded
min: It is the minimum time recorded
max: It is the maximum time recorded
average: It is the average time recorded
total: It is the total time recorded
entries: It is the number of times the specific metric has been recorded
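
The chrono attributes above can be read as running statistics over recorded timings. The following sketch shows how such values relate to each other; it is not the profiler's implementation, and the class name ChronoSketch is hypothetical.

```java
final class ChronoSketch {
    private long last;
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private long total;
    private long entries;

    // Records one elapsed time and updates all running statistics.
    void record(long elapsed) {
        last = elapsed;
        min = Math.min(min, elapsed);
        max = Math.max(max, elapsed);
        total += elapsed;
        entries++;
    }

    long last() { return last; }
    long min() { return min; }
    long max() { return max; }
    long total() { return total; }
    long entries() { return entries; }

    // Average is derived from total and entries; 0 when nothing was recorded.
    double average() {
        return entries == 0 ? 0.0 : (double) total / entries;
    }

    public static void main(String[] args) {
        ChronoSketch chrono = new ChronoSketch();
        chrono.record(10);
        chrono.record(30);
        chrono.record(20);
        System.out.println("average=" + chrono.average() + " entries=" + chrono.entries());
    }
}
```

A large gap between max and average in a dump usually points to occasional slow operations rather than a uniformly slow one.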

Finally, there are sections about many counters.

Query tips

The following paragraphs give some useful tips on how to optimize query execution.

The explain command

You can see how OrientDB accesses the data by using the explain command in the console. To use this command simply write explain followed by the select statement:

orientdb> explain select from Posts

A set of key-value pairs is returned. The keys have the following meanings:

resultType: It is the type of the returned resultset. It can be collection, document, or number.
resultSize: It is the number of records retrieved if the resultType is collection.
recordReads: It is the number of records read from datafiles.
involvedIndexes: They are the indices involved in the query.
indexReads: It is the number of records read from the indices.
documentReads: It is the number of documents read from the datafiles. This number could be different from recordReads, because in a scanned cluster there can be different kinds of records.
documentAnalyzedCompatibleClass: It is the number of documents analyzed that belong to the class requested by the query. This number could be different from documentReads, because a cluster may contain several different classes.
elapsed: It is the time elapsed to execute the statement, measured in nanoseconds.

As you can see, OrientDB can use indices to speed up the reads.

Indexes

You can define indexes as in a relational database, using the create index statement, or via the Java API, using the createIndex() method of the OClass class:

create index <class>.<property> [unique|notunique|fulltext] [field type]

Or, for a composite index (an index on more than one property):

create index <index_name> on <class> (<field1>,<field2>)
[unique|notunique|fulltext]

If you create a composite index, OrientDB will also use it when the where clause doesn’t specify criteria against all the indexed fields. So, if you have already built a composite index, you can avoid building a separate index for each field you use in the queries. This is the case of a partial match search, and further information about it can be found in the OrientDB wiki at https://github.com/nuvolabase/orientdb/wiki/Indexes#partial-match-search.
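
The general idea behind a partial match can be sketched with a sorted map: when keys are ordered by the first field and then by the second, all entries that share the first field are contiguous and can be found with a range scan. This is not OrientDB's index structure. The class is hypothetical, the record ids are made up, and the sketch assumes that the query constrains the leading field and that field values never contain the separator character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class CompositeIndexSketch {
    // Separator that sorts before any ordinary character.
    private static final char SEP = '\0';
    private final TreeMap<String, String> index = new TreeMap<>();

    // Indexes a record id under the composite key (field1, field2).
    void put(String field1, String field2, String recordId) {
        index.put(field1 + SEP + field2, recordId);
    }

    // Full match: both fields are given.
    String find(String field1, String field2) {
        return index.get(field1 + SEP + field2);
    }

    // Partial match: only the leading field is given, so scan its key range.
    List<String> findByFirst(String field1) {
        String from = field1 + SEP;
        String to = field1 + (char) (SEP + 1);
        return new ArrayList<>(index.subMap(from, true, to, false).values());
    }

    public static void main(String[] args) {
        CompositeIndexSketch idx = new CompositeIndexSketch();
        idx.put("Smith", "Anna", "#10:0");
        idx.put("Smith", "John", "#10:1");
        idx.put("Stone", "Anna", "#10:2");
        System.out.println(idx.findByFirst("Smith"));
    }
}
```

In this model a query on the second field alone cannot use the range scan, which is why the order of fields in a composite index matters.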

Generally, the indexes don’t work with the like operator. If you want to perform the following query:

select from Authors where name like 'j%'

and you want to use an index, you must define a FULLTEXT index on the name field.

FULLTEXT indices allow you to index string fields. However, keep in mind that indices slow down the insert, update, and delete operations.

Summary

In this article we have seen some strategies that try to optimize both the OrientDB server installation and queries.
