Querying the Data Grid in Coherence 3.5: Obtaining Query Results and Using Indexes

The easiest way to obtain query results is to invoke one of the QueryMap.entrySet methods:

Filter filter = ...;
Set<Map.Entry> results = cache.entrySet(filter);

This will return a set of Map.Entry instances representing both the key and the value of a cache entry, which is likely not what you want. More often than not you need only values, so you will need to iterate over the results and extract the value from each Map.Entry instance:

List values = new ArrayList(results.size());
for (Map.Entry entry : results) {
    values.add(entry.getValue());
}

After doing this a couple of times you will probably want to create a utility method for this task. Because all the queries should be encapsulated within various repository implementations, we can simply add the following utility methods to our AbstractCoherenceRepository class:

public abstract class AbstractCoherenceRepository<K, V extends Entity<K>> {
    ...

    // Runs the query and returns only the values of the matching entries.
    protected Collection<V> queryForValues(Filter filter) {
        Set<Map.Entry<K, V>> entries = getCache().entrySet(filter);
        return extractValues(entries);
    }

    // Same as above, with the results sorted by the given comparator.
    protected Collection<V> queryForValues(Filter filter,
                                           Comparator comparator) {
        Set<Map.Entry<K, V>> entries =
                getCache().entrySet(filter, comparator);
        return extractValues(entries);
    }

    // Copies the value of each entry into a list, preserving iteration order.
    private Collection<V> extractValues(Set<Map.Entry<K, V>> entries) {
        List<V> values = new ArrayList<V>(entries.size());
        for (Map.Entry<K, V> entry : entries) {
            values.add(entry.getValue());
        }
        return values;
    }
}

What happened to the QueryMap.values() method?
Obviously, things would be a bit simpler if the QueryMap interface also had an overloaded version of the values method that accepts a filter and, optionally, a comparator as arguments.
I'm not sure why this functionality is missing from the API, but I hope it will be added in one of the future releases. In the meantime, a simple utility method is all it takes to provide the missing functionality, so I am not going to complain too much.

Controlling query scope using data affinity

Data affinity can provide a significant performance boost because it allows Coherence to optimize the query for related objects. Instead of executing the query in parallel across all the nodes and aggregating the results, Coherence can simply execute it on a single node, because data affinity guarantees that all the results will be on that particular node. This effectively reduces the number of objects searched to approximately C/N, where C is the total number of objects in the cache the query is executed against, and N is the number of partitions in the cluster.
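
To put rough numbers on this, assume purely for illustration a cache of C = 1,000,000 objects distributed evenly over N = 257 partitions (both figures are assumptions, not taken from the sample application):

```latex
\frac{C}{N} = \frac{1\,000\,000}{257} \approx 3\,891
```

Under those assumptions, a targeted query would examine a few thousand objects rather than a million.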

However, this optimization is not automatic: you have to target the partition to search explicitly, using KeyAssociatedFilter:

Filter query = ...;
Filter filter = new KeyAssociatedFilter(query, key);

In the previous example, we create a KeyAssociatedFilter that wraps the query we want to execute. The second argument to its constructor is the cache key that determines the partition to search.

To make all of this more concrete, let's look at the final implementation of the code for our sample application that returns account transactions for a specific period. First, we need to add the getTransactions method to our Account class:

public Collection<Transaction> getTransactions(Date from, Date to) {
    return getTransactionRepository().findTransactions(m_id, from, to);
}

Finally, we need to implement the findTransactions method within the CoherenceTransactionRepository:

public Collection<Transaction> findTransactions(
        Long accountId, Date from, Date to) {
    Filter filter = new FilterBuilder()
            .equals("id.accountId", accountId)
            .between("time", from, to)
            .build();
    return queryForValues(
            new KeyAssociatedFilter(filter, accountId),
            new DefaultTransactionComparator());
}

As you can see, we target the query using the account identifier, which ensures that Coherence looks for transactions only within the partition that the account with the specified id belongs to. We also ensure that the results are sorted by transaction number by passing DefaultTransactionComparator to the queryForValues helper method we implemented earlier.
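
The comparator itself is not shown here, so the following is only a sketch of what a comparator that orders transactions by number might look like. Txn is a hypothetical stand-in for the Transaction class, and its fields and the ByNumber name are assumptions made for illustration. It is not the actual DefaultTransactionComparator.

```java
import java.io.Serializable;
import java.util.Comparator;

class TransactionComparatorSketch {

    /** Hypothetical stand-in for the article's Transaction class. */
    static class Txn {
        final long accountId;
        final long number;

        Txn(long accountId, long number) {
            this.accountId = accountId;
            this.number = number;
        }
    }

    /**
     * Orders transactions by ascending transaction number.
     * Serializable is added as a precaution (an assumption), in case
     * the comparator ever needs to be sent to another node.
     */
    static class ByNumber implements Comparator<Txn>, Serializable {
        private static final long serialVersionUID = 1L;

        public int compare(Txn a, Txn b) {
            return Long.compare(a.number, b.number);
        }
    }
}
```

A comparator like this can be passed wherever a java.util.Comparator is accepted, including the second argument of the queryForValues helper.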

Querying near cache

One situation where a direct query using the entrySet method might not be appropriate is when you need to query a near cache.

Because there is no way for Coherence to determine if all the results are already in the front cache, it will always execute the query against the back cache and return all the results over the network, even if some or all of them are already present in the front cache. Obviously, this is a waste of network bandwidth.

What you can do in order to optimize the query is to obtain the keys first and then retrieve the entries by calling the CacheMap.getAll method:

Filter filter = ...;
Set keys = cache.keySet(filter);
Map results = cache.getAll(keys);

The getAll method will try to satisfy as many requests as possible from the front cache and delegate to the back cache only for the missing entries. This ensures that we move the bare minimum of data across the wire when executing queries, which improves throughput.
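
To illustrate the idea, here is a toy model of that lookup strategy. It uses two plain java.util maps as stand-ins for the front and back caches and counts how many entries had to come from the back. The class name, fields, and counter are assumptions for illustration only, and this is not how Coherence implements its near cache.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class NearGetAllModel {
    // Stand-ins for the local (front) and remote (back) tiers.
    final Map<String, String> front = new HashMap<String, String>();
    final Map<String, String> back = new HashMap<String, String>();

    // Number of entries that had to be fetched from the back tier.
    int backFetches = 0;

    Map<String, String> getAll(Set<String> keys) {
        Map<String, String> result = new HashMap<String, String>();
        Set<String> missing = new HashSet<String>();

        // First pass: serve whatever the front tier already holds.
        for (String key : keys) {
            String value = front.get(key);
            if (value != null) {
                result.put(key, value);
            } else {
                missing.add(key);
            }
        }

        // Second pass: go to the back tier only for the missing keys,
        // and remember what we fetched in the front tier.
        for (String key : missing) {
            String value = back.get(key);
            if (value != null) {
                backFetches++;
                front.put(key, value);
                result.put(key, value);
            }
        }
        return result;
    }
}
```

In this model, a second getAll call with the same keys is served entirely from the front map, which is the effect the idiom relies on.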

However, keep in mind that this approach might increase latency, as you are making two network roundtrips instead of one, unless all results are already in the front cache. In general, if the expected result set is relatively small, it might make more sense to move all the results over the network using a single entrySet call.

Another potential problem with the idiom used for near cache queries is that it could return invalid results. There is a possibility that some of the entries might change between the calls to keySet and getAll. If that happens, getAll might return entries that do not satisfy the filter anymore, so you should only use this approach if you know that this cannot happen (for example, if objects in the cache you are querying, or at least the attributes that the query is based on, are immutable).
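
If the queried attributes can change, one possible mitigation is to re-check each fetched value on the client and discard the ones that no longer match. The sketch below shows this under stated assumptions: it uses java.util.function.Predicate as a stand-in for the Coherence filter and a plain Map for the getAll result, and retainMatching is a hypothetical helper name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

class NearCacheQuerySketch {

    /**
     * Returns a copy of the fetched entries containing only those whose
     * value still satisfies the given check. Entries with a null value
     * are dropped as well.
     */
    public static <K, V> Map<K, V> retainMatching(
            Map<K, V> fetched, Predicate<? super V> stillMatches) {
        Map<K, V> valid = new LinkedHashMap<K, V>();
        for (Map.Entry<K, V> entry : fetched.entrySet()) {
            V value = entry.getValue();
            if (value != null && stillMatches.test(value)) {
                valid.put(entry.getKey(), value);
            }
        }
        return valid;
    }
}
```

With a real cache, the check could delegate to the filter's evaluate method, for example retainMatching(cache.getAll(keys), v -> filter.evaluate(v)). Note that this only removes entries that stopped matching. Entries that started matching after the keySet call are still missed.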

Sorting the results

We have already seen that the entrySet method allows you to pass a Comparator as a second argument, which will be used to sort the results. If your objects implement the Comparable interface you can also specify null as a second argument and the results will be sorted based on their natural ordering. For example, if we defined the natural sort order for transactions by implementing Comparable within our Transaction class, we could've simply passed null instead of a DefaultTransactionComparator instance within the findTransactions implementation shown earlier.

On the other hand, if you use the near cache query idiom, you will have to sort the results yourself. This is again an opportunity to add utility methods to our base repository class that allow you to query a near cache and, optionally, to sort the results. However, there is a lot more to cover in this article, so I will leave this as an exercise for the reader.
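
As a possible starting point for that exercise, here is a minimal sketch of the sorting half. It takes the Map returned by getAll and returns its values as a sorted list. The class and method names are hypothetical, and only JDK classes are used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class NearCacheSortSketch {

    /**
     * Copies the values of the given map into a list and sorts it.
     * A null comparator means natural ordering, in which case the
     * values must implement Comparable or a ClassCastException is thrown.
     */
    public static <K, V> List<V> sortedValues(
            Map<K, V> results, Comparator<? super V> comparator) {
        List<V> values = new ArrayList<V>(results.values());
        Collections.sort(values, comparator);
        return values;
    }
}
```

Combined with the near cache idiom, a call might look like sortedValues(cache.getAll(cache.keySet(filter)), comparator), mirroring the null-means-natural-ordering convention of entrySet.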