
In this article by Suhas Chatekar, author of the book Learning NHibernate 4, we dig deeper into the statement that the repository pattern has downsides, and try to understand what those downsides are and what can be done about them. In our attempt to address the downsides of repository, we present two data access patterns, namely specification pattern and query object pattern. Specification pattern is a pattern adopted into the data access layer from a general purpose pattern used for effectively filtering in-memory data.

Before we begin, let me reiterate: repository pattern is not a bad or wrong choice in every situation. If you are building a small and simple application involving a handful of entities then repository pattern can serve you well. But if you are building complex domain logic with intricate database interaction then repository may not do justice to your code. The patterns presented can be used in both simple and complex applications, and if you feel that repository is doing the job perfectly then there is no need to move away from it.


Problems with repository pattern

A lot has been written all over the Internet about what is wrong with repository pattern. A simple Google search would give you a lot of interesting articles to read and ponder. We will spend some time trying to understand the problems introduced by repository pattern.

Generalization

FindAll takes the name of the employee as input along with some other parameters required for performing the search. When we started putting together a repository, we said that Repository<T> is a common repository class that can be used for any entity. But now FindAll takes a parameter that is only available on Employee, thus locking the implementation of FindAll to the Employee entity. In order to keep the repository reusable by other entities, we would need to part ways with the common Repository<T> class and implement a more specific EmployeeRepository class with Employee-specific querying methods. This fixes the immediate problem but introduces another one. The new EmployeeRepository breaks the contract offered by IRepository<T>, as the FindAll method cannot be pushed up to the IRepository<T> interface. We would need to add a new interface, IEmployeeRepository. Do you notice where this is going? You would end up implementing a lot of repository classes with complex inheritance relationships between them. While this may seem to work, in my experience there are better ways of solving this problem.

Unclear and confusing contract

What happens if there is a need to query employees by a different criterion for a different business requirement? Say, we now need to fetch a single Employee instance by its employee number. Even if we ignore the above issue and are ready to add a repository class per entity, we would need to add a method that is specific to fetching the Employee instance matching the employee number. This adds another dimension to the code maintenance problem. Imagine how many such methods we would end up adding for a complex domain every time someone needs to query an entity using a new criterion. Having several methods on the repository contract that query the same entity using different criteria makes the contract less clear and more confusing for new developers. Such a pattern also makes it difficult to reuse code even if two methods are only slightly different from each other.
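The two problems described so far can be pictured with a small sketch of how the contracts tend to evolve. Only IRepository<T>, IEmployeeRepository, Employee, and FindAll are names from the discussion above. The remaining members, such as FindByEmployeeNumber and FindAllLivingIn, are hypothetical and exist only to illustrate the growth.

```csharp
using System.Collections.Generic;

public class Employee
{
    public string Name { get; set; }
    public string EmployeeNumber { get; set; }
}

// Generic contract: nothing here may mention Employee-specific properties.
public interface IRepository<T>
{
    void Save(T entity);
    T GetById(int id);
}

// Entity-specific contract: every new criterion becomes one more method.
public interface IEmployeeRepository : IRepository<Employee>
{
    IEnumerable<Employee> FindAll(string name /*, other search parameters */);

    // Hypothetical additions, one per new business requirement.
    Employee FindByEmployeeNumber(string employeeNumber);
    IEnumerable<Employee> FindAllLivingIn(string city);
}
```

Each further entity would need its own interface of this kind, which is where the inheritance relationships and the crowded contracts come from.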

Leaky abstraction

In order to make methods on repositories reusable in different situations, a lot of developers tend to add a single method on the repository that does not take any input and returns an IQueryable<T> by calling ISession.Query<T> inside it, as shown next:

public IQueryable<T> FindAll()
{
   return session.Query<T>();
}

The IQueryable<T> returned by this method can then be used to construct any query that you want outside of the repository. This is a classic case of leaky abstraction. The repository is supposed to abstract away any concerns around querying the database, but what we are doing here is returning an IQueryable<T> to the consuming code and asking it to build the queries, thus leaking the details that are supposed to stay hidden inside the repository. The IQueryable<T> returned by the preceding method holds an instance of ISession that would be used to ultimately interact with the database. Since the repository has no control over how and when this IQueryable<T> would trigger database interaction, you might get into trouble. If you are using a “session per request” kind of pattern then you are safeguarded against it, but if you are not using that pattern for any reason then you need to watch out for errors due to closed or disposed session objects.
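The timing problem can be reproduced without NHibernate. The following is a minimal sketch in which FakeSession is a hypothetical stand-in for ISession, not an NHibernate type. It only demonstrates that an IQueryable<T> executes when it is enumerated, which may be after the object it depends on has been disposed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for ISession: it only tracks whether it was disposed.
public sealed class FakeSession : IDisposable
{
    private readonly List<string> rows = new List<string> { "Alice", "Bob" };

    public bool IsDisposed { get; private set; }

    // Iterator method: no row is read until the caller enumerates the result.
    private IEnumerable<string> Read()
    {
        foreach (var row in rows)
        {
            if (IsDisposed) throw new ObjectDisposedException(nameof(FakeSession));
            yield return row;
        }
    }

    public IQueryable<string> Query() => Read().AsQueryable();

    public void Dispose() => IsDisposed = true;
}

public static class LeakyAbstractionDemo
{
    public static bool EnumerationFailsAfterDispose()
    {
        IQueryable<string> query;
        using (var session = new FakeSession())
        {
            // Building the query does not execute it.
            query = session.Query().Where(name => name.StartsWith("A"));
        }

        try
        {
            query.ToList(); // execution happens here, after the session is gone
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }
}
```

The code that disposes the session and the code that enumerates the query can live far apart in a real application, which is what makes this class of error hard to spot.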

God object anti-pattern

A god object is an object that does too many things. Sometimes, there is a single class in an application that does everything. Such an implementation is almost always bad as it majorly breaks the famous single responsibility principle (SRP) and reduces testability and maintainability of code. A lot can be written about SRP and god object anti-pattern but since it is not the primary topic, I would leave the topic with underscoring the importance of staying away from god object anti-pattern. Avid readers can Google on the topic if they are interested.

Repositories by nature tend to become single point of database interaction. Any new database interaction goes through repository. Over time, repositories grow organically with large number of methods doing too many things. You may spot the anti-pattern and decide to break the repository into multiple small repositories but the original single repository would be tightly integrated with your code in so many places that splitting it would be a difficult job.

For a contained and trivial domain model, repository pattern can be a good choice. So do not abandon repositories entirely. It is around complex and changing domain that repositories start exhibiting the problems just discussed. You might still argue that repository is an unneeded abstraction and we can very well use NHibernate directly for a trivial domain model. But I would caution against any design that uses NHibernate directly from domain or domain services layer. No matter what design I use for data access, I would always adhere to “explicitly declare capabilities required” principle. The abstraction that offers required capability can be a repository interface or some other abstractions that we would learn.

Specification pattern

Specification pattern is a reusable and object-oriented way of applying business rules on domain entities. The primary use of specification pattern is to select subset of entities from a larger collection of entities based on some rules. An important characteristic of specification pattern is combining multiple rules by chaining them together.

Specification pattern was in existence before ORMs and other data access patterns had set foot in the development community. The original form of specification pattern dealt with in-memory collections of entities. The pattern was then adapted to work with ORMs such as NHibernate as people started seeing the benefits that specification pattern could bring about. We will first discuss specification pattern in its original form. That would give us a good understanding of the pattern. We will then modify the implementation to make it fit with NHibernate.

Specification pattern in its original form

Let’s look into an example of specification pattern in its original form. A specification defines a rule that must be satisfied by domain objects. This can be generalized using an interface definition, as follows:

public interface ISpecification<T>
{
    bool IsSatisfiedBy(T entity);
}

ISpecification<T> defines a single method, IsSatisfiedBy. This method takes the entity instance of type T as input and returns a Boolean value depending on whether the entity passed satisfies the rule or not. If we were to write a rule for employees living in London then we can implement a specification as follows:

public class EmployeesLivingIn : ISpecification<Employee>
{
    public bool IsSatisfiedBy(Employee entity)
    {
        return entity.ResidentialAddress.City == "London";
    }
}

The EmployeesLivingIn class implements ISpecification<Employee> telling us that this is a specification for the Employee entity. This specification compares the city from the employee’s ResidentialAddress property with literal string “London” and returns true if it matches. You may be wondering why I have named this class as EmployeesLivingIn. Well, I had some refactoring in mind and I wanted to make my final code read nicely. Let’s see what I mean. We have hardcoded literal string “London” in the preceding specification. This effectively stops this class from being reusable. What if we need a specification for all employees living in Paris? Ideal thing to do would be to accept “London” as a parameter during instantiation of this class and then use that parameter value in the implementation of the IsSatisfiedBy method. Following code listing shows the modified code:

public class EmployeesLivingIn : ISpecification<Employee>
{
    private readonly string city;

    public EmployeesLivingIn(string city)
    {
        this.city = city;
    }

    public bool IsSatisfiedBy(Employee entity)
    {
        return entity.ResidentialAddress.City == city;
    }
}

This looks good without any hardcoded string literals. Now if I wanted my original specification for employees living in London then following is how I could build it:

var specification = new EmployeesLivingIn("London");

Did you notice how the preceding code reads in plain English because of the way class is named? Now, let’s see how to use this specification class. Usual scenario where specifications are used is when you have got a list of entities that you are working with and you want to run a rule and find out which of the entities in the list satisfy that rule. Following code listing shows a very simple use of the specification we just implemented:

List<Employee> employees = new List<Employee>(); // in practice, loaded from somewhere
List<Employee> employeesLivingInLondon = new List<Employee>();
var specification = new EmployeesLivingIn("London");

foreach (var employee in employees)
{
    if (specification.IsSatisfiedBy(employee))
    {
        employeesLivingInLondon.Add(employee);
    }
}

We have a list of employees loaded from somewhere and we want to filter this list and get another list comprising employees living in London.

Till this point, the only benefit we have had from specification pattern is that we have managed to encapsulate the rule into a specification class which can be reused anywhere now. For complex rules, this can be very useful. But for simple rules, specification pattern may look like a lot of plumbing code if we overlook the composability of specifications. Most of the power of specification pattern comes from the ability to chain multiple rules together to form a complex rule. Let’s write another specification for employees who have opted for any benefit:
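The filtering loop can also be written with LINQ to Objects, since IsSatisfiedBy has the shape of a Func<Employee, bool>. The following is a minimal, self-contained sketch. The FilterDemo class and the Name property are assumptions added for illustration, and the entity classes are trimmed to what the example needs.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Address
{
    public string City { get; set; }
}

public class Employee
{
    public string Name { get; set; } // hypothetical property, used only for the demo
    public Address ResidentialAddress { get; set; }
}

public interface ISpecification<T>
{
    bool IsSatisfiedBy(T entity);
}

public class EmployeesLivingIn : ISpecification<Employee>
{
    private readonly string city;

    public EmployeesLivingIn(string city)
    {
        this.city = city;
    }

    public bool IsSatisfiedBy(Employee entity) => entity.ResidentialAddress.City == city;
}

public static class FilterDemo
{
    public static List<Employee> Filter(
        IEnumerable<Employee> employees, ISpecification<Employee> specification)
    {
        // Method group conversion: IsSatisfiedBy is passed as a Func<Employee, bool>.
        return employees.Where(specification.IsSatisfiedBy).ToList();
    }
}
```

Calling FilterDemo.Filter(employees, new EmployeesLivingIn("London")) yields the same list as the foreach loop, without the explicit accumulator list.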

public class EmployeesHavingOptedForBenefits : ISpecification<Employee>
{
    public bool IsSatisfiedBy(Employee entity)
    {
        return entity.Benefits.Count > 0;
    }
}

In this rule, there is no need to supply any literal value from outside so the implementation is quite simple. We just check if the Benefits collection on the passed employee instance has count greater than zero. You can use this specification in exactly the same way as earlier specification was used.

Now if there is a need to apply both of these specifications to an employee collection, then very little modification to our code is needed. Let’s start with adding an And method to the ISpecification<T> interface, as shown next:

public interface ISpecification<T>
{
    bool IsSatisfiedBy(T entity);
    ISpecification<T> And(ISpecification<T> specification);
}

The And method accepts an instance of ISpecification<T> and returns another instance of the same type. As you would have guessed, the specification that is returned from the And method would effectively perform a logical AND operation between the specification on which the And method is invoked and specification that is passed into the And method. The actual implementation of the And method comes down to calling the IsSatisfiedBy method on both the specification objects and logically ANDing their results. Since this logic does not change from specification to specification, we can introduce a base class that implements this logic. All specification implementations can then derive from this new base class. Following is the code for the base class:

public abstract class Specification<T> : ISpecification<T>
{
    public abstract bool IsSatisfiedBy(T entity);

    public ISpecification<T> And(ISpecification<T> specification)
    {
        return new AndSpecification<T>(this, specification);
    }
}

We have marked Specification<T> as abstract as this class does not represent any meaningful business specification and hence we do not want anyone to inadvertently use this class directly. Accordingly, the IsSatisfiedBy method is marked abstract as well. In the implementation of the And method, we are instantiating a new class, AndSpecification. This class takes two specification objects as inputs. We pass the current instance and the one that is passed to the And method. The definition of AndSpecification is very simple.

public class AndSpecification<T> : Specification<T>
{
    private readonly Specification<T> specification1;
    private readonly ISpecification<T> specification2;

    public AndSpecification(Specification<T> specification1,
                            ISpecification<T> specification2)
    {
        this.specification1 = specification1;
        this.specification2 = specification2;
    }

    public override bool IsSatisfiedBy(T entity)
    {
        return specification1.IsSatisfiedBy(entity) &&
               specification2.IsSatisfiedBy(entity);
    }
}

AndSpecification<T> inherits from the abstract class Specification<T>, which is obvious. IsSatisfiedBy simply performs a logical AND operation on the outputs of the IsSatisfiedBy method on each of the specification objects passed into AndSpecification<T>.

After we change our previous two business specification implementations to inherit from abstract class Specification<T> instead of interface ISpecification<T>, following is how we can chain two specifications using the And method that we just introduced:

List<Employee> employees = new List<Employee>(); // in practice, loaded from somewhere
List<Employee> employeesLivingInLondon = new List<Employee>();
var specification = new EmployeesLivingIn("London")
    .And(new EmployeesHavingOptedForBenefits());

foreach (var employee in employees)
{
    if (specification.IsSatisfiedBy(employee))
    {
        employeesLivingInLondon.Add(employee);
    }
}

There is literally nothing changed in how the specification is used in business logic. The only thing that has changed is the construction of the specification, where two specifications are now chained together with the And method.
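Chaining is not limited to And. As a minimal sketch, assuming the same base-class approach, Or and Not can be added in the same shape. OrSpecification, NotSpecification, GreaterThan, and IsEven are hypothetical names, and the sample rules work on int only to keep the listing short. Unlike the listing above, the chaining methods here return Specification<T> so that calls can be chained without adding every method to the interface.

```csharp
public interface ISpecification<T>
{
    bool IsSatisfiedBy(T entity);
}

public abstract class Specification<T> : ISpecification<T>
{
    public abstract bool IsSatisfiedBy(T entity);

    public Specification<T> And(ISpecification<T> other) => new AndSpecification<T>(this, other);
    public Specification<T> Or(ISpecification<T> other) => new OrSpecification<T>(this, other);
    public Specification<T> Not() => new NotSpecification<T>(this);
}

public class AndSpecification<T> : Specification<T>
{
    private readonly ISpecification<T> left;
    private readonly ISpecification<T> right;

    public AndSpecification(ISpecification<T> left, ISpecification<T> right)
    {
        this.left = left;
        this.right = right;
    }

    public override bool IsSatisfiedBy(T entity) =>
        left.IsSatisfiedBy(entity) && right.IsSatisfiedBy(entity);
}

public class OrSpecification<T> : Specification<T>
{
    private readonly ISpecification<T> left;
    private readonly ISpecification<T> right;

    public OrSpecification(ISpecification<T> left, ISpecification<T> right)
    {
        this.left = left;
        this.right = right;
    }

    public override bool IsSatisfiedBy(T entity) =>
        left.IsSatisfiedBy(entity) || right.IsSatisfiedBy(entity);
}

public class NotSpecification<T> : Specification<T>
{
    private readonly ISpecification<T> inner;

    public NotSpecification(ISpecification<T> inner)
    {
        this.inner = inner;
    }

    public override bool IsSatisfiedBy(T entity) => !inner.IsSatisfiedBy(entity);
}

// Hypothetical sample rules over int, used only to exercise the chaining.
public class GreaterThan : Specification<int>
{
    private readonly int threshold;

    public GreaterThan(int threshold)
    {
        this.threshold = threshold;
    }

    public override bool IsSatisfiedBy(int entity) => entity > threshold;
}

public class IsEven : Specification<int>
{
    public override bool IsSatisfiedBy(int entity) => entity % 2 == 0;
}
```

With these in place, new GreaterThan(10).And(new IsEven().Not()) reads as "greater than 10 and not even", in the same plain-English style as the employee specifications.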

We can go on and implement other chaining methods, but the point to take home here is the composability that the specification pattern offers. Now let’s look into how specification pattern sits beside NHibernate and helps in fixing some of the pain points of repository pattern.

Specification pattern for NHibernate

Fundamental difference between original specification pattern and the pattern applied to NHibernate is that we had an in-memory list of objects to work with in the former case. In case of NHibernate we do not have the list of objects in the memory. We have got the list in the database and we want to be able to specify rules that can be used to generate appropriate SQL to fetch the records from database that satisfy the rule. Owing to this difference, we cannot use the original specification pattern as is when we are working with NHibernate. Let me show you what this means when it comes to writing code that makes use of specification pattern.

A query, in its most basic form, to retrieve all employees living in London would look something as follows:

var employees = session.Query<Employee>()
               .Where(e => e.ResidentialAddress.City == "London");

The lambda expression passed to the Where method is our rule. We want all the Employee instances from database that satisfy this rule. We want to be able to push this rule behind some kind of abstraction such as ISpecification<T> so that this rule can be reused. We would need a method on ISpecification<T> that does not take any input (there are no entities in-memory to pass) and returns a lambda expression that can be passed into the Where method. Following is how that method could look:

public interface ISpecification<T> where T : EntityBase<T>
{
    Expression<Func<T, bool>> IsSatisfied();
}

Note the differences from the previous version. We have changed the method name from IsSatisfiedBy to IsSatisfied as there is no entity being passed into this method that would warrant the use of the word By at the end. This method returns an Expression<Func<T, bool>>. If you have dealt with situations where you pass lambda expressions around then you know what this type means. If you are new to expression trees, let me give you a brief explanation. Func<T, bool> is a delegate type. It represents a function that takes an instance of type T as input and returns a Boolean output. Expression<Func<T, bool>> represents the same lambda, but instead of compiling it into executable code, the compiler keeps it as an expression tree, that is, a data structure describing the lambda. A LINQ provider such as NHibernate can inspect this tree and translate it into SQL. An implementation of this new interface would make things clearer. The next code listing shows the specification for employees living in London written against the new contract:

public class EmployeesLivingIn : ISpecification<Employee>
{
    private readonly string city;

    public EmployeesLivingIn(string city)
    {
        this.city = city;
    }

    // No override keyword: this implements an interface member.
    public Expression<Func<Employee, bool>> IsSatisfied()
    {
        return e => e.ResidentialAddress.City == city;
    }
}

There is not much changed here compared to the previous implementation. Definition of IsSatisfied now returns a lambda expression instead of a bool. This lambda is exactly same as the one we used in the ISession example. If I had to rewrite that example using the preceding specification then following is how that would look:

var specification = new EmployeesLivingIn("London");
var employees = session.Query<Employee>()
    .Where(specification.IsSatisfied());
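To make the distinction between Func<T, bool> and Expression<Func<T, bool>> more concrete, here is a small sketch that needs no database. It writes the same lambda once as a delegate and once as an expression tree, and then inspects the tree. ExpressionDemo and its members are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionDemo
{
    // A delegate: compiled code that can only be invoked.
    public static readonly Func<string, bool> AsDelegate =
        city => city == "London";

    // An expression tree: the same lambda kept as data that can be inspected.
    public static readonly Expression<Func<string, bool>> AsTree =
        city => city == "London";

    public static ExpressionType BodyNodeType()
    {
        // The body of the lambda is a binary comparison node.
        return AsTree.Body.NodeType;
    }

    public static object ComparedConstant()
    {
        // The right-hand side of the comparison is the literal "London".
        var body = (BinaryExpression)AsTree.Body;
        return ((ConstantExpression)body.Right).Value;
    }

    public static bool Evaluate(string city)
    {
        // A tree can still be compiled into a delegate when it has to run in memory.
        Func<string, bool> compiled = AsTree.Compile();
        return compiled(city);
    }
}
```

A LINQ provider walks a tree like this, node by node, to produce SQL. A plain delegate offers nothing to walk, which is why the NHibernate version of the specification returns an expression rather than a bool or a Func.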

We now have a specification wrapped in a reusable object that we can send straight to NHibernate’s ISession interface. Now let’s think about how we can use this from within domain services where we used repositories before. We do not want to reference ISession or any other NHibernate type from domain services as that would break onion architecture. We have two options. We can declare a new capability that can take a specification and execute it against the ISession interface. We can then make domain service classes take a dependency on this new capability. Or we can use the existing IRepository capability and add a method on it which takes the specification and executes it.

We started this article with a statement that repositories have a downside, specifically when it comes to querying entities using different criteria. But now we are considering an option to enrich the repositories with specifications. Is that contradictory? Remember that one of the problems with repository was that every time there is a new criterion to query an entity, we needed a new method on repository. Specification pattern fixes that problem. Specification pattern has taken the criterion out of the repository and moved it into its own class so we only ever need a single method on repository that takes in ISpecification<T> and execute it. So using repository is not as bad as it sounds. Following is how the new method on repository interface would look:

public interface IRepository<T> where T : EntityBase<T>
{
    void Save(T entity);
    void Update(int id, T entity);
    T GetById(int id);
    IEnumerable<T> Apply(ISpecification<T> specification);
}

The Apply method is the new method that works with specifications. Note that we have removed all other methods that ran various different queries and replaced them with this new method. Methods to save and update the entities are still there. Even the GetById method is there, as the mechanism used to get an entity by ID is not the same as the one used by specifications. So we retain that method.

One thing I have experimented with in some projects is to split read operations from write operations. The IRepository interface represents something that is capable of both reading from the database and writing to the database. Sometimes, we only need a capability to read from the database, in which case IRepository looks like an unnecessarily heavy object with capabilities we do not need. In such a situation, declaring a new capability to execute specifications makes more sense. I would leave the actual code for this as a self-exercise for our readers.
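As a starting point for that exercise, here is one possible shape for a read-only capability. This is a sketch under assumptions, not the book's implementation: ISpecificationExecutor<T>, QueryableSpecificationExecutor<T>, and LongerThan are hypothetical names, and the EntityBase<T> constraint is dropped so that the listing is self-contained. The executor works over any IQueryable<T>. With NHibernate, the source would typically be session.Query<T>().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public interface ISpecification<T>
{
    Expression<Func<T, bool>> IsSatisfied();
}

// Hypothetical read-only capability: it can run specifications and nothing else.
public interface ISpecificationExecutor<T>
{
    IEnumerable<T> Apply(ISpecification<T> specification);
}

public class QueryableSpecificationExecutor<T> : ISpecificationExecutor<T>
{
    private readonly IQueryable<T> source;

    public QueryableSpecificationExecutor(IQueryable<T> source)
    {
        this.source = source;
    }

    public IEnumerable<T> Apply(ISpecification<T> specification)
    {
        // ToList forces execution here, so callers never hold a live query.
        return source.Where(specification.IsSatisfied()).ToList();
    }
}

// Hypothetical sample rule over strings, used only to exercise the executor.
public class LongerThan : ISpecification<string>
{
    private readonly int length;

    public LongerThan(int length)
    {
        this.length = length;
    }

    public Expression<Func<string, bool>> IsSatisfied() => s => s.Length > length;
}
```

Domain services that only read data can then depend on ISpecificationExecutor<T> alone. Materializing the result inside Apply also avoids handing a deferred query to the caller, which was the leaky abstraction problem discussed earlier.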

Specification chaining

In the original implementation of specification pattern, chaining was simply a matter of carrying out a logical AND between the outputs of the IsSatisfiedBy method on the specification objects involved in chaining. In the case of the NHibernate-adapted version of specification pattern, the end result boils down to the same, but the actual implementation is slightly more complex than just ANDing the results. Similar to the original specification pattern, we would need an abstract base class Specification<T> and a specialized AndSpecification<T> class. I would just skip these details. Let’s go straight into the implementation of the IsSatisfied method on AndSpecification, where the actual logical ANDing happens.

public override Expression<Func<T, bool>> IsSatisfied()
{
    var parameterName = Expression.Parameter(typeof(T), "arg1");
    return Expression.Lambda<Func<T, bool>>(Expression.AndAlso(
        Expression.Invoke(specification1.IsSatisfied(), parameterName),
        Expression.Invoke(specification2.IsSatisfied(), parameterName)),
        parameterName);
}

Logical ANDing of two lambda expressions is not a straightforward operation. We need to make use of static methods available on the helper class System.Linq.Expressions.Expression. Let’s try to go from inside out. That way it is easier to understand what is happening here. Following is the reproduction of the innermost call to the Expression class:

Expression.Invoke(specification1.IsSatisfied(), parameterName)

In the preceding code, we are calling the Invoke method on the Expression class by passing the output of the IsSatisfied method on the first specification. The second parameter passed to this method is a ParameterExpression of type T that we created. It stands for the argument of the combined lambda and is supplied as the argument to each of the two inner lambdas. The Invoke method returns an InvocationExpression which represents the invocation of the lambda expression that was used to construct it. Note that the actual lambda expression is not invoked yet. We do the same with the second specification in question. Outputs of both these operations are then passed into another method on the Expression class as follows:

Expression.AndAlso(
    Expression.Invoke(specification1.IsSatisfied(), parameterName),
    Expression.Invoke(specification2.IsSatisfied(), parameterName)
)

Expression.AndAlso takes the output from both specification objects in the form of the InvocationExpression type and builds a special type called BinaryExpression which represents a logical AND between the two expressions that were passed to it. Next we convert this BinaryExpression into an Expression<Func<T, bool>> by passing it, together with the same parameter, to the Expression.Lambda<Func<T, bool>> method.

This explanation is not very easy to follow and if you have never used, built, or modified lambda expressions programmatically like this before, then you would find it very hard to follow. In that case, I would recommend not bothering yourself too much with this.
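If the explanation feels abstract, it may help to run the same composition against in-memory data. The following sketch applies the technique shown above (Expression.Invoke combined through Expression.AndAlso) to two simple rules over integers and compiles the result into an ordinary delegate. CombineDemo and the two sample rules are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class CombineDemo
{
    // Builds arg1 => left(arg1) && right(arg1) as a new expression tree.
    public static Expression<Func<T, bool>> And<T>(
        Expression<Func<T, bool>> left,
        Expression<Func<T, bool>> right)
    {
        var parameter = Expression.Parameter(typeof(T), "arg1");
        var body = Expression.AndAlso(
            Expression.Invoke(left, parameter),
            Expression.Invoke(right, parameter));
        return Expression.Lambda<Func<T, bool>>(body, parameter);
    }

    public static int[] Run()
    {
        Expression<Func<int, bool>> isEven = n => n % 2 == 0;
        Expression<Func<int, bool>> isLarge = n => n > 10;

        // Nothing is evaluated yet: this is still only a description of the rule.
        Expression<Func<int, bool>> both = And(isEven, isLarge);

        // Compile turns the combined tree into a delegate that can run in memory.
        Func<int, bool> predicate = both.Compile();
        return new[] { 4, 11, 12, 20 }.Where(predicate).ToArray();
    }
}
```

Only the values satisfying both rules survive the filter. Against NHibernate the combined tree would not be compiled like this. It would be handed to the Where method on the query so that the provider can translate it.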

Following code snippet shows how logical ORing of two specifications can be implemented. Note that the code snippet only shows the implementation of the IsSatisfied method.

public override Expression<Func<T, bool>> IsSatisfied()
{
    var parameterName = Expression.Parameter(typeof(T), "arg1");
    return Expression.Lambda<Func<T, bool>>(Expression.OrElse(
        Expression.Invoke(specification1.IsSatisfied(), parameterName),
        Expression.Invoke(specification2.IsSatisfied(), parameterName)),
        parameterName);
}

Rest of the infrastructure around chaining is exactly same as the one presented during discussion of original specification pattern. I have avoided giving full class definitions here to save space but you can download the code to look at complete implementation.

That brings us to end of specification pattern. Though specification pattern is a great leap forward from where repository left us, it does have some limitations of its own. Next, we would look into what these limitations are.

Limitations

Specification pattern is great and unlike repository pattern, I am not going to tell you that it has some downsides and you should try to avoid it. You should not. You should absolutely use it wherever it fits. I would only like to highlight two limitations of specification pattern.

Specification pattern only works with lambda expressions. You cannot use LINQ query syntax. There may be times when you would prefer LINQ query syntax over lambda expressions. One such situation is when you want to go for theta joins, which are not possible with lambda expressions. Another situation is when lambda expressions do not generate optimal SQL. I will show you a quick example to understand this better. Suppose we want to write a specification for employees who have opted for the season ticket loan benefit. The following code listing shows how that specification could be written:

public class EmployeeHavingTakenSeasonTicketLoanSpecification : Specification<Employee>
{
    public override Expression<Func<Employee, bool>> IsSatisfied()
    {
        return e => e.Benefits.Any(b => b is SeasonTicketLoan);
    }
}

It is a very simple specification. Note the use of Any to iterate over the Benefits collection to check if any of the Benefit in that collection is of type SeasonTicketLoan. Following SQL is generated when the preceding specification is run:

SELECT employee0_.Id            AS Id0_,
       employee0_.Firstname     AS Firstname0_,
       employee0_.Lastname      AS Lastname0_,
       employee0_.EmailAddress  AS EmailAdd5_0_,
       employee0_.DateOfBirth   AS DateOfBi6_0_,
       employee0_.DateOfJoining AS DateOfJo7_0_,
       employee0_.IsAdmin       AS IsAdmin0_,
       employee0_.Password      AS Password0_
FROM   Employee employee0_
WHERE  EXISTS
       (SELECT benefits1_.Id
        FROM   Benefit benefits1_
               LEFT OUTER JOIN Leave benefits1_1_
                 ON benefits1_.Id = benefits1_1_.Id
               LEFT OUTER JOIN SkillsEnhancementAllowance benefits1_2_
                 ON benefits1_.Id = benefits1_2_.Id
               LEFT OUTER JOIN SeasonTicketLoan benefits1_3_
                 ON benefits1_.Id = benefits1_3_.Id
        WHERE  employee0_.Id = benefits1_.Employee_Id
               AND CASE
                     WHEN benefits1_1_.Id IS NOT NULL THEN 1
                     WHEN benefits1_2_.Id IS NOT NULL THEN 2
                     WHEN benefits1_3_.Id IS NOT NULL THEN 3
                     WHEN benefits1_.Id IS NOT NULL THEN 0
                   END = 3)

Isn’t that SQL too complex? It is not only hard on the eyes, it is also not how I would have written the needed SQL in the absence of NHibernate. I would have just inner-joined the Employee, Benefit, and SeasonTicketLoan tables to get the records I need. On large databases, the preceding query may be too slow. There are other such situations where queries written using lambda expressions tend to generate complex or suboptimal SQL.

If we use LINQ query syntax instead of lambda expressions, then we can get NHibernate to generate just the SQL we need. Unfortunately, there is no way of fixing this with specification pattern.

Summary

Repository pattern has been around for a long time but suffers from some issues. The general nature of its implementation gets in the way of extending repository pattern to complex domain models involving a large number of entities. The repository contract can be limiting and confusing when there is a need to write complex and very specific queries. Trying to fix these issues with repositories may result in a leaky abstraction which can bite us later. Moreover, repositories maintained with less care have a tendency to grow into god objects, and maintaining them beyond that point becomes a challenge.

Specification pattern and query object pattern solve these issues on the read side of things. Different applications have different data access requirements. Some applications are write-heavy while others are read-heavy. But only a small number of applications fit into the former category. A large number of applications developed these days are read-heavy. I have worked on applications where more than 90 percent of database operations queried data and less than 10 percent actually inserted or updated data. Having this knowledge about the application you are developing can be very useful in determining how you are going to design your data access layer.

That brings us to the end of our NHibernate journey. Not quite, but yes, in a way.
