Change is not always good

Any change to existing code has the potential to change the external behavior of the system. When we refactor code, we explicitly intend not to change the external behavior of the system. But how do we perform our refactorings while being reasonably comfortable that we haven't changed external behavior?

The first step to validating that external behavior hasn't been affected is to define the criteria by which we can validate that the external behavior hasn't changed.

Automated testing

Every developer does some form of unit testing. Some developers write a bit of test code, perhaps an independent project that exercises the code to verify it in some way, and then promptly forget about the project. Or, even worse, they throw that project away. For the purposes of this text, when I use the term "testing", I mean "automated testing".

Test automation is the practice of using a testing framework to facilitate and execute tests. A test automation framework promotes the automatic execution of multiple tests. Generally these frameworks include some sort of Graphical User Interface that helps manage tests and their execution. Passing tests are "Green" and failing tests are "Red", which is where the "Red, Green, Refactor" mantra comes from.

Unit tests

If we're refactoring, there's a chance that what we want to refactor isn't currently under test. This means that if we do perform refactoring on the code, we'll have to manually test the system through the established user interfaces to verify that the code works. Realistically, this doesn't verify the code; it verifies only that the external behavior hasn't changed. There could very well be a hidden problem in the code that won't manifest itself until the external behavior is later modified, which distances detection of the defect from the point at which it was created. Our goal is to not affect external behavior when refactoring, so verification through the graphical user interface doesn't fully verify our changes. It is also time-consuming and more prone to human error.

What we really want to do is unit test the code. The term "unit test" has become overloaded over the years. MSDN describes unit testing as taking:

... the smallest piece of testable software in the application, [isolating] from the remainder of the code, and [determining] whether it behaves exactly as [expected].

This smallest piece of software is generally at the method level—unit testing is effectively about ensuring each method behaves as expected. Originally, it meant to test an individual unit of code. "Unit test" has evolved to mean any sort of code-based automated test, tests that developers write and execute within the development process. With various available frameworks, the process of testing the graphical user interface can also be automated in a code-based test, but we won't focus on that.

It's not unusual for a software project to have hundreds or even thousands of individual unit tests. Given the granularity of some of the tests, it's also not unusual for the lines of code in the unit tests to outnumber the lines of production code. This is expected.

At the lowest level, we want to perform true "unit testing": we want to test individual units of code, independently, to verify that each unit functions as expected, especially in the presence of refactoring. To test these units independently, we often have to separate them from the code they depend on. For example, if I want to verify the business logic that generates a unique entity ID, there's no real need for me to access the database. The code that generates the ID may need a collection of existing IDs in order to fully verify the algorithm, but that collection, for the purposes of verification, doesn't need to come from a database. So, we want to separate dependencies like the database from some of our tests.

Techniques for loosely-coupled design like Dependency Inversion and Dependency Injection allow for a composable design. This composable design aids in the flexibility and agility of our software system, but it also aids in unit testing.
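
The entity ID example above can be sketched roughly as follows. This is an illustration only: IEntityIdSource, EntityIdGenerator, and InMemoryEntityIdSource are hypothetical names, and the "largest existing ID plus one" rule is an assumed stand-in for whatever the real business logic would be. The point is that the generator receives its source of IDs through the constructor, so a test can supply an in-memory collection instead of a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical abstraction over wherever existing IDs live
// (a database in production).
public interface IEntityIdSource
{
    IEnumerable<int> GetExistingIds();
}

// Hypothetical business logic (an assumption for illustration):
// the next ID is one greater than the largest existing ID.
public class EntityIdGenerator
{
    private readonly IEntityIdSource idSource;

    // The dependency is injected, so a test can pass in its own implementation.
    public EntityIdGenerator(IEntityIdSource idSource)
    {
        if (idSource == null) throw new ArgumentNullException("idSource");
        this.idSource = idSource;
    }

    public int GenerateUniqueId()
    {
        // DefaultIfEmpty(0) makes the first generated ID 1 when no IDs exist yet.
        return idSource.GetExistingIds().DefaultIfEmpty(0).Max() + 1;
    }
}

// Test double: serves IDs from memory instead of a database.
public class InMemoryEntityIdSource : IEntityIdSource
{
    private readonly List<int> ids;

    public InMemoryEntityIdSource(params int[] ids)
    {
        this.ids = new List<int>(ids);
    }

    public IEnumerable<int> GetExistingIds()
    {
        return ids;
    }
}
```

A test could then construct new EntityIdGenerator(new InMemoryEntityIdSource(3, 7, 5)) and assert that GenerateUniqueId() returns 8, without any database being involved.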

Other testing

Useful and thorough information about all types of testing could easily fill several tomes. We're focusing on the developer task of refactoring, so we're limiting our coverage of testing to the absolutely essential developer testing: unit testing.

The fact that we're focusing on unit tests with regard to refactoring doesn't mean that other types of testing aren't useful or needed. The fact that developers are performing unit tests doesn't preclude them from also needing to perform a certain level of integration testing, or QA personnel from performing other levels of integration testing, user interface testing, user acceptance testing, system testing, and so on.

Integration testing is combining distinct modules in the system to verify that they interoperate exactly as expected. User interface testing is testing that the user interface is behaving exactly as expected. User acceptance testing is verifying that specific user requirements are being met—which could involve unit testing, integration testing, user interface testing, verifying non-functional requirements, and so on.

Mocking

Mocking is a general term that usually refers to the substitution of Test Doubles for dependencies within a system under test that aren't the focus of the test. "Mocking" generally encompasses all types of test doubles, not just Mock test doubles.

Test Double is any object that takes the place of a production object for testing purposes.
Mock is a type of Test Double that stands in for a production object whose behavior or attributes are directly used within the code under test and within the verification.

Test Doubles allow an automated test to gather the criteria by which the code is verified. Test Doubles allow isolation of the code under test. There are several different types of Test Doubles: Mock, Dummy, Stub, Fake, and Spy.

Dummy is a type of Test Double that is only passed around within the test but not directly used by the test. "null" is an example of a dummy—use of "null" satisfies the code, but may not be necessary for verification.
Stub is a type of Test Double that provides inputs to the test and may accept inputs from the test but does not use them. The inputs a Stub provides to the test are generally "canned".
Fake is a type of Test Double that is used to substitute a production component for a test component. A Fake generally provides an alternate implementation of that production component that isn't suitable for production but useful for verification. Fakes are generally used for components with heavy integration dependencies that would otherwise make the test slow or heavily reliant on configuration.
Spy is a type of Test Double that effectively records the actions performed on it. The recorded actions can then be used for verification. This is often used in behavioral-based—rather than state-based—testing.
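
As a rough illustration of one of these types, here is a minimal hand-written Spy. The names IAuditLog, InvoicePoster, and AuditLogSpy are hypothetical and exist only for this sketch; they are not part of the invoice example used elsewhere in this text. The Spy implements the dependency's interface and simply records what was done to it, so that a test can verify behavior afterwards.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical dependency of the code under test.
public interface IAuditLog
{
    void Record(string message);
}

// Hypothetical code under test: posting an invoice should write one audit entry.
public class InvoicePoster
{
    private readonly IAuditLog auditLog;

    public InvoicePoster(IAuditLog auditLog)
    {
        this.auditLog = auditLog;
    }

    public void Post(Guid invoiceId)
    {
        auditLog.Record("Posted " + invoiceId);
    }
}

// Spy: records every call made to it so the test can inspect the calls later.
public class AuditLogSpy : IAuditLog
{
    private readonly List<string> recordedMessages = new List<string>();

    public IList<string> RecordedMessages
    {
        get { return recordedMessages; }
    }

    public void Record(string message)
    {
        recordedMessages.Add(message);
    }
}
```

A behavior-based test would pass an AuditLogSpy to InvoicePoster, call Post, and then assert that RecordedMessages contains exactly one entry for the posted invoice. Nothing about the poster's own state is checked; only its interaction with the dependency is.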

Test doubles can be created manually, or they can be created automatically through the use of mocking frameworks. Frameworks like Rhino Mocks provide the ability to automatically create test doubles. Mocking frameworks generally rely on a loosely-coupled design so that the generated test doubles can be substituted for other objects based upon an interface.

Let's look at an example of writing a unit test involving mocking. If we return to one of our decoupling examples, InvoiceRepository, we can now test the internals of InvoiceRepository without testing our Data Access Layer (DAL). We would start by creating a test for the InvoiceRepository.Load method:

using System;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass()]
public class InvoiceRepositoryTest
{
    [TestMethod()]
    public void LoadTest()
    {
        // Arrange: give the repository a stub in place of the real DAL.
        DateTime expectedDate = DateTime.Now;
        IDataAccess dataAccess =
            new InvoiceRepositoryDataAccessStub(expectedDate);
        InvoiceRepository target = new InvoiceRepository(dataAccess);
        Guid invoiceId = Guid.NewGuid();

        // Act: load the invoice through the repository.
        Invoice actualInvoice = target.Load(invoiceId);

        // Assert: the invoice carries the stub's canned values.
        Assert.AreEqual(expectedDate, actualInvoice.Date);
        Assert.AreEqual(invoiceId, actualInvoice.Id);
        Assert.AreEqual("Test", actualInvoice.Title);
        Assert.AreEqual(InvoiceStatus.Posted, actualInvoice.Status);
        Assert.AreEqual(1, actualInvoice.LineItems.Count());

        // Assert: the single line item carries the stub's canned values.
        InvoiceLineItem actualLineItem = actualInvoice.LineItems.First();
        Assert.AreEqual("Description", actualLineItem.Description);
        Assert.AreEqual(1F, actualLineItem.Discount);
        Assert.AreEqual(2F, actualLineItem.Price);
        Assert.AreEqual(3F, actualLineItem.Quantity);
    }
}

Here, we're creating an instance of our repository, passing it a Stub IDataAccess class. We then invoke the Load method and verify the various attributes of the resulting Invoice object. We, of course, don't have a class named InvoiceRepositoryDataAccessStub, so we'll have to create one. This class, for the purposes of this test, will look like this:

using System;
using System.Data;

// Stub of IDataAccess that returns canned invoice data for InvoiceRepository tests.
class InvoiceRepositoryDataAccessStub : IDataAccess
{
    private DateTime constantDate;

    public InvoiceRepositoryDataAccessStub(DateTime date)
    {
        constantDate = date;
    }

    public System.Data.DataSet LoadInvoice(Guid invoiceId)
    {
        // Build an "Invoices" table with the expected columns.
        DataSet invoiceDataSet = new DataSet("Invoice");
        DataTable invoiceTable =
            invoiceDataSet.Tables.Add("Invoices");
        DataColumn column = new DataColumn("Id", typeof(Guid));
        invoiceTable.Columns.Add(column);
        column = new DataColumn("Date", typeof(DateTime));
        invoiceTable.Columns.Add(column);
        column = new DataColumn("Title", typeof(String));
        invoiceTable.Columns.Add(column);
        column = new DataColumn("Status", typeof(int));
        invoiceTable.Columns.Add(column);

        // Add one canned row; the test asserts against these values.
        DataRow invoiceRow = invoiceTable.NewRow();
        invoiceRow["Id"] = invoiceId;
        invoiceRow["Date"] = constantDate;
        invoiceRow["Status"] = InvoiceStatus.Posted;
        invoiceRow["Title"] = "Test";
        invoiceTable.Rows.Add(invoiceRow);
        return invoiceDataSet;
    }

    public System.Data.DataSet LoadInvoiceLineItems(Guid invoiceId)
    {
        // Build a "LineItems" table with the expected columns.
        DataSet lineItemDataSet = new DataSet("LineItem");
        DataTable lineItemTable =
            lineItemDataSet.Tables.Add("LineItems");
        DataColumn column = new DataColumn("InvoiceId", typeof(Guid));
        lineItemTable.Columns.Add(column);
        column = new DataColumn("Price", typeof(Decimal));
        lineItemTable.Columns.Add(column);
        column = new DataColumn("Quantity", typeof(int));
        lineItemTable.Columns.Add(column);
        column = new DataColumn("Discount", typeof(double));
        lineItemTable.Columns.Add(column);
        column = new DataColumn("Description", typeof(String));
        lineItemTable.Columns.Add(column);
        column = new DataColumn("TaxRate1", typeof(String));
        lineItemTable.Columns.Add(column);
        column = new DataColumn("TaxRate2", typeof(String));
        lineItemTable.Columns.Add(column);

        // Add one canned line item; the tax rate columns are left unset.
        DataRow lineItemRow =
            lineItemDataSet.Tables["LineItems"].NewRow();
        lineItemRow["InvoiceId"] = invoiceId;
        lineItemRow["Discount"] = 1F;
        lineItemRow["Price"] = 2F;
        lineItemRow["Quantity"] = 3;
        lineItemRow["Description"] = "Description";
        lineItemTable.Rows.Add(lineItemRow);
        return lineItemDataSet;
    }

    public void SaveInvoice(System.Data.DataSet dataSet)
    {
        // Load must never call this; throwing makes such a call fail the test.
        throw new NotImplementedException();
    }
}

Here, we're manually creating DataSet objects and populating rows with canned data that we're specifically checking for in the validation code within the test. It's worth noting that we haven't implemented SaveInvoice in this class. This is mostly because we haven't implemented it in the production code yet. In the case of testing Load, however, an exception would be thrown should Load call SaveInvoice, which adds more depth to the validation of the Load method, since it shouldn't be using SaveInvoice to load data.

In the InvoiceRepositoryTest.LoadTest method, we're specifically using the InvoiceRepositoryDataAccessStub. InvoiceRepositoryDataAccessStub is a Stub of IDataAccess specifically for use with InvoiceRepository. If you recall, a Stub is a Test Double that substitutes for a production component and supplies canned data to the system under test. In our test, we're just checking for that canned data to verify that the InvoiceRepository called our InvoiceRepositoryDataAccessStub instance in the correct way.

Priorities

In a project with few or no unit tests, it can be overwhelming to begin refactoring the code. There can be a tendency to want to establish unit tests for the entire code base before refactoring starts. This, of course, is linear thinking. An established code base has been verified to a certain extent. If it's been deployed, the code effectively "works". Attempting to unit test every line of code isn't going to change that fact.

It's when we start to change code that we want to verify that our change doesn't have an unwanted side effect. To this end, we want to prioritize unit testing so that it doesn't become the sole focus of the team. I find that the unit-testing priorities when starting out are the same as when a system has had unit tests for some time: any new code should have as much unit-test code coverage as realistically possible, and any code that needs to be refactored should have code coverage as high as realistically possible.

The priority here is to ensure that any new code is tested and verified, and accept the fact that existing code has been verified in its own way. If we're not planning on immediately changing certain parts of code, they don't need unit-tests and should be of lower priority.

Code coverage

Something that often goes hand-in-hand with unit testing is Code Coverage. The goal of code coverage is to get as close to 100% coverage as reasonably possible.

Code Coverage is the measure of the percentage of code that is executed (covered) by automated tests.

Code coverage is a metric generally used in teams that are performing unit tests on a good portion of their code. When a team is just starting out with unit testing, code coverage is effectively anecdotal. It doesn't tell you much more than that you are doing some unit tests.

One trap teams fall into as they approach majority code coverage is striving for 100% code coverage. This is both problematic and counterproductive. There is some code that is difficult to test and even harder to verify. The work involved in testing this code serves only to increase the code coverage percentage.

I prefer to view the code coverage delta over time. In other words, I concentrate on how the code coverage percentage changes (or doesn't). I want to ensure that it's not going down. If the code coverage percentage is really low (say 25%) then I may want to see it increasing, but not at the risk of supplanting other work.
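
To make the arithmetic concrete, here is a small sketch of tracking the coverage delta. It assumes coverage is measured in lines (tools may count blocks or branches instead), and the CoverageTrend class and its members are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class CoverageTrend
{
    // Coverage percentage: covered lines divided by total lines, times 100.
    public static double Percent(int coveredLines, int totalLines)
    {
        if (totalLines <= 0) throw new ArgumentOutOfRangeException("totalLines");
        return 100.0 * coveredLines / totalLines;
    }

    // Change in coverage between consecutive measurements.
    // A negative delta means coverage went down since the previous measurement.
    public static IList<double> Deltas(IList<double> percentages)
    {
        var deltas = new List<double>();
        for (int i = 1; i < percentages.Count; i++)
        {
            deltas.Add(percentages[i] - percentages[i - 1]);
        }
        return deltas;
    }
}
```

For example, 250 covered lines out of 1,000 gives 25% coverage. If three successive builds measure 25%, 27.5%, and 26%, the deltas are +2.5 and -1.5, and it is the negative delta in the last build that deserves attention.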