Transforming Data



Process overview

Let’s demonstrate this data transformation with an example. Imagine that we need to convert some legacy customer data into our new system.

Since not all legacy data is perfect, we’ll need a way to report data that, for some reason, we’ve failed to transfer. For example, our system allows only one address per customer. Legacy customers with more than one address will be reported.

In this article we’ll:

  • Load customer data from the legacy system. The data will include address and account information.
  • Run transformation rules over this data and build an execution report.
  • Populate the domain model with transformed data, running validation rules (from the previous section) and saving it into our system.

Getting the data

As a good practice, we’ll define an interface for interacting with the other system. We’ll introduce a LegacyBankService interface for this purpose. It will make it easier to change the way we communicate with the legacy system, and also the tests will be easier to write.

package droolsbook.transform.service;
import java.util.List;
import java.util.Map;
public interface LegacyBankService {
/**
* @return all customers
*/
List<Map<String, Object>> findAllCustomers();
/**
* @return addresses for specified customer id
*/
List<Map<String, Object>> findAddressByCustomerId(
Long customerId);
/**
* @return accounts for specified customer id
*/
List<Map<String, Object>> findAccountByCustomerId(
Long customerId);
}

Code listing 1: Interface that abstracts the legacy system interactions

The interface defines three methods. The first one can retrieve a list of all customers, and the second and third ones retrieve a list of addresses and accounts for a specific customer. Each list contains zero or many maps. One map represents one object in the legacy system. The keys of this map are object property names (for example, addressLine1), and the values are the actual properties.

We’ve chosen a map because it is a generic data type that can store almost any data, which is ideal for a data transformation task. However, it has a slight disadvantage in that the rules will be a bit harder to write.
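To make the map representation concrete, the following sketch builds one legacy address record by hand. The class name and the sample values are hypothetical; the keys addressLine1, customer_id, and country are the ones that appear in this article.

```java
import java.util.HashMap;
import java.util.Map;

class LegacyRecordExample {
    /** Builds a map shaped like one legacy address row (sample values only). */
    static Map<String, Object> sampleAddress() {
        Map<String, Object> address = new HashMap<String, Object>();
        address.put("customer_id", Long.valueOf(111L));
        address.put("addressLine1", "Barrack Street");
        address.put("country", "U.S.A");
        return address;
    }

    public static void main(String[] args) {
        Map<String, Object> address = sampleAddress();
        // Values come back as Object, so the caller has to cast. This is
        // the price of using a generic map instead of a typed class.
        String line1 = (String) address.get("addressLine1");
        System.out.println(line1);
    }
}
```

The cast in the last step is the same kind of extra work that the rules have to do when they read properties from a map.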

The implementation of this interface will be defined at the end of this article.

Loading facts into the rule session

Before writing any transformation rules, the data needs to be loaded into the rule session. This can be done by writing a specialized rule just for this purpose, as follows:

package droolsbook.transform;
import java.util.*;
import droolsbook.transform.service.LegacyBankService;
import droolsbook.bank.model.Address;
import droolsbook.bank.model.Address.Country;
global LegacyBankService legacyService;
rule findAllCustomers
when
$customerMap : Map( )
from legacyService.findAllCustomers()
then
$customerMap.put("_type_", "Customer");
insert( $customerMap );
end

Code listing 2: Rule that loads all Customers into the rule session (the dataTransformation.drl file).

The preceding findAllCustomers rule matches on a Map instance that is obtained from our legacyService. In the consequence part, it adds the type (so that we can recognize that this map represents a customer) and inserts this map into the session.

There are a few things to be noted here, as follows:

  • A rule is being used to insert objects into the rule session; this just shows a different way of loading objects into the rule session.
  • Every customer returned from the findAllCustomers method is being inserted into the session. This is reasonable only if there is a small number of customers. If that is not the case, we can paginate, that is, process only N customers at once, then start over with the next N customers, and so on. Alternatively, the findAllCustomers rule can be removed and customers can be inserted into the rule session at session-creation time. We’ll now focus on this latter approach (that is, only one Customer instance is in the rule session at any given time); it will make the reporting easier.
  • A type of the map is being added to the map. This is a disadvantage of using a Map object for every type (Customer, Address, and so on): the type information is lost. It can be seen in the following rule that finds addresses for a customer:

    rule findAddress
    dialect "mvel"
    when
    $customerMap : Map( this["_type_"] == "Customer" )
    $addressMap : Map( )
    from legacyService.findAddressByCustomerId(
    (Long) $customerMap["customer_id"] )
    then
    $addressMap.put("_type_", "Address");
    insert( $addressMap )
    end

Code listing 3: Rule that loads all Addresses for a Customer into the rule session (the dataTransformation.drl file)

Let’s focus on the first condition line. It matches on $customerMap. It has to test whether this Map object contains a customer’s data by evaluating this["_type_"] == "Customer". To avoid doing these type checks in every condition, a new map type can be extended from HashMap, for example LegacyCustomerHashMap. The condition might then look like the following line of code:

$customerMap : LegacyCustomerHashMap( )

The preceding line of code performs matching on the customer map without doing the type check.
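The article names LegacyCustomerHashMap but does not show it. A minimal sketch of such a marker subclass, assuming it only needs to copy the legacy properties, could look like this (the constructor shape is an assumption):

```java
import java.util.HashMap;
import java.util.Map;

/** Marker subclass: the Java type itself says "this map holds a customer". */
class LegacyCustomerHashMap extends HashMap<String, Object> {
    private static final long serialVersionUID = 1L;

    LegacyCustomerHashMap(Map<String, Object> legacyRow) {
        super(legacyRow); // copy the legacy properties unchanged
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<String, Object>();
        row.put("customer_id", Long.valueOf(111L));
        Object fact = new LegacyCustomerHashMap(row);
        // A type test replaces the "_type_" lookup.
        System.out.println(fact instanceof LegacyCustomerHashMap);
    }
}
```

Note that equals and hashCode are still inherited from HashMap, so two such maps with the same entries remain equal to each other and to a plain HashMap with those entries.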

We’ll continue with the second part of the condition. It matches on addressMap that comes from our legacyService as well. The from keyword supports parameterized service calls. customer_id is passed to the findAddressByCustomerId method. Another nice thing about this is that we don’t have to cast the parameter to java.lang.Long; it is done automatically.

The consequence part of this rule just sets the type and inserts addressMap into the knowledge session. Please note that only addresses for loaded customers are loaded into the session. This saves memory but it could also cause a lot of “chattiness” with the LegacyBankService interface if there are many child objects. It can be fixed by pre-loading those objects. The implementation of this interface is the right place for this.
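One way to pre-load child objects is to read all addresses once and index them by customer. The sketch below is only an illustration of that idea: the class name is hypothetical, and it assumes every address row carries a Long value under the customer_id key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PreloadedAddressLookup {
    private final Map<Long, List<Map<String, Object>>> byCustomerId =
            new HashMap<Long, List<Map<String, Object>>>();

    /** Indexes all address rows once, keyed by their "customer_id" value. */
    PreloadedAddressLookup(List<Map<String, Object>> allAddresses) {
        for (Map<String, Object> address : allAddresses) {
            Long customerId = (Long) address.get("customer_id");
            List<Map<String, Object>> bucket = byCustomerId.get(customerId);
            if (bucket == null) {
                bucket = new ArrayList<Map<String, Object>>();
                byCustomerId.put(customerId, bucket);
            }
            bucket.add(address);
        }
    }

    /** Answers from memory without another legacy call; never returns null. */
    List<Map<String, Object>> findAddressByCustomerId(Long customerId) {
        List<Map<String, Object>> found = byCustomerId.get(customerId);
        return found == null
                ? Collections.<Map<String, Object>>emptyList()
                : found;
    }

    public static void main(String[] args) {
        Map<String, Object> address = new HashMap<String, Object>();
        address.put("customer_id", Long.valueOf(111L));
        List<Map<String, Object>> all = new ArrayList<Map<String, Object>>();
        all.add(address);
        PreloadedAddressLookup lookup = new PreloadedAddressLookup(all);
        System.out.println(lookup.findAddressByCustomerId(Long.valueOf(111L)).size());
    }
}
```

A LegacyBankService implementation could delegate its findAddressByCustomerId method to a lookup like this, trading memory for fewer round trips to the legacy system.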

Similar data-loading rules can be written for other types, for example Account.

Writing transformation rules

Now that all objects are in the knowledge session, we can start writing some transformation rules. Let’s imagine that in the legacy system there are many duplicate addresses. We can write a rule that removes such duplication:

rule twoEqualAddressesDifferentInstance
when
$addressMap1 : Map( this["_type_"] == "Address" )
$addressMap2 : Map( this["_type_"] == "Address",
eval( $addressMap1 != $addressMap2 ),
this == $addressMap1 )
then
retract( $addressMap2 );
validationReport.addMessage(
reportFactory.createMessage(Message.Type.WARNING,
kcontext.getRule().getName(), $addressMap2));
end

Code listing 4: Rule that removes redundant addresses (the dataTransformation.drl file)

The rule matches two addresses. It checks that they are not the same object instance by evaluating eval( $addressMap1 != $addressMap2 ). Otherwise, the rule could match on a single address instance. The next part, this == $addressMap1, translates behind the scenes to $addressMap1.equals($addressMap2). If this equals check is true, one of the addresses is redundant and can be removed from the session. The address map that is removed is added to the report as a warning message.
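The difference between the identity check and the equality check can be seen in plain Java. The following sketch (the class and method names are hypothetical) builds two separate maps with the same entries:

```java
import java.util.HashMap;
import java.util.Map;

class MapEqualityDemo {
    /** Builds a fresh address map for the given street. */
    static Map<String, Object> address(String street) {
        Map<String, Object> map = new HashMap<String, Object>();
        map.put("_type_", "Address");
        map.put("street", street);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Object> first = address("Barrack Street");
        Map<String, Object> second = address("Barrack Street");
        System.out.println(first == second);      // false: two distinct instances
        System.out.println(first.equals(second)); // true: same entries
    }
}
```

This is exactly the situation the rule looks for: two facts that are different instances but compare as equal.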

Testing

Before we continue with the rest of the rules, we’ll set up unit tests. We’ll still use a stateless session:

session = knowledgeBase.newStatelessKnowledgeSession();
session.setGlobal("legacyService",
new MockLegacyBankService());

Code listing 5: Section of the test setUpClass method (the DataTransformationTest file)

The legacyService global is set to a new instance of MockLegacyBankService. It is a dummy implementation that simply returns null from all methods. In most tests we’ll insert objects directly into the knowledge session (and not through legacyService ).

We’ll now write a helper method for inserting objects into the knowledge session and running the rules. The helper method will create a list of commands, execute them, and return the resulting ExecutionResults object. The following command instances will be created:

  • One for setting the global variable – validationReport; a new validation report will be created.
  • One for inserting all objects into the session.
  • One for firing only rules with a specified name. This will be done through an AgendaFilter. It will help us isolate a rule that we’ll be testing.

    org.drools.runtime.rule.AgendaFilter

    When a rule is activated the AgendaFilter determines if this rule can be fired or not. The AgendaFilter interface has one accept method that returns true/false. We’ll create our own RuleNameEqualsAgendaFilter that fires only rules with a specific name.

  • One command for getting back all objects in a knowledge session that are of a certain type – filterType method parameter. These objects will be returned from the helper method as part of the results object under a given key – filterOut method parameter.

The following is the helper method:

/**
* creates multiple commands, calls session.execute and
* returns results back
*/
protected ExecutionResults execute(Collection<?> objects,
String ruleName, final String filterType,
String filterOut) {
ValidationReport validationReport = reportFactory
.createValidationReport();
List<Command<?>> commands = new ArrayList<Command<?>>();
commands.add(CommandFactory.newSetGlobal(
"validationReport", validationReport, true));
commands.add(CommandFactory.newInsertElements(objects));
commands.add(new FireAllRulesCommand(
new RuleNameEqualsAgendaFilter(ruleName)));
if (filterType != null && filterOut != null) {
GetObjectsCommand getObjectsCommand =
new GetObjectsCommand( new ObjectFilter() {
public boolean accept(Object object) {
return object instanceof Map
// filterType is non-null here, so this is safe for maps without "_type_"
&& filterType.equals(((Map) object).get("_type_"));
}
});
getObjectsCommand.setOutIdentifier(filterOut);
commands.add(getObjectsCommand);
}
ExecutionResults results = session
.execute(CommandFactory.newBatchExecution(commands));
return results;
}

Code listing 6: Test helper method for executing the transformation rules (the DataTransformationTest file).
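The RuleNameEqualsAgendaFilter used in the helper method is described above but not listed. The following is a minimal sketch of the idea. To keep it runnable on its own, it declares small stand-in interfaces (Rule, Activation, AgendaFilter) that are assumptions for illustration only; real code would implement org.drools.runtime.rule.AgendaFilter instead.

```java
class RuleNameFilterSketch {
    // Stand-ins for the Drools types, reduced to what the filter needs.
    // They are assumptions made for this sketch, not the Drools API itself.
    interface Rule { String getName(); }
    interface Activation { Rule getRule(); }
    interface AgendaFilter { boolean accept(Activation activation); }

    /** Lets an activation through only if its rule has the expected name. */
    static class RuleNameEqualsAgendaFilter implements AgendaFilter {
        private final String ruleName;

        RuleNameEqualsAgendaFilter(String ruleName) {
            this.ruleName = ruleName;
        }

        public boolean accept(Activation activation) {
            return activation.getRule().getName().equals(ruleName);
        }
    }

    /** Builds a fake activation for a rule with the given name. */
    static Activation activationOf(final String name) {
        return new Activation() {
            public Rule getRule() {
                return new Rule() {
                    public String getName() { return name; }
                };
            }
        };
    }

    public static void main(String[] args) {
        AgendaFilter filter = new RuleNameEqualsAgendaFilter("findAddress");
        System.out.println(filter.accept(activationOf("findAddress")));
        System.out.println(filter.accept(activationOf("unknownCountry")));
    }
}
```

Because the filter only rejects activations, the other rules are still evaluated; they are simply not allowed to fire during the test.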

To write a test for the redundant address rule, two address maps will be created. Both will have their street set to “Barrack Street”. After we execute rules, only one address map should be in the rule session. The test looks as follows:

@Test
public void twoEqualAddressesDifferentInstance()
throws Exception {
Map addressMap1 = new HashMap();
addressMap1.put("_type_", "Address");
addressMap1.put("street", "Barrack Street");
Map addressMap2 = new HashMap();
addressMap2.put("_type_", "Address");
addressMap2.put("street", "Barrack Street");
assertEquals(addressMap1, addressMap2);
ExecutionResults results = execute(Arrays.asList(
addressMap1, addressMap2),
"twoEqualAddressesDifferentInstance", "Address",
"addresses");
Iterator<?> addressIterator = ((List<?>) results
.getValue("addresses")).iterator();
Map addressMapWinner = (Map) addressIterator.next();
assertEquals(addressMap1, addressMapWinner);
assertFalse(addressIterator.hasNext());
reportContextContains(results,
"twoEqualAddressesDifferentInstance",
addressMapWinner == addressMap1 ? addressMap2
: addressMap1);
}

Code listing 7: Test for the redundant address rule

The execute method is called with the two address maps, the agenda filter rule name is set to twoEqualAddressesDifferentInstance (only this rule will be allowed to fire), and after the rules are executed all maps of the Address type are returned as part of the result. We can access them by calling results.getValue("addresses"). The test verifies that there is only one such map.

Another test helper method, reportContextContains, verifies that the validationReport contains the expected data. Its implementation is shown as follows:

/**
* asserts that the report contains one message with
* expected context (input parameter)
*/
void reportContextContains(ExecutionResults results,
String messageKey, Object object) {
ValidationReport validationReport = (ValidationReport)
results.getValue("validationReport");
assertEquals(1, validationReport.getMessages().size());
Message message = validationReport.getMessages()
.iterator().next();
List<Object> messageContext = message.getContextOrdered();
assertEquals(1, messageContext.size());
assertSame(object, messageContext.iterator().next());
}

Code listing 8: Helper method that verifies that the report contains the supplied object

Address normalization

Our next rule will be a type conversion rule. It will take a String representation of a country and convert it into the Address.Country enum. We’ll start with a test:

@Test
public void addressNormalizationUSA() throws Exception {
Map addressMap = new HashMap();
addressMap.put("_type_", "Address");
addressMap.put("country", "U.S.A");
execute(Arrays.asList(addressMap),
"addressNormalizationUSA", null, null);
assertEquals(Address.Country.USA, addressMap
.get("country"));
}

Code listing 9: Test for the country type conversion rule

The test creates an address map with country set to "U.S.A". It then calls the execute method, passing in the addressMap and allowing only the addressNormalizationUSA rule to fire (no object filter is used in this case, so filterType and filterOut are null). Finally, the test verifies that the address map has the correct country value. Next, we’ll write the rule:

rule addressNormalizationUSA
dialect "mvel"
when
$addressMap : Map( this["_type_"] == "Address",
this["country"] in ("US", "U.S.", "USA", "U.S.A"))
then
modify( $addressMap ) {
put("country", Country.USA)
}
end

Code listing 10: Rule that converts the String representation of a country into the enum representation (the dataTransformation.drl file)

The rule matches an address map. The in operator is used to capture the various country representations. The rule’s consequence is interesting in this case. Instead of simply calling update( $addressMap ), it uses the modify construct. modify takes an argument and a block of code. Before executing the block, it retracts the argument from the rule session; it then executes the block, and finally it inserts the argument back into the session. This has to be done because the argument’s identity is modified. If we look at the HashMap implementation of the equals and hashCode methods, they take into account every element in the map. By calling $addressMap.put("country", Country.USA), we change the address map’s identity.

Fact’s identity

As a general rule, do not change an object’s identity while it is in the knowledge session; otherwise the rule engine behavior will be undefined (the same as changing an object while it is used as a key in a java.util.HashMap).
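The HashMap analogy can be reproduced directly with a HashSet, which stores its elements by hash code. The sketch below (hypothetical class and method names) changes a map in place and then tries to find it again; the second method follows the retract, change, insert order described for modify.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class MutatedFactDemo {
    /** Changes the map while it sits in the set, then looks it up. */
    static boolean foundAfterInPlaceChange() {
        Map<String, Object> address = new HashMap<String, Object>();
        address.put("country", "U.S.A");
        Set<Map<String, Object>> facts = new HashSet<Map<String, Object>>();
        facts.add(address);            // stored under the hash code computed now
        address.put("country", "USA"); // hash code of the map changes
        return facts.contains(address);
    }

    /** Removes the map first, changes it, and adds it back. */
    static boolean foundAfterRemoveChangeAdd() {
        Map<String, Object> address = new HashMap<String, Object>();
        address.put("country", "U.S.A");
        Set<Map<String, Object>> facts = new HashSet<Map<String, Object>>();
        facts.add(address);
        facts.remove(address);         // "retract" while the old hash is valid
        address.put("country", "USA");
        facts.add(address);            // "insert" under the new hash code
        return facts.contains(address);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterInPlaceChange());   // false
        System.out.println(foundAfterRemoveChangeAdd()); // true
    }
}
```

The first lookup fails even though the very same instance is still in the set, which is the kind of inconsistency the warning above is about.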

Testing the findAddress rule

Before continuing, let’s write a test for the findAddress rule from Code listing 3 in the Loading facts into the rule session section. The test will use a special LegacyBankService mock implementation that returns the provided addressMap.

public class StaticMockLegacyBankService extends
MockLegacyBankService {
private Map addressMap;
public StaticMockLegacyBankService(Map addressMap) {
this.addressMap = addressMap;
}
public List findAddressByCustomerId(Long customerId) {
return Arrays.asList(addressMap);
}
}

Code listing 11: StaticMockLegacyBankService which returns provided addressMap

StaticMockLegacyBankService extends MockLegacyBankService and overrides the findAddressByCustomerId method. The findAddress test looks as follows:

@Test
public void findAddress() throws Exception {
final Map customerMap = new HashMap();
customerMap.put("_type_", "Customer");
customerMap.put("customer_id", Long.valueOf(111L));
final Map addressMap = new HashMap();
LegacyBankService service =
new StaticMockLegacyBankService(addressMap);
session.setGlobal("legacyService", service);
ExecutionResults results = execute(Arrays
.asList(customerMap), "findAddress", "Address",
"objects");
assertEquals("Address", addressMap.get("_type_"));
Iterator<?> addressIterator = ((List<?>) results
.getValue("objects")).iterator();
assertEquals(addressMap, addressIterator.next());
assertFalse(addressIterator.hasNext());
// clean-up
session.setGlobal("legacyService",
new MockLegacyBankService());
}

Code listing 12: Test for the findAddress rule

The test then verifies that the address map is really in the knowledge session. It also verifies that it has the “_type_” key set and that there is no other address map.

Unknown country

The next rule will create an error message if the country isn’t recognizable by our domain model. The test creates an address map with some unknown country, executes rules, and verifies that the report contains an error.

@Test
public void unknownCountry() throws Exception {
Map addressMap = new HashMap();
addressMap.put("_type_", "Address");
addressMap.put("country", "no country");

ExecutionResults results = execute(Arrays
.asList(addressMap), "unknownCountry", null, null);
ValidationReport report = (ValidationReport) results
.getValue("validationReport");
reportContextContains(results, "unknownCountry",
addressMap);
}

Code listing 13: Test for the unknownCountry rule

The rule implementation will test if the country value from the addressMap is of the Address.Country type. If it isn’t, an error is added to the report.

rule unknownCountry
salience -10 //should fire after address normalizations
when
$addressMap : Map( this["_type_"] == "Address",
!($addressMap.get("country") instanceof
Address.Country))
then
validationReport.addMessage(
reportFactory.createMessage(Message.Type.ERROR,
kcontext.getRule().getName(), $addressMap));
end

Code listing 14: Rule that reports unknown countries (the dataTransformation.drl file)

The type checking is done with MVEL’s instanceof operator. Note that this rule needs to be executed after all address normalization rules, otherwise we could get an incorrect error message.
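Taken together, the addressNormalizationUSA and unknownCountry rules behave roughly like the two steps in the following plain Java sketch. The class name is hypothetical, and the nested Country enum is only a stand-in for Address.Country with the single constant used in this article.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class CountryNormalizationSketch {
    // Stand-in for Address.Country; an assumption for this sketch.
    enum Country { USA }

    private static final Set<String> USA_SPELLINGS = new HashSet<String>(
            Arrays.asList("US", "U.S.", "USA", "U.S.A"));

    /** Step 1: replace a known spelling with the enum constant. */
    static void normalize(Map<String, Object> address) {
        if (USA_SPELLINGS.contains(address.get("country"))) {
            address.put("country", Country.USA);
        }
    }

    /** Step 2: anything that is still not a Country is unknown. */
    static boolean isUnknownCountry(Map<String, Object> address) {
        return !(address.get("country") instanceof Country);
    }

    public static void main(String[] args) {
        Map<String, Object> address = new HashMap<String, Object>();
        address.put("country", "U.S.A");
        normalize(address);
        System.out.println(isUnknownCountry(address));
    }
}
```

The order of the two steps matters for the same reason that the unknownCountry rule has a lower salience: checking before normalizing would report "U.S.A" as unknown.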

Currency conversion

As a given requirement, the data transformation process should convert all accounts to EUR currency. The test for this rule might look like the following code snippet:

@Test
public void currencyConversionToEUR() throws Exception {
Map accountMap = new HashMap();
accountMap.put("_type_", "Account");
accountMap.put("currency", "USD");
accountMap.put("balance", "1000");
execute(Arrays.asList(accountMap),
"currencyConversionToEUR", null, null);
assertEquals("EUR", accountMap.get("currency"));
assertEquals(new BigDecimal("780.000"), accountMap
.get("balance"));
}

Code listing 15: Test for the EUR conversion rule

At the end of the code snippet, the test verifies that the currency and balance are correct. An exchange rate of 0.780 is used. The rule implementation is as follows:

rule currencyConversionToEUR
when
$accountMap : Map( this["_type_"] == "Account",
this["currency"] != null && != "EUR" )
$conversionAmount : String() from
getConversionToEurFrom(
(String)$accountMap["currency"])
then
modify($accountMap) {
put("currency", "EUR"),
put("balance", new BigDecimal(
$conversionAmount).multiply(new BigDecimal(
(String)$accountMap.get("balance"))))
}
end

Code listing 16: Rule that converts account balance and currency to EUR (the dataTransformation.drl file)

The rule uses the default ‘java’ dialect. It matches on an account map and retrieves the conversion amount using the from conditional element. In this case it is a simple function that returns hardcoded values. However, it can be easily replaced with a service method that could, for example, call some web service in a real bank.

function String getConversionToEurFrom(String currencyFrom) {
String conversion = null;
if ("USD".equals(currencyFrom)) {
conversion = "0.780";
} else if ("SKK".equals(currencyFrom)) {
conversion = "0.033";
}
return conversion;
}

Code listing 17: Dummy function for calculating the exchange rate (the dataTransformation.drl file)

Notice how we’re calling the function. Instead of calling it directly in the consequence, it is called from a condition. This way our rule will fire only if the function returns some non-null result.

The rule then sets the currency to EUR and multiplies the balance by the exchange rate. This rule doesn’t cover currencies for which the getConversionToEurFrom function returns null. We have to write another rule that will report unknown currencies.

rule unknownCurrency
when
$accountMap : Map( this["_type_"] == "Account",
this["currency"] != null && != "EUR" )
not( String() from
getConversionToEurFrom(
(String)$accountMap["currency"]) )
then
validationReport.addMessage(
reportFactory.createMessage(Message.Type.ERROR,
kcontext.getRule().getName(), $accountMap));
end

Code listing 18: Rule that adds an error message to the report if there is no conversion for a currency (the dataTransformation.drl file)

Note that in this case the getConversionToEurFrom function is called from within the not construct.
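The expected value of 780.000 in the currencyConversionToEUR test, with its three decimal places, follows from how BigDecimal multiplication works: the scale of the product is the sum of the scales of the operands. The following sketch (a hypothetical helper class) repeats the same arithmetic outside the rule engine:

```java
import java.math.BigDecimal;

class EurConversionDemo {
    /** Multiplies a String balance by a String rate, as the rule does. */
    static BigDecimal convert(String balance, String rate) {
        return new BigDecimal(rate).multiply(new BigDecimal(balance));
    }

    public static void main(String[] args) {
        BigDecimal result = convert("1000", "0.780");
        System.out.println(result);                                      // 780.000
        System.out.println(result.equals(new BigDecimal("780.000")));    // true
        System.out.println(result.equals(new BigDecimal("780")));        // false: scale differs
        System.out.println(result.compareTo(new BigDecimal("780")) == 0); // true
    }
}
```

Since assertEquals relies on equals, the test has to compare against 780.000 rather than 780; a test that should ignore the scale would use compareTo instead.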

One account allowed

Imagine that we have a business requirement that only one account from the legacy system can be imported into the new system. Our next rule will remove redundant accounts while aggregating their balances.

The test inserts two accounts of the same customer into the rule session and verifies that one of them was removed and the balance has been transferred.

@Test
public void reduceLegacyAccounts() throws Exception {
Map accountMap1 = new HashMap();
accountMap1.put("_type_", "Account");
accountMap1.put("customer_id", "00123");
accountMap1.put("balance", new BigDecimal("100.00"));
Map accountMap2 = new HashMap();
accountMap2.put("_type_", "Account");
accountMap2.put("customer_id", "00123");
accountMap2.put("balance", new BigDecimal("300.00"));
ExecutionResults results = execute(Arrays.asList(
accountMap1, accountMap2), "reduceLegacyAccounts",
"Account", "accounts");
Iterator<?> accountIterator = ((List<?>) results
.getValue("accounts")).iterator();
Map accountMap = (Map) accountIterator.next();
assertEquals(new BigDecimal("400.00"), accountMap
.get("balance"));
assertFalse(accountIterator.hasNext());
}

Code listing 19: Test for the reduceLegacyAccounts rule

Before we can write this rule we have to ensure that the Account instance’s balance is of the BigDecimal type. This is partially (non-EUR accounts) done by the currency conversion rules. For the EUR accounts a new rule can be written that simply converts the type to BigDecimal (we can even update the unknownCurrency rule to handle this situation).

rule reduceLegacyAccounts
when
$accountMap1 : Map( this["_type_"] == "Account" )
$accountMap2 : Map( this["_type_"] == "Account",
eval( $accountMap1 != $accountMap2 ),
this["customer_id"] == $accountMap1["customer_id"],
this["currency"] == $accountMap1["currency"])
then
modify($accountMap1) {
put("balance", (
(BigDecimal)$accountMap1.get("balance")).add(
(BigDecimal)$accountMap2.get("balance")))
}
retract( $accountMap2 );
end

Code listing 20: Rule that removes redundant accounts and accumulates their balances (the dataTransformation.drl file)

The rule matches on two accountMap instances; it ensures that they represent two different instances (eval( $accountMap1 != $accountMap2 ); note that eval is important here), that both belong to the same customer (this["customer_id"] == $accountMap1["customer_id"]), and that both have the same currency (this["currency"] == $accountMap1["currency"]). The consequence sums up the two balances and retracts the second accountMap.
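The end result of the rule, one total per customer and currency, can be checked against a plain Java computation. The sketch below is not how the rule engine works internally (the rule merges accounts pairwise); it only computes the same totals. The class and method names are hypothetical, and balances are assumed to be BigDecimal values already.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AccountReductionSketch {
    /** Builds an account map with the keys used by the rule. */
    static Map<String, Object> account(String customerId, String currency,
            String balance) {
        Map<String, Object> map = new HashMap<String, Object>();
        map.put("_type_", "Account");
        map.put("customer_id", customerId);
        map.put("currency", currency);
        map.put("balance", new BigDecimal(balance));
        return map;
    }

    /** Sums balances per "customer_id/currency" key, in first-seen order. */
    static Map<String, BigDecimal> reduce(List<Map<String, Object>> accounts) {
        Map<String, BigDecimal> totals = new LinkedHashMap<String, BigDecimal>();
        for (Map<String, Object> account : accounts) {
            String key = account.get("customer_id") + "/" + account.get("currency");
            BigDecimal balance = (BigDecimal) account.get("balance");
            BigDecimal current = totals.get(key);
            totals.put(key, current == null ? balance : current.add(balance));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> accounts = Arrays.asList(
                account("00123", "EUR", "100.00"),
                account("00123", "EUR", "300.00"));
        System.out.println(reduce(accounts)); // {00123/EUR=400.00}
    }
}
```

The cast to BigDecimal in reduce fails for String balances, which is the same reason the rule requires the balance type conversion to have happened first.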

Note that the rule should fire after all currency conversion rules. This creates dependencies between rules. In this case it is tolerable, as only a few rules are involved. However, with more complex dependencies we’ll have to introduce a ruleflow.
