Measuring SOA Complexity

7 min read

In this article, we play the number game once more. Let’s build a formula to calculate the complexity of processes. The purpose is not to teach how to design good processes (the earlier chapters tackled that), but to ‘score’ them on their control-flow complexity and flag those that exceed a particular threshold. Processes with lower scores are more readable, maintainable, and testable than those with higher scores. Processes with high scores need rework!

Most SOA developers, today, struggle with process complexity. They write impeccable, highly-structured Java or C# code, but their processes tend to branch out and meander in every possible direction. They would never dream of embedding an if-then inside a for loop inside a while loop inside of a catch inside a do-while in Java, but they are quick to bury, deep in an SOA process, a pick inside a sequence inside a flow inside of switch inside a scope. Their processes are often far too complex.

On the other hand, when they review each others’ processes, they know intuitively what ‘too complex’ means. Process (b) in the following figure, for example, appears much more complex than Process (c), because it has far too many arrows and is difficult to navigate. Process (c) looks tidier and more structured by comparison. Process (a) is harder to judge. Although it is well-structured and easy to navigate, it is also absurdly long; having so many steps in a sequence is poor design.

In this article, we quantify ‘complex’. We build a formula that rules in favor of Process (c), and penalizes (a) for its excessive sequence and (b) for its nesting and excessive branching. In addition, we demonstrate that processes designed in ‘flat form’ (introduced in Chapter 6) score lower than ‘naïve’ processes.

Applying McCabe’s Formula for BPEL and TIBCO BusinessWorks

In this section, we study McCabe’s formula for complexity, and describe how it can be used to measure the complexity of processes in BPEL and BusinessWorks.

Calculating McCabe Complexity

The best-known measure of programmatic complexity is Thomas McCabe’s cyclomatic complexity. In his landmark paper, published in 1976 (A Complexity Measure, IEEE Transactions on Software Engineering, v. SE-2, no. 4, pp. 308—320, http://www.literateprogramming.com/mccabe.pdf), McCabe shows how to score the complexity of computer programs (FORTRAN is McCabe’s preferred language) using concepts from graph theory. McCabe’s approach is twofold:

He shows how to represent a computer program as a directed graph, with nodes representing steps and arrows representing the flow of control between these steps. The program must have exactly one entry point, from which all nodes are reachable, and exactly one exit point, which is reachable from all nodes. If the program has several modules or procedures, each is represented as a separate directed graph.
The complexity of this graph, and thus the complexity of the computer program, is E – N + 2P, where E is the number of edges, N the number of nodes, and P the number of modules. Assuming the program is a single unit with no separate modules, its complexity is E ‑ N + 2. Alternatively, the complexity of the program is A + 1, where A is the number of decisions or alternate paths in the graph; if a node branches in D directions, it has 1 normal path and D-1 alternate paths. This second method is easier to calculate: start with 1, then for each decision node add the number of outgoing branches less one. For example, add 1 for each binary decision, 2 for each ternary decision, and so on.

Process (a) in the figure above has 16 edges and 17 nodes, so its complexity is 16 – 17 + 2, or 1. Alternatively, it has no decisions, so its complexity is 0 + 1, or 1. Process (b) has 23 edges and 10 nodes, and thus, has a complexity of 23 – 10 + 2, or 15. Alternatively, in process (b), node D3 has one alternate path (that is, it is a binary decision), D1, D4, and D5 each have two alternative paths, D6 has three alternate paths, and D2 has four alternate paths; the total number of alternate paths is thus 1 + 2 + 2 + 2 + 3 + 4, or 14. So the complexity of Process (b) is 14 + 1, or 15. Process (c) has 19 edges and 14 nodes, and thus a complexity of 19 – 14 + 2, or 7; alternatively, it has six alternate paths—five for D1, one for D2—so its complexity is 6 + 1, or 7.

McCabe advocates the use of his cyclomatic measure on actual software projects. The team should agree on an acceptable upper bound for complexity (say 20), score each program, and decide how to rework those that score too high.

McCabe’s measure is not perfect. For one thing, it does not penalize processes for having too many nodes. The rather preposterous sequence of 17 activities in Process (a) in the preceding figure has a perfect McCabe score of 1. A process consisting of a million consecutive activities, or even one with as many activities as there are particles in the universe, would also score 1, and would pass review!

Secondly, McCabe does not penalize for nested branching. The two processes shown in the following figure have the same complexity, although the process on the bottom is intuitively more complex than that on the top. Each process scores 7, because each contains three ternary (or 3-way) decisions: D1, D2, and D3. In the top process, those decisions come consecutively, whereas in the bottom, they are nested three levels deep.

McCabe Complexity for BPEL

Despite its flaws, McCabe’s formula forms a part of our scoring mechanism for SOA processes; it is a component of a larger measure we develop in the last section of this article. As a pre-requisite for that discussion, we now consider how to apply the McCabe score to processes developed in BPEL and in TIBCO’s BusinessWorks.

BPEL processes are mapped to McCabe’s directed graph form as follows:

A BPEL sequence maps easily to a line of nodes, such as fragment shown in the next figure, where the BPEL sequence of activities A, B, and C is drawn as three nodes—A, B, and C—with arrows from A to B and from B to C. A sequence does not add to the complexity of the process.
A BPEL pick, flow, or switch is represented as an N-ary decision in the following figure. The beginning and end points of the structure are designated by the nodes labeled Split and Join. For a pick, the branches are onMessage or onAlarm handlers. For a switch, the branches are case or otherwise structures. For a flow, the branches are the activities to be run in parallel (A, B, and C in the figure). (If the flow has inter-activity links, the links are represented as edges.) A pick, switch, or flow adds N-1 to the overall complexity.
A BPEL while activity is represented as a GOTO-style loop. The loop begins with a conditional check called Test (as shown in the following figure) and then branches either to A (to perform the while activity if the condition is true), or out of the loop to End. When A completes, it loops back to Test for another iteration. As it contains one binary decision, the while structure adds 1 to the complexity.
Error handling is dicier. A single unnested scope with a single error handler and N basic activities (at any level within the scope) adds as much as N to the complexity, assuming each of those basic activities might encounter the error that the handler is meant to catch. The reason for this is that each basic activity must have a binary choice whether to continue down the happy path or route directly to the handler. The scope shown in process (a) in the following figure, contains activities A, B, and C in a sequence (Sc denotes the start of the scope, End its end), but each of these activities has an error path to the handler Error, thus adding 3 to the complexity. In cases with nested scopes and multiple handlers, things get more complicated. An example of this is shown in Process (b). The outer scope, bounded by Sc1 and End1, has two handlers: E11, which is accessible from activity A; and E12, accessible from E. The inner scope, bounded by Sc2 and End2, has its own two handlers, E21 and E22, both of which are accessible from the activities B and C. The E21 handler, in turn, can throw an exception to the outer handler E11. The complexity introduced by all of this is 7: one for the decision at A, one for the decision at E, one for the decision at E21, and two each for the decisions at B and C.