
In this article, Debasish Ray Chawdhuri, author of the book Java 9 Data Structures and Algorithms, takes a deeper look into the following ideas:

  • Measuring the performance of an algorithm
  • Asymptotic complexity
  • Why asymptotic complexity matters
  • Why an explicit study of algorithms is important


The performance of an algorithm

No one wants to wait forever to get something done. Making a program run faster surely is important, but how do we know whether a program runs fast? The first logical step would be to measure how many seconds the program takes to run. Suppose we have a program that, given three numbers, a, b, and c, determines the remainder when a raised to the power b is divided by c.

For example, say a = 2, b = 10, and c = 7. Then a raised to the power b is $2^{10} = 1024$, and 1024 % 7 = 2. So, given these values, the program needs to output 2. The following code snippet shows a simple and obvious way of achieving this:

public static long computeRemainder(long base, long power, long divisor){ 
  long baseRaisedToPower = 1;
  for(long i=1;i<=power;i++){ 
    baseRaisedToPower *= base;
  }
  return baseRaisedToPower % divisor;
}

We can now estimate the time it takes by running the program a billion times and checking how long it took to run it, as shown in the following code:

public static void main(String [] args){
  long startTime = System.currentTimeMillis();
  for(int i=0;i<1_000_000_000;i++){
    computeRemainder(2, 10, 7);
  }
  long endTime = System.currentTimeMillis();
  System.out.println(endTime - startTime);
}

On my computer, it takes 4,393 milliseconds. So the time taken per call is 4,393 divided by a billion, that is, about 4.4 nanoseconds. Looks like a very reasonable time to do any computation. But what happens if the input is different? What if I pass power = 1,000? Let’s check that out. Now it takes about 420,000 milliseconds to run a billion times, or about 420 nanoseconds per run. Clearly, the time taken to do this computation depends on the input, and that means any reasonable way to talk about the performance of a program needs to take into account the input to the program.

Okay, so we can say that the number of nanoseconds our program takes to run is approximately 0.42 × power.
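As a quick check of that estimate, the arithmetic below uses only the two timings quoted above (4,393 ms and 420,000 ms per billion calls). It is a back-of-the-envelope calculation, not a benchmark result:

```latex
\[
\frac{420{,}000\ \text{ms}}{10^{9}\ \text{calls}} = 420\ \text{ns per call},
\qquad
\frac{420\ \text{ns}}{1000} = 0.42\ \text{ns per unit of power}
\]
\[
\frac{4{,}393\ \text{ms}}{10^{9}\ \text{calls}} \approx 4.4\ \text{ns per call},
\qquad
\frac{4.4\ \text{ns}}{10} = 0.44\ \text{ns per unit of power}
\]
```

The two per-unit figures are close, which is what we would expect if the running time grows in proportion to the power.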

If you run the program with the input (2, 1000, 7), you will get an output of 0, which is not correct. The correct output is 2. So, what is going on here? The answer is that the maximum value that a long type variable can hold is one less than 2 raised to the power 63, or 9223372036854775807L. The value 2 raised to the power 1000 is, of course, much more than this, causing the value to overflow, which brings us to our next point: how much space does a program need in order to run?
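Before moving on to space, the overflow just described can be reproduced in isolation. The following is a minimal sketch; the class name OverflowDemo and the helper firstOverflowingPower are made up for this illustration. It relies on Math.multiplyExact, which throws an ArithmeticException instead of silently wrapping around:

```java
final class OverflowDemo {
    // Same loop as computeRemainder: the product silently wraps around on overflow.
    static long naiveRemainder(long base, long power, long divisor) {
        long result = 1;
        for (long i = 1; i <= power; i++) {
            result *= base;
        }
        return result % divisor;
    }

    // Returns the first exponent at which base^exponent no longer fits in a long,
    // or -1 if no overflow occurs up to maxPower.
    static long firstOverflowingPower(long base, long maxPower) {
        long result = 1;
        for (long i = 1; i <= maxPower; i++) {
            try {
                result = Math.multiplyExact(result, base); // throws instead of wrapping
            } catch (ArithmeticException overflow) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(naiveRemainder(2, 1000, 7));     // prints 0, the wrong answer
        System.out.println(firstOverflowingPower(2, 1000)); // prints 63
    }
}
```

With base 2, the product stops fitting in a long at the 63rd multiplication, and after 64 doublings the wrapped value is exactly 0, which is why the program prints 0 for a power of 1,000.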

In general, the memory space required to run a program can be measured in terms of the bytes required for the program to operate. Of course, it requires the space to store the input and the output, at the least. It may also need some additional space to run, which is called auxiliary space. Just like time, the space required to run a program would, in general, also depend on the input.

In the case of time, apart from the fact that the time depends on the input, it also depends on which computer you are running it on. The program that takes 4 seconds to run on my computer may take 40 seconds on a very old computer from the nineties and may run in 2 seconds on yours. However, the actual computer you run it on only changes the time by a constant multiplier. To avoid getting into too many details about the hardware the program is running on, instead of saying the program takes approximately 0.42 × power nanoseconds, we can say that the time taken is a constant times the power, or simply that it is proportional to the power.

Saying the computation time is proportional to the power actually makes it so non-specific to hardware, or even the language the program is written in, that we can estimate this relationship by just looking at the program and analyzing it. Of course, the running time is sort of proportional to the power because there is a loop that executes power number of times, except, of course, when the power is so small that the other one-time operations outside the loop actually start to matter.
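One way to see this hardware-independent relationship is to count loop iterations instead of measuring time. The sketch below is only an illustration; the class StepCount and its method are hypothetical names, and the loop mirrors the one in computeRemainder:

```java
final class StepCount {
    // Counts how many times the loop body of the naive algorithm runs.
    static long multiplications(long power) {
        long steps = 0;
        for (long i = 1; i <= power; i++) {
            steps++; // one multiplication per iteration in computeRemainder
        }
        return steps;
    }

    public static void main(String[] args) {
        for (long power : new long[] {10, 100, 1000}) {
            System.out.println(power + " -> " + multiplications(power) + " multiplications");
        }
    }
}
```

The count equals the power on any machine and in any language, which is exactly the part of the running time that the proportionality statement captures.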

Analysis of asymptotic complexity

We seem to have hit upon an idea, an abstract sense of the running time. Let’s spell it out. In an abstract way, we analyze the running time of and the space required by a program by using what is known as the asymptotic complexity.

We are only interested in what happens when the input is very large because it really does not matter how long it takes for a small input to be processed; it’s going to be small anyway. So, if we have $x^3 + x^2$, and if $x$ is very large, it’s almost the same as $x^3$. We also don’t want to consider constant factors of a function, as we have pointed out earlier, because they depend on the particular hardware we are running the program on and the particular language we have implemented it in. An algorithm implemented in Java may run a constant factor slower than the same algorithm written in C. Finally, we want to consider our measure as an upper bound. Why? Because the complexity can vary based on the particular input we have chosen and not just the size of the input, and in that case we are only interested in its worst-case performance. If we are going to boast about the performance of our awesome algorithm, we had better be covering its worst-case performance.
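To make the claim about $x^3 + x^2$ concrete, here is a short worked ratio. It is only an illustration of why the lower-order term stops mattering:

```latex
\[
\frac{x^3 + x^2}{x^3} = 1 + \frac{1}{x} \longrightarrow 1 \quad \text{as } x \to \infty
\]
```

For $x = 1000$, the ratio is already 1.001, so the $x^2$ term changes the value by only a tenth of a percent.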

Asymptotic upper bound of a function

Keeping in mind all the above concerns, we dive into how exactly we can define an asymptotic upper bound.

For a function f, we define the notation O, called big O, by the following rules:

  1. $f(x) = O(f(x))$.
     For example, $x^3 = O(x^3)$.
  2. If $f(x) = O(g(x))$, then $k f(x) = O(g(x))$ for any non-zero constant $k$.
     For example, $5x^3 = O(x^3)$, $2 \log x = O(\log x)$, and $-x^3 = O(x^3)$ (taking $k = -1$).
  3. If $f(x) = O(g(x))$ and $|h(x)| < |f(x)|$ for all sufficiently large $x$, then $f(x) + h(x) = O(g(x))$.
     For example, $5x^3 - 25x^2 + 1 = O(x^3)$ because, for sufficiently large $x$, $|-25x^2 + 1| = 25x^2 - 1$ is much less than $|5x^3| = 5x^3$. So, with $f(x) = 5x^3 = O(x^3)$ and $h(x) = -25x^2 + 1$, we get $f(x) + h(x) = 5x^3 - 25x^2 + 1 = O(x^3)$.
     We can prove by similar logic that $x^3 = O(5x^3 - 25x^2 + 1)$.
  4. If $f(x) = O(g(x))$ and $|h(x)| > |g(x)|$ for all sufficiently large $x$, then $f(x) = O(h(x))$.
     For example, $x^3 = O(x^4)$, because if $x$ is sufficiently large, $x^4 > x^3$.

Note that whenever there is an inequality on functions, we are only interested in what happens when x is large; we don’t bother about what happens for small x.

To summarize the above definition, you can drop constant multipliers (rule 2) and ignore lower-order terms (rule 3). You can also overestimate (rule 4). You can combine these in any way because the rules can be applied any number of times.
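As an illustration of applying the rules one after another, here is a worked chain for the function $3x^2 + 2x$, which is chosen only for this example:

```latex
\begin{align*}
x^2 &= O(x^2) && \text{(rule 1)} \\
3x^2 &= O(x^2) && \text{(rule 2, constant multiplier } k = 3\text{)} \\
3x^2 + 2x &= O(x^2) && \text{(rule 3, since } |2x| < |3x^2| \text{ for } x > 1\text{)} \\
3x^2 + 2x &= O(x^3) && \text{(rule 4, since } |x^3| > |x^2| \text{ for } x > 1\text{)}
\end{align*}
```

The last line is a valid but looser bound than the one before it; overestimating never makes a big O statement false, only less informative.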

We had to consider the absolute values of the function to cater to the case when values are negative, which never happens in running time, but we still have it for completeness.

There is something about the sign = that is not usual. Just because f(x) = O(g(x)), it does not mean, O(g(x)) = f(x). In fact, the last one does not even mean anything.

It is enough for all purposes to just know the preceding definition of the big O notation. You can read the following formal definition if you are interested. Otherwise you can skip the rest of this subsection.

The preceding idea can be summarized in a formal way. We say the expression $f(x) = O(g(x))$ means there exist positive constants $M$ and $x_0$ such that $|f(x)| < M|g(x)|$ whenever $x > x_0$. Remember that you just have to find one example of $M$ and $x_0$ that satisfy the condition, to make the assertion $f(x) = O(g(x))$.

To see that it’s the same thing as the previous four points, first think of $x_0$ as the way to ensure that $x$ is sufficiently large. I leave it up to you to prove the above four conditions from the formal definition.

I will, however, show some examples of using the formal definition:

  • $5x^2 = O(x^2)$ because we can say, for example, $x_0 = 10$ and $M = 10$ and thus $f(x) < M g(x)$ whenever $x > x_0$, that is, $5x^2 < 10x^2$ whenever $x > 10$.
  • It is also true that $5x^2 = O(x^3)$ because we can say, for example, $x_0 = 10$ and $M = 10$ and thus $f(x) < M g(x)$ whenever $x > x_0$, that is, $5x^2 < 10x^3$ whenever $x > 10$. This highlights a point that if $f(x) = O(g(x))$, it is also true that $f(x) = O(h(x))$ if $h(x)$ is some function that grows at least as fast as $g(x)$.
  • How about the function $f(x) = 5x^2 - 10x + 3$? We can easily see that when $x$ is sufficiently large, $5x^2$ will far surpass the term $10x$. To prove my point, I can simply say that for $x > 5$, $5x^2 > 10x$. Every time we increment $x$ by one, the increment in $5x^2$ is $10x + 5$ and the increment in $10x$ is just a constant, 10. Since $10x + 5 > 10$ for all $x \ge 1$, it is easy to see why $5x^2$ is always going to stay above $10x$ as $x$ goes higher and higher. Consequently, for $x > 5$ we have $0 < 5x^2 - 10x + 3 < 5x^2$, so $f(x) = O(x^2)$ with, for example, $M = 5$ and $x_0 = 5$.
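The formal definition can also be spot-checked numerically. The sketch below is not a proof, since it only tests finitely many values of x, and the class BigOCheck and its method names are invented for this illustration. It checks $|f(x)| < M|g(x)|$ for every integer x between $x_0$ and a chosen limit:

```java
import java.util.function.LongUnaryOperator;

final class BigOCheck {
    // Example function from the list above: f(x) = 5x^2 - 10x + 3.
    static long f(long x) {
        return 5 * x * x - 10 * x + 3;
    }

    // Candidate bounding function g(x) = x^2.
    static long square(long x) {
        return x * x;
    }

    // Checks |f(x)| < m * |g(x)| for every integer x with x0 < x <= limit.
    static boolean holdsUpTo(LongUnaryOperator f, LongUnaryOperator g,
                             long m, long x0, long limit) {
        for (long x = x0 + 1; x <= limit; x++) {
            if (Math.abs(f.applyAsLong(x)) >= m * Math.abs(g.applyAsLong(x))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // M = 10 and x0 = 10, the same witnesses used in the first two examples.
        System.out.println(holdsUpTo(BigOCheck::f, BigOCheck::square, 10, 10, 100_000));
        // The same f is not bounded by 10 * x, so this check fails immediately.
        System.out.println(holdsUpTo(BigOCheck::f, x -> x, 10, 10, 1_000));
    }
}
```

A failed check disproves a particular choice of M and $x_0$, while a passed check only suggests that the choice is plausible; the actual guarantee still comes from the algebra.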

In general, any polynomial of the form $a_n x^n + a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \dots + a_0$ is $O(x^n)$. To show this, we will first see that $a_0 = O(1)$. This is true because we can have $x_0 = 1$ and $M = 2|a_0|$, and we will have $|a_0| < 2|a_0|$ whenever $x > 1$ (assuming $a_0 \neq 0$; if $a_0 = 0$, any positive $M$ works).

Now, let us assume it is true for some $n$. Thus, $a_n x^n + a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \dots + a_0 = O(x^n)$. What it means, of course, is that there exist some $M_n$ and $x_0$ such that $|a_n x^n + a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \dots + a_0| < M_n x^n$ whenever $x > x_0$. We can safely assume that $x_0 \ge 2$, because if it is not so, we can simply add 2 to it to get a new $x_0$, which is at least 2.

Now, $|a_n x^n + a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \dots + a_0| < M_n x^n$ implies $|a_{n+1}x^{n+1} + a_n x^n + a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \dots + a_0| \le |a_{n+1}x^{n+1}| + |a_n x^n + a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \dots + a_0| < |a_{n+1}x^{n+1}| + M_n x^n$.

If we take $M_{n+1} = |a_{n+1}| + M_n$, we can see that $M_{n+1} x^{n+1} = |a_{n+1}| x^{n+1} + M_n x^{n+1} = |a_{n+1} x^{n+1}| + M_n x^{n+1} > |a_{n+1} x^{n+1}| + M_n x^n > |a_{n+1} x^{n+1} + a_n x^n + a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \dots + a_0|$.

That is to say, $|a_{n+1} x^{n+1} + a_n x^n + a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \dots + a_0| < M_{n+1} x^{n+1}$ for all $x > x_0$, that is, $a_{n+1} x^{n+1} + a_n x^n + a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \dots + a_0 = O(x^{n+1})$.

Now, we have it true for $n = 0$, that is, $a_0 = O(1)$. This means, by our last conclusion, $a_1 x + a_0 = O(x)$. This means, by the same logic, $a_2 x^2 + a_1 x + a_0 = O(x^2)$, and so on. We can easily see that this means it is true for all polynomials of positive integral degrees.
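Unrolling the induction gives an explicit constant: starting from 2|a0| and adding |a1|, |a2|, and so on, up to the leading coefficient. The sketch below computes that constant and spot-checks the bound for x > 2. It assumes a non-zero constant term, as the base case does; the class PolynomialBound and its methods are hypothetical names, and Horner's rule is used only as a convenient way to evaluate the polynomial:

```java
final class PolynomialBound {
    // coefficients[i] is the coefficient a_i of x^i; a_0 is assumed to be non-zero.
    static long boundConstant(long[] coefficients) {
        long m = 2 * Math.abs(coefficients[0]);  // base case: M_0 = 2|a_0|
        for (int i = 1; i < coefficients.length; i++) {
            m += Math.abs(coefficients[i]);      // inductive step: M_i = |a_i| + M_(i-1)
        }
        return m;
    }

    // Evaluates the polynomial at x using Horner's rule.
    static long evaluate(long[] coefficients, long x) {
        long value = 0;
        for (int i = coefficients.length - 1; i >= 0; i--) {
            value = value * x + coefficients[i];
        }
        return value;
    }

    // Checks |p(x)| < M * x^n for every integer x with 2 < x <= limit.
    static boolean boundHolds(long[] coefficients, long limit) {
        long m = boundConstant(coefficients);
        int degree = coefficients.length - 1;
        for (long x = 3; x <= limit; x++) {
            long power = 1;
            for (int i = 0; i < degree; i++) {
                power *= x;
            }
            if (Math.abs(evaluate(coefficients, x)) >= m * power) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] p = {1, 0, -25, 5}; // 5x^3 - 25x^2 + 1
        System.out.println(boundConstant(p));     // 2*1 + 0 + 25 + 5 = 32
        System.out.println(boundHolds(p, 1_000));
    }
}
```

The limit is kept small here so that every intermediate value fits in a long; the check is a sanity test of the constant, not a replacement for the proof.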

Asymptotic upper bound of an algorithm

Okay, so we figured out a way to abstractly specify an upper bound on a function that has one argument. When we talk about the running time of a program, this argument has to contain information about the input. For example, in our algorithm, we can say that the execution time is O(power). This scheme of specifying the input directly will work perfectly fine for all programs or algorithms solving the same problem because the input will be the same for all of them. However, we might want to use the same technique to measure the complexity of the problem itself: it is the complexity of the most efficient program or algorithm that can solve the problem. If we try to compare the complexity of different problems, though, we will hit a wall because different problems will have different inputs.

We must specify the running time in terms of something that is common among all problems, and that something is the size of the input in bits or bytes. How many bits do we need to express the argument, power, when it’s sufficiently large? Approximately $\log_2(\text{power})$. So, in specifying the running time, our function needs to have an input that is of the size $\log_2(\text{power})$, or lg(power). We have seen that the running time of our algorithm is proportional to the power, that is, a constant times power, which is a constant times $2^{\lg(\text{power})}$. The running time is therefore $O(2^x)$, where $x = \lg(\text{power})$ is the size of the input.
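The relationship between power and its size in bits can be seen directly in code. This is a small sketch, with InputSize and bitLength as made-up names; it uses Long.numberOfLeadingZeros to count the bits of a positive long, which is the exact version of the approximate lg(power) used above:

```java
final class InputSize {
    // Number of bits needed to write a positive value in binary.
    static int bitLength(long value) {
        return 64 - Long.numberOfLeadingZeros(value);
    }

    public static void main(String[] args) {
        for (long power : new long[] {1_000, 10_000_000}) {
            int bits = bitLength(power);
            // power is always below 2^bits, so a loop of 'power' steps is O(2^bits).
            System.out.println(power + " needs " + bits + " bits; 2^bits = " + (1L << bits));
        }
    }
}
```

For example, 1,000 needs 10 bits and 10,000,000 needs 24 bits, so adding a handful of bits to the input multiplies the number of loop iterations many times over.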

Asymptotic lower bound of a function

Sometimes, we don’t want to praise an algorithm, we want to shun it; for example, when the algorithm is written by someone we don’t like or when some algorithm performs really poorly. When we want to shun it for its horrible performance, we may want to talk about how badly it performs even for the best input. An asymptotic lower bound can be defined in terms of the upper bound, just as greater-than-or-equal-to can be defined in terms of less-than-or-equal-to.

A function f(x) = Ω(g(x)) if and only if g(x) = O(f(x)). The following list shows a few examples:

  • Since $x^3 = O(x^3)$, $x^3 = \Omega(x^3)$
  • Since $x^3 = O(5x^3)$, $5x^3 = \Omega(x^3)$
  • Since $x^3 = O(5x^3 - 25x^2 + 1)$, $5x^3 - 25x^2 + 1 = \Omega(x^3)$
  • Since $x^3 = O(x^4)$, $x^4 = \Omega(x^3)$

Again, for those of you who are interested, we say the expression $f(x) = \Omega(g(x))$ means there exist positive constants $M$ and $x_0$ such that $|f(x)| > M|g(x)|$ whenever $x > x_0$, which is the same as saying $|g(x)| < (1/M)|f(x)|$ whenever $x > x_0$, that is, $g(x) = O(f(x))$.

The above definition was introduced by Donald Knuth as a stronger and more practical definition for use in computer science. Earlier, there was a different definition of the lower bound Ω that is more complicated to understand and covers a few more edge cases. We will not talk about edge cases here.

For an algorithm, we can use lower bound to talk about its best performance the same way we had used the upper bound to specify the worst performance.

Asymptotic tight bound of a function

There is another kind of bound that sort of means equality in terms of asymptotic complexity. A theta bound is specified as $f(x) = \Theta(g(x))$ if and only if $f(x) = O(g(x))$ and $f(x) = \Omega(g(x))$. Let’s see some examples to understand this even better:

  • Since $5x^3 = O(x^3)$ and also $5x^3 = \Omega(x^3)$, we have $5x^3 = \Theta(x^3)$
  • Since $5x^3 + 4x^2 = O(x^3)$ and $5x^3 + 4x^2 = \Omega(x^3)$, we have $5x^3 + 4x^2 = \Theta(x^3)$
  • However, even though $5x^3 + 4x^2 = O(x^4)$, since it is not $\Omega(x^4)$, it is also not $\Theta(x^4)$
  • Similarly, $5x^3 + 4x^2$ is not $\Theta(x^2)$ because it is not $O(x^2)$

In short, you can ignore constant multipliers and lower-order terms while determining the tight bound, but you cannot choose a function which grows either faster or slower than the given function. The best way to check whether the bound is right is to check the O and the Ω conditions separately, and say it has a theta bound only if both hold for the same function.
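For the second example in the list, both conditions can be read off a single pair of inequalities. The constants 5 and 9 below are just one convenient choice; any pair that works is enough:

```latex
\[
5x^3 < 5x^3 + 4x^2 < 9x^3 \qquad \text{for all } x > 1
\]
```

The left inequality gives $5x^3 + 4x^2 = \Omega(x^3)$ with $M = 5$, and the right one, which holds because $4x^2 < 4x^3$ when $x > 1$, gives $5x^3 + 4x^2 = O(x^3)$ with $M = 9$. Together they give the theta bound.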

Note that since the complexity of an algorithm depends on the particular input, in general, the tight bound is used when the complexity remains unchanged by the nature of the input.

In some cases, we try to find the average case complexity, especially when the upper bound really happens only in the case of an extremely pathological input. But since the average must be taken in accordance with the probability distribution of the input, it is not just dependent on the algorithm itself. The bounds themselves are just bounds for particular functions and not for algorithms. However, the total running time of an algorithm can be expressed as a grand function that changes its formula as per the input, and that function may have different upper and lower bounds. There is no sense in talking about an asymptotic average bound because, as we discussed, the average case is not just dependent on the algorithm itself, but also on the probability distribution of the input. The average case is thus stated as a function that would be a probabilistic average running time for all inputs, and, in general, the asymptotic upper bound of that average function is reported.

Optimization of our algorithm

Fixing the problem with large powers

Equipped with all the tools of asymptotic analysis, we will start optimizing our algorithm. However, since we have already seen that our program does not work properly for even moderately large values of power, let’s first fix that. There are two ways of fixing this; one is to actually provide the amount of space required to store all the intermediate products, and the other is to use a trick that keeps all the intermediate steps within the range of values that the long datatype can support. We will use the binomial theorem to do the latter.

If you do not remember, the binomial theorem says $(x+y)^n = x^n + \binom{n}{1}x^{n-1}y + \binom{n}{2}x^{n-2}y^2 + \binom{n}{3}x^{n-3}y^3 + \binom{n}{4}x^{n-4}y^4 + \dots + \binom{n}{n-1}x y^{n-1} + y^n$ for positive integral values of $n$. Suppose $r$ is the remainder when we divide $a$ by $b$. This makes $a = kb + r$ true for some non-negative integer $k$. This means $r = a - kb$, and $r^n = (a - kb)^n$.

If we expand this using the binomial theorem, we have $r^n = a^n - \binom{n}{1}a^{n-1}(kb) + \binom{n}{2}a^{n-2}(kb)^2 - \binom{n}{3}a^{n-3}(kb)^3 + \binom{n}{4}a^{n-4}(kb)^4 - \dots \mp \binom{n}{n-1}a(kb)^{n-1} \pm (kb)^n$.

Note that apart from the first term, all other terms have $b$ as a factor, which means that we can write $r^n = a^n + bM$ for some integer $M$. If we divide both sides by $b$ now and take the remainder, we have $r^n \mathbin{\%} b = a^n \mathbin{\%} b$, where % is the Java operator for finding the remainder.
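The identity can be checked on small numbers where nothing overflows. The following sketch uses made-up names (RemainderIdentity, power, identityHolds) and deliberately small inputs so that plain long arithmetic is exact:

```java
final class RemainderIdentity {
    // Plain repeated multiplication; only safe while the result fits in a long.
    static long power(long base, int exponent) {
        long result = 1;
        for (int i = 0; i < exponent; i++) {
            result *= base;
        }
        return result;
    }

    // True when (a % b)^n % b equals a^n % b.
    static boolean identityHolds(long a, long b, int n) {
        long r = a % b;
        return power(r, n) % b == power(a, n) % b;
    }

    public static void main(String[] args) {
        // 13 % 7 = 6; both 6^5 = 7776 and 13^5 = 371293 leave remainder 6 when divided by 7.
        System.out.println(power(6, 5) % 7);
        System.out.println(power(13, 5) % 7);
        System.out.println(identityHolds(13, 7, 5));
    }
}
```

Because the remainder is the same whether we reduce before or after raising to the power, we are free to reduce as often as we like along the way.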

The idea now would be to take the remainder by the divisor every time we raise the power. This way, we will never have to store more than the range of the remainder:

public static long computeRemainderCorrected(long base, long power, long divisor){
  long baseRaisedToPower = 1;
  for(long i=1;i<=power;i++){
    baseRaisedToPower *= base;
    // Reduce after every multiplication so the value stays below divisor.
    baseRaisedToPower %= divisor;
  }
  return baseRaisedToPower;
}

This change obviously does not alter the time complexity of the program; it just fixes the problem with large powers. The program also maintains a constant space complexity.
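One way to gain confidence in the corrected loop is to compare it against arbitrary-precision arithmetic. The sketch below copies computeRemainderCorrected into a hypothetical class CorrectedCheck and uses BigInteger.modPow as the reference; it is a cross-check for small inputs, not part of the original program:

```java
import java.math.BigInteger;

final class CorrectedCheck {
    // Copy of computeRemainderCorrected from the article.
    static long computeRemainderCorrected(long base, long power, long divisor) {
        long baseRaisedToPower = 1;
        for (long i = 1; i <= power; i++) {
            baseRaisedToPower *= base;
            baseRaisedToPower %= divisor;
        }
        return baseRaisedToPower;
    }

    // Reference answer computed with arbitrary-precision arithmetic.
    static long reference(long base, long power, long divisor) {
        return BigInteger.valueOf(base)
                .modPow(BigInteger.valueOf(power), BigInteger.valueOf(divisor))
                .longValue();
    }

    public static void main(String[] args) {
        System.out.println(computeRemainderCorrected(2, 1000, 7)); // prints 2
        System.out.println(reference(2, 1000, 7));                 // prints 2
    }
}
```

Both calls now agree on 2 for the input (2, 1000, 7), the case where the first version printed 0.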

Improving time complexity

The current running time complexity is $O(2^x)$, where $x$ is the size of the input as we have already computed. Can we do better than this? Let’s see.

What we need to compute is $\text{base}^{\text{power}} \mathbin{\%} \text{divisor}$. This is, of course, the same as $(\text{base}^2)^{\text{power}/2} \mathbin{\%} \text{divisor}$. If we have an even power, we have reduced the number of operations by half. If we can keep doing this, we can raise base to the power $2^n$ in just $n$ steps, which means our loop only has to run lg(power) times, and hence, the complexity is $O(\lg(2^x)) = O(x)$, where $x$ is the number of bits to store power. This is a substantial reduction in the number of steps to compute the value for large powers.

However, there is a catch. What happens if the power is not divisible by 2? Well, then we can write $\text{base}^{\text{power}} \mathbin{\%} \text{divisor} = (\text{base} \cdot \text{base}^{\text{power}-1}) \mathbin{\%} \text{divisor} = (\text{base} \cdot (\text{base}^2)^{(\text{power}-1)/2}) \mathbin{\%} \text{divisor}$, and power - 1 is, of course, even, so the computation can proceed. We will write up this code in a program. The idea is to start from the most significant bit and move towards less and less significant bits. If a 1 bit has n bits after it, it contributes a multiplication by the base followed by n squarings. We perform those squarings one at a time, one for each subsequent bit. When we find a 0 bit, we only square, which accumulates the squaring owed to the earlier bits:

public static long computeRemainderUsingEBS(long base, long power, long divisor){
  long baseRaisedToPower = 1;
  long powerBitsReversed = 0;
  int numBits=0;

First reverse the bits of our power so that we can read them from the least significant side, which is more easily accessible. We also count the number of bits for later use:

  while(power>0){
    powerBitsReversed <<= 1;
    powerBitsReversed += power & 1;
    power >>>= 1;
    numBits++;
  }

Now we extract one bit at a time. Since we have already reversed the order of the bits, the first one we get is the most significant one. Just to get an intuition on the order, the first bit we collect will eventually be squared the maximum number of times and hence will act like the most significant bit:

  while (numBits-->0){
    // Square once for every bit, reducing right away to limit overflow.
    baseRaisedToPower = (baseRaisedToPower * baseRaisedToPower) % divisor;
    if((powerBitsReversed & 1) == 1){
      // A 1 bit also contributes one multiplication by the base.
      baseRaisedToPower = (baseRaisedToPower * base) % divisor;
    }
    powerBitsReversed>>>=1;
  }
  return baseRaisedToPower;
}
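The same halving idea can also be written recursively, which some readers may find easier to map onto the two identities above. This is an alternative sketch, not the method above: the class RecursiveSquaring and the method powMod are hypothetical names, and it assumes base >= 0, power >= 0, and a divisor below 2^31 so that every product fits in a long:

```java
final class RecursiveSquaring {
    // Assumes base >= 0, power >= 0, and 0 < divisor < 2^31 so products fit in a long.
    static long powMod(long base, long power, long divisor) {
        if (power == 0) {
            return 1 % divisor;
        }
        base %= divisor;
        if (power % 2 == 0) {
            // Even power: base^power = (base^2)^(power/2)
            return powMod(base * base % divisor, power / 2, divisor);
        }
        // Odd power: base^power = base * base^(power-1), and power-1 is even
        return base * powMod(base, power - 1, divisor) % divisor;
    }

    public static void main(String[] args) {
        System.out.println(powMod(2, 10, 7));          // prints 2
        System.out.println(powMod(2, 1000, 7));        // prints 2
        System.out.println(powMod(13, 10_000_000, 7)); // prints 1
    }
}
```

The recursion depth is at most about twice the number of bits in power, so even a power of 10 million needs only a few dozen calls.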

We test the performance of the algorithm; we compare the time taken for the same computation with the earlier and final algorithms with the following code:

public static void main(String [] args){
  System.out.println(computeRemainderUsingEBS(13, 10_000_000, 7));

  long startTime = System.currentTimeMillis();
  for(int i=0;i<1000;i++){
    computeRemainderCorrected(13, 10_000_000, 7);
  } 
  long endTime = System.currentTimeMillis();
  System.out.println(endTime - startTime);

  startTime = System.currentTimeMillis();
  for(int i=0;i<1000;i++){
    computeRemainderUsingEBS(13, 10_000_000, 7);
  }
  endTime = System.currentTimeMillis();
  System.out.println(endTime - startTime);
}

The first algorithm takes 130,190 milliseconds to complete all 1,000 executions on my computer, and the second one takes just 2 milliseconds to do the same. This clearly shows the tremendous gain in performance for a large power like 10 million. The algorithm for squaring the term repeatedly to achieve exponentiation like we did is called… well, exponentiation by squaring. This example should motivate you to study algorithms for the sheer advantage they can give in improving the performance of computer programs.

Summary

In this article, you saw how we can think about measuring the running time of and the memory required by an algorithm in seconds and bytes, respectively. Since this depends on the particular implementation, the programming platform, and the hardware, we need a way of talking about running time in an abstract way. Asymptotic complexity is a measure of the growth of a function when the input is very large. We can use it to abstract our discussion on running time. This is not to say that a programmer should not spend any time making a program run twice as fast, but that comes only after the program is already running at the minimum asymptotic complexity.

We also saw that the asymptotic complexity is not just a property of the problem at hand that we are trying to solve, but also a property of the particular way we are solving it, that is, the particular algorithm we are using. We also saw that two programs solving the same problem while running different algorithms with different asymptotic complexities can perform vastly differently for large inputs. This should be enough motivation to study algorithms explicitly.
