My debugging and bug prevention strategies

Programmers, such as myself, frequently encounter bugs in our program code. Sometimes, we are able to resolve them quickly, sometimes it take hours. Worse still, after resolving the current set of bugs, another pops up!

As a programmer, I have experienced the pain of debugging my program code, for hours on end, not knowing what is wrong with my logic. Over time, I have developed a debugging toolbox that has proven to be effective for myself. I have to mention though, that this only applies to Competitive Programming, not professional programming, where circumstances could vary greatly. Another point to note is, my discussion is more skewed towards C++ programmers so issues like segmentation fault may not apply to languages such as Java.

The 5 most common bugs/errors that I have seen are (in no particular order):

1. Segmentation fault.
2. Integer overflow
3. Division by zero
4. Strange output values with multiple test cases
5. Logic Errors

No. 1 is mostly easy to deal with as it usually involves an out-of-bounds access, null-pointer access or stack-overflow. I would usually start by checking my loop conditions and fitting assertions at critical points in my code where I think there may be such faults. Furthermore, stack-overflow is mostly caused by infinite recursion. This can be mitigated by verifying that your recursion will always reach one of its base cases. Fitting your code with assertions would also be a good strategy.

The worst situation that I have seen in this category is stack-overflow due to insufficient stack space allocation. For example, we try to do DFS to compute some properties of a tree. The recursion might get so deep that it overflows! I think that this can be avoided in Linux/MAC by using the command "ulimit -s unlimited". On the other hand, I don't usually worry about this situation as it rarely occurs.

No 2. is trivial to spot. Competitive programming tasks often ask for sums of elements, which may run into huge values! The solution to this is to use long long data type. However, from my experience, it is not advisable to use long long for every variable in your program because it might strangely cause your program to get the TLE verdict. Here is an example of this (I find this strange too!). Hence, I often append a 1LL*_____ or 0LL+______ in front of my calculation statements if I need to calculate long long values. These 2 expressions will convert 32-bit integers to 64-bit integers automatically so that I don't have to store 32-bit integers as long long types.

On another note, integer overflow may also occur when we accidentally add "infinity" to another value. For example, we want to compute shortest paths. The classical relax function would be:

if(shortestpathto[a] + d(a,b)  < shortestpathto[b]) relax(a,b)

This seems correct. But what if shortestpath[a] was set to the largest value that would fit into an integer type (e.g. INT_MAX)? The expression on the left hand side of the inequality would become negative and unexpected stuff would follow... Thus, I will always check whether a value could be infinity before using it in an arithmetic operation.

No. 3 is a mistake that people usually make when they don't consider constraints carefully. For example, given a problem that requires us to print the answer for 1/x where -10 <= x <= 10. A hasty programmer would naively print 1/x. I am sure you can see where the edge case is!

No. 4 occurs when we forget to reset data structures and important variables. Thus, it is important to check that our variables are all cleared/reset in every iteration when we are dealing with problems that involve multiple test cases.

No. 5 has infinitely many causes and is the most difficult to resolve. I combat this in 3 ways:

1. Fuzzing my algorithm with random test cases
2. Using AC (accepted) solutions to get sample answers
3. Questioning my assumptions

It is sometimes easy to generate random test cases for problems. I am usually too lazy to write a generator so I will turn to SPOJ's random test case generator. It generates a lot of different random stuff such as graphs, trees, arrays, etc. So it would be really useful during a contest where you have internet available. Do note, however, that this method works only if you have a method to obtain the correct answers to the generated test cases (e.g. naive but correct brute force solution). If you don't have such a method, the next point would be helpful.

Use AC solutions to provide correct outputs as a last resort where it is difficult to compute the correct answer for test cases (e.g. optimization problems). Find the smallest test case (using random test cases) where your solution fails and that should give you some direction in your debugging. Of course, this only applies to Competitive Programming, which is why I limited my scope at the start of this article.

Last but not least, when all else fails, question your assumptions point by point. Humans love to make assumptions (unconsciously). Hence, we often overlook some detail. In order to weed out elusive bugs, we have to question each and every assumption unless they are obviously true! For baseless assumptions, we have to devise a formal/rigorous proof to prove their correctness. Conversely, we could try to devise counterexamples. While counterexamples don't prove correctness of a method, they could still provide us with some clues about why our logic fails.

With these measures in place, we are in a better position to isolate errors and fine-tune our solutions.

I hope that this article has been informative. Do post comments if you have any questions for me.

Comments

Popular posts from this blog

USACO - Broken Necklace

HackerRank - Search in a 2-D grid

Efficient learning in Competitive Programming