These are programming techniques that are generically applicable to most computer languages.
I spent a lot of May 2021 acting as a Section Leader in Stanford University's "CodeInPlace" project, which aimed to teach Python remotely to a very large number of students world-wide (about 12,000), staffed largely by volunteers. It was a great experience and I am posting here some of the general advice I gave to my students.
The word “Bug” suggests that we are suffering from little gremlins that crawl into our code and jam up the works - and that it really is not our fault. (Indeed, in the very early days of computing there are records of insects crawling into mechanical relays and causing breakdowns.) Let us be honest: we are the ones introducing the mistakes into the code.
A defect is a latent design error, sitting there in the code, waiting to be activated. (It may never happen - it requires the right conditions to occur.)
Unfortunately the cause of the failure and the actual failure are usually not in the same place in the code. In a "CodeInPlace Section Room" I demonstrated what happens when you force Python to make an error and give a “traceback”, which shows the code line where the error was detected. As it happens, this was the line that was unable to cope when we gave it a character string that could not be converted into a number. (Perhaps not really an error at all: we wrote a program that was supposed to deal only with numbers.) In general, however, the actual root cause of a program failure may be in code that is a long way away from where the failure is reported, because the error has set up a condition that will later cause a conflict. (This is why the course leaders keep going on about thinking in terms of pre- and post-conditions.) We leave a rake out on the lawn in the afternoon, but it is not until we step on it in the dark that it smacks us in the face.
So, every computer scientist and software engineer knows in their heart that the best way to eliminate bugs is not to make the design mistake in the first place. This is much easier said than done, so all programmers need to learn to diagnose the cause of program failures.
This is detective work: firstly we need to identify the condition that produced the failure, then we must deduce how that condition came about, working back to the source of the original error. This often involves using our mind to try to trace in detail how the computer responds to every individual statement in our program.
The code designer, however, always finds it difficult to see faults in their own work. It takes a mental effort to make yourself accept that your code is probably full of errors.
Many of us find that asking a colleague for help breaks the blockage. We start explaining our program to the colleague and not infrequently see the answer to our problem before we have finished the explanation. Some software houses actually practice “pair-programming” with just one shared computer screen between two programmers: one typing, one doing continuous critical assessment.
Sometimes, unfortunately, we are working on our own, so what do we do to force a different mindset? I often find that it helps to take a break and do something else for a while (go for a walk?) returning with a fresh viewpoint.
A possibly apocryphal story once appeared on one of the American professional computing journals, of a Californian programmer who took his dog to work, because he found that explaining code to his dog was just as effective as talking to a human. (Cats, by the way, are useless. They are clearly not interested, at most they just sit on the keyboard until you supply food, whereas dogs pay attention and look as though they understand.) When the dog died he just pinned a photo of the dog beside his computer and claimed that worked almost as well. This story has appeared in various forms since - including “Rubber Duck Debugging”, as described in a well-known and excellent book called The Pragmatic Programmer. This version suggests that explaining your code to a rubber duck living by your screen is also highly effective. Make of that what you will: the point is that we can use verbalisation to force our mind to take a fresh viewpoint.
If I have a strong suspicion about where the error may be occurring, I may run a modified version of the code in which I have placed print() statements that show me the values held in the computer memory, or that indicate which way a conditional statement (if …) actually goes. A more sophisticated version of this is actively checking pre- and post-conditions around the area where you suspect an error. This works quite well for shortish programs, such as those answering teaching exercises, but with real-life codes you can end up with an unmanageable amount of output in which you cannot find what you need.
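A minimal sketch of both ideas together (the function `average` and its debugging lines are my own invention, not from the course):

```python
def average(values):
    # Temporary print() calls reveal the state at a suspect point.
    print(f"DEBUG: got {len(values)} values: {values!r}")
    if len(values) == 0:
        print("DEBUG: empty-input branch taken")
        return 0.0
    result = sum(values) / len(values)
    # An active post-condition check: a mean must lie within the input range.
    assert min(values) <= result <= max(values), "post-condition violated"
    return result

print(average([2.0, 4.0, 9.0]))
```

Once the fault is found, the print() lines come out again - they are scaffolding, not part of the finished program.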
Professionals therefore find that it is worth becoming familiar with the debugging tools that are part of “IDEs” (Integrated Development Environments). These allow one to step through the code line by line and actually watch what it does. We can also, for example, pause execution, look at the values being held in memory, and see whether our pre- and post-conditions are actually holding. They are better than the “print()” method because we can tell the tool to run the code normally until it reaches a certain line, or enters a certain function, for example. It then pauses for us to take control.
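As a concrete illustration, Python ships with a command-line debugger, pdb, which the built-in breakpoint() function drops you into (the function below is hypothetical):

```python
# Calling the built-in breakpoint() pauses execution at that line and
# hands control to the pdb debugger: we can then print variables
# (p total), step to the next line (n), or resume execution (c).
def running_total(values):
    total = 0
    for v in values:
        # breakpoint()   # uncomment to pause here on each iteration
        total += v
    return total

print(running_total([1, 2, 3]))
```

IDEs such as PyCharm wrap the same facilities in a graphical interface, letting you set breakpoints with a mouse click.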
Programming is always a battle against human psychology. First we have to force our minds into thinking logically and analytically but also creatively in order to design and build our “castle in the air”, then, if we really want to be sure that it works, we have to do our best to destroy it with testing.
A conference keynote speaker in my hearing once described his best tester as “a mean, bitter and twisted individual intent on humiliating his colleagues”. It was intended as hyperbole to make the point that creators and breakers are often different types of people and both are valuable.
Here are a few bits of practical advice in order to get into the right testing mindset.
Accept that your program almost certainly contains errors. A competent and experienced professional programmer has probably failed to find about one mistake per 100 lines at the point where he or she thinks that they have done a decent job. Novices typically do not do that well. (In order to get taken on as "CodeInPlace" Section Leaders we had to do a “spot the errors - explain the corrections” exercise on a student submission from the previous year. I got four in six lines. Programming tutors sometimes report examples with more than one error per line of code. That takes talent - of an unusual sort.)
First test each and every function definition separately: We usually start from the bottom up, with functions that do not call on other functions. We write another function whose sole purpose is to call the function under test with a representative set of input parameters intended to exercise its capabilities, particularly with “corner cases” as described below. Clearly stated pre- and post-conditions are a great help when defining test cases. (Don't know what pre- and post-conditions are? Look them up! They are important.)
This bottom up focus is called “unit” testing. When the bottom level functions are working correctly we move up a layer, and so on. Leave testing the program as a whole till last and it will probably go quite smoothly because you know you can trust the individual components. If you try to start from the top you will likely come across an error somewhere deep in your function hierarchy and probably end up writing unit test cases anyway in order to find it.
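A sketch of what such a unit test can look like (the function `is_leap_year` is my own example, not from the course):

```python
def is_leap_year(year):
    """Bottom-level function under test: it calls no other function of ours."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def test_is_leap_year():
    """A function whose sole purpose is to exercise is_leap_year."""
    assert is_leap_year(2024)          # typical case
    assert not is_leap_year(2023)      # typical case
    assert not is_leap_year(1900)      # corner: century year, not a leap year
    assert is_leap_year(2000)          # corner: divisible by 400
    print("all is_leap_year tests passed")

test_is_leap_year()
```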
Look at Typical Cases: we can often anticipate how our software will be used (especially if it for our own use). If we find the errors that affect the typical usage, then our users will mostly have a good experience. (There are ways of doing this in a statistically quantifiable way if you need to do that.)
Look at “Corner Cases”: if you think your code ought to work with floating point numbers from zero upwards, test it at least once with exactly 0.0. and then something which is just very slightly larger than zero. Do the same at the high end of the range. For example, multiplying a number by itself ought to be very straightforward, but if you give your program a very large number, close to the maximum number it can represent, the square will be a value beyond the computer’s ability to represent. Find out what happens. (It is actually a good idea to write down in documentation that “this code will work for numbers up to…”.)
Of course, many functions have more than one input parameter, say, x and y in z=my_calc(x,y), where x and y will each have a valid range for their input (say something like 0.0 <= x < 10.0 and 0.0 <= y <= 50.0). The “corner cases” are values such as x = 0.0 and y = 0.0 at the same time. The values x = 10.0 and y = 0.0 are another corner case. Experience shows that corners hide bugs, so get in there and do a good sweep.
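A sketch of such a corner sweep (the body of `my_calc` is a stand-in; only the valid ranges matter here):

```python
def my_calc(x, y):
    """Stand-in function, valid for 0.0 <= x < 10.0 and 0.0 <= y <= 50.0."""
    return x * y / 10.0

# Visit the corners of the valid (x, y) rectangle, plus points barely
# inside the edges - experience says this is where the bugs hide.
corner_inputs = [
    (0.0, 0.0), (0.0, 50.0),        # left-hand corners
    (9.999, 0.0), (9.999, 50.0),    # just inside the right-hand edge
    (1e-12, 1e-12),                 # barely above zero on both axes
]
for x, y in corner_inputs:
    z = my_calc(x, y)
    assert 0.0 <= z <= 50.0, f"result out of range for {(x, y)}: {z}"
```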
Look at loops: Are there any conditions in which a loop-code-block does not get executed at all? Is this OK? (Sometimes it is, sometimes it is not. Check by seeing if you can set up a test case that proves it one way or the other.) Test the typical run through the loop (not the first or the last iteration). General experience suggests that the first and last times round a loop are, however, more likely to reveal errors.
Look at Conditional Statements: can you devise a test case that forces the conditional test to go both ways (or every way with an if-elif-elif-else block). Errors often arise when you have not anticipated all the possibilities. (I use Venn diagrams to check my logic and then try to deduce how I can land in each of the separate regions of the diagram.)
Is there any code you feel that perhaps you do not quite understand? Examine its behaviour with test cases. You may also want to examine its actual behaviour by stepping through the section using an interactive debugging tool, such as you get as part of an Integrated Development Environment like PyCharm.
Keep your test cases: you will want to use them every time you change your code. (This is one of the uses for that strange line “if __name__ == "__main__":” at the bottom of the code file. We can call test functions from there. You will learn about this later in the course, I believe.) You should also update the test cases when you change the code.
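A sketch of that layout (the module contents and names are invented for illustration):

```python
# Contents of a hypothetical file my_module.py.
def double(n):
    return 2 * n

def run_tests():
    assert double(0) == 0      # corner case
    assert double(21) == 42    # typical case
    print("all tests passed")

if __name__ == "__main__":
    # This block runs only when the file is executed directly,
    # not when it is imported by another program.
    run_tests()
```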
Learn to use a code-coverage tool: you run your test cases under the control of one of these tools and it monitors the execution to count how many times each line of code has been executed while you are doing the testing. Most modern compilers have these tools built in: you just have to read the documentation. With Python, they usually come as part of an Integrated Development Environment, such as PyCharm. The first feedback from using such a tool tends to be salutary, showing that your testing is much less complete than you think. In my own experience, people who give me a “tested” code for review have usually missed about 1/3 of the lines. That includes me. I then look at the bits which have been missed (usually an “if” block that has never been covered) and work out how to change the input data to make the next execution go down that path.
Ideally, get someone else to do the testing: select someone who gets their kicks from breaking code, preferably someone with a pedantic, lawyerly type of mind who looks at the way you have described the purpose of the function and tries to find the loopholes in the contract. In professional teams, programmers do a lot of the “unit” testing themselves (that is, looking at the individual functions) but “integration testing” (the program as a whole) may well be done independently. Recreational programmers working on their own do not have this option, of course. Nevertheless, with practice it is possible to achieve, to some degree, a personality switch into the testing mindset. Good luck with that.
First, let’s take a step back and think about the nature of programming. The fundamental problem of programming is dealing with complexity: even comparatively modest programs can be more complicated than you are able to understand all at once. In our training course we might, at first, write programs that we can comprehend as a whole. In general, however, if you double the size of a program you more than double the amount of information that you have to hold in your mind, because you also need to understand the connections between the different parts of your program.
Imagine you are doing some gardening, and you carelessly leave the rake lying on the lawn, points up. Later that evening, in the dark, you go out to find the cat and step on the rake…. That is a long range connection between things you did this morning and consequences that come to pass as a result of your actions in the evening. Long-range connections in programs are the sort of places errors like to lurk - because if there are too many and they are too obscure we can’t remember them all and take them into account when we add new code. Hence, the craft of programming (and the term craft is used advisedly) is largely about controlling connections so you allow only those that are strictly necessary and you make them as transparent, understandable and memorable as possible.
There are several mental disciplines that help us to do this in programming: decomposition, for example, is a “divide and conquer” approach. We saw this in our Karel hospital build, when we could write a “helper” function that could be called to build a hospital at the current location of the robot. The other part of the decomposition was searching for beepers on the bottom row to locate where we needed to put the hospitals.
Pre- and Post-conditions are another discipline that helps us to be sure that the “helper” functions are going to do exactly the right thing when they are invoked. I like to think of them as contracts which define the interaction between the helper function and the party invoking the function. When you call the helper, you just want to know that it will achieve a certain outcome, and you don’t necessarily need to know exactly how it achieves that outcome. It is all the same to you. (For example, our hospitals are 3 beepers high by 2 wide. We could place these up column 1 then down column 2, or by doing row 1, then 2 and 3. The final effect is the same - as long as we get the robot back to the bottom of column 2.) The description of the final effect of the function (but not how it was achieved) is the post-condition.
From the view point of the helper-function, however, its interest in the contract is ensuring that it has the right conditions to start work. What does the client have to give me at the start in order for me to successfully carry out the work? That is the pre-condition of the contract. What must I give back to the client at the end? That is the post-condition.
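The contract can even be written into the code as executable checks. A minimal sketch (the helper and its brick-counting job are invented for illustration, not part of the Karel exercise):

```python
def take_bricks_for_hospital(bricks):
    """Hypothetical helper that consumes exactly 6 bricks for one hospital.

    Pre-condition:  the caller supplies a list of at least 6 bricks.
    Post-condition: exactly 6 bricks are consumed; the rest are returned.
    """
    assert len(bricks) >= 6, "pre-condition violated: need at least 6 bricks"
    remaining = bricks[6:]
    assert len(bricks) - len(remaining) == 6, "post-condition violated"
    return remaining

leftovers = take_bricks_for_hospital(["brick"] * 8)
```

How the function chooses its 6 bricks is its own business; the assertions state only the two ends of the contract.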
There is rarely only one way of dividing up your program using pre- and post-conditions to connect different parts. Setting up good contracts is a matter of design choice: What are you going to do? What am I going to do? As a matter of experience, however, if the contracts seem to be difficult to write (that is, overly complex themselves) you have probably made the wrong design choices.
Learning to make the right design choices without large amounts of trial and error takes experience, and this is what makes expert programmers. It does not come quickly. Most of this experience applies completely independently of the programming language that you are employing. You can master the syntax of a new computing language in 40 or 50 hours of practice but mastering the art of expressing algorithms clearly and economically is a lot harder.
One of the problems, however, that you will experience is that stating pre- and post-conditions with precision, using the English language, in itself adds a layer of difficulty. English is often ambiguous, and it is easy to leave stuff out. That is why legal contracts seem to be written in such convoluted language. Engineers who have to get it right (e.g. those working on nuclear reactor protection systems) learn to describe their software using mathematical statements, and it is quite salutary to find, when starting from informal English, how much you have actually left out.
It is, however, important to remember that something is better than nothing: I usually find that the effort of trying to construct even an informal pre-condition statement makes me think about what I am missing. As with legal contracts, you have to try to cover all contingencies or you will at some point be caught out by the loophole.
All this is especially important if you are working with other people: they need to know how to deliver on the implied commitments. Even if, however, you never envisage doing that, you should remember the likelihood that a year or two down the road you yourself might want to modify one of your programs. You may well conclude that you are dealing with the work of a stranger! (Much of my own professional software turned out to be surprisingly long-lived: there are programs I constructed over 30 years ago still being used every day.)
“It’s not what you do, it’s the way that you do it! That’s what gets results!” says the classic Jazz-era song.
In programming exactly what you do is of primary importance, but the way that you do it - the style - becomes increasingly important as you progress, especially when working with other people. You are probably already aware that the style in which you present reports has an important impact on the reader: it can make the difference between getting your point across on a first read-through, or making your audience struggle to understand. Good style in programming helps you and the people you work with to understand your programs.
One of the important insights of Guido van Rossum, the inventor of Python, was that programs need to be read by humans as well as computers, because we can find many potential faults, just by reading the code, before they cause problems in execution. We do this by understanding what the code actually says and comparing it to what we intended.
A guiding principle underlying the design of Python is human understandability (the syntax is much less complicated and more logical than some other widely used languages), but it still needs help from good style. I always keep in mind a quote from a famous Oxford computer scientist, C.A.R. Hoare, who was a strong advocate of avoiding, rather than fixing, errors:
There are two methods in software design. One is to make the program so simple, that there are obviously no errors. The other is to make it so complicated, that there are no obvious errors.
So, it behoves us to make our programs as readable and understandable as possible, and in that way we are likely to make fewer errors. This is not just academics speaking here! I spent nearly 40 years working on software in the nuclear industry, where we really needed to trust our programs. We paid attention to style as well as content because it meant we got more reliable software with less development time.
Programming languages are formal, in the sense that when they are interpreted on a computer they have exactly one precise meaning, and that meaning is the external effect that they have. Our major difficulty as programmers is to reach the same understanding of this meaning as the computer. It is a struggle with human psychology because, on the whole, we are not very good at logical thought, and there are a limited number of things we can hold in our mind at one time. So, the essential elements of good style are making the logic of the program as transparent as possible, and arranging the structure of the code so that, when reading through it, we only need to keep in mind a small number of concepts at one time. (These are, of course, exactly the same guidelines you would follow to construct an academic essay in natural language.) Confused minds lead to confusing code.
There are, however, two levels to software style. The first is relatively easy to grasp, and it involves adding visual clues in the way we write code to aid the human reader’s understanding. Some of it is just agreeing to use the same relatively arbitrary conventions. So, the widely respected document, PEP8 (https://www.python.org/dev/peps/pep-0008/), recommends using lower-case and underscores in function definitions, as in
def my_helper_function():
but recommends using “CamelCase” (capitalised words, no underscores) for class names, as in:
class MyClassDefinition:
The computer does not really care about the difference, but when we see in the code:
x = my_helper_function()
we should be able to grasp immediately that these statements are doing different things. (Classes are a more advanced programming concept that are probably not covered in this Python course.) There is nothing to prevent you doing it the other way round - other than the possibility of confusing another Python programmer looking at your code.
The second part of style is more subtle: it is an issue of being selective about which parts of the language you are going to use and for what purpose. There are features in the Python grammar (such as dictionaries) that, used correctly, allow you to replace with one statement dozens of lines of code that would be required in another language. This is less likely to contain an error because it says precisely and compactly what you want to say. Expert programmers build up a repertoire of algorithm “lego-blocks” which they stick together into complicated assemblies, and know enough about the language they are using to choose the most compact way of expressing the algorithm in that language. (In fact, expert programmers generally know several computing languages and may choose to employ the one in which it is easiest to express their ideas on a particular project.) Good designs look as simple as possible.
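As a small illustration of that compactness (the example is mine, not from the course):

```python
# Without a dictionary: a chain of conditionals.
def month_days_verbose(month):
    if month == "feb":
        return 28
    elif month in ("apr", "jun", "sep", "nov"):
        return 30
    else:
        return 31

# With a dictionary: a single lookup says the same thing more compactly.
DAYS = {"feb": 28, "apr": 30, "jun": 30, "sep": 30, "nov": 30}

def month_days(month):
    return DAYS.get(month, 31)   # 31 is the default for every other month

assert month_days("feb") == month_days_verbose("feb") == 28
```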
I nearly always write everything twice: the first is my prototype, where I discover the important design challenges. The second version rationalises my design. (“Plan to throw one away!” said Fred Brooks in his classic book on software engineering management “The Mythical Man Month”. )
One of the best ways to acquire the right taste is reading code written by acknowledged experts. Just as with natural language, you pick up good style by reading good authors.
One important aspect of style is the use of comment lines, which are, of course, completely ignored by the computer and are only of interest to human code readers. (Please remember: if you come back to some code that you constructed even a year ago, you will probably feel that you are reading something written by a stranger. Nothing evaporates faster than understanding of a computer program.)
Bad comments are worse than no comments because they mislead the reader. (So, when you change the code you change the comments. It is essential.)
Comments that simply describe in natural language what is more precisely described by the code itself are redundant and merely add visual clutter. The code itself says how it is done; our comments should say what will be done.
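A two-line illustration of the difference (my own example):

```python
count = 0

# Redundant comment: it merely restates what the statement already says.
count = count + 1   # add one to count

# Useful comment: it records an intention the code cannot express itself.
count = count + 1   # a blank line still counts towards the line total
```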
I used to believe that the purpose of testing was eliminating errors from my code before I released it into the wild. It turned out I was wrong. About 20 years ago, I found myself on a training workshop led by Beverley Littlewood, a distinguished professor of software engineering at London’s City University, and was forced to go through the extremely painful process of changing my mind. Unfortunately, Professor Littlewood had extensive statistical evidence that supported his alternative view in a highly convincing way.
As it happens, there are organisations that develop software, such as IBM and the major telecommunications suppliers, that routinely keep very careful records of every single error that has ever occurred in their systems, before and after release. For these companies software errors can have a major impact and turn into public embarrassment, so any method of reducing errors gives them direct benefits. Hence, in a project from some years ago, they turned their databases over to Littlewood for careful statistical analysis of the effectiveness of testing.
Littlewood's conclusions seemed highly counter-intuitive to me at first - but the data did not lie. He discovered that if the system testers find, say, 100 faults prior to release, it is likely that another 100 will eventually turn up during the system’s production service. On the other hand, if the testers found 1000 prior to release, then it is likely that a further 1000 would turn up during service.
Surely, you might think, if the testers work harder and find more problems, there are therefore likely to be fewer problems in the released code?
The alternative viewpoint, promoted by Prof Littlewood, is that in any real software project there is a finite amount of resource that will be allocated to testing, debugging and fixing problems. (Most software project managers budget for it to be about 25% of total project costs, and once it starts to climb to 50% they cross their fingers and ship the product regardless. Hence the demise of the Lotus spreadsheet system and company.)
Furthermore, the nature of software errors is such that at least 50% of errors are of a kind that is just extremely difficult to find using typical testing techniques. (They might be subtle errors in design assumptions that will only get revealed in extraordinary unanticipated circumstances. If the testers are making the same wrong assumption they will not produce the tests that probe those assumptions.) Therefore, the number of errors found during the testing phase, in a reasonable length of time, is a measure of the quality of the construction process, a statistical indication of the rate at which this team has been introducing design errors into this product. Testing is therefore analogous to typical quality control on a production line, where we take a sample of the products for detailed examination (the quality people do not need to examine every single object coming off the production line to determine whether the overall quality is satisfactory).
It is, in fact, fairly easy to convince yourself that any software which is of a size required to do a useful job (say more than a few hundred lines of code) can never be tested in a way that will prove that the system is completely free from errors. You have to do that in a completely different way - using mathematics.
Every time you use a conditional statement (“if” and “while” in Python) you make a fork in the way the program execution may flow. Whether you get an error on either of these paths will depend on the state of the computer memory at the time. (You may, for example, get a divide-by-zero along one path if a certain variable has the value zero.) When you have completed an if-else block, the condition of the computer memory will be different, depending on which path you took (or else why did you have the branch-point?). That means that each time you introduce a branch point you double the number of potential memory states and the number of possible execution paths through the code. A fault is a combination of a particular memory state acting together with the specific code on a particular path, so we are doubling the number of potential faults at every branch.
Conditional statements turn up quite frequently in code, perhaps every 10 lines, so in a program of 1000 lines (a very modest code by modern standards) there may be 100 such statements, and if you executed a million test cases every second for the age of the Universe you would still be a factor of about a million short of testing every possible execution path in your short program.
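The back-of-envelope arithmetic can be checked in a few lines (taking the age of the Universe as roughly 13.8 billion years):

```python
# 100 independent branch points give 2**100 possible execution paths.
paths = 2 ** 100

# A million test cases per second, for the age of the Universe.
seconds = 13.8e9 * 365.25 * 24 * 3600
tests_run = 1e6 * seconds

shortfall = paths / tests_run
print(f"{paths:.2e} paths, {tests_run:.2e} tests: "
      f"short by a factor of {shortfall:.1e}")
```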
In fact, unless you are very systematic in the way you build test cases you would probably find that your first attempts at a suite of test cases probably did not even force every line of your program to get executed at some point. (There are tools that let you check this - most people’s unsystematic first attempts get up to perhaps only 60-70%. They are always surprised when you face them with the evidence.) Optimistic testing is one of the reasons why so much consumer facing software appears to fail so often.
High quality software testing mixes a number of different approaches, including getting inside the minds of the users, to anticipate the way they will use your product - which may not be the way you intended. (Yes, sometimes we even watch them use the stuff.) Ultimately, we want their experience of errors to be sufficiently infrequent that they will not cause inconvenience and harm. Engineering is not about perfection: it is about delivering something that is good enough at the right price and the right time.
It is hard to test your own code, because it is hard to step outside the assumptions you have made about the way the code-users will understand and apply the product. (It helps, of course, if your only user is yourself.) When things really matter we do not just get a colleague to test our code, it may even be given to an independent organisation who has a “code-breaking” mindset.
The trouble is that most of us are tempted to stop testing when we stop finding errors - but we are probably just not looking in all the right places. Well-trained professionals use specialist monitoring tools to check whether our tests have at some point exercised every line of code, and forced every conditional statement to go both ways. (There are free tools available for Python - such as coverage.py.) This is still a long way from covering every possible path through the code, but I assure you that when I look at the test cases offered by the author of a program it is fairly rare to find that they even manage to execute more than 70% of the code lines, and the coverage of conditional branching is usually much lower. The statistics are sobering - but you will meet quite a lot of consumer-facing code in App stores that is of this level of quality. It explains a lot.
So, we sometimes use other software tools that take code in at one end, examine it, and write test cases out at the other which will automatically exercise a much larger fraction of the system’s behaviour. You may be satisfied with writing a few dozen test cases; doing things this more sophisticated way may give you 50,000 tests or so.
The real challenge here, known as the “Oracle” problem, is working out whether the tests are giving the right or wrong results. If we already know all the answers, why would we need the code? In fact, you can do various things that help a lot, but surveying them would take too long here. There are plenty of books! (One technique, for example, is working things backward: quite often, particularly in maths and physics, if the software gives an answer it can be relatively easy to check that it actually solves the specific problem you defined.)
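A sketch of that backward check, using square roots as the worked example (the helper name is mine):

```python
import math

# We may not know the square root of an arbitrary number in advance,
# but we can always check a proposed answer by squaring it and
# comparing with the original input - no pre-computed oracle needed.
def looks_like_sqrt(x, answer, tolerance=1e-9):
    return abs(answer * answer - x) <= tolerance * max(1.0, abs(x))

value = 2.0
answer = math.sqrt(value)
assert looks_like_sqrt(value, answer)
assert not looks_like_sqrt(value, 1.5)   # a wrong answer fails the check
```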
The take home message: good testing is much harder than you think - even if you already think it is hard!