Testing your code

Testing using the doctest module

You should always test your code, since this is the only way to make sure that it does what you intend it to do.

There is a rich theory of how to write tests for software, but for our purposes there isn't that much to it.

What you have to think about when testing your code is just this: does the code work in typical cases, and does it work in special cases? The first part is the easy part. You have an idea about what a function should do, and if you call it with some typical input, you expect it to give you the appropriate output. You will do this kind of test as you write the function, so you don't have to think too much about it.

Testing the behaviour in special cases is trickier, because you first have to identify the special cases.

Testing different cases

In your first hand-in, you had to solve quadratic equations. There are three cases for this: when there are no real roots, when there is one, and when there are two. You could call any of these "special" or "common" cases, depending on how you look at it; the important observation is just that there are three cases. You have not tested your code until you have run it on all three.
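To make the idea concrete, here is a small sketch with one test per case. The function numberOfRoots is hypothetical: it only counts the real roots of a*x**2 + b*x + c = 0 (assuming a != 0) from the sign of the discriminant. A full solver would need the same three kinds of input to exercise all its branches.

```python
def numberOfRoots(a, b, c):
    """Hypothetical helper: how many real roots a*x**2 + b*x + c = 0 has (a != 0)."""
    d = b * b - 4 * a * c  # the discriminant decides which case we are in
    if d < 0:
        return 0
    elif d == 0:
        return 1
    else:
        return 2

# One test per case, so that every branch is run at least once.
assert numberOfRoots(1, 0, 1) == 0   # x**2 + 1 = 0 has no real roots
assert numberOfRoots(1, 2, 1) == 1   # (x + 1)**2 = 0 has one root
assert numberOfRoots(1, 0, -1) == 2  # x**2 - 1 = 0 has two roots
```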

Another example is the factorial function you wrote in one of the exercises. We want this function to compute the factorial of every non-negative integer n. When we wrote the function recursively, we identified two different cases: the base case, where we do not call the function recursively, and the recursive case, where we do.

def factorial(n):
    "Calculate the factorial for all n >= 0."
    if n == 1:
        return 1 # the base case
    else:
        return n * factorial(n - 1) # the recursive case

When we call the function, we always eventually hit the base case (if the call completes at all), so you might think that you only need to test it on a "common" case. But there is a special case that such a test will not necessarily cover, and in fact the code above is incorrect, although we won't notice if we only call it on common cases.

Can you see the problem?

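We want the function to work for all n >= 0, but if we call it with 0, it will not work: since 0 is not 1, it takes the recursive case and calls factorial(-1), then factorial(-2), and so on, never reaching the base case. In Python this ends with an error once the maximum recursion depth is exceeded (a RecursionError in Python 3, a RuntimeError in Python 2).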

Whenever you want a function to work on an interval of numbers, always make sure that you call it with the endpoints of that interval. The endpoint for n >= 0 is 0, and if you haven't called the function with 0, you don't know that it works for that special case. The function above does not!
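As a minimal sketch of this advice, the code below checks the recursive factorial both on a typical input and on the endpoint 0. The names factorial_buggy and factorial_fixed are made up for this illustration; the code is written to run under both Python 2 and Python 3.

```python
def factorial_buggy(n):
    """The version from the text: the base case only catches n == 1."""
    if n == 1:
        return 1
    return n * factorial_buggy(n - 1)

def factorial_fixed(n):
    """Same function, but the base case also covers the endpoint n == 0."""
    if n <= 1:
        return 1
    return n * factorial_fixed(n - 1)

# A typical input gives the same, correct answer in both versions.
assert factorial_buggy(4) == 24
assert factorial_fixed(4) == 24

# The endpoint n == 0 exposes the bug: the recursion goes 0, -1, -2, ...
# and never reaches n == 1, so Python eventually gives up with an error.
try:
    factorial_buggy(0)
    print("factorial_buggy(0) returned normally")
except RuntimeError:  # Python 3's RecursionError is a subclass of RuntimeError
    print("factorial_buggy(0) exceeded the maximum recursion depth")

assert factorial_fixed(0) == 1
```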

When you work with strings or lists, obvious special cases are empty strings and empty lists. If the function should work on those, make sure that it does by calling it on them.
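Here is a minimal sketch of what such a test can reveal. The function average is a made-up example, not one of the course exercises: it passes a typical test but fails on the empty list. It is up to you to decide what the function should do in that case, and then to test for the behaviour you chose.

```python
def average(L):
    """Hypothetical example: the average of a list of numbers."""
    return float(sum(L)) / len(L)

# The typical case passes.
assert average([1, 2, 3]) == 2.0

# The special case: len([]) is 0, so the division fails.
try:
    average([])
    print("empty list OK")
except ZeroDivisionError:
    print("empty list raises ZeroDivisionError")
```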

Automatic tests

A test is only worth anything if you actually run it. You might run it while you are writing the function you are testing, but if you change the function later on, you also have to make sure that all the tests you made when you first wrote it still pass.

The best way to achieve this is to have a script that runs all the tests. When you modify a function later on, you can just run the script, and all the tests will be run again. If your modifications broke any of the functionality, the tests will tell you.
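A minimal sketch of such a script, using plain assert statements; the function names test_factorial and run_all_tests are made up for this illustration. Each assert stops the script with an AssertionError if its condition is false, so reaching the final message means every check passed.

```python
def factorial(n):
    """Computes the factorial for all n >= 0."""
    if n <= 1:
        return 1
    return n * factorial(n - 1)

def test_factorial():
    # Special cases: the endpoint 0 of n >= 0, and the base case 1.
    assert factorial(0) == 1
    assert factorial(1) == 1
    # A typical case.
    assert factorial(4) == 24

def run_all_tests():
    # Add a call here for every test function you write.
    test_factorial()
    print("all tests passed")

if __name__ == "__main__":
    run_all_tests()
```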

Automating tests using the doctest module

There is a way to combine the documentation of your functions with automatic testing.

You do this by adding test cases to your documentation strings. The doctest module, which is part of Python's standard library, can then run all your tests for you automatically.

It works like this: on a separate line, you write the string ">>> " followed by the function call, and on the line below, you write the value the function should return, exactly as the interactive Python interpreter would display it. For the factorial function, it could look like this:

def factorial(n):
    """Computes the factorial for all n >= 0.

    >>> factorial(0)
    0

    >>> factorial(4)
    24
    """
    if n <= 1:
        return 1
    else:
        return n * factorial(n-1)

This documents how the function is supposed to be called and what the expected result should be, and at the same time it gives doctest something to work with.

To run the doctest tests, you need to import the module and then call the testmod() function from it.

The easiest way is to add these lines at the bottom of your file:

if __name__ == "__main__":
    import doctest
    doctest.testmod()

The if statement says that the tests should only be run when you run this particular file as a script, not when you import it as a module, and the two lines inside it then run the tests.

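Try it out on the factorial above. You will see that doctest reports a failing test: factorial(0) returns 1, but the docstring expects 0. This time the mistake is in the test, not in the function, since 0! = 1 by definition. Correct the expected value and run the file again. When every test passes, testmod() prints nothing at all; if you want to see each test reported, run the file with the -v flag, as in python yourfile.py -v.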
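With the expected value corrected to 1, the complete file might look like the sketch below. It passes verbose=True to testmod(), which makes doctest report every example rather than only the failures, and it prints the counts from the named tuple that testmod() returns (fields failed and attempted).

```python
def factorial(n):
    """Computes the factorial for all n >= 0.

    >>> factorial(0)
    1

    >>> factorial(4)
    24
    """
    if n <= 1:
        return 1
    else:
        return n * factorial(n-1)

if __name__ == "__main__":
    import doctest
    # verbose=True reports every example, not only the failing ones.
    results = doctest.testmod(verbose=True)
    # testmod() returns the number of failed and attempted examples.
    print("failed: %d, attempted: %d" % (results.failed, results.attempted))
```

Run as a script, the last line should read failed: 0, attempted: 2.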

The rest of this learning path will be on writing tests for functions you wrote in previous learning paths.

Testing your string functions

Write tests for the functions you wrote in the strings learning path. Think about which special and common cases to test, add the tests to the documentation string, and run them with doctest. If your tests fail, find out why, and change your function (or test) until you get the right behavior.

The exercises are copied in below, with example usage that should reflect some of the common cases, but usually not any special cases.

Palindromes

A palindrome is a string that is spelled the same way backwards and forwards.

Write a function, isPalindrome(s), that returns True if s is a palindrome and False otherwise.

One approach is to run through s from the first to the middle character and, for each character, check whether it is equal to the character at the same index counted from the right rather than the left. Remember that the first character of a string is at index 0 and the last at index -1, the second character is at index 1 and the second-to-last at index -2, and so forth.

Example usage:

print isPalindrome('abba')
True
print isPalindrome('foo')
False
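As one possible sketch (not the only correct solution), here is isPalindrome following the approach described above, with doctests that add the special cases the example usage leaves out: the empty string, a single character, and a string of odd length.

```python
def isPalindrome(s):
    """Return True if s reads the same backwards and forwards.

    >>> isPalindrome('abba')
    True
    >>> isPalindrome('foo')
    False
    >>> isPalindrome('')
    True
    >>> isPalindrome('a')
    True
    >>> isPalindrome('aba')
    True
    """
    # Compare character i from the left with character i from the right
    # (index -1 - i), stopping at the middle of the string.
    for i in range(len(s) // 2):
        if s[i] != s[-1 - i]:
            return False
    return True

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```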

Word count

Write a function, wordCount(s), that counts the number of words in a string, that is, the number of maximal sub-strings that contain no spaces.

Example usage:

print wordCount('foo bar')
2

Useful methods:

split()
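A sketch of wordCount with doctests for some special cases: the empty string, a string with only spaces, and extra spaces between and around words. It relies on split() with no argument, which splits on any run of whitespace (so tabs and newlines also separate words in this version) and never returns empty pieces.

```python
def wordCount(s):
    """Count the words in s, i.e. the maximal runs of non-whitespace characters.

    >>> wordCount('foo bar')
    2
    >>> wordCount('')
    0
    >>> wordCount('   ')
    0
    >>> wordCount('  foo   bar  ')
    2
    >>> wordCount('foo')
    1
    """
    # split() without an argument ignores leading, trailing and repeated
    # whitespace, so only the actual words end up in the list.
    return len(s.split())

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```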

Re-formatting text

Write a function, reformat(s), that takes a string, s, and replaces every run of whitespace (newlines, spaces, tabs) with a single space, removing any whitespace at the beginning and end, as the example shows.

Example usage:

s = """
foo
   bar
baz"""
print reformat(s)
foo bar baz

Useful methods:

split()
join()
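A sketch using split() and join(). The doctests show a detail worth knowing: doctest compares against what the interpreter would display, so a returned string appears in quotes, unlike the output of print. The docstring is a raw string (r"""...""") so that the \n and \t inside the examples reach doctest as escape sequences rather than as real line breaks and tabs.

```python
def reformat(s):
    r"""Replace every run of whitespace in s with a single space.

    >>> reformat('\nfoo\n   bar\nbaz')
    'foo bar baz'
    >>> reformat('foo\tbar')
    'foo bar'
    >>> reformat('')
    ''
    >>> reformat('   ')
    ''
    """
    # split() breaks s at any whitespace and drops empty pieces;
    # join() glues the words back together with single spaces.
    return ' '.join(s.split())

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```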

Finding overlapping occurrences

The method s.count(ss) counts the number of occurrences of the sub-string ss in s:

print "abcabc".count("abc")
2

but it does not count overlapping occurrences. So although there are two "aa" sub-strings in "aaa" (one starting at index zero and one starting at index one), it only counts one:

print "aaa".count("aa")
1

Write a function, count(s,ss), that counts the number of overlapping occurrences of ss in s.

You can get the length of a string, such as the sub-string, using the function len():

print len("aa")
2

and you can use slicing to get the sub-strings at each index from s.

print "aaa"[0:2]
aa
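A sketch that slides a window of length len(ss) across s and compares each slice with ss. The doctests include a few special cases: no occurrence, an empty s, a sub-string longer than s, and an empty ss, where this version gives the same answer as the built-in count() (an empty string is found at every position, including the end).

```python
def count(s, ss):
    """Count the occurrences of ss in s, including overlapping ones.

    >>> count('aaa', 'aa')
    2
    >>> count('abcabc', 'abc')
    2
    >>> count('abc', 'x')
    0
    >>> count('', 'a')
    0
    >>> count('ab', 'abc')
    0
    >>> count('abc', '')
    4
    """
    n = 0
    # Every index where a copy of ss could still fit inside s.
    # If ss is longer than s, the range is empty and we return 0.
    for i in range(len(s) - len(ss) + 1):
        if s[i:i + len(ss)] == ss:
            n += 1
    return n

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```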

Headers from FASTA files

Download the file utils.py.

It defines a single string, fasta_str, containing three DNA sequences in the FASTA format:

import utils
print utils.fasta_str
>foo
acttct
>bar
actcct
>baz
acttgt

Write a function, fastaHeaders(s), that extracts the headers from such a string. A header in the FASTA format is any line starting with the character >.

Example usage:

print fastaHeaders(utils.fasta_str)
>foo
>bar
>baz

Useful methods:

s.split(): the lines are separated by what is called the "newline" character, '\n', so s.split('\n') will split s into its separate lines.
s.startswith(ss)
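A sketch of fastaHeaders, assuming it should return the headers as a single string with one header per line, which is what the example output suggests. Because utils.py may not be at hand, the first doctest uses a literal string with the same lines as the printout of utils.fasta_str above; the other two cover a string without headers and the empty string.

```python
def fastaHeaders(s):
    r"""Return the header lines of a FASTA string, one per line.

    >>> print(fastaHeaders('>foo\nacttct\n>bar\nactcct\n>baz\nacttgt'))
    >foo
    >bar
    >baz
    >>> fastaHeaders('acttct')
    ''
    >>> fastaHeaders('')
    ''
    """
    # Keep only the lines that start with '>' and join them again.
    headers = [line for line in s.split('\n') if line.startswith('>')]
    return '\n'.join(headers)

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```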

Testing your list functions

Write tests for the functions you wrote in the lists learning path. Think about which special and common cases to test, add the tests to the documentation string, and run them with doctest. If your tests fail, find out why, and change your function (or test) until you get the right behavior.

The exercises are copied in below, with example usage that should reflect some of the common cases, but usually not any special cases.

Remove multiples of a number

Write a function, removeMultiples(L,n), that returns a list of the elements of the list L that are not multiples of n. Remember that you can use the % operator to check whether a number m is a multiple of n: m % n == 0 exactly when it is.

Example usage:

print removeMultiples([2, 3, 7, 5, 8, 9, 6], 3)
[2, 7, 5, 8]

Useful methods:

append()
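A sketch with doctests for a few special cases: an empty list, a list where every element is removed, and the number 0, which is a multiple of every n because 0 % n == 0.

```python
def removeMultiples(L, n):
    """Return the elements of L that are not multiples of n.

    >>> removeMultiples([2, 3, 7, 5, 8, 9, 6], 3)
    [2, 7, 5, 8]
    >>> removeMultiples([], 3)
    []
    >>> removeMultiples([3, 6, 9], 3)
    []
    >>> removeMultiples([0, 1, 2], 2)
    [1]
    """
    result = []
    for m in L:
        if m % n != 0:  # m is not a multiple of n, so keep it
            result.append(m)
    return result

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```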

Sieve of Eratosthenes

Write a function, primes(n), that returns a list of all primes p such that p <= n.

You can use the following algorithm, called the Sieve of Eratosthenes. Start with a list of all numbers from 2 up to and including n, and manipulate it as follows: take out the first number, k, and keep it in your list of primes to return later. Then remove all multiples of k from the list, since these cannot be primes. If you follow this strategy, the first element in the list will always be a prime, because you have already removed all elements that are multiples of smaller numbers. When the initial list is empty, you are done, and you can return all the elements that were, at some point, the first element in the list.

Example usage:

print primes(30)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

Useful methods:

append()
reverse()
pop()
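A sketch of the sieve as described above. It uses pop(0) to take out the first element and a list comprehension to drop the multiples of k; this is one choice among several (the listed reverse() and pop() could instead be used to reverse the list once and pop from its end, which avoids shifting the remaining elements each time). The doctests add the small values n = 0, 1 and 2.

```python
def primes(n):
    """Return all primes p with p <= n, using the Sieve of Eratosthenes.

    >>> primes(30)
    [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    >>> primes(0)
    []
    >>> primes(1)
    []
    >>> primes(2)
    [2]
    """
    candidates = list(range(2, n + 1))
    result = []
    while candidates:
        k = candidates.pop(0)  # the first element is always a prime
        result.append(k)
        # Keep only the numbers that are not multiples of k.
        candidates = [m for m in candidates if m % k != 0]
    return result

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```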

Remove duplicates

Write a function, removeDuplicates(L), that takes a list L of numbers and returns a list R with the same numbers, but if L contains the same number more than once, R should only contain it once.

You can approach this problem in two different ways, and I suggest you try both.

Sort the list. Any duplicates will now be neighbors, so you can iterate through the list and remove every element that is equal to its next neighbor.

Put the elements in as keys in a dictionary, D. Duplicates cannot appear among the keys, so D.keys() gives you the elements you want (in Python 3, use list(D.keys()) to get an actual list). Depending on your Python version, the keys may not come out in the order you put them in.

Depending on which technique you use, you might end up modifying the input list. With the sort approach, L.sort() sorts the list in place, which means that if you call the function with a list, that list will also be modified outside of the function. Using sorted(L), which returns a new sorted list, avoids this.

This might seem different from what you are used to, where changing a variable inside a function doesn't change it outside, but here something else is going on. When you assign to a variable, you make the variable's name refer to the object that is its value. Assigning to a name inside a function doesn't change what names outside the function refer to. But if two names refer to the same object, such as a list, and you modify that object in some way, like sorting the list, then the object itself has changed, and both names will see the changed object.
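A minimal sketch of the difference, using two made-up functions: reassignInside gives its parameter a new list, while sortInside sorts the list it was given.

```python
def reassignInside(L):
    # Makes the local name L refer to a new list; the caller's name is unaffected.
    L = [0]
    return L

def sortInside(L):
    # Modifies the very list object the caller passed in.
    L.sort()

A = [3, 1, 2]
reassignInside(A)
print(A)  # [3, 1, 2]: A still refers to the original, unchanged list
sortInside(A)
print(A)  # [1, 2, 3]: the shared list object itself was sorted
```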

An alternative to a dictionary for the second approach is a set, which would usually be the better choice, but since we are not covering sets in this course, you can just use a dictionary.

Example usage:

print removeDuplicates([1,2,2,1,4,2])
[1, 2, 4]

Useful methods:

append()
sort()
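Sketches of both approaches, under made-up names so they can live in the same file. The sort version uses sorted(L), so the caller's list is left alone, and a doctest checks exactly that; rather than removing elements, it builds a new list and keeps each element that differs from the last one kept, which has the same effect. The dictionary version wraps its result in sorted() in the doctest, because the exercise does not say which order the result should be in, and a test should not depend on an order that is not guaranteed.

```python
def removeDuplicatesSorted(L):
    """Remove duplicates by sorting first.

    >>> removeDuplicatesSorted([1, 2, 2, 1, 4, 2])
    [1, 2, 4]
    >>> removeDuplicatesSorted([])
    []
    >>> L = [3, 1, 3]
    >>> removeDuplicatesSorted(L)
    [1, 3]
    >>> L    # the input list is not modified
    [3, 1, 3]
    """
    S = sorted(L)  # a new list, unlike L.sort()
    R = []
    for x in S:
        # Duplicates are neighbors now, so skip x if it equals the last kept one.
        if not R or R[-1] != x:
            R.append(x)
    return R

def removeDuplicatesDict(L):
    """Remove duplicates by using the elements as dictionary keys.

    >>> sorted(removeDuplicatesDict([1, 2, 2, 1, 4, 2]))
    [1, 2, 4]
    >>> removeDuplicatesDict([])
    []
    """
    D = {}
    for x in L:
        D[x] = True  # the value is irrelevant; only the keys matter
    return list(D.keys())

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```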

Testing your dictionary functions

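Write tests for the functions you wrote in the dictionaries learning path. Think about which special and common cases to test, add the tests to the documentation string, and run them with doctest. If your tests fail, find out why, and change your function (or test) until you get the right behavior.

The exercises are copied in below, with example usage that should reflect some of the common cases, but usually not any special cases.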

Count letters

Write a function, countLetters(x), that, given a string x, returns a dictionary that maps each letter to the number of occurrences of that letter in x.

Example usage:

print countLetters('mississippi')
{'i': 4, 'p': 2, 's': 4, 'm': 1}
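A sketch of countLetters which, for simplicity, counts every character, including spaces and punctuation; whether non-letters should be counted is exactly the kind of special case your tests should settle. Note that the doctests do not compare against a displayed dictionary: the display order depends on the Python version, and recent Python 3 versions show the keys in insertion order, {'m': 1, 'i': 4, 's': 4, 'p': 2}, which differs from the example above. So the tests compare with == or sort the items first.

```python
def countLetters(x):
    """Map each character of x to the number of times it occurs.

    >>> counts = countLetters('mississippi')
    >>> counts == {'i': 4, 'p': 2, 's': 4, 'm': 1}
    True
    >>> sorted(counts.items())
    [('i', 4), ('m', 1), ('p', 2), ('s', 4)]
    >>> countLetters('')
    {}
    """
    counts = {}
    for c in x:
        # get() returns 0 the first time we see c.
        counts[c] = counts.get(c, 0) + 1
    return counts

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```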

Anagrams

Write a function, isAnagram(x, y), that takes two strings as input and returns True if x and y are anagrams and False otherwise.

Example usage:

print isAnagram('saltine', 'entails')
True
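A sketch that fits the dictionary theme by comparing letter counts, assuming the comparison is case-sensitive and that spaces count as characters; the nested helper letterCounts is a made-up name. The test isAnagram('aab', 'abb') is worth including: both strings use the same letters, so a solution that only checks which letters occur would wrongly return True. An alternative without dictionaries is sorted(x) == sorted(y).

```python
def isAnagram(x, y):
    """Return True if x and y contain exactly the same characters.

    >>> isAnagram('saltine', 'entails')
    True
    >>> isAnagram('foo', 'bar')
    False
    >>> isAnagram('', '')
    True
    >>> isAnagram('aab', 'abb')
    False
    >>> isAnagram('ab', 'abc')
    False
    """
    def letterCounts(s):
        # Map each character to how often it occurs in s.
        counts = {}
        for c in s:
            counts[c] = counts.get(c, 0) + 1
        return counts

    # Anagrams have the same characters with the same counts.
    return letterCounts(x) == letterCounts(y)

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```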