Here we will have a look at the string data type in Python and how you use it.
This page contains both some explanation of the topics we cover and some exercises for you to do. Read through the page and complete the exercises at the end. Spend any remaining time playing around with the code examples in the explanatory text. Remember that the key here is learning by doing. Type some of the code examples into your IDLE editor and try them out yourself, and be curious. Try to see what happens if you change things a bit.
Creating strings
Strings are pretty much what you would expect: strings of characters, such as words, DNA sequences or anything else you can think of as a sequence of characters.
You create a string by putting the characters in quotes, either double (") or single (') quotes.
"hello world"
'hello world'
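As a quick check that the two quote styles really give the same thing, here is a small sketch you could type into IDLE. The variable names are just for illustration. It also shows a handy consequence: using one kind of quote lets you put the other kind inside the string.

```python
# Double and single quotes create exactly the same kind of string
a = "hello world"
b = 'hello world'
print(a == b)  # True

# Using double quotes lets you put a single quote (apostrophe) inside
quote = "It's a DNA sequence"
print(quote)   # It's a DNA sequence
```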
Indexing strings
To get the i-th element (character) of a string s, you index it as s[i]. Indexing starts at zero, so s[0] is the first element, and if the string is n elements long, the last element is s[n-1].
You can index from the back by using negative numbers, so s[-1] is the last element in the string, s[-2] is the second last element and so on.
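Here is a small illustration of positive and negative indexing, using the (arbitrary) example sequence "GATTACA":

```python
s = "GATTACA"
n = len(s)        # 7
print(s[0])       # G  (first element)
print(s[n - 1])   # A  (last element)
print(s[-1])      # A  (also the last element)
print(s[-2])      # C  (second last element)
```

Notice that s[-1] and s[n-1] always refer to the same element; the negative form just saves you from computing the length.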
You cannot modify a string, though (strings are immutable), so you cannot write
s[i] = 'x'
If you try, Python raises a TypeError.
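The sketch below shows what happens if you try anyway, and one common way around it: building a new string instead of changing the old one. The workaround uses slicing (s[1:]), which is explained under Slicing strings; the example string is arbitrary.

```python
s = "GATTACA"
try:
    s[0] = 'x'           # not allowed: strings cannot be modified
except TypeError as err:
    print("Error:", err)

# Instead, build a new string from 'x' and the rest of the old string
t = 'x' + s[1:]
print(t)   # xATTACA
print(s)   # GATTACA  (the original string is unchanged)
```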
Slicing strings
You can get the sub-string that starts at index i and ends at index j-1 using something called slicing, written s[i:j]. Note that the element at index j itself is not included. This can be handy at times.
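A few slices of the same example string show how the end index works. The last three lines go slightly beyond the text above: you may leave out i (start at the beginning) or j (go to the end), and negative indexes work in slices too.

```python
s = "GATTACA"
print(s[1:4])   # ATT  (indexes 1, 2 and 3; index 4 is not included)
print(s[:3])    # GAT  (leaving out i starts at the beginning)
print(s[4:])    # ACA  (leaving out j goes all the way to the end)
print(s[-3:])   # ACA  (the last three elements)
```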
String methods
Most operations on strings are done through something called "methods", which are essentially functions bundled with the data type. They differ from functions only in the way they are called. Where you would call a function on a string like this:
function(s)
you call a method on a string like this:
s.method()
so the only difference is really that you use the "dot notation" that you have seen for modules.
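To see the two calling styles side by side, here is a small sketch using the built-in function len and the string methods upper and count (all standard Python; the example string is arbitrary):

```python
s = "gattaca"
print(len(s))         # function call: 7
print(s.upper())      # method call: GATTACA
print(s.count("a"))   # method call with an argument: 3
```

In each method call, the string the method works on is the one written before the dot.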
The clever thing about methods, as opposed to functions, is that you are explicit about which object (the string) you are manipulating, since it is part of the syntax for calling the method. Methods are an important part of something called Object Oriented Programming, a technique that revolutionized programming in the eighties and pretty much dominates programming these days. Unfortunately we do not have time in this course to cover Object Oriented Programming, so that is all I will say about it; for now, just remember that you must use this dot notation when manipulating strings with methods.
Some of the most useful string methods are:
- s.split(sep): splits up the string s into a list of strings. The split will be on occurrences of sep in s. If you do not provide sep, the split will be on whitespace.
- s.startswith(ss): tests whether s starts with the sub-string ss.
- s.join(L): creates a string from a list of strings, L, by putting s in between each element in L, so ''.join(['a','b','c']) will give you the string 'abc' and '-'.join(['a','b','c']) will give you the string 'a-b-c'.
There are plenty more, so go and see for yourself.
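The three methods above can be tried out together like this. The example strings (a whitespace-separated line and a comma-separated list of codons) are made up for illustration.

```python
line = "chr1 1000 2000 geneA"
fields = line.split()            # no sep given: split on whitespace
print(fields)                    # ['chr1', '1000', '2000', 'geneA']

text = "AGT,CCA,TTG"
codons = text.split(",")         # split on commas
print(codons)                    # ['AGT', 'CCA', 'TTG']

print(line.startswith("chr"))    # True
print(line.startswith("gene"))   # False

print("".join(codons))           # AGTCCATTG
print("-".join(codons))          # AGT-CCA-TTG
```

Note that join and split undo each other: "-".join(codons).split("-") gives back the list codons.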
Exercise
Write a function pairWiseDifferences(sequence1, sequence2) that computes the proportion of bases that differ between two DNA sequences of the same length. sequence1 and sequence2 are strings. Example usage:
print(pairWiseDifferences("AGTC", "AGTT"))
0.25
Remember that range(len(sequence1)) gives you the indexes you can use to index the strings, from 0 up to len(sequence1)-1. So you need a for loop in your function and a variable that keeps track of how many differences you have seen as you iterate through the sequences. When the loop is done, divide this count by the length of the sequences to get the proportion that the function returns.
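If you want to see the loop-and-counter pattern in action without spoiling the exercise, here is a minimal sketch of a different, hypothetical function, baseProportion(sequence, base), which computes the proportion of positions in a sequence that hold a given base. It assumes the sequence is not empty (otherwise the division fails).

```python
def baseProportion(sequence, base):
    """Hypothetical helper: proportion of positions in sequence equal to base."""
    count = 0                          # number of matches seen so far
    for i in range(len(sequence)):     # i runs over 0, 1, ..., len(sequence) - 1
        if sequence[i] == base:
            count = count + 1
    # Divide by the length to turn the count into a proportion.
    # float() avoids integer division if you happen to use Python 2.
    return float(count) / len(sequence)

print(baseProportion("AGTC", "G"))   # 0.25
```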