Brought to you by molecularsciences.org.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License.
This publication may not be redistributed without this notice.

Python

Why learn Python? Because it packs a powerful punch. Python is easy to learn, user-friendly, highly extensible and, overall, a very powerful language. Like Java, it is fully object-oriented, and it is also well suited to scripting. Python programs usually run slower than equivalent C++ code, but I find that development in Python is more rapid than in C++ or Java.

Python is free and runs on Linux, Windows and Mac OS.

Who is using Python? Google. Need I say more?

Getting Started

To install on Linux:
$ sudo yum install python
or
$ sudo apt-get install python
or
build it from source, which you can download from http://www.python.org/

To test whether the installation worked, type python in a terminal. You should see a prompt similar to the following (the version details will differ):

Python 2.4.3 (#1, Jan 21 2009, 01:10:13)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

Type 2 + 2 after the >>> prompt:

>>> 2 + 2

If you get 4 as the response, Python is installed and working properly.

To install on Windows:
Download the installer from http://www.python.org/ and double-click it to install. When installation is complete, add the installation directory to the system Path:

  1. Click on Start > Control Panel > System > Change Settings > Advanced > Environment Variables
  2. Choose Path from the bottom window (System variables)
  3. Click on Edit
  4. Append ;c:\Python26 to the end of the line (assuming Python is installed at c:\Python26)

To test the installation, open the command prompt. If you do not see Command Prompt in the list of installed programs, search for cmd or click Start > Run and type cmd.
Type python at the command prompt.
When you get a message followed by >>>, type 2+2.
If you get 4 as the response, Python is installed and working properly.

If you type python on the command line without a file name, you get the interactive >>> prompt, which runs each statement as soon as you enter it. To exit the interactive prompt, type:

quit()

For larger programs, write the code in a plain-text editor that does not add formatting, such as Notepad, EditPlus, vi, gedit or emacs, and save the file with a .py extension. Then type python followed by the file name, e.g. python myscript.py.

A few words about Python Syntax

  1. # starts a comment
  2. a value can be assigned to several variables simultaneously
  3. variables must be defined (assigned a value) before they are used
  4. the PEMDAS rule applies (operator precedence)
  5. integers are converted to float in mixed float and integer operations
  6. int(), float() and long() are conversion functions (long() exists only in Python 2)

Python Example 1

# calculates volume of a box
# simultaneous value assignment
length = height = width = 5
volume = length * height * width
print volume		# 125
# integer + float = float
print volume + 0.0	# 125.0

Python Example 2

# PEMDAS: Parentheses, Exponents, Multiplication and Division, Addition and Subtraction
print (2+2*2)/3	# 2
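PEMDAS is a simplification worth checking: multiplication and division share one precedence level, as do addition and subtraction, and operators on the same level are evaluated left to right. Exponentiation (**) is the exception and groups right to left. This small sketch (written to run under both Python 2.6+ and Python 3) checks those rules; it uses // (floor division) so the results are the same in both versions.

```python
# Multiplication and floor division share a level: evaluated left to right
left_to_right = 8 // 4 * 2        # (8 // 4) * 2 = 4, not 8 // (4 * 2) = 1
# Subtraction also groups left to right
minus_chain = 10 - 4 - 3          # (10 - 4) - 3 = 3
# Exponentiation groups right to left
power_chain = 2 ** 3 ** 2         # 2 ** (3 ** 2) = 2 ** 9 = 512
# Parentheses override the default order
forced = (2 ** 3) ** 2            # 8 ** 2 = 64

print(left_to_right)   # 4
print(minus_chain)     # 3
print(power_chain)     # 512
print(forced)          # 64
```

The // operator is used because / on two integers truncates in Python 2 (7 / 2 gives 3) but returns a float in Python 3 (7 / 2 gives 3.5).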

Python Example 3

a = 2
b = 3.0
print float(a)	# 2.0
print int(b)	# 3

Python Strings

  1. Python supports both single and double quotes
  2. backslash (\) is used to escape characters
  3. a backslash at the end of a line continues the string on the next line
  4. \n is the newline character, \t is the tab character
  5. strings surrounded with """ or ''' can span several lines and contain quotes without escaping them
  6. + concatenates strings
  7. * repeats a string
  8. string_name[i] returns a single character and string_name[i:j] returns a substring (slice)
  9. len() returns the string length
  10. the prefix u creates a Unicode string, e.g. u'text'
  11. the prefix r creates a raw string, in which backslash sequences such as \n are not interpreted

Python String Example 1

print 'my code'			# my code
print 'Mike\'s code'	# Mike's code
print "Mike's code"	# Mike's code (no escape needed inside double quotes)
# "Veni, Vidi, Vici", Julius Caesar
print '"Veni, Vidi, Vici", Julius Caesar'
# "Veni, Vidi, Vici", Julius Caesar
print "\"Veni, Vidi, Vici\", Julius Caesar"

Python String Example 2

print """
Veni, Vidi, Vici
- Julius Caesar
"I came, I saw, I conquered"
"""

Python String Example 3

s = "Dividing a very \
long line.\n\
Another line."
print s

Python String Example 4

# Julius Caesar
print 'Julius ' + 'Caesar'
# I do! I do! I do!
print 'I do! ' * 3

Python String Example 5

s = 'Veni, Vidi, Vici'
# fourth character, index starts with 0
print s[3]	# i
# characters 0 to 3 (the end index 4 is excluded)
print s[0:4]	# Veni
# from the seventh character (index 6) to the end
print s[6:]	# Vidi, Vici
# the first 4 characters
print s[:4]	# Veni
# the last character
print s[-1]	# i

Python String Example 6

s = 'Veni, Vidi, Vici'
print len(s)	# 16
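The u and r prefixes from the list above have no example of their own, so here is a small sketch (written to run under Python 2.6+ and Python 3.3+). The Windows-style path is just an illustration; it shows how a normal string turns \n and \t into single characters while a raw string keeps the backslashes.

```python
# In a normal string, \n and \t become a newline and a tab (1 character each)
normal = 'C:\new\table'
# In a raw string, backslashes are kept as ordinary characters
raw = r'C:\new\table'
# u marks a Unicode string; \u00e9 is the letter e with an acute accent
word = u'Caf\u00e9'

print(len(normal))   # 10
print(len(raw))      # 12
print(len(word))     # 4
```

Raw strings are handy for Windows paths and regular expressions, where backslashes should be taken literally.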

Python Lists

Python Lists Example 1

a = ['zero', 1, 'two', 3]
print a		# ['zero', 1, 'two', 3]
print a[-1]		# 3
print a[:2]		# ['zero', 1]
a += ['four']	# add an element to the end of the list
print a		# ['zero', 1, 'two', 3, 'four']
a[2] = 2	# change the value of an element
print a		# ['zero', 1, 2, 3, 'four']
a[:] = []	# delete all elements in the list
print a		# []

Python Lists Example 2

a = ['zero', 1, 'two', 3]
print len(a)	# 4, the number of elements in the list

Python Lists Example 3: list.count(x)

a = [1,2,2,4,2,3,4,4,'a','c','a']
print a.count(2)		# 3
print a.count(4)		# 3
print a.count('a')		# 2
print a.count('hello')	# 0
# the number 2 appears 3 times in the list
# the number 4 appears 3 times in the list
# the letter a appears twice in the list
# hello is not present in the list

Python Lists Example 4: list.insert(i,x)

a = [1,2,3,4,'a','c','a']

# insert element at index 3
a.insert(3, 'new')
print a		# [1,2,3,'new',4,'a','c','a']

# insert element at the start of the list
a.insert(0, 'nouveau')
print a		# ['nouveau',1,2,3,'new',4,'a','c','a']

# insert element at the end of the list
a.insert(len(a), 3000) 
print a		# ['nouveau',1,2,3,'new',4,'a','c','a',3000]

Python Lists Example 5: list.append(x)

# adds an element to the end of the list
a = [1,4,3,5,'a','c','a']
a.append(96)
print a		# [1,4,3,5,'a','c','a',96]

Python Lists Example 6: list.index(x)

# index() returns the index of the first item equal to x
a = [1,2,3,4,'a','c','a']
print a.index(3)	# 2
print a.index('a')	# 4
# note: only the first 'a' is found; the second one (index 6) is ignored

Python Lists Example 7: list.remove(x)

# remove() deletes the first item equal to x
a = [1,2,3,4,'a','c','a']
a.remove(2)
print a		# [1,3,4,'a','c','a']
a.remove('a')
print a		# [1,3,4,'c','a']
# note: the second 'a' is still in the list
a.remove('a')
print a		# [1,3,4,'c']

Python Lists Example 8: list.extend(list)

# appends another list to the list
a = [1,4,3,5,'a','c','a']
b = [8,9,'z']
a.extend(b)
print a		# [1,4,3,5,'a','c','a',8,9,'z']

Python Lists Example 9: list.reverse()

# reverses the list
a = [1,4,3,5,'a','c','a']
a.reverse()		
print a		# ['a','c','a',5,3,4,1]

Python Lists Example 10: list.sort()

# sorts the list in place
# (Python 2 puts numbers before strings; Python 3 raises TypeError on mixed lists)
a = [1,4,3,5,'a','c','a']
a.sort()
print a		# [1,3,4,5,'a','a','c']
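A related built-in is sorted(), which leaves the original list alone and returns a new sorted list, while list.sort() changes the list in place and returns None. A short sketch contrasting the two (numbers only, so it runs the same under Python 2.6+ and Python 3):

```python
a = [5, 1, 4, 3]

b = sorted(a)          # returns a new sorted list; a is unchanged
print(a)               # [5, 1, 4, 3]
print(b)               # [1, 3, 4, 5]

result = a.sort()      # sorts a in place and returns None
print(a)               # [1, 3, 4, 5]
print(result)          # None

a.sort(reverse=True)   # reverse=True sorts in descending order
print(a)               # [5, 4, 3, 1]
```

A common mistake is writing a = a.sort(), which replaces the list with None.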

Equivalent functions
Adding an element to the end of the list

a.insert(len(a),5)
a.append(5)
a[len(a):] = [5]

Appending a list to another

a.extend(b)
a[len(a):] = b
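A quick way to convince yourself these forms are equivalent is to run them side by side on copies of the same list. The sketch below (runs under Python 2.6+ and Python 3; the list contents are arbitrary) also shows the one form that is not equivalent: append() with a list adds the whole list as a single nested element.

```python
a = [1, 2, 3]
b = [8, 9]

# three ways to add one element to the end (list(a) makes a copy)
via_insert = list(a)
via_insert.insert(len(via_insert), 5)
via_append = list(a)
via_append.append(5)
via_slice = list(a)
via_slice[len(via_slice):] = [5]   # slice assignment needs a list on the right
print(via_insert == via_append == via_slice)   # True
print(via_append)                              # [1, 2, 3, 5]

# two ways to append a whole list
via_extend = list(a)
via_extend.extend(b)
via_slice2 = list(a)
via_slice2[len(via_slice2):] = b
print(via_extend == via_slice2)                # True
print(via_extend)                              # [1, 2, 3, 8, 9]

# append() with a list nests it instead of extending
nested = list(a)
nested.append(b)
print(nested)                                  # [1, 2, 3, [8, 9]]
```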

Python Stacks and Queues

In Python, lists work well as stacks, and queues are best built with deque from the collections module. A stack is like a tube of Pringles: the last chip placed inside is the first one taken out. This is called Last In, First Out (LIFO). A queue is like the line at a bus stop: the first person to join the line is the first person to get on the bus. This is called First In, First Out (FIFO).

Python Stacks

# list a is the stack
a = [1,2,3]
a.append(4)		# push 4 onto the stack
print a			# [1, 2, 3, 4]
print a.pop()		# 4 (last in, first out)
print a.pop()		# 3

Python Queues

from collections import deque
queue = deque([1,2,3])
queue.append(4)					# deque([1,2,3,4])
queue.append(5)					# deque([1,2,3,4,5])
print queue.popleft()			# 1
print queue						# deque([2,3,4,5])
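Why deque rather than a plain list for queues? A list can act as a queue with pop(0), but that shifts every remaining element one place to the left, so it gets slower as the list grows; deque.popleft() removes from the front without shifting. A minimal comparison (runs under Python 2.6+ and Python 3) showing that both give the same FIFO behavior:

```python
from collections import deque

# list used as a queue: works, but pop(0) moves every other element
list_queue = [1, 2, 3]
list_queue.append(4)
first_from_list = list_queue.pop(0)

# deque used as a queue: popleft() is efficient at the front
dq = deque([1, 2, 3])
dq.append(4)
first_from_deque = dq.popleft()

print(first_from_list)    # 1
print(list_queue)         # [2, 3, 4]
print(first_from_deque)   # 1
print(list(dq))           # [2, 3, 4]
```

For short lists the difference is negligible; it matters for long-running queues with many items.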

Python Conditional Statements

Python Conditional Statements Example 1

x = 1
if x > 0:
	print 'positive'

Python Conditional Statements Example 2

x = 1
if x < 0:
	print 'negative'
else:
	print 'positive'

Python Conditional Statements Example 3

x = 0
if x < 0:
	print 'negative'
elif x == 0:
	print 'zero'
else:
	print 'positive'

Python Iteration Statements

Python Iteration Statements Example 1

for i in range(10):
	print i

Python Iteration Statements Example 2

for i in [0,1,2,3,4,5,6,7,8,9]:
	print i

Python Iteration Statements Example 3

a = ['think', 'try']
for x in a:
	print x + 'ing'
# prints thinking and trying

Python Iteration Statements Example 4

# print odd numbers between 1 and 10
for i in range(10):
	if i % 2 == 0:
		continue
	print i
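The same odd numbers can be produced without continue by giving range() a start and a step: range(start, stop, step) starts at start, stops before stop and counts by step. A sketch comparing the two approaches (list() is there so it also runs under Python 3, where range() is lazy):

```python
# start at 1, stop before 10, count by 2
odd = list(range(1, 10, 2))
print(odd)                  # [1, 3, 5, 7, 9]

# the continue-based loop collects the same numbers
collected = []
for i in range(10):
    if i % 2 == 0:
        continue            # skip even numbers
    collected.append(i)
print(collected == odd)     # True
```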

Python Iteration Statements Example 5

# searches 0-9 for the number 5 and stops as soon as it is found
x = 5
for i in range(10):
	print i
	if i == x:
		print 'found it'
		break

Defining Functions in Python

Defining Python Functions Example 1

# return the maximum value
def maxx(m,n):
    if m > n:
        return m
    else: 
        return n

print maxx(12,45)	# 45

Defining Python Functions Example 2

# compute circumference
def circumference(r, pi = 3.14):
    return 2 * pi * r

# function called with both parameters
print circumference(12,3.1)    # 74.4

# function called with one parameter, pi would assume default value
print circumference(10)          # 62.8
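Arguments can also be passed by name (keyword arguments), which lets you give them in any order or override just one default. A sketch using the same circumference function, written to run under Python 2.6+ and Python 3:

```python
def circumference(r, pi=3.14):
    return 2 * pi * r

# passing arguments by name, in any order
by_name = circumference(pi=3.1, r=12)
by_position = circumference(12, 3.1)
print(by_name == by_position)      # True

# r by position, pi by name
print(circumference(10, pi=3.0))   # 60.0
```

Named arguments make calls easier to read when a function has several optional parameters.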

Applying functions to lists using filter, map and reduce

filter(function, sequence)
Applies the function to every element in the sequence and returns only the elements for which the function returns true.

def even_numbers(x):
	return x % 2 == 0

print "Even Numbers"
print filter(even_numbers, range(10,20))

output

Even Numbers
[10, 12, 14, 16, 18]

map(function, sequence)
Applies a function to every element in the sequence and returns the results of the function for each element.

def circumference(r):
	a = 2 * 3.14 * r
	return int(a)

print "Gives Circumference"
print map(circumference, range(10,20))

output

Gives Circumference
[62, 69, 75, 81, 87, 94, 100, 106, 113, 119]

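reduce(function, sequence)
Combines the items of the sequence into a single value by calling the function cumulatively: first on the first two items, then on that result and the third item, and so on.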

def adder(a,b):
	return a + b

expenses = (546,675,897,57,4,87,454)
# total expenses
print reduce(adder, expenses)

output

2720
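The examples above are Python 2 code. In Python 3, map() and filter() return iterators rather than lists, and reduce() lives in the functools module (available there since Python 2.6). The sketch below repeats the three examples in a form that runs under both versions and adds the list-comprehension equivalents that are often used instead of map() and filter().

```python
from functools import reduce   # built in on Python 2, in functools on Python 3

def even_numbers(x):
    return x % 2 == 0

def circumference(r):
    return int(2 * 3.14 * r)

def adder(a, b):
    return a + b

expenses = (546, 675, 897, 57, 4, 87, 454)

# list() turns Python 3 iterators into lists (harmless on Python 2)
evens = list(filter(even_numbers, range(10, 20)))
circles = list(map(circumference, range(10, 20)))
total = reduce(adder, expenses)

# equivalent list comprehensions
evens_lc = [x for x in range(10, 20) if even_numbers(x)]
circles_lc = [circumference(r) for r in range(10, 20)]

print(evens)     # [10, 12, 14, 16, 18]
print(circles)   # [62, 69, 75, 81, 87, 94, 100, 106, 113, 119]
print(total)     # 2720
print(evens == evens_lc and circles == circles_lc)   # True
print(sum(expenses) == total)                         # True
```

For plain addition, the built-in sum() does the same job as reduce(adder, ...) and is easier to read.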

Python Sets

A set is an unordered collection which does not allow duplicate elements.

a = [1,2,3,3,4,'yes','no']
print a
# [1, 2, 3, 3, 4, 'yes', 'no']
print set(a)
# set([1, 2, 3, 4, 'yes', 'no'])

The number 3 appears only once in the set. Testing whether an item is in a set is also easy:

a = [1,2,3,3,4,'yes','no']
s = set(a)
print 'yes' in s	# True
print 'n' in s		# False
print 2 in s		# True

Set Arithmetic

a = set('python')
b = set('php')
print b			# unique letters
# set(['p', 'h'])
print a - b		# difference
# set(['y', 't', 'o', 'n'])
print a | b		# union
# set(['p', 't', 'y', 'h', 'o', 'n'])
print a & b		# intersection
# set(['p', 'h'])
print a ^ b		# symmetric difference
# set(['y', 't', 'o', 'n'])
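Each set operator also has a method form, which some people find more readable. Because sets print in arbitrary order, this sketch (runs under Python 2.6+ and Python 3) compares the results directly and uses sorted() when it needs a predictable listing.

```python
a = set('python')
b = set('php')

# each operator has an equivalent method
print((a - b) == a.difference(b))             # True
print((a | b) == a.union(b))                  # True
print((a & b) == a.intersection(b))           # True
print((a ^ b) == a.symmetric_difference(b))   # True

# sorted() turns a set into a list in a predictable order
print(sorted(a - b))   # ['n', 'o', 't', 'y']
print(sorted(a & b))   # ['h', 'p']
```

The methods also accept any sequence, e.g. a.union('perl'), while the operators require both sides to be sets.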

Python Dictionaries

Dictionaries are called hashes in Perl and associative arrays in PHP. A dictionary is an unordered collection of key-value pairs. Lists are indexed by numbers; a dictionary is indexed by keys, which can be strings, numbers or any other immutable value.

version = {'Python': 3, 'Perl': 6}

# add a key-value pair to the dictionary
version['PHP'] = 5
print version
# {'Python': 3, 'PHP': 5, 'Perl': 6} (the order of the pairs may vary)

# remove a key-value pair
del version['Perl']
print version
# {'Python': 3, 'PHP': 5}

# print all the keys
print version.keys()
# ['Python', 'PHP']

# print all values
print version.values()
# [3, 5]

# does the key exist
print 'PHP' in version
# True

# looping through the dictionary
for k, v in version.iteritems():
	print 'I installed', k, v
# I installed Python 3
# I installed PHP 5

# enumerate() numbers the items of a sequence (here a list)
for i, v in enumerate(['Junk', 'Jan', 'Feb', 'Mar']):
	print i, v
# 0 Junk
# 1 Jan
# 2 Feb
# 3 Mar

# zip() function allows you to loop over multiple sequences
names = ['Alice', 'Bob', 'Carla']
dob = ['Jan 1, 2001', 'Feb 2, 2002', 'Mar 3, 2003']
lives = ['Australia', 'Belgium', 'Canada']
for a, b, c in zip(names, dob, lives):
	print '{0} was born on {1} in {2}'.format(a, b, c)	# str.format() needs Python 2.6 or later
# Alice was born on Jan 1, 2001 in Australia
# Bob was born on Feb 2, 2002 in Belgium
# Carla was born on Mar 3, 2003 in Canada
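zip() is also a handy way to build a dictionary from two parallel lists: dict() accepts the (key, value) pairs that zip() produces. A sketch reusing the names and countries from the example above (runs under Python 2.6+ and Python 3); it also shows get(), which returns a default instead of raising KeyError when a key is missing. The name 'Dave' is just an example of a missing key.

```python
names = ['Alice', 'Bob', 'Carla']
lives = ['Australia', 'Belgium', 'Canada']

# zip pairs the items up; dict() turns the pairs into key-value entries
home = dict(zip(names, lives))
print(home['Bob'])                    # Belgium

# dictionaries kept no particular order before Python 3.7,
# so sort the keys when the order matters
for name in sorted(home):
    print(name + ' lives in ' + home[name])
# Alice lives in Australia
# Bob lives in Belgium
# Carla lives in Canada

# get() returns the default for a missing key instead of raising KeyError
print(home.get('Dave', 'unknown'))   # unknown
```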

Working with Biological Sequences

Opening a FASTA file

fp = open('a.fasta')	# open() is preferred over file(), which exists only in Python 2
a = fp.readlines()
fp.close()
print a

output

['>gi|88853329|emb|AJ628425.1| Fasciola gigantica ITS1, isolate FgGZB2\n',
 'ACCTGAAAATCTACTCTTACACAAGCGATACACGTGTGACCGTCATGTCATGCGATAAAAATTTGCGGAC\n',
 'GGCTATGCCTGGCTCATTGAGGTCACAGCATATCCGATCACTGATGGGGTGCCTACCTGTATGATACTCC\n',
 'GATGGTATGCTTGCGTCTCTCGGGGCGCTTGTCCAAGCCAGGAGAACGGGTTGTACTGCCATGATTGGTA\n',
 'GTGCTAGGCTTAAAGAGGAGATTTGGGCTACGGCCCTGCTCCCGCCCTATGAACTGTTTCATTACTACAA\n',
 'TTACACTGTTAAAGTGGTATTGAATGGCTTGCCATTCTTTGCCATTGCCCTCGCATGCACCCGGTCCTTG\n',
 'TGGCTGGACTGCACGTACGTCGCCCGGCGGTGCCTATCCCGGGTTGGACTGATAACCTGGTCTTTGACCA\n', 'TA']

Extracting Sequence from FASTA File

# open the FASTA file - a shorter form of the previous example
a = open('a.fasta').readlines()
# join all lines except the header (the first line), then remove the newlines
seq = ''.join(a[1:])
seq = seq.replace('\n','')
print seq
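Wrapping these steps in a function makes them easy to reuse and to try without a file. In this sketch (runs under Python 2.6+ and Python 3), fasta_sequence is a hypothetical helper name, and the lines are a shortened stand-in for a.fasta: the header and the first two sequence lines from the output shown earlier. It uses strip() instead of replace('\n', '') so that the carriage returns left by Windows line endings are removed as well.

```python
def fasta_sequence(lines):
    """Return the sequence from the lines of a single-record FASTA file.

    The first line is the '>' header; every other line is sequence.
    strip() removes the line ending (including a Windows carriage return).
    """
    return ''.join(line.strip() for line in lines[1:])

# shortened stand-in for a.fasta (header plus two sequence lines)
lines = ['>gi|88853329|emb|AJ628425.1| Fasciola gigantica ITS1, isolate FgGZB2\n',
         'ACCTGAAAATCTACTCTTACACAAGCGATACACGTGTGACCGTCATGTCATGCGATAAAAATTTGCGGAC\n',
         'GGCTATGCCTGGCTCATTGAGGTCACAGCATATCCGATCACTGATGGGGTGCCTACCTGTATGATACTCC\n']

seq = fasta_sequence(lines)
print(seq[:10])    # ACCTGAAAAT
```

With a real file, the call would be fasta_sequence(open('a.fasta').readlines()).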

output

ACCTGAAAATCTACTCTTACACAAGCGATACACGTGTGACCGTCATGTCAT...CA

Extracting Sequence from a GenBank File

# read the whole file into one string
a = open('NC_001284.gbk').read()
# DNA starts a line after ORIGIN and ends a line before //
orgn = a.find('ORIGIN')
start = a.find('1', orgn)
end = a.find('//', orgn)
b = a[start:end].split('\n')
seq = ''
for i in b:
	subseq = i.split()
	seq += ''.join(subseq[1:])
print seq
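Here are the same steps as a function, run on a tiny made-up record so it can be tried without downloading NC_001284.gbk. The LOCUS line, the bases and the helper name genbank_sequence are hypothetical. Unlike the script above, this sketch skips the ORIGIN line itself rather than searching for the first '1', so it does not depend on where that character first appears. Runs under Python 2.6+ and Python 3.

```python
def genbank_sequence(text):
    """Return the DNA sequence from the ORIGIN section of a GenBank record.

    Each sequence line holds a position number followed by blocks of
    10 bases; the record ends with '//'.
    """
    start = text.find('ORIGIN')
    end = text.find('//', start)
    seq = ''
    # [1:] skips the ORIGIN line itself
    for line in text[start:end].split('\n')[1:]:
        fields = line.split()
        seq += ''.join(fields[1:])   # drop the leading position number
    return seq

# hypothetical toy record: 24 bases on two sequence lines
record = """LOCUS       TOY0001                 24 bp    DNA     linear
ORIGIN
        1 gatcctccat atacaacggt
       21 atct
//
"""

print(genbank_sequence(record))   # gatcctccatatacaacggtatct
```

GenBank stores bases in lower case; call .upper() on the result if you need upper-case output for a FASTA file.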

To save the output to a file, run the script as:

python code.py > output.txt

Exercises

  1. Extract the header of a FASTA file
  2. Extract sequence from a file containing 5 FASTA sequences
  3. Convert a GenBank sequence to a FASTA file