Opening a FASTA file
# open the file, read every line into a list, then close it
fp = open('a.fasta')
a = fp.readlines()
fp.close()
print(a)
output
['>gi|88853329|emb|AJ628425.1| Fasciola gigantica ITS1, isolate FgGZB2\n', 'ACCTGAAAATCTACTCTTACACAAGCGATACACGTGTGACCGTCATGTCATGCGATAAAAATTTGCGGAC\n', 'GGCTATGCCTGGCTCATTGAGGTCACAGCATATCCGATCACTGATGGGGTGCCTACCTGTATGATACTCC\n', 'GATGGTATGCTTGCGTCTCTCGGGGCGCTTGTCCAAGCCAGGAGAACGGGTTGTACTGCCATGATTGGTA\n', 'GTGCTAGGCTTAAAGAGGAGATTTGGGCTACGGCCCTGCTCCCGCCCTATGAACTGTTTCATTACTACAA\n', 'TTACACTGTTAAAGTGGTATTGAATGGCTTGCCATTCTTTGCCATTGCCCTCGCATGCACCCGGTCCTTG\n', 'TGGCTGGACTGCACGTACGTCGCCCGGCGGTGCCTATCCCGGGTTGGACTGATAACCTGGTCTTTGACCA\n', 'TA']
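The open/read/close sequence can also be written with a `with` block, which closes the file automatically, even if reading fails partway. So that the sketch runs on its own, it first writes a small made-up record to a temporary file. The path and the record are hypothetical stand-ins for `a.fasta`.

```python
import os
import tempfile

# Hypothetical sample record, written to a temporary file so the sketch runs anywhere
sample = ">example_record hypothetical header\nACGTACGTAC\nGGTTAA\n"
path = os.path.join(tempfile.mkdtemp(), "a.fasta")
with open(path, "w") as out:
    out.write(sample)

# 'with' closes the file when the block ends, even if an exception is raised
with open(path) as fp:
    lines = fp.readlines()

print(lines)
# ['>example_record hypothetical header\n', 'ACGTACGTAC\n', 'GGTTAA\n']
```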
Extracting Sequence from a FASTA File
# open fasta file - alternate (one-line) form of the previous example
a = open('a.fasta').readlines()
# join all lines except the first (the header), then remove the newlines
seq = ''.join(a[1:])
seq = seq.replace('\n', '')
print(seq)
output
ACCTGAAAATCTACTCTTACACAAGCGATACACGTGTGACCGTCATGTCAT...CA
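A minimal sketch of the same idea as a reusable function, assuming the file holds exactly one record. `parse_single_fasta` is a hypothetical helper name. Calling `strip()` on each line, rather than `replace('\n', '')`, also drops trailing spaces and any carriage returns (`\r`) still present, for example when lines come from a source that did not translate Windows line endings.

```python
def parse_single_fasta(lines):
    """Split the lines of a one-record FASTA file into (header, sequence).

    Assumes the first line is the '>' header and every later line is sequence.
    """
    header = lines[0].rstrip()[1:]  # drop the trailing newline and the leading '>'
    # strip() removes '\n', '\r' and surrounding spaces from each sequence line
    seq = ''.join(line.strip() for line in lines[1:])
    return header, seq


# Made-up input; the second line ends with a Windows-style '\r\n'
lines = ['>toy_seq hypothetical\n', 'ACGT\r\n', 'GGCC\n', 'TA']
header, seq = parse_single_fasta(lines)
print(header)  # toy_seq hypothetical
print(seq)     # ACGTGGCCTA
```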
Extracting Sequence from a GenBank File
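# read the whole GenBank file into one string
a = open('NC_001284.gbk').read()
# the DNA starts on the line after ORIGIN (whose first token is position 1)
# and ends on the line before //
orgn = a.find('ORIGIN')
start = a.find('1', orgn)
end = a.find('//', orgn)
b = a[start:end].split('\n')
seq = ''
for i in b:
    # each line is "<position> <10 bases> <10 bases> ..."; drop the position
    subseq = i.split()
    seq += ''.join(subseq[1:])
print(seq)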
run as:
python code.py > output.txt
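The slicing approach relies on the first `1` after `ORIGIN` being the position number of the first sequence line. A line-based alternative, sketched below under the assumption of a single-record file in the standard GenBank column layout, instead tracks whether it is inside the ORIGIN block. `genbank_sequence` is a hypothetical name, and the toy record is made up, not taken from NC_001284.

```python
def genbank_sequence(text):
    """Return the sequence stored in the ORIGIN block of one GenBank record."""
    seq_parts = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith('ORIGIN'):
            in_origin = True          # sequence lines follow this one
        elif line.startswith('//'):
            break                     # end of the record
        elif in_origin:
            fields = line.split()
            # fields[0] is the position number; the rest are blocks of bases
            seq_parts.append(''.join(fields[1:]))
    return ''.join(seq_parts)


# Hypothetical 25 bp record fragment
record = """LOCUS       TOY0001    25 bp    DNA     linear
ORIGIN
        1 gatcctccat atacaacggt
       21 caggt
//
"""
print(genbank_sequence(record))  # gatcctccatatacaacggtcaggt
```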
Exercises
- Extract the header of a FASTA file
- Extract the sequences from a file containing 5 FASTA records
- Convert a GenBank sequence to a FASTA file
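One possible starting point for the exercises, as a sketch: a generator that walks a multi-record FASTA file and yields `(header, sequence)` pairs, plus a formatter that writes a record back out as FASTA text. Both function names are hypothetical, and the default wrapping width of 70 is an assumption chosen to match the line length of `a.fasta` in the first example.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from the lines of a multi-record FASTA file."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith('>'):
            if header is not None:        # a new header closes the previous record
                yield header, ''.join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:                # emit the last record
        yield header, ''.join(parts)


def to_fasta(header, seq, width=70):
    """Format one record as FASTA text, wrapping the sequence every `width` characters."""
    body = '\n'.join(seq[i:i + width] for i in range(0, len(seq), width))
    return '>' + header + '\n' + body + '\n'


# Hypothetical three-record input
text = """>rec1 first
ACGT
TTGA
>rec2 second
GGCC
>rec3 third
AT
"""
records = list(parse_fasta(text.splitlines()))
print(records)
# [('rec1 first', 'ACGTTTGA'), ('rec2 second', 'GGCC'), ('rec3 third', 'AT')]
print(to_fasta('rec1 first', 'ACGTTTGA', width=4), end='')
# >rec1 first
# ACGT
# TTGA
```

For the GenBank exercise, the `seq` string built in the GenBank example could be passed to `to_fasta` together with a header of your choice.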