EMBOSS (The European Molecular Biology Open Software Suite) is a free, open-source software suite for handling bioinformatics problems. It contains extensive libraries for bioinformatics.
Following are the steps to install EMBOSS on a Linux system:
All EMBOSS programs run from the Unix command line, and wossname is no exception. wossname produces a list of EMBOSS applications.
> wossname
> wossname seqret
> wossname nucleotide
seqret is a program and nucleotide is a keyword. EMBOSS programs like wossname have many parameters. To see a list of parameters type:
> wossname -opt
EMBOSS can access various biological databases automatically. However, this needs to be configured. First of all, type:
$ showdb
If you get nothing in the list of databases, this document is for you. First, locate the emboss.default.template file.
$ locate emboss.default.template
Then cd to that directory and create a copy.
$ cp emboss.default.template emboss.default
Open emboss.default and uncomment the databases you want by deleting the # symbol at the beginning of the relevant lines. Then run showdb again:
$ showdb
You should see your databases now.
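To check for one particular database rather than reading the whole list, the showdb output can be filtered. The following is only a minimal sketch: it assumes the database name appears as the first column of a showdb line, and the helper name db_listed is hypothetical, not part of EMBOSS.

```shell
# db_listed NAME: read showdb-style output on stdin and succeed if NAME
# is the first whitespace-separated field of any line.
# Hypothetical helper; assumes the database name is the first column.
db_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit (found ? 0 : 1) }'
}

# Usage (requires EMBOSS to be installed):
#   showdb | db_listed swissprot && echo "swissprot is configured"
```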
dbiflat indexes a flat file database of one or more files and builds EMBL CD-ROM format index files. Major databases such as EMBL, Swiss-Prot and TrEMBL are distributed as unindexed flat files, and dbiflat can index them. The benefit of indexed flat files is that we can offer services built on data from major bioinformatics databases without having to connect to them for every query. The alternative is to install and configure SRS or MRS.
Go to your data directory inside EMBOSS.
$ cd /usr/local/share/EMBOSS/data
Create a directory called swissprot.
$ mkdir swissprot
Go inside this directory (otherwise you would need to type the full path manually at the prompts).
$ cd swissprot
Before we start, we need to download a flat file database. UniProt offers database downloads at http://www.uniprot.org/downloads. Download the text version of UniProtKB/Swiss-Prot.
$ wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
This downloads a 300MB+ file named uniprot_sprot.dat.gz. Unzip this file.
$ gunzip uniprot_sprot.dat.gz
$ ls uniprot_sprot.dat
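Before indexing, it can be worth confirming that the uncompressed file is intact. The sketch below relies on the Swiss-Prot flat file layout, in which every entry starts with an ID line and ends with a // line. The function names are hypothetical.

```shell
# count_entries FILE: print the number of entries, counted as ID lines.
count_entries() {
    grep -c '^ID ' "$1"
}

# count_terminators FILE: print the number of // end-of-entry lines.
count_terminators() {
    grep -c '^//$' "$1"
}

# Usage:
#   count_entries uniprot_sprot.dat
#   count_terminators uniprot_sprot.dat
```

If the two numbers differ, the file is probably truncated and should be downloaded again. The entry count should also agree with the total that dbiflat reports after indexing.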
Now that we have acquired the flat file database, we index it with dbiflat:
$ dbiflat
Index a flat file database
Database name: swissprot
EMBL : EMBL
SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
GB : Genbank, DDBJ
REFSEQ : Refseq
Entry format [SWISS]:
Database directory [.]:
Wildcard database filename [*.dat]: uniprot_sprot.dat
Release number [0.0]:
Index date [00/00/00]: 16/01/09
General log output file [outfile.dbiflat]:
Hit return at the empty prompts above to accept the default values. Once the indexing is complete, check the log file as follows to verify the values.
[root@hunza swissprot]# cat outfile.dbiflat
########################################
# Program: dbiflat
# Rundate: Fri 16 Jan 2009 13:34:06
# Dbname: swissprot
# Release: 0.0
# Date: 16/01/09
# CurrentDirectory: /usr/local/share/EMBOSS/data/swissprot/
# IndexDirectory: ./
# IndexDirectoryPath: /usr/local/share/EMBOSS/data/swissprot/
# Maxindex: 0
# Fields: 2
# Field 1: id
# Field 2: acc
# Directory: ./
# DirectoryPath: /usr/local/share/EMBOSS/data/swissprot/
# Filenames: uniprot_sprot.dat
# Exclude:
# Files: 1
# File 1: ./uniprot_sprot.dat
########################################
# Commandline: dbiflat
# -dbname swissprot
# -filenames uniprot_sprot.dat
# -date 16/01/09
########################################
filename: 'uniprot_sprot.dat'
id: 405506
acc: 554076
Index acc: maxlen 6 items 539564
Total 1 files 405506 entries (0 duplicates)
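The Commandline section of the log records the qualifiers that correspond to the answers typed at the prompts, so the same indexing run can be scripted. This is a sketch only: it assumes dbiflat is on the PATH and adds the general EMBOSS -auto qualifier to suppress prompting. The wrapper name index_swissprot and the DRY_RUN variable are hypothetical.

```shell
# index_swissprot DATE: run dbiflat in the current directory with the same
# qualifiers that appear in the log, accepting defaults for everything else.
# Set DRY_RUN to any non-empty value to print the command instead of running it.
index_swissprot() {
    set -- dbiflat -dbname swissprot -filenames uniprot_sprot.dat \
        -date "$1" -auto
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage (from inside the swissprot directory):
#   index_swissprot 16/01/09
```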
Our new indexed database will not be visible to other EMBOSS programs until it is defined in the emboss.default file. Open this file for editing and look for the DB swissprot [...] block. If it exists, replace it with the following. If it doesn't exist, add the following.
##########################################################################
# SWISSPROT indexed with dbiflat
##########################################################################
# SWISSPROT: Set the directory to where the database is stored
# Assumed the dbiflat index files are in the same directory
DB swissprot [
type: P
comment: "SWISSPROT sequences"
method: emblcd
format: swiss
dbalias: swissprot
dir: /usr/local/share/EMBOSS/data/swissprot/
file: uniprot_sprot.dat
]
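The "add it if it doesn't exist" case can also be scripted. The sketch below appends the block only when no DB swissprot definition is present and otherwise leaves the file alone, so an existing block still has to be replaced by hand. The function name add_swissprot_db is hypothetical, and the path to emboss.default depends on your installation.

```shell
# add_swissprot_db CONFIG: append the DB swissprot block to CONFIG unless
# a "DB swissprot [" line is already present.
add_swissprot_db() {
    if grep -q '^DB swissprot[[:space:]]*\[' "$1"; then
        echo "DB swissprot already defined in $1; edit it by hand" >&2
        return 1
    fi
    cat >> "$1" <<'EOF'
DB swissprot [
type: P
comment: "SWISSPROT sequences"
method: emblcd
format: swiss
dbalias: swissprot
dir: /usr/local/share/EMBOSS/data/swissprot/
file: uniprot_sprot.dat
]
EOF
}

# Usage:
#   add_swissprot_db /path/to/emboss.default
```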
Save the emboss.default file and try the following example to verify that your database is working.
$ seqret
Reads and writes (returns) sequences
Input (gapped) sequence(s): swissprot:p11217
output sequence(s) [pygm_human.fasta]:
$ ls pygm_human.fasta
$ cat pygm_human.fasta
>PYGM_HUMAN P11217 RecName: Full=Glycogen phosphorylase, muscle form; EC=2.4.1.1; AltName: Full=Myophosphorylase;
MSRPLSDQEKRKQISVRGLAGVENVTELKKNFNRHLHFTLVKDRNVATPRDYYFALAHTV
RDHLVGRWIRTQQHYYEKDPKRIYYLSLEFYMGRTLQNTMVNLALENACDEATYQLGLDM
EELEEIEEDAGLGNGGLGRLAACFLDSMATLGLAAYGYGIRYEFGIFNQKISGGWQMEEA
DDWLRYGNPWEKARPEFTLPVHFYGHVEHTSQGAKWVDTQVVLAMPYDTPVPGYRNNVVN
TMRLWSAKAPNDFNLKDFNVGGYIQAVLDRNLAENISRVLYPNDNFFEGKELRLKQEYFV
VAATLQDIIRRFKSSKFGCRDPVRTNFDAFPDKVAIQLNDTHPSLAIPELMRILVDLERM
DWDKAWDVTVRTCAYTNHTVLPEALERWPVHLLETLLPRHLQIIYEINQRFLNRVAAAFP
GDVDRLRRMSLVEEGAVKRINMAHLCIAGSHAVNGVARIHSEILKKTIFKDFYELEPHKF
QNKTNGITPRRWLVLCNPGLAEVIAERIGEDFISDLDQLRKLLSFVDDEAFIRDVAKVKQ
ENKLKFAAYLEREYKVHINPNSLFDIQVKRIHEYKRQLLNCLHVITLYNRIKREPNKFFV
PRTVMIGGKAAPGYHMAKMIIRLVTAIGDVVNHDPAVGDRLRVIFLENYRVSLAEKVIPA
ADLSEQISTAGTEASGTGNMKFMLNGALTIGTMDGANVEMAEEAGEENFFIFGMRVEDVD
KLDQRGYNAQEYYDRIPELRQVIEQLSSGFFSPKQPDLFKDIVNMLMHHDRFKVFADYED
YIKCQEKVSALYKNPREWTRMVIRNIATSGKFSSDRTIAQYAREIWGVEPSRQRLPAPDE
AI
seqret was used to query for the protein with accession P11217. The entry was retrieved and saved to the file pygm_human.fasta.
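A quick way to confirm that the retrieved file holds a complete sequence is to count its residues and compare the number with the length given on the entry's UniProt page. This is a minimal sketch, assuming a FASTA file in which header lines start with > and every other line is sequence. The function name fasta_length is hypothetical.

```shell
# fasta_length FILE: print the total number of residues in a FASTA file
# (all records combined), ignoring header lines and whitespace.
fasta_length() {
    awk '!/^>/ { gsub(/[[:space:]]/, ""); n += length($0) }
         END   { print n + 0 }' "$1"
}

# Usage:
#   fasta_length pygm_human.fasta
```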