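EMBOSS is a collection of command-line tools for processing nucleotide and protein sequences. It includes many small programs for sequence extraction, searching, alignment and simple sequence analysis. Each one is a standalone command-line tool, so they are lightweight and efficient, and they spare you the overhead of GUI programs or of writing R/Python code.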
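The tools can be run interactively, in which case each program prompts for its inputs, or with all parameters given on the command line (command infile outfile -para value).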
1. Write and show sequence
1.1. Write sequence
revseq
: reverse and complement a sequence; switch either step off with -[no]reverse and -[no]complement
showseq
: display a sequence in a chosen layout, e.g. showseq infile outfile -format 3
seqret
: read and write (return) sequences in a standard format
newseq
: write a new sequence to a file in FASTA format
pasteseq
: insert one sequence into another at a given position
seqcount
: read and count sequences
makenucseq
: create random nucleotide sequences
makeprotseq
: create random protein sequences
1.2. Sequence retrieval
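seqret
: extract a subrange, e.g. seqret -sequence x.fa -sbegin 1 -send 100 # cannot pull out a region of a specific chromosome from a genome file
seqretsplit
: read sequences and write each one to its own file
extractseq
: extract one or more regions, e.g. extractseq -sequence x.fa -regions 1-10,20-30 # joins the fragments into one sequence
samtools
: (not part of EMBOSS) samtools faidx hg38.fa chr20:1-100 # probably the best way to extract any region from any chromosome
nthseq
: write a single sequence, chosen by its position, from a multi-sequence input
nthseqset
: read one set of sequences out of many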
1.3. Sequence searching
dreg
: search a DNA sequence with a regular-expression pattern
preg
: search a protein sequence with a regular-expression pattern
fuzznuc
: search a DNA sequence with a PROSITE-style pattern
fuzzpro
: search a protein sequence with a PROSITE-style pattern
fuzztran
: search translated DNA sequences with a PROSITE-style protein pattern
wordfinder
: match large sequences against one or more other sequences
wordcount
: count and extract unique words in molecular sequence(s)
wordmatch
: find regions of identity (exact matches) between two sequences
2. Multiple sequences
2.1. Sequence Combination
merger|megamerger
: merge two overlapping sequences
diffseq
: compare and report features of two similar sequences
union
: concatenate multiple sequences into a single sequence
2.2. Pairwise alignment
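dotmatcher
: draw a threshold dotplot of two sequences
dottup
: draw a word-match dotplot of two sequences
dotpath
: draw a non-overlapping word-match dotplot of two sequences
needle
: global alignment
water
: local alignment
matcher|supermatcher
: local alignment, for large sequences where water is too slow
wordmatch
: common words (exact matches shared by two sequences)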
2.3. Multiple alignment
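polydot
: draw dotplots of all-against-all comparisons of a set of sequences
emma
: interface to ClustalW
infoalign
: display basic information about a multiple sequence alignment
cons
: create a consensus sequence from a multiple alignment
showalign
: display a multiple sequence alignment in pretty format
prettyplot
: draw a sequence alignment with pretty formatting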
3. Sequence analysis
3.1. Statistics
infoseq
: display basic information about sequences
compseq
: calculate the composition of unique words in sequences
wordcount
: count and extract unique words in molecular sequence(s)
freak
: generate a residue/base frequency table or plot
cpgplot
: identify and plot CpG islands in nucleotide sequence(s)
3.2. Feature parsing
extractfeat
: extract features from sequence(s)
showfeat
: display features of a sequence in pretty format
coderet
: extract CDS, mRNA and translations from feature tables
showorf
: show the ORFs of a nucleotide sequence
plotorf
: plot ORFs
marscan
: find matrix/scaffold recognition (MRS) signatures in DNA sequences
3.3. Repeat finding
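einverted
: find inverted repeats
palindrome
: find palindromes
etandem
: find tandem repeats in a nucleotide sequence
equicktandem
: quickly find tandem repeats in nucleotide sequences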
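3.4. Restriction sites and vector map
restrict
: report restriction enzyme cleavage sites in a nucleotide sequence
remap
: display restriction enzyme binding sites in a nucleotide sequence
restover
: find restriction enzymes that produce a specific overhang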
3.5. siRNA and primer design
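sirna
: find siRNA duplexes in mRNA
eprimer3
: pick PCR primers and hybridization oligos (interface to Primer3)
primersearch
: search DNA sequences for matches with primer pairs
stssearch
: search a DNA database for matches with a set of STS primers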
3.6. Motif finding
patmatmotifs
: scan a protein sequence with motifs from the PROSITE database
3.7. Translation
transeq
: translate DNA sequences into protein
prettyseq
: write a nucleotide sequence and its translation to a file
backtranseq
: back-translate a protein sequence into DNA
sixpack
: display a DNA sequence with its six-frame translation
3.8. Sequence modification
msbar
: mutate a sequence
maskseq
: mask regions of a sequence
4. Nucleotide properties
dan
: calculate nucleic acid melting temperature
banana
: plot bending and curvature data for B-DNA
btwisted
: calculate the twisting in a B-DNA sequence
5. Protein properties
pepstats
: calculate statistics of protein properties
pepinfo
: plot amino acid properties of a protein sequence in parallel
charge
: draw a protein charge plot
iep
: calculate the isoelectric point of proteins
octanol
: draw a White-Wimley protein hydropathy plot
tmap
: predict and plot transmembrane segments
pepwindow
: draw a hydropathy plot for a protein sequence
pepwheel
: draw a helical wheel diagram for a protein sequence
pepnet
: draw a helical net for a protein sequence
hmoment
: calculate and plot hydrophobic moment for protein sequence(s)
garnier
: predict protein secondary structure using the GOR method
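6. Other tools
nohtml
: remove HTML markup from a text file
notab
: replace tabs with spaces
nospace
: remove whitespace from a text file
noreturn
: remove carriage returns from text files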