EMBOSS: a collection of small tools for sequence processing

2016-10-02

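EMBOSS is a collection of command-line tools for processing nucleotide and protein sequences. It includes a number of small programs for sequence extraction, searching, alignment, and simple sequence analysis. Each one is a standalone command-line tool, so they are compact and efficient to use, and they avoid the overhead of GUI programs and of programming environments such as R/Python.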

The tools can be used interactively, or run with parameters (command infile outfile -para value).
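
As a rough sketch of the two non-interactive calling styles, the lines below run seqret once with named qualifiers and once with positional arguments. The file names and the tiny demo sequence are made up for illustration; -auto is the EMBOSS global qualifier that turns off prompting. Running a program with no arguments at all starts the interactive prompts instead.

```shell
# Scratch directory and a made-up one-record FASTA file (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>demo\nAACCGGTT\n' > demo.fa

if command -v seqret >/dev/null 2>&1; then
    # Named qualifiers: explicit input and output.
    seqret -sequence demo.fa -outseq named.fa -auto
    # Positional form: first the input, then the output.
    seqret demo.fa positional.fa -auto
    cat named.fa positional.fa
else
    echo "seqret not found; install EMBOSS to run this sketch" >&2
fi
```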

1. Write and show sequence

1.1. Write sequence

revseq: reverse and complement sequence -[no]reverse -[no]complement
showseq: display sequence showseq infile outfile -format 3
seqret: read and write (return) sequences in a standard format
newseq: write new sequence in fasta format
pasteseq: insert a sequence into another at a certain position
seqcount: read and count sequences
makenucseq: create random nucleotide sequence
makeprotseq: create random protein sequence
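
A small sketch of revseq and its -[no]reverse / -[no]complement switches, assuming EMBOSS is on the PATH. The input sequence and file names are invented for the example.

```shell
# Scratch directory and a made-up input sequence (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>demo\nAACCGGT\n' > demo.fa

if command -v revseq >/dev/null 2>&1; then
    # Default behaviour: reverse and complement.
    revseq -sequence demo.fa -outseq demo.revcomp.fa -auto
    # Reverse only, keeping the original bases.
    revseq -sequence demo.fa -outseq demo.rev.fa -nocomplement -auto
    cat demo.revcomp.fa demo.rev.fa
else
    echo "revseq not found; install EMBOSS to run this sketch" >&2
fi
```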

1.2. Sequence retrieval

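seqret -sequence x.fa -sbegin 1 -send 100 # the same range is applied to every sequence in x.fa, so this alone cannot pull a region from one chromosome of a genome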
seqretsplit: read sequences and write them to individual files
extractseq -sequence x.fa -region 1-10,20-30 # joins the listed regions into one sequence
samtools faidx hg38.fa chr20:1-100 # probably the best way to extract any region from any chromosome
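nthseq: write a single sequence from an input stream of sequences
nthseqset: write one set of sequences from an input of many sets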
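
To make the region-joining behaviour of extractseq concrete, here is a minimal sketch on a made-up 10-base sequence. The file names are illustrative, and the qualifier is spelled out in full as -regions.

```shell
# Scratch directory and a made-up input sequence (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>demo\nAAACCGGGTT\n' > demo.fa

if command -v extractseq >/dev/null 2>&1; then
    # Take bases 1-3 and 8-10; the two pieces are written as one joined sequence.
    extractseq -sequence demo.fa -regions '1-3,8-10' -outseq demo.sub.fa -auto
    cat demo.sub.fa
else
    echo "extractseq not found; install EMBOSS to run this sketch" >&2
fi
```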

1.3. Sequence searching

dreg: search dna sequence with regular expression pattern
preg: search protein sequence with regular expression pattern
fuzznuc: search dna sequence with PROSITE style pattern
fuzzpro: search protein sequence with PROSITE style pattern
fuzztran: search translated nucleotide sequence with PROSITE style protein pattern
wordfinder: match large sequences against one or more other sequences
wordcount: count and extract unique words in molecular sequence(s)
wordmatch: find regions of identity (exact matches) of two sequences
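
The two search styles can be compared on a toy sequence. This is only a sketch: the sequence, the patterns and the output file names are made up, and the report layout depends on the EMBOSS version.

```shell
# Scratch directory and a made-up input sequence (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>demo\nCCGAATTCGGGATTTCAA\n' > demo.fa

if command -v dreg >/dev/null 2>&1 && command -v fuzznuc >/dev/null 2>&1; then
    # Regular expression: GA, then A or T, then TTC.
    dreg -sequence demo.fa -pattern 'GA[AT]TTC' -outfile demo.dreg -auto
    # PROSITE style pattern: here just the literal EcoRI site.
    fuzznuc -sequence demo.fa -pattern 'GAATTC' -outfile demo.fuzznuc -auto
    cat demo.dreg demo.fuzznuc
else
    echo "dreg/fuzznuc not found; install EMBOSS to run this sketch" >&2
fi
```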

2. Multiple sequences

2.1. Sequence combination

merger|megamerger: merge two overlapping sequences
diffseq: compare and report features of two similar sequences
union: concatenate multiple sequences into a single sequence
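
For example, union can be tried on a made-up two-record FASTA file. The file names and sequences below are assumptions for illustration only.

```shell
# Scratch directory and a made-up two-record FASTA file (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>a\nAAAA\n>b\nCCCC\n' > parts.fa

if command -v union >/dev/null 2>&1; then
    # Concatenate all records of parts.fa, in file order, into one sequence.
    union -sequence parts.fa -outseq joined.fa -auto
    cat joined.fa
else
    echo "union not found; install EMBOSS to run this sketch" >&2
fi
```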

2.2. Pairwise alignment

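dotmatcher: draw a threshold dotplot of two sequences
dottup: display a wordmatch dotplot of two sequences
dotpath: draw a non-overlapping wordmatch dotplot of two sequences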
needle: global alignment
water: local alignment
matcher|supermatcher: local alignment for large sequences where water is too slow
wordmatch: find regions of identity (exact word matches) between two sequences
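
A minimal sketch of a global and a local alignment of the same pair of sequences. The two short sequences and the file names are made up; the gap penalties shown (10 and 0.5) are the values the programs normally offer as defaults for nucleotide input.

```shell
# Scratch directory and two made-up sequences (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>a\nACGTACGTTTGACCA\n' > a.fa
printf '>b\nTTACGTACGATGACC\n' > b.fa

if command -v needle >/dev/null 2>&1 && command -v water >/dev/null 2>&1; then
    # Global (Needleman-Wunsch) alignment over the full length of both sequences.
    needle -asequence a.fa -bsequence b.fa -gapopen 10 -gapextend 0.5 -outfile ab.needle -auto
    # Local (Smith-Waterman) alignment of the best matching region only.
    water -asequence a.fa -bsequence b.fa -gapopen 10 -gapextend 0.5 -outfile ab.water -auto
    cat ab.needle ab.water
else
    echo "needle/water not found; install EMBOSS to run this sketch" >&2
fi
```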

2.3. Multiple alignment

polydot: dotplot among sequences
emma: interface to ClustalW
infoalign: show information of alignment result
cons: show consensus
showalign: display alignment result
prettyplot: draw a sequence alignment with pretty formatting
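
The alignment helpers take an already aligned file as input. A rough sketch, assuming a small hand-made gapped FASTA alignment (sequences and file names are invented):

```shell
# Scratch directory and a made-up three-sequence alignment in gapped FASTA (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>s1\nACGT-ACGT\n>s2\nACGTTACGT\n>s3\nACGT-ACGA\n' > aln.fa

if command -v cons >/dev/null 2>&1 && command -v infoalign >/dev/null 2>&1; then
    # Consensus sequence of the alignment.
    cons -sequence aln.fa -outseq aln.cons.fa -auto
    # Per-sequence summary of the alignment (lengths, gaps, differences).
    infoalign -sequence aln.fa -outfile aln.info -auto
    cat aln.cons.fa aln.info
else
    echo "cons/infoalign not found; install EMBOSS to run this sketch" >&2
fi
```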

3. Sequence analysis

3.1. Statistics

infoseq: display basic information about sequences
compseq: calculate the composition of unique words in sequences
wordcount: count and extract unique words in molecular sequence(s)
freak: generate residue/base frequency table or plot
cpgplot: identify and plot CpG islands in nucleotide sequence(s)
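
A quick sketch of the two most basic statistics tools on a made-up sequence. The file names are illustrative; -only restricts infoseq to the columns that are listed after it.

```shell
# Scratch directory and a made-up input sequence (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>demo\nAACCGGTTAACCGGTT\n' > demo.fa

if command -v infoseq >/dev/null 2>&1 && command -v compseq >/dev/null 2>&1; then
    # Name, length and GC percentage only.
    infoseq -sequence demo.fa -only -name -length -pgc -outfile demo.info -auto
    # Composition of 2-letter words (dinucleotides).
    compseq -sequence demo.fa -word 2 -outfile demo.comp -auto
    cat demo.info demo.comp
else
    echo "infoseq/compseq not found; install EMBOSS to run this sketch" >&2
fi
```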

3.2. Feature parsing

extractfeat: extract features from sequence(s)
showfeat: display features of a sequence in pretty format
coderet: extract CDS, mRNA and translations from feature tables
showorf: show ORF of a nucleotide sequence
plotorf: plot ORF
marscan: find matrix/scaffold recognition (MRS) signatures in DNA sequences

3.3. Repeat finding

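einverted: find inverted repeats in nucleotide sequences
palindrome: find inverted repeats (palindromes) in nucleotide sequences
etandem: find tandem repeats in a nucleotide sequence
equicktandem: quickly scan nucleotide sequences for tandem repeats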

3.4. Restriction sites and vector map

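restrict: report restriction enzyme cleavage sites in a nucleotide sequence
remap: display restriction enzyme binding sites in a nucleotide sequence
restover: find restriction enzymes producing a specific overhang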

3.5. siRNA and primer design

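sirna: find siRNA duplexes in mRNA
eprimer3: pick PCR primers and hybridization oligos
primersearch: search DNA sequences for matches with primer pairs
stssearch: search DNA sequences for matches with a set of STS primers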

3.6. Motif finding

patmatmotifs: scan a protein sequence with motifs from the PROSITE database

3.7. Translation

transeq: translate DNA into protein in a fasta file
prettyseq: write a nucleotide sequence and its translation to file
backtranseq: back-translate a protein sequence to a nucleotide sequence
sixpack: display a DNA sequence with its six-frame translation and ORFs
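
A minimal translation sketch, assuming a made-up three-codon sequence and illustrative file names. -frame selects the reading frame; the stop codon is kept in the output as "*".

```shell
# Scratch directory and a made-up coding sequence: ATG GCC TAA (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>cds\nATGGCCTAA\n' > cds.fa

if command -v transeq >/dev/null 2>&1; then
    # Translate forward frame 1 and write the protein as FASTA.
    transeq -sequence cds.fa -outseq cds.pep -frame 1 -auto
    cat cds.pep
else
    echo "transeq not found; install EMBOSS to run this sketch" >&2
fi
```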

3.8. Change sequence

msbar: mutate sequence
maskseq: mask sequence
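
As a sketch of masking, the lines below replace two bases of a made-up sequence with N. The region, the mask character and the file names are chosen only for illustration.

```shell
# Scratch directory and a made-up input sequence (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>demo\nAACCGGTT\n' > demo.fa

if command -v maskseq >/dev/null 2>&1; then
    # Mask positions 3-4 with the character N.
    maskseq -sequence demo.fa -regions '3-4' -maskchar N -outseq demo.masked.fa -auto
    cat demo.masked.fa
else
    echo "maskseq not found; install EMBOSS to run this sketch" >&2
fi
```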

4. Nucleotide property

dan: calculate nucleic acid melting temperature
banana: plot bending and curvature data for B-DNA
btwisted: calculate the twisting in a B-DNA sequence

5. Protein property

pepstats: calculate statistics of protein properties
pepinfo: plot amino acid properties of a protein sequence in parallel
charge: draw a protein charge plot
iep: calculate the isoelectric point of proteins
octanol: White-Wimley protein hydropathy plot
tmap: predict and plot transmembrane segments
pepwindow: draw a hydropathy plot for a protein sequence
pepwheel: draw a helical wheel diagram for a protein sequence
pepnet: draw a helical net for a protein sequence
hmoment: calculate and plot hydrophobic moment for protein sequence(s)
garnier: predict protein secondary structure using GOR method
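
The text-output protein tools can be tried on any short peptide. A rough sketch, where the peptide and the file names are invented for the example:

```shell
# Scratch directory and a made-up peptide (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>pep\nMKTAYIAKQRQISFVKSHFSRQ\n' > pep.fa

if command -v pepstats >/dev/null 2>&1 && command -v iep >/dev/null 2>&1; then
    # General statistics: molecular weight, residue counts, charge and so on.
    pepstats -sequence pep.fa -outfile pep.pepstats -auto
    # Isoelectric point and charge as a function of pH.
    iep -sequence pep.fa -outfile pep.iep -auto
    cat pep.pepstats pep.iep
else
    echo "pepstats/iep not found; install EMBOSS to run this sketch" >&2
fi
```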

6. Other tools

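nohtml: remove mark-up (e.g. HTML tags) from an ASCII text file
notab: replace tabs with spaces in an ASCII text file
nospace: remove whitespace from an ASCII text file
noreturn: remove carriage returns from an ASCII text file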