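EMBOSS is a collection of command-line tools for working with nucleotide and protein sequences. It includes small programs for sequence extraction, searching, alignment, and simple sequence analysis. Each one is a standalone command-line tool, so they are lightweight and efficient, and they avoid the overhead of GUI programs and of programming environments such as R/Python.
The tools can be run interactively or with arguments (command infile outfile -para value).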
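The two invocation styles can be sketched as follows, assuming EMBOSS is installed and on the PATH. The file names `demo_usage.fa` and `demo_usage_rev.fa` and the sequences in them are made up for illustration; `-auto` is the general EMBOSS qualifier that turns off prompting.

```shell
#!/bin/sh
# Create a tiny two-record FASTA file to experiment with (hypothetical data).
cat > demo_usage.fa <<'EOF'
>seq1
ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG
>seq2
ATGAAACGCATTAGCACCACCATTACCACCACCATCACC
EOF

if command -v revseq >/dev/null 2>&1; then
    # Parameterised use: input and output are given positionally,
    # qualifiers follow. -auto suppresses prompts for unspecified values.
    revseq demo_usage.fa demo_usage_rev.fa -auto
    # Interactive use would be just `revseq`; the program then asks
    # for the input sequence and the output file in turn.
else
    echo "EMBOSS not found on PATH; skipping the revseq call" >&2
fi
```

Running a tool with `-help` lists the qualifiers it accepts, which is a quick way to check a flag before relying on it.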
1. Write and show sequence
1.1. Write sequence
revseq: reverse and complement sequence -[no]reverse -[no]complement
showseq: display sequence showseq infile outfile -format 3
seqret: read and write (return) sequences in a standard format
newseq: write new sequence in fasta format
pasteseq: insert a sequence into another at a certain position
seqcount:
makenucseq: create random nucleotide sequence
makeprotseq: create random protein sequence
1.2. Sequence retrieval
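seqret -sequence x.fa -sbegin 1 -send 100 # this form cannot pull a sequence from one particular chromosome out of a genome file
seqretsplit:
extractseq -sequence x.fa -region 1-10,20-30 # joins the listed fragments together
samtools faidx hg38.fa chr20:1-100 # probably the best way to extract any fragment from any chromosome
nthseq:
nthseqset: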
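The retrieval commands above can be tried on a small test file before pointing them at a genome. This is a minimal sketch that assumes EMBOSS and samtools are installed; `demo_region.fa` and the output names are hypothetical. It spells the extractseq qualifier out as `-regions`, which is the full name as far as I recall (EMBOSS also accepts unambiguous abbreviations of qualifier names).

```shell
#!/bin/sh
# One-record FASTA file with the sequence on a single line (hypothetical data).
cat > demo_region.fa <<'EOF'
>seq1
ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG
EOF

if command -v seqret >/dev/null 2>&1; then
    # Bases 1-10 of every sequence in the file.
    seqret -sequence demo_region.fa -sbegin 1 -send 10 -outseq first10.fa -auto
    # Bases 1-10 and 20-30, joined into a single output sequence.
    extractseq -sequence demo_region.fa -regions 1-10,20-30 -outseq joined.fa -auto
else
    echo "EMBOSS not found on PATH; skipping seqret/extractseq" >&2
fi

if command -v samtools >/dev/null 2>&1; then
    # faidx builds demo_region.fa.fai on first use, then fetches name:start-end.
    samtools faidx demo_region.fa seq1:1-10 > sam10.fa
else
    echo "samtools not found on PATH; skipping faidx" >&2
fi
```

The samtools form selects the record by name, which is why it scales to multi-chromosome files such as hg38.fa.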
1.3. Sequence searching
dreg: search DNA sequence with regular expression pattern
preg: search protein sequence with regular expression pattern
fuzznuc: search DNA sequence with PROSITE style pattern
fuzzpro: search protein sequence with PROSITE style pattern
fuzztran:
wordfinder: match large sequences against one or more other sequences
wordcount: count and extract unique words in molecular sequence(s)
wordmatch: find regions of identity (exact matches) of two sequences
2. Multiple sequences
2.1. Sequence Combination
merger|megamerger: merge two overlapping sequences
diffseq: compare and report features of two similar sequences
union: concatenate multiple sequences into a single sequence
2.2. Pairwise alignment
dotmatcher
dottup
dotpath
needle: global alignment
water: local alignment
matcher|supermatcher: local alignment for large sequences where water is too slow
wordmatch: common words
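A minimal sketch of running the global and local aligners on the same pair of sequences, assuming EMBOSS is installed. The input files and sequences are invented for illustration, and the gap penalties shown (10 and 0.5) are simply example values passed explicitly so that nothing is prompted for.

```shell
#!/bin/sh
# Two short, related DNA sequences (hypothetical data).
cat > pair_a.fa <<'EOF'
>a
ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG
EOF
cat > pair_b.fa <<'EOF'
>b
GCCATTGTTATGGGCCGCTGAAAGG
EOF

if command -v needle >/dev/null 2>&1; then
    # Global (Needleman-Wunsch) alignment over the full length of both inputs.
    needle -asequence pair_a.fa -bsequence pair_b.fa \
           -gapopen 10 -gapextend 0.5 -outfile pair.needle -auto
    # Local (Smith-Waterman) alignment of the best-matching region only.
    water  -asequence pair_a.fa -bsequence pair_b.fa \
           -gapopen 10 -gapextend 0.5 -outfile pair.water -auto
else
    echo "EMBOSS not found on PATH; skipping needle/water" >&2
fi
```

Comparing `pair.needle` with `pair.water` shows the practical difference: the global alignment pads the shorter sequence with end gaps, while the local one reports only the shared stretch.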
2.3. Multiple alignment
polydot: dotplot among sequences
emma: interface to ClustalW
infoalign: show information of alignment result
cons: show consensus
showalign: display alignment result
prettyplot:
3. Sequence analysis
3.1. Statistics
infoseq: display basic information about sequences
compseq: calculate the composition of unique words in sequences
wordcount: count and extract unique words in molecular sequence(s)
freak: generate residue/base frequency table or plot
cpgplot: identify and plot CpG islands in nucleotide sequence(s)
3.2. Feature parsing
extractfeat: extract features from sequence(s)
showfeat: display features of a sequence in pretty format
coderet: extract CDS, mRNA and translations from feature tables
showorf: show ORF of a nucleotide sequence
plotorf: plot ORF
marscan: find matrix/scaffold recognition (MRS) signatures in DNA sequences
3.3. Repeat finding
einverted: inverted repeat
palindrome: palindrome
etandem:
equicktandem:
3.4. Restriction sites and vector map
restrict:
remap:
restover:
3.5. siRNA and primer design
sirna:
eprimer3:
primersearch:
stssearch:
3.6. Motif finding
patmatmotifs: scan a protein sequence with motifs from the PROSITE database
3.7. Translation
transeq: translate DNA into protein in a fasta file
prettyseq: write a nucleotide sequence and its translation to file
backtranseq: from protein sequence to DNA
sixpack: translation in all six reading frames
3.8. Change sequence
msbar: mutate sequence
maskseq: mask sequence
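As a small illustration of the translation step, the sketch below runs transeq on a single short coding sequence, assuming EMBOSS is installed. The file names and the sequence are hypothetical; with the standard genetic code, frame 1 of this sequence reads M A I V M G R followed by a stop codon.

```shell
#!/bin/sh
# A 24-base coding sequence: ATG GCC ATT GTA ATG GGC CGC TGA (hypothetical data).
cat > demo_cds.fa <<'EOF'
>cds1
ATGGCCATTGTAATGGGCCGCTGA
EOF

if command -v transeq >/dev/null 2>&1; then
    # Translate forward frame 1 only; -frame 6 would request all six frames.
    transeq -sequence demo_cds.fa -outseq demo_cds.pep -frame 1 -auto
else
    echo "EMBOSS not found on PATH; skipping transeq" >&2
fi
```

For a view that shows the nucleotides and all six translations together, sixpack is the tool from the list above to reach for instead.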
4. Nucleotide property
dan: calculate nucleic acid melting temperature
banana: plot bending and curvature data for B-DNA
btwisted: calculate the twisting in a B-DNA sequence
5. Protein property
pepstats: calculate statistics of protein properties
pepinfo: plot amino acid properties of a protein sequence in parallel
charge: charge
iep: isoelectric point
octanol: White-Wimley protein hydropathy plot
tmap: predict and plot transmembrane segments
pepwindow: draw a hydropathy plot for a protein sequence
pepwheel: draw a helical wheel diagram for a protein sequence
pepnet: draw a helical net for a protein sequence
hmoment: calculate and plot hydrophobic moment for protein sequence(s)
garnier: predict protein secondary structure using GOR method
6. Other tools
nohtml: remove HTML markup
notab:
nospace:
noreturn: