Alignment
Similarity-based arrangement of DNA, RNA or protein sequences. In this context, subject and query sequence should be orthologous and reflect evolutionary, not functional or structural, relationships.
Annotation
Computational process of attaching biologically relevant information to genome sequence data.
Assembly
Computational reconstruction of a longer sequence from smaller sequence reads.
Barcode
Short-sequence identifier for individual labelling (barcoding) of sequencing libraries.
BAC
(Bacterial artificial chromosome) DNA construct of various lengths (–kb).
cDNA
Complementary DNA synthesized from an mRNA template.
Contig
A contiguous linear stretch of DNA or RNA consensus sequence, constructed from a number of smaller, partially overlapping sequence fragments (reads).
Coverage
Also known as ‘sequencing depth’. Sequence coverage refers to the average number of reads per locus and differs from physical coverage, a term often used in genome assembly referring to the cumulative length of reads or read pairs expressed as a multiple of genome size.
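The distinction above can be illustrated with a minimal sketch. It assumes the commonly used estimates of sequence coverage as total sequenced bases divided by genome size, and physical coverage as total fragment (insert) length divided by genome size. The function and parameter names are hypothetical and not taken from the source.

```python
def sequence_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of reads covering each base (assumed estimate: N * L / G)."""
    return n_reads * read_length / genome_size


def physical_coverage(n_pairs: int, insert_size: int, genome_size: int) -> float:
    """Cumulative length of fragments spanned by read pairs, as a multiple of genome size."""
    return n_pairs * insert_size / genome_size


if __name__ == "__main__":
    # Hypothetical numbers: 1 million 100 bp reads on a 10 Mb genome.
    print(sequence_coverage(1_000_000, 100, 10_000_000))
    # The same reads taken as 500,000 pairs from 400 bp fragments.
    print(physical_coverage(500_000, 400, 10_000_000))
```

With paired reads, physical coverage is typically higher than sequence coverage because the unsequenced middle of each fragment is counted as well.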
De novo assembly
Refers to the reconstruction of contiguous sequences without making use of any reference sequence.
EST library
Expressed sequence tag library. An EST is a short subsequence of a cDNA transcript sequence.
Fosmid
A vector for bacterial cloning of genomic DNA fragments that usually holds inserts of around 40 kb.
GC content
The proportion of guanine and cytosine bases in a DNA/RNA sequence.
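As an illustration of this definition, the proportion can be computed as follows. This is a minimal sketch: the function name is hypothetical, and the choice to ignore ambiguous characters such as N in the denominator is an assumption, since the definition does not say how they should be treated.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T, U) of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGTU"]  # assumption: skip N and other symbols
    if not bases:
        return 0.0
    gc = sum(1 for b in bases if b in "GC")
    return gc / len(bases)


if __name__ == "__main__":
    print(gc_content("ATGCGC"))
```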
Gene ontology
(GO) Structured, controlled vocabularies and classifications of gene function across species and research areas.
InDel
Insertion/deletion polymorphism.
Insert size
Length of randomly sheared fragments (from the genome or transcriptome) sequenced from both ends.
K-mer
Short, unique element of DNA sequence of length k, used by many assembly algorithms.
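To make the term concrete, the sketch below enumerates all overlapping k-mers of a sequence and counts how often each occurs, which is a typical first step before an assembler builds its graph. The function name is hypothetical, and this is not the procedure of any particular assembler.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers (none if n < k).
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    for kmer, count in sorted(count_kmers("ATGATG", 3).items()):
        print(kmer, count)
```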
Library
Collection of DNA (or RNA) fragments modified in a way that is appropriate for downstream analyses, such as high-throughput sequencing in this case.
Mapping
A term routinely used to describe alignment of short sequence reads to a longer reference sequence.
Masking
Converting a DNA sequence [A,C,G,T] (usually repetitive or of low quality) to the uninformative character state N or to lower case characters [a,c,g,t] (soft masking).
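The two masking styles described above can be sketched as follows. The function name and the use of 0-based, half-open coordinates are assumptions made for illustration. In practice the regions to mask would come from a repeat-finding or quality-filtering tool.

```python
def mask_region(seq: str, start: int, end: int, soft: bool = False) -> str:
    """Mask seq[start:end] with N (hard masking) or lower case (soft masking)."""
    region = seq[start:end]
    masked = region.lower() if soft else "N" * len(region)
    return seq[:start] + masked + seq[end:]


if __name__ == "__main__":
    print(mask_region("ACGTACGT", 2, 6))             # hard masking
    print(mask_region("ACGTACGT", 2, 6, soft=True))  # soft masking
```

Soft masking keeps the underlying bases recoverable, whereas replacing them with N discards that information.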
Massively parallel (or next generation) sequencing
High-throughput sequencing nanotechnology used to determine the base-pair sequence of DNA/RNA molecules at much larger quantities than previous chain-termination (e.g. Sanger sequencing) based sequencing techniques.
Mate-pair
Sequence information from two ends of a DNA fragment, usually several thousand base pairs long.
N50
A statistic of a set of contigs (or scaffolds). It is defined as the length for which the collection of all contigs of that length or longer contains at least half of the total of the lengths of the contigs.
N90
Equivalent to the N50 statistic, describing the length for which the collection of all contigs of that length or longer contains at least 90% of the total of the lengths of the contigs.
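The N50 and N90 definitions can be expressed as one small function. This is a minimal sketch with a hypothetical function name. It sorts the contigs from longest to shortest and returns the length at which the running total first reaches the requested share of the summed contig lengths.

```python
def nx(lengths: list[int], percent: int = 50) -> int:
    """Return the Nx statistic (N50 for percent=50, N90 for percent=90)."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length
    raise AssertionError("unreachable: the full set always reaches the threshold")


if __name__ == "__main__":
    contigs = [100, 80, 60, 40, 20]  # hypothetical contig lengths, total 300
    print(nx(contigs, 50), nx(contigs, 90))
```

For the hypothetical contigs above, the two longest contigs (100 + 80 = 180) already hold at least half of the 300 total bases, so the N50 is 80. Reaching 90% requires the four longest, so the N90 is 40.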
Optical map
Genome-wide, ordered, high-resolution restriction map derived from single, stained DNA molecules. It can be used to improve a genome assembly by matching it to the genome-wide pattern of expected restriction sites, as inferred from the genome sequence.
Paired-end sequencing
Sequence information from two ends of a short DNA fragment, usually a few hundred base pairs long.
Read
Short base-pair sequence inferred from the DNA/RNA template by sequencing.
RNA-Seq
High-throughput shotgun transcriptome (cDNA) sequencing. Usually not used synonymously with RNA sequencing, which implies direct sequencing of RNA molecules, skipping the cDNA generation step.
Scaffold
Two or more contigs joined together using read-pair information.
Transcriptome
Set of all RNA molecules transcribed from a DNA template.
References
A field guide to whole-genome sequencing, assembly and annotation