EzBioCloud

About MTP

MTP / Sample name

Memo

Target taxon

Database version

Region

About read counts

Total valid reads
The number of reads used for data analysis after passing QC. Non-specific amplicons, amplicons not assigned to the target taxon, and chimeras are removed in the QC process.
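As a minimal sketch, assume that the counts on this page relate by simple subtraction: total valid reads equals the "Total reads after pre-filter" count minus the three "Removed" categories. The page does not state this formula, so treat it as an assumption. The function name and the example numbers are hypothetical.

```python
def total_valid_reads(prefiltered: int, low_quality: int,
                      non_target: int, chimeric: int) -> int:
    """Reads left for analysis after QC removes the three amplicon categories.

    Assumes every removed read is counted in exactly one category.
    """
    removed = low_quality + non_target + chimeric
    if removed > prefiltered:
        raise ValueError("removed reads cannot exceed reads after pre-filter")
    return prefiltered - removed


# Hypothetical counts, for illustration only.
print(total_valid_reads(50_000, low_quality=1_200, non_target=300, chimeric=2_500))  # 46000
```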

Removed

Low quality amplicons
The number of low-quality amplicons (too short, too long, erroneously sequenced, or non-specific products created during PCR). If left unfiltered, these sequences may be misidentified as spurious novel species.

Non-target amplicons
The number of reads that do not belong to the PCR target taxon (e.g., reads identified as Archaea or Eukarya when the target taxon is Bacteria).

Chimeric amplicons
The number of chimeric reads formed during PCR. If left unfiltered, these may be misidentified as novel species.

Total reads after pre-filter
The number of reads remaining after a pre-filter removes low-quality reads from the raw data produced by an NGS platform. The pre-filter removes reads that are short or have low Q-values; for paired-end sequencing, it also removes reads that could not be merged.
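A minimal sketch of a length and quality pre-filter, assuming reads come as (sequence, quality) pairs with Phred+33 quality strings as in FASTQ. The 200 bp minimum length and the mean Q-value of 25 are hypothetical placeholders, not EzBioCloud's settings, and the paired-end merging step is not shown.

```python
from typing import Iterable, List, Tuple

Read = Tuple[str, str]  # (sequence, Phred+33-encoded quality string)


def mean_quality(qual: str, offset: int = 33) -> float:
    """Average Phred Q-value decoded from a FASTQ quality string."""
    if not qual:
        return 0.0
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def prefilter(reads: Iterable[Read], min_length: int = 200,
              min_mean_q: float = 25.0) -> List[Read]:
    """Keep reads that are long enough and whose mean Q-value is high enough.

    Both thresholds are illustrative placeholders, not EzBioCloud settings.
    """
    return [(seq, qual) for seq, qual in reads
            if len(seq) >= min_length and mean_quality(qual) >= min_mean_q]


reads = [
    ("A" * 250, "I" * 250),  # long, Q40 everywhere -> kept
    ("A" * 150, "I" * 150),  # too short -> removed
    ("A" * 250, "#" * 250),  # long but Q2 everywhere -> removed
]
print(len(prefilter(reads)))  # 1
```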

About read lengths

Min

Max

- bp

Average

- bp

About taxonomic assignment

No. of reads identified at the species level
The number of reads that were successfully identified against reference databases at the species level with a 97% similarity cutoff. This can indicate the taxonomic coverage of a database.

EzBioCloud
Greengenes

No. of species found
The number of unique species identified using reference databases.

EzBioCloud
Greengenes
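The two counts above can be derived from each read's best database hit. This sketch assumes a hypothetical mapping from read ID to (best-hit species, percent similarity) for one reference database (for example, EzBioCloud or Greengenes) and applies the 97% cutoff. The data are invented for illustration.

```python
from typing import Dict, Optional, Tuple

# read ID -> (best-hit species name or None, percent similarity)
BestHits = Dict[str, Tuple[Optional[str], float]]


def species_level_summary(best_hits: BestHits, cutoff: float = 97.0) -> Tuple[int, int]:
    """Return (reads identified at species level, number of unique species)."""
    identified = {read: species for read, (species, similarity) in best_hits.items()
                  if species is not None and similarity >= cutoff}
    return len(identified), len(set(identified.values()))


hits: BestHits = {
    "read1": ("Escherichia coli", 99.2),
    "read2": ("Escherichia coli", 98.0),
    "read3": ("Bacillus subtilis", 97.0),
    "read4": ("Bacillus subtilis", 95.1),  # below the cutoff
    "read5": (None, 0.0),                  # no database hit
}
print(species_level_summary(hits))  # (3, 2)
```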

OTU-picking

Method
This section indicates what clustering method was used to form OTUs from sequenced reads.

CL_OPEN_REF_UCLUST_MC2: each read is first identified at the species level against the reference database using the given similarity cutoff. Reads that fall below this cutoff are pooled and clustered de novo with UCLUST to generate additional OTUs. This strategy is called open-reference OTU picking. Finally, OTUs consisting of a single read (singletons) are omitted from further analysis.

* uclust : http://drive5.com/usearch/manual/uclust_algo.html
* cdhit : http://www.bioinformatics.org/cd-hit/
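The three steps of open-reference OTU picking can be sketched as follows. This is a simplified illustration, not UCLUST: it assumes reads and references are already aligned to equal length, scores them by plain per-position identity, and uses a greedy "join the first centroid within the cutoff" rule for the de novo step. The OTU names (`denovo_0`, ...) are hypothetical.

```python
from typing import Dict, List, Optional


def identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def open_reference_otus(reads: List[str], references: Dict[str, str],
                        cutoff: float = 97.0) -> Dict[str, List[str]]:
    otus: Dict[str, List[str]] = {}
    unassigned: List[str] = []

    # Step 1: assign each read to its best reference if it meets the cutoff.
    for read in reads:
        best_name: Optional[str] = None
        best_id = -1.0
        for name, ref in references.items():
            pid = identity(read, ref)
            if pid > best_id:
                best_name, best_id = name, pid
        if best_name is not None and best_id >= cutoff:
            otus.setdefault(best_name, []).append(read)
        else:
            unassigned.append(read)

    # Step 2: greedy centroid clustering of the leftover reads
    # (a simplified stand-in for UCLUST, not the UCLUST algorithm itself).
    centroids: List[str] = []
    for read in unassigned:
        for i, centroid in enumerate(centroids):
            if identity(read, centroid) >= cutoff:
                otus[f"denovo_{i}"].append(read)
                break
        else:
            otus[f"denovo_{len(centroids)}"] = [read]
            centroids.append(read)

    # Step 3: discard singleton OTUs (OTUs containing a single read).
    return {otu: members for otu, members in otus.items() if len(members) > 1}


refs = {"Species_A": "A" * 100}
reads = ["A" * 100, "A" * 99 + "C", "C" * 100, "C" * 100, "G" * 100]
result = open_reference_otus(reads, refs)
print({otu: len(m) for otu, m in result.items()})  # {'Species_A': 2, 'denovo_0': 2}
```

The lone "G" read forms its own de novo OTU and is then dropped as a singleton.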

Cutoff
This is the sequence similarity value used for OTU calculation, species-level identification against the reference database, and de novo clustering. 97% is commonly used for Bacteria.
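To make the cutoff concrete, the sketch below computes percent identity over a pairwise alignment and the largest number of differing positions that still passes the cutoff. Identity definitions vary between tools. Here, columns where both sequences have a gap are skipped, and a gap opposite a base counts as a mismatch; this is an assumption, not necessarily EzBioCloud's definition.

```python
import math


def aligned_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a pairwise alignment.

    Columns where both sequences have a gap are skipped; a gap opposite a
    base counts as a mismatch (one of several identity definitions in use).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(x == y and x != gap for x, y in columns)
    return 100.0 * matches / len(columns)


def max_differences(length: int, cutoff: float = 97.0) -> int:
    """Largest number of differing columns that still meets the cutoff."""
    return math.floor(length * (100.0 - cutoff) / 100.0)


print(aligned_identity("ACGT-ACGTA", "ACGTTACGTA"))  # 90.0
print(max_differences(1500))  # 45
```

For example, at 97% two aligned 1,500 bp sequences may differ at no more than 45 positions.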

No. of OTUs found in the sample
An Operational Taxonomic Unit (OTU) is a group of sequences clustered by sequence similarity. Because many bacterial species share more than 97% sequence similarity with other species, the OTU count does not necessarily equal the actual number of species. This value is the number of OTUs observed in the sequenced reads and may differ from the total number of OTUs in the sample (species richness).

Good's coverage of library (%)
This index estimates how well the sequencing reads used for analysis represent the actual species population of the sample. Values range from 0 to 100%. A value of 100% indicates complete sampling of species, meaning that additional sequencing is unlikely to find new species.

Reference(s):
Good, I. J. "The population frequencies of species and the estimation of population parameters." Biometrika (1953): 237-264.
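The page does not give the formula. The standard estimator from Good (1953) is C = (1 - F1 / N) × 100, where F1 is the number of singleton OTUs and N is the total number of reads. A minimal sketch with hypothetical OTU read counts follows. If singleton OTUs have already been removed, as in the MC2 OTU-picking step, this formula always returns 100%. It is therefore meaningful only on counts that still include singletons.

```python
from typing import Iterable


def goods_coverage(otu_counts: Iterable[int]) -> float:
    """Good's coverage in percent: (1 - F1 / N) * 100.

    otu_counts holds the number of reads in each OTU; F1 is the number of
    singleton OTUs and N the total number of reads.
    """
    counts = [c for c in otu_counts if c > 0]
    n_reads = sum(counts)
    if n_reads == 0:
        raise ValueError("at least one read is required")
    singletons = sum(1 for c in counts if c == 1)
    return 100.0 * (n_reads - singletons) / n_reads


print(goods_coverage([50, 30, 10, 5, 2, 1, 1, 1]))  # 97.0
```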

Diversity indices
Diversity indices are measures of species diversity based on the number and abundance pattern of OTUs observed in the sample. The indices include statistical estimates of species richness (ACE, Chao1, Jackknife) and estimates of species evenness (Shannon, Simpson, NPShannon).

ACE
ACE is an indicator of species richness (total number of species in a sample) that is sensitive to rare OTUs (singletons and doubletons). Higher values indicate higher diversity.

Reference(s):
Chao, A., and Lee, S.-M. "Estimating the number of classes via sample coverage." Journal of the American Statistical Association 87.417 (1992): 210-217.

LCI
Value
HCI
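A sketch of the ACE point estimate from Chao and Lee (1992). OTUs with at most 10 reads are treated as "rare". This threshold is a commonly used default and an assumption here, since the page does not state the value EzBioCloud uses. Confidence limits (LCI/HCI) are not computed.

```python
from typing import Iterable


def ace(otu_counts: Iterable[int], rare_threshold: int = 10) -> float:
    """Abundance-based Coverage Estimator of species richness (point estimate)."""
    counts = [c for c in otu_counts if c > 0]
    rare = [c for c in counts if c <= rare_threshold]
    s_abund = len(counts) - len(rare)   # OTUs above the rare threshold
    s_rare = len(rare)
    n_rare = sum(rare)                  # reads in rare OTUs
    f1 = sum(1 for c in rare if c == 1)
    if n_rare == 0:
        return float(s_abund)
    if f1 == n_rare:
        raise ValueError("ACE is undefined when every rare read is a singleton")
    c_ace = 1.0 - f1 / n_rare           # sample coverage of rare OTUs
    # sum over rare OTUs of i(i-1), i.e. sum_i i(i-1) F_i
    sum_ii = sum(c * (c - 1) for c in rare)
    gamma2 = max(s_rare / c_ace * sum_ii / (n_rare * (n_rare - 1)) - 1.0, 0.0)
    return s_abund + s_rare / c_ace + f1 / c_ace * gamma2


print(round(ace([20, 5, 3, 2, 2, 1, 1]), 4))  # 8.359
```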

Chao1
Chao1 is an indicator of species richness (total number of species in a sample) that is sensitive to rare OTUs (singletons and doubletons). Higher values indicate higher diversity.

Reference(s):
Chao, A. "Estimating the population size for capture-recapture data with unequal catchability." Biometrics (1987): 783-791.

LCI
Value
HCI
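The classic Chao1 estimate is S_obs + F1² / (2·F2), where F1 and F2 are the numbers of singleton and doubleton OTUs. When F2 = 0, a bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)), is commonly used instead. Which variant EzBioCloud reports is not stated here, so the sketch exposes both.

```python
from typing import Iterable


def chao1(otu_counts: Iterable[int], bias_corrected: bool = False) -> float:
    """Chao1 richness estimate from per-OTU read counts (point estimate)."""
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)
    f1 = counts.count(1)   # singletons
    f2 = counts.count(2)   # doubletons
    if bias_corrected or f2 == 0:
        return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    return s_obs + f1 * f1 / (2 * f2)


print(chao1([10, 4, 2, 2, 1, 1, 1]))  # 9.25
```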

Jackknife
Jackknife is an indicator of species richness (total number of species in a sample) that is sensitive to rare OTUs (singletons and doubletons) as well as to more abundant OTUs (those observed three or more times). Higher values indicate higher diversity.

Reference(s):
Burnham, K. P. & Overton, W. S. (1979) Robust estimation of population size when capture probabilities vary among animals. Ecology, 60, 927-936.

LCI
Value
HCI
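A simplified sketch of jackknife richness estimates, treating each read as a sampling unit and using the large-sample forms of the first-order (S_obs + F1) and second-order (S_obs + 2F1 - F2) estimators. The Burnham and Overton procedure can select among higher orders, which bring in OTUs observed three or more times. The value EzBioCloud reports may therefore differ from this illustration.

```python
from typing import Iterable


def jackknife(otu_counts: Iterable[int], order: int = 1) -> float:
    """Large-sample first- or second-order jackknife richness estimate."""
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)
    f1 = counts.count(1)   # singletons
    f2 = counts.count(2)   # doubletons
    if order == 1:
        return float(s_obs + f1)
    if order == 2:
        return float(s_obs + 2 * f1 - f2)
    raise ValueError("only first- and second-order estimators are sketched")


print(jackknife([10, 4, 2, 2, 1, 1, 1], order=1))  # 10.0
print(jackknife([10, 4, 2, 2, 1, 1, 1], order=2))  # 11.0
```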

Shannon
Shannon is an indicator of species evenness (how evenly reads are distributed among the species in a sample) and takes values of 0 or greater.
Higher values indicate higher diversity; for a given number of species, the maximum value is reached when all species are present in equal numbers.

Reference(s):
Magurran, A. E. (2013). Measuring biological diversity. John Wiley & Sons.

LCI
Value
HCI
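The Shannon index is H' = -Σ p_i ln p_i, where p_i is the fraction of reads in OTU i. This sketch assumes the natural logarithm, which is a common choice but is not stated on this page. With S equally abundant OTUs, the index equals ln S.

```python
import math
from typing import Iterable


def shannon(otu_counts: Iterable[int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with at least one read."""
    counts = [c for c in otu_counts if c > 0]
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts)


print(math.isclose(shannon([5, 5, 5, 5]), math.log(4)))  # True
```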

Simpson
Simpson is an indicator of species evenness (how evenly reads are distributed among the species in a sample) that gives the probability that two sequences randomly selected from the sample belong to the same species.
Values range from 0 to 1, and lower values indicate higher diversity.

Reference(s):
Magurran, A. E. (2013). Measuring biological diversity. John Wiley & Sons.

LCI
Value
HCI
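The probability described above, for two reads drawn without replacement, is D = Σ n_i(n_i - 1) / (N(N - 1)), where n_i is the read count of OTU i and N is the total. Some tools use the with-replacement form Σ p_i² instead, so this choice is an assumption.

```python
from typing import Iterable


def simpson(otu_counts: Iterable[int]) -> float:
    """Probability that two reads drawn without replacement share an OTU."""
    counts = [c for c in otu_counts if c > 0]
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two reads are required")
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))


print(simpson([5]))  # 1.0
```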

NPShannon
NPShannon is an indicator of species evenness (how evenly reads are distributed among the species in a sample) that estimates the Shannon index nonparametrically while accounting for species not observed in the sample and their unknown abundances.
Values are greater than 0, and higher values indicate higher diversity.

Reference(s):
Chao, A., & Shen, T. J. (2003). Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample. Environmental and ecological statistics, 10(4), 429-443.

Value
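A sketch of the Chao and Shen (2003) estimator. Sample coverage is estimated as C = 1 - F1/N. Each relative abundance is shrunk to C·p_i, and each term is divided by 1 - (1 - C·p_i)^N to correct for OTUs that may be missed. The adjustment used when every OTU is a singleton, which keeps C above zero, is one implementation choice and is labeled as such in the code.

```python
import math
from typing import Iterable


def np_shannon(otu_counts: Iterable[int]) -> float:
    """Chao-Shen nonparametric estimate of the Shannon index (natural log)."""
    counts = [c for c in otu_counts if c > 0]
    n = sum(counts)
    f1 = counts.count(1)
    if f1 == n:
        f1 = n - 1  # implementation choice: avoid zero coverage if all OTUs are singletons
    coverage = 1.0 - f1 / n
    h = 0.0
    for c in counts:
        pa = coverage * c / n                  # coverage-adjusted relative abundance
        h -= pa * math.log(pa) / (1.0 - (1.0 - pa) ** n)
    return h


print(np_shannon([5, 5, 5, 5]) > math.log(4))  # True
```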

Phylogenetic diversity
Phylogenetic diversity is a measure of biodiversity that incorporates the phylogenetic differences between species. It is defined and calculated as "the sum of the lengths of all those branches that are members of the corresponding minimum spanning path", where a 'branch' is a segment of a cladogram and the minimum spanning path is the smallest set of branches on the tree that connects all of the species in the set.

Reference(s):
1. https://en.wikipedia.org/wiki/Phylogenetic_diversity
2. Faith, D. P. (1992). Conservation evaluation and phylogenetic diversity. Biological Conservation, 61, 1-10.

Value
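A minimal sketch of Faith's PD, assuming the tree is stored as a hypothetical child → (parent, branch length) mapping. It computes the rooted variant, which sums every branch on the paths from the root to the selected species. Faith's original definition uses only the minimum spanning path among the species, so the two can differ by the branches between the root and the species' most recent common ancestor.

```python
from typing import Dict, Iterable, Set, Tuple

# child node -> (parent node, length of the branch above the child)
Tree = Dict[str, Tuple[str, float]]


def faith_pd(parent: Tree, tips: Iterable[str]) -> float:
    """Sum of branch lengths on the union of root-to-tip paths (rooted PD)."""
    used: Set[str] = set()  # each node stands for the branch directly above it
    for tip in tips:
        node = tip
        # Walk toward the root, stopping early once the path is already counted.
        while node in parent and node not in used:
            used.add(node)
            node = parent[node][0]
    return sum(parent[node][1] for node in used)


# Hypothetical tree: root -> X (1.0), root -> C (3.0), X -> A (0.5), X -> B (2.0)
tree: Tree = {"X": ("root", 1.0), "C": ("root", 3.0),
              "A": ("X", 0.5), "B": ("X", 2.0)}
print(faith_pd(tree, ["A", "B"]))  # 3.5
```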

Rarefaction curve
The rarefaction curve expresses species diversity by plotting the number of OTUs against the number of reads sampled.
The x-axis represents the number of sampled reads, and the y-axis represents the number of OTUs discovered. In general, as the number of reads increases, the number of OTUs levels off toward its maximum value.
The steeper the slope of the curve, the higher the species diversity.

Reference(s):
Heck, K. L., van Belle, G., & Simberloff, D. (1975). Explicit calculation of the rarefaction diversity measurement and the determination of sufficient sample size. Ecology, 56(6), 1459-1461.
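Heck et al. (1975) give the expected number of OTUs in a random subsample of n reads drawn without replacement from N reads: E[S_n] = Σ_i [1 - C(N - N_i, n) / C(N, n)], where N_i is the read count of OTU i. The sketch evaluates this formula at evenly spaced points. It uses exact integer binomials via `math.comb` (Python 3.8+), which is fine for small examples but slow for very large read counts. Whether EzBioCloud computes its curve this way or by random subsampling is not stated here.

```python
import math
from typing import Iterable, List, Tuple


def expected_otus(otu_counts: Iterable[int], n: int) -> float:
    """Expected number of OTUs in a random subsample of n reads."""
    counts = [c for c in otu_counts if c > 0]
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total reads")
    denom = math.comb(total, n)
    # math.comb returns 0 when n > total - c, i.e. the OTU is certain to appear.
    return sum(1.0 - math.comb(total - c, n) / denom for c in counts)


def rarefaction_curve(otu_counts: Iterable[int], step: int) -> List[Tuple[int, float]]:
    """(reads sampled, expected OTUs) pairs at every `step` reads."""
    counts = [c for c in otu_counts if c > 0]
    total = sum(counts)
    return [(n, expected_otus(counts, n)) for n in range(step, total + 1, step)]


print(rarefaction_curve([2, 1, 1], step=1)[-1])  # (4, 3.0)
```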

Rank abundance curve
The rank abundance graph can be used to observe species evenness. The x-axis represents the rank of each OTU (ordered from most to least abundant), and the y-axis represents the relative abundance of the OTU at that rank. The curve declines toward 0, and the steeper the slope, the lower the species diversity.

Reference(s):
Whittaker, R. H. (1965). Dominance and diversity in land plant communities. Science, 147(3655), 250-260.
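The data behind a rank abundance graph can be produced by sorting OTUs from most to least abundant and converting their read counts to relative abundances. The counts in this sketch are hypothetical.

```python
from typing import Iterable, List, Tuple


def rank_abundance(otu_counts: Iterable[int]) -> List[Tuple[int, float]]:
    """(rank, relative abundance) pairs, rank 1 being the most abundant OTU."""
    counts = sorted((c for c in otu_counts if c > 0), reverse=True)
    total = sum(counts)
    return [(rank, c / total) for rank, c in enumerate(counts, start=1)]


print(rank_abundance([1, 3, 6]))  # [(1, 0.6), (2, 0.3), (3, 0.1)]
```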

Sequence Name: -
Count: -

Hit Top 5
Rank | Hit Species Name | Similarity | Qcov | Taxonomy
1 | - | - | - | -
2 | - | - | - | -
3 | - | - | - | -
4 | - | - | - | -
5 | - | - | - | -

About MTP

MTP / Sample name

Memo

Tag

Sequencing platform

Target taxon

Database version

Region


EzBioCloud

Name

Greengenes

Contig
A contig is a set of identical and sometimes overlapping sequences that together represent a consensus region of DNA.

Clone
A clone is an individual sequence that was not included in any contig.

Phylum

EzBioCloud

Greengenes

Class

EzBioCloud

Greengenes

Order

EzBioCloud

Greengenes

Family

EzBioCloud

Greengenes

Genus

EzBioCloud

Greengenes

Species

EzBioCloud

Greengenes

