The number of reads used for data analysis after passing QC. Non-specific amplicons, amplicons not assigned to the target taxon, and chimeras are removed in the QC process.
-
Removed
Low quality amplicons
The number of low-quality amplicons (too short, too long, erroneously sequenced, or non-specific products created during PCR). If left unfiltered, these sequences may be misidentified as spurious novel species.
-
Non-target amplicons
The number of reads that do not match the PCR target taxon (e.g. reads identified as Archaea or Eukarya when the target taxon is Bacteria).
-
Chimeric amplicons
The number of chimeric reads created during PCR. These may lead to erroneous identification as novel species if left unfiltered.
-
Total reads after pre-filter
The number of reads remaining after a pre-filter removes low-quality reads from the raw data produced by an NGS platform. The pre-filter removes reads with short lengths and low Q-values; in the case of paired-end sequencing, unmerged reads are also filtered out.
-
About read lengths
Min
Max
-bp
-bp
Average
-bp
About taxonomic assignment
No. of reads identified at the species level
The number of reads that were successfully identified against reference databases at the species level with a 97% similarity cutoff. This can indicate the taxonomic coverage of a database.
EzBioCloud
Greengenes
No. of species found
The number of unique species identified using reference databases.
EzBioCloud
Greengenes
OTU-picking
Method
This section indicates what clustering method was used to form OTUs from sequenced reads.
CL_OPEN_REF_UCLUST_MC2: each read is identified at the species-level against the reference database with a given similarity cutoff. Reads that fall below this cutoff are compiled and UCLUST is used to perform de novo clustering to generate additional OTUs. This strategy is called Open-reference OTU picking. Finally, OTUs with single reads (singletons) are omitted from further analysis.
This is the sequence similarity value used for OTU calculation, species-level identification against the reference database, and de novo clustering. 97% is commonly used for Bacteria.
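The open-reference strategy described above can be sketched as a toy example. This is a minimal illustration, not UCLUST and not the pipeline behind this report: the `identity` function (position-wise matches between equal-length strings), the greedy seed-based clustering, and all names and parameters are assumptions made for the example.

```python
def identity(a, b):
    """Toy similarity: fraction of matching positions (0.0 if lengths differ)."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def open_reference_otus(reads, references, cutoff=0.97, min_cluster_size=2):
    """Assign reads to reference species first, then cluster the rest de novo."""
    otus = {}    # OTU label -> list of member reads
    seeds = {}   # de novo OTU label -> seed sequence
    for read in reads:
        # Step 1: look for the best hit in the reference database.
        best_name, best_score = None, 0.0
        for name, ref_seq in references.items():
            score = identity(read, ref_seq)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is not None and best_score >= cutoff:
            otus.setdefault(best_name, []).append(read)
            continue
        # Step 2: reads below the cutoff are clustered de novo (greedy, first seed that matches).
        for label, seed in seeds.items():
            if identity(read, seed) >= cutoff:
                otus[label].append(read)
                break
        else:
            label = f"denovo_{len(seeds) + 1}"
            seeds[label] = read
            otus[label] = [read]
    # Step 3: omit OTUs smaller than the minimum size (singletons when it is 2).
    return {label: members for label, members in otus.items()
            if len(members) >= min_cluster_size}


# Hypothetical 10-base sequences and a 90% cutoff, chosen only to keep the example small.
references = {"sp_A": "AAAAAAAAAA"}
reads = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCG", "GGGGGGGGGG"]
result = open_reference_otus(reads, references, cutoff=0.9)
print(sorted(result))  # the lone GGGGGGGGGG read forms a singleton OTU and is dropped
```

Real pipelines use alignment-based similarity and far more efficient search; the sketch only shows the order of the three steps.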
No. of OTUs found in the sample
An Operational Taxonomic Unit (OTU) is a group of sequences clustered by sequence similarity.
Because many bacterial species exhibit greater than 97% sequence similarity with other species, OTU count doesn't necessarily equate to the actual number of different species.
This value represents the number of OTUs observed during experimentation, and may be different from the total number of OTUs (Species richness) in the sample.
Good's coverage of library (%)
This is an index of the extent to which the number of sequencing reads used for analysis represents the actual species population of the sample. The value can range from 0 to 100%, with 100% indicating a complete sampling of species, meaning that additional sequencing is unlikely to find any more new species.
Reference(s): Good, I. J. "The population frequencies of species and the estimation of population parameters." Biometrika (1953): 237-264.
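As a rough illustration of the index described above, Good's coverage is commonly computed as 1 minus the fraction of reads that belong to singleton OTUs. The sketch below assumes the input is a list of per-OTU read counts; the function name and the example counts are hypothetical, and whether this report computes the value in exactly this way is not stated in the source.

```python
def goods_coverage(otu_counts):
    """Good's coverage in percent: 100 * (1 - singleton OTUs / total reads)."""
    total_reads = sum(otu_counts)
    if total_reads == 0:
        raise ValueError("at least one read is required")
    singletons = sum(1 for count in otu_counts if count == 1)
    return 100.0 * (1.0 - singletons / total_reads)


# Hypothetical sample: 100 reads in 8 OTUs, 3 of which are singletons.
example_counts = [50, 30, 10, 5, 2, 1, 1, 1]
print(goods_coverage(example_counts))
```

A sample with no singleton OTUs yields 100%, which matches the interpretation that further sequencing is unlikely to reveal new species.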
Diversity indices
Diversity indices are measures of species diversity, based on the number and pattern of OTUs observed in the sample. The indices include statistical estimates of species richness (ACE, Chao1, Jackknife), and estimates of species evenness (Shannon, Simpson, NPShannon).
ACE
ACE is an indicator of species richness (total number of species in a sample) that is sensitive to rare OTUs (singletons and doubletons). Higher values indicate higher diversity.
Reference(s): Chao, A., and Lee, S.-M. "Estimating the number of classes via sample coverage." Journal of the American Statistical Association 87.417 (1992): 210-217.
LCI
Value
HCI
Chao1
Chao1 is an indicator of species richness (total number of species in a sample) that is sensitive to rare OTUs (singletons and doubletons). Higher values indicate higher diversity.
Reference(s): Chao, A. "Estimating the population size for capture-recapture data with unequal catchability." Biometrics (1987): 783-791.
LCI
Value
HCI
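To make the role of singletons and doubletons in Chao1 concrete, here is a minimal sketch of one widely used form, the bias-corrected estimator S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of singleton and doubleton OTUs. The function name and input format (a list of per-OTU read counts) are assumptions, and the confidence interval (LCI/HCI) is not computed here.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate from per-OTU read counts."""
    counts = [c for c in otu_counts if c > 0]
    observed = len(counts)                          # S_obs: OTUs actually seen
    singletons = sum(1 for c in counts if c == 1)   # F1
    doubletons = sum(1 for c in counts if c == 2)   # F2
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical sample: 9 observed OTUs, 3 singletons, 2 doubletons.
print(chao1([50, 30, 10, 5, 2, 2, 1, 1, 1]))  # 10.0
```

When there are no singletons, the estimate equals the observed OTU count, reflecting that no unseen species are inferred.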
Jackknife
Jackknife is an indicator of species richness (total number of species in a sample) that is sensitive to rare OTUs (singletons and doubletons) as well as to abundant OTUs (tripletons and more). Higher values indicate higher diversity.
Reference(s): Burnham, K. P. & Overton, W. S. (1979) Robust estimation of population size when capture probabilities vary among animals. Ecology, 60, 927-936.
LCI
Value
HCI
Shannon
Shannon is an indicator of species evenness (proportional distribution of the number of each species in a sample) that takes values of 0 or greater; it is 0 when only one species is present.
Higher values indicate higher diversity, and the maximum value is achieved when all species are present in equal numbers.
Reference(s): Magurran, A. E. (2013). Measuring biological diversity. John Wiley & Sons.
LCI
Value
HCI
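The Shannon description above corresponds to the formula H = -Σ p_i ln p_i, where p_i is the proportion of reads in OTU i. The sketch below assumes natural logarithms and a list of per-OTU read counts as input; the function name is illustrative, and no confidence interval is computed.

```python
import math


def shannon_index(otu_counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over OTUs with at least one read."""
    counts = [c for c in otu_counts if c > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one read is required")
    return -sum((c / total) * math.log(c / total) for c in counts)


# Four equally abundant OTUs reach the maximum for four OTUs, ln(4).
print(shannon_index([25, 25, 25, 25]))
# The same number of OTUs with one dominant OTU gives a lower value.
print(shannon_index([97, 1, 1, 1]))
```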
Simpson
Simpson is an indicator of species evenness (proportional distribution of the number of each species in a sample) that displays the probability that two randomly selected sequences are of the same species.
Values range from 0 to 1, and lower values indicate higher diversity.
Reference(s): Magurran, A. E. (2013). Measuring biological diversity. John Wiley & Sons.
LCI
Value
HCI
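The probability interpretation of Simpson given above can be sketched directly. This example assumes the two reads are drawn without replacement, giving D = Σ n_i(n_i - 1) / (N(N - 1)); the with-replacement form Σ p_i² is also common, and the source does not say which one is used. The function name and input format are illustrative.

```python
def simpson_index(otu_counts):
    """Probability that two reads drawn without replacement share an OTU."""
    total = sum(otu_counts)
    if total < 2:
        raise ValueError("at least two reads are required")
    same_otu_pairs = sum(c * (c - 1) for c in otu_counts)
    return same_otu_pairs / (total * (total - 1))


print(simpson_index([100]))             # 1.0: every pair of reads shares the single OTU
print(simpson_index([25, 25, 25, 25]))  # lower value, higher diversity
```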
NPShannon
NPShannon is an indicator of species evenness (proportional distribution of the number of each species in a sample) that estimates diversity when there are unseen species and unknown abundance.
Values are greater than 0, and higher values indicate higher diversity.
Reference(s): Chao, A., & Shen, T. J. (2003). Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample. Environmental and ecological statistics, 10(4), 429-443.
Value
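For NPShannon, a minimal sketch of the Chao and Shen (2003) estimator is given below, as an illustration of how unseen species can be accounted for. It assumes the usual formulation: sample coverage C = 1 - F1/n scales each observed proportion, and each term is divided by the probability that the OTU was detected at all. The function name and input format are assumptions, and the report's actual implementation may differ.

```python
import math


def np_shannon(otu_counts):
    """Chao-Shen non-parametric estimate of the Shannon index."""
    counts = [c for c in otu_counts if c > 0]
    n = sum(counts)
    if n == 0:
        raise ValueError("at least one read is required")
    singletons = sum(1 for c in counts if c == 1)
    if singletons == n:
        singletons = n - 1  # keep the coverage estimate above zero
    coverage = 1.0 - singletons / n
    estimate = 0.0
    for c in counts:
        p = coverage * c / n                  # coverage-adjusted proportion
        detected = 1.0 - (1.0 - p) ** n       # chance the OTU appears in n reads
        estimate -= p * math.log(p) / detected
    return estimate


# With singletons present, the estimate exceeds the plain Shannon index.
print(np_shannon([5, 3, 1, 1]))
```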
Phylogenetic diversity
Phylogenetic diversity is a measure of biodiversity which incorporates phylogenetic difference between species. It is defined and calculated as "the sum of the lengths of all those branches that are members of the corresponding minimum spanning path", in which 'branch' is a segment of a cladogram, and the minimum spanning path is the minimum distance between the two nodes.
Rarefaction curve
The rarefaction curve is a graph that expresses species diversity by plotting the number of OTUs against the size of the sampled data.
The x-axis represents the number of sampled reads, and the y-axis represents the number of OTUs discovered. In general, as the number of reads increases, the number of OTUs levels off toward a maximum value.
The steeper the slope of the curve, the higher the species diversity.
Reference(s): Heck, K. L., van Belle, G., & Simberloff, D. (1975). Explicit calculation of the rarefaction diversity measurement and the determination of sufficient sample size. Ecology, 56(6), 1459-1461.
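A point on the rarefaction curve can be calculated analytically as the expected number of OTUs in a random subsample of n reads: E[S_n] = Σ (1 - C(N - N_i, n) / C(N, n)), where N is the total read count and N_i the count of OTU i. The sketch below assumes a list of per-OTU read counts; the function name is illustrative, and exact binomial coefficients may be slow for very large read counts.

```python
from math import comb


def expected_otus(otu_counts, n):
    """Expected number of OTUs observed in a random subsample of n reads."""
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of reads")
    all_subsamples = comb(total, n)
    # comb(total - c, n) counts the subsamples that miss an OTU with c reads entirely.
    return sum(1 - comb(total - c, n) / all_subsamples for c in otu_counts)


# Hypothetical sample; each (reads, expected OTUs) pair is one point on the curve.
counts = [50, 30, 10, 5, 2, 1, 1, 1]
curve = [(n, expected_otus(counts, n)) for n in range(0, 101, 20)]
for n, otus in curve:
    print(n, round(otus, 2))
```

At n equal to the total number of reads, the expected value equals the observed OTU count.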
Rank abundance curve
The rank abundance graph can be used to observe species evenness.
The x-axis represents the rank of OTUs, and the y-axis represents the relative abundance of OTUs at each rank.
The curve falls toward 0 as rank increases, and the steeper the slope of the curve, the lower the species diversity.
Reference(s): Whittaker, R. H. (1965). Dominance and diversity in land plant communities. Science, 147(3655), 250-260.
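The data behind a rank abundance curve can be derived by sorting OTUs from most to least abundant and converting counts to relative abundances. This is a minimal sketch assuming a list of per-OTU read counts; the function name is hypothetical and plotting is left out.

```python
def rank_abundance(otu_counts):
    """Return (rank, relative abundance) pairs, most abundant OTU first."""
    total = sum(otu_counts)
    if total == 0:
        raise ValueError("at least one read is required")
    ranked = sorted(otu_counts, reverse=True)
    return [(rank, count / total) for rank, count in enumerate(ranked, start=1)]


for rank, abundance in rank_abundance([1, 6, 3]):
    print(rank, abundance)
```

Plotting rank on the x-axis and relative abundance on the y-axis (often log-scaled) gives the curve described above.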
EzBioCloud
Name
Taxa included in the group
Greengenes
Contig
A contig is a set of identical and sometimes overlapping sequences that together represent a consensus region of DNA.
Top Hit-
Similarity
contig similarity
Count
contig count
Clone
A clone is an individual sequence that was not included in contigs.
Users affiliated with academic and non-profit institutions are entitled to free use of EzBioCloud's database and bioinformatics applications, subject to conditions. Reproduction or redistribution of EzBioCloud is strictly prohibited by applicable law and regulations.
For the following cases, please contact bs.ngs@cj.net to obtain a license to use our services:
Use in for-profit institutions or companies.
Use for commercial projects by academic or non-profit organizations.