Reproducibility and repeatability of six high-throughput 16S rDNA sequencing protocols for microbiota profiling
Culture-independent molecular techniques and advances in next-generation sequencing (NGS) technologies make large-scale epidemiological studies of the microbiota feasible. A key challenge with NGS is achieving high reproducibility and repeatability, which is attained mainly through robust amplification. We aimed to assess the reproducibility of salivary microbiota profiles by comparing triplicate samples. Microbiota profiles were generated with simplified in-house 16S amplicon assays that take advantage of a large number of barcodes. The assays used primers with TruSeq-tailed (TS-tailed) or Nextera-tailed (NX-tailed) adapters, combined with either dual indexing alone or dual indexing plus a 6-nt internal index. All amplification protocols produced consistent microbial profiles for the same samples, although in our study reproducibility was highest for the TS-tailed method. Five replicates of a single sample, prepared with the TS-tailed one-step (1S) protocol without an internal index and sequenced on the HiSeq platform, showed high alpha diversity with low standard deviation (mean Shannon and Inverse Simpson indices of 3.19 ± 0.097 and 13.56 ± 1.634, respectively). Consistent large-scale microbiota profiles can be produced with all of the 16S amplicon assays tested. The TS-tailed 1S dual-index protocol is preferred, since it provides repeatable profiles on the HiSeq platform and is less labour-intensive.
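The diversity values reported for the replicates summarize each library with two standard indices computed from the per-taxon relative abundances p_i: Shannon H = -Σ p_i ln p_i and Inverse Simpson 1/Σ p_i². The Python sketch below shows one way such per-replicate values and their mean ± standard deviation could be computed from a taxon count table. It assumes the natural logarithm for the Shannon index and the sample (n − 1) standard deviation; the abstract does not state which conventions or software were used, and the function names and counts are hypothetical.

```python
import math
import statistics
from typing import Sequence


def relative_abundances(counts: Sequence[int]) -> list[float]:
    """Convert raw read counts per taxon into proportions, dropping zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts if c > 0]


def shannon(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i), using the natural logarithm (assumption)."""
    return -sum(p * math.log(p) for p in relative_abundances(counts))


def inverse_simpson(counts: Sequence[int]) -> float:
    """Inverse Simpson index 1 / sum(p_i ** 2)."""
    return 1.0 / sum(p * p for p in relative_abundances(counts))


def summarize(replicates: Sequence[Sequence[int]]) -> dict[str, tuple[float, float]]:
    """Mean and sample standard deviation of each index across replicate libraries."""
    h = [shannon(r) for r in replicates]
    s = [inverse_simpson(r) for r in replicates]
    return {
        "shannon": (statistics.mean(h), statistics.stdev(h)),
        "inverse_simpson": (statistics.mean(s), statistics.stdev(s)),
    }


if __name__ == "__main__":
    # Hypothetical taxon counts for five replicate libraries of one sample
    # (illustrative numbers only, not data from the study).
    replicates = [
        [120, 80, 40, 30, 10],
        [115, 85, 42, 28, 12],
        [125, 78, 38, 31, 9],
        [118, 82, 41, 29, 11],
        [122, 79, 39, 32, 10],
    ]
    for index, (mean, sd) in summarize(replicates).items():
        print(f"{index}: {mean:.2f} ± {sd:.3f}")
```

A low standard deviation across replicates of the same sample, as reported for the TS-tailed 1S protocol, is what indicates good repeatability; the absolute index values depend on the log base and on rarefaction choices, so they are only comparable within one pipeline.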
Tools and techniques for computational reproducibility
When reporting research findings, scientists document the steps they followed so that others can verify and build upon the research. When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible. Computers play a vital role in many research disciplines and present both opportunities and challenges for reproducibility. Computers can be programmed to execute analysis tasks, and those programs can be repeated and shared with others. The deterministic nature of most computer programs means that the same analysis tasks, applied to the same data, will often produce the same outputs. However, in practice, computational findings often cannot be reproduced because of complexities in how software is packaged, installed, and executed—and because of limitations associated with how scientists document analysis steps. Many tools and techniques are available to help overcome these challenges; here we describe seven such strategies. With a broad scientific audience in mind, we describe the strengths and limitations of each approach, as well as the circumstances under which each might be applied. No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.
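As a small illustration of the point about determinism, the following Python sketch runs a hypothetical randomized analysis (a mean with a bootstrap standard error) twice with an explicit seed, fingerprints each result with SHA-256 over a canonical JSON serialization, and records the interpreter version and platform next to the hash. It is not presented as one of the seven strategies the paper describes; the function names and data are illustrative assumptions. Matching hashes show that the same code and data yield the same output in one environment, while recording the environment matters because pseudo-random generators and numerical libraries can change between versions.

```python
import hashlib
import json
import platform
import random
import statistics


def analysis(data: list[float], seed: int) -> dict:
    """Hypothetical analysis step: mean plus a bootstrap standard error.

    The seed is passed explicitly so the random resampling can be repeated.
    """
    rng = random.Random(seed)  # private generator, independent of global random state
    boot_means = []
    for _ in range(200):
        resample = [rng.choice(data) for _ in data]
        boot_means.append(sum(resample) / len(resample))
    return {
        "mean": sum(data) / len(data),
        "bootstrap_se": statistics.stdev(boot_means),
    }


def fingerprint(result: dict) -> str:
    """SHA-256 of a canonical JSON serialization (sorted keys, fixed separators)."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def environment() -> dict:
    """Record the interpreter version and platform alongside the result."""
    return {"python": platform.python_version(), "platform": platform.platform()}


if __name__ == "__main__":
    data = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8]  # illustrative values only
    run1 = fingerprint(analysis(data, seed=42))
    run2 = fingerprint(analysis(data, seed=42))
    print("identical outputs:", run1 == run2)
    print(json.dumps({"sha256": run1, "environment": environment()}, indent=2))
```

Sharing the hash together with the recorded environment lets another researcher check whether a rerun reproduced the result exactly, and if not, whether the environment differed.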