“Speeding Up Science” in Environmental -Omics (Workshop #2): a hackathon for compiling reproducible Jupyter/Binder workflows, October 23-25, 2019
Event page: https://speeding-up-science-workshops.github.io/
Apply to participate: https://forms.gle/vHbWL2caLTMdJPQMA (all travel expenses covered for selected participants, workshop funding provided by the Gordon & Betty Moore Foundation) - application form closes Tues Sept 24, 2019 (11:59PM PST). Notification of application outcome will be sent out by Fri Sept 27, 2019.
Code of Conduct: All workshop attendees are expected to abide by the following code of conduct. We will enforce this code as needed, and we expect cooperation from all attendees to help ensure a safe environment for everybody.
Workshop events are neither a dating scene nor an intellectual contest.
The UC Davis DSI is dedicated to providing a harassment-free experience for everyone, regardless of gender, gender identity and expression, age, sexual orientation, disability, physical appearance, body size, race, or religion (or lack thereof). We do not tolerate harassment of participants in any form. Sexual language and imagery are generally not appropriate for any DSI venue.
Holly and Titus are your primary contacts for Code of Conduct issues. Please contact Pamela Reynolds at firstname.lastname@example.org if you have any additional concerns.
Workshop Objectives: This 2.5-day workshop will focus on developing application-specific Jupyter notebooks that are executable/launchable via Binder. The goal of these workshops is to “reverse engineer” the data visualization approaches that are widely used in biological data analysis and routinely published in scientific journal articles (heatmaps, read/OTU summaries, etc.). Workshop goals/products include:
Compile reproducible code workflows for three common environmental -Omics approaches (metabarcoding, metagenomics, and metatranscriptomics). Jupyter notebooks will contain functional code and customizable parameters (with documentation/explanation) that users can adapt and deploy on their own datasets, assuming input in the “standard” data formats typically generated by -Omics pipelines (e.g. FASTQ files of reads/contigs, OTU tables for metabarcoding).
Gather information from end users in the life sciences about their computational needs and ongoing challenges. Where do the gaps exist in terms of your data analysis needs? Where do existing tools, pipelines, tutorials and trainings fall short?
Lay the foundations for new open-source online lessons focused on “analyze your own data” training (where participants come to a workshop to specifically learn to analyze their own in-hand environmental -Omics datasets).
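To make the first goal above concrete, here is a minimal, hypothetical sketch of the kind of cell such a notebook might contain: it parses a small OTU table (inlined here so the example is self-contained; in practice it would be read from a file produced by your pipeline) and computes a standard alpha-diversity summary (Shannon index) per sample. All file contents, sample names, and values are illustrative, not part of any workshop deliverable.

```python
import csv
import io
import math

def shannon_index(counts):
    """Shannon alpha diversity (natural log) for a list of OTU counts."""
    total = sum(counts)
    props = (c / total for c in counts if c > 0)
    return -sum(p * math.log(p) for p in props)

# A tiny inline OTU table (rows = OTUs, columns = samples);
# a real notebook would load this from a pipeline output file.
otu_csv = """OTU_ID,SampleA,SampleB
OTU_1,10,0
OTU_2,5,20
OTU_3,5,20
"""

rows = list(csv.DictReader(io.StringIO(otu_csv)))
samples = [name for name in rows[0] if name != "OTU_ID"]

for sample in samples:
    counts = [int(row[sample]) for row in rows]
    print(sample, round(shannon_index(counts), 3))
```

In a notebook the file path and the choice of diversity metric would be the “customizable parameters” described above, documented in adjacent markdown cells so end users can swap in their own data.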
What to Bring: Participants should plan on bringing the following along to the workshop:
A list of the most common tools, software packages, and database resources that you have used recently for your environmental -Omics research projects.
Code, scripts, and data analysis documentation from recent projects or papers in your field. This can be your own code or examples borrowed/reused from someone else.
One or two example datasets that you can work with during the workshop (metabarcoding, metagenomics, metatranscriptomics - we are trying to compile separate Jupyter/Binder workflows for these three categories of environmental -Omics approaches). Plan to bring along raw data as well as any downstream processed files/outputs you may have access to.
Workshop Rationale: There is a pressing need for bioinformatics tools and software pipelines that effectively reduce a researcher's “time to science”. As common software pipelines grow in complexity and diversity, they drift further from day-to-day end-user needs. Despite the proliferation of -Omics methods, software installation and data wrangling (e.g. demultiplexing, converting data formats, writing custom scripts for common tasks) remain significant hurdles to rapid, hypothesis-driven data exploration, particularly for end users in the life sciences with non-computational backgrounds. Bioinformatics tools and database resources are also driven by research areas with large funding allocations and a critical mass of researchers (e.g. bacteria-focused studies of the human microbiome, model organism communities such as C. elegans and Drosophila), leading to significant inequality in the taxonomic groups and ecosystems that are considered during tool development.
Furthermore, community initiatives and software tutorials often fail to address the needs of computational scientists with intermediate skill levels (e.g. researchers with some computational proficiency who use a mixture of -Omics approaches to conduct hypothesis-driven research). As scientific research becomes increasingly interdisciplinary, and trainees work on projects that straddle multiple traditional disciplines, there is a need for end-user-driven, executable software pipelines that carry out commonplace data analysis tasks for routine -Omics approaches (metabarcoding, metagenomics, RNA-seq). Such pipelines would take advantage of existing computational infrastructure (Jupyter Notebooks, Binder) and standard data visualization approaches (taxonomy/ontology summaries, counts of reads/OTUs, alpha- and beta-diversity analyses). By focusing on Jupyter Notebooks, we will be able to mix and match code from different programming languages and pipelines (Python, R, QA/QC tools, statistical/summary pipelines), enabling end users to quickly visualize and explore their datasets without needing immediate tool- or language-specific knowledge.
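As context for how Binder fits into this infrastructure: a repository of Jupyter notebooks becomes launchable in the browser (e.g. via mybinder.org) by adding a dependency file such as environment.yml at the repository root. The package list below is purely illustrative of the kind of mixed Python/R environment described above, not a prescribed workshop configuration.

```yaml
# environment.yml — tells Binder which packages to install
# before launching the repository's notebooks (illustrative list)
channels:
  - conda-forge
dependencies:
  - python=3.7
  - jupyter
  - pandas
  - r-base
```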