Research Themes

Research in the Bik Lab is intensely interdisciplinary, using high-throughput sequencing and diverse –Omics approaches to explore broad patterns in microbial eukaryote assemblages (biodiversity and phylogeography, the functional roles of microbial taxa, and the relationship between species and environmental parameters), with an emphasis on free-living nematodes in marine sediments. Our long-term research interests lie at the interface between biology and computer science, using biological questions and evolutionary hypotheses to drive the development and refinement of –Omic approaches focused on marine microbial eukaryotes. Microbial eukaryotes (organisms <1 mm, such as nematodes, fungi, protists, and other ‘minor’ metazoan phyla) are abundant and ubiquitous across every ecosystem on Earth, performing key functions such as nutrient cycling and sediment stability in marine habitats. Yet their diversity remains largely unexplored. This is one of the major challenges in biology, and it currently limits our capacity to understand, mitigate, and remediate the consequences of environmental change. Given this knowledge gap, the lab's research themes span three key areas:

1) "Is Everything Everywhere?": Testing seminal biodiversity & biogeography hypotheses using –Omics
Marine ecosystems have been characterized by a number of overarching, large-scale patterns, including extremely high biodiversity, peaks of species diversity at intermediate depths, and latitudinal gradients of diversity in the deep sea (e.g. diversity decreasing towards the poles; Ramirez-Llodra et al. 2010, Biogeosciences, 7: 2851). To date, these patterns have been derived from work on larger organisms (macrofauna and megafauna) and from studies using classical, non-molecular methods (e.g. morphological taxonomy in studies of microbial metazoa). One primary research aim is to revisit and reassess these reported biogeographic patterns for sediment-dwelling organisms. We will do this by amassing large volumes of molecular data that can be compared to the vast body of existing morphological studies (e.g. of putatively “cosmopolitan” morphospecies). Our work aims to go far beyond previous studies in two ways. First, it investigates the full breadth of microbial taxa in marine sediments (e.g. microbial eukaryotes, fungi, bacteria, archaea, viruses). Second, it leverages a global sample set representing diverse geographic locations (North Atlantic, Eastern Pacific, Arctic, Antarctic, Gulf of Mexico, etc.), types of sediment habitats (e.g. abyssal plains, canyons, cold seeps, whale falls), and water depths (200-5000+ m). These methods will allow us to address questions such as: Do molecular data and deep sequencing approaches agree with known patterns of diversity and biogeography in marine sediment habitats? Do biogeography and alpha/beta species diversity vary across taxonomic groups (prokaryotes, fungi, protists, and microbial metazoa)?
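Alpha diversity (within a sample) and beta diversity (between samples) can be computed from any taxon-by-sample count table, whether the counts come from morphological identifications or from sequence-based OTUs. Below is a minimal, illustrative sketch in Python. It assumes a plain list of counts per sample, with all samples listing the same taxa in the same order. It uses the Shannon index for alpha diversity and Bray-Curtis dissimilarity for beta diversity. These metrics are common choices, but choosing them here is our assumption and does not describe the lab's specific analyses. The habitat names and counts are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon alpha diversity H' = -sum(p_i * ln p_i) for one sample."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:  # absent taxa contribute nothing
            p = c / total
            h -= p * math.log(p)
    return h

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity (0 = identical, 1 = no shared taxa)."""
    if len(a) != len(b):
        raise ValueError("samples must list the same taxa in the same order")
    shared = sum(min(x, y) for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 - 2.0 * shared / total if total else 0.0

# Hypothetical OTU counts for two samples (same four taxa in each list)
canyon = [10, 10, 0, 5]
abyssal_plain = [10, 0, 10, 5]

print(round(shannon_index(canyon), 3))
print(round(bray_curtis(canyon, abyssal_plain), 3))
```

Established microbial ecology pipelines such as QIIME provide these and many other diversity metrics. The sketch only makes the two quantities concrete.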

2) “Systems Ecology” to investigate ecosystem function and species interactions, with an emphasis on microbial eukaryotes in marine sediments
Inspired by systems biology, a “systems ecology” approach aims to provide a broader, less constrained view of natural ecosystems, using large data volumes to enable the discovery and description of complex interactions within biological systems. Such an approach is increasingly relevant for two reasons. First, a replenishing “microbial seed bank” appears to persist in the oceans (Caporaso et al. 2012, ISME, 6:1089). Second, there is mounting evidence that functional genes, rather than species themselves, are the more important factor in community assembly (Burke et al. 2011, PNAS, 108:14288). Our research focuses on microbial eukaryotes, with an emphasis on metazoan groups such as nematodes. Even so, the nature of “systems ecology” enables exploration and analysis of a much wider taxonomic breadth. For example, shotgun metagenomics inherently sequences DNA from bacteria, archaea, and viruses. This quantifies functional potential and novel diversity in these groups, and in some cases even allows computational reconstruction of small genomes. By using such a systems ecology approach, our research goes beyond “Is Everything Everywhere?” to also ask “Are functional pathways (genes, metabolites) and host-microbe interactions conserved in similar habitats?” and “Does taxonomic diversity correspond with functional diversity?”
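One simple way to frame the question “Does taxonomic diversity correspond with functional diversity?” is to ask whether samples with more taxa also contain more distinct functional genes. Below is a minimal sketch, assuming two hypothetical per-sample measurements: the number of distinct taxa, and the number of distinct gene families detected in a metagenome. It computes the Spearman rank correlation between them, with tied values receiving their average rank. Spearman's correlation is our illustrative choice, not a method stated by the lab, and all values are invented.

```python
def average_ranks(values):
    """Rank values (1 = smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors.

    Undefined (raises ZeroDivisionError) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical per-sample richness: distinct taxa vs. distinct gene families
taxonomic_richness = [120, 85, 200, 150, 60]
functional_richness = [900, 870, 1400, 950, 700]
print(round(spearman(taxonomic_richness, functional_richness), 3))
```

A rank-based correlation is used because richness values from different data types (taxa vs. gene families) are on very different scales. A monotonic relationship is therefore a more natural thing to test than a linear one.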

To move towards “systems ecology,” we must go beyond environmental rRNA surveys and incorporate other types of –Omic data to better quantify community assemblages, metabolic pathways, and putative ecological roles. The ultimate goal is to gain a holistic view of ecosystem function. However, making this leap will require new computational tools. Metagenomes currently represent one of the most complex types of environmental datasets we are able to generate. In addition, “taxonomic silos” still persist in the analysis of environmental data: computational analyses of bacteria/archaea, microbial eukaryotes, and viruses must currently be carried out using separate databases and independent software tools. Thus, advancing the bioinformatics toolkit is inherently intertwined with our lab's hypothesis-driven research vision. One goal is to foster the development of increasingly flexible and inclusive bioinformatics pipelines that enable rapid, integrative analyses of all microbial taxa across independent –Omics datasets. Such integrated pipelines, especially those harnessing phylogeny-driven algorithms, are key for “systems ecology.” A second goal is to develop intuitive, exploratory data visualization frameworks as a new paradigm for the analysis of large –Omic datasets spanning multiple taxonomic domains.

3) Comparative phylogenomics and targeted genome sequencing of microbial eukaryotes as a complement to environmental –Omics
Physical isolation and sequencing of individual microbial eukaryote specimens is now possible via single-cell approaches and/or protocols designed for low amounts of DNA. Using targeted genome sequencing of individual specimens from marine sediments (focusing on marine metazoan species such as nematodes, tardigrades, kinorhynchs, and other ‘minor’ phyla), we are embarking on a large-scale effort to fill in the branches of the eukaryotic tree of life. Current eukaryotic databases are exceedingly sparse compared to known species diversity. This paucity of data critically limits our ability to analyze and interpret environmental –Omic datasets or to reconstruct deep phylogeny. In particular, the interpretation of metagenomic data is still inherently reliant on comparisons to available genome sequences and their corresponding annotations. The taxonomic utility of phylogenetic marker genes is also constrained by the coverage of the reference database originally used to build the markers. A robust, openly accessible collection of marine metazoan genome sequences in public databases will represent a significant long-term community resource. It can be re-analyzed and mined in future environmental studies, and also used to inform predictive modeling approaches. Furthermore, genome sequences will allow us to:
link environmental contigs (e.g. those encoding “hypothetical” proteins) to a known taxonomic ID;
identify clade-specific marker genes for taxonomic classification and phylogenetics; and
characterize the genomic repertoire of each eukaryotic specimen, thus inferring its putative ecological functions.
This final facet of our research vision aims to address questions such as: Are there divergent evolutionary lineages and endemic hot spots in unique or isolated seafloor habitats? Can we reconstruct the historical evolutionary trajectory of the deep sea (as a source vs. sink habitat)? Can we identify genomic signatures of adaptation and diversification in microbial metazoa?
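To illustrate what “clade-specific marker genes” means in practice, here is a deliberately strict presence/absence sketch. It assumes each sequenced genome has already been reduced to a set of gene-family IDs, for example after orthology clustering. It then reports the gene families found in every member of a clade and in no genome outside it. The genome and gene-family names are hypothetical, and this set logic is our simplified illustration rather than the lab's pipeline. Real analyses would need to tolerate incomplete genome assemblies and missing annotations.

```python
def clade_specific_genes(gene_content, clade):
    """Return gene families present in every clade member and absent elsewhere.

    gene_content: dict mapping genome name -> set of gene-family IDs
    clade: set of genome names forming the clade of interest (non-empty)
    """
    members = [gene_content[g] for g in clade]
    outgroup = [genes for g, genes in gene_content.items() if g not in clade]
    core = set.intersection(*members)       # shared by all clade members
    seen_outside = set().union(*outgroup)   # found in any non-member genome
    return core - seen_outside

# Hypothetical gene-family content for four sequenced specimens
genomes = {
    "nematode_A": {"gf1", "gf2", "gf3", "gf7"},
    "nematode_B": {"gf1", "gf2", "gf4", "gf7"},
    "tardigrade_C": {"gf1", "gf5"},
    "kinorhynch_D": {"gf1", "gf3", "gf6"},
}
print(sorted(clade_specific_genes(genomes, {"nematode_A", "nematode_B"})))
```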

Ongoing research projects in the lab are as follows:


Genomic responses to the Deepwater Horizon event and development of high-throughput biological assays for oil spills

This is a collaborative project led by Kelley Thomas at the University of New Hampshire. Co-PIs include the Bik Lab, Paul Montagna at the Harte Research Institute for Gulf of Mexico Studies (Texas A&M), and Jon Norenburg at the Smithsonian National Museum of Natural History. Our grant is funded through the Gulf of Mexico Research Initiative (GOMRI), and a detailed overview of the award can be found on the GOMRI website.

The overall aim of this project is to improve the response, mitigation, detection, characterization and remediation associated with oil spills and accompanying release of gas.

The benthic environment in the Gulf of Mexico (GoM) is biologically hyper-diverse, performing critical ecosystem functions that have consequences for the ecology of the entire GoM region. Benthic communities are strongly impacted by oil spills, which makes them a valuable tool for assaying and monitoring the impacts of contamination. However, detailed and extensive characterization of these communities has been impractical due to the tedious and time-consuming taxonomic effort required to accurately describe small benthic fauna. Our project leverages high-throughput sequencing technologies that now enable rapid, accurate, and inexpensive assays of community biodiversity. To achieve these goals, our GOMRI project team brings together the interdisciplinary expertise in marine biology, taxonomy, genomics, and bioinformatics needed to develop a meaningful and robust technology. The project has three main objectives:

  1. Use targeted sequencing of individual benthic eukaryotes to generate a representative sample of diverse genomes from which to select an expanded set of nuclear and mitochondrial loci for targeted mining of shotgun metagenomic data.
  2. Assess eukaryotic community structure across space and time via high-throughput sequencing of environmental metagenomes using a new and expanded array of nuclear and mitochondrial marker genes.
  3. Establish Standard Operating Procedures (SOPs) and reproducible bioinformatic workflows for environmental monitoring of oil spills. This will include establishing a database for integration of taxonomic and molecular datasets, and dissemination of tools and educational resources.
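Objectives 1 and 2 involve mining shotgun metagenomic reads for a chosen set of nuclear and mitochondrial marker loci. One simple, alignment-free way to pull out candidate reads is to index the k-mers of each marker locus, on both strands, and assign a read to the locus with which it shares the most k-mers. The sketch below shows that idea in Python. The locus names, sequences, k value, and `min_hits` threshold are all hypothetical, and k-mer matching is our illustrative choice, not the project's stated method. Production workflows would typically use dedicated tools and far larger k.

```python
def kmers(seq, k):
    """All overlapping k-mers of a DNA sequence (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (non-ACGT characters kept as-is)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def build_index(markers, k):
    """Map each k-mer (from both strands) to the marker loci containing it."""
    index = {}
    for name, seq in markers.items():
        for strand in (seq, reverse_complement(seq)):
            for km in kmers(strand, k):
                index.setdefault(km, set()).add(name)
    return index

def recruit(reads, index, k, min_hits=2):
    """Assign each read to the locus sharing the most k-mers with it.

    Reads with fewer than min_hits shared k-mers are left unassigned;
    ties go to the first locus encountered.
    """
    hits = {}
    for read_id, seq in reads.items():
        counts = {}
        for km in kmers(seq, k):
            for name in index.get(km, ()):
                counts[name] = counts.get(name, 0) + 1
        if counts:
            best = max(counts, key=counts.get)
            if counts[best] >= min_hits:
                hits[read_id] = best
    return hits

# Hypothetical marker fragments and reads (read2 comes from the opposite strand)
markers = {"locus_18S": "ACGTTGCAAGGCTTAACGGT", "locus_COI": "TTTAGGCCATACCGATTGGA"}
reads = {
    "read1": "GCAAGGCTTAAC",
    "read2": reverse_complement("CCATACCGATTG"),
    "read3": "CCCCCCCCCCCC",
}
index = build_index(markers, k=5)
print(recruit(reads, index, k=5))
```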

One of the most important goals of this project is training the next generation of environmental biologists with interdisciplinary tools. Toward that goal, we will organize two formal workshops each year. These workshops will expose students to the full spectrum of this technology from sample preparation, through taxonomy, to metagenomics and bioinformatics. These workshops are also opportunities to attract underrepresented groups and to link the research team with GoM stakeholders. All workshops will be held at the Harte Research Institute for Gulf of Mexico Studies at Texas A&M University, Corpus Christi.

The proposal leverages a set of pre- and post-spill samples from diverse, impacted benthic habitats, some of which have already received significant analysis. The group also brings significant cyberinfrastructure (databases) and advanced bioinformatics tools (PhyloSift, IPython workflows, data visualization software) that will be modified to support the specific goals of this proposal.

Phinch - http://phinch.org


Phinch is an interactive, browser-based visualization framework that facilitates the rapid exploration of biological patterns in high-throughput -Omic datasets. The framework takes advantage of standard file formats from computational pipelines to bridge the gap between biological software (e.g. microbial ecology pipelines such as QIIME) and existing data visualization capabilities (harnessing the flexibility and scalability of technologies such as HTML5). This project is a collaboration between the Bik Lab and Pitch Interactive (a data visualization studio based in Oakland, CA). Development of the current prototype framework was funded by a grant from the Alfred P. Sloan Foundation. Phinch is open source software: the underlying code is available on GitHub, and detailed software documentation (e.g. supported file formats and features of the visualization tools) can be found on the accompanying GitHub wiki.

Phinch is optimized for use in the Google Chrome browser. It currently supports downstream analyses of BIOM v1.0 files (Biological Observation Matrix files, a JSON-formatted file type typically used to represent marker gene OTUs or metagenomic data). All sample metadata and taxonomy/ontology information MUST be embedded in the BIOM file before being uploaded.
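Because Phinch expects sample metadata and taxonomy to be embedded in the BIOM file itself, it can help to see where those fields live. The sketch below builds a small sparse BIOM v1.0 table as plain JSON using only Python's standard library. Taxonomy goes in each row's (OTU's) `metadata`, and sample metadata goes in each column's `metadata`. The sketch then checks that no row or column is missing metadata before upload. It is an unofficial sketch: the field names follow our reading of the BIOM 1.0 format description and should be checked against the official BIOM documentation. The `generated_by` string, helper names, OTU and sample IDs, and metadata fields are hypothetical. In real workflows, pipelines such as QIIME or the `biom-format` tools usually produce these files.

```python
import json
from datetime import datetime, timezone

def make_biom_v1(otu_ids, sample_ids, counts, taxonomy, sample_metadata):
    """Build a sparse BIOM v1.0-style table (a JSON-serializable dict).

    counts: dict mapping (otu_index, sample_index) -> nonzero count
    taxonomy: dict mapping OTU ID -> list of rank strings
    sample_metadata: dict mapping sample ID -> dict of metadata fields
    """
    return {
        "id": None,
        "format": "Biological Observation Matrix 1.0.0",
        "format_url": "http://biom-format.org",
        "type": "OTU table",
        "generated_by": "hypothetical-example-script",
        "date": datetime.now(timezone.utc).isoformat(),
        # Taxonomy is embedded per OTU (row); sample metadata per sample (column)
        "rows": [{"id": o, "metadata": {"taxonomy": taxonomy[o]}} for o in otu_ids],
        "columns": [{"id": s, "metadata": sample_metadata[s]} for s in sample_ids],
        "matrix_type": "sparse",
        "matrix_element_type": "int",
        "shape": [len(otu_ids), len(sample_ids)],
        # Sparse entries are [row_index, column_index, value]
        "data": [[r, c, v] for (r, c), v in sorted(counts.items())],
    }

def missing_metadata(table):
    """IDs of rows/columns with empty or absent metadata (fix before upload)."""
    gaps = [r["id"] for r in table["rows"] if not r.get("metadata")]
    gaps += [c["id"] for c in table["columns"] if not c.get("metadata")]
    return gaps

table = make_biom_v1(
    otu_ids=["OTU_1", "OTU_2"],
    sample_ids=["S1", "S2"],
    counts={(0, 0): 12, (1, 1): 3},
    taxonomy={"OTU_1": ["k__Eukaryota", "p__Nematoda"],
              "OTU_2": ["k__Eukaryota", "p__Tardigrada"]},
    sample_metadata={"S1": {"depth_m": 1200}, "S2": {"depth_m": 3400}},
)
print(missing_metadata(table))
biom_text = json.dumps(table, indent=2)  # write this text to a .biom file
```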

Reference: Bik HM, Pitch Interactive (2014) Phinch: An interactive, exploratory data visualization framework for –Omic datasets, bioRxiv, doi:10.1101/009944 (preprint)

Genomic baseline surveys of Arctic meiofauna

This project is a collaboration with Sarah Hardy at the University of Alaska Fairbanks, funded through an award from the North Pacific Research Board. This research focuses on a set of >100 marine sediment samples collected from two ocean regions (the Beaufort and Chukchi Seas) off the North Slope of Alaska. We are collecting and comparing three types of independent datasets: 1) morphological taxonomy data from nematode communities, 2) environmental marker gene amplicons (16S and 18S rRNA) to broadly examine bacteria, archaea, and microbial eukaryote communities, and 3) shotgun metagenomic data as a “PCR-free” approach for describing microbial community assemblages. Our goal is to compare the biological insights gained from traditional taxonomy vs. high-throughput genomic datasets, and to assess meiofaunal community diversity and phylogeographic patterns in understudied Arctic regions.
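A first-pass comparison of morphological and molecular datasets can simply ask which taxa each method detects. Below is a minimal sketch, assuming each method's results for a site have been reduced to a list of genus names. It reports taxa shared by both methods, taxa unique to each method, and the Jaccard similarity of the two sets. The genus lists are hypothetical example data, and this presence/absence comparison is our illustration, not the project's analysis plan.

```python
def jaccard(a, b):
    """Jaccard similarity: size of intersection / size of union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def compare_methods(morphology, molecular):
    """Summarize agreement between taxa detected by two independent methods."""
    morph, mol = set(morphology), set(molecular)
    return {
        "shared": sorted(morph & mol),
        "morphology_only": sorted(morph - mol),
        "molecular_only": sorted(mol - morph),
        "jaccard": jaccard(morph, mol),
    }

# Hypothetical nematode genera detected at one site by each method
morphology = ["Sabatieria", "Daptonema", "Halalaimus", "Desmoscolex"]
amplicons_18s = ["Sabatieria", "Daptonema", "Leptolaimus"]
print(compare_methods(morphology, amplicons_18s))
```

Taxa detected by only one method are often the most informative output. They can point to gaps in reference databases, primer bias, or morphologically cryptic diversity.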

NSF Research Coordination Network EukHiTS - http://eukhits.wordpress.com

RCN EukHiTS (Eukaryotic biodiversity research using High-Throughput Sequencing) is a collaborative project between the Bik Lab at UC Riverside and Kelley Thomas’s lab at the University of New Hampshire. EukHiTS is funded through a Research Coordination Network award from the National Science Foundation (DBI-1262480). The goal of this project is to catalyze the formation of an international scientific network focused on -Omic investigations of microbial eukaryotes, and to promote cross-disciplinary interactions.

Microscopic eukaryote species (organisms <1 mm, such as nematodes, fungi, protists, etc.) are abundant and ubiquitous in every ecosystem on Earth, yet invisible to the naked eye. The biodiversity and geographic distributions of most of these species are largely unknown and represent one of the major knowledge gaps in biology. High-throughput DNA sequencing technologies now allow for deep examination of virtually all microscopic organisms present in an environmental sample. For microbial eukaryote taxa, en masse biodiversity assessment using traditional loci (rRNA genes) can be conducted at a fraction of the time and cost required for traditional (morphological) approaches. Despite this promise, current bottlenecks include:
the lack of useful distributed tools for analysis;
the lack of common data standards that would allow global comparisons across individual studies; and
missing links between molecules and morphology.
RCN EukHiTS is focused on developing community capabilities for computational approaches to eukaryotic taxa. It also addresses the infrastructure, both cyber and human, needed for effective interpretation of large high-throughput datasets. The steering committee of RCN EukHiTS includes expertise from computational biology, functional genomics, computer science, taxonomy, ecology, and database resource management. It also includes representatives of end-user communities, to ensure that all aspects of the community are well represented.

This NSF RCN builds on two previous community meetings organized by PIs Holly Bik and Kelley Thomas: a 2014 SMBE Satellite Meeting on Eukaryotic -Omics and a 2011 NESCent Catalysis Meeting on “High-Throughput Biodiversity Assessment using Eukaryotic Metagenetics”.

PressForward (External Collaboration)

PressForward is a software tool that enables curation and sharing of content from around the internet. This free WordPress plugin collects information via an RSS feed reader and a browser bookmarklet. It aggregates, republishes, and shares that content in one central location (typically via a blog feed). The PressForward project is being led by Stephanie Westcott at the Roy Rosenzweig Center for History and New Media at George Mason University.

As a PressForward Pilot Partner, we have guided the installation of this software tool on the microBEnet and Deep-Sea Biology Society websites, and helped build a team of community editors to review and disseminate content on a daily basis. Our goal as a Pilot Partner is to reduce the manual effort required for website administration, minimizing the administrative burden for scientists maintaining project websites. We also aim to aggregate and share content in a way that is not currently possible with the default WordPress backend.

microBEnet (External Collaboration)

microBEnet (the Microbiology of the Built Environment Network - http://www.microbe.net) is an online portal for resources related to the microbiology of the built environment. This project is led and maintained by Jonathan Eisen’s lab at the University of California, Davis, and funded by a grant from the Alfred P. Sloan Foundation. The microBEnet project has historically focused on three main categories of tasks:
1) organizing meetings and workshops;
2) leveraging social media to facilitate communication and collaboration; and
3) curating and creating online resources to facilitate work in the Built Environment and to build a culture of openness and sharing.
PI Holly Bik has been involved with curating online resources and social media tools for microBEnet. She has also provided bioinformatics oversight and design advice for microBEnet’s citizen science and undergraduate research projects.

Reference: Bik HM, Coil D, Eisen JA (2014) microBEnet: Lessons learned from building an interdisciplinary scientific community in the online sphere, PLoS Biology, 12(6): e1001884

PhyloSift (External Collaboration)

PhyloSift is a software pipeline for phylogenetic analysis of genomes and metagenomes. Taking any biological sequence (nucleotide or amino acid) as input, PhyloSift uses a reference database of profile HMMs to identify candidate sequences matching phylogenetically informative marker genes. These candidate sequences are then subjected to phylogenetic placement, in which short reads are inserted into reference phylogenies and given taxonomic assignments based on where they fall in the tree. PI Holly Bik was actively involved in the development and documentation of the PhyloSift pipeline, and we continue to use this software for ongoing metagenomic data analysis. Software downloads and extensive documentation are available on the main PhyloSift website, and the code is maintained as an open source repository on GitHub.
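The final step described above, turning a tree placement into a taxonomic assignment, can be illustrated in a greatly simplified form. Below is a minimal sketch, assuming a read has already been placed on the edge directly above one node of a small rooted reference tree. The read is then assigned the deepest taxonomic lineage shared by all reference leaves below that node. The tree, leaf names, and lineages are hypothetical, and the single-edge, consensus-lineage rule is our simplification. It does not reproduce PhyloSift's actual implementation.

```python
def lowest_common_lineage(lineages):
    """Deepest taxonomic prefix shared by all lineages (root-to-tip rank lists)."""
    shared = []
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared

def leaves_below(tree, node):
    """Collect leaf names under `node` in a tree given as {node: [children]}."""
    children = tree.get(node, [])
    if not children:
        return [node]
    out = []
    for child in children:
        out.extend(leaves_below(tree, child))
    return out

def assign_from_placement(tree, taxonomy, placement_node):
    """Taxonomy for a read placed on the edge directly above `placement_node`."""
    leaves = leaves_below(tree, placement_node)
    return lowest_common_lineage([taxonomy[leaf] for leaf in leaves])

# Hypothetical reference tree and leaf taxonomies
tree = {"root": ["n1", "leafC"], "n1": ["leafA", "leafB"]}
taxonomy = {
    "leafA": ["Eukaryota", "Nematoda", "Enoplea"],
    "leafB": ["Eukaryota", "Nematoda", "Chromadorea"],
    "leafC": ["Eukaryota", "Tardigrada", "Eutardigrada"],
}
print(assign_from_placement(tree, taxonomy, "n1"))
```

A read placed deep in the tree (near a single leaf) receives a specific assignment. A read placed near the root receives only a coarse one. This is how placement-based methods avoid over-confident labels for divergent sequences.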

Reference: Darling A, Jospin G, Lowe E, Matsen FA, Bik HM, Eisen JA (2014) PhyloSift: phylogenetic analysis of genomes and metagenomes, PeerJ, 2:e243.