M2 internship. Comparative analysis of archaeal meta-pangenomes: linking genomic diversity of species to ecosystems

A characteristic of natural populations is that they are composed of individuals that are, in most cases, not genetically identical to each other. In the microbial world, variation between individuals appears both as divergence at the single-nucleotide level and as the presence of hypervariable genomic islands within a more stable set of genes shared by multiple individuals. The direct consequence of these islands is that, within the same species, the genomes of two individuals can have very different gene contents. This observation led to the concept of the pangenome, which corresponds to all the genes of a species [1, 2]. It consists of the core/persistent genome, which is common to almost all members of a species, plus the flexible/variable genome, which is present in only some members of the species.
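As an illustration of the core/flexible distinction, here is a minimal Python sketch that splits gene families by the fraction of genomes in which they occur. It is a simplification for explanatory purposes only: the function name, the toy genomes and the 0.95 cutoff are assumptions made for this example, and a fixed frequency threshold is not the statistical partitioning performed by PPanGGOLiN [9].

```python
from collections import Counter


def partition_pangenome(genomes, core_threshold=0.95):
    """Split gene families into core and flexible sets by presence frequency.

    genomes: dict mapping a genome name to the set of gene family
        identifiers found in that genome.
    core_threshold: hypothetical cutoff; a family present in at least this
        fraction of genomes is called core.
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    # Each genome holds a set, so a family is counted once per genome.
    counts = Counter(fam for families in genomes.values() for fam in families)
    n_genomes = len(genomes)
    core = {fam for fam, n in counts.items() if n / n_genomes >= core_threshold}
    flexible = set(counts) - core
    return core, flexible


# Toy data (hypothetical family names, not real annotations).
toy_genomes = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam1", "fam2", "fam4"},
    "genome_C": {"fam1", "fam2"},
}
core, flexible = partition_pangenome(toy_genomes)
print(sorted(core), sorted(flexible))
```

With draft genomes such as MAGs, a cutoff below 1.0 is a common way to tolerate genes missing because of incomplete assembly, which is why the core is described above as common to "almost all" members.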

Genome-resolved metagenomics, in which shotgun sequencing of environmental DNA is assembled and binned into draft genomes, has profoundly reshaped our understanding of the distribution, functions and roles of Archaea. Within the domain, the major supergroups are the Euryarchaeota, which include many methanogens; the TACK, which include the Thaumarchaeota that contribute to ammonia oxidation in soils and the ocean; the Asgard, which include lineages inferred to be ancestral to eukaryotes; and the DPANN, a group of mostly symbiotic small-celled archaea. These archaea are not restricted to extreme habitats but are widely distributed in diverse ecosystems [3–5].

However, the extent of gene content heterogeneity within archaeal species has received only limited analysis [6, 7]. The wealth of metagenome-assembled genomes (MAGs) gives access to gene content heterogeneity within environmental populations of uncultivated archaea. In fact, 34 species-level groups of Archaea, as defined by the Genome Taxonomy Database [8], contain more than 10 distinct genomes, a number that has been shown to be sufficient to define pangenomes and detect genomic islands using PPanGGOLiN and panRGP, two tools we recently developed in our lab [9, 10].

The aim of this M2 internship is to apply our recent methodological developments to the hundred thousand MAGs available, in order to carry out a comparative study of meta-pangenomes in Archaea.

First, we propose to systematically analyze the pangenomes of archaeal species. The successful candidate will define metrics to assess the diversity of pangenomes in terms of gene and genomic island content, and will then visualize this diversity. The main goal will be to identify the most promising species to study.
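To give an idea of what such a metric could look like, here is a minimal Python sketch of one possible candidate: the mean pairwise Jaccard distance between the gene family contents of the genomes of a species. The choice of this metric, the function names and the toy data are assumptions made for illustration; the metrics actually used will be defined during the internship.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of gene families."""
    union = a | b
    if not union:
        return 0.0  # two empty genomes are treated as identical
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_jaccard(genomes):
    """Average Jaccard distance over all pairs of genomes.

    genomes: dict mapping a genome name to its set of gene family
        identifiers. Values near 0 indicate similar gene contents;
        values near 1 indicate very different gene contents.
    """
    pairs = list(combinations(genomes.values(), 2))
    if not pairs:
        raise ValueError("at least two genomes are required")
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Toy data (hypothetical family names).
toy_genomes = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam1", "fam2", "fam4"},
    "genome_C": {"fam1", "fam2"},
}
diversity = mean_pairwise_jaccard(toy_genomes)
print(round(diversity, 3))
```

A single number per species makes it straightforward to rank species, although it hides which families or islands drive the differences.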

The pangenomes of the most promising species will then be analyzed in the second part of the internship. Particular attention will be given to the functional analysis of the genomic islands with regard to the biological capacities of the organisms in terms of defense systems and metabolic processes. The resulting discoveries will benefit from further functional characterization by biochemists of our institute. The student will also use a meta-pangenomic approach to track variations in gene abundances within the pangenome of a species using read recruitment from metagenomic projects [11]. We plan to add available physical and chemical parameters from the sampling sites and to perform correlation analyses between these environmental parameters and the observed abundance variations. We anticipate that this will yield unique insights into the functional basis of microbial niche partitioning and the fitness of archaeal species.
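The correlation step can be sketched as follows. This minimal Python example computes a Spearman rank correlation between one environmental parameter and the abundance of one gene across samples. The use of Spearman correlation, the variable names and the toy values are assumptions made for illustration and are not results; in practice, a statistics library and correction for multiple testing would be needed.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): one value per metagenomic sample.
temperature_c = [4.0, 10.0, 15.0, 22.0, 28.0]
gene_coverage = [0.1, 0.4, 0.35, 0.9, 1.2]
rho = spearman(temperature_c, gene_coverage)
print(round(rho, 2))
```

A rank-based coefficient is used here because it does not assume a linear relationship between the environmental parameter and gene abundance.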

We are looking for a highly motivated student in microbiology, ecology, genomics and/or bioinformatics. The successful candidate will be supported by the tools developed in the lab as well as by the expertise of the LABGeM team in microbial genomics and bioinformatics (https://labgem.genoscope.cns.fr). As the internship is fully bioinformatics-focused, basic skills in scripting and data manipulation (e.g. Bash, R, Python) would be highly appreciated. The start and end dates of the internship can be adapted.

For more information, you may contact Raphaël Méheust (raphael.meheust@genoscope.cns.fr) and David Vallenet (vallenet@genoscope.cns.fr). The position will be located at the Genoscope (http://jacob.cea.fr/drf/ifrancoisjacob/Pages/Departements/Genoscope.aspx) in Evry.

1. Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R. The microbial pan-genome. Curr Opin Genet Dev 2005; 15: 589–594.

2. Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial ‘pan-genome’. Proc Natl Acad Sci U S A 2005; 102: 13950–13955.

3. Adam PS, Borrel G, Brochier-Armanet C, Gribaldo S. The growing tree of Archaea: new perspectives on their diversity, evolution and ecology. ISME J 2017; 11: 2407–2425.

4. Spang A, Caceres EF, Ettema TJG. Genomic exploration of the diversity, ecology, and evolution of the archaeal domain of life. Science 2017; 357.

5. Baker BJ, De Anda V, Seitz KW, Dombrowski N, Santoro AE, Lloyd KG. Diversity, ecology and evolution of Archaea. Nat Microbiol 2020; 5: 887–900.

6. Deschamps P, Zivanovic Y, Moreira D, Rodriguez-Valera F, López-García P. Pangenome evidence for extensive interdomain horizontal transfer affecting lineage core and shell genes in uncultured planktonic thaumarchaeota and euryarchaeota. Genome Biol Evol 2014; 6: 1549–1563.

7. Tschitschko B, Erdmann S, DeMaere MZ, Roux S, Panwar P, Allen MA, et al. Genomic variation and biogeography of Antarctic haloarchaea. Microbiome 2018; 6: 113.

8. Parks DH, Chuvochina M, Chaumeil P-A, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol 2020; 38: 1079–1086.

9. Gautreau G, Bazin A, Gachet M, Planel R, Burlot L, Dubois M, et al. PPanGGOLiN: Depicting microbial diversity via a partitioned pangenome graph. PLoS Comput Biol 2020; 16: e1007732.

10. Bazin A, Gautreau G, Médigue C, Vallenet D, Calteau A. panRGP: a pangenome-based method to predict genomic islands and explore their diversity. Bioinformatics 2020.

11. Delmont TO, Kiefl E, Kilinc O, Esen OC, Uysal I, Rappé MS, et al. Single-amino acid variants reveal evolutionary processes that shape the biogeography of a global SAR11 subclade. eLife 2019; 8.
