We are looking for an enthusiastic PhD student to work on the development of new methods for comparative pangenomics. A description of the project follows.
The last few years have seen an explosion of sequencing projects, leading to a deluge of several hundred thousand genomes available in public databases. Comparative genomics approaches in microbiology now use thousands of genomes to analyze the diversity of a species. Indeed, many studies focus on the overall gene content of a species (the pangenome) to understand its evolution in terms of common and accessory genes with regard to epidemiological or environmental data. Nevertheless, processing such a mass of data requires a paradigm shift in knowledge representation and in the algorithms used.
In this context, our laboratory has been working for several years on a new model that represents genomic data in the form of a pangenome graph, which makes it possible to compress the information of thousands of genomes while preserving the chromosomal organization of genes. We have thus developed methods for the reconstruction and analysis of pangenomes (PPanGGOLiN method) and the identification of regions of genomic plasticity (RGPs; panRGP method).
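To make the idea of a pangenome graph concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the PPanGGOLiN implementation: the function names, the toy strains and the gene-family labels are all hypothetical, each genome is reduced to one ordered list of gene-family identifiers, and the core/accessory split uses a plain frequency threshold as a simplified stand-in for a statistical partitioning.

```python
from collections import defaultdict


def build_pangenome_graph(genomes):
    """Build a toy pangenome graph (hypothetical helper, for illustration).

    genomes: dict mapping a genome name to the ordered list of gene-family
    identifiers found along one of its contigs.
    Returns (node_genomes, edge_genomes):
      node_genomes: gene family -> set of genomes that contain it
      edge_genomes: unordered pair of families -> set of genomes in which
                    the two families are direct neighbours
    """
    node_genomes = defaultdict(set)
    edge_genomes = defaultdict(set)
    for name, families in genomes.items():
        for family in families:
            node_genomes[family].add(name)
        # Consecutive genes define edges, which keeps the gene order
        # (chromosomal organization) in the graph.
        for left, right in zip(families, families[1:]):
            if left != right:
                edge_genomes[frozenset((left, right))].add(name)
    return dict(node_genomes), dict(edge_genomes)


def split_by_frequency(node_genomes, n_genomes, core_threshold=1.0):
    """Split families into core and accessory by presence frequency.

    Assumption: a simple threshold, used here only as a stand-in for a
    model-based partitioning of the pangenome.
    """
    core = {
        family
        for family, present_in in node_genomes.items()
        if len(present_in) / n_genomes >= core_threshold
    }
    accessory = set(node_genomes) - core
    return core, accessory


# Toy input: three hypothetical strains; "tox1" is missing from strainB.
toy_genomes = {
    "strainA": ["dnaA", "gyrB", "tox1", "recA"],
    "strainB": ["dnaA", "gyrB", "recA"],
    "strainC": ["dnaA", "gyrB", "tox1", "recA"],
}

nodes, edges = build_pangenome_graph(toy_genomes)
core, accessory = split_by_frequency(nodes, n_genomes=len(toy_genomes))
```

In this toy case the three genomes collapse into four nodes and four edges. The edge between gyrB and recA is supported only by strainB, while the two edges through tox1 are supported by the other two strains, so the variable region shows up as an alternative path in the graph.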
The aim of this PhD thesis is to develop new methods for the comparative study of pangenomes. This will involve the development of new bioinformatic methods for inter-pangenome comparisons, building in particular on the work carried out for the identification and characterization of RGPs in functional sub-modules (panModule method). RGPs include both regions that are exchanged between strains by horizontal gene transfer (such as genomic islands) and regions that are differentially lost among lineages. They are of paramount importance for understanding the adaptive potential of bacteria. The exploration of these functional modules in different species will provide a better understanding of the evolutionary dynamics behind the metabolic diversity of microorganisms.
The algorithms and tools developed during this project will be applied to study different bacterial groups of medical, agronomic or biotechnological interest, such as actinobacteria, firmicutes or enterobacteria, for which large amounts of data are available. These methods might also be applied at the scale of an ecosystem in order to understand the dynamics of genomes and the interactions between different species living in the same environment. Particular attention will be given to the functional analysis of genomic islands with regard to the metabolism of organisms, in terms of the production of secondary metabolites or catabolic pathways.
This work will benefit from the developments and tools integrated within the MicroScope platform (mage.genoscope.cns.fr/microscope) as well as the expertise of our research unit in microbial metabolism. The tools developed in the context of the thesis will be made available within the MicroScope platform to meet the analysis needs of academic and industrial partners. One original aspect of this thesis lies in its pangenomic approach to comparative genomics, which addresses one of the challenges of bioanalysis in the era of big data in biology.
- Golicz AA, Bayer PE, Bhalla PL, Batley J, Edwards D. Pangenomics Comes of Age: From Bacteria to Plant and Animal Applications. Trends Genet. 2020;36: 132–145.
- Computational Pan-Genomics Consortium. Computational pan-genomics: status, promises and challenges. Brief Bioinform. 2016. doi:10.1093/bib/bbw089
- Gautreau G, Bazin A, Gachet M, Planel R, Burlot L, Dubois M, et al. PPanGGOLiN: Depicting microbial diversity via a partitioned pangenome graph. PLoS Comput Biol. 2020;16: e1007732.
- Bazin A, Gautreau G, Médigue C, Vallenet D, Calteau A. panRGP: a pangenome-based method to predict genomic islands and explore their diversity. Bioinformatics. 2020;36: i651–i658. doi:10.1093/bioinformatics/btaa792
- Vallenet D, Calteau A, Dubois M, Amours P, Bazin A, Beuvin M, et al. MicroScope: an integrated platform for the annotation and exploration of microbial gene functions through genomic, pangenomic and metabolic comparative analysis. Nucleic Acids Res. 2020;48: D579–D589.