Master 2 internship: Development of methods for the comparison of pangenome graphs

Recent years have seen an explosion of sequencing projects, leading to a deluge of several hundred thousand genomes available in public sequence databases. Comparative genomics approaches in microbiology now use thousands of genomes to analyze a single species. Many studies focus on the overall gene content of a species (the pangenome) to understand its evolution in terms of shared genes (“core genome”) and accessory genes (“variable genome”) in the light of epidemiological or environmental data [1]. However, processing this mass of data requires a paradigm shift in both knowledge representation and the algorithms used [2].

The LABGeM team has been developing methods for the reconstruction and analysis of pangenomes (PPanGGOLiN) [3] and for the identification of regions of genomic plasticity (panRGP) [4]. To extend the functionalities available for exploring pangenomes, we aim to develop new methods for inter-pangenome comparison, making it possible to explore the genomic dynamics between different species. The algorithms and tools developed during this internship will be applied to several groups of bacteria for which large amounts of data are available. Particular attention will be paid to the functional analysis of genomic islands in relation to the organisms' metabolism, in particular secondary metabolite production.

We are looking for a highly motivated Master's student in bioinformatics. The selected candidate should have programming skills in Linux/Bash and Python, as well as good knowledge of microbiology and genomics. This work will benefit from the developments and tools integrated into the MicroScope platform [5], as well as from the LABGeM team's expertise in microbial metabolism.

For more information, you may contact Alexandra Calteau and David Vallenet. The internship will take place at Genoscope in Evry.

  1. Golicz AA, Bayer PE, Bhalla PL, Batley J, Edwards D. Pangenomics Comes of Age: From Bacteria to Plant and Animal Applications. Trends Genet. 2020;36: 132–145.
  2. Computational Pan-Genomics Consortium. Computational pan-genomics: status, promises and challenges. Brief Bioinform. 2016. doi:10.1093/bib/bbw089
  3. Gautreau G, Bazin A, Gachet M, Planel R, Burlot L, Dubois M, et al. PPanGGOLiN: Depicting microbial diversity via a partitioned pangenome graph. PLoS Comput Biol. 2020;16: e1007732.
  4. Bazin A, Gautreau G, Médigue C, Vallenet D, Calteau A. panRGP: a pangenome-based method to predict genomic islands and explore their diversity. bioRxiv. 2020. doi:10.1101/2020.03.26.007484
  5. Vallenet D, Calteau A, Dubois M, Amours P, Bazin A, Beuvin M, et al. MicroScope: an integrated platform for the annotation and exploration of microbial gene functions through genomic, pangenomic and metabolic comparative analysis. Nucleic Acids Res. 2020;48: D579–D589.