panRGP: a pangenome-based method to predict genomic islands and explore their diversity

Horizontal gene transfer (HGT) is a major source of variability in prokaryotic genomes. Regions of Genome Plasticity (RGPs) are clusters of genes located in highly variable genomic regions. Most of them arise from HGT and correspond to Genomic Islands (GIs). Studying these regions at the species level has become increasingly difficult as the number of available genomes has grown. To date, no method is available to identify GIs using hundreds of genomes and to explore their diversity.

The panRGP method predicts RGPs using pangenome graphs built from all available genomes of a given species. It makes it possible to study thousands of genomes, to access the diversity of RGPs, and to predict spots of insertion. It is a scalable and reliable tool for predicting GIs and spots, making it an ideal approach for large comparative studies.
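To make the idea of RGPs and spots of insertion more concrete, here is a minimal Python sketch. It is not the panRGP algorithm. It is a simplified stand-in which assumes that every gene already carries a hypothetical partition label ("P" for persistent, "V" for variable), that an RGP candidate is a maximal run of consecutive variable genes, and that regions sharing the same pair of flanking persistent families belong to the same spot. The function names, the labels, the `min_genes` threshold and the toy genomes are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import groupby
from typing import Dict, FrozenSet, List, Optional, Tuple

# A gene is (family name, partition label); labels are hypothetical:
# "P" = persistent (present in most genomes), "V" = variable.
Gene = Tuple[str, str]


def find_variable_runs(genes: List[Gene], min_genes: int = 2) -> List[dict]:
    """Return maximal runs of consecutive variable genes along one contig,
    together with the persistent families that flank each run."""
    regions = []
    pos = 0
    for label, group in groupby(genes, key=lambda g: g[1]):
        run = list(group)
        start, end = pos, pos + len(run)
        pos = end
        if label != "V" or len(run) < min_genes:
            continue
        # Flanking genes are persistent because runs are maximal and
        # only two labels are used in this toy model.
        left: Optional[str] = genes[start - 1][0] if start > 0 else None
        right: Optional[str] = genes[end][0] if end < len(genes) else None
        regions.append({
            "families": [name for name, _ in run],
            # frozenset makes the border pair independent of orientation
            "borders": frozenset((left, right)),
        })
    return regions


def group_into_spots(
    genomes: Dict[str, List[Gene]], min_genes: int = 2
) -> Dict[FrozenSet, List[Tuple[str, List[str]]]]:
    """Group regions from several genomes by their flanking families."""
    spots = defaultdict(list)
    for genome_name, genes in genomes.items():
        for region in find_variable_runs(genes, min_genes):
            spots[region["borders"]].append((genome_name, region["families"]))
    return dict(spots)


# Toy input: two genomes sharing the persistent families core1..core3.
genome_a = [("core1", "P"), ("accA", "V"), ("accB", "V"),
            ("core2", "P"), ("core3", "P")]
genome_b = [("core1", "P"), ("accC", "V"), ("accD", "V"), ("accE", "V"),
            ("core2", "P"), ("accF", "V"), ("core3", "P")]

spots = group_into_spots({"genome_a": genome_a, "genome_b": genome_b})
for borders, members in spots.items():
    print(sorted(borders), members)
```

In this toy input both genomes carry a different variable region between `core1` and `core2`, so the two regions fall into a single spot with different gene content, which is the kind of diversity a pangenome-wide comparison is meant to expose. The single gene `accF` is ignored because it is shorter than the assumed `min_genes` threshold.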

Pangenome subgraph of the leuX hotspot in 1413 MAGs of Escherichia coli.

The tool is freely available and easily installable as part of the PPanGGOLiN software suite (https://github.com/labgem/PPanGGOLiN). It is also integrated in the MicroScope platform with a dedicated web page for result analysis and exploration.

Detailed results and scripts to compute the benchmark metrics are available at https://github.com/axbazin/panrgp_supdata.

Preprint on bioRxiv: panRGP: a pangenome-based method to predict genomic islands and explore their diversity