PhD Thesis available at LABGeM


Beyond metagenomics: from high-throughput sequencing analyses to the development of bioinformatics strategies for the study of microbial communities in the scope of human health

Advances in sequencing technologies (NGS) have opened new perspectives in exploratory genomics, especially in the analysis of metagenomes, each of which represents an entire ecosystem (environmental, clinical or synthetic). High-throughput sequencing of microbial communities is a powerful exploration tool because it provides not only a description of the genomic content of a sample (estimation of biodiversity), but also an overview of the functional potential of an environment (metabolic capabilities). In addition, it provides an opportunity to identify organisms that cannot be cultivated in the laboratory and to study the relationships between all members of the ecosystem through time and space. Combined with global functional approaches (metatranscriptomics, metaproteomics, metabolomics), these techniques should increase our understanding of ecosystems.

Several bioinformatics tools have been developed to analyze microbial metagenomes; however, none makes it possible to combine heterogeneous data (genomic, transcriptomic, epidemiological …), and a more integrative analysis of these communities is clearly required. As part of his or her thesis, the candidate will have to develop innovative and powerful bioinformatics processes to study these microbial metagenomes. The proposed project will be organized around three complementary points, namely i) adapt and/or develop new algorithms to address an increase of data in terms of both quantity and quality, especially by the integration of heterogeneous data (contextual data about the habitat, taxonomic data, gene functions, gene expression, co-occurrence of organisms, epidemiological data for clinical samples); ii) implement statistical tools and metrics to analyze and compare these habitats/ecosystems; iii) test the techniques developed in i) and ii) on test samples previously sequenced at CEA-Genoscope for various projects (NRBC PathoTrack project, MicroScope platform instance dedicated to reference microbial genomes from the human intestinal tract microbiome).

This thesis will take place in the context of Genoscope: one of its main activities focuses on the control of microbial NGS data in order to assess their capabilities and limitations in various fields of study such as variant discovery, transcriptomics and metagenomics. This work, which will be carried out in the Laboratory of Bioinformatics for Genomics and Metabolism (LABGeM), will also benefit from the expertise of different Genoscope laboratories (Laboratory of Genomics and Biochemistry of Metabolism, Chemistry Laboratory, etc.).

Interested candidates should have a strong bioinformatics background, good skills in basic statistics and knowledge of microbiology. Previous experience in the analysis of high-throughput sequencing data would be a real advantage.

The successful Ph.D. student will be hired by and based at CEA-Genoscope-Evry (91) and will be a member of the LABGeM team. He/she will work in collaboration with bioinformatics specialists and biologists.


Applications will be handled within the Irtelis recruitment campaign, which will fund 20 three-year PhD grants, starting at the end of 2016, for the Fundamental Research Department (DRF) of the CEA (www. The application deadline is March 25, 2016.

An ad hoc jury will then pre-select the candidates, who will be invited to the CEA Saclay research center (near Paris) from 25 to 27 May 2016 to:

  • give a talk to a panel of scientists from the CEA and doctoral partner schools,
  • have an informal meeting with the PIs proposing subjects (poster session).

For candidates not living in the Paris region, “low-cost” transport and accommodation costs will be borne by the DRF.

The web page containing all relevant information (thesis topics and application form) is:

Interested candidates should contact Stéphane Cruveiller (; cc David Vallenet and Claudine Médigue) prior to submitting a definitive application on the Irtelis website.

