Functional genomic annotation with paraconsistent logic through biological network

One consequence of increasing sequencing capacity is the accumulation of \textit{in silico} predictions in biological sequence databanks. This amount of data exceeds human curation capacity and, despite methodological progress, numerous errors are still made in the prediction of protein functions. Tools are therefore required to guide human expertise in the evaluation of bioinformatics predictions, taking into account background knowledge of the studied organism.

GROOLS (for “Genomic Rule Object-Oriented Logic System”) is an expert system that is able to reason on incomplete and contradictory information. It was developed to assist biologists in the process of genome functional annotation by integrating large quantities of information from various sources. GROOLS adopts a generic representation of knowledge using a directed acyclic graph of concepts that represent the different components of a biological process (e.g. a metabolic pathway), connected by two types of relations (i.e. “part-of” and “subtype-of”). These concepts are called “Prior Knowledge concepts” and correspond to theories whose presence in an organism needs to be elucidated. They serve as the basis for the reasoning and are evaluated from observations of “Prediction” type (e.g. a predicted enzymatic activity) or “Expectation” type (e.g. growth phenotypes). GROOLS implements a paraconsistent logic on the set of facts formed by these observations. Using different rules, “Prediction” and “Expectation” values are propagated on the graph as sets of truth values. At the end of the reasoning, a conclusion is given for each “Prior Knowledge concept” by combining its “Prediction” and “Expectation” values. Conclusions may, for example, indicate a “Confirmed-Presence” (i.e. the function is predicted and expected), a “Missing” concept (i.e. the function is expected but not predicted) or an “Unexpected-Presence” (i.e. the function is predicted but not expected in the organism).
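The reasoning scheme described above can be illustrated with a minimal Python sketch. It is not the GROOLS implementation: the `Concept` class, the `predicted_values` and `conclude` functions, the specific propagation rules (a concept is predicted present when all of its parts are, or when any of its subtypes is) and the `"Unresolved"` fallback label are all assumptions made for illustration. Truth values are held as sets, so a concept can carry no value, one value, or both contradictory values at once. For brevity, only “Prediction” values are propagated; the “Expectation” is read directly on the concept.

```python
from dataclasses import dataclass, field

TRUE, FALSE = "T", "F"  # a truth-value set may hold neither, one, or both


@dataclass
class Concept:
    """Hypothetical stand-in for a "Prior Knowledge concept"."""
    name: str
    parts: list = field(default_factory=list)     # children linked by "part-of"
    subtypes: list = field(default_factory=list)  # children linked by "subtype-of"
    prediction: frozenset = frozenset()           # direct "Prediction" observations
    expectation: frozenset = frozenset()          # direct "Expectation" observations


def predicted_values(concept):
    """Propagate "Prediction" values bottom-up (assumed rules, see lead-in)."""
    values = set(concept.prediction)
    if concept.parts:
        part_sets = [predicted_values(p) for p in concept.parts]
        if all(TRUE in s for s in part_sets):   # every part is predicted present
            values.add(TRUE)
        if any(FALSE in s for s in part_sets):  # at least one part is predicted absent
            values.add(FALSE)
    if concept.subtypes:
        sub_sets = [predicted_values(s) for s in concept.subtypes]
        if any(TRUE in s for s in sub_sets):    # one variant is enough
            values.add(TRUE)
        if all(FALSE in s for s in sub_sets):   # every variant is predicted absent
            values.add(FALSE)
    return frozenset(values)


def conclude(prediction, expectation):
    """Combine the two truth-value sets into one of the conclusions named above."""
    predicted = TRUE in prediction
    if TRUE in expectation:
        return "Confirmed-Presence" if predicted else "Missing"
    if predicted and FALSE in expectation:
        return "Unexpected-Presence"
    return "Unresolved"  # hypothetical catch-all, not a GROOLS conclusion


# A pathway made of two steps, expected to be present in the organism.
step1 = Concept("step1", prediction=frozenset({TRUE}))
step2 = Concept("step2")  # no prediction available for this step
pathway = Concept("pathway", parts=[step1, step2], expectation=frozenset({TRUE}))
print(conclude(predicted_values(pathway), pathway.expectation))
```

In this example the pathway is expected, but one of its two steps has no prediction, so the sketch reports it as “Missing”. Keeping truth values as sets rather than booleans is what lets contradictory observations coexist on a concept instead of invalidating the reasoning.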

GROOLS reasoning was applied to several organisms, using different sources of “Predictions” (i.e. annotations from UniProtKB or MicroScope) and of biological processes (i.e. GenomeProperties and UniPathway). For “Expectations”, growth phenotype data and amino-acid biosynthesis pathways were used. GROOLS results are useful to quickly evaluate the overall annotation quality of a genome and to propose annotations to be completed or corrected by a biocurator. More generally, the GROOLS software can be used to improve the reconstruction of the metabolic network of an organism, which is an essential step in obtaining a high-quality metabolic model.

 
