We developed a method to classify proteins of a family based on their conserved genomic contexts. Each protein genomic context is compared against all others to determine syntenies. A graph is generated where nodes are input proteins connected by edges when a synteny is observed. Edges weight represent the mean number of genes in the synteny.
To investigate the enzymatic diversity of a family, we used clustering algorithms on this weighted graph to define protein groups which are supposed iso-functional. Furthermore, functions of conserved neighbor genes within each group may give clues to predict the precise function for yet uncharacterized or misannotated proteins of the family.