A massive amount of sequence and structural data is available, and it accumulates faster than functional studies can keep pace. One way to enhance functional assessment is to mine the available data to inform a strategy that can be applied serially to families of uncharacterized proteins. We have used experimental routes, starting with genomic context and structural information, to determine the functions of uncharacterized bacterial enzymes. We report here a biochemical screen guided by active site modeling to identify the diverse substrates and specialized functions of the DUF849 Pfam family.
We started our search after identifying an enzyme in the lysine fermentation pathway as a β-keto acid (3-keto acid) cleavage enzyme⁴ (BKACE). Our bioinformatic analysis of other genomes turned up similar enzymes that were noteworthy for two reasons: first, they occurred in bacteria that do not conduct lysine fermentation, and second, they were part of loci involved in metabolic processes unrelated to lysine. We began examining this enzyme family (Pfam DUF849) by aligning all of the high-scoring sequence homologs and choosing 322 sequences, representative of different organisms, genomic regions and homologous clusters, to carry forward in our analysis. Of these 322, 163 were successfully expressed and screened in high-throughput enzymatic assays against 16 carefully chosen β-keto acid substrates, with both the forward and reverse reactions analyzed. Our enzymatic results, coupled with active site modeling and clustering results, enabled us to partition the original BKACE family into 14 functional reaction groups corresponding to 7 subfamilies. The use of structurally diverse R groups on the 3-keto acid backbone cast a broad net that helped us understand BKACE function and also pointed to the relative substrate flexibility of some of the subgroup members. Alignment and modeling of the enzymes' active sites after enzymatic analysis showed that some family members lack key active site residues, making them non-BKACE DUF849 members. The function of these non-BKACE members will be an important question in light of the enzymatic diversity present in this fold. Thus, we concluded that the BKACE family represents a central scaffold with diverse members that, in total, cover a broad structural range of β-keto acids.
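The idea of sorting family members by what they act on can be illustrated with a small sketch. It is not the procedure used in the study, which combined enzymatic results with active site modeling and clustering. It only shows the simplest version of the idea: enzymes that are active on exactly the same set of substrates are placed in the same group. The function name `group_by_activity_profile`, the threshold, and all enzyme names, substrate names and readings below are made up for illustration.

```python
from collections import defaultdict


def group_by_activity_profile(activities, threshold):
    """Group enzymes that are active on exactly the same set of substrates.

    activities: mapping of enzyme name -> {substrate name: activity reading}
    threshold:  reading at or above which an enzyme counts as active
                (an assumed cutoff, not a value from the study)
    """
    groups = defaultdict(list)
    for enzyme, readings in activities.items():
        # The profile is the set of substrates with a reading above the cutoff.
        profile = frozenset(s for s, v in readings.items() if v >= threshold)
        groups[profile].append(enzyme)
    return dict(groups)


# Hypothetical toy readings in arbitrary units; placeholders, not study data.
toy_activities = {
    "enzyme_A": {"substrate_1": 0.9, "substrate_2": 0.1, "substrate_3": 0.0},
    "enzyme_B": {"substrate_1": 0.8, "substrate_2": 0.0, "substrate_3": 0.1},
    "enzyme_C": {"substrate_1": 0.7, "substrate_2": 0.6, "substrate_3": 0.0},
    "enzyme_D": {"substrate_1": 0.0, "substrate_2": 0.1, "substrate_3": 0.0},
}

groups = group_by_activity_profile(toy_activities, threshold=0.5)
for profile, members in groups.items():
    print(sorted(profile), members)
```

In this toy example, an enzyme with an empty profile (no reading above the cutoff) lands in its own group, loosely analogous to family members that show no BKACE activity. Real screening data would need replicate handling and a justified cutoff, which this sketch leaves out.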