25º Congresso Brasileiro de Microbiologia
Abstract ID: 1919-1


Area: Genetics and Molecular Biology (Division N)

ARTIFICIAL INTELLIGENCE APPLIED TO BIOENERGY GENOMICS: PROBABILISTIC ANNOTATION OF MICROBIAL GENOMES

Fabio Filocomo (FMRP-USP); Ricardo Vêncio (FMRP-USP)

Abstract

INTRODUCTION: Brazil has become a key player in energy production from renewable sources, in particular ethanol derived from sugarcane (first generation) and from “waste” cellulose (second generation). Microbial genomics plays a key role in both approaches. However, while the volume of sequencing data keeps growing, the functional characterization of these sequences advances much more slowly. OBJECTIVE: We aim to develop computational tools for the automatic probabilistic annotation of microbial genomes, in particular those of two bioenergy-related organisms: Leifsonia xyli, which causes disease in sugarcane, and Trichoderma reesei, used in cellulosic ethanol production. The technical challenge is to estimate the probability that a gene belongs to a given functional category, instead of simply assigning a function whenever the gene meets some arbitrary criterion. METHODOLOGY: The starting point is the phylogenomics-based SIFTER method, developed by Engelhardt and colleagues in 2005, 2006 and 2009, which relies on Bayesian networks (BNs). The BN topology for each gene is built from a phylogenetic tree (PT), and any information available on related genes is propagated through the tree using classical BN algorithms. This procedure is an advance over BLAST-based approaches. The datasets used are the genome sequences of L. xyli and T. reesei, made available in 2004 and 2008, respectively. RESULTS: The SIFTER methodology was implemented as a pipeline in our lab. Preliminary tests against current state-of-the-art tools confirm the claims of superior performance, taking published manually curated data as the gold standard. A careful theoretical study of BN properties and of the requirements of microbial genomics identified some limitations in SIFTER's underlying models. In microbiology it is well known that a phylogenetic network (PN) represents the evolutionary relationships among genes better than a PT does, because of events such as lateral gene transfer.
Our results indicate that BNs have properties, such as being directed acyclic graphs, that fit well into the PN framework. These theoretical studies led to improvements to the annotation methodology, and we anticipate that the new mathematical models will produce better results when dealing specifically with microorganisms. The necessary confirmation is being carried out. Acknowledgments: MCT/CNPq grant Universal-A/470616/2008-3.
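
To illustrate the idea of probabilistic annotation on a tree versus a network, here is a minimal Python sketch. It is not SIFTER's model or the authors' pipeline; everything in it is an assumption made for illustration: function is a single binary state, a child keeps its parent's state with probability 1 - mu, a node with several parents (a lateral transfer) averages the contributions of its parents, and the gene names are hypothetical. The posterior is computed by brute-force enumeration, which is only feasible for tiny graphs, where a real implementation would use classical BN inference algorithms.

```python
from itertools import product

def joint_probability(assignment, parents, mu, prior=0.5):
    """Joint probability of one full 0/1 assignment over the DAG.

    Assumed toy model: a root has the function with probability `prior`;
    any other node has it with the average, over its parents, of
    (1 - mu) if that parent has the function and mu otherwise.
    """
    p = 1.0
    for node, state in assignment.items():
        node_parents = parents[node]
        if not node_parents:
            p_one = prior
        else:
            p_one = sum(
                (1 - mu) if assignment[q] == 1 else mu for q in node_parents
            ) / len(node_parents)
        p *= p_one if state == 1 else 1 - p_one
    return p

def posterior(query, evidence, parents, mu=0.1):
    """P(query has the function | evidence), by exhaustive enumeration."""
    hidden = [n for n in parents if n not in evidence and n != query]
    weights = {0: 0.0, 1: 0.0}
    for q_state in (0, 1):
        for states in product((0, 1), repeat=len(hidden)):
            assignment = dict(evidence)
            assignment[query] = q_state
            assignment.update(zip(hidden, states))
            weights[q_state] += joint_probability(assignment, parents, mu)
    return weights[1] / (weights[0] + weights[1])

# Hypothetical phylogenetic tree: each node lists its parent(s).
tree = {
    "root": [],
    "anc": ["root"],
    "geneA": ["anc"],
    "geneB": ["anc"],
    "geneC": ["root"],
}

# Same genes as a phylogenetic network: geneB also receives material from
# the geneC lineage (lateral transfer). The graph is still a DAG.
network = dict(tree, geneB=["anc", "geneC"])

# geneA is annotated with the function, geneC is annotated without it.
evidence = {"geneA": 1, "geneC": 0}
p_tree = posterior("geneB", evidence, tree)
p_net = posterior("geneB", evidence, network)
print(f"tree:    P(geneB has function) = {p_tree:.3f}")
print(f"network: P(geneB has function) = {p_net:.3f}")
```

In this toy setting the output for geneB is a probability and not a yes/no call, and adding the lateral-transfer edge from a lineage that lacks the function lowers that probability compared with the tree-only model.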


Keywords: Bioinformatics, Computational Biology, Genome Annotation, Artificial Intelligence, Bayesian Networks