Area: Genetics and Molecular Biology (Division N)
ARTIFICIAL INTELLIGENCE APPLIED TO BIOENERGY GENOMICS: PROBABILISTIC ANNOTATION OF MICROBIAL GENOMES
Fabio Filocomo (FMRP-USP); Ricardo Vêncio (FMRP-USP)
Abstract
INTRODUCTION: Brazil has become a key player in energy production
from renewable sources, in particular ethanol derived from sugarcane
(1st-gen) and from “waste” cellulose (2nd-gen). Microbial genomics
plays a key role in both approaches. However, it is important to
acknowledge that, while the acquisition of sequencing data keeps
growing, the functional characterization of these sequences grows
much more slowly. OBJECTIVE:
We aim to develop computational tools for the automatic probabilistic
annotation of microbial genomes, in particular those of two
bioenergy-related organisms: Leifsonia xyli, which causes sugarcane
disease, and Trichoderma reesei, a cellulosic ethanol producer. The
technical challenge is to estimate the probability that a gene
belongs to a given functional category, instead of simply assigning a
function when the gene meets some arbitrary criterion. METHODOLOGY:
The starting point is the phylogenomics-based SIFTER method,
developed by Engelhardt and colleagues (2005, 2006, 2009), which uses
Bayesian networks (BN). The BN topology for each gene is built from a
phylogenetic tree (PT), and any information available on related
genes is propagated through the tree using classical BN algorithms.
This procedure is an advance over BLAST-based approaches, since
annotations are weighted by evolutionary relationships rather than
transferred from the most similar sequence. The datasets used are the
genome sequences of L. xyli and T. reesei, made available in 2004 and
2008, respectively. RESULTS:
The SIFTER methodology was implemented as a pipeline in our lab. In
preliminary tests against current state-of-the-art tools, using
published manually curated data as the gold standard, the results
confirm the claims of superior performance. A careful theoretical
study of BN properties and of the requirements of microbial genomics
identified some limitations of SIFTER's underlying models. In
microbiology it is well known that a phylogenetic network (PN)
represents the evolutionary relationships among genes better than a
PT, owing to events such as lateral gene transfer. Our results
indicate that BNs have properties, such as being directed acyclic
graphs, that fit perfectly into the PN framework. These theoretical
studies led to improvements in the annotation methodology, and we
anticipate that the new mathematical models will produce better
results when dealing specifically with microorganisms. The necessary
confirmation is under way.
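As an illustration of the idea of propagating annotations through a
phylogenetic tree, the following minimal sketch computes the
probability that an unannotated gene has a function, given annotations
on related genes. It is not the SIFTER model: the two-state function
variable, the three-gene tree, the per-branch change probability MU,
the uniform root prior and all names (geneA, geneB, geneC, n1) are
assumptions made only for this example.

```python
# Toy rooted gene tree (hypothetical): internal node -> its two children.
TREE = {"root": ("n1", "geneC"), "n1": ("geneA", "geneB")}
MU = 0.1            # assumed probability that the function changes on one branch
PRIOR = (0.5, 0.5)  # assumed root prior over (lacks function, has function)


def transition(parent_state, child_state):
    """P(child state | parent state) under the two-state toy model."""
    return 1.0 - MU if parent_state == child_state else MU


def partial_likelihood(node, evidence):
    """Return [P(evidence below node | node=0), P(evidence below node | node=1)]."""
    if node not in TREE:  # leaf
        if node in evidence:
            return [1.0 if s == evidence[node] else 0.0 for s in (0, 1)]
        return [1.0, 1.0]  # unannotated leaf carries no information
    result = [1.0, 1.0]
    for child in TREE[node]:
        child_like = partial_likelihood(child, evidence)
        for s in (0, 1):
            result[s] *= sum(transition(s, c) * child_like[c] for c in (0, 1))
    return result


def posterior_has_function(query, evidence):
    """P(query gene has the function | annotations on other genes)."""
    joint = []
    for s in (0, 1):
        # Clamp the query leaf to state s and score the whole tree.
        like = partial_likelihood("root", {**evidence, query: s})
        joint.append(sum(PRIOR[r] * like[r] for r in (0, 1)))
    return joint[1] / sum(joint)


if __name__ == "__main__":
    print(posterior_has_function("geneB", {"geneA": 1}))
    print(posterior_has_function("geneB", {"geneA": 1, "geneC": 1}))
```

The output is a probability rather than a yes/no call: under these
assumed parameters, annotating only the sister gene geneA gives geneB a
probability of about 0.82 of sharing the function, and adding a
concordant annotation on geneC raises it further. Extending the sketch
to a phylogenetic network would mean allowing a node to have more than
one parent, which a BN permits as long as the graph stays acyclic.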
Acknowledgments: MCT/CNPq grant Universal-A/470616/2008-3.
Keywords: Bioinformatics, Computational Biology, Genome Annotation, Artificial Intelligence, Bayesian Networks