Application of Optimization to Genome Annotation and Metabolic Network Modeling

Christopher Henry
Seminar

Hundreds of microbial genomes are now being sequenced every day, creating demand for the analysis of a wide variety of microbes that may be useful in medical and industrial applications. Metabolic network models are becoming widely used for the high-throughput prediction of microbial phenotypes and behavior. The Model SEED framework (Henry, DeJongh et al. 2010) provides a means of automatically constructing genome-scale metabolic models using high-quality annotations from SEED subsystems (Aziz, Bartels et al. 2008). A common problem in draft models generated by the Model SEED is the presence of numerous gaps in the model pathways, caused by corresponding gaps in the genome annotations. We apply a mixed-integer linear optimization approach to identify the minimal set of reactions required to fill these pathway gaps. We also explore the integration of genome annotation data into the optimization formulation, demonstrating how this improves the quality and biological relevance of the solutions proposed by gap filling. We demonstrate how gap filling can be used to identify cases of excessive annotation (false positives) in addition to filling in missing annotations. We apply our gap-filling algorithms to over 3000 models, gathering statistics on pathway gaps across our entire genome database. We use these studies to identify pathways that are systematically poorly annotated. We also assess the capacity of all genomes to produce a wide range of biomass compounds. Finally, we use the algorithms to fit our models to available experimental data.
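
As a rough illustration of the gap-filling idea described above, the following Python sketch finds a smallest set of database reactions that lets a toy draft model produce biomass. It is not the Model SEED implementation: an exhaustive search over candidate subsets and a simple reachability check stand in for the mixed-integer linear program and its flux-balance constraints. All reaction names, compounds, and the media are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each reaction maps a name to (substrates, products).
DRAFT_MODEL = {
    "R1": ({"glc"}, {"g6p"}),
    "R3": ({"f6p"}, {"pyr"}),
    "R_bio": ({"pyr"}, {"biomass"}),
}
# Hypothetical database reactions that may be added to fill gaps.
CANDIDATES = {
    "R2": ({"g6p"}, {"f6p"}),
    "R4": ({"g6p"}, {"x"}),
    "R5": ({"x"}, {"f6p"}),
}
MEDIA = {"glc"}


def producible(reactions, media):
    """Return every compound reachable from the media (network expansion)."""
    available = set(media)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            # A reaction can fire once all of its substrates are available.
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def gapfill(model, candidates, media, target="biomass"):
    """Return a smallest list of candidate reactions that makes target producible.

    Subsets are tried in order of increasing size, so the first hit is minimal.
    Returns None if no subset of the candidates works.
    """
    names = sorted(candidates)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            trial = dict(model)
            trial.update({name: candidates[name] for name in subset})
            if target in producible(trial, media):
                return list(subset)
    return None


added = gapfill(DRAFT_MODEL, CANDIDATES, MEDIA)
print(added)  # ['R2']
```

On this toy network the search returns R2 alone, because the alternative route through R4 and R5 needs two additions. In a mixed-integer formulation each candidate reaction would instead carry a binary variable, and one plausible way to bring in annotation evidence is as per-reaction weights in the objective.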

Optimization is also applied extensively in genome-scale modeling to predict metabolic pathway flux and phenotypes. We will discuss the formulations associated with these applications of linear and mixed-integer optimization in flux modeling. For example, we apply optimization to predict the minimal media required for growth, to fit fluxes to experimentally measured data, and to disable flux in conditions where the model should not grow (using bi-level optimization). We will also discuss the optimization solvers used in our work, our experience with them, and the mechanisms we use to interact with them. Finally, we will highlight the KBase resource as a means of making flux modeling available to the broader academic community.
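
The abstract does not give the formulations themselves. For orientation, the standard flux balance analysis problem and one common mixed-integer form of the minimal-media problem can be written as below. The symbols are assumptions of this sketch rather than notation from the talk: S is the stoichiometric matrix, v the flux vector with bounds l and u, and c the objective weights (typically selecting the biomass reaction). E is the set of candidate uptake reactions, with flux v_i counted as nonnegative uptake and a binary indicator y_i for each. U is an upper bound on uptake flux, and v_bio^min is a required minimum growth flux.

```latex
% Flux balance analysis (linear program)
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{s.t.} \quad & S v = 0, \\
& l \le v \le u .
\end{aligned}

% Minimal media (mixed-integer linear program)
\begin{aligned}
\min_{v,\,y} \quad & \sum_{i \in E} y_i \\
\text{s.t.} \quad & S v = 0, \qquad l \le v \le u, \\
& v_{\mathrm{bio}} \ge v_{\mathrm{bio}}^{\min}, \\
& 0 \le v_i \le U\, y_i, \quad y_i \in \{0,1\}, \quad i \in E .
\end{aligned}
```

In the second problem, setting y_i = 0 forces the corresponding uptake flux to zero, so minimizing the sum of the indicators selects the fewest nutrients that still support the required growth flux.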