Professor Emeritus, Institute for Genomics and Bioinformatics
Donald Bren School of Information and Computer Sciences
Professor Emeritus, Chemical Engineering & Materials Science
The Henry Samueli School of Engineering
Professor Emeritus, Microbiology & Molecular Genetics
School of Medicine
Director of Graduate Studies and Technology Transfer, UCI Institute for Genomics and Bioinformatics (IGB)
Co-founder, Verdezyne, Inc. (formerly CODA Genomics Inc.) Carlsbad, CA
Co-Founder, Board Director, President and CEO, Actavalon Inc., Irvine, CA
Co-founder, CEO and Manager, 4Design Biosciences LLC, Corona del Mar, CA
Director of the Computational Biology Research Laboratory of the IGB, 2002 - 2012
B.S., University of California, Santa Barbara, Analytical Biology
Ph.D., Purdue University, Molecular Biology and Biochemistry
Postdoctoral, Duke University, 1970, Biophysical Chemistry
Phone: (949)244-5684 mobile
University of California, Irvine
246 Information and Computer Sciences II
Mail Code: 3445
Irvine, CA 92697
Molecular mechanisms of biological control systems; global gene expression; genomics; bioinformatics; computational biology
2015 Purdue University Science Distinguished Alumnus Award
2012 Discover Research Award of the UCI/NIH Chao Family Comprehensive Cancer Center
Purdue University Biological Sciences Alumnus of the Year Award, 2007
UCI 40th Anniversary Faculty Innovation and Entrepreneurship Award, 2005
UCI Athalie Clark Outstanding Researcher Award, 1999
Elected Fellow of the Academy of Microbiology, 1993
IPA Visiting Scientist Award, National Cancer Institute, 1980
American Society of Microbiology Eli Lilly Research Award, 1975
National Institutes of Health Career Development Award, 1970-75
National Institutes of Health Individual Postdoctoral Fellowship Award, 1968-70
National Institutes of Health Individual Predoctoral Fellowship Award, 1964-68
Continuous research program funding from the NIH from 1964 to retirement in 2006
Assistant Professor of Medical Microbiology, College of Medicine, University of California, Irvine, 1970-72.
Associate Professor of Medical Microbiology, College of Medicine, University of California, Irvine, 1973-76.
Professor of Microbiology and Molecular Genetics, College of Medicine, University of California, Irvine, 1977 to 2006.
Professor Emeritus, Dept. of Microbiology and Molecular Genetics, School of Medicine, and the UCI Institute for Genomics and Bioinformatics, 2006 - present.
Professor of Chemical Engineering and Materials Science, Samueli School of Engineering, University of California, Irvine, 1994 to present.
Co-Director, UCI Institute for Genomics and Bioinformatics, 2000 to 2006.
Co-director, NIH Biomedical Informatics Training (BIT) Program for graduate students and postdoctoral fellows, 2002-2012
Member, Purdue University Biological Sciences Alumni Advisory Committee, 2008-2012
CBRL TECHNOLOGY SUMMARY
The researchers in the CBRL apply innovative, UC patent-protected technologies of IGB faculty to challenging life science problems. The mission of the CBRL is to employ these technologies to facilitate the research programs of UCI faculty and industrial collaborators. The CBRL technology is an interdisciplinary combination of proprietary computational and biological tools to analyze, design, and optimize novel biochemical pathways in microorganisms for the economically competitive biosynthesis of commercial chemicals. The following is a review of the programs in place during the time I was the Director of the CBRL, from 2002 to 2012.
The CBRL integrates flux and kinetic modeling in microorganisms into a powerful new tool for designing and optimizing pathways that produce target chemicals. The computational design of novel biochemical pathways is facilitated by proprietary automated methods, kMech™ and Cellerator™ (U.S. Patent 7,319,945; 1, 2), for simulating biological pathways and regulatory networks (3-6).
Currently available pathway modeling approaches use the Michaelis–Menten kinetic equation for one substrate/one product enzyme reactions and the King–Altman method to derive equations for more complex multiple reactant reactions. These are steady-state velocity equations since the derivatives of the concentration of each reactant in the model over time are set to zero to reduce a set of non-linear differential equations to linear algebraic equations. While these metabolic flux models provide valuable information about mass transfer and metabolite distribution, they provide no dynamic information about the biology of the cell in response to changing environments.
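As a point of reference, the steady-state Michaelis–Menten velocity for a one substrate/one product reaction can be written in a few lines. This is a minimal sketch of the textbook equation only; the function name and the numerical values are illustrative assumptions and are not taken from any CBRL model.

```python
def michaelis_menten_rate(s, vmax, km):
    """Steady-state velocity v = Vmax * [S] / (Km + [S]).

    s    : substrate concentration (same units as km)
    vmax : maximal velocity (kcat times total enzyme concentration)
    km   : Michaelis constant
    """
    if s < 0 or vmax < 0 or km <= 0:
        raise ValueError("concentrations and constants must be non-negative, km > 0")
    return vmax * s / (km + s)


# Illustrative values only: the velocity saturates as [S] grows past Km.
for s in (0.1, 1.0, 10.0, 100.0):
    print(s, michaelis_menten_rate(s, vmax=1.0, km=1.0))
```

The function returns a single velocity for a given substrate level. It carries no time dependence, which is the limitation of steady-state models noted above.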
In contrast to flux analysis, kinetic modeling can reflect dynamic changes in the cellular metabolic state. The CBRL nonlinear kinetic modeling approach is based on the physicochemical properties of enzymes under physiological regulation. Kinetic modeling generates non-simplified, non-linear differential equations that describe the change in reactant levels over time, and therefore reflects the dynamic cellular response to environmental and metabolic conditions. We have developed proprietary methods to generate rate constants (kf, kr, ki, ka, etc.) from easily measured or computationally approximated kinetic constants (Km, kcat), cellular enzyme concentrations, and other data.
The CBRL integration of flux and kinetic modeling allows investigators to incorporate both kinetic and steady-state parameters into models, greatly improving the approximation of biosystems.
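To illustrate what a non-simplified kinetic model looks like, the sketch below integrates the mass-action equations for a single reaction E + S <-> ES -> E + P over time. It is not the proprietary CBRL method: it assumes the textbook relation Km = (kr + kcat) / kf, takes kr as a user-supplied guess, and uses a simple fixed-step Euler integrator. All names and values are hypothetical.

```python
def rate_constants(km, kcat, kr):
    """Derive kf from Km and kcat, assuming Km = (kr + kcat) / kf.

    kr cannot be recovered from Km and kcat alone, so it is an assumed input.
    """
    kf = (kr + kcat) / km
    return kf, kr, kcat


def simulate(s0, e0, km, kcat, kr, dt=1e-4, t_end=1.0):
    """Euler integration of E + S <-> ES -> E + P; returns final (S, E, ES, P)."""
    kf, kr, kcat = rate_constants(km, kcat, kr)
    s, e, es, p = s0, e0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        bind = kf * e * s      # E + S -> ES
        unbind = kr * es       # ES -> E + S
        cat = kcat * es        # ES -> E + P
        s += dt * (unbind - bind)
        e += dt * (unbind + cat - bind)
        es += dt * (bind - unbind - cat)
        p += dt * cat
    return s, e, es, p


# Illustrative values only (arbitrary concentration and time units).
print(simulate(s0=10.0, e0=0.1, km=1.0, kcat=10.0, kr=100.0))
```

Unlike a steady-state velocity equation, the simulation tracks the enzyme-substrate complex explicitly, so transients after a change in substrate or enzyme level are visible in the output.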
Following computational prediction and modeling of pathway(s) and identification of necessary host modifications, enzyme expression cassettes can be synthesized and introduced into the production organism. This task is facilitated by the availability of genomic sequences representing diverse microbes, fungi, plants and animals. These DNA sequences of genes for a vast array of enzymes with specific catalytic functions provide a rich source of enzymes and activities for the design of efficient, combinatorial or novel biosynthetic pathways. The rapid, error-free DNA synthesis of expression cassettes for enzymes with desired catalytic activities, optimized for expression in the target organism, is enabled by our proprietary computationally optimized DNA assembly (CODA) technology (U.S. Patent 7,262,031; 7, 8).
Approaches to protein engineering vary from rational design, where proteins are engineered from fundamental principles, to random mutagenesis, where random mutations are introduced into a starting set of proteins. Both strategies are often followed by some form of directed evolution to select for the desired properties. In general, neither of these strategies is ideal. Rational design is still at a relatively early stage of development due to a limited understanding of protein folding, structure, and function. Random mutagenesis faces an astronomically large protein space and is often laborious and inefficient.
The CBRL offers a new protein engineering methodology that employs bioinformatics to integrate multiple sources of information. This knowledge-based approach informs and focuses the design of new classes of non-random, defined combinatorial DNA libraries referred to as Rational Search.
Rational Search is the combination of a rational focus driven by bioinformatics analyses with a combinatorial search component associated with defined DNA libraries. In general, the bioinformatics component must flexibly incorporate information from multiple sources, as well as multiple goals and constraints. Information to be considered may include: (1) crystallographic, NMR, or predicted information about the structures of the relevant proteins in apo and holo form; (2) evolutionary information of ortholog or paralog sequences and their properties; (3) information from the literature; and (4) mutagenesis experiments and the structural or functional effects of any mutations. Multiple goals and constraints may include: (1) protein function (e.g., change in substrate specificity); (2) protein structure (e.g., preserve structure of binding site and improve stability); and (3) protein production (e.g., improve folding, solubility, or secretion). In addition, constraints from the library technology and from the final assay used to assess protein properties also must be taken into consideration during library design.
Often structural considerations for library designs rely on protein structure predictions. Prof. Pierre Baldi and his research group have developed the SCRATCH suite of tools (9-13) for the analysis and prediction of protein structures and structural features. While the prediction of full 3D structure for complex proteins remains challenging, tools for predicting other structural features such as secondary structure (9) or relative solvent accessibility (10) have a correct prediction rate of about 80% on a per amino acid basis. We also maintain and regularly use third party tools. In combination, these tools can be used to provide the basic information about the proteins of interest, suggest the effect that mutations can have on protein structure and function, and inform library design.
LIBRARY DESIGN. The questions of where variation will be introduced and the nature of that variation are critical for two reasons. First, experimental assays may detect improvement of one property but be blind to disruption of another. Second, as the number of sites and the degree of variation increase, library coverage decreases. Therefore, high coverage of a few knowledge-based sites may be more beneficial than low coverage of an astronomically large theoretical library with more sites (14). Among other types of information, we typically use secondary structure, solvent accessibility, proximity to cofactor or substrate binding, variation observed at different levels of homology analysis, and organism-specific codon pair bias. In the absence of explicit structural information, structural features are predicted using SCRATCH and other methods. Structural analysis of the subset of non-identical residues between specific pairs of homologous sequences can also provide valuable information about which changes can be made independently and which are highly dependent on other changes. Libraries can be designed with specific changes, or degenerate codons, for individual or adjacent sequence sites.
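The trade-off between the number of varied sites and coverage can be made concrete with a rough calculation. The sketch below assumes every variant is equally likely and that screened clones are independent draws, so the expected fraction of the library observed is approximately 1 - exp(-N/V) for N clones and V variants. Real libraries are rarely uniform, so this is an optimistic estimate; the function names and numbers are illustrative.

```python
import math


def library_size(variants_per_site):
    """Theoretical library size: product of the number of variants at each site."""
    size = 1
    for v in variants_per_site:
        size *= v
    return size


def expected_coverage(size, clones_screened):
    """Expected fraction of distinct variants seen, assuming uniform sampling.

    Uses the Poisson approximation 1 - exp(-N / V).
    """
    return -math.expm1(-clones_screened / size)


# Illustrative comparison: 20 amino acids at 3 sites versus 6 sites,
# with the same screening capacity of 100,000 clones.
for sites in (3, 6):
    size = library_size([20] * sites)
    print(sites, size, round(expected_coverage(size, 100_000), 4))
```

With the same screening effort, the three-site library is covered almost completely while the six-site library is barely sampled.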
GENE LIBRARY TYPES ENABLED BY CBRL TECHNOLOGIES
An advantageous consequence of the CODA thermodynamic optimization technology is that every location in the gene is assigned a globally unique thermodynamic address. Consequently, every site-directed mutagenesis oligonucleotide goes where, and only where, it is desired. This eliminates incorrect cross-hybridizations previously identified as a source of mutagenesis failure. Here we describe the use of this CODA-enabled mutational accuracy to construct protein structure-guided, controlled diversity, completely defined gene libraries such as those described below.
It is often desirable to remove the ends of a protein, for example to isolate a soluble, enzymatically active core domain for crystallization. End-deletion libraries are often made by mechanical disruption or enzymatic digestion of a gene, but this is inefficient because only one out of nine resulting products has the correct reading frame at both the N- and C-termini, and some deletion products are missed entirely. With CODA technology it is possible to generate, in a single reaction tube, a complete, structure-guided deletion gene library with endpoints defined at base pair positions, containing only in-frame N-terminal and C-terminal endpoints, with the knowledge that all desired endpoint combinations are present in the library.
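The one-in-nine figure follows from each end independently landing in one of three reading-frame offsets. The sketch below checks that count and enumerates a fully defined set of in-frame end deletions for a toy coding sequence. It is an illustration of the library contents only, not of the CODA assembly chemistry; the sequence and function names are hypothetical, and start and stop codon handling is ignored for brevity.

```python
def frame_preserved(n_trim_nt, c_trim_nt):
    """True if trimming this many nucleotides from each end keeps both ends in frame."""
    return n_trim_nt % 3 == 0 and c_trim_nt % 3 == 0


def in_frame_deletions(cds, max_n_codons, max_c_codons):
    """All end-deletion variants that remove whole codons from either end."""
    n_codons = len(cds) // 3
    variants = []
    for n_del in range(max_n_codons + 1):
        for c_del in range(max_c_codons + 1):
            if n_del + c_del < n_codons:  # keep at least one codon
                variants.append(cds[3 * n_del: len(cds) - 3 * c_del])
    return variants


# Of the 9 possible frame-offset combinations, only 1 is in frame at both ends.
offsets = [(n, c) for n in range(3) for c in range(3)]
print(sum(frame_preserved(n, c) for n, c in offsets), "of", len(offsets))

# Toy 6-codon sequence (hypothetical).
toy_cds = "ATGAAACCCGGGTTTTAA"
for variant in in_frame_deletions(toy_cds, max_n_codons=2, max_c_codons=1):
    print(variant)
```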
Saturated Amino Acid Scanning Library.
With CODA technology it is possible to guarantee all 64 possible codons (or any codon combination) at each residue position or structure-guided target region(s) of a gene, such that each gene has only one changed amino acid (15).
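The contents of such a scanning library can be enumerated directly. The following sketch lists every gene variant that differs from the wild type at exactly one codon position, which gives 63 non-wild-type codons per position. It describes the target library only, under the assumption that the wild-type codon is excluded at each position; the sequence and function name are hypothetical.

```python
from itertools import product

# All 64 codons in a fixed order.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_scan(cds, positions=None):
    """Return all variants of cds with exactly one codon replaced.

    positions: iterable of 0-based codon indices to scan (default: every codon).
    """
    n_codons = len(cds) // 3
    if positions is None:
        positions = range(n_codons)
    library = []
    for pos in positions:
        wild_type = cds[3 * pos: 3 * pos + 3]
        for codon in CODONS:
            if codon != wild_type:
                library.append(cds[:3 * pos] + codon + cds[3 * pos + 3:])
    return library


# Toy 3-codon sequence (hypothetical): 3 positions x 63 alternative codons.
print(len(codon_scan("ATGAAACCC")))
```

Restricting `positions` to a structure-guided target region shrinks the library proportionally, for example 63 variants for a single position.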
Libraries with Controlled Distribution and Number of Amino Acid Changes.
Modern structural biology methods are able to identify protein regions that can accommodate certain amino acid changes without compromising the structural integrity of the protein. These changes can be explored for improved protein properties such as activity, substrate specificity, solubility, and so forth. A common strategy for producing a gene library for certain amino acids at defined positions is to replace the codons for these amino acids with degenerate mixtures of bases, for example “NNK” where N is a mixture of all four nucleotides and K is a mixture of G and T. While this approach produces libraries with limited amino acid substitutions at desired positions, it still will contain undesired amino acids. Also, it is not possible with current methods to generate a library with a defined set of desired amino acid distributions at each position.
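The limitation of degenerate codons can be seen by expanding NNK against the standard genetic code. The sketch below is a small, self-contained enumeration; the table is the standard code written in TCAG order, and the helper names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Only the IUPAC symbols needed for this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}


def expand(degenerate_codon):
    """List every concrete codon matched by a degenerate codon such as 'NNK'."""
    return ["".join(b) for b in product(*(IUPAC[ch] for ch in degenerate_codon))]


def amino_acid_counts(codons):
    """Count how many codons encode each amino acid ('*' marks a stop codon)."""
    counts = {}
    for codon in codons:
        aa = CODON_TABLE[codon]
        counts[aa] = counts.get(aa, 0) + 1
    return counts


nnk = expand("NNK")
print(len(nnk), "codons")
print(sorted(amino_acid_counts(nnk).items()))
```

NNK expands to 32 codons that cover all 20 amino acids plus one stop codon (TAG), with unequal representation: for example, three codons for leucine and one for methionine. A position randomized this way therefore cannot be restricted to a chosen subset of residues or to a chosen distribution.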
CODA technology enables the assembly of diversity controlled libraries by a systematic method that allows for high-throughput, multiplexed, site-directed mutagenesis with an adjustable mutation incorporation rate at each desired mutation site. This requires a CODA-designed wild-type gene that has been thermodynamically optimized so that each mutation location in the gene is thermodynamically unique, plus a set of mutation oligonucleotides that covers exactly the location and type of desired changes to explore. The CODA algorithm provides the concentrations of wild type and mutation oligonucleotides for a single-tube primer extension reaction that yields the desired expected number and distribution of incorporated amino acid changes at each desired position in the resulting library.
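The CODA calculation of oligonucleotide concentrations is not described here, but the resulting library statistics can be illustrated under a simplifying assumption: if each target site incorporates its mutation independently with a known probability, the number of changes per gene follows a Poisson-binomial distribution. The sketch below computes that distribution; the per-site rates are hypothetical inputs, not outputs of the CODA algorithm.

```python
def mutation_count_distribution(site_rates):
    """Probability of k mutations per gene, for k = 0..len(site_rates).

    site_rates: per-site incorporation probabilities, assumed independent.
    """
    dist = [1.0]  # with no sites, zero mutations with probability 1
    for p in site_rates:
        updated = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            updated[k] += prob * (1.0 - p)   # site stays wild type
            updated[k + 1] += prob * p       # site takes the mutation
        dist = updated
    return dist


# Illustrative rates for five target sites.
rates = [0.2, 0.2, 0.5, 0.1, 0.3]
for k, prob in enumerate(mutation_count_distribution(rates)):
    print(k, round(prob, 4))
```

Under this model the expected number of changes per gene is the sum of the per-site rates, so raising or lowering individual rates shifts both the mean and the shape of the distribution.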
Structure-guided Protein Domain Gene Shuffling Libraries.
Gene shuffling followed by in vitro evolution has proven to be a powerful way to engineer proteins with desired properties. However, current gene shuffling is a random event that results in extremely high diversity gene libraries without any guarantee of a desired gene rearrangement. What is really desired is the shuffling of defined protein structural or activity domains to produce chimeric proteins rather than random gene sequences. Now with CODA technology it is possible to easily produce complete, controlled diversity, chimeric gene libraries with cross-over points only at specific, structure-guided, base pair positions.
For this application, the CODA algorithm is employed for the simultaneous global thermodynamic optimization and self-assembly of a set of two or more genes to be shuffled. Within the CODA-determined temperature gap for the self-assembly of the gene set, any complementary oligonucleotide will hybridize to only one site of only one gene of the set. This allows the DNA shuffling and self-assembly of designed chimeric proteins with structure-guided, base pair specific cross-over point precision, facilitated by chimeric oligonucleotides with sequence complementarity that spans the cross-over point of two different genes of the set. An advantage of this CODA thermodynamic optimization DNA shuffling method is that, for the first time, it makes possible shuffling of evolutionarily divergent proteins with structural homology but lacking sufficient DNA sequence homology for DNA shuffling. It also eliminates the random out-of-sequence reassembly of current DNA shuffling methods. The result is a completely defined, diversity controlled, combinatorial set of chimeric genes composed of all segments of each gene with only base pair specific cross-over points.
Any combination of the library strategies described here can be combined in a single custom gene library to suit the user's end product requirements and selection or screening needs. Please enquire about assistance in protein engineering and library design for specific needs.
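The size and composition of such a defined chimeric library can be enumerated in a few lines. The sketch below assumes each parent gene has already been split into the same number of segments at the chosen cross-over points, and lists every combination of parent segments in their original order. It illustrates the combinatorics only, not the CODA optimization or assembly; the segment sequences and names are hypothetical.

```python
from itertools import product


def chimera_library(segmented_genes):
    """Enumerate all chimeras built from aligned segments of the parent genes.

    segmented_genes: list of parents, each a list of segment sequences.
    Returns a dict mapping a label such as '0-1-0' (parent index used for
    each segment) to the assembled chimeric sequence.
    """
    n_segments = len(segmented_genes[0])
    if any(len(gene) != n_segments for gene in segmented_genes):
        raise ValueError("all parents must have the same number of segments")
    library = {}
    for choice in product(range(len(segmented_genes)), repeat=n_segments):
        label = "-".join(str(parent) for parent in choice)
        library[label] = "".join(
            segmented_genes[parent][seg] for seg, parent in enumerate(choice)
        )
    return library


# Two hypothetical parents, each split into three in-frame segments.
parent_a = ["ATGAAA", "CCCGGG", "TTTTAA"]
parent_b = ["ATGCGT", "GATGAA", "CTGTAA"]
for label, seq in chimera_library([parent_a, parent_b]).items():
    print(label, seq)
```

For k parents and m segments the library contains k^m members, including the k unshuffled parents, so two genes with three segments give eight sequences.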
1. Shapiro; Bruce E., Mjolsness; Eric D., Levchenko; Andre. Automated methods for simulating a biological network. U.S. Patent 7,319,945.
2. Shapiro B.E., Levchenko A., Meyerowitz E.M., Wold B.J., Mjolsness E.D. (2003) Cellerator: extending a computer algebra system to include biochemical arrows for signal transduction simulations. Bioinformatics 19(5):677-8.
3. Yang, C.-R., Shapiro, B., Mjolsness, E., Hatfield, G.W. (2005) An Enzyme Mechanism Language for Mathematical Modeling of Metabolic Pathways. Bioinformatics. 21(6):774-780.
4. Yang, C.R., Shapiro, B.E., Hung, S.P., Mjolsness, E.D., and Hatfield, G.W. (2005) A Mathematical Model for the Biosynthesis of the Branched Chain Amino Acids in Escherichia coli K12. J. Biol. Chem. 280(12):11224.
5. Najdi, T.S., Yang, C.-R., Shapiro, B.E., Hatfield, G.W., and Mjolsness, E.D. (2006) Application of a Generalized MWC Model for the Mathematical Simulation of Metabolic Pathways Regulated by Allosteric Enzymes. Journal of Bioinformatics and Computational Biology, 4(2): 335.
6. Najdi T.S., Hatfield G.W., Mjolsness E.D. (2010) A 'random steady-state' model for the pyruvate dehydrogenase and alpha-ketoglutarate dehydrogenase enzyme complexes. Phys Biol. 7:16016.
7. Lathrop; Richard H., Hatfield; G. Wesley. Method for producing a synthetic gene or other DNA sequence. U.S. Patent 7,262,031.
8. Larsen, L.S., Wassman, C.D., Hatfield, G.W. and Lathrop, R.H. (2008) Computationally Optimized DNA Assembly of synthetic genes. Int J Bioinform Res Appl, 4: 324.
9. Pollastri, G., Przybylski, D., Rost, B. and Baldi, P. (2002) Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins, 47: 228.
10. Pollastri, G., Baldi, P., Fariselli, P. and Casadio, R. (2002) Prediction of coordination number and relative solvent accessibility in proteins. Proteins, 47: 142.
11. Cheng, J., Randall, A.Z., Sweredoski, M.J. and Baldi, P. (2005) SCRATCH: a protein structure and structural feature prediction server. Nucleic Acids Res, 33, W72.
12. Cheng, J. and Baldi, P. (2007) Improved residue contact prediction using support vector machines and a large feature set. BMC Bioinformatics, 8: 113.
13. Randall, A. and Baldi, P. (2008) SELECTpro: effective protein model selection using a structure based energy function resistant to BLUNDERs. BMC Struct Biol, 8: 52.
14. Saraf, M.C., Gupta, A. and Maranas, C.D. (2005) Design of combinatorial protein libraries of optimal size. Proteins, 60: 769.
15. Baronio R., Danziger S.A., Hall L.V., Salmon K., Hatfield G.W., Lathrop R.H., Kaiser P. (2010) All-codon scanning identifies p53 cancer rescue mutations. Nucleic Acids Res. 38(20):7079.
Pierre Baldi and G. Wesley Hatfield (2002) "DNA Microarrays and Gene Expression: From Experiments to Data Analysis and Modeling". Cambridge University Press. 214 pp. ISBN: 0521800226
Federation of American Societies for Experimental Biology
International Society for Computational Biology
Cellular and Molecular Biosciences
Institute for Genomics and Bioinformatics http://www.igb.uci.edu
Computational Biology Research Laboratory (CBRL)