Education
The diversity of concepts and technologies in our laboratory creates a need for a formal didactic program in which lab members can build a core foundation of knowledge by studying the literature and by sharing their expertise through a summer lecture series. We also curate a small collection of videos and movies that capture themes of practical value for doing science. Here, we organize a collection of notes on core topics of interest to the lab and a curated reading list of relevant papers…coming soon.
Quantitative Biology: From Complexity to Simplicity
The term “complexity” is routinely used to characterize biological systems. But what does it mean, precisely? We have designed a course of lectures that introduces the basic processes of data analysis, modeling, and the formulation of theory, and includes a number of case studies in which understanding of biological systems has emerged through the application of this approach. An overall theme is to think deeply about a rigorous definition of system complexity and to learn strategies for rationally addressing such systems. As a counterpoint, we begin with the study of linear systems and the rich mathematical foundations for understanding and predicting their behaviors. We then move to non-linear systems: what makes them complex and difficult, and why is their mathematical treatment so much harder? We explore several biological examples of non-linearity in fields ranging from structural biology to evolution, ending with a proposed general definition of complexity in biology and an operational strategy for studying such systems.
Part 1: Introduction
From the scale of macromolecules to ecosystems, biological systems are made up of many parts that interact to produce behaviors that are not predictable from knowledge of the parts taken individually. That is, the whole is much different from the sum of the parts. In this lecture, we create a classification of all problems in science and attempt an initial definition of what constitutes complexity. We lay out the current status of the field and begin discussing strategy by introducing the principles, process, and goals of modeling, with examples from simple kinetic systems. [PDF]
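As a concrete taste of the modeling process, here is a minimal sketch (in Python, with a made-up rate constant) of the simplest kinetic system – first-order decay A → B – checked against its analytic solution:

```python
# A first-order kinetic system A -> B with a hypothetical rate constant.
# The analytic solution A(t) = A0 * exp(-k t) serves as a check on the
# numerical integration.
import numpy as np
from scipy.integrate import solve_ivp

k = 0.5         # hypothetical rate constant (1/s)
A0 = 1.0        # initial concentration of A

def dA_dt(t, A):
    return -k * A

t = np.linspace(0, 10, 100)
sol = solve_ivp(dA_dt, (0, 10), [A0], t_eval=t)

analytic = A0 * np.exp(-k * t)
print("max deviation from analytic solution:",
      np.max(np.abs(sol.y[0] - analytic)))
```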
Part 2: Linear Systems
Before we contend with the inherent complexity of non-linear dynamical systems, we first need to understand the fundamental simplicity of linear systems. In this lecture, we learn definitions of linearity and time-invariance and how these properties provide powerful constraints that make systems predictable and decomposable. We introduce a mathematical transform method that makes these properties particularly clear and provides an efficient approach to solving for global behaviors analytically. We show how linearity and time-invariance open up powerful tools of engineering to design, control, and predict the action of any linear system. As a matter of historical perspective, we recognize the key individuals whose work was essential for the knowledge described in each lecture…here, Lagrange, Laplace, and Fourier. [PDF]
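To make the constraint of linearity tangible, the following sketch numerically checks the superposition property for a simple linear time-invariant system; the first-order filter and the inputs are illustrative choices, not material from the lecture:

```python
# A numerical check of superposition for a linear time-invariant system:
# a first-order low-pass filter dx/dt = -x + u(t). The response to
# u1 + u2 should equal the sum of the individual responses.
import numpy as np
from scipy import signal

sys = signal.StateSpace([[-1.0]], [[1.0]], [[1.0]], [[0.0]])

t = np.linspace(0, 10, 500)
u1 = np.sin(2 * np.pi * 0.5 * t)           # a sinusoidal input
u2 = (t > 3).astype(float)                 # a step input
_, y1, _ = signal.lsim(sys, u1, t)
_, y2, _ = signal.lsim(sys, u2, t)
_, y12, _ = signal.lsim(sys, u1 + u2, t)

print("superposition error:", np.max(np.abs(y12 - (y1 + y2))))  # ~0
```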
Continuing our study of linear systems, we show how to sketch the behavior of linear dynamical systems and discuss a formal analytic approach for decomposing higher-order linear systems into weighted, additive combinations of more elementary functions. This analysis leads to one of the truly marvelous results of linear dynamical systems – the possibility of defining a zoo of all possible behaviors of these systems from knowledge of their key control parameters. In that sense, we make the point that arbitrary linear systems can be completely understood. They may be complicated, but they are not complex. Many individuals contributed to these concepts, but if we were to pick three…Euler, Fourier, and Poincaré. [PDF]
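As a small illustration of this decomposition (with a made-up system matrix), the eigenvalues of dx/dt = Ax directly classify the elementary modes whose weighted sum gives any solution:

```python
# The qualitative behavior of dx/dt = A x is read off from the
# eigenvalues of A: real parts give growth/decay, imaginary parts give
# oscillation. The matrix here is a hypothetical example.
import numpy as np

A = np.array([[-0.5,  2.0],
              [-2.0, -0.5]])     # hypothetical system matrix

eigvals, eigvecs = np.linalg.eig(A)
for lam in eigvals:
    decay = "decaying" if lam.real < 0 else "growing"
    mode = "oscillatory" if abs(lam.imag) > 0 else "monotonic"
    print(f"eigenvalue {lam:.2f}: {decay}, {mode} mode")

# Any solution is a weighted sum of these elementary modes:
# x(t) = sum_i c_i * v_i * exp(lambda_i * t)
```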
Our descriptions of phenomena so far have been macroscopic, implicitly involving the action of large numbers of reacting particles at the so-called deterministic limit. Thus, we speak of differential equations with characteristic parameters describing the progress of reactions over time and space. In this lecture, we remind ourselves that these parameters are in fact statistical averages over distributions of stochastic events – the microscopic basis for all reactions. We review the basics of probability theory and ways of counting, and deduce the three central distributions of greatest relevance – the binomial, the Poisson, and the Gaussian. A key concept is the statistical independence of the individual events that make up macroscopic reactions, a principle that underlies the central limit theorem and the origin of the Gaussian distribution. Our choice of the key individuals responsible for these concepts…Laplace, Gauss, and Poisson. [PDF]
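A minimal numerical sketch of the central limit theorem: sums of many independent events approach a Gaussian regardless of the microscopic distribution (the uniform "event" here is an arbitrary choice):

```python
# Sums of many independent random events approach a Gaussian. Here each
# microscopic "event" is a uniform random step on [-1, 1].
import numpy as np

rng = np.random.default_rng(0)
n_events, n_trials = 1000, 100_000

sums = rng.uniform(-1, 1, size=(n_trials, n_events)).sum(axis=1)

# Compare empirical moments to the Gaussian prediction:
# mean 0, variance = n * Var(single event) = n * (1/3)
print("empirical mean:", sums.mean())
print("empirical variance:", sums.var(), "predicted:", n_events / 3)
```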
We continue our discussion of linear systems, but now at the limit of large numbers of variables, with the topic of diffusion. In this lecture, we describe the phenomenological treatment of Fick, who wrote down the macroscopic theory of diffusion by analogy with Fourier’s analysis of heat conduction. The later mechanistic analysis of Einstein explained the microscopic origin of diffusion in the principles of the unbiased random walk, reinforcing the concept that simple global behaviors emerge from statistical averaging over many independent random microscopic events. We connect the principles of the random walk to gradients of the chemical potential, the thermodynamic driving force of diffusion, and show that diffusing systems seek the condition of maximum entropy. For ideal dilute solutions, we redefine the thermodynamic driving force for diffusion as a molar entropy gradient in space and time. The key individuals responsible…Brown, Fick, and Einstein. [PDF]
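A small simulation sketch of the Einstein picture, assuming unit step sizes and unit time steps: the mean displacement of unbiased random walkers stays near zero, while the mean squared displacement grows linearly in time, ⟨x²⟩ = 2Dt:

```python
# Many unbiased 1D random walkers: the ensemble mean squared
# displacement grows linearly, <x^2> = 2 D t, with an effective
# D = (step size)^2 / (2 * time step) = 1/2 for unit steps.
import numpy as np

rng = np.random.default_rng(1)
n_walkers, n_steps = 10_000, 1000
steps = rng.choice([-1.0, 1.0], size=(n_walkers, n_steps))
x = np.cumsum(steps, axis=1)               # positions over time

msd = (x ** 2).mean(axis=0)                # mean squared displacement
t = np.arange(1, n_steps + 1)
print("fitted slope of MSD vs t:", np.polyfit(t, msd, 1)[0])  # ~1 = 2D
```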
The theory of scattering of light by matter is beautiful, and it is another example of the power of linearity in achieving simple analytical solutions to what seems like a very complex problem. In this lecture, we review the basic principles of electromagnetic radiation and the mathematical description of light as traveling waves in space and time, and we explain the problem of interference when light rays of different phase shifts are added. Consistent with the simplicity of linear summation, we show that the theory of diffraction is no more complex for arbitrarily large molecular systems than for two scattering centers, and that the analytical solution to diffraction is the spatial Fourier transform. We take this moment to reintroduce principles of linear transforms, and then discuss some practical issues of applying the theory of diffraction to solving molecular structures (like growing crystals and inverting the Fourier transform). The key individuals in the development of this theory…Huygens, Young, and Bragg. [Part1 PDF] [Part2 PDF]
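As a sketch of diffraction as a spatial Fourier transform (with two hypothetical point scatterers), the far-field intensity is just a linear sum of phase factors, and it can be checked against the familiar two-slit formula:

```python
# Far-field scattering amplitude from point scatterers at positions x_j:
# F(q) = sum_j exp(-i q x_j). Two centers give cosine fringes; larger
# systems just add more terms to the same linear sum.
import numpy as np

x = np.array([0.0, 1.0])                  # two scattering centers
q = np.linspace(-10, 10, 1000)            # scattering wavevector

F = np.exp(-1j * np.outer(q, x)).sum(axis=1)
intensity = np.abs(F) ** 2

# For two centers separated by d = 1, theory gives I(q) = 4 cos^2(q d / 2)
predicted = 4 * np.cos(q * 1.0 / 2) ** 2
print("max deviation from two-slit formula:",
      np.max(np.abs(intensity - predicted)))
```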
Often, we choose the variables that make up our system of interest by purely empirical or even arbitrary criteria; for example, we choose to describe the basic units of proteins, genomes, and ecosystems as amino acids, genes, and species, respectively. There is a logic for these choices, but why should these be the best variables to describe the functional mechanism of biological systems? Indeed, there is considerable evidence that system function at all scales emerges from the collective action of their parts, suggesting that a re-parameterization of variables into these collective groups might be a better description of the basic units of function. But how do we find the right effective variables? In this lecture, we discuss simple linear methods of spectral decomposition (also called principal components analysis) and independent components analysis as first-order approaches to finding a good representation of seemingly complex biological systems. In doing this, we re-introduce the eigenvalue decomposition and make conceptual connections to the analysis of high-order linear dynamical systems. The key individuals responsible for the concepts discussed…Pearson, Hotelling, and a combination of Comon and Sejnowski. [PDF]
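A minimal sketch of spectral decomposition on synthetic data: the eigenvectors of the covariance matrix recover a hidden collective variable (all numbers here are made up for illustration):

```python
# PCA as a search for effective collective variables: eigenvectors of
# the data covariance matrix define new coordinates ranked by explained
# variance. The data are two correlated variables driven by one hidden mode.
import numpy as np

rng = np.random.default_rng(2)
latent = rng.normal(size=1000)                   # one hidden collective mode
data = np.column_stack([latent + 0.1 * rng.normal(size=1000),
                        2 * latent + 0.1 * rng.normal(size=1000)])

cov = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)           # ascending order

explained = eigvals[::-1] / eigvals.sum()
print("variance explained per component:", explained)  # ~[0.99, 0.01]
print("top collective mode:", eigvecs[:, -1])  # ~ (1, 2)/sqrt(5), up to sign
```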
Part 3: Non-Linear Systems
After a review of the essential characteristics of linear systems (decomposability, predictability, designability), we begin to study what fundamentally distinguishes even small-scale non-linear dynamical systems. We look at three examples: the van der Pol oscillator from electronics, the FitzHugh-Nagumo model of neuronal action potentials, and the discrete-time representation of the logistic equation. The key concept is that these systems exhibit remarkable “emergent” behaviors that are not decomposable into the behaviors of the parts taken individually. From study of the 1-dimensional logistic equation, we explore the path through period doubling to deterministic chaos. The key individuals responsible for these concepts include Lagrange, Poincaré, and the more recent combination of Lorenz, May, Feigenbaum, and Libchaber. [PDF]
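A short sketch of the period-doubling route to chaos in the logistic map; the parameter values below are standard illustrative choices:

```python
# The discrete-time logistic map x_{n+1} = r x_n (1 - x_n). Sweeping the
# control parameter r reveals a fixed point, then period-2 and period-4
# cycles, then chaos.
import numpy as np

def attractor(r, n_transient=1000, n_sample=16):
    x = 0.5
    for _ in range(n_transient):          # let transients die out
        x = r * x * (1 - x)
    pts = set()
    for _ in range(n_sample):             # sample the long-time orbit
        x = r * x * (1 - x)
        pts.add(round(x, 6))
    return sorted(pts)

for r in (2.8, 3.2, 3.5, 3.9):
    print(f"r = {r}: {len(attractor(r))} distinct point(s)")
# expected: 1, 2, 4, then many (chaos)
```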
This lecture initiates a series of seven lectures focused on understanding non-linear dynamical systems through specific examples. The examples are arranged in order of increasing system size (that is, the number of dynamical variables potentially in play), and an important goal is to present both the bottom-line conclusions and the mathematical approaches and tools used to get there. What general tools and strategies can we learn from these didactic examples? Indeed, we will see that strategies for addressing nonlinear systems depend on system size in an interesting way, motivating a formal hypothesis about the complexity of dynamical systems in biology. Here, we present the beautiful work of Jim Ferrell (Dept. of Chemical and Systems Biology, Stanford) in deducing the basic function and mechanism of the MAP kinase enzyme cascade, a common motif in eukaryotic signaling systems. By studying this example, we see how fundamentally new properties (multi-stability and irreversibility) emerge even in small systems that include non-linear reactions. These properties are centrally important to the biology of the MAP kinase switch that Ferrell studies. We also introduce a clever graphical method to sketch system behavior simply, without analytic calculation or computer simulation. The key individuals…Lagrange, Poincaré, and of course, Ferrell. [PDF]
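To illustrate the flavor of the graphical method (not Ferrell's actual model or parameters), here is a sketch of a rate-balance analysis for a switch with ultrasensitive positive feedback; steady states appear where the production and degradation rates cross:

```python
# Rate-balance sketch of a bistable switch: production with
# ultrasensitive (Hill) positive feedback vs. linear degradation.
# All parameters are hypothetical.
import numpy as np

x = np.linspace(0, 2, 20001)
basal, vmax, K, n = 0.02, 1.0, 0.5, 4       # hypothetical parameters
production = basal + vmax * x**n / (K**n + x**n)
degradation = 1.0 * x

balance = production - degradation
crossings = np.where(np.diff(np.sign(balance)))[0]
print("steady states near x =", np.round(x[crossings], 3))
# three crossings: a low stable state, an unstable threshold, a high stable state
```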
Continuing our study of small-scale non-linear dynamical systems, we return to the van der Pol relaxation oscillator, but now with a more formal approach. We introduce concepts of local linearization and formal methods for analyzing system stability, limit-cycle oscillations, and bifurcations. We will see that local linearization is an attempt to bring the analytic power of linear dynamical systems back to the study of non-linear systems. As a practical model, we discuss an electric circuit implementation of the van der Pol oscillator called the Chua circuit, and we apply our knowledge of stability and bifurcation analysis to predict its behavior. There is nothing like physically seeing a dynamical system in operation to enhance our understanding, so we build a real implementation of the Chua circuit with real-world components and carry out empirical studies to compare with our theoretical predictions. Finally, we describe the FitzHugh-Nagumo model for the neuronal action potential, another realization of the van der Pol oscillator, with a small extension…building in a threshold for the oscillatory regime. The key individuals here are, as before, Lagrange and Poincaré, and the group of Lorenz, May, Feigenbaum, and Libchaber. [PDF]
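A minimal sketch of local linearization applied to the van der Pol oscillator: the eigenvalues of the Jacobian at the fixed point diagnose its stability (the value of μ is an arbitrary choice):

```python
# The van der Pol oscillator
#   dx/dt = y,  dy/dt = mu * (1 - x^2) * y - x
# has a single fixed point at the origin. The Jacobian there has
# eigenvalues with positive real part for mu > 0, so the fixed point is
# unstable and trajectories are pushed onto the limit cycle.
import numpy as np

mu = 1.0                                  # hypothetical damping parameter
J = np.array([[0.0, 1.0],                 # Jacobian evaluated at (0, 0)
              [-1.0, mu]])

eigvals = np.linalg.eigvals(J)
print("eigenvalues at origin:", eigvals)
print("unstable spiral" if (eigvals.real > 0).all() and abs(eigvals.imag).max() > 0
      else "other behavior")
```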
We move to the analysis of a somewhat larger system – the signaling pathway that mediates visual transduction in the fruit fly Drosophila melanogaster. Absorption of a photon by the light receptor rhodopsin leads to a rapid, all-or-nothing, stereotyped electrical response with a random time delay, called the single-photon response or quantum bump (QB). Remarkably, the probability of a QB in the dark is close to zero, but approaches one upon absorption of a photon. How does the system work? The same mathematical tools we learned in smaller systems demonstrate that this visual signaling system in flies works as a light-driven, stochastic relaxation oscillator with a switch-like threshold, converting photon absorptions into single limit-cycle oscillations (the QBs). An important concept is that though the system is defined by seven dynamical variables, the effective dynamics can be projected down to a two-dimensional space. Thus, the “larger” system is just a small system if one can find the right effective variables – possibly a simplifying feature of evolutionary design. The bottom line is that the model captures all the essential functional properties of QB generation and provides an example of how to intuitively “understand” the operation of seemingly complex cellular signaling pathways. The key individuals underlying the concepts here…Poincaré, Lorenz and May, and Shraiman and Pumir. [PDF]
What about the general case of non-linear systems with many apparent dynamical variables controlling behavior? How can we understand the principles of operation of such systems? To develop the right intuition, we begin with what seems at first glance like an unrelated problem! We discuss the theory of epistasis, the concept in genetics that the effect of a mutation depends on the background of other mutations (that is, that mutations are non-independent, or coupled). We show that non-independence is the natural outcome of non-linear interactions between system variables, and that with appropriate formalisms, epistasis quantitatively measures these interactions. A major concept is that epistasis is a mathematical transform from the space of phenotypes of genetic variants to a space of non-linear interactions between the underlying mutations. Accordingly, representations in both spaces contain the same total information, but differ in the organization and sparsity of that information. For example, if there are no non-linear interactions, epistasis is sparse, and the phenotypes of all variants are predictable from a small number of measurements – a case of extreme simplicity. If all possible non-linear interactions exist, epistasis is not sparse, and phenotypes are not predictable without exhaustive measurements – a case of extreme complexity. Where do real systems sit in this hierarchy? Key individuals whose work has contributed to these ideas…Wright, Fisher, Wyman, and Fersht, though we now come to a place where some major advances have yet to be made. An opportunity for contribution… [PDF]
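As a toy illustration of epistasis-as-transform (with made-up phenotypes, and noting that sign and averaging conventions vary across formalisms), the four phenotypes of a two-locus system map invertibly onto two additive effects plus one interaction term:

```python
# For two biallelic loci, the four measured phenotypes transform into an
# intercept, two additive effects, and one pairwise interaction term.
# The transform is invertible: both representations hold the same
# information, organized differently.
import numpy as np
from itertools import product

# phenotypes indexed by genotype (0/1 at each of two loci); values made up
y = {(0, 0): 1.0, (1, 0): 1.5, (0, 1): 1.4, (1, 1): 3.0}

# background-averaged (Walsh/Fourier) coefficients via a +/-1 encoding
coeffs = {}
for k in product([0, 1], repeat=2):       # which loci enter each term
    c = 0.0
    for g, val in y.items():
        sign = (-1) ** sum(ki * gi for ki, gi in zip(k, g))
        c += sign * val / 4
    coeffs[k] = c

print("additive effects:", coeffs[(1, 0)], coeffs[(0, 1)])
print("pairwise epistasis:", coeffs[(1, 1)])   # nonzero => interaction
```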
So, based on what we have learned, how can we understand large non-linear dynamical systems? As an example, we focus on the problem of protein structure and function. One parameterization of proteins is by their amino acid sequence, a choice that logically reflects chemical structure and the fact that phenotypic variation often happens at the scale of individual amino acids. But protein phenotypes (folding and functional activities) arise from non-linear (that is, epistatic) interactions between amino acids. Since the number of amino acids in a typical protein is ~200, the combinatorial complexity of potential non-linear interactions is frighteningly enormous – a large problem. In this lecture, we examine the extent and complexity of non-linear interactions between amino acids in a model protein – an empirical approach to develop some intuition. A key result is that non-trivial high-order epistasis exists between amino acids, but that it is also extraordinarily sparse, with only a small number of non-linear terms controlling phenotypes. We review formal approaches to exploit this sparsity to learn the relevant nonlinear terms, and we introduce a strategy that we will call the “statistical genomics” approach. A major concept is that sparsity in epistasis implies that “large” biological systems can be projected down to a lower-dimensional effective space in which they can be mechanistically understood. The strategy of “statistics before mechanism” represents a major shift in operational paradigm, but may be the right way to understand large non-linear systems. We are now in the realm of cutting-edge problems with little history to discuss; accordingly, our key individuals are a collection of current-day scientists who are working on this problem. [PDF]
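A sketch of how sparsity can be exploited, using L1-regularized regression on a synthetic genotype–phenotype map; this stands in for, but is not, the “statistical genomics” machinery described in the lecture:

```python
# If only a few epistatic terms are nonzero, sparse regression can
# recover them from far fewer measurements than the exhaustive 2^N
# survey. The genotype-phenotype model below is entirely synthetic.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n_sites, n_obs = 12, 300
G = rng.choice([-1.0, 1.0], size=(n_obs, n_sites))    # random genotypes

# phenotype = two additive effects + one pairwise interaction + noise
y = 1.0 * G[:, 0] + 0.7 * G[:, 3] + 1.5 * G[:, 1] * G[:, 5]
y += 0.05 * rng.normal(size=n_obs)

# features: all single sites plus all pairwise products
pairs = [(i, j) for i in range(n_sites) for j in range(i + 1, n_sites)]
X = np.hstack([G] + [G[:, [i]] * G[:, [j]] for i, j in pairs])

fit = Lasso(alpha=0.05).fit(X, y)
nonzero = np.flatnonzero(np.abs(fit.coef_) > 0.1)
print("recovered terms (feature indices):", nonzero)
# expect indices 0 and 3 (additive) plus the column for the (1, 5) pair
```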
We continue our discussion of general approaches for large nonlinear dynamical systems with a practical example of the “statistics before mechanism” strategy. Specifically, we review new approaches for using large ensembles of homologous protein sequences to build models of the extent and pattern of nonlinear interactions between amino acids. There are three key issues: (1) defining the right phenotype(s) to be explained by the models, a problem of understanding fitness in an evolving population; (2) defining the optimal sampling of sequences; and (3) defining what constitutes a “better” model, a surprisingly subtle problem. Broadening from the problem of proteins, how can the concepts developed here be used more generally? Again, the key individuals here are a small group of current-day scientists working on this problem. [PDF]
Part 4: Conclusion and next steps