Education
The diversity of concepts and technologies in our laboratory creates a need for a formal didactic program in which lab members build a core foundation of knowledge by studying the literature and by sharing their expertise through a summer lecture series. We also value a small collection of videos and movies that capture themes relevant to practical science. Here, we organize a collection of notes on core topics of interest to the lab and a curated reading list of relevant papers … coming soon.
Quantitative Biology: From Complexity to Simplicity
The term “complexity” is routinely used to characterize biological systems. But what does it mean, precisely? We have designed a lecture course that introduces the basic process of data analysis, modeling, and formulation of theory, and includes a number of case studies in which understanding of biological systems has emerged through the application of this approach. An overall theme is to think deeply about a rigorous definition of system complexity and to learn strategies for addressing such systems rationally. As a counterpoint, we begin with the study of linear systems and the rich mathematical foundations for understanding and predicting their behaviors. We then move to non-linear systems: what makes them complex and difficult, and why is their mathematical treatment so much harder? We will explore several biological examples of non-linearity in fields ranging from structural biology to evolution, ending with a proposed general definition of complexity in biology and an operational strategy for studying such systems.
Part 1: Introduction

Part 2: Linear Systems






Part 3: Non-Linear Systems





So, based on what we have learned, how can we understand large non-linear dynamical systems? As an example, we focus on the problem of protein structure and function. One parameterization of proteins is by their amino acid sequence, a choice that logically reflects chemical structure and the fact that phenotypic variation often happens at the scale of individual amino acids. But protein phenotypes (folding and functional activities) arise from non-linear (that is, epistatic) interactions between amino acids. Since the number of amino acids in a typical protein is ~200, the combinatorial complexity of potential non-linear interactions is frighteningly enormous – a large problem. In this lecture, we examine the extent and complexity of non-linear interactions between amino acids in a model protein – an empirical approach to develop some intuition. A key result is that non-trivial high-order epistasis exists between amino acids, but that it is also extraordinarily sparse, with only a small number of non-linear terms controlling phenotypes. We review formal approaches that exploit this sparsity to learn the relevant non-linear terms, and introduce a strategy that we will call the “statistical genomics” approach. A major concept is that sparsity in epistasis implies that “large” biological systems can be projected down to a lower-dimensional effective space in which they can be mechanistically understood. The strategy of statistics before mechanism represents a major shift in operational paradigm, but may be the right way to understand large non-linear systems. We are now in the realm of cutting-edge problems with little history to discuss; accordingly, our key individuals are a collection of current-day scientists who are working on this problem. [PDF]
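To make the idea of sparse high-order epistasis concrete, here is a minimal sketch that is not taken from the lecture itself. It assumes a hypothetical toy landscape over four binary sites (wild type or mutant at each position) and uses a Walsh-Hadamard style expansion, one common way to decompose a complete combinatorial landscape into epistatic terms of every order. The function names, the number of sites, and the coefficients are all illustrative assumptions.

```python
from itertools import combinations, product
from math import prod


def epistasis_terms(phenotype, n_sites):
    """Expand a complete binary landscape into epistatic coefficients.

    Each site is encoded as -1 (wild type) or +1 (mutant). The coefficient
    for a subset of sites is the average over all genotypes of the phenotype
    multiplied by the product of the encodings of the sites in that subset.
    """
    genotypes = list(product((0, 1), repeat=n_sites))
    terms = {}
    for order in range(n_sites + 1):
        for subset in combinations(range(n_sites), order):
            total = 0.0
            for g in genotypes:
                sign = prod(1 if g[i] else -1 for i in subset)
                total += sign * phenotype(g)
            terms[subset] = total / len(genotypes)
    return terms


def toy_phenotype(g):
    """Hypothetical phenotype built from only four non-zero terms."""
    s = [1 if x else -1 for x in g]
    return 1.0 + 0.5 * s[0] + 0.8 * s[1] * s[2] - 0.3 * s[0] * s[2] * s[3]


terms = epistasis_terms(toy_phenotype, n_sites=4)

# Keep only the terms that are distinguishable from zero.
nonzero = {k: round(v, 6) for k, v in terms.items() if abs(v) > 1e-9}
print(f"{len(nonzero)} of {len(terms)} possible terms are non-zero")
for subset, coeff in nonzero.items():
    print(subset, coeff)
```

In this toy case, 4 of the 16 possible terms are non-zero, including one third-order term. The full enumeration scales as 2^n in the number of sites, so it is out of reach for a protein of ~200 residues, which is why methods that exploit sparsity are needed.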
We continue our discussion of general approaches for large non-linear dynamical systems with a practical example of the statistics-before-mechanism strategy. Specifically, we review new approaches for using large ensembles of homologous protein sequences to build models of the extent and pattern of non-linear interactions between amino acids. There are three key issues: (1) defining the right phenotype(s) to be explained by the models, a problem of understanding fitness in an evolving population, (2) defining the optimal sampling of sequences, and (3) defining what constitutes a “better” model, a surprisingly subtle problem. Broadening from the problem of proteins, how can the concepts developed here be used more generally? Again, the key individuals here are a small group of current-day scientists working on this problem. [PDF]
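As a rough illustration of how an ensemble of homologous sequences can reveal coupled positions, the sketch below computes mutual information between pairs of alignment columns. This is a deliberately simple stand-in statistic, not the models discussed in the lecture. The four-sequence alignment is a hypothetical toy example, and the function name is an assumption made for illustration.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    freq_i = Counter(seq[i] for seq in alignment)
    freq_j = Counter(seq[j] for seq in alignment)
    freq_ij = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 1 co-vary perfectly,
# column 2 varies independently of column 0, and column 3 is conserved.
toy_alignment = ["AKLE", "AKVE", "GRLE", "GRVE"]

mi_coupled = column_mutual_information(toy_alignment, 0, 1)
mi_independent = column_mutual_information(toy_alignment, 0, 2)
mi_conserved = column_mutual_information(toy_alignment, 0, 3)
print(mi_coupled, mi_independent, mi_conserved)
```

Real alignments need corrections that this sketch omits, such as sequence reweighting for uneven sampling and handling of finite-sample noise. Those omissions relate directly to the questions of optimal sampling and of what makes one model "better" than another.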
Part 4: Conclusion and next steps
Center for Physics of Evolution
Biochemistry & Molecular Biology
The Institute for Molecular Engineering
The University of Chicago
929 E. 57th Street Chicago, IL 60637
