Monday, December 2, 2013

Science: Principles or Cook Book?

There is a recent paper by Prof. Dougherty from Texas A&M which bemoans the state of some parts of science in the current environment[1]. As Dougherty so clearly states: 

…science concerns relations between measurable variables and it is these relations that constitute the subject matter of science, scientific knowledge ipso facto is mathematically constituted… 

Let me give a couple of examples of how this applies.

First, let us look at the world of genomics, which I have been discussing herein for a while. The introduction of the microarray has allowed an explosion of data, which in turn has allowed scientists to argue for putative relationships between genes and cancers. Namely, they examine, say, 9,000 prostate cancer patients and, using microarrays primed for, say, 500 genes, conclude that some 50 of these genes are seen in prostate cancer. They then allege that there is some actionable clinical relationship between the presence of the gene and the cancer. There is no underlying system model identifying this, just a microarray demonstrating that “oftentimes” these genes are under- or over-expressed.
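To see why a screen with no model behind it proves so little, consider a minimal sketch in Python. It is my own illustration, not something from Dougherty's paper. It simulates 500 genes whose expression has no relationship whatsoever to disease and counts how many still pass a conventional significance cutoff. The group size of 200, the cutoff of 1.96, the unit-variance noise and the function name are all assumptions chosen for illustration (the group is kept far smaller than 9,000 only so the sketch runs quickly).

```python
import random
import statistics


def count_chance_hits(n_genes=500, n_per_group=200, z_cutoff=1.96, seed=0):
    """Count genes flagged as 'differentially expressed' when no real effect exists.

    Every value is drawn from the same standard normal distribution, so any
    gene that passes the cutoff is a false positive by construction.
    """
    rng = random.Random(seed)
    # Standard error of the difference between two means, each taken over
    # n_per_group samples of unit-variance noise (variance assumed known).
    std_err = (2.0 / n_per_group) ** 0.5
    hits = 0
    for _ in range(n_genes):
        cancer = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        z = (statistics.fmean(cancer) - statistics.fmean(control)) / std_err
        if abs(z) > z_cutoff:
            hits += 1
    return hits


if __name__ == "__main__":
    print("genes flagged by chance alone:", count_chance_hits())
```

With a two-sided cutoff of 1.96 one expects about 5 percent of 500 genes, roughly 25, to be flagged by chance alone. A list of “associated” genes therefore says little by itself; it needs a model that predicts which genes should move, and why.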

Second, let us look at the BRAF V600 melanoma cases. Here, unlike the above, we have a case where one knows the RAF pathway and knows that loss of control of certain elements of that pathway leads to gene instabilities and thus a malignant expression. Therefore one targets the mutated RAF gene, the BRAF V600, and this results in a suppression of the malignancy, for a while. Then squamous cell carcinomas appeared, but since the full pathway was known, one could go down one step to MEK, and controlling it controlled the sequelae. In this case there was a model, a system, and by logically following the system one found what the next step should be.

The above are two examples of how “science” is being done today in the area of gene-related results. The second example is a Dougherty-like science; namely, it connects data to an underlying model which is predictive, and by using that model the cancer is controllable, at least until another instability results. The first approach, data collecting, is not really science as we accept it today. It is more akin to 19th century Botany, at best, where one goes out and collects specimens of plants and then tries to sew together a quilt of understanding to explain nature.

What Dougherty is focusing on is the Why question. When I recall Medical School, one is taught What and How: What disease is it, and How do I treat it. In contrast, Engineering is first Why and then How. There is a strong dissonance when an Engineer is studying Medicine, or at least there was forty years ago. An Engineer all too often keeps asking Why: what is the underlying set of basic scientific principles that explain the phenomenon, and how can I express them in a manner in which they can be used on a predictable basis? Asking Why would drive many a Medical Professor to apoplexy. Medicine was for a long while the transfer down of “facts” and not validatable principles. The old adage at graduation, that fifty percent of what one had just learned in Medical School was now invalid, was a bit of a joke, but sadly it was also true.

But as we move to Genomics we sadly see this trait arise again. There is a tension between those who want to have basic repeatable principles to build upon and those who believe that collecting data is the sine qua non. Let me give an example from a recent experience. Prof. Lander at MIT is teaching an EdX course on Biology. Now Lander is brilliant, and his style of teaching is in many ways classic MIT. Namely, he highlights the basic principles, and then the student works through the Problem Sets, developing the details for themselves. So far so good. The first two-thirds to three-fourths of the course was fantastic. Then I noticed a subtle change, a change that, unless you were prepared to recognize it, would have slipped through the cracks. He slowly started giving a mixture of core predictable principles and cook book recipes. For example, we know that we can denature DNA because the base pair bonds are Hydrogen bonds, which are relatively weak, while the backbone Phosphate bonds are strong because they are covalent. Thus by heating the molecule we break the Hydrogen bonds first, and before any backbone bonds break we can do our complementary additions; thus PCR works well.

On the other hand, as he progressed to a discussion of Knock Out genes, there was a collection of “tricks” or cook book recipes that were used. Why, for example, did one get the modified DNA into the denatured gene the way he said? Well, it just happens. Well, nothing just happens. Fortunately bench Biologists have developed many “tricks”, like alchemists, but as a result they have become a bit too comfortable with this unexplained bevy of tools, which are indispensable but in the long run self-defeating.

As Dougherty states when he examines data mining as an example of the Biologist’s flair for data at all costs: 

Data mining and Copernicus share a lack of experimental design; however, in contradistinction to data mining, Copernicus thought about unplanned data and changed the world, the key word being ‘thought.’ Copernicus was not an algorithm numerically crunching data until some stopping point, very often with no adequate theory of convergence or accuracy. Copernicus had a mind and ideas. William Barrett writes, ‘The absence of an intelligent idea in the grasp of a problem cannot be redeemed by the elaborateness of the machinery one subsequently employs’. Or as M. L. Bittner and I have asked, ‘Does anyone really believe that data mining could produce the general theory of relativity’? Data mining represents a regression from the achievements of three and a half centuries of epistemological progress to a radical empiricism, in regard to which Reichenbach writes, ‘A mere report of relations observed in the past cannot be called knowledge. If knowledge is to reveal objective relations of physical objects, it must include reliable predictions. A radical empiricism, therefore, denies the possibility of knowledge’. A collection of measurements together with statements about the measurements is not scientific knowledge, unless those statements are tied to verifiable predictions concerning the phenomena to which the measurements pertain. 

What is Dougherty getting at? Simply, to reiterate the first quote: Science demands a marriage between data and models. To be true science it must be predictive, and predictive based upon an embodiment in an abstraction.

Let me now apply this to genomics. Consider prostate cancer. The question is complex but can be asked: what is the first set of steps that leads to prostate cancer? Let us examine what we know:

First, we know many of the pathways. We know that the AKT pathway is critical, we know that c-MYC is a critical control element, we know that PTEN is often mutated, and we know that AR (Androgen Receptors) ultimately get mutated and we have metastatic growth. We have pathways, we have relationships; we can demonstrate causality and results. Thus a modicum of a basis in reality exists. If one were to use this pathway model and then search using microarrays matched against the model, one arguably could iterate to improved models and improved predictability. The data without the model is useless and the model without the data is unverifiable.

Second, we can ask what sets the process off. Are all the changes due to mutations or, more likely, due to epigenetic insults? Thus when we look at MDS, for example, we are looking at a hypermethylated set of blood stem cells. Something hypermethylated them, and we know that, since they are hypermethylated, gene expression is repressed and thus proliferation of immature cells is a result. In prostate cancer, is the control mechanism lost because of a mutation, methylation, or both, and in what order? Having a model allows one to validate and then iterate along a consistent trajectory of reality.

What does Dougherty have to say here? 

While ignorance of basic scientific method is a serious problem, it is necessary to probe further than simply methodological ignorance to get at the full depth of the educational problem. Science does not stand alone, disjoint from the rest of culture. Science takes place within the general human intellectual condition. Biology cannot be divorced from physics, nor can either be divorced from mathematics and philosophy. One’s total intellectual repertoire affects the direction of inquiry: the richer one’s knowledge, the more questions that can be asked. Schrodinger comments, ‘A selection has been made on which the present structure of science is built. That selection must have been influenced by circumstances that are other than purely scientific’  

The point I believe he is making is that in the new world of Genomics, it is necessary to have a foundation that exceeds just the Laboratory and its tricks. One must understand that, no matter what we think, every time we look at a cell, at an organism, we are looking at a system, at some stochastic dynamical process wherein things move forward, albeit randomly, but in a way controlled by principles. We must look at a world wherein data is used not as an end in itself but as part of an iterative process with our mathematical world view. The tools needed to view this world are extensive yet available. Engineers are trained to use them daily. Perhaps Genomics will grow to appreciate their essential import.
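As one small illustration of “random, but controlled by principles”, here is a minimal Python sketch of a stochastic birth-death process, simulated with Gillespie's algorithm. It is a toy of my own and not a model of any real gene: a single molecular species is produced at a constant rate and degraded at a rate proportional to its copy number. The rate values and the function name are assumptions chosen for illustration.

```python
import random


def simulate_birth_death(k_make=10.0, k_decay=0.5, t_end=2000.0, seed=1):
    """Gillespie simulation of one species; returns the time-averaged copy number."""
    rng = random.Random(seed)
    t = 0.0            # current simulation time
    n = 0              # current copy number
    weighted_sum = 0.0  # integral of n over time
    while True:
        # Total event rate: constant production plus per-molecule decay.
        total_rate = k_make + k_decay * n
        dt = rng.expovariate(total_rate)  # waiting time to the next event
        if t + dt >= t_end:
            weighted_sum += n * (t_end - t)
            break
        weighted_sum += n * dt
        t += dt
        # Pick which event fired, in proportion to its share of the total rate.
        if rng.random() * total_rate < k_make:
            n += 1
        else:
            n -= 1
    return weighted_sum / t_end


if __name__ == "__main__":
    print("time-averaged copy number:", simulate_birth_death())
    print("predicted from the rates:", 10.0 / 0.5)
```

Each individual trajectory is random, yet the long-run average settles near the ratio of the production rate to the decay rate, here 20. That is the sense in which a model makes a prediction that the data can then confirm or refute.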