Daphne Koller's Push to Marry Big Data to Big Pharma

Drug discovery and testing is a complex, fraught process that modern computing methods promise to reinvent — but only with the right data, the right tools, and the right people (and a lot of money). Coursera and Calico veteran Daphne Koller thinks she has all the right ingredients in her new company Insitro.

Video Transcript

DAPHNE KOLLER: Hey, Devin, really glad to be there virtually.

DEVIN COLDEWEY: Yes, virtually, and so much yeah. Can you-- you've been through a variety of companies and institutions, but you've been in this field of computational biology for quite a while now. But when you first started getting into it, did you always picture yourself being at the head of a company like Insitro?

DAPHNE KOLLER: No, I thought I was going to be retiring as an academic and writing a whole bunch of papers and training students and stuff. But I'm having so much fun now. To me that's--

DEVIN COLDEWEY: Well, I'm glad it's fun. Maybe for the benefit of our non-Latin speaking users or viewers, you could tell us a little bit about the name of the company.

DAPHNE KOLLER: So the name of the company is a combination of two phrases that are typically used in biology. One is "in vitro," which means experiments that are done in the lab in a tube, and the other is "in silico," which are experiments that are done in a computer.

And what we're really doing is bringing together these two disparate cultures that really don't often talk to each other and certainly don't come together as an integrated whole. And the whole company vision-- the name, as well as the logo, as well as how we built the company-- is to really try and create a synthesis of these two disciplines.

DEVIN COLDEWEY: And the discipline of computational biology, as I guess I would refer to it, isn't really-- isn't exactly new. I remember I studied it as an undergrad, but it has been reinvented in the last couple of years. How would you say that it went from being something that occurs in a lab to being something that can attract $100 million [INAUDIBLE]?

DAPHNE KOLLER: So first of all, I think there's a number of forces that led to that. One is just the growing need, especially in the areas that I'm in, which is therapeutics design, where the cost to create every new drug is growing exponentially from year to year. So one is just the unmet need.

But I think the other is that there is a confluence of technologies that are actually enabling us now, for the first time, to do things differently. When you and I did computational biology all those years ago, a large data set in biology was a few dozen, maybe a couple hundred, samples and very small bespoke experiments from which you could glean some insight. But it was very hard to really uncover general principles.

And the technologies that have emerged that include high-throughput sequencing and super resolution microscopy and CRISPR and stem cells have really enabled us to create massive amounts of data that is truly relevant to human disease, as well as to other endeavors.

Whereas on the other side, maybe closer to what people typically talk about at TechCrunch, is that machine learning has given us a bunch of tools that enable us to really make sense of data, including in domains where people are just not capable of deriving those insights, because they've never looked at this type of data before. And so that convergence is, I think, what's driving a lot of the progress that we're seeing in this field.

DEVIN COLDEWEY: Yeah, absolutely. And of course, you mentioned CRISPR. Of course, we had Jennifer Doudna on just an hour ago, a very interesting interview. But I think that really speaks to the way that the discipline has spread out in many different directions. You've got, obviously, CRISPR, and you've got computational biology. But you also have that splitting into drug candidate generation and protein folding. Where does Insitro fit into this new and growing ecosystem?

DAPHNE KOLLER: So our focus initially was really on the biology, which is using massive amounts-- creating and using-- because the important part of what we do as a company, the vision is that we actually generate massive amounts of data that are fit for the purpose of feeding machine learning algorithms. We create massive amounts of biological data to derive new biological insights on intervention in which targets are going to lead to a meaningful, clinical outcome for people.

So specifically creating models, in vitro models, that are predictive of human clinical outcomes. And that allows you to identify new patient segments. It allows you to identify which interventions, whether genetic interventions, which could be new targets, or drugs, are actually going to modulate those clinical outcomes in a meaningful way in a human being.

That being said, we view ourselves as being on the first phase of a very long journey, which is to transform every step of the drug discovery and development process using machine learning and large amounts of relevant data. So once you have targets, next step is you want to make drugs. And how do you make drugs differently without five to seven years of painful medicinal chemistry?

And then when we get to the next stage, how do you do, for instance, a clinical trial in a way that the outcomes are not very qualitative, squishy, "How do you feel today?" kinds of tests, but rather things that are much more quantitative and robust and reproducible, so that we have smaller error bars and can see statistical significance more readily?

So there is a lot of steps in this 15-year journey from idea to drug that I think can be transformed and accelerated and also de-risked, increasing the probability of success by the use of new methods.

DEVIN COLDEWEY: Right, and the first thing that you're approaching with that is steatohepatitis with this contract that you're working on with Gilead. Can you give me a beginning, middle, and end of when you're working with a company like that, what do you start with? What do you do with that data? And what is the desired outcome in sort of like-- well, tell me a story here.

DAPHNE KOLLER: Sure. So the partnership with Gilead occurred very early in our days at Insitro. We didn't even have a working lab. But fortunately, the first year of that effort was to bring together both private data from the clinical trials that Gilead had done in this disease, nonalcoholic steatohepatitis, as well as publicly available data from resources like the [? bank ?] and really uncover the biology of the disease that is relatively new, even though it's incredibly widespread-- because of the increase in obesity and type 2 diabetes and such, but is not really well characterized, and really dig into these data to uncover the pathways and drivers of the disease.

And what we were able to do, which is quite striking, is to get a relatively small, but high quality data set from the clinical trials that Gilead had done, where you had two time points in the disease, and be able to uncover novel, genetic drivers that meet the threshold for genome-wide statistical significance that drive not the disease initiation, but rather the progression to fibrosis and ultimately the outcomes that really require a transplant or even lead to liver cancer.

And so that is the starting point for the second year of the effort, which is now to take those targets and the biologies that they revealed, and use those as a starting point for putting those into these cellular systems of disease that we're generating from these induced pluripotent stem cells that capture human biology of patients versus controls, human genetics of patients versus controls, and see whether those targets indeed are associated with what we can measure in a test tube as associated with more rapid progression or less rapid progression.

And with that, the question would then be, "What are the modifiers or drugs that can help slow down or even reverse that progression?" So you can think of it as first understanding the biology. That was year one. Year two was modeling the biology in the dish. And year three is with this in vitro model that is predicting the human clinical outcome, what are interventions that are actually going to make a difference?

DEVIN COLDEWEY: Gotcha, that's very clear. Thank you. So but that seems like it's such a humongous effort. There's so much to build. My question is why pursue this as a startup? Why not go to Gilead? Why not go to Pfizer or whoever and say, "Give me a billion dollars, and I'll build you a world-class, computational biology division." Why take the risk?

DAPHNE KOLLER: So many people have asked that question, and I will be honest. The thought did cross my mind when I started this is, "Why not go and do this at a company that has all these people, resources, money, data?"

And I think that ultimately I came to the conclusion that what we're trying to do is so different and so out of alignment with how most of these companies do their work, that trying to shift the trajectory of these really large ships of 100,000 people and build a culture that is truly a data-driven, machine learning-driven culture is going to be really challenging.

And when you think about the big giants that have emerged in industries that are tech-enabled, they really didn't emerge, in most cases, from the [? incumbents. ?] Netflix didn't emerge from a Hollywood studio or Blockbuster. Google didn't emerge from the Yellow Pages, and Amazon didn't emerge from Walmart. So maybe to really build a truly different type of drug discovery development company, you need to start from a blank slate.

DEVIN COLDEWEY: I guess that answers my next question, which would be why these companies haven't attempted something like this themselves. But it sounds like the culture is not conducive to what you're trying to build. At the risk of asking you to disparage your future clients, what is the different culture that they have?

DAPHNE KOLLER: So I have a lot of respect and admiration for these companies. They are full of-- despite some public perception, they are by and large full of truly motivated people who want to make a difference in the lives of patients. And I have a lot of just respect and admiration for what they're trying to do.

But I think this notion of the innovator's dilemma and coming in with a mindset that says, first of all, we're going to do this in a different way. We're not coming at this from let's first have a bunch of people think about the biology, do a lot of bespoke experiments, draw diagrams as pathways on the board, and then say that's the target that I believe. That's the one that I'm going to go after. That's how it's typically done.

Whereas, we come at it from the perspective, "Let's generate a whole bunch of data." And we don't know what the data is going to tell us, but that's-- but the data will speak for itself. And I think that's really that mindset between hypothesis-driven and data-driven work that is a very hard bridge, I think, really hard paradigm shift.

And I don't think that the hypothesis-driven approaches are useless. They obviously have come up with some great drugs using that approach over the years, and I think others will continue to come. But we're saturating a lot of work in this in the way that I've said. The drug discovery effort is becoming increasingly expensive and increasingly prone to failure. And the question is if we do things differently, will this give us a whole new range of ways to treat patients with unmet need?

DEVIN COLDEWEY: So you mentioned generating a lot of data and using a lot of data that they already have. But we talked before about how they have a data problem. They're sitting on a goldmine, but they don't know it. They can't sort through all that data. Could you characterize their data problem? Because I feel like this is something that affects a lot of legacy industries.

DAPHNE KOLLER: So honestly, I think characterizing it as a gold mine is maybe a little bit of a [? generous ?] statement. I think they've generated a lot of data, and some of it is really good. But a lot of it is actually not as good, because it wasn't generated with the needs of automation and machine-- I mean, machine learning automated inference in mind.

And when you try and take a whole bunch of bespoke experiments-- each of which was done for a different purpose, often with a different experimental approach, with a different assay-- and you put them all together into one big pile, it becomes a very disparate and heterogeneous mess on which, frankly, machine learning algorithms can fail just because they latch on to artifacts in the batches and the specifics of the experiments and are unable often to tease out the underlying patterns.

And so one of the things that I think is a hard sell for a lot of pharma companies is when I go tell them, "Look, this data that you think is a gold mine is probably not nearly as valuable as you think it is. And you might be better served not trying to clean it up and curate it and try and see if there's something there, but rather to generate a whole bunch of it from scratch using modern day technologies."

Now you have to be careful, because human data from human clinical trials, especially the larger ones, those are harder to regenerate. Anything-- any experiment [? using ?] [? a ?] human is precious. But a lot of the in vitro stuff, you're actually just better off doing it from scratch.

DEVIN COLDEWEY: And you're using modern machine learning techniques, among others, to do that. I know some of our more technical viewers will probably be curious what your approach is. Machine learning is such a fast-moving field. Are you on the cutting edge? Are you using tried, but true, techniques? What do you do over there?

DAPHNE KOLLER: I believe that one shouldn't be in the position of you're a hammer looking for a nail, and everything suddenly becomes a nail. We have multiple problems that we're working on, and if we're doing microscopy data, then the right thing to do is to use the latest and greatest convolutional neural network as a way of interpreting images. Of course, we need to adapt it and expand it to deal with microscopy data, which is different from images of cats and dogs. But a lot of the building blocks are actually applicable.

The other-- for other techniques, for other problems, we're using other techniques. So when we interpret human genetics data, we use techniques from statistical genetics, because they are very powerful and very suitable for that task.

DEVIN COLDEWEY: Gotcha. So who do you see as your competition in this-- it's a very strange field to be in. But I can't tell whether you're competing with major pharma companies or whether you're competing with somebody working out of a garage who has the cool new modeling software.

DAPHNE KOLLER: Well, I could say both or neither. I think if we are thinking about the drugs that we make, ultimately, the competition will be if there's another company that has a drug for that same patient population. That will be the actual competition in the sense of this is what matters to our end customers, which are patients.

I think the competition as it relates to other things is not really in market forces. It's more about mindshare. People say, "Oh, how are you different from all the other machine learning-enabled drug discovery companies?" Well, there is a machine learning-enabled company that folds proteins and comes up with new synthetic protein therapeutics. They're not really competing with us in a meaningful sense, except when it comes to someone asking that question and the question of mindshare, if you will, in the eyes of investors, perhaps.

DEVIN COLDEWEY: So the deal you have with Gilead, the contract with Gilead, this is just the start of-- it's just a very specific application of what you hope to be a general-purpose platform for this kind of solution. But how can you-- how do you generalize from something like steatohepatitis to something completely different, like neurology, which I believe you're looking into?

DAPHNE KOLLER: Absolutely, and the beauty of the platform that we're building is that it is, in fact, general purpose. So first of all, we're working with these induced pluripotent stem cells. You can turn them into hepatocytes. You could also turn them into neurons. You can edit them to create genetic variants that are prevalent in NASH. You can edit them differently using the same CRISPR technology or such to create variants that are relevant to whatever neurological disease you're studying.

The platform, which automatically-- and by the way, in a machine learning-enabled way-- cultures and differentiates the cells. You don't need to have a person pipetting for days on end. That also is a completely general-purpose platform. Our ability to [? phenotype ?] cells is also general purpose. So the beauty of it is that the ability to transfer what we've done from one indication to another, even to a totally different therapeutic area, is quite broad.

DEVIN COLDEWEY: Gotcha, and I assume you'll be using Elon Musk's Neuralink for all those purposes.

DAPHNE KOLLER: Of course, tomorrow.

DEVIN COLDEWEY: Yeah, it's available tomorrow. I forgot, yeah. But I think this is-- we're getting towards the end of our time, but I wanted to ask: you tend to find yourself ahead of the curve on a lot of things in education, in bioinformatics, and things like that. What do you think is out there right now that's underappreciated or has a great deal of potential that's untapped?

DAPHNE KOLLER: So I think the whole space that I'm in right now is a tremendous opportunity and not just in the therapeutic design. I think of this confluence that we talked about, where you have the ability to use the tools that cell biologists and bioengineers have developed to measure biology at an unprecedented scale.

The tools of machine learning to interpret the data that we see and then tools like CRISPR, as well as cell engineering and others to create biological systems that do things that they wouldn't normally do has implications, not just in therapeutics. It has implications in the energy field, like biofuels. It has potential in the environment and can re-engineer algae that suck more carbon dioxide out of the air and help with global warming, agriculture.

So I think that whole space which I've come to call "digital biology" is just an incredibly exciting place to be right now.

DEVIN COLDEWEY: Absolutely. And I think we're about out of time. I have one more very quick question, just some practical advice for the many people out there, I'm sure, who are looking to get into this field. What would you tell them if they're looking to be-- to start a startup or become a coder in this space? What would you tell them?

DAPHNE KOLLER: So come into this with a tremendous sense of respect and humility for the other disciplines that you need to have collaborating with you in order to be successful in this space. Just because you're really good as a technologist or whatever, doesn't mean that you can just forget about the biology or the medicine or whatever it is that you're working on. The best work that happens, I think, happens at the intersection of two fields, the boundary between two fields. You have to treat both fields with respect and really try and bring them together as we've talked about.

DEVIN COLDEWEY: Absolutely. Well, hopefully you've done that. Well, best of luck with Insitro. I look forward to hearing more from you guys, and I believe that is all our time. So thank you for joining us very much.

DAPHNE KOLLER: Thank you very much, Devin. It was a pleasure.

DEVIN COLDEWEY: Pleasure here too.