"BMY has a full time team discrediting anything EXEL does and their strategy is to crush any competitor by any means available to them."
I have no love for BMY. That said, I haven't seen them overtly do anything to discredit any EXEL product. Their full time team actively promotes their pipeline. It's left to the analysts and medical community to decide who has the better story to tell.
"N/Ipi which is wholly owned by BMY unless it's dramatically better due to marketing pressures by BMY who'd love to keep every piece of the profit from any combination it sells."
You're getting it. BMY has blanketed all the low hanging fruit with their CheckMate trials using Nivo monotherapy. Now they are starting to look at various combinations, and you have to know that preferentially they will go through their own pipeline first. That's one of the reasons I don't get overly excited about the Apollo Nivo/Cabo trial. The data from that trial lacks a control for comparison, and even if the data looks interesting, who's going to pick up the torch and spend the $100+ million to do a pivotal trial? Nivo is not the only PD1 or PDL1 inhibitor. A better strategy than trying to arm twist BMY into some kind of partnership would be to get one of the 2nd tier PD1 competitors playing catch up to explore a PD1/Cabo combination. The problem is that you're starting from scratch with all the requisite preclinical work and phase 1 dose finding, but late is better than never.
"...I think they will shoot for the March NCCN meeting."
The quarterly conference call should happen in about a week and they should confirm the venue for the data presentation. If it is a closed venue, then expect a detailed press release to compensate. Maybe we'll also hear more about Celestial enrollment.
"The Meteor subgroup HR of .22 should carry the weight of clinical reasoning. How large is that patient pool, and how many of those patients will not consider any other therapeutic other than Cabo per that .22HR. I suspect most of them...perhaps all of them."
As things stand right now, I do think Cabo occupies the 3rd line spot. Sutent-Nivo-Cabo. I don't think the prior Nivo treated subset from Meteor gets much notice, the sample size is too small (n=32 split between the cabo and ever arms). I keep talking about the increasing influence the reimbursement agencies have on drug choice, and that will be a factor. Probably a positive one for Cabo as I think the weight of evidence from Meteor suggests Cabo superiority to the less expensive alternative TKI's. Unless Eisai finds a way to undercut Cabo pricing with the L/E combination, I see that slipping into 4th line. I think that's where things sit until we see the first PD1 doublets show up or Cabosun shows up with a big surprise to the upside.
Let's say for a second that the HR=.22 PD1 pre-treated subgroup data is not a fluke and holds true for the general case. If that's so then Cabo is already sitting in the sweet spot and PD1 pretreatment could actually result in greater efficacy and longer treatment durations for cabo. Win/Win scenario.
"My fear is that it's only a little OS improvement. Let's say 2 weeks. The implication would be another survival pathway is upregulated giving a poor increase in OS."
They've already promised the improvement is clinically meaningful and anything less than a 3 month separation would represent quite a degradation from what the first interim was indicating. We'll see.
"...is there a number that if and when it is released in the near future would make us all stand up and go 'wow'? Its another way of asking, what number is baked into (expected) the SP and what number breaks that threshold and moves it positively? Anyone?"
First, cross trial comparisons are always difficult. I agree that the Meteor population appears to be more advanced than either the Checkmate or L/E trials, so in this case the cross trial comparison is especially unfair. I also agree that HR is a better indicator of benefit than mOS comparisons. All that said, once the regulatory process is complete the marketing starts, and those comparisons are going to be made because those are the most prominent and easiest to understand statistics available.
Some of the KOL's expressed a preference for Nivo simply because it was initially the only statsig OS result. It showed a lack of statistical savvy, but it was a predictable response. By the time Cabo (and maybe L/E) hit the market, I think the reimbursement agencies will have as much influence over the treatment algorithm as does doctor and patient preference. I think a better reason to show preference to Nivo is because there is a subset of patients who do really well with some very durable responses including a handful of complete responders.
Back to your question. For Cabo to gain a competitive advantage, it would need to show a mOS equal to or greater than Nivo, which had a 25 month mOS. That just is not going to happen. We have the KM curve from the first interim and it would need to drastically reshape for that kind of result. I think a more likely outcome is something on the order of 19 months vs 14.5 months for Ever.
"the results showed a highly statistically significant and clinically meaningful increase in OS for patients randomized to cabozantinib as compared to everolimus."
There is no room for interpretation. The trial achieved its secondary OS endpoint.
So what are the meanings of "statistically significant" and "clinically meaningful"?
After all my explanations this morning, there should be no doubt but that statistically significant means that there was a predefined requisite p value for this analysis and that p value was achieved.
"Clinically meaningful" is a term that is less defined, but frequently used. Recall that earlier I said that achieving a requisite p value is considered proof of superiority, but does not describe the magnitude of the effect. In "power" we learned that two things affect a trial's ability to achieve an endpoint. The volume of data and the magnitude of the effect. So putting it together, a large trial has the ability to prove superiority with a small effect. Avastin was once the worlds best selling oncology drug. Roche would run large clinical trials with 1000-1500 patients with sensitive radiologically assessed endpoints. In some indications they would achieve statistically significant results with the treatment advantage measured in only days and weeks. The achievement of the endpoint proved superiority, but the small magnitude of the effect left question as to whether those resuts were "clinically meaningful" enough to justify approval.
So back to the EXEL press release. Highly significant implies a p value less than or equal to .001 and statistically significant means that it met the requisite p value that defined the endpoint. Clinically meaningful means that the magnitude of the improvement is significant enough as to leave no question as to whether the drug is an improvement over the Ever standard of care.
"Why would you waste time analyzing EXEL for a message board?"
Good question and one I ask myself frequently. Yes, I do have a lot of time on my hands. The kids are grown and my job is not demanding. I am mildly O/C. I do enjoy posting and one positive aspect is that it does lead me down different fact gathering pathways that I would not otherwise pursue.
"Or do you actually think you can influence the share price."
A few pennies here and there and definitely in the thinly traded premarket. My own trading in the pre and aftermarket sessions has set both tops and bottoms. But that's not why I post and I don't make overt recommendations.
"The bar was set so high for the interm to give them more wiggle room at the final analysis."
You get it, but let me go into more detail for the other guy. I mentioned that demonstrating something to a 95% degree of certainty is a standard. The least complicated case would be a trial with a single final analysis. The requisite p value would be p=.05 or less and that would give that 95% degree of certainty. Having multiple looks in the form of interim analyses or multiple approvable endpoints complicates trial design. FDA's mandate is that the overall possibility of a false positive (aka type 1 error) outcome be less than 5%. Having multiple looks and using p=.05 or less for each would violate that premise of keeping the overall possibility of a false positive under 5%. To solve the problem and allow multiple looks while still controlling type 1 error, FDA allows sponsors to allocate the overall p=.05 between the separate analyses. Recall that two things affect p value: the magnitude of the effect and the volume of data. Throughout a trial the effect will largely remain constant, but data accumulation increases with time. The p=.0019 standard required for the interim look was not imposed on EXEL. Before the trial started they allocated the p value between the described analyses and chose .0019 themselves. As commonwealthinc pointed out, that stingy allocation preserved the majority of the remaining p value for subsequent analyses, when greater accumulation of data would improve the probability of success. Taking the interim look cost them a small amount of p value, but gave them tremendous insight into the status of their trial. EXEL did not expect to hit the OS endpoint at the first interim; that they came so close was truly a surprise.
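To make the alpha-allocation point concrete, here is a quick Monte Carlo sketch. Everything in it is illustrative, not Meteor's actual plan: the per-arm sample sizes, the one-sided test, and the .048 final allocation are my own made-up numbers. It shows that naively testing a true-null trial at p&lt;.05 twice inflates the false positive rate well above 5%, while splitting the .05 budget between the looks keeps it controlled.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
n_trials = 20000
looks = (200, 400)  # patients per arm at the interim and final looks (hypothetical)

def one_sided_p(z):
    # one-sided p value from a z statistic (normal approximation)
    return 1 - 0.5 * (1 + erf(z / sqrt(2)))

def trial_hits(a, b, alphas):
    # True if either look crosses its allocated significance level
    for n, alpha in zip(looks, alphas):
        z = (a[:n].mean() - b[:n].mean()) / sqrt(2 / n)
        if one_sided_p(z) < alpha:
            return True
    return False

naive = spent = 0
for _ in range(n_trials):
    # both arms drawn from the same distribution, so the "drug" does nothing
    a = rng.normal(size=looks[-1])
    b = rng.normal(size=looks[-1])
    naive += trial_hits(a, b, (0.05, 0.05))     # p<.05 demanded at every look
    spent += trial_hits(a, b, (0.0019, 0.048))  # the .05 budget split up front

print(f"naive repeated p<.05 looks: {naive / n_trials:.3f} false positive rate")
print(f"allocated .0019 / .048:     {spent / n_trials:.3f} false positive rate")
```

The naive strategy lands around 8% false positives, well over the 5% mandate, while the allocated version stays under it. That is the whole reason the .0019 interim hurdle looks so stingy.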
"But just weeks after suggesting/predicting/pondering a JV by end of 2015 in the November timeframe, you vacated your long position of EXEL. Your posts since have turned decidily negative since."
That's a fair approximation of my mindset. You seem to be asking for more color or detail on my thinking. First, I've followed EXEL since 2007, so I have some history here. I felt very strongly that Meteor would succeed and that was the value driver through July 2015. From July-Dec there was a series of news events which all pretty much turned out positively. I was getting up early every morning and on the positive announcements selling into the premarket spikes and then buying back in later. The stock just seemed to continuously cycle between $5.40 and $6.40 and I made the most of it while I could. I had expected the $7's by December, but it was clear that the market was not reading from the same script.
A series of events started to sour me on the investment. First, the Nivo detailed results came out and the consensus of Nivo before Cabo emerged. The overall market reached historically dangerous multiples. An awareness of unsustainable drug prices is spreading and getting press. Len/Ever got breakthrough designation and EMA accelerated review. The long awaited JV never materialized and the usable cash reserves have crossed below $200 million. Roche priced Cobi lower than I expected. Roche held their investor day and their Cobi revenue projection was only $100 million (I think), in any case much less than expected. The news cycle ended in December, and I felt it was a good time to get out.
Actively participating on a mb like this has a downside. It's almost as if being a member of the club requires you to own the stock and selling is a form of disloyalty. Insulting someone consists of accusing them of being a short in disguise. Is that an attitude an objective investor should have?
"It is the probability that the result observed in a sample correctly reflects the real world condition..."
Rats, I botched it. A p value by definition is the probability that the result observed in a sample does NOT reflect the real world condition. If you subtract the p value from one, that is the probability of a true result. So let's look at p=.005 from the original July press release again. This tells us there is a .5% chance that Cabo is not superior to Ever and conversely a 99.5% probability that it is superior to Ever. The required p value at that analysis was .0019. So we can say that even though Cabo demonstrated a 99.5% probability of being superior, it failed because it needed to show a 99.81% (1.00 - .0019 = .9981) probability.
It is intuitively obvious that there needs to be a way to take the enormous body of information gathered in a clinical trial and determine whether it succeeded or failed. The agreed upon convention is to use a statistic called a p value.
Some things can be directly observed and are repeatable: the speed of light is 186,000 miles per second, or, as Newton hypothesized, F=ma. Other things are not so observable and can only be inferred by statistical sampling: does smoking cigarettes cause cancer, or does drug XYZ work better than placebo?
The common convention has become that if something can be statistically observed through sampling with a 95% or better probability that it applies to the overall population, then it has been shown to be acceptably proven true until shown otherwise. It was this concept that has led to the adoption of our current standards of drug approval.
Recall the scientific method you learned about in school: develop a hypothesis and then design a controlled experiment to test it. So for a clinical trial, the hypothesis is that drug XYZ is superior to a control. Not superior by a specific margin, just superior, and we need to demonstrate with a high degree of probability that the superiority observed in our sample holds true for the population at large. The statistic we use to make that determination is the p value. A p value by definition is a probability. It is the probability that the result observed in a sample correctly reflects the real world condition. So last July the observed p value for the OS analysis was .005. Let's restate that in conventional terminology. Using the sampling results of the Meteor trial as a guide, there is only a .5% probability that Cabo is not superior to Everolimus in the described indication.
So p value is the FDA's way to determine success or failure and the smaller the p value the better. Two things affect p value, the magnitude of the effect and the sample size.
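To illustrate those two drivers with made-up numbers (hypothetical response rates and a simple normal-approximation test, nothing from the actual trials), watch the p value shrink as either the sample size or the effect grows:

```python
from math import erf, sqrt

def two_prop_p(x1, n1, x2, n2):
    """One-sided p value comparing two response rates (normal approximation).
    Illustrative only; real trials use more careful machinery."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return 1 - 0.5 * (1 + erf(z / sqrt(2)))

# same 10-point effect (30% vs 40% response), growing sample size
for n in (50, 200, 800):
    p = two_prop_p(int(0.30 * n), n, int(0.40 * n), n)
    print(f"n={n:4d} per arm, 30% vs 40%: p = {p:.5f}")

# same sample size (200 per arm), growing effect
for r2 in (0.35, 0.45, 0.55):
    p = two_prop_p(60, 200, int(r2 * 200), 200)
    print(f"n= 200 per arm, 30% vs {int(r2 * 100)}%: p = {p:.5f}")
```

The same 10-point advantage that is nowhere near significant at 50 patients per arm becomes overwhelmingly significant at 800, which is exactly how a big trial can prove superiority on a small effect.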
Powering is a statistical tool used to determine how to size a trial and determine the event total that will trigger locking the database and performing the analysis. So let's look at Meteor's powering assumptions and figure out exactly what is being said and not said. Straight from yesterday's PR.
"The secondary endpoint of OS assumed a median of 15 months for the everolimus arm and 20 months for the cabozantinib arm. The study was designed to observe 408 deaths in the entire intent-to-treat population of 650 planned patients, providing 80% power to detect a HR of 0.75."
The assumed OS medians were 20 months and 15 months yielding an assumed HR of .75. A median of 20 months means that half of the patients survived more than 20 months and half less. 15 months divided by 20 is .75. Using medians is not the correct way to compute HR, but it gives a reasonable approximation. The assumed HR of .75 is not the assumed trial result, it is a best guess as to the actual HR for the overall population being measured. A clinical trial just measures a small sample and due to various influences and random chance the statistical result of this sample will not match the real world result that would be observed if every single eligible patient in the world were given the treatment.
So let's look at that powering and restate it in conventional terms. Assuming that the real world HR for Cabo vs Ever is .75, 80 times out of 100 (80%) a trial of n=650, with the analysis triggered by 408 events, will successfully achieve its OS endpoint. The HR of .75 is not a minimum requirement or a goal. 80 out of 100 implies that an end result worse than HR=.75 can still produce a successful outcome. By powering at 80%, EXEL built in a buffer against either an overly optimistic assumed HR or against drawing a poor hand and having a disproportionate number of poor prognosis patients on the treatment arm.
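For anyone who wants to check the arithmetic, the textbook Schoenfeld approximation gets you close to Meteor's 408-event target. This is my back-of-envelope sketch; the real statistical plan also spends alpha at interim looks, which pushes the required event count somewhat higher than the bare formula gives.

```python
from math import log
from statistics import NormalDist

def events_needed(hr, power=0.80, alpha=0.05):
    """Schoenfeld approximation: deaths needed for a 1:1 randomized survival
    trial to detect a given true hazard ratio at a two-sided alpha. Sketch
    only; it ignores interim alpha spending and dropouts."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for two-sided .05
    z_b = NormalDist().inv_cdf(power)          # 0.84 for 80% power
    return 4 * (z_a + z_b) ** 2 / log(hr) ** 2

print(round(events_needed(0.75)))  # → 379, in the neighborhood of Meteor's 408
```

The gap between 379 and 408 is roughly the price of the interim looks and a margin of safety.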
Survival analysis should more properly be called time to event analysis, because survival is not always the event being measured. Progression-free survival and time to progression also use the statistical methods grouped under the heading of survival analysis. For biotech investors, the two analyses we see most often are overall survival (OS) and progression-free survival (PFS).
So, for an OS analysis the "hazard" we are measuring for is death, and for a PFS analysis the hazard is progression, or death if it occurs before progression. When computing a hazard ratio, the statistician breaks the data down into basic time segments, usually a single day, but it can be hours, weeks, months, etc. He looks at the entire body of data and, for each arm of the trial, computes the risk of a hazard occurring in a single time segment, expressed as a probability. Probabilities can be expressed as decimal fractions or percentages. So as an example, during a 2 arm OS trial of drug XYZ vs placebo, it is observed that the risk of dying on any single day on the XYZ arm is 1 in a hundred; expressed as a fraction that is .01, or in percentage terms 1%. On the placebo arm the risk of dying is 2 in a hundred, or .02, or 2%. The ratio of those two probabilities is the hazard ratio. In clinical trial usage it has become standard to divide the treatment arm probability by the control arm probability. Going back to the example, we can see intuitively that you are twice as likely to die on the placebo (control) arm. The math for the hazard ratio is to divide the .01 hazard risk from the XYZ arm by the .02 from the control, and the result is a hazard ratio of .50. Hazard ratios less than 1.0 indicate some level of superiority for the treatment, and the lower the HR, the better.
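That daily-hazard example can be checked with a quick simulation. My sketch only: the patient counts are arbitrary and the constant-hazard assumption is a simplification no real trial satisfies.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000  # patients per arm (hypothetical)

# constant daily death risks from the example: 1% on XYZ, 2% on placebo
days_xyz = rng.geometric(0.01, size=n)      # day of death, XYZ arm
days_placebo = rng.geometric(0.02, size=n)  # day of death, placebo arm

# crude per-day hazard estimate: events divided by patient-days at risk
hazard_xyz = n / days_xyz.sum()
hazard_placebo = n / days_placebo.sum()

hr = hazard_xyz / hazard_placebo  # treatment hazard over control hazard
print(f"estimated HR = {hr:.2f}")
```

With 5000 patients per arm the estimate lands very close to the textbook 0.50; with realistic trial sizes of a few hundred per arm it would wobble noticeably, which is one reason confidence intervals accompany every reported HR.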
"In the trial, a 5-month benefit in OS was required in order to demonstrate statistical significance. This was equivalent to a hazard ratio of 0.75, with a P value of .0019 representing significance."
Dr. Inman botched this badly. If anyone wants an in depth explanation of powering, hazard ratios and p values, I'll try to make it understandable. If yes give me thumbs up.
"To further this point, why would you hold your "disappointing" numbers for a cancer conference where your lackluster OS benefit will be outed in front of all attending?"
Isn't that exactly what they did with the median PFS duration? Go back and read the July press release. They gave the HR's and p values as those were the best stats, but left out the mPFS. If Meteor had finished 6 months sooner, none of this would be an issue, they would have had the space to themselves for a while and would not have had to engage in the press with competing results from other drugs in the same indication.
"ernie, how much of market share will bei left for cabo, if cabo moves in 3rd line behind the anti-pd1 mab? per year: 100M, 200M, 400M?"
That's the big question, isn't it? My guess is not much better than anyone else's. I think somewhere in the 200M-400M range, but again I have no special insight.
"Will physicians prefer the ever/lenva combination over cabo?"
Some might. There are a number of variables. Cabo will definitely get approved with OS on its label. L/E might get approved and its label language is questionable. Eventually the KOL's will write articles. Reimbursement agencies are also publishing their own treatment algorithms and making it difficult to deviate to approved, but more expensive alternatives. That might work in Cabo's favor. The market clearly does not like the uncertainty and is pricing the stock accordingly.
"When first announced, the trial was powered to display PFS. A protocol revision in April 2015 added OS at the top of the list of primary objectives."
I see that, nice catch. This happened a few months before Meteor unblinded. It started with a PFS primary and an OS secondary endpoint; now PFS and OS are co-primary endpoints. Obviously some major changes were made to the statistical analysis plan. If they were able to preserve the full p=.05 allocation for the OS analysis, then this trial could be statsig with HR=.70 and 122 events. It is certainly within the realm of possibility, but it is asking a lot.
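That HR=.70 at 122 events figure checks out arithmetically. Using the standard Schoenfeld approximation (my sketch, assuming the full two-sided p=.05 is available and 1:1 randomization), the observed HR needed to reach significance with a given event count is:

```python
from math import exp, sqrt
from statistics import NormalDist

def hr_needed_for_significance(events, alpha=0.05):
    """Largest observed HR that still reaches two-sided significance with a
    given event count (Schoenfeld approximation, 1:1 randomization)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return exp(-2 * z / sqrt(events))

print(f"{hr_needed_for_significance(122):.2f}")  # → 0.70
```

So with 122 events, any observed HR at or below roughly .70 clears the p=.05 bar, exactly as stated above. Note how few events it takes to reward a strong effect, versus the 408 Meteor needed for a more modest HR.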
1. L/E 22%
2. Opdivo 21.5%
3. Cabo 21%
Median PFS:
1. L/E 14.6 months
2. Cabo 7.5 months
3. Opdivo 4.6 months (non-significant)

Median OS:
1. L/E 25.5 months (non-significant)
2. Opdivo 25 months
3. Cabo ??? but significant

Safety (serious adverse events):
1. Opdivo (about half the Ever SAE rate)
2. Cabo (same as Ever)
3. L/E (40% more SAE's than Ever)
This comes with all the qualifiers about cross trial comparisons and special notice of the limited sample size for the L/E pivotal trial (n=153, 51 per arm).