Comments on “Guidance for Industry Drug Development for Amyotrophic Lateral Sclerosis”

For the past 10 months I have served on the Benefit/Risk Working Group and ALS Patient & Caregiver Advisory Committee for the ALS guidance document (http://www.alsa.org/advocacy/fda). The first public draft was released this week and comments are being accepted until May 30 (http://www.alsa.org/advocacy/fda/assets/als-drug-development-guidance-for-public-comment-5-2-16.pdf). I have great respect for the patients and researchers who have participated in crafting this draft. I would never doubt their commitment to this community. I myself have tried to play an active role in this process and fully take my share of the blame for its shortcomings. This draft does a good job discussing the natural history of the disease. It describes many of the complexities of the disease, the current understanding of the disease, and its horrific impact on patients and their families. Each section touches on the urgent need to find new treatments. There are also instances, with regard to evidentiary standards and alternative endpoints in the Benefit/Risk section, where this need is reflected in our guidance.

However, I do not believe the majority of the recommendations that it provides to sponsors are based on or address the specific nature of this disease. Nor do they truly reflect the needed urgency. Moreover, I do not even believe that the guidance reflects the degree of flexibility that the FDA is currently willing to provide given the prognosis that patients face and the current lack of treatment options. We, as a community, have the opportunity to describe how current FDA regulations should be applied given what we as scientists and patients know about ALS. However, instead of trying to be thought leaders on how to conduct clinical trials in a way that directly addresses the nature of the disease and truly aims to speed effective treatments to patients, we are putting forth a document that is even more cautious than the Agency’s current positions.

Somehow other disease spaces like HIV or many different cancers or DMD can utilize the flexibility that the FDA grants in terms of endpoints, expedited approval pathways, or trial design, but we must fall back on using the same techniques that have been discussed in intro biostats classes for generations. I cannot fully express my degree of frustration at the disconnect between the lip service given to the urgent need to get effective treatments to this generation of patients and the recommendations that all tend to fall back on the ultra-cautious status quo. The President and Congress have repeatedly told the FDA that they should be flexible when it comes to serious conditions with unmet needs. The FDA has gone on record stating that they are willing to be flexible in working with sponsors. Yet here we are advocating for essentially nothing new or novel. We are telling sponsors to ignore the FDA’s public stance. Either we don’t believe the FDA will honor their word and we are not willing to hold their feet to the fire, or we are not willing to go out on a limb and advocate for something new, that while not perfect, is clearly better than the status quo.

When I agreed to participate in crafting this document, I believed its purpose was greater than simply spelling out the cruelty of the disease. I believed we were going to discuss the specific nature of ALS and then use that to inform improved ways of conducting trials within the FDA’s current regulations. Somehow, instead we were satisfied to briefly discuss potential new techniques, but then close each section describing their limitations and recommending we stick with what has (not) worked in the past without any discussion of the HUGE limitations of our current methods or how they have contributed to recent failures. Saying that something new does not work perfectly is not justification for sticking with something old that is highly flawed. How can we open the Clinical Trial section talking about how failures are due to heterogeneity and inefficient trial designs, but then insist on methods that in no way address these inefficiencies and are solely based on minimizing bias? How can we talk about how many people will die over the next two years and the willingness of patients to take greater risks but then only give passing mention to Accelerated Approval or any of the FDA’s other expedited pathways? How can we discuss the importance of finding biomarkers and then give sponsors one sentence describing how they can be used as surrogate endpoints? Are we really going to lay out “guidance to sponsors” without discussing any lessons learned from the two biggest (failed) ALS trials of the past 20 years?

I am simply asking that we live up to what we say in the introduction, “This guidance reflects the FDA’s current thinking regarding the weight that should be given to the preferences of ALS patients and caregivers with regard to benefit/risk tradeoffs in light of the severity and rapid progression of the disease coupled with the lack of effective treatments.”

I am not advocating an abandonment of science or throwing out everything we have learned from the past. I am not advocating anything that violates current FDA guidelines. If anything, our document should more closely reflect the ideas in FDASIA 2012 than the FDCA of 1938, or its amendments from 1962. As recently as April 25, 2016 you have the director of the FDA Center for Drug Evaluation and Research (CDER) saying that greater uncertainty is tolerated under Accelerated Approval[1]. The deputy director repeatedly stated that trials using external controls can be considered “well-controlled”[2]. The FDA for decades has maintained that the effectiveness standard is discretionary and that they understand that different diseases have different needs[3]. Yet there is very little in this document that takes advantage of the tools at our disposal that not only fall well within the FDA’s current guidelines and are used to fight other diseases, but that also coincide with the leadership’s public statements. The FDA isn’t the problem here. Our cowardice and lack of creativity are. Or maybe we are just not willing to have the serious discussion required to justify anything new?

Efficiency and Single-arm trials.

The goal of a clinical trial is to figure out whether or not a treatment works. In ALS, this is especially difficult for a number of reasons that are well spelled out in the document. The disease manifests itself differently across patients. Patients progress at different, ever-changing rates. Our current measurement techniques are fairly imprecise. Many times we do not have a good idea of whom a treatment will likely work for. Lastly, a relatively small number of patients have the disease and will be able to survive a long, drawn-out trial. This implies that it is hard to get a signal of the underlying effectiveness of a treatment because of the degree of statistical noise due to all of these factors. Constructing trials that efficiently utilize data to inform us of the likely effect of a treatment is paramount in ALS.

Yet, in the discussion of an adequate control group, there is boilerplate language on the unbiased nature of RCTs, but absolutely nothing addressing whether or not this should be our primary concern given the specific nature of ALS. Even when addressing the rarity of the disease and the need for small clustered trials, the document only goes so far as to discuss when RCTs might not be feasible, but fails to state when they may not be advisable relative to other techniques. Multiple sections contain detailed discussions of the degree of heterogeneity in this disease and our limited ways of measuring progression. However, there is no serious discussion of the importance of efficient trial design. I do not see how we can avoid a real discussion of the trade-offs between bias and efficiency in this setting. We simply ignore everything else that we have stated regarding natural history when it comes to selecting an appropriate trial design. Our recommendations make almost no distinction between 500 patient trials and 20 patient trials[4]. This is insane. We are not doing our job.

One of the most direct and cost-effective ways to increase the explanatory power of a trial is to use external controls. For example, in an early phase, 50-person, double-blinded randomized controlled trial, you provide treatment to 25 patients and then give the 25 others a placebo in order to establish a baseline for what the treated patients would look like on average over time. You use half of your data to reestimate a baseline, ignoring everything we have learned from the past[5]. In a single-arm trial using external controls, you can use thousands of observations from existing datasets to estimate a more precise baseline and then can use all 50 patients to try and get a sense of whether or not the treatment works. Same number of patients. Same timeframe. Four times the explanatory power. If we are serious about trying to increase the signal to noise ratio in our trials so that we can make definitive determinations on whether or not to proceed with potential treatments, we have to be open to utilizing these types of techniques. The FDA has repeatedly stated that external controls can be used to generate an “adequate and well-controlled” trial[6]. Our document currently says no. So while the ALS Association is publicly praising, and generously funding, the work of groups like Origent, we put down in writing that only RCTs can provide “conclusive evidence”.
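The "four times" figure can be checked with a back-of-the-envelope variance calculation. The sketch below is only an illustration under simplifying assumptions: every patient's outcome has the same variance, the external controls are comparable to the trial patients (no bias), and the external dataset holds a hypothetical 5,000 patients. The function names are made up for this example.

```python
def var_randomized(sigma2: float, n_treated: int, n_placebo: int) -> float:
    """Variance of (treated mean - placebo mean) with concurrent controls."""
    return sigma2 / n_treated + sigma2 / n_placebo


def var_external(sigma2: float, n_treated: int, n_external: int) -> float:
    """Variance of (treated mean - external baseline mean)."""
    return sigma2 / n_treated + sigma2 / n_external


sigma2 = 1.0  # per-patient outcome variance; arbitrary, it cancels in the ratio

# 50-person RCT split 25/25 versus 50 treated patients against external controls.
rct = var_randomized(sigma2, n_treated=25, n_placebo=25)
single_arm = var_external(sigma2, n_treated=50, n_external=5000)  # 5,000 is an assumption

# How many times smaller the variance of the estimated effect becomes.
variance_ratio = rct / single_arm

# A 2:1 allocation of the same 50 patients (33 treated, 17 placebo) for comparison.
unbalanced = var_randomized(sigma2, n_treated=33, n_placebo=17)

print(f"variance ratio (RCT / single-arm): {variance_ratio:.2f}")
print(f"2:1 RCT variance relative to 1:1: {unbalanced / rct:.2f}")
```

Under these assumptions the variance of the estimated treatment effect falls by a factor of close to four, which is the same as halving its standard error. Any systematic difference between the external controls and the trial patients would eat into that gain, which is why the choice of controls has to be justified in advance.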

The most positive thing we have to say about this technique is that we mention that it can be used to provide supplemental analysis in the Risk/Benefit section. The Clinical Trials section takes a much more negative stance.

As I read the section on historical controls under Clinical Trials, it struck me as being grossly outdated and based on examples from generations-old textbooks. If you look at the citations in this section (426-429) they are from 1972, 1974, 1972, and 1976. There is anecdotal evidence from 2012 (430) and from a fifty-year overview of multiple myeloma from 2005 that is tangentially related (432), as well as a survey from 1983 (431).

Yet curiously there is no mention of Monzon et al. 2015 (http://www.sciencedirect.com/science/article/pii/S0959804915007790), which compared 270 single-arm and 66 randomized controlled phase 2 cancer trials conducted between 1998 and 2012. The authors found that subsequent phase 3 RCTs were no more likely to be successful when resulting from a positive RCT than from a positive single-arm trial. If it were the case that single-arm trials were prone to excessive bias and were unreliable (as was found in the trials from the 1960s and 1970s in the Sacks 1983 paper), then you would expect that subsequent phase 3 trials based on their results would likely fail. However, receiving a positive signal from a single-arm phase 2 trial was just as informative as receiving a positive signal from a phase 2 double-blinded randomized controlled trial.

Again, I am dumbfounded that the ALS community seems willing to rest on what was commonly accepted in textbooks in the 1970s and 1980s, while other areas, like oncology, have moved forward and currently make great use of externally controlled trials. The 1983 survey, used to argue against the use of historical controls, also pointed out that RCTs are problematic as well. We completely ignore that best practices for using historical controls obviously have changed over the past 30 or 40 years, yet we are quick to dismiss the concerns over RCTs by saying that we should just relax our evidentiary requirements.

If trials are constructed carefully and you make sponsors ex ante identify and justify their choice of controls, external controls would be one of the most direct and effective ways of trying to increase the efficiency of our trials. If we are worried about missing data, isn’t it at least worth mentioning that single-arm trials may improve retention? Are we going to disregard an approach, supported by the FDA and frequently used in other disease spaces, that would increase the power of our data by 4x because we do not want to move beyond what was commonly accepted 40 years ago or have to think seriously about establishing best practices? Are we going to ignore the fact that the most consequential failed trials in this space have followed double-blinded phase 2 trials?

The final line of this section stating that only RCTs can provide “conclusive evidence” is patently false and I believe is a huge red flag indicating that we do not want to think seriously about improving on the status quo. It is complete fiction. Has oncology been built on pseudoscience? Has the FDA been permitting a sham method since they established the “adequate and well-controlled” requirement in 1970? For all the focus on making this a scientifically justified document, there is no way we should be inserting this myth into our guidance[7].

Lastly, I will not lower myself to seriously consider the idea that single-arm trials are inherently unethical (p. 72). This is being marketed as a “Patient-focused Guidance” and this is the single instance in the document where we are going to choose to discuss ethics? I speak for one patient who would never even hint at this kind of absurdity. I don’t care what esoteric references are used to justify it.

 

Dexpramipexole and Ceftriaxone.

Efficiency is not some abstruse statistical concept irrelevant to ALS studies. It played an enormous role in the Dexpramipexole trial. In the phase 2 study, whose results led to a failed, $50M+, 1,000 patient phase 3 study, it was crucial. That study compared patients on three different doses to patients in a small placebo arm. They found that fewer patients receiving high doses of Dexpramipexole had a six-point decline in ALSFRS-R than in the placebo group over three months. 33% of patients on placebo declined by six points or more. Patients on higher doses of Dexpramipexole incurred six-point declines 15% and 8% of the time. However, if you look at historical data from placebo patients in the PRO-ACT database using similar exclusion conditions, roughly 14% of them decline by six points or more over three months. So it was not that the dex patients were progressing slower than expected, it was that placebo patients in the trial happened to progress faster than normal. The dose-dependent response is much less convincing when you add in historical placebo data that mirrors the high dose patients. Perhaps if the treated patients had simply been compared to representative external controls instead of to a small number of concurrent patients on placebo, the money and time spent on a 1,000 patient trial could have been deployed elsewhere. Even though this was perhaps the most important trial in the history of ALS, there is no discussion in this draft on what we should learn from its failure.
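To see how easily a small placebo arm can look unusually bad, the sketch below computes the chance that at least 33% of a placebo arm declines by six or more points when the true three-month rate is the roughly 14% seen in the historical data. The arm sizes are hypothetical (the size of the phase 2 placebo arm is not stated here), patients are assumed to be independent, and `prob_at_least` is a helper written for this example.

```python
from math import ceil, comb


def prob_at_least(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


true_rate = 0.14      # historical three-month rate quoted above
observed_rate = 0.33  # placebo rate reported in the phase 2 study

for n in (15, 25, 50, 250):  # hypothetical placebo arm sizes
    k = ceil(observed_rate * n)  # smallest count that gives at least 33%
    chance = prob_at_least(n, true_rate, k)
    print(f"n={n:>3}: P(at least {k} of {n} decline) = {chance:.6f}")
```

The probability shrinks quickly as the arm grows: a result that is merely unlucky in a placebo arm of a couple dozen patients would be essentially impossible against a baseline built from hundreds or thousands of historical patients.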

Along the same lines, there is no mention of the failed ceftriaxone trial. There were a number of unique motivations that contributed to the stage 1-3 adaptive design of this trial. However, lessons should be learned about how and why it progressed to a 448 patient stage 3 based on results from 45 treated patients and 21 patients on placebo after a 20 week stage 1-2. The published articles following this trial only included the difference in the decline of ALSFRS-R between treated and placebo patients, so it is unclear whether this positive signal was due to unexpectedly slow progressing treated patients or unexpectedly fast progressing placebo patients. Either way, there is no discussion of these results relative to data from other sources. Therefore, the decision to add 448 patients and extend the trial was dependent on a baseline generated from only 21 patients.

The use of external controls may or may not have impacted the preliminary results of these studies that led to the two largest and most expensive (and failed) ALS trials in recent history. However, the fact that they were based on results from “gold standard” double-blinded randomized trials must at least be discussed. We are content to cherry-pick misleading anecdotal results from trials in other disease spaces to rationalize not using external controls, but we ignore misleading results from “gold standard” trials in ALS which led to incredibly expensive dead ends. At a bare minimum, we must acknowledge that according to the recommendations that we are currently making, decisions to expend an enormous amount of scarce resources on phase 3 trials will continue to hinge upon the very noisy progressions of a handful of patients who randomly end up in the placebo arm of a preliminary trial.

 

Biomarkers, Surrogate Endpoints, Accelerated Approval.

The biomarkers section describes a number of promising areas and techniques. While it stresses the need to find usable biomarkers, I believe it could go further in providing guidance to sponsors on how to get there. In the discussion of identifying and using biomarkers, the FDA’s concept of “surrogate endpoints” must be addressed. There is currently one single sentence on this issue on the last page of the section. If we truly want to provide sponsors with a way to get effective treatments to patients in a manner that reflects the urgency needed, we must provide them with true guidance on the regulatory vehicles at their disposal.

According to the FDA, “A surrogate endpoint is a marker, such as a laboratory measurement, radiographic image, physical sign or other measure that is thought to predict clinical benefit, but is not itself a measure of clinical benefit.”

Part of the permitted “uncertainty” with regard to Accelerated Approval concerns the connection between a surrogate endpoint and the desired clinical benefit. For decades cancer used objective response rate (ORR) as a surrogate for survival before it was ever empirically validated to predict clinical benefit. It was only recently that researchers began to do this and there have been conflicting results across different forms of cancer (Chen et al. 2000 vs. Buyse et al. 2000).

It will take decades, at best, to definitively validate any surrogate endpoint. To truly do it, we would need multiple effective treatments that could manipulate the surrogate and then we would need to track patients for years to verify that changes in the surrogate were correlated with clinical benefit. This is exactly why the FDA uses ambiguous language about surrogate endpoints needing to be “reasonably likely to predict” clinical effect. Then, given this added level of uncertainty, the FDA requires post-approval trials to verify anticipated clinical benefits.

Again, why are we not playing a role as thought leaders on this subject? Refer to documents like this (http://www.hpm.com/pdf/blog/Subpart%20H%20Analysis%20-%20FDA-2013-D-0575.pdf) to help sponsors understand how the FDA makes determinations on the validity of surrogates.  Discussing the severity of the disease in the natural history section and in the introduction of other sections is not sufficient. Our “guidance” must be tailored utilizing the FDA’s protocols that were specifically established for diseases like ALS. The Public Policy section contains no description of Accelerated Approval and only discusses preapproval expanded access programs.

Given the statistical difficulties in generating conclusive results in ALS, the severity of the disease, and the number of patients who will die or lose significant physical abilities while large and lengthy trials are conducted, there is an urgent need to get effective treatments to patients as quickly as possible. Sponsors must be encouraged to utilize the FDA’s Accelerated Approval program and we must be providing them guidance as to what this program entails and its evidentiary standards. We need to do a better job breaking out Accelerated Approval in a distinct fashion and stress to sponsors that we as a community and the FDA believe this is an important and appropriate regulatory route to take in finding an effective treatment for ALS.

 

Expedited pathways.

In addition to Accelerated Approval, the FDA has three additional programs intended to expedite development and review for new drugs that address unmet needs for serious or life-threatening conditions. Of the 21 new drugs approved by the FDA for rare diseases in 2015, 18 utilized one or more of the expedited pathways. These programs obviously were put in place specifically for diseases like ALS, yet we devote one sentence to their existence on the second to last page of the document. If our goal is to help sponsors get drugs through the FDA as efficiently as possible, we must inform them on how to navigate these different programs. Shouldn’t this be central to “Guidance for Industry Drug Development for Amyotrophic Lateral Sclerosis”?

 

Survival as an Outcome Measure.

The discussion of the use of Survival as an outcome measure must address the fact that even if you have the resources to conduct a trial of sufficient duration, by definition you would need to wait for a significant number of patients in the placebo arm to die before you could make a determination of efficacy. (This point was discussed at length on the only substantive call for the PCAC, yet no changes have been made.)

So just to clarify, the document currently states that it is unethical to give a treatment to every patient in a trial. However, there are no ethical issues with a design that by construction requires standing by while a certain number of patients in the placebo arm die before you can statistically see an effect. If the treatment is effective, you only sacrifice the placebo patients. If it doesn’t work, you sit and watch an equal number from each arm die. Where is the patient voice here?

Even if we suspend disbelief regarding the point above, it speaks to the broader issue that using inefficient designs means we need bigger, longer trials. That means patients in the placebo arm have to wait longer until they get access to the treatment and everyone who was excluded from the trial must wait longer as well. For a frame of reference, a six-month delay will lead to over 2,000 unnecessary deaths in the US alone. Before disregarding anything new, we must be honest about what the status quo implies.

 

Linearity of ALSFRS-R.

Where is the citation justifying that ALSFRS-R declines linearly over the course of a year? I believe it is inaccurate and counterproductive to suggest to sponsors that ALSFRS-R follows a linear trend in any way. Origent recently changed their methodology to avoid assuming a constant slope for this exact reason. Sure, you can draw a line through any set of points, but that does not mean that the relationship is truly linear or that assuming linearity will improve your trial. Even if we believe progression follows a somewhat constant pattern, ALSFRS-R does not. Yet, you still see trials set up measuring “deviation from trend” or “change in slope”. We think this makes intuitive sense because we know patients progress at different rates, and so tracking patients for a couple of months gives you different individual baselines. The problem is that individual progressions simply do not follow linear patterns and analyzing “deviation from trend in ALSFRS-R” can introduce more bias due to unbalanced arms than simply assuming that everyone will progress at the same rate.

One of the most striking things that I have found looking at the PRO-ACT data is that of the 20% of observations that had the smallest average monthly decline in ALSFRS-R over a three-month period, 100% of them experienced larger average monthly declines over the following nine months and only 47.1% continued to decline at a slower rate than the overall population average. The same holds at the other tail and when examining FVC. Crudely using a three-month lead-in does not identify “slow progressors”; it identifies who progressed slowly over the first three months. In no way does the data support assuming linearity over 12 months in a trial. Suggesting ALSFRS-R follows a linear trend is not only incorrect, but applying that incorrect assumption makes our trials worse.
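A toy simulation can show why a short lead-in is a poor classifier of progression. This is not PRO-ACT data and it will not reproduce the figures above: every parameter below (the spread of true decline rates, the size of the measurement noise, the number of patients) is an assumption chosen only for illustration. It deliberately gives each simulated patient a constant true rate, so any reversal it shows comes purely from selecting on a noisy three-month window.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

N = 10_000  # hypothetical number of simulated patients
patients = []
for _ in range(N):
    true_rate = max(0.0, random.gauss(1.0, 0.4))     # assumed true points lost per month
    lead_in = true_rate + random.gauss(0.0, 0.6)     # noisy three-month average decline
    follow_up = true_rate + random.gauss(0.0, 0.3)   # nine-month average, less noisy
    patients.append((lead_in, follow_up))

# The 20% with the smallest observed decline during the lead-in.
patients.sort(key=lambda pair: pair[0])
slowest = patients[: N // 5]

population_mean = sum(follow for _, follow in patients) / N

# Share of "slow progressors" who decline faster afterwards than during the lead-in.
faster_later = sum(follow > lead for lead, follow in slowest) / len(slowest)

# Share who still decline more slowly than the population average afterwards.
still_below_avg = sum(follow < population_mean for _, follow in slowest) / len(slowest)

print(f"declined faster after the lead-in: {faster_later:.1%}")
print(f"still slower than population average: {still_below_avg:.1%}")
```

Even in this idealized setting, most patients flagged as slow during the lead-in decline faster afterwards, and a sizeable share are not slower than average at all. Real ALSFRS-R trajectories that change rate over time would only add to that misclassification.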

 

Prevalence.

On page 13, in reference to the prevalence figure from the 2014 Registry report, the word “definite” should be in quotations. This usage comes from a specific definition of “definite ALS” related to the methods in that paper to passively identify ALS cases from government administrative databases based on items like prescription records. It has no connection to how the term is typically used in the ALS community related to diagnostic designations within the El Escorial Criteria that are discussed at length in the Natural History section.

Also, the 12,000 figure is the number of patients who self-enrolled in the registry along with selected cases from government databases. A large number of patients are not captured through either source. At the Collaboration For A Cure meeting last year, the consensus was that this number needed much more work before it should be published. “After a robust discussion among members of the group, it was determined that the data is not sufficient to support currently reported numbers of people living or diagnosed with ALS.” http://www.alsmndalliance.org/collaboration-for-a-cure/  Without controlling for enrollment decisions, this number should not be used as an estimate of prevalence and does not merit inclusion in this document.

 

Conclusion.

My main concern is that in too many sections of the paper we are not seriously tailoring our recommendations to the specific nature of ALS and the urgent need to identify and provide patients access to effective treatments. We are not taking into consideration the limitations and drawbacks of current methods and techniques when looking at potential improvements. Instead we are relying on decades-old norms, which other disease spaces have left in the past, to justify the status quo. Whether it is in the discussion of biomarkers, new measurements, exclusion criteria, single-arm trials, or Accelerated Approval, simply noting why a new method might not work perfectly does not mean that it is not still better than the status quo. It is unacceptable to me that we are writing a document that takes a much more cautious tone than the FDA’s recent public statements and current policies. If we are truly writing “guidance” to sponsors then I do not see how we can avoid seriously examining whether imperfect new strategies may be superior to imperfect old strategies. We must base our guidance on the disease and the needs of patients. We must do better.

[1] “The accelerated approval program includes a requirement for confirmatory studies for efficacy. So, as you heard from the sponsor, they have to do further studies to explore and confirm effectiveness. An inherent presumption in this program of accelerated approval, which is written in the preamble of our regulation about it, is that more uncertainty is going to be tolerated initially and that in fact sometimes we will collectively get it wrong. Otherwise accelerated approval will have no different standards than regular approval.” – Dr. Janet Woodcock, Director of CDER, 4/25/2016 Sarepta AdComm.

[2] “Our regulations since 1970 have said that a historical controlled trial can be adequately and well controlled study. The question here goes: Under the circumstances do you think it was? Do you think the way they selected patients, the way they analyzed them, was good enough to make it an adequate and well-controlled study? That’s the question. Historically, historical-controlled trials have been the basis for approval — sometimes in sort of obvious cases and sometimes in cases that quite aren’t so obvious.” – Dr. Bob Temple, Deputy Director of CDER, 4/25/2016 Sarepta AdComm.

[3]  “The standard is adequate and well controlled trials, OK, that’s what is in this statute, but we are instructed to have flexibility on how we interpret that based on medical need.” – Dr. Woodcock, 4/25/2016 Sarepta AdComm.

[4] We are also ignoring efficiency when we recommend that sponsors explore the dose range even in early trials. Typically this involves further segmenting the small number of observations into more than two arms. Therefore, efficacy is judged based on a comparison of the treated patients to an ever-smaller sample of patients in the placebo arm.

[5] Using unbalanced allocation ratios actually further reduces efficiency. For instance, if you used a 2:1 randomization, or a 1:1:1 with two different doses and a placebo arm, your baseline would then be determined by the 33% of patients who randomly ended up in the placebo arm.

[6]  In the Sarepta case, the issues were not the use of historical controls, they were with how the trial was designed and conducted. “The ways the controls were selected and analyzed didn’t meet the threshold that I would consider to be adequate and well controlled,” said Caleb Alexander, an associate professor of epidemiology and medicine at the Johns Hopkins Bloomberg School of Public Health and chair of the advisory committee.

[7] If you hold some concept of “conclusive evidence” so closely to your heart that you do not believe a single-arm trial could ever meet your standards, then if you are truly intellectually honest, you have to concede that no RCT (especially in a population as small and heterogeneous as ALS patients) would ever be able to provide “conclusive evidence”. This is a false construct made by someone who fundamentally does not understand science and the uncertainty that goes along with anything that is empirically based.
