"Endpoints" are actually just the key data items a trial is focused on. Or to put it another way, they are the data necessary to prove or disprove the trial's central hypothesis. They can be events in a person's life that serve as meaningful milestones for deciding if a treatment is effective. Some common endpoints are viral load, CD4 count, toxicity, quality of life, opportunistic infections (OIs), and death. If, at the end of a randomized trial, there is a difference between the tallies of endpoints in a treated group compared to an untreated group, it is attributed to the effect of the treatment.
In the early days of AIDS research, there were vehement and vocal protests against clinical trials that used death as a key data item. Loud sloganeering protested "killing for science," but in fact, while these trials may or may not have shortened or lengthened life, the outcomes were due to the disease and the drugs being tested -- not the choice of the endpoint. Using the "death endpoint" simply meant that dates of deaths were recorded, and that ample numbers of people were enrolled so that if, in fact, there were a difference in death rates, the trial would be able to measure that. Since deaths occur less frequently than, say, opportunistic infections (OIs), it takes more people in the trial to measure a meaningful difference.
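To get a feel for why a rarer endpoint demands a bigger trial, here is a rough sketch using the familiar normal-approximation formula for comparing two proportions. The event rates are invented purely for illustration and do not come from any actual trial, and `participants_per_arm` is a hypothetical helper name, not part of any standard package.

```python
from math import ceil
from statistics import NormalDist


def participants_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate number of people needed in each arm to detect a
    difference between two event rates (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    n = (z_alpha + z_power) ** 2 * variance / (p_control - p_treated) ** 2
    return ceil(n)


# Hypothetical rates: the treatment halves the risk in both cases.
n_oi = participants_per_arm(0.20, 0.10)     # OIs: relatively common events
n_death = participants_per_arm(0.04, 0.02)  # deaths: rarer events
print(n_oi, n_death)
```

Under these made-up rates, the same halving of risk takes several times as many participants to detect when the event is death rather than an OI.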
In many ways an impact on mortality was, and remains, the most critical attribute of an anti-HIV medication. Sure, I personally would want to take a drug that reduced my viral load more. But this is only because earlier research has established that viral load is a good predictor of one's risk of death (it's also a partial surrogate marker, which is not exactly the same thing). The numbers of my viral load test have meaning only because they predict advancing disease and death. Otherwise, I really couldn't care less.
We in the US learned that lesson the hard way -- and some of us are still learning it. Two early ACTG trials, 019 and 016, investigated AZT in persons who were asymptomatic or had early symptoms. These trials were halted too soon to give a clear insight into the limitations of AZT monotherapy -- tragically, that had to be learned mainly by patient experience. There were very small reductions in opportunistic infections in the AZT groups of those trials, but this benefit was more than offset by a greater number of adverse events (life-threatening toxicities). A more important lesson, though, was that those trials were neither large enough nor long enough to determine AZT's effect on survival except among a small number of very rapid progressors.
Concorde, the European trial of AZT in asymptomatic patients, by contrast, was much larger, and followed patients for three or more years, as opposed to the year and a half in the ACTG trials. Concorde revealed that there was indeed a "blip" of transient benefit at time points similar to the ACTG trials, but that by three years, long before most asymptomatic participants would have progressed to overt disease, the blip disappeared, and any benefit from AZT monotherapy was lost. The only remaining characteristic of those on the AZT arm was a much higher rate of toxicity. Of course, now we know that AZT resistance sets in fairly rapidly when it is used as monotherapy, but these trials took place long prior to resistance testing.
Critics of death as an endpoint often argue that for trials with early asymptomatic patients, death is likelier to occur due to common causes like automobile accidents, ODs, and homicide. This criticism was fed by the US Veterans Administration (VA) study number 298: a study of AZT in asymptomatic people, conducted after the earlier AZT trials but before Concorde reported its results. The VA trial too showed little or no benefit from AZT monotherapy, but the drug's advocates worked hard to find a methodological flaw to blame this on, since Concorde had not yet spoiled the party. In particular, they pointed to the fact that a majority of the deaths were not "HIV associated." This criticism reflects a fundamental lack of understanding about the role of randomization. While randomization does not absolutely guarantee there will be exactly the same number of auto accidents in the AZT arm and the no-AZT arm, it does mean that the differences will be very small (if, in fact, AZT has no effect on risk of auto accident). But more importantly, random allocation of people between the treatment arms provides a sound basis for clarifying whether the difference in auto accidents was a matter of chance, or likely due to the drug. Skeptics used to point to suicide (which tragically has been all too common in AIDS trials) to criticize and ridicule the idea of using death as an endpoint. Now, with the greater appreciation of psychiatric side effects that has accompanied the growing use of Sustiva/Stocrin, differences in suicide rates related to treatment assignment are not as humorous or as irrelevant as once thought.
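The point about chance can be made concrete with a small sketch. Assuming two equal-sized arms and a drug with no effect on accidents, random assignment alone determines how the accidental deaths split between the arms, so we can compute exactly how often a split at least as lopsided as the one observed would turn up. The counts below are hypothetical, not figures from the VA study, and the function name is made up for this illustration.

```python
from math import comb


def prob_split_at_least_as_uneven(n_per_arm, total_events, events_in_arm_a):
    """Chance, under random assignment and no drug effect, that events split
    between two equal-sized arms at least as unevenly as was observed."""
    total_people = 2 * n_per_arm
    denominator = comb(total_people, n_per_arm)
    observed_gap = abs(2 * events_in_arm_a - total_events)
    probability = 0.0
    # k = number of the events that land in arm A
    for k in range(min(total_events, n_per_arm) + 1):
        if abs(2 * k - total_events) >= observed_gap:
            ways = comb(total_events, k) * comb(total_people - total_events, n_per_arm - k)
            probability += ways / denominator
    return probability


# Hypothetical trial: 500 people per arm, 10 accidental deaths in all.
p_mild = prob_split_at_least_as_uneven(500, 10, 6)      # a 6 vs. 4 split
p_extreme = prob_split_at_least_as_uneven(500, 10, 10)  # a 10 vs. 0 split
print(round(p_mild, 3), round(p_extreme, 4))
```

A 6 vs. 4 split is the sort of thing chance produces all the time, while a 10 vs. 0 split would be rare enough to make one wonder about the drug. This is the logic of an exact test; a real analysis would follow the trial's prespecified methods.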
Imagine two treatments that perform virtually identically in their viral load effect at 48 weeks, yet have very different clinical performances over longer periods of time. A relatively tolerable regimen might allow years of good health, and a more toxic one could prevent HIV-related conditions, yet eventually poison you. The initial approval of the first protease inhibitors (PIs) offers some interesting insights. Despite those who claim that the age of clinical endpoint trials is over, the initial ritonavir study, conducted by William Cameron, and ACTG 320, which investigated indinavir, each showed dramatic reductions in both infections and deaths among patients taking PIs. Of course, now that there are several HAART regimens to choose from, and with OI rates having plunged even further from the dual-nucleoside days, similar trials would need to be far larger and longer if they were to be conducted today. An exception to this might be among salvage patients, who unfortunately are at immediate or near term risk of OI, which allows the clinical benefits of treatments to be more rapidly assessed.
There's another instructive lesson to be learned from the initial PI trials. Not one of the trials that led to FDA approval of the current PIs showed evidence of lipodystrophy or insulin resistance during the study period. These syndromes -- both with life-threatening potential -- only began to appear after a year or more on treatment. So it's not enough to know that a treatment reduces your viral load. You also need to have some sense of whether the treatment itself is likely to kill you. In the first days of protease euphoria, Keith Henry of Regions Hospital in Saint Paul wrote an article describing several previously healthy patients in their twenties who suffered alarming rises in lipids (cholesterol and triglycerides), and then suffered heart attacks after initiating protease therapy. These particular patients were relatively immunologically healthy. It would have been highly unusual if they had suffered an OI during that brief period. For them, at least, the treatment may have been worse than the disease. In the absence of clinical endpoint studies, we may never know how common an occurrence this is.
The issue is further muddied by the rapid approval of HIV drugs based on viral load changes over 48 weeks, as specified by the initial accelerated approval regulations. The situation is even worse today when drugs are approved based on 24-week data -- less than half a year. The possibility of detecting slowly developing toxicities in such a short time is almost nil. After all, even 48 weeks was too short to observe lipodystrophy or insulin resistance in the initial PI trials. Twenty-four-week data also short-changes us on resistance information: it is quite possible for PI resistance not to develop fully within the first six months.
Obviously everyone wants new treatments to become available as soon as possible, but there's a crucial component of accelerated approval that has been neglected by the drug companies. Rapid approval based on virologic data was supposed to be only the first step to gaining full approval. Longer-term clinical follow-up was supposed to continue after rapid approval. Drug company commitments to do post-marketing studies have for the most part been abandoned or carried out in a lackluster fashion. There is very little information published in the scientific literature that derives from longer-term post-market trials. Although post-market study is a formal obligation, written into the accelerated approval regulations, most companies simply ignore it, and the FDA unfortunately cannot, or will not, force the issue.
Unfortunately, the public antipathy to including death as an endpoint contributed to a case of science being clouded by human desperation as well as business interests. People with AIDS needed new treatments as soon as possible, but if the FDA only agreed to review trials with mortality endpoints, then new drug approvals would be delayed. Neither the community nor the drug companies wanted that. Yet as we've seen, by forgoing death as an endpoint and by not following up long-term, there's an increased risk that a drug might slip through that actually shortens life.
Nowadays, of course, viral load is the most common endpoint in an antiretroviral drug trial -- although it can be used in several different ways. One way is to find the percentage of participants whose viral load is undetectable, or below the level of quantification (BLQ). This is a nice simple case of what's called a binary endpoint: something that can have only one of two possible values. The viral load is either detectable or not and that's all we know. Death is also a binary endpoint. We could still have the same censoring problems with viral load endpoints as with OI endpoints, but happily, people on treatment no longer die the way they did in the eighties. Nonetheless, censoring by death can be a significant source of bias.
One of the most common ways of dealing with censoring bias is to employ a "non-completer equals failure (NCF)" analysis. This is used when people in a trial are not available to have their viral load drawn, whether due to death, dropout or any other reason. The missing value is recorded as a "failure" -- in this case, detectable virus. Clearly, this also has its problems. An NCF analysis can possibly cause the drug's potency to appear understated. But that is preferable to believing a drug is effective when it is actually ineffective, or even killing people despite their having no detectable virus to measure. It also, some believe, adds a "reality factor." If many people stop taking one drug but not the other in a trial, or people on that drug just drop off, this suggests an underlying reason. The drug may be very toxic, difficult to take, or otherwise unattractive. So while NCF analysis may give a poorer estimate of the biological activity of a drug, many feel it can provide a preview of its effectiveness in the real world.
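A small sketch may help show what an NCF analysis does to the numbers. The two trial arms below are entirely hypothetical, and `percent_undetectable` is a name invented for this illustration. Each person's result is `True` (undetectable), `False` (detectable), or `None` (no viral load drawn, for whatever reason).

```python
def percent_undetectable(results, missing_equals_failure):
    """Percentage of an arm counted as undetectable.

    With missing_equals_failure=True, every missing result is scored as a
    failure (NCF). Otherwise only people with a result are counted."""
    if missing_equals_failure:
        counted = [r is True for r in results]            # None scores as failure
    else:
        counted = [r for r in results if r is not None]   # completers only
    return 100 * sum(counted) / len(counted)


# Hypothetical arms of 100 people each.
arm_a = [True] * 60 + [False] * 15 + [None] * 25   # many dropouts
arm_b = [True] * 60 + [False] * 35 + [None] * 5    # few dropouts

for name, arm in (("A", arm_a), ("B", arm_b)):
    print(name,
          round(percent_undetectable(arm, missing_equals_failure=False), 1),
          round(percent_undetectable(arm, missing_equals_failure=True), 1))
```

Counting only those who stayed, arm A looks far better than arm B (80% vs. about 63%). Under NCF the two arms come out identical at 60%, because arm A's apparent edge came entirely from the people who disappeared.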
There are other ways to use viral load as an endpoint for a clinical trial. Using an average of viral load values for everyone in a trial can actually provide much more information than a simple binary (detectable/undetectable) result. But now there is a real problem about what to do with people who are in fact undetectable. If we decide to average the viral loads of all participants, what value do we set for those who are undetectable? There are a number of ways of dealing with this problem, such as predicting what an undetectable value should be from the slope of past values, or assuming that the undetectable value sits at some fixed point, such as the middle, of the undetectable range.
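How much the choice matters can be seen in a quick sketch. The viral loads are hypothetical, the detection limit of 500 copies is assumed for illustration, and averaging on the log10 scale is my assumption about how such an average would be taken. The helper name is made up as well.

```python
from math import log10
from statistics import mean

DETECTION_LIMIT = 500  # assumed assay limit, for illustration only


def mean_log10_viral_load(values, substitute):
    """Average log10 viral load, plugging `substitute` in for every
    undetectable result (recorded here as None)."""
    filled = [substitute if v is None else v for v in values]
    return mean(log10(v) for v in filled)


# Hypothetical group: half undetectable, half with measurable virus.
viral_loads = [None, None, None, 2_000, 15_000, 80_000]

at_limit = mean_log10_viral_load(viral_loads, DETECTION_LIMIT)
at_midpoint = mean_log10_viral_load(viral_loads, DETECTION_LIMIT / 2)
at_one = mean_log10_viral_load(viral_loads, 1)
print(round(at_limit, 2), round(at_midpoint, 2), round(at_one, 2))
```

Nothing about the patients changes between the three calculations. Only the stand-in value changes, and the group average moves by more than a full log between the most and least generous choice.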
One drug company had its hide nailed to the wall by some sharp-eyed activists a few years ago. For the purposes of their trial, the company assumed that everyone who was undetectable was at the bottom of the range; they were assigned a viral load of 1. Viral load tests in those days were only able to detect virus levels above 500, so in this trial, the minute someone went undetectable, their statistical shenanigans made it seem like there had been a huge 2.7 log drop! "What a great drug!" crowed the company. "Not so fast!" said the activists. Despite this not-so-sly effort to cook the books, it's a real challenge figuring out how to deal with data and still come up with useful, scientifically valid answers.
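The arithmetic behind that 2.7 log figure is easy to check. The sketch below takes someone whose last measurable viral load sat right at the assay limit of 500, then compares the "drop" you get from different stand-in values for undetectable. The midpoint alternative is an illustrative choice of mine, not something the company or the activists are reported to have used.

```python
from math import log10

ASSAY_LIMIT = 500  # lowest viral load the tests of that time could measure


def log_drop(before, after):
    """Change in viral load expressed in log10 units."""
    return log10(before) - log10(after)


baseline = ASSAY_LIMIT  # last measurable value, right at the limit

drop_if_one = log_drop(baseline, 1)                     # the company's choice
drop_if_midpoint = log_drop(baseline, ASSAY_LIMIT / 2)  # middle of the range
drop_if_limit = log_drop(baseline, ASSAY_LIMIT)         # most conservative

print(round(drop_if_one, 1), round(drop_if_midpoint, 1), round(drop_if_limit, 1))
```

Assigning a value of 1 manufactures a 2.7 log drop out of a change that, for all anyone could measure, might have been close to nothing.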
But although death is still uncomfortably common in our community, it's nothing like it was in the eighties and early nineties, when gay men, injection drug users, and female sex partners of positive men were dying by the dozens every day. The last thing I would want this article to do is to flat out discourage someone from accessing treatment. Nonetheless, attitudes are rapidly changing, and many clinicians are wondering if putting people with high CD4 counts on these powerful drugs is a great idea. The Community Program for Clinical Research on AIDS (CPCRA)* is beginning a trial later this year to try and address some of those questions. The "SMART" trial will assemble one group of patients whose treatment is managed so as to keep viral load as low as possible at all times. Their endpoints will be compared to another group that will try to hold off treatment until their CD4 cell counts approach (but not too closely) dangerous levels. This is going to be a very large, very long-term trial, so definitive answers are years away. Meanwhile, as through so much of the epidemic, we are going to have to rely on the educated best guesses of our doctors and ourselves. Happily, even with treatment-associated illness, death rates are a small fraction of what they once were. As wonderful as this is in a human sense, it makes the science more difficult. "All cause mortality" is a very reasonable and useful candidate endpoint for clinical research. But with deaths on the decline, such studies will be very large and long. We can only hope that risk of iatrogenic (treatment caused) disease does not continue to increase with duration of exposure to antiretrovirals.
The protease inhibitors, non-nucleoside reverse transcriptase inhibitors, and the classic nucleosides gave us a mini-miracle. It would be hard to think of any disease where progress in treatment has been so swift and decisive. Unfortunately, everything comes with a price, and these very powerful drugs have substantial toxicities. It remains to be seen how many people can survive after five, ten, or fifteen years on these drugs. The challenge for antiretroviral clinical research, then as now, is to create endpoints that are meaningful for today. If future trials are to be relevant and provide us with the information we need to make safe, sound health choices, these tough questions about appropriate endpoints will have to be grappled with again and again.
* The author needs to disclose that he works for the Statistical and Data Management Center of the CPCRA, and is, in fact, on the SMART protocol team. However, I am writing this piece as an individual, and nothing I say should be construed as an official CPCRA position.
Back to the GMHC Treatment Issues September 2001 contents page.