Immediately following the Division of AIDS January workshop on long-term clinical research in HIV infection, the workshop attendees were invited to attend a short statistical symposium. The half-day meeting of minds was to focus on the methodological issues of these long-term clinical endpoint studies. Along with a handful of other community representatives, Mark Harrington was in attendance. He prepared this report.
It was early in the year 2000, three and a half years after the introduction and widespread adoption of HAART. Prospects for eradication of the virus with HAART alone had receded to the vanishing point, and the leaders of the National Institutes of Health (NIH)'s AIDS research effort had decided that the long-term effectiveness of HAART merited some focused attention from clinical researchers. The NIAID Division of AIDS was also trying to decide how to cut up the adult AIDS clinical research pie. It had renewed funding for the Adult ACTG at $80 million per year for five years, but had deferred consideration of two other applicants -- the CPCRA and the Veterans Administration (VA) Network -- while trying to develop its own long-term effectiveness research agenda. The January 2000 workshop, held at Bethesda's Holiday Inn, was designed to focus on the scientific needs and methodological concerns of this kind of research.
Among the key questions:
"This is an extremely high priority for the NIH," stated Killen. "Getting a research program in place which can do this is of the highest priority right now. We need to not be paralyzed by fear of not getting it exactly right. We're moving into new methodological territory, so we need a lot of help."
Michael Hughes, director of the Pediatric AIDS Clinical Trials Group Statistics & Data Analysis Center (SDAC) at Harvard, discussed ways that randomized, controlled trials could address these long-term questions.
AIDS-related mortality in adults and children in clinical trials has dropped from over 5% in 1995 to less than 1% in 1999. "So to think about mortality -- or even morbidity -- endpoints means we're talking about a very long duration of trials. If these trends persist, they might have to be 5-6 years long to have any real meaning; ten years is more likely to be relevant." How big would these trials have to be? It depends on the magnitude of the effect we're looking for. To see a two-fold (100%) difference in events between two arms, with 90% power to detect a reduction in events from 10% to 5% over five years, we'd need 1,200 patients; to detect a reduction from 5% to 2.5% we'd need 2,500.
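Hughes's patient counts can be reproduced with the standard two-proportion sample-size formula. A rough sketch in Python (the 90% power figure is his; the two-sided 5% significance level is an assumption on my part, as it was not stated):

```python
from math import sqrt, ceil

Z_ALPHA = 1.96    # two-sided significance level of 0.05 (assumed)
Z_BETA = 1.2816   # 90% power, as stated by Hughes

def two_arm_sample_size(p1, p2):
    """Total sample size (both arms combined) needed to detect a
    difference between event proportions p1 and p2, using the
    standard two-proportion formula with 1:1 allocation."""
    p_bar = (p1 + p2) / 2
    term_a = Z_ALPHA * sqrt(2 * p_bar * (1 - p_bar))
    term_b = Z_BETA * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    n_per_arm = ((term_a + term_b) / (p1 - p2)) ** 2
    return 2 * ceil(n_per_arm)

print(two_arm_sample_size(0.10, 0.05))   # close to the 1,200 patients cited
print(two_arm_sample_size(0.05, 0.025))  # close to the 2,500 patients cited
```

Note that halving the absolute event rates (10% vs. 5% down to 5% vs. 2.5%) roughly doubles the required enrollment even though the relative effect is the same -- which is why low event rates in the HAART era drive trials toward very large sizes.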
Would we want a trial that only showed a benefit in a small proportion of patients, e.g., the ten or five percent mentioned above? Or would we want to know the outcome for the larger population? If so, the trial would have to be larger and longer. More important, there is also the possibility of early transient effects, as was seen with early AZT in ACTG 019 and Concorde. Hughes also raised other issues related to long-term randomized clinical trials:
Hughes stated that, within the ACTG, losses to follow-up are typically in the range of 5% per year. Thus, losses to follow-up occur more quickly than do study endpoints. How could a large, long-term randomized clinical trial avoid such disparities? [One possibility would be to conduct the study at primary care sites, or in a captive network such as the VA system.]
Flexible study designs will be key. Patients should be allowed to change strategies or regimens during follow-up. Size requirements increase if event rates rise over time. One could also envision nested trial designs in which second randomized regimens could be related to starting regimens [e.g., protease inhibitor sparing vs. protease inhibitor containing, with cross-over, cf. ACTG 384 and CPCRA 059].
Hughes stated that we need some event-based randomized clinical trials to underpin cohort-based evaluations. Marker changes and clinical effects of protease inhibitors are comparatively large; we're probably interested in much smaller differences between strategies.
In response to a question, Hughes stated, "I don't think it's useful to do studies to detect things which will only affect 5-10% of people."
Jim Neaton of the University of Minnesota said that "50% effects -- 10%, 5% -- are totally implausible for the designs you showed. We need trials with not 1,200 subjects but 1,200 events. A combination of larger sample size and longer follow-up."
Victor DeGruttola of the adult ACTG SDAC pointed out that industry or product-specific studies could be nested within strategy comparisons, as is being done in ACTG 384 and FIRST.
Tom Fleming of the University of Washington discussed sample size. Fleming clarified that if we were looking for a 100% improvement (i.e., a change in relative risk of 2.0), we could get by with just 88 events; if we were looking for a 20% improvement (i.e., a change in relative risk of 1.2), we would need 1,200 events. "These are important achievable differences. . . . We need large numbers followed long-term."
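Fleming's event counts are consistent with Schoenfeld's approximation for the number of events needed in a survival trial. A sketch, assuming 90% power, a two-sided 5% significance level, and 1:1 allocation (reasonable defaults that Fleming did not state explicitly):

```python
from math import ceil, log

Z_ALPHA = 1.96   # two-sided significance level of 0.05 (assumed)
Z_BETA = 1.2816  # 90% power (assumed)

def events_needed(relative_risk):
    """Schoenfeld's approximation: total number of events needed in a
    1:1 randomized survival trial to detect the given relative risk."""
    return ceil(4 * (Z_ALPHA + Z_BETA) ** 2 / log(relative_risk) ** 2)

print(events_needed(2.0))  # 88 -- Fleming's figure for a 100% improvement
print(events_needed(1.2))  # about 1,265, in line with the ~1,200 cited for 20%
```

The formula depends only on the number of events, not the number of patients enrolled -- which is exactly Neaton's point that what matters is "not 1,200 subjects but 1,200 events," reached through some combination of larger samples and longer follow-up.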
Neaton replied, "I don't think a 50% effect is plausible. Is 20% plausible? 1,200 is kind of low. 20% is a very meaningful effect, but I question whether we'll see it in a when-to-start trial. Another way to approach this would be to design trials with interconnected designs and a plan to combine them -- non-identical trials, designed in an interlocking way to answer different parts of the puzzle."
Peter Peduzzi of the Veterans Administration suggested borrowing the large simple trial concept to get some efficiency: get many patients enrolled as soon as possible. "When trials go on too long, patients drop out and physicians lose interest. Embed substudies to study mechanisms. Ensure the design is flexible."
Next Alvaro Muñoz, chief statistician for the Multicenter AIDS Cohort Study (MACS), addressed the potential role of observational cohort studies in determining long-term effectiveness. Randomized clinical trials, by virtue of randomization, are used to determine the efficacy of various treatments or strategies. Observational cohort studies can supplement and complement randomized trials by providing additional information on individual and population effectiveness.
In the conversation which followed, the general consensus was that observational cohorts are useful to complement or supplement -- but cannot replace -- randomized trials, and that randomized trials are needed in particular to eliminate bias and to tease out treatment effects more modest than the dramatic short-term impact of HAART.
The rest of the afternoon was taken up by two presentations about the use of mathematical models in randomized clinical trials and observational cohort studies. For those attending who were not mathematically super-sophisticated, this portion of the symposium was relatively incoherent, and its relevance to the topic at hand was unclear.
Amy Justice of the Veterans Administration brought the conversation down to earth, commenting, "I've been to three meetings at which these issues have been discussed over the last six months. It's not randomized clinical trials versus observational cohort studies, it's how much do we want to spend, how large are the effect sizes we want to see? If it's two- to four-fold, observational databases are likely to be OK. If it's 20%, I don't see how anyone in this room can be confident that they've measured all the relevant covariates. How long do we have to follow them? Are there going to be differences in event rates early or late? These questions we need to answer."
Victor DeGruttola pointed out, "Having devoted several years of my life to the design of a study of 'tight' versus 'loose' virological control, only to see it bite the dust, I certainly admit these questions are important. But the most difficult aspect of this science isn't the statistics but formulating clinical questions for studies in which doctors and patients are willing to enroll. People have to be willing to live with these questions through several cycles of technological change."
Amy Justice added, "Part of what randomization is trying to buy us is a simple design so that clinicians can understand the results. If they don't understand the results, they're not going to believe them. So for God's sake let's keep it simple: clinical endpoints."
In summary, Steve Self of the Fred Hutchinson Cancer Research Center in Seattle said, "There is no substitute for randomized trials to give us answers to strategy questions that can give us an even 20% effect." Tom Fleming added, "There's so much we don't know regarding the disease process, multiple endpoints, intended and unintended intervention effects, ancillary care effects. We need to have all these types of studies. In particular, now, large-scale randomized trials -- that's what's lacking in the landscape. . . . These need to be long-term studies and large studies. . . . I'm an advocate of large simple trials -- in cardiology they've established themselves as very important tools. But, given the complexities in HIV disease, these trials are not going to be as simple. The length of these trials. . . I'd like them to be revealing some insights at 5-7 years. But for front-line therapy, I want to know ten years. Those will be expensive trials. The size and length of these studies will be study specific. If important insights emerge over five to seven years -- or if there's a [r]evolution in clinical care -- we need to be sure the question we've formulated will remain relevant over time."
In summary, we need: