[Illustration: tubes with eyes at the ends, looking at a group of five men]

Behavior on Trial

Behavior change may be our best hope for stemming the HIV/AIDS epidemic. Why is it so hard to evaluate the interventions?

By Christine Grillo • Illustration by Jon Reinfurt

In the late 1980s and early 1990s, investigators at the Medical College of Wisconsin wondered if popular men in gay bars could effectively promote safe-sex messages. They designed a randomized controlled trial (RCT) conducted in three small Southern cities. In Biloxi, Mississippi, where the intervention took place, the popular men were recruited to endorse condom use to bar patrons. The two comparison cities received no specific intervention and served as controls. The results were impressive. In Biloxi, the number of risky sexual encounters fell by about 30 percent over two months. In the other two cities, the risky behavior stayed the same. Another trial, this time in 16 cities, yielded the same results. A new, effective HIV/AIDS intervention had been identified.

Ten years later, epidemiologist David Celentano, a veteran of HIV/AIDS prevention research and of RCT design, ran a similar trial with the same intervention in five countries on four continents. Over 24 months, the number of risky sexual acts decreased by 33 percent in the intervention group. But the number of risky sexual acts decreased by the same amount in the comparison group. Risk had clearly been reduced all around, but there was no difference between the two study arms. (With dramatic downward shifts in risk in all five countries, it seemed unlikely that anything other than the trial itself had effected the risk reduction.) The results surprised Celentano. Because an intervention known to work appeared no more efficacious than the control condition, it risked being cast aside.

What happened?

The standards for ethical conduct of trials had changed, says Celentano, the Charles Armstrong Chair in Epidemiology at the Bloomberg School. In the intervening years, many new interventions had been proven effective, and ethical obligations required that they be offered to the control arm of the later trial. Celentano lists the services offered to participants: educational materials, free condoms, HIV and STI testing and treatment, pre- and post-test counseling, and extensive interviews about risk behavior. "And that's just the control group," he says.

The stakes for finding effective HIV preventions are high: An estimated 33 million people live with HIV, another 2.6 million are newly infected every year and 1.8 million die of AIDS annually. Now 30 years old, the field of HIV/AIDS prevention draws scores of researchers who spend billions of dollars in a race to find ways to prevent transmission. Some want to identify biomedical interventions, such as microbicides, vaccines and male circumcision. Others are counting on behavior change programs—safe sex education, peer counseling, media campaigns— to slow down the epidemic.

But the field of behavior change, in particular, is tricky terrain for evaluation. As new interventions are shown to be effective, ethical obligations and aspirations evolve, making evaluations more challenging. And as real-world HIV/AIDS conditions become more complex, RCTs begin to seem less feasible.

Do the Right Thing

In HIV prevention trials, investigators compare the incidence of new infections in the control arm with that in the intervention arm. For an intervention to be deemed effective, the intervention arm must show significantly fewer infections than the control arm. "No researcher wants people to get infected," says Maria Merritt, a core faculty member in the Johns Hopkins Berman Institute of Bioethics, "but the expectation is that some participants will get infected."
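To make the comparison concrete, here is a minimal sketch in Python, using hypothetical infection counts and a standard two-proportion z-test; the actual trials' analyses are more involved than this.

```python
# Minimal sketch: comparing HIV incidence between trial arms.
# All counts below are hypothetical, for illustration only.
from statsmodels.stats.proportion import proportions_ztest

infections = [48, 70]    # new infections: [intervention, control]
enrolled = [2000, 2000]  # participants per arm

stat, p_value = proportions_ztest(infections, enrolled)
for arm, i, n in zip(("intervention", "control"), infections, enrolled):
    print(f"{arm}: incidence = {i / n:.3f}")
print(f"z = {stat:.2f}, p = {p_value:.4f}")  # small p suggests a real difference
```

When the control arm improves as much as the intervention arm, a test like this finds no significant difference, which is exactly the bind described above.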

Researchers and ethicists agree: Investigators have an obligation to minimize risk to all participants. And in this field, experts have generally agreed on an acceptable level of protection for all trial participants. This standard of prevention guides what is offered to participants—known as "the prevention package"—in order to minimize risk. (Prevention packages often include counseling, testing and treatment, like the suite of services offered in Celentano's trial.) But, notes Merritt, PhD, an assistant professor in International Health, some would add that there is also an obligation to maximize benefits for participants. This might take the form of making as many known effective interventions as possible available to all participants.

Not surprisingly, robust prevention packages tend to dilute the results of a trial. Overall, there may be fewer participants with new HIV infections, which is a wonderful thing. "The paradox," says Jeremy Sugarman, deputy director for medicine at the Berman Institute, "is that the more effective the basic package, the less likely that the research question will be answered."

When faced with this challenge, some investigators and bioethicists invoke the “real world” or the notion of “usual care.” In theory, the control arm of a trial represents the local standard of care in the setting. “What is usual care in some developing countries? Nada,” says Celentano, ScD ’77, MHS ’75. “You want to do what’s right for the community, but these ethical imperatives make the science incredibly hard to do.”

Community trials of behavior change interventions are expensive and take years. With full-monty prevention packages, the results are sometimes unimpressive. “And then the basic science folks say, ‘See, behavior change doesn’t work,’” Celentano says.

Sugarman, MD, MPH, MA, who chairs the Ethics Working Group of the HIV Prevention Trials Network, cites ethical and pragmatic reasons for carefully examining the standard of prevention. Ethically, it might be irresponsible to introduce a prevention package that cannot be sustained or implemented locally after the trial has finished. Practically, he argues, it's important to remember that just because something works in one setting doesn't mean it will work in another. For example, while an antiviral drug may be useful in preventing HIV among men who have sex with men, it remains unclear whether the same drug will be effective in preventing heterosexual transmission. Including it in a prevention package for heterosexuals might be premature and presumptuous.

Yet another challenge is adding new interventions mid-trial, as they are shown to be effective, to maximize benefits to participants. “I worry that the ethics discussion got ahead of itself by not asking what we need to know before adding new interventions. You can’t always change a study design on the fly,” says Merritt.

“Heaping on interventions in the prevention package isn’t necessarily the right thing to do,” says Sugarman. “You need to make sure there’s an explicit reason for doing so and that there is reason to assume that more will necessarily be better.”

The Gold Standard

For the last 30 years, RCTs have been considered the gold standard in evaluating HIV preventive interventions. The RCT has been linked in our hearts and minds with the term “evidence-based medicine,” and thus, argues Steve Goodman, a core faculty member in both the School’s Center for Clinical Trials and the Berman Institute of Bioethics, the gauntlet has been thrown down. To be taken seriously, an intervention must prove itself in a randomized trial.

In biomedical interventions such as male circumcision, designing an RCT is challenging enough. In the circumcision trials, participants consented to take part in the trial without knowing if they would be assigned to the intervention arm (circumcision) or the control arm (circumcision only after it was shown effective). Because the assignments are randomized to reduce bias, neither the participant nor the provider has any say in who gets what done. “One could argue that the reason clinical trials are a new entry in the tools of medical investigation is that doctors and patients couldn’t countenance the notion of randomization,” says Goodman, a professor in Epidemiology. “There’s an ethical calculus in every trial.”

But at least with a biomedical intervention—circumcision, for example—investigators can more or less control both arms. One group gets the circumcision; the other doesn’t. It’s either yes or no, cut or uncut. No gray area.

Behavior change interventions, on the other hand, create gray areas. Take the example of a mass media campaign that encourages people to have fewer sexual partners. The first complication is that there’s no way to control who sees a billboard or hears a radio commercial; diffusion is inevitable.

Deanna Kerrigan, who directs a USAID R2P (Research to Prevention) project to evaluate HIV/AIDS intervention programs under way in several African countries, finds diffusion to be one of the many challenges in evaluating these types of interventions. “With a pill, you know—yes or no,” she says. “When you deal with a mass media campaign, it’s impossible to say this group got it, this group didn’t.”

There’s another wrinkle: Many behavior change interventions are aimed at communities, not at individuals. (Individuals get circumcised; communities get the billboard.) And tracking outcomes in individuals is much more clear-cut than tracking outcomes at the village or town level.

Historically, the government agencies or NGOs that implement interventions such as media campaigns have evaluated their own programs. Programs sometimes overlap, often inefficiently, with several interventions targeting the same people, and the effectiveness of the interventions is at the mercy of social factors such as migration or civil war. To avoid any perception of bias, evaluations should be independent.

The goal of Kerrigan’s project is to provide objective, academic evaluation. She and colleagues will study dozens of interventions already under way in various villages and districts, which are implemented by dozens of different partners. Randomization is out of the question: in such complex situations, it’s impossible to conduct a conventional RCT. It’s messy, says Kerrigan, an associate professor in Health, Behavior and Society. “But this is real-world stuff,” she says. “We can’t slow down this train.”

Almost Gold?

The best she and her colleagues can hope for, says Kerrigan, is to identify solid control arms and conduct a valid observational study. But there is some help for the herculean task ahead for her and other researchers. Where randomization is not possible, statistical tools such as propensity score matching may help close the gap between observational studies and randomized trials, compensating somewhat for the loss of comparability between study arms. (Propensity score matching is one of many statistical techniques for making intervention and comparison groups similar enough to allow fair comparisons.)
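As a concrete illustration, here is a minimal sketch of propensity score matching in Python. The covariates (age, urban residence, a baseline risk score) and the simulated data are hypothetical; this is not the R2P project's actual method.

```python
# A toy propensity-score-matching sketch on simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n = 1000
covariates = np.column_stack([
    rng.normal(30, 8, n),    # age (hypothetical covariate)
    rng.integers(0, 2, n),   # urban residence, 0/1 (hypothetical)
    rng.normal(0, 1, n),     # baseline risk score (hypothetical)
])
exposed = rng.random(n) < 0.4  # saw the campaign? (not randomized)

# 1. Model each person's probability of exposure from the covariates.
ps = LogisticRegression().fit(covariates, exposed).predict_proba(covariates)[:, 1]

# 2. Pair each exposed person with the unexposed person whose
#    propensity score is closest.
controls = np.flatnonzero(~exposed)
nn = NearestNeighbors(n_neighbors=1).fit(ps[~exposed].reshape(-1, 1))
_, idx = nn.kneighbors(ps[exposed].reshape(-1, 1))
matched = controls[idx.ravel()]

# Outcomes (e.g., new infections) are then compared across the matched
# pairs, approximating the balance randomization would have provided.
print(f"matched {exposed.sum()} exposed participants to controls")
```

The matching balances only the covariates that were measured; unmeasured confounders remain, which is one reason observational results can still mislead, as the discussion below notes.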

“We have statisticians bringing some observational studies very close to RCTs in terms of confidence in their findings,” says Goodman, MD, PhD. Adds Kerrigan, PhD, MPH, “We want the most rigorous design feasible.”

But Kay Dickersin, who directs the Center for Clinical Trials at the Bloomberg School, advises caution when relying on observational studies to determine the effectiveness of new interventions. “Certainly we should use whatever data we have. But we’ve seen major mistakes that make us shy about using observational data” to determine intervention effectiveness.

The hormone replacement therapy controversy is a good example: Many observational studies showed a cardiac benefit for postmenopausal women treated with estrogen plus progestin; when the treatment was tested in the context of a large RCT, however, it showed no cardiac benefit, and perhaps even a higher risk of heart disease. “That trial was like, ‘Oops, we blew it.’ … I’m intrigued that we might be able to emulate RCT findings using observational studies and special statistics. I’d like to see studies validating the modeling alternative, comparing findings to those of RCTs,” says Dickersin, PhD, professor in Epidemiology. “As far as I know, it’s still an open question.”

Goodman notes that a recent reanalysis of the largest observational study on estrogen showed that the observational results were quite similar to the clinical trial’s.

What’s an Investigator to Do?

Albert Einstein is credited with saying, “Things should be as simple as possible, but no simpler.”

Ethicists believe that studies should be designed to accommodate ethical obligations, and investigators agree. Celentano thinks that a larger sample size would help: “With reduced risk in the control arm, the difference between the two arms is smaller than anticipated, and so you need a larger sample size to demonstrate that one arm’s intervention is more effective than the other. … But we can’t always afford a larger sample size.”
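The arithmetic behind that point can be sketched with standard power-calculation tools; the incidence figures below are hypothetical, not drawn from any of the trials described here.

```python
# Minimal sketch: a smaller between-arm difference demands a larger sample.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

control_incidence = 0.050  # hypothetical HIV incidence in the control arm
solver = NormalIndPower()

for intervention_incidence in (0.030, 0.040):  # larger, then smaller, effect
    effect = proportion_effectsize(intervention_incidence, control_incidence)
    n = solver.solve_power(effect_size=effect, alpha=0.05, power=0.80)
    print(f"{intervention_incidence:.3f} vs {control_incidence:.3f}: "
          f"about {round(n):,} participants per arm")
```

Roughly speaking, halving the detectable difference between arms quadruples the required enrollment, which is why a robust prevention package in the control arm can push a trial's cost out of reach.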

Goodman suggests that in some cases, an RCT might be overkill; sometimes what we learn from an observational study is good enough. Dickersin acknowledges that “randomized-controlled trials are hugely expensive, and they can only address one question at a time, usually over the short term, whereas with huge data sets you can address multiple questions and rare, longer term outcomes. [But] I think we better be very careful if we rely on observational data to determine the effectiveness of an intervention.”

There’s no right or simple solution for measuring the effectiveness of behavior change interventions in the field of HIV/AIDS prevention. The virus is wily, and the epidemic entrenched. What kind of evaluation is the best evaluation? Says Goodman, “What you’ll learn is defined by the purposes at hand. … You measure the risk, the cost, the stakes, the consequences of being wrong. … A clinical trial cannot be done in all situations, regardless of the stakes.”