One of the best ways to quantify the causal effects of a given intervention is through a randomized controlled trial. This approach is considered the gold standard for testing new drugs and vaccines.
It is also frequently used in online settings. For example, to test whether the new version of an app performs better than the old one, two sets of users are shown the different versions and the results are later compared. This is called A/B testing and is one of the most basic forms of a randomized controlled experiment.
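As a rough illustration of the A/B test described above, the sketch below randomly splits users into two groups and compares their average outcomes. The user IDs, the conversion data, and the function names `assign_groups` and `ab_difference` are hypothetical and chosen only for this example.

```python
import random
from statistics import mean

def assign_groups(user_ids, seed=0):
    """Randomly split users into two equally sized groups (A and B)."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def ab_difference(outcomes_a, outcomes_b):
    """Estimated effect of version B relative to A: difference in mean outcomes."""
    return mean(outcomes_b) - mean(outcomes_a)

# Hypothetical users, split at random between the old and new app versions.
group_a, group_b = assign_groups(range(16))

# Hypothetical results: 1 = the user completed a purchase, 0 = did not.
old_version = [0, 1, 0, 0, 1, 0, 0, 1]
new_version = [1, 1, 0, 1, 0, 1, 1, 0]

effect = ab_difference(old_version, new_version)
print(f"Estimated lift in conversion rate: {effect:.3f}")
```

Because the split is random, the two groups should be comparable on average, which is what lets the difference in means be read as a causal effect when users do not influence one another.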
In certain scenarios, however, simply observing the outcomes of the different groups doesn’t give you the full picture. Individuals may deliberately select themselves into a group, causing what is known as self-selection bias, or other external factors may affect the results, making it trickier to estimate the real causal effect of the treatment.
For instance, to analyze the impact of vaccination on a person’s risk of infection, one important factor to consider would be whether the people around that individual were also vaccinated. “Suppose you are immunocompromised and you don’t take the vaccine, but everyone around you does. In that case, you would still get a reduced probability of getting infected,” says Alexandre Belloni, a professor of Decision Sciences at The Fuqua School of Business at Duke University.
Belloni and his colleagues are exploring a new approach to estimate the effect of interventions or treatments in scenarios like this—when individuals can self-select into getting the treatment and the treatment of one individual may affect the outcome of others as well.
“It is a very hard problem because one needs to consider how individuals interfere with each other and you can have a very rich set of patterns of interference but not all of them are going to be useful to learn the impact of interest,” Belloni says. In the recent paper “Neighborhood Adaptive Estimators for Causal Inference under Network Interference,” Belloni and his co-authors—Fei Fang, a post-doc at Yale University, and Alexander Volfovsky, an assistant professor of statistical science at Duke University’s Trinity College of Arts & Sciences—explore how to recover good estimates of the impact of the intervention alone when the interference propagates through a network formed by these individuals (also known as network interference).
As connectivity between individuals increases, there is growing interest in recovering the causal impact of decisions under interference. In those situations, even a standard randomized A/B test would be biased, and the problem gets even more complicated when self-selection bias is present.
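To see why a plain comparison of groups can mislead under interference, consider a toy example. Everything in it is an assumption made for illustration: a ring of six people, a hand-picked set of vaccinated individuals, and an outcome model in which a person's own vaccination lowers infection risk by 0.5 and vaccinated neighbors lower it by up to 0.3. It is not the model used in the paper.

```python
from statistics import mean

# Hypothetical ring network: each of six people is connected to two others.
neighbors = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

def infection_risk(i, treated):
    """Assumed outcome model: own treatment and treated neighbors both reduce risk."""
    share = mean(1 if j in treated else 0 for j in neighbors[i])
    return 1.0 - 0.5 * (i in treated) - 0.3 * share

def naive_difference(treated):
    """A/B-style estimate: mean outcome of treated minus mean outcome of untreated."""
    treated_risk = mean(infection_risk(i, treated) for i in neighbors if i in treated)
    control_risk = mean(infection_risk(i, treated) for i in neighbors if i not in treated)
    return treated_risk - control_risk

# One possible assignment, hand-picked for illustration: every other person is vaccinated.
assignment = {0, 2, 4}
naive = naive_difference(assignment)

# Effect of vaccinating everyone versus no one, known here only because we wrote the model.
everyone = set(neighbors)
total_effect = (mean(infection_risk(i, everyone) for i in neighbors)
                - mean(infection_risk(i, set()) for i in neighbors))

print(f"Naive A/B estimate: {naive:.2f}")
print(f"Effect of treating everyone vs. no one: {total_effect:.2f}")
```

In this setup the unvaccinated people are protected by their vaccinated neighbors, so the gap between the two groups shrinks and the naive estimate matches neither the assumed direct effect (0.5) nor the effect of vaccinating everyone.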
Most causal studies that allow for network interference consider randomized experiments and assume that the interference an individual experiences depends only on that individual’s immediate neighbors, not on the whole network. “That was a very natural starting point to capture a first order effect of interference: you look at the neighbors and, based on that, you define a pattern. And you can use that to control for the amount of interference observed as some individuals of the network that experience similar patterns received the treatment and others did not,” Belloni says.
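The idea of defining a pattern from immediate neighbors and comparing individuals who share it can be sketched as follows. This is a minimal illustration, not the estimator from the paper: the pattern is assumed to be the number of treated immediate neighbors, the network and outcomes are hypothetical, and self-selection is ignored.

```python
from collections import defaultdict
from statistics import mean

def exposure_pattern(node, neighbors, treated):
    """One simple choice of pattern: how many immediate neighbors are treated."""
    return sum(1 for j in neighbors[node] if j in treated)

def effect_by_pattern(neighbors, treated, outcome):
    """Treated-minus-control difference in mean outcome within each pattern."""
    groups = defaultdict(lambda: {"treated": [], "control": []})
    for node in neighbors:
        arm = "treated" if node in treated else "control"
        pattern = exposure_pattern(node, neighbors, treated)
        groups[pattern][arm].append(outcome[node])
    # Only patterns observed in both arms allow a like-for-like comparison.
    return {
        pattern: mean(arms["treated"]) - mean(arms["control"])
        for pattern, arms in groups.items()
        if arms["treated"] and arms["control"]
    }

# Hypothetical ring of eight people; persons 0, 1 and 4 received the treatment.
neighbors = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
treated = {0, 1, 4}

# Hypothetical infection risks, made up for this example.
outcome = {0: 0.35, 1: 0.35, 2: 0.85, 3: 0.85, 4: 0.50, 5: 0.85, 6: 1.00, 7: 0.85}

effects = effect_by_pattern(neighbors, treated, outcome)
for pattern, diff in sorted(effects.items()):
    print(f"{pattern} treated neighbor(s): estimated effect {diff:.2f}")
```

Holding the pattern fixed means treated and untreated individuals are compared under a similar amount of interference, which is the control Belloni describes.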
Only very recently has a stream of work started to explicitly consider interference between units that are not directly connected (through propagation). In the new paper, the authors consider that the interference may come not only from immediate neighbors but also from neighbors that are two or more steps away in a network. They also consider that the intensity of the interference may change across the network. “In this approach, I want to be more flexible on my definition of neighbors and I want to learn the relevant patterns of interference from the data… how far I need to look might depend on the local configuration of treatments,” Belloni says.
Going back to the vaccine example, if all of your immediate neighbors were vaccinated, maybe that’s the only information that matters in your experiment. But if no neighbor was vaccinated, maybe you need to go deeper into the network and look at the neighbors of your neighbors. If in that group no one was vaccinated either, then you would need to look further, at the neighbors of the neighbors of your neighbors, and so forth.
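The vaccine example can be turned into a small sketch in which each person's neighborhood is widened one step at a time. The stopping rule used here (stop at the first ring that contains a vaccinated person, up to a hypothetical `max_depth`) is a hand-coded stand-in chosen for illustration; in the paper the relevant depth is learned from the data, and the function names are not the authors'.

```python
def ring(node, neighbors, depth):
    """People exactly `depth` steps away from `node` (breadth-first search)."""
    seen = {node}
    frontier = {node}
    for _ in range(depth):
        frontier = {j for i in frontier for j in neighbors[i]} - seen
        seen |= frontier
    return frontier

def depth_needed(node, neighbors, vaccinated, max_depth=3):
    """Illustrative rule: look further out until a ring contains a vaccinated person."""
    for depth in range(1, max_depth + 1):
        if ring(node, neighbors, depth) & vaccinated:
            return depth
    return max_depth  # Nothing found within the cap: use the widest neighborhood.

# Hypothetical chain of five people, 0 - 1 - 2 - 3 - 4, where only person 4 is vaccinated.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
vaccinated = {4}

depths = {node: depth_needed(node, neighbors, vaccinated) for node in neighbors}
for node, depth in depths.items():
    print(f"Person {node}: look {depth} step(s) into the network")
```

Each person ends up with a different radius depending on where the vaccinated individuals sit around them, which is the sense in which every node becomes the center of its own neighborhood.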
“I want the data to tell me which patterns matter,” says Belloni. “So, we would not start with a fixed radius. Because every node now becomes the center of its own neighborhood, we want to find how deep we need to go for the given configurations of treatment assignments in my neighborhood while balancing the additional information that going deeper is bringing.”
The main contribution of the research, he says, is “to allow for this adaptive setting where the radius of the network is determined by the data. We further derive a machine learning method that can achieve a near optimal balance when estimating this individual dependent radius and allowing for the construction of confidence intervals that are useful in applications.”
The machine learning model developed by Belloni and his colleagues may be applied to different settings where this type of network interference is observed. The impact of vaccination on infection probability is one of these cases. Another example is educational experiments performed in schools where an intervention is assigned to a group of students and their behavioral change has the potential to “interfere” with their colleagues. Experiments on social media, where a group of users is selected to see a certain message that may lead to a behavioral change, could also benefit from this approach.
Applying machine learning to causal inference
Belloni’s main research interests in recent years have been in econometrics, machine learning, statistics and mechanism design. “Particularly, how to blend machine learning with causal inference and to determine which variables are important to be included in a model,” he says.
He is a Co-Area Editor for the Machine Learning and Data Science area of the journal Operations Research, one of the most respected publications in the field. Since 2019, Belloni has also been an Amazon Scholar, part of a program in which academics apply their research methods to help Amazon solve complex technical challenges without leaving their academic institutions. He has been studying problems related to mechanism design, another of his research interests, at Amazon’s Supply Chain Optimization Technologies organization. He notes that there is growing interest from employers in machine learning applied to business. “As an Amazon Scholar, I can attest that Amazon really values the use of machine learning, causal methods, and analytics in their decision making,” he says. “There is a value from the employer side and many students are recognizing that.”