28 Interference and Spillover Effects
Most of the causal-inference machinery in this book rests, often silently, on the assumption that one unit’s treatment does not affect another unit’s outcome. That assumption is convenient and frequently false. When a vaccinated person lowers an unvaccinated neighbor’s risk of infection, when a price promotion to one customer cannibalizes a friend’s purchase, when a deworming program in one school reduces transmission in a nearby school, or when a regulatory shock to one bank propagates through interbank exposures, the outcome of a unit depends on the treatment status of others. This phenomenon goes by several closely related names. Statisticians tend to call it interference, meaning a violation of the no-interference half of the Stable Unit Treatment Value Assumption. Economists and applied researchers more often call it spillover, peer effect, externality, or contagion. The vocabulary differs but the structural problem is the same, and the goal of this chapter is to make that problem precise, to show what is and is not identified once we admit it, and to connect the experimental theory to the quasi-experimental and observational designs developed elsewhere in the book.
The chapter proceeds in three movements. First, it defines interference and the SUTVA assumption it violates, then rebuilds the potential-outcomes framework to accommodate it, introducing exposure mappings as the central device for taming an otherwise unmanageable number of potential outcomes. Second, it develops the partial-interference designs that make estimation tractable, principally the two-stage randomized designs of Hudgens and Halloran (2008) and the direct, indirect, total, and overall effects they identify, together with the design-based estimators for general experiments and the long-standing identification problems in peer-effects regressions associated with Manski (1993). Third, it turns to interference in quasi-experimental and observational work, where spillovers contaminate difference-in-differences control units, threaten the exclusion restriction in instrumental-variables designs, and undermine the stable-comparison logic of matching. This chapter is cross-referenced from the discussion of modern concerns in difference-in-differences (Section 37.13) and from the quasi-experimental foundations (Section 33), and it is meant to be read as the place where the no-interference assumption invoked there is examined head on.
28.1 SUTVA and the Meaning of No Interference
The Stable Unit Treatment Value Assumption, formalized in Rubin (1980) and treated at length in Imbens and Rubin (2015), bundles two requirements. The first is no interference: a unit’s potential outcomes depend only on its own treatment assignment, not on the assignments of other units. The second is no hidden variation in treatment: there is a single, well-defined version of each treatment level, so that “treated” means the same thing for every unit. Both halves are substantive, and both are routinely violated in social science, but this chapter concerns the first.
To see precisely what no interference buys us, write the assignment vector for a population of \(N\) units as \(\mathbf{D} = (D_1, \dots, D_N)\) with each \(D_i \in \{0,1\}\). In full generality, unit \(i\)’s outcome is a function of the entire vector, \[ Y_i = Y_i(\mathbf{D}) = Y_i(D_1, D_2, \dots, D_N). \] With \(N\) binary treatments there are \(2^N\) possible assignment vectors, hence up to \(2^N\) potential outcomes per unit. This is the combinatorial explosion that makes interference hard: even with a modest sample the number of potential outcomes vastly exceeds the number of observations, and nothing is identified without further structure.
The no-interference assumption collapses this dependence to the unit’s own treatment, \[ Y_i(\mathbf{D}) = Y_i(D_i), \] restoring the familiar pair \(Y_i(1), Y_i(0)\) and the unit-level effect \(Y_i(1) - Y_i(0)\). Observed outcomes satisfy \(Y_i = Y_i(D_i)\), and the average treatment effect \(E[Y_i(1) - Y_i(0)]\) is the estimand that completely randomized assignment identifies through a simple difference in means. Everything downstream in the experimental and quasi-experimental chapters inherits this reduction. When it fails, the reduction fails with it, and we must work with potential outcomes that respond to the assignments of others.
It helps to distinguish three regimes that the literature treats quite differently. Under no interference the own-treatment reduction holds exactly. Under arbitrary (general) interference, \(Y_i\) can depend on the full vector \(\mathbf{D}\) with no restriction, which is the most honest assumption and also the least identifiable without a design that supplies replication. Between these lies partial interference, introduced by Sobel (2006) and Hudgens and Halloran (2008), under which the population partitions into groups (households, classrooms, villages, markets) such that interference may operate freely within a group but not across groups. Partial interference is the workhorse assumption of the field because it delivers, within each group, the independent replication needed to estimate spillover quantities, and most of the design theory below presumes it.
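To make the three regimes concrete, the following sketch counts how many assignment patterns can matter for a single unit under each one. The population size `N` and the group size `m` are illustrative values chosen here, not figures from any study.

```r
# Number of potential outcomes per unit under each interference regime.
# N and m are hypothetical values chosen only for illustration.
N <- 50  # population size
m <- 5   # size of the unit's group under partial interference

n_potential_outcomes <- c(
  no_interference      = 2,    # Y_i(0) and Y_i(1)
  partial_interference = 2^m,  # one per assignment pattern within the group
  general_interference = 2^N   # one per assignment pattern in the population
)
print(n_potential_outcomes)
```

Even at these small sizes the general-interference count exceeds \(10^{15}\), while the partial-interference count stays at 32, which is why the group structure matters so much for estimation.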
28.2 Potential Outcomes Under Interference and Exposure Mappings
Once we admit that \(Y_i\) depends on others’ assignments, we need a language for the kinds of effects that become meaningful. Following the formal development in Hudgens and Halloran (2008), write the assignment of all units other than \(i\) as \(\mathbf{D}_{-i}\), so that \(Y_i = Y_i(D_i, \mathbf{D}_{-i})\). Two conceptually distinct contrasts emerge.
The direct effect holds the rest of the world fixed at some configuration \(\mathbf{d}\) and toggles only unit \(i\)’s own treatment, \[ \tau^{\text{direct}}_i(\mathbf{d}) = Y_i(D_i = 1, \mathbf{D}_{-i} = \mathbf{d}) - Y_i(D_i = 0, \mathbf{D}_{-i} = \mathbf{d}). \] This is the closest analogue to the classical treatment effect: the change in \(i\)’s outcome from its own treatment, for a given pattern of others’ treatments.
The spillover (indirect) effect holds \(i\)’s own treatment fixed and changes the configuration of others, \[ \tau^{\text{spill}}_i(d; \mathbf{d}, \mathbf{d}') = Y_i(D_i = d, \mathbf{D}_{-i} = \mathbf{d}) - Y_i(D_i = d, \mathbf{D}_{-i} = \mathbf{d}'), \] capturing how much \(i\)’s outcome moves when the surrounding pattern shifts from \(\mathbf{d}\) to \(\mathbf{d}'\) while \(i\)’s own status stays put. The total effect combines the two, comparing being treated in a treated environment with being untreated in an untreated environment.
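The sense in which the total effect combines the two can be written out at the unit level. The symbol \(\tau^{\text{total}}_i\) is notation introduced here for illustration; the right-hand side uses only the direct and spillover contrasts defined above. \[ \tau^{\text{total}}_i(\mathbf{d}, \mathbf{d}') = Y_i(D_i = 1, \mathbf{D}_{-i} = \mathbf{d}) - Y_i(D_i = 0, \mathbf{D}_{-i} = \mathbf{d}') = \tau^{\text{direct}}_i(\mathbf{d}) + \tau^{\text{spill}}_i(0; \mathbf{d}, \mathbf{d}'). \] The decomposition is an accounting identity obtained by adding and subtracting \(Y_i(D_i = 0, \mathbf{D}_{-i} = \mathbf{d})\). Adding and subtracting \(Y_i(D_i = 1, \mathbf{D}_{-i} = \mathbf{d}')\) instead gives the equally valid split \(\tau^{\text{spill}}_i(1; \mathbf{d}, \mathbf{d}') + \tau^{\text{direct}}_i(\mathbf{d}')\), so the direct and spillover shares of a total effect depend on the order in which the two changes are made.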
The practical difficulty is that \(\mathbf{d}\) and \(\mathbf{d}'\) range over an astronomically large space, so these contrasts are not directly usable. The key simplifying device, developed rigorously by Aronow and Samii (2017), is the exposure mapping. An exposure mapping is a function \(f(\mathbf{D}, \theta_i)\) that reduces the high-dimensional assignment vector, together with unit-specific attributes \(\theta_i\) such as a unit’s position in a network, to a low-dimensional exposure value. The substantive assumption is that potential outcomes depend on \(\mathbf{D}\) only through this exposure: if two assignment vectors induce the same exposure for unit \(i\), they induce the same potential outcome. Common choices include an indicator for whether any network neighbor is treated, the count or fraction of treated neighbors, or a binary “treated versus untreated” exposure that simply recovers the classical setup. Aronow and Samii (2017) show that, given a correctly specified exposure mapping and a known randomization, one can construct Horvitz-Thompson and Hajek estimators of average potential outcomes under each exposure level, with variance estimators that account for the dependence the design induces across units. The honesty of this approach is that the exposure mapping is an explicit, falsifiable modeling assumption rather than a hidden one, and misspecifying it (for example, assuming only first-order neighbor effects when second-order effects are present) reintroduces bias in a way that can in principle be probed.
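A minimal sketch of the exposure-mapping workflow follows, under assumptions made here for illustration: a ring network, an independent Bernoulli(0.5) design, a four-level exposure (own treatment crossed with "any treated neighbor"), and a made-up outcome model. The names `exposure`, `draw_assignment`, and `pi_hat` are hypothetical helpers, not functions from any package, and the exposure probabilities are approximated by re-drawing the design rather than computed analytically.

```r
set.seed(1)
n <- 200

# Hypothetical network: a ring in which each unit is linked to its two nearest neighbors.
A <- matrix(0, n, n)
idx <- seq_len(n)
A[cbind(idx, idx %% n + 1)] <- 1
A <- A + t(A)

# Exposure mapping: own treatment crossed with "any treated neighbor".
exposure <- function(D, A) {
  any_nb <- as.integer(as.vector(A %*% D) > 0)
  paste0("d", D, "_nb", any_nb)
}
levels_e <- c("d0_nb0", "d0_nb1", "d1_nb0", "d1_nb1")

# Known design (assumed): independent Bernoulli(0.5) assignment.
draw_assignment <- function() rbinom(n, 1, 0.5)

# Exposure probabilities for every unit, approximated by re-drawing the design.
sims <- replicate(2000, exposure(draw_assignment(), A))
pi_hat <- sapply(levels_e, function(e) rowMeans(sims == e))

# One realized experiment with a made-up outcome model.
D <- draw_assignment()
E <- exposure(D, A)
Y <- 1 + 1.0 * D + 0.5 * as.integer(as.vector(A %*% D) > 0) + rnorm(n)

# Hajek estimator of the mean potential outcome at each exposure level.
hajek <- sapply(levels_e, function(e) {
  w <- (E == e) / pi_hat[, e]
  sum(w * Y) / sum(w)
})

c(spill_on_untreated   = unname(hajek["d0_nb1"] - hajek["d0_nb0"]),
  direct_no_treated_nb = unname(hajek["d1_nb0"] - hajek["d0_nb0"]))
```

In the made-up outcome model the spillover on untreated units is 0.5 and the direct effect is 1, so the two contrasts should land near those values up to sampling noise. The sketch omits variance estimation, which is where the dependence across units induced by the design has to be handled.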
The graph-cluster randomization of Ugander et al. (2013) is a complementary engineering response to the same problem: by assigning treatment to coarse network clusters rather than to individuals, the design increases the probability that a unit and all of its neighbors share a treatment status, which makes a “fully treated neighborhood” exposure occur often enough to be estimated. The general lesson is that under interference the design and the estimand must be chosen together, because the randomization scheme determines which exposure contrasts are estimable at all.
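The gain in exposure probability can be computed exactly in a toy case. The sketch below assumes a ring network and contiguous blocks of `k` units as clusters, which is a simplification chosen here and not the clustering procedure of Ugander et al. (2013); it compares the probability that a unit and both of its neighbors are all treated under individual and under cluster-level assignment.

```r
# Hypothetical network: ring of n units, each linked to its left and right neighbor.
n <- 100
k <- 10  # cluster size: contiguous blocks along the ring (an assumption)
unit  <- seq_len(n)
left  <- ifelse(unit == 1, n, unit - 1)
right <- ifelse(unit == n, 1, unit + 1)
cluster <- (unit - 1) %/% k + 1

p <- 0.5  # treatment probability for a unit (individual design) or a cluster (cluster design)

# Individual Bernoulli design: the unit and its two neighbors are three independent draws.
p_full_individual <- rep(p^3, n)

# Cluster design: the neighborhood is fully treated when every cluster it touches is treated.
n_clusters_touched <- sapply(unit, function(i) {
  length(unique(cluster[c(left[i], i, right[i])]))
})
p_full_cluster <- p^n_clusters_touched

c(individual = mean(p_full_individual), cluster = mean(p_full_cluster))
#> individual    cluster
#>      0.125      0.450
```

Interior units touch one cluster and boundary units touch two, so the average probability of a fully treated neighborhood rises from 0.125 to 0.45 in this example. The price, not shown here, is that cluster-level assignment leaves fewer independent randomization units.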
28.3 Partial Interference and Two-Stage Randomized Designs
The most fully developed identification results assume partial interference and a hierarchical, two-stage randomization. The canonical framework is Hudgens and Halloran (2008), published in the Journal of the American Statistical Association, which defines a quartet of causal estimands that has become standard, with formal extensions and inference in Tchetgen Tchetgen and VanderWeele (2012) and a unifying mediation-style account in VanderWeele et al. (2012).
The design has two stages. In the first stage, groups (clusters) are assigned to one of several allocation strategies, where a strategy specifies the probability with which individuals within the group will be treated. In the second stage, individuals within each group are assigned treatment according to their group’s strategy. A concrete contrast is a high-coverage strategy, under which most group members are treated, versus a low-coverage strategy, under which few are. This nested randomization is exactly what supplies the variation needed to separate own-treatment effects from neighborhood effects, because it varies the treatment intensity of a unit’s environment while still randomizing the unit’s own status.
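Within this design, define average potential outcomes under a given allocation strategy in two steps. For the direct, indirect, and total effects, hold the individual’s own treatment fixed and average over the random assignment of others in the group consistent with that strategy; for the overall effect, average additionally over the individual’s own treatment. The four effects are then differences of these averages.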
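In symbols, and with notation introduced here for illustration (\(\psi\) for an allocation strategy, \(\Pr_\psi\) for the assignment distribution it induces, and \(\mathbf{D}_{-i}\) restricted under partial interference to the other members of \(i\)’s group), the two kinds of average can be sketched as \[ \bar{Y}_i(d; \psi) = \sum_{\mathbf{d}} Y_i(D_i = d, \mathbf{D}_{-i} = \mathbf{d}) \, \Pr\nolimits_\psi(\mathbf{D}_{-i} = \mathbf{d} \mid D_i = d), \qquad \bar{Y}_i(\psi) = \sum_{d \in \{0,1\}} \bar{Y}_i(d; \psi) \, \Pr\nolimits_\psi(D_i = d). \] The first holds own treatment at \(d\) and averages over the configurations of others that the strategy can produce; the second averages over own treatment as well. Population-level versions follow by averaging over units within groups and then over groups.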
The direct effect compares treated versus untreated individuals holding the group’s allocation strategy fixed. It isolates the effect of a unit’s own treatment within a given environment, and it is the interference-aware analogue of the classical average treatment effect.
The indirect (spillover) effect compares untreated individuals under a high-coverage strategy with untreated individuals under a low-coverage strategy. Because these individuals are themselves untreated in both arms, the contrast is purely the effect of the surrounding environment, which is the clean operationalization of spillover.
The total effect compares treated individuals under the high-coverage strategy with untreated individuals under the low-coverage strategy. It is the sum of a direct and an indirect component and answers the question of how much better off a treated unit in a heavily treated environment is relative to an untreated unit in a lightly treated one.
The overall effect compares the average outcome in a group under the high-coverage strategy with the average outcome under the low-coverage strategy, integrating over the treated and untreated members in their strategy-implied proportions. It is the policy-relevant summary for a planner choosing a coverage level for an entire group.
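The four contrasts can be computed from a simulated two-stage experiment. The sketch below is illustrative only: the coverage probabilities (0.7 and 0.3), the outcome model, and the helper `ybar` are assumptions made here, stage two uses independent Bernoulli draws so that a unit’s peers are assigned independently of its own status, and the estimators are simple pooled cell means instead of the group-averaged estimators of Hudgens and Halloran (2008).

```r
set.seed(42)
n_clusters <- 400
m <- 10
cluster <- rep(seq_len(n_clusters), each = m)

# Stage 1: randomize half of the clusters to the high-coverage strategy.
high_cluster <- sample(rep(c(TRUE, FALSE), each = n_clusters / 2))
high <- rep(high_cluster, each = m)

# Stage 2: independent Bernoulli assignment at the cluster's coverage probability
# (0.7 under high coverage, 0.3 under low coverage; both values are hypothetical).
D <- rbinom(n_clusters * m, 1, ifelse(high, 0.7, 0.3))

# Made-up outcome model: own effect 1, spillover 2 per unit of leave-one-out treated fraction.
peer_frac <- (ave(D, cluster, FUN = sum) - D) / (m - 1)
Y <- 0.5 + 1.0 * D + 2.0 * peer_frac + rnorm(n_clusters * m)

# Mean outcome among units with own treatment d in clusters with strategy h.
ybar <- function(d, h) mean(Y[D == d & high == h])

effects <- c(
  direct_high = ybar(1, TRUE)  - ybar(0, TRUE),
  direct_low  = ybar(1, FALSE) - ybar(0, FALSE),
  indirect    = ybar(0, TRUE)  - ybar(0, FALSE),
  total       = ybar(1, TRUE)  - ybar(0, FALSE),
  overall     = mean(Y[high])  - mean(Y[!high])
)
round(effects, 2)
```

Under this made-up model the implied values are a direct effect of 1, an indirect effect of \(2 \times (0.7 - 0.3) = 0.8\), a total effect of 1.8, and an overall effect of 1.2, and the estimated total equals the estimated high-coverage direct effect plus the estimated indirect effect by construction.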
These four estimands make a point about design precise: the choice of randomization scheme should be driven by which effect the researcher cares about. If only the direct effect is of interest and spillovers are a nuisance to be averaged out, individual-level complete randomization suffices, because a difference-in-means estimator under complete randomization recovers an average direct effect by averaging over the realized configurations of others. If the total effect is the target and partial interference holds at the cluster level, a cluster-randomized design, in which whole clusters are assigned to treatment or control, identifies it directly, because comparing fully treated clusters with fully untreated clusters bundles direct and spillover effects together. Only when the indirect effect itself, the spillover holding own treatment fixed, is the object of inquiry does one need the full two-stage design, because only the two-stage design varies neighborhood treatment intensity while independently randomizing own treatment. Baird et al. (2018) study the optimal allocation of units across the two stages, showing how the relative precision of the four effects depends on the number of clusters, the cluster size, and the saturation levels chosen, and providing power-calculation tools for partial-population (saturation) experiments of this kind. Sobel (2006) gives an early and influential demonstration, in the context of a housing-mobility experiment, that when interference is ignored the difference-in-means estimator can be a biased and substantively misleading estimate of any single well-defined effect.
A short simulation makes the bias from ignoring spillover concrete. Consider a cluster-structured population in which a unit’s outcome rises with its own treatment and also with the fraction of treated peers in its cluster. A naive analyst who regresses the outcome on own treatment alone, ignoring the peer term, recovers something between the direct effect and the total effect, depending on how treatment correlates with peer exposure under the design.
set.seed(2026)
n_clusters <- 200
cluster_size <- 10
n <- n_clusters * cluster_size
cluster <- rep(seq_len(n_clusters), each = cluster_size)
# True structural parameters.
tau_direct <- 1.0 # own-treatment effect
tau_spill <- 2.0 # effect of cluster-level treated fraction
# Independent Bernoulli assignment within each cluster, with the cluster's
# treatment probability (saturation) drawn at random (two-stage flavor).
saturation <- rep(runif(n_clusters, 0.1, 0.9), each = cluster_size)
D <- rbinom(n, size = 1, prob = saturation)
# Treated fraction among the OTHER members of the cluster (leave-one-out).
treated_in_cluster <- ave(D, cluster, FUN = sum)
peer_frac <- (treated_in_cluster - D) / (cluster_size - 1)
# Outcome with direct effect plus genuine spillover.
Y <- 0.5 + tau_direct * D + tau_spill * peer_frac + rnorm(n, sd = 1)
# Naive estimator: own treatment only.
naive <- coef(lm(Y ~ D))["D"]
# Interference-aware estimator: own treatment and peer exposure.
aware <- coef(lm(Y ~ D + peer_frac))[c("D", "peer_frac")]
round(c(naive_direct = unname(naive),
        aware_direct = unname(aware["D"]),
        aware_spill = unname(aware["peer_frac"])), 3)
#> naive_direct aware_direct  aware_spill
#>        1.436        0.969        2.111
The naive coefficient is biased relative to the true direct effect of one because own treatment is positively correlated with peer exposure under the random saturation, while the specification that conditions on peer exposure recovers both the direct effect near one and the spillover near two. The example is deliberately simple, with a correctly specified exposure (the leave-one-out treated fraction); the lesson generalizes, but so does the warning that the recovery depends on getting the exposure mapping right.
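The size of the naive bias can be checked by hand with the omitted-variable formula. Writing \(P_i\) for the leave-one-out peer fraction and \(s\) for a cluster’s saturation (notation introduced here for illustration), the naive slope converges to \[ \tau^{\text{direct}} + \tau^{\text{spill}} \, \frac{\operatorname{Cov}(D_i, P_i)}{\operatorname{Var}(D_i)}. \] In the simulation, two members of a cluster have correlated treatments only through their shared saturation, so \(\operatorname{Cov}(D_i, P_i) = \operatorname{Var}(s) = 0.8^2 / 12 \approx 0.053\) for \(s\) uniform on \([0.1, 0.9]\), and the marginal treatment probability is 0.5, so \(\operatorname{Var}(D_i) = 0.25\). The ratio is about 0.213, and the implied limit of the naive coefficient is \(1 + 2 \times 0.213 \approx 1.43\), in line with the reported estimate of 1.436.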
28.4 Peer Effects and the Reflection Problem
The experimental designs above achieve identification by manufacturing exogenous variation in a unit’s environment. In observational data, where the environment is chosen rather than assigned, identifying spillover as a causal peer effect is far harder, and the central obstacle was named by Manski (1993): the reflection problem. Manski distinguishes three reasons a person’s outcome may move with the average outcome of a reference group. Endogenous effects arise when a person’s behavior responds to the behavior of the group, which is the genuine peer effect of interest. Contextual (exogenous) effects arise when behavior responds to the group’s fixed characteristics rather than its behavior. Correlated effects arise when group members behave similarly because they face common environments or because similar people sort into the same group. The reflection problem is that, in the standard linear-in-means model, the endogenous effect cannot be separated from the contextual effect, because mean outcome and mean characteristics move together in a way that is collinear, much as a person and their mirror image move in lockstep.
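The collinearity can be seen in a stripped-down linear-in-means specification. The notation below (group \(g\), scalar characteristic \(x_i\), coefficients \(\alpha, \beta, \gamma, \delta\)) is introduced here for illustration and sets correlated effects aside; it is a sketch of the argument, not a restatement of the full model in Manski (1993). \[ y_i = \alpha + \beta \, E[y \mid g] + \gamma \, x_i + \delta \, E[x \mid g] + \varepsilon_i, \qquad E[\varepsilon_i \mid x_i, g] = 0, \] where \(\beta\) is the endogenous effect and \(\delta\) the contextual effect. Taking expectations within the group and solving for the group mean (with \(\beta \neq 1\)) gives \[ E[y \mid g] = \frac{\alpha + (\gamma + \delta) \, E[x \mid g]}{1 - \beta}, \] and substituting back yields the reduced form \[ y_i = \frac{\alpha}{1 - \beta} + \gamma \, x_i + \frac{\beta\gamma + \delta}{1 - \beta} \, E[x \mid g] + \varepsilon_i. \] The mean outcome is an exact linear function of the mean characteristic, so the data determine only three reduced-form coefficients while the model has four structural parameters. The endogenous effect \(\beta\) and the contextual effect \(\delta\) enter only through the composite coefficient on \(E[x \mid g]\) and cannot be separated without further restrictions.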
The implication is that a regression of an individual outcome on the group-average outcome, however natural, generally does not identify a causal peer effect. Moffitt (2001) develops the consequences for policy analysis and shows how partial population (saturation) interventions, which exogenously vary the share of a group exposed to treatment, can break the impasse precisely because they supply variation in group behavior that is not a deterministic function of group characteristics. This is the observational counterpart of the two-stage experimental logic above. Bramoullé et al. (2009) show that the reflection problem is not insurmountable when peer groups are defined by a social network rather than by a single shared reference group: when two individuals are linked but one’s contacts are not all the other’s contacts, the network’s intransitivity furnishes instruments (the characteristics of a friend’s friends who are not one’s own friends) that separate endogenous from contextual effects under stated conditions. Lee (2007) develops related identification results for the linear-in-means model when group sizes vary, using the resulting nonlinearity to achieve identification, and Goldsmith-Pinkham and Imbens (2013) bring this network-econometrics apparatus to social and economic applications while emphasizing how fragile the conclusions are to the assumed network and to unmodeled correlated effects.
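The friends-of-friends instrument can be illustrated with a small simulation. Everything in the sketch below is an assumption made for illustration: the random directed network in which each unit names four others, the parameter values, and the hand-rolled just-identified IV estimator. It is not the estimator or the data of Bramoullé et al. (2009), only an instance of the idea that \(G^2 x\) can instrument for \(G y\) when the network is intransitive.

```r
set.seed(11)
n <- 1000

# Hypothetical directed network: each unit names 4 others at random; G is row-normalized.
G <- matrix(0, n, n)
for (i in seq_len(n)) G[i, sample(seq_len(n)[-i], 4)] <- 1
G <- G / rowSums(G)

# Made-up structural parameters of the linear-in-means model.
alpha <- 1; beta <- 0.4; gamma <- 1; delta <- 0.5
x   <- rnorm(n)
eps <- rnorm(n, sd = 0.5)
Gx  <- as.vector(G %*% x)    # mean characteristic of a unit's contacts
G2x <- as.vector(G %*% Gx)   # mean characteristic of the contacts' contacts

# Solve y = alpha + beta * G y + gamma * x + delta * G x + eps for the equilibrium outcomes.
y  <- as.vector(solve(diag(n) - beta * G, alpha + gamma * x + delta * Gx + eps))
Gy <- as.vector(G %*% y)

# OLS treats the contacts' mean outcome as exogenous.
ols <- coef(lm(y ~ Gy + x + Gx))

# Just-identified IV: G^2 x instruments for G y; x and G x instrument for themselves.
X <- cbind(const = 1, Gy = Gy, x = x, Gx = Gx)
Z <- cbind(const = 1, x = x, Gx = Gx, G2x = G2x)
iv <- setNames(as.vector(solve(crossprod(Z, X), crossprod(Z, y))), colnames(X))

round(rbind(ols = ols, iv = iv), 3)
```

The IV row should place the coefficient on `Gy` near the assumed endogenous effect of 0.4, up to sampling noise. The result rests on the network being correctly measured and on the absence of correlated effects, neither of which the simulation puts under strain.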
Two cautions deserve emphasis because they recur throughout applied work. The first is endogenous sorting: people choose their peers, so similarity in outcomes may reflect selection into groups rather than influence within them, and no amount of within-group regression cures a selection problem that operates on the group-formation margin. The second is the sensitivity of network-based identification to the measured network. Links are typically observed with error, the relevant network may differ from the measured one, and small changes in who is assumed connected to whom can move estimates substantially. The credible studies in this literature lean on exogenous group assignment, randomized saturation, or plausibly exogenous network variation rather than on functional form alone.
28.5 Network and Connectedness Measures
When interference operates through a network of economic linkages rather than through a designed experiment, a separate tradition measures the structure and intensity of spillovers directly from observed comovement. The connectedness framework of Diebold and Yilmaz, set out in Diebold and Yilmaz (2009) and refined in Diebold and Yilmaz (2012) and Diebold and Yılmaz (2014), builds spillover measures from the forecast-error variance decomposition of a vector autoregression. The idea is that if a shock to one variable, say one financial institution’s return volatility, explains a large share of the forecast-error variance of another, then the first is connected to, and spills over onto, the second. Aggregating these pairwise shares yields a total connectedness index summarizing system-wide spillover, while row and column sums give directional measures of how much each unit transmits to and receives from the rest of the system, and the difference between a unit’s transmitted and received spillovers is its net connectedness.
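The aggregation step is simple matrix arithmetic once a variance decomposition is in hand. The sketch below starts from a hypothetical three-variable decomposition matrix whose rows sum to one. The numbers are invented for illustration, and estimating the VAR and computing the decomposition itself are outside the sketch.

```r
# Hypothetical forecast-error variance decomposition for three series.
# Entry [i, j]: share of series i's forecast-error variance attributed to shocks in series j.
fevd <- matrix(c(0.80, 0.15, 0.05,
                 0.30, 0.60, 0.10,
                 0.10, 0.20, 0.70),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
stopifnot(all(abs(rowSums(fevd) - 1) < 1e-12))

# Keep only the cross-variable shares.
off <- fevd
diag(off) <- 0

from_others <- rowSums(off)              # spillover received by each series
to_others   <- colSums(off)              # spillover transmitted by each series
net         <- to_others - from_others   # net connectedness
total       <- sum(off) / nrow(fevd)     # total connectedness index (as a share)

rbind(from_others, to_others, net)
total
```

In this invented example series A is a net transmitter (0.40 transmitted against 0.20 received), the net measures sum to zero across the system, and the total index is 0.30, or 30 when expressed on a percentage scale.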
Diebold and Yılmaz (2014) reframe these variance-decomposition quantities explicitly in the language of network topology, with the connectedness matrix playing the role of a weighted, directed adjacency matrix and the directional measures corresponding to node-level in- and out-degrees. This vantage point links the time-series spillover literature to the network-interference perspective of the rest of the chapter: in both, a weighted directed graph encodes who affects whom, and the questions of interest concern how shocks or treatments propagate along its edges. The financial-economics applications of this apparatus, to equity and volatility spillovers across institutions and across markets, are well developed, and the framework is attractive precisely because it requires only observational time-series data and a VAR rather than an experiment. The cost is the corresponding caution: the variance decomposition rests on an identification scheme (a Cholesky-based decomposition depends on the ordering of variables in the VAR, and a generalized decomposition removes the ordering dependence only by giving up orthogonal shocks), so connectedness measures describe predictive and dynamic association rather than the manipulation-based causal effects that the experimental sections of this chapter target, and they should be interpreted accordingly.
28.6 Interference in Quasi-Experimental and Observational Settings
The experimental theory above assumes the analyst controls assignment. The harder and more common situation is that interference contaminates a quasi-experimental design built on the no-interference assumption. This section, which the difference-in-differences and quasi-experimental chapters cross-reference, collects the main threats and the remedies that the published literature has developed.
28.6.1 Spillovers Contaminating Difference-in-Differences Control Units
Difference-in-differences identifies a treatment effect by differencing the post-pre change in treated units against the post-pre change in control units, under a parallel-trends assumption (Section 37.13). That logic presumes the controls are untouched by the treatment. When treatment spills over onto controls, for example when an intervention in one region draws customers, patients, or workers from a neighboring region used as a control, the control change no longer measures the counterfactual trend. The contamination biases the estimate, and the direction of the bias depends on the sign of the spillover. A positive spillover onto controls makes the treated-minus-control difference understate the true effect, because the control group improves partly because of the treatment; a negative spillover (displacement or substitution onto controls) makes it overstate the effect.
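The direction of the bias follows from one line of algebra. Write \(\lambda\) for the common trend, \(\tau\) for the effect on the treated, and \(s\) for the average spillover onto control units (symbols introduced here for illustration), and suppose parallel trends would hold in the absence of any treatment. Then the DiD contrast is \[ E[\Delta Y \mid \text{treated}] - E[\Delta Y \mid \text{control}] = (\lambda + \tau) - (\lambda + s) = \tau - s. \] The bias is exactly \(-s\): with \(\tau > 0\), a spillover in the same direction (\(s > 0\)) shrinks the estimate, and displacement (\(s < 0\)) inflates it.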
The remedy in the spatial-economics tradition is to model the spillover rather than to assume it away. A common approach estimates the treatment effect together with the geographic reach of spillovers by partitioning the control units into rings of increasing distance from treated units, including ring indicators in the regression, and using only the most distant, plausibly unaffected units as the true comparison group. The fitted ring coefficients trace out how the spillover decays with distance, which both purges the main effect of contamination and turns the spillover itself into an object of interest. The broader message is that a credible DiD under suspected interference does not simply assert clean controls; it either selects controls far enough from treatment that contamination is negligible or it explicitly models the spillover gradient and reads the treatment effect off the uncontaminated tail.
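A minimal simulation of the ring approach follows, under assumptions made here for illustration: a two-period setting summarized by the post-minus-pre change `dY`, a made-up exponential decay of the spillover with distance, and arbitrary ring boundaries at 5, 10, and 20 km. None of these choices comes from a particular study.

```r
set.seed(7)
n <- 6000
treated <- rbinom(n, 1, 0.2)

# Hypothetical distance (km) from each control unit to the nearest treated unit.
dist_km <- ifelse(treated == 1, 0, runif(n, 0, 30))

# Made-up process for the post-minus-pre change: common trend 1, treatment effect 2,
# and a spillover onto controls that decays with distance.
spill <- ifelse(treated == 1, 0, 1.5 * exp(-dist_km / 5))
dY <- 1 + 2 * treated + spill + rnorm(n)

# Naive DiD: all controls pooled into a single comparison group.
naive <- coef(lm(dY ~ treated))[["treated"]]

# Ring DiD: controls split by distance; the most distant ring is the reference group.
ring <- cut(dist_km, breaks = c(0, 5, 10, 20, 30),
            labels = c("ring_0_5", "ring_5_10", "ring_10_20", "far"))
group <- ifelse(treated == 1, "treated", as.character(ring))
group <- relevel(factor(group), ref = "far")
ring_fit <- coef(lm(dY ~ group))

round(c(naive = naive, ring_fit), 2)
```

The pooled comparison should fall short of the assumed effect of 2 because nearby controls are lifted by the spillover, while the coefficient on `grouptreated` in the ring specification should sit close to 2 and the ring coefficients should decline with distance. The approach is only as good as the assumption that the outermost ring is unaffected.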
28.6.2 Spillovers as a Threat to the Exclusion Restriction in IV
Instrumental variables identify a causal effect under an exclusion restriction: the instrument affects the outcome only through the treatment. Interference quietly breaks this. If unit \(i\)’s instrument shifts unit \(i\)’s treatment, and unit \(i\)’s treatment then affects unit \(j\)’s outcome through a spillover, then \(i\)’s instrument is no longer excludable from \(j\)’s outcome, because instruments assigned to \(j\)’s neighbors move \(j\)’s outcome through the neighbors’ treatment uptake. In an encouragement design, encouragement assigned to one person can raise a friend’s take-up or directly change the friend’s outcome, so the standard local-average-treatment-effect interpretation, which presumes each unit’s outcome responds only to its own treatment and instrument, no longer holds. The two-stage randomized encouragement designs of the partial-interference literature are in part a response to exactly this: by randomizing encouragement at two levels they make the spillover an estimand rather than a violated assumption. In observational IV work the practical implication is that one must argue, not assume, that the instrument has no cross-unit pathway to the outcome, and that the relevant units are isolated enough (geographically, socially, or in market terms) that one unit’s instrument does not move another’s outcome.
28.6.3 Spillovers as a Threat to Matching
Matching and other selection-on-observables methods (Section 33) estimate a treatment effect by comparing treated units to observationally similar control units, invoking SUTVA so that a matched control’s outcome stands in for the treated unit’s untreated counterfactual. Interference undermines this in two ways. First, if treated units exert spillovers on nearby control units, the matched controls are partially treated, so the comparison is biased just as in the DiD case: toward zero when the spillover has the same sign as the treatment effect, and away from zero when the treatment displaces outcomes onto the controls. Second, a unit’s appropriate counterfactual under interference is not merely “the same unit untreated” but “the same unit untreated in a comparable treatment environment,” and the standard matching estimand has no place for the environment. Credible practice therefore matches not only on individual covariates but on exposure-relevant features of a unit’s neighborhood, or restricts the control pool to units whose neighborhoods are genuinely untreated, so that the matched comparison holds the spillover environment fixed rather than averaging over an uncontrolled mixture of environments.
28.6.4 A General Stance
Across all three designs the recurring theme is that no-interference is an identifying assumption on the same footing as parallel trends, exclusion, or ignorability, and it deserves the same scrutiny. The disciplined responses are to design or select the comparison so that interference is plausibly absent (isolated controls, distant units, separated markets), to model the spillover explicitly and recover the uncontaminated effect from its tail or its gradient, or to redefine the estimand in interference-aware terms (direct, indirect, total, overall) and adopt a design that identifies the target. Empirical work that takes interference seriously, from the deworming externalities of Miguel and Kremer (2004) in development economics to spillovers in finance and management documented by Elenev et al. (2024) and Roche et al. (2024), consistently finds that the spillover is not a nuisance to be assumed away but often the most economically interesting part of the answer.
28.7 Summary
Interference is the rule rather than the exception in social science, and treating it as a violation to be assumed away discards both validity and substance. The no-interference half of SUTVA, once relaxed, forces potential outcomes to depend on the full assignment vector, and exposure mappings are the device that makes this dependence estimable by reducing it to a low-dimensional summary. Partial interference and two-stage randomized designs identify a coherent quartet of direct, indirect, total, and overall effects, and the choice among individual, cluster, and two-stage randomization should follow from which of these the researcher wants. In observational peer-effects work the reflection problem warns that group-average outcomes do not by themselves identify influence, and network structure or randomized saturation is needed to separate endogenous from contextual and correlated effects. Connectedness measures built from variance decompositions quantify spillover in time-series and financial systems where experiments are impossible, at the cost of a predictive rather than manipulation-based interpretation. Finally, interference is a first-order threat to the quasi-experimental designs that anchor the rest of the book: it contaminates difference-in-differences controls, breaks the IV exclusion restriction across units, and corrupts matched comparisons, and in each case the remedy is to confront the spillover by design, by explicit modeling, or by redefining the estimand, rather than to assume it does not exist.