Policy Evaluation
PSC Researchers identified with this theme are listed below.
We have been operating under the signature theme “Population processes and policy evaluation,” and in our prior application observed that “[a]lmost all of the PSC research is informative for policy evaluation.” This is still true, but now we emphasize a strong organizational and intellectual current within the PSC that cuts across disciplines and Schools: the evaluation of social, economic, and health policies. Little policy evaluation is of population policy, since few countries have population policies per se. (Smith’s study of the Chinese family planning program and Watkins’s work on family planning policy in Kenya and Malawi are exceptions.) But there are many policies whose effects are best evaluated at the population level. This is less a matter of scale than an acknowledgement of another form of population heterogeneity: that effects of treatments (policies) tend to vary across individuals within populations. This insight has led to a reorientation in thinking about how to measure program and policy effects.
At the theoretical level: Rosenbaum, in the Journal of the American Statistical Association and elsewhere, has formalized the role of reasoning about causal mechanisms in studies intended to judge the effects of policy experiments, and has explained the impact of interference between units in randomized trials, an important, under-studied threat to causal inference. Berk is the long-time editor of Evaluation Review and a proponent of increased use of experimentation in social policy evaluation; his work in the American Sociological Review and American Journal of Sociology on payments to ex-inmates and recidivism was the impetus for Smith’s Sociological Methodology paper on specification issues in experimentation, and his Journal of the American Statistical Association paper, with Penn criminologist L. Sherman, on randomized controlled trials for the policing of domestic violence is a classic. Wolpin has also been prominent in pointing to the untested or unremarked assumptions in a variety of “natural experiments,” including twin studies and birth date discontinuity designs (in the Journal of Economic Literature, with M. Rosenzweig). Behrman has for many years studied a host of human capital interventions across the life cycle in experimental and quasi-experimental settings, and his work integrating defensible outcome measurement with true cost for health and nutrition programs is part of the Copenhagen Consensus, in which the Danish government convened an Expert Panel to ask “What would be the best ways of advancing global welfare, and particularly the welfare of developing countries, [if] an additional $50 billion of resources were at governments’ disposal?” He is also on the Expert Panel for the “Consulta de San José” follow-up on Latin America sponsored by the Inter-American Development Bank.
Work on policy evaluation can be roughly organized along a continuum ranging from evaluation of existing policies and policy changes, to the design of experiments and other interventions to test policies, to the formal consideration of policies not yet implemented. Aiken and Clarke have NINR R01-supported work to study the impacts of changes in rules regarding nurse staffing ratios on patient mortality and labor force processes; comparisons over time and between CA, PA, and NJ suggest that lower patient:nurse ratios do lead to better outcomes. The need for representative measurement across large populations has led to collaborative research with Smith on mitigating non-response bias through intensive resampling of non-respondents; this work has large implications for a fundamental problem of population research: representativeness. In the Quarterly Journal of Economics, Stevenson uses variation in state implementation of divorce laws to show that access to unilateral divorce increases the likelihood that abusive relationships end. Jacobs, in Putting Poor People to Work, shows how 1990s “work first” policies have made college enrollment (human capital investment) more difficult for poor single mothers. D Culhane has published studies for Fannie Mae and HUD on homelessness and housing policy; these show, for example, that housing policy rules are more determinative of length of shelter stay in the amorphous homeless population than are family characteristics, including behavioral health problems. Hannum’s work in Gansu, China, a poor rural western region, details some of the unanticipated negative consequences for those “left behind” under the Chinese market reform policy.
PSC program scientists are active in the design and implementation of experiments for the evaluation of policy. Among those not previously mentioned are J Culhane, whose Philadelphia Collaborative Preterm prevention project features randomized trials on interventions in high-risk pregnancies (she also wrote in the New England Journal of Medicine on why treatments of periodontal disease in poor pregnant women did not affect birth weight outcomes); Jemmott, whose work concentrates on “scaling up” interventions geared to improving sexual health behavior in minority communities, from the clinic to the community level; and Kohler, Chao, Watkins, and Behrman, who are working on a variety of incentive schemes for intervening in the Malawi HIV/AIDS epidemic (small payments, for example, are a cost-effective means of encouraging individuals to learn their serostatus). “Re-use” of experimental results in the understanding of differential effects accruing to population heterogeneity figures in several papers or projects, including Mandell’s plan for using phenotypic variation in experimental response as a basis for seeking genetic differences in response to treatments for autism. Harknett (in Evaluation Review) used propensity score matching techniques to estimate non-experimental impacts on program participants within the context of an experimental research design. The receipt of an earnings supplement substantially increased union formation in one Canadian province but not the other. A subgroup analysis based on propensities of program participation revealed that the positive effect on unions was concentrated among relatively disadvantaged participants. Smith, in the Annals of the American Academy of Political and Social Science, on a township-level randomized trial involving new family planning policies, shows that the effect on contraceptive use could only be predicted given the relative wealth and political embeddedness of the counties participating.
The PROGRESA study was a complex, village-level randomized schooling subsidy experiment in Mexico; Behrman and Todd have been among the four international expert advisors since the program’s inception. This led to a series of well-estimated “dosage-response” effects on parental fertility and income, and on children’s grade attainment, labor force participation, and wages. In a paper in the American Economic Review, Todd and Wolpin ask whether a dynamic behavioral model of parental decisions about fertility and child schooling, estimated using data on families in the randomized-out control group and in the treatment group prior to the experiment (neither of which received any subsidy), could predict the effects of school subsidies according to the schedule that was implemented under the program. It could, and the advantage of a dynamic behavioral model concordant with the experimental results is that such a model, unlike an experiment, can be used to forecast long-term program impacts beyond the life of the experiment and to assess the impact of a variety of counterfactual policies. This is also the goal of other studies that lack experimental results for validation. De Paula and Todd are using data from Massey’s Mexican Migration Project on payments made to coyotes, smugglers who transport migrants across the border, to infer the willingness-to-pay of potential immigrants in relation to the probability of success in border crossing. Inference is then made to potential migratory policies in both countries. Fernández-Villaverde’s dynamic stochastic general equilibrium models integrate formal modeling with empirical data to understand the dynamics of how society reacts to changing government policies, and to help the government anticipate the consequences of its decisions. Ríos-Rull uses macroeconomic modeling to evaluate policies, such as social security programs viewed as insurance programs (this is not one of social security’s stronger features).
Pauly, in the American Economic Review, has sought to infer the health benefits relative to costs of extension of US insurance coverage; Mitchell works on similar issues regarding pension rules in a number of countries; and their study (with Todd and Behrman) of the Chilean social prevention programs involves not only measurement of effects in Chile, but also the development of models for inferring results elsewhere.
Vision for future. This theme is a great strength of the PSC and has emerged more or less “organically,” as thinking about observation, experimentation, and validation has evolved in a variety of fields. There are two main challenges. First, we need to maintain a fairly high level of rigor regarding what constitutes policy evaluation, since it is a conceit of researchers everywhere that all research is “policy-relevant.” We want to continue to focus on measurement at the population level that takes into account response heterogeneity. Second, we need to work on both consensus formation and understanding of differences in the application of evaluation methods such as randomized trials along the axis that runs from economics to statistics to medicine; at one end, there is a deep interest in heterogeneity and the modeling of process, and at the other, a tendency to see randomized controlled trials as a “gold standard,” perhaps because successful innovations are likely to be applied universally across populations. Given existing collaborations across Schools, this is likely to continue as a strong integrating intellectual theme at the PSC. PSC scientists play a central role in Penn’s proposal for hosting a major new international developing-country policy initiative, the International Initiative for Impact Evaluation (3IE), funded by the Gates and Hewlett Foundations and a number of governments. Half of the interdisciplinary inter-School Penn team are PSC program scientists (Aiken, Behrman, Preston).
PSC Research Associates involved in this area of research:
Aiken, Behrman, Berk, Chao, Culhane D, Culhane J, De Paula, Fernández-Villaverde, Hannum, Harknett, Jacobs, Jemmott, Kohler H-P, Mandell, Mitchell, Pauly, Polsky, Preston, Ríos-Rull, Rosenbaum, Smith, Stevenson, Todd, Watkins, Wolpin
PSC Students interested in this research area:
Appiah-Yeboah, Bangha, Chae, Chin, Jennings, Kuperberg, Miranda, Myrskylä, Ruther, Schott, Tesfai

