New Directions. Two research areas that build on existing strengths and have sufficient critical mass (projects and interest) to merit special PSC support during the new funding cycle are:
Population and the environment. Three projects are under way that are distinguished not only by a demographic or population perspective on some aspect of the environment but also by their multidisciplinary character, which is bringing new faces and new expertise to the PSC. Hannum and MacDonald will coordinate an interest group within the PSC, integrate it into the Energy, Sustainability, and Environment initiative of SAS, and increase the PSC’s exposure in this area within the wider population community.
Air pollution and child health. Hannum has been working with Behrman and others (including PSC postdoc X Liu, I Blair from the Department of Pharmacology, X Chen from the School of Education, H Chang and Y Liu from Emory, and Chinese colleagues) on a study of the impact of prenatal exposure to ambient air pollution on fetal and child development in South China. Birth cohort data are rare in China, and samples drawn from large geographic regions with wide variation in pollution exposure and other spatial characteristics are rarer still. The project will collect prospective birth cohort data from a hospital system that spans cities across South China. Prenatal exposures will be measured by matching the sample to data from the network of public air monitoring stations and by collecting non-invasive biomarkers. These data will permit investigation of how air pollution exposure during the prenatal period affects birth outcomes and early child development, and of which parental investments in children’s development and well-being ameliorate (or exacerbate) the effects of these environmental hazards. The study has received substantial Penn funding, including from the Penn-Wharton China Center and from the PSC, and an application is under review at NSF.
Socio-Spatial Carbon Collaborative. This project will establish a household-based, neighborhood-level carbon footprint database for the United States, using a novel machine-learning approach that integrates household surveys (e.g., ACS, CES) with a variety of geospatial and environmental datasets. This will allow greenhouse gas (GHG) emissions footprints to be estimated with unprecedented spatial, sectoral, and demographic resolution (e.g., the gasoline carbon footprint of a low-income black household in Philadelphia). The approach will clarify how carbon moves spatially through the economy, the built environment, and everyday life; how these processes intersect with a range of social and spatial inequalities that also shape well-being; and how potential carbon pricing would affect different communities. Cohen has received Penn funding from the PSC, the Kleinman Center for Energy Policy, the Fels Policy Research Initiative, and Perry World House, and the PSC is supporting the preparation of an application to NSF.
Built environment and well-being. Multiple theories posit that visible environmental disorder, such as abandoned buildings, leads to community decline by signaling that a community is uncared for and that incivilities are tolerated, thereby eroding residents’ shared expectations of social control over neighborhood problems. As a result, residents are deterred from engaging in positive health behaviors, while unhealthy behaviors, such as substance abuse and violence, become sheltered and more prevalent. MacDonald, with former Penn colleague C Branas (Columbia) and numerous others in SAS and in the Schools of Medicine and Design, has shown that vacant and abandoned properties are associated with drug dependence, firearm violence, stress, sexually transmitted diseases, and premature mortality (e.g., AJPH, 2016). Under R01 AA024941, abandoned houses across Philadelphia are being randomly assigned to a standard, reproducible remediation protocol to study whether such policies reduce morbidity (including stress), mortality, and disability related to substance abuse and violence.
“Big Data” and population science. It is not a matter of the PSC moving toward “big data.” “Big data” is (are?) coming to (and at) us. Some of the issues are technical, which is why we are placing organizational responsibility for this initiative in the CIT Core. However, technical issues may be hiding more fundamental ones. Suppose that “big data” is defined as data with more observations and/or variables than can be effectively handled with conventional hardware and/or software. Currently, running machine learning (ML) on 500,000 observations and 200 variables takes days on a very fast desktop using R. With newer R software, it would take much of a day; on a cluster with 100 CPUs, perhaps an hour. It is all relative to existing technology, and technology is changing rapidly. In the meantime, why use big data when you can sample? Because (a) many ML procedures are sample-size dependent (more data, more accuracy) in a way that conventional inference and estimation are not, and (b) large samples make it possible to study relatively rare cases, as in Berk’s path-breaking work on forecasting individual criminal behavior. How does this fit with the population sciences, whose intellectual origins lie in using large amounts of information to detect statistical regularities between and within groups? ML raises many deep questions, which show up in different ways across the “big data” ambit as it is framed within the population sciences. To what extent are the population sciences only (or primarily) about accurate prediction? If prediction is sufficiently good, does “explanation” still count? If not, why not? What is ML telling us about the “truth” of the various functions f(X) that we routinely specify and estimate? What does ML reveal about the ignorance (or wisdom) built into conventional models, and what does it tell us about conventional statistical inference? Further, the choice of loss function can raise important ethical considerations. What are they, and what should we do about them? Errors in data are not just a technical matter of misinformation; the nature of that misinformation can have ethical (and legal) consequences. What role should judgment and subject-matter expertise play in a statistical analysis? Other fundamental questions take different tacks (e.g., what populations do administrative data represent [and when does this matter]?), and there are extensive procedural issues as well (data linkage, data availability, data governance, and data confidentiality). As with population and the environment, the aim during the upcoming cycle is not a PSC “course correction,” but rather to use the PSC’s central position to create more positive externalities for the new activities of many Research Associates, and to take advantage of growing connections with other fields. For example, Berk and Fernández-Villaverde, the PSC Research Associates most involved in the technical and theoretical aspects of ML and “big data,” who will be sponsoring knowledge-sharing activities, are both working with Penn colleagues in computer science.