True experimental research designs offer the most plausibly unbiased estimates of a treatment effect, but experiments are often infeasible or too expensive. Thus, we are often relegated to using observational data. Identifying the causal impact of some “treatment” variable T on a dependent variable Y, however, is very challenging when one is using observational data. Hume (1748, Sec. VII) specified the requirements for a causal inference, arguing that, “We may define a cause to be an object followed by another, and where all the objects similar to the first, are followed by objects similar to the second. Or, in other words, where, if the first object had not been, the second never had existed.” In modern language, we need to demonstrate that Y occurs if and only if T occurs. Observational studies, or quasi-experiments, in which the researcher neither manipulates T nor controls the assignment of T to the treated and control group(s), cause various problems. The treatment may not be the same for all units, and the level of the treatment may be confounded with a unit's reaction to the treatment or with its reaction to the level of treatment given to another unit. Violations of stability, or spillover, are therefore important here, as is ignorable treatment assignment, which is the equivalent of randomization. Nevertheless, estimating the effectiveness of a specific policy, program, rule, law, or constitution is the goal of much of the empirical research in political science, public policy, and law. In general, estimating the effect of some “treatment” variable is the explicit or implicit goal of much research.
Rosenbaum (2002): An observational study concerns treatment effects. A study without a treatment -- often called an intervention or a program -- is neither an experiment nor an observational study. Most public opinion polls, forecasting of financial returns, investigations of fairness and discrimination, and many other important empirical studies are neither experiments nor observational studies. Observational studies can employ data from nonexperimental, nonobservational studies as long as the focus is on assessing a treatment.
We will begin with a discussion of the Neyman-Rubin-Holland counterfactual framework, starting with the Rubin Causal Model. We will then relate the Rubin Causal Model to the Campbell Causal Model, the Heckman Selection Model, the Pearl Causal Model, and the White Causal Model to analyze both similarities and differences. As Shadish, Cook, and Campbell (2002, pp. 13-14) described:
Quasi-experiments share with all other experiments a similar purpose -- to test descriptive causal hypotheses about manipulable causes -- as well as many structural details, such as the frequent presence of control groups and pretest measures, to support a counterfactual inference about what would happen in the absence of treatment. But, by definition, quasi-experiments lack random assignment. Assignment to conditions is by means of self-selection ... or means of administrator selection ... others decide which persons should get which treatment.
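To illustrate why the absence of random assignment matters, the following is a minimal simulation sketch, not a method taken from the course materials. It assumes hypothetical units whose treated outcome is their untreated outcome plus a constant effect of 2.0, and it compares a coin-flip assignment with a self-selection rule in which units with an above-average untreated outcome choose treatment. All function names and parameter values are illustrative assumptions.

```python
import random


def make_units(n, effect=2.0, seed=42):
    """Return n hypothetical units as (untreated outcome, treated outcome) pairs.

    Assumption: the treatment adds the same constant `effect` to every unit.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y_untreated = rng.gauss(0.0, 1.0)
        units.append((y_untreated, y_untreated + effect))
    return units


def diff_in_means(units, assign):
    """Naive estimate: mean outcome of the treated minus mean outcome of the controls.

    `assign` takes a unit's untreated outcome and returns True if the unit is treated.
    Only the outcome that matches the assignment is used, as in real data.
    """
    treated, control = [], []
    for y_untreated, y_treated in units:
        if assign(y_untreated):
            treated.append(y_treated)
        else:
            control.append(y_untreated)
    return sum(treated) / len(treated) - sum(control) / len(control)


units = make_units(20_000)

# Random assignment: a coin flip that ignores everything about the unit.
coin = random.Random(7)
randomized = diff_in_means(units, lambda y_untreated: coin.random() < 0.5)

# Self-selection (assumed rule): units that would do well anyway opt in.
self_selected = diff_in_means(units, lambda y_untreated: y_untreated > 0.0)

print(f"assumed true effect:      2.00")
print(f"randomized estimate:      {randomized:.2f}")
print(f"self-selected estimate:   {self_selected:.2f}")
```

Under these assumptions the randomized comparison should land near the assumed effect, while the self-selected comparison overstates it, because the treated group differs from the control group even in the absence of treatment.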
The Rubin Causal Model (RCM) begins with the fundamental problem of causal inference (FPCI). The RCM assumes that each unit being studied has two potential outcomes: one if the unit is treated and the other if it is untreated. A causal effect is defined as the difference between the two potential outcomes. The problem is that only one of the two potential outcomes is ever observed for any given unit. Rubin and others developed the model into a general framework for causal inference with implications for observational research. The FPCI is fundamentally a missing data problem, and all of the techniques discussed in this course are research designs that, given the study, allow us to come up with a proxy value for the missing data.
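The missing-data view of the FPCI can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not part of the RCM literature: it invents units with both potential outcomes (a constant unit-level effect of 3.0 is assumed), reveals only one outcome per unit after a random assignment, and then uses the group means as proxy values for the missing outcomes. The names `simulate_units`, `observe`, and `difference_in_means` are hypothetical.

```python
import random


def simulate_units(n, seed=0):
    """Create n hypothetical units, each holding BOTH potential outcomes (y0, y1)."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y0 = rng.gauss(10.0, 2.0)  # potential outcome if untreated
        y1 = y0 + 3.0              # potential outcome if treated (assumed constant effect)
        units.append((y0, y1))
    return units


def observe(units, seed=1):
    """Randomly assign treatment and reveal only one potential outcome per unit.

    The unobserved potential outcome is recorded as None: this is the missing data.
    """
    rng = random.Random(seed)
    records = []
    for y0, y1 in units:
        treated = rng.random() < 0.5
        records.append({
            "T": int(treated),
            "Y": y1 if treated else y0,      # the outcome we actually see
            "Y0": None if treated else y0,   # missing for treated units
            "Y1": y1 if treated else None,   # missing for untreated units
        })
    return records


def difference_in_means(records):
    """Use each group's mean as a proxy for the other group's missing outcomes."""
    treated = [r["Y"] for r in records if r["T"] == 1]
    control = [r["Y"] for r in records if r["T"] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


units = simulate_units(10_000)
# Knowable only inside a simulation, where both potential outcomes exist.
true_ate = sum(y1 - y0 for y0, y1 in units) / len(units)

data = observe(units)
estimate = difference_in_means(data)

print(f"true average effect (simulation only): {true_ate:.2f}")
print(f"difference-in-means estimate:          {estimate:.2f}")
```

In real data the `Y0` and `Y1` columns can never both be filled in for the same unit; the research designs covered in the course differ mainly in how they justify the proxy used for the missing entry.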
We will have plenty of topics and not much time: only 20 hours spaced out over the fall 2013 and spring 2014 semesters (8/15; 8/16; 8/17; 8/19; 8/20; 8/21 in 2013 and 2/15; 2/17; 2/18; and 2/19 in 2014). Beyond a basic discussion of the scientific method, we will cover construct validity (and how to demonstrate it); causality (and how to demonstrate it) and the various models used to derive estimates that allow for causal inferences; counterfactuals and the potential outcomes framework; RCTs and randomization inference; propensity scores; matching estimators; instrumental variables; selection models; time-series-cross-sections; panel studies; event studies; regression discontinuity designs; selection bias and error bounds; and synthetic controls. We will relate the classic threats to internal validity from the Campbell Causal Model to the Rubin Causal Model: ambiguous temporal precedence; selection; history; maturation; regression; attrition (or mortality); testing; instrumentation; as well as interactions of selection with the other threats.
To build a science, the contributors to that science must present results that are replicable and reliable. The research behind any project must be transparent. Hence, we will spend a good part of the term discussing good research conduct, including matters of construct validity. Construct validity is all but ignored in political science and economics. Some data sets are very widely used in political science and economic studies, including the Penn Tables, the World Bank’s Development Data, the Correlates of War (COW) dataset, the World Justice Project (WJP), Professor Keith Poole’s Voteview, and so on. All of these data sets are commendable for providing documentation such that, if one has the resources to carry out the project, it may be possible to replicate the original data sets. This is rare in any science.
The problem, then, is that many scholars modify the downloaded data without regard to the definition of the construct used by those who collected the data and without any concrete knowledge of how the data were collected. Often, scholars combine variables from two, three, or more studies. Further, variables are stitched together by combining indexes to create larger master indexes, as is done in creating most of the indexes of the Rule of Law, Civil Rights, or Democracy, and in other fields such as accounting and finance. We see scholars in the latter fields creating firm governance indexes with up to 39 separate variables, downloaded from the “WWW-Oracle.” The WJP, for example, collected data on 9 factors (and 48 sub-factors, as well as sub-sub-factors): limited government powers; absence of corruption; order and security; fundamental rights; open government; regulatory enforcement; access to civil justice; effective criminal justice; and informal justice (from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1966257). The WJP itself combines 400 categorical variables derived from polling 66,000 people the world over together with a dataset of questions given to 2,000 local experts in the same countries. These indexes, if not constructed correctly, clearly violate the tenets of construct validity. There are five tests for construct validity, and we will spend some time applying these, retroactively, to the data sets listed above. Each student will be required to send their favorite dataset to the class and to give a presentation on it, with demonstrations of how it has been used in the literature.