COURSE CALENDAR
Research Design for Causal Inference in Observational Studies and Experimental Settings, Syllabus August 2013 and February 2014
Mathew D. McCubbins Department of Political Science, Duke University 
Email: mathew.mccubbins@duke.edu Skype: m.d.mccubbins 

Course Description:
True experimental research designs offer the most plausibly unbiased estimates of a treatment effect, but experiments are often infeasible or too expensive. Thus, we are often relegated to using observational data. Identifying the causal impact of some “treatment” variable T on a dependent variable Y, however, is very challenging when one is using observational data. Hume (1748, Sec. VII) specified the requirements for a causal inference, arguing that, “We may define a cause to be an object followed by another, and where all the objects similar to the first, are followed by objects similar to the second. Or, in other words, where, if the first object had not been, the second never had existed.” In modern language, we need to demonstrate that Y occurs if and only if T occurs. Observational studies, or quasi-experiments, in which the researcher neither manipulates T nor controls the assignment of T to the treated and control group(s), cause various problems. The treatment may not be the same for all units, and the level of the treatment may be confounded with the reaction of the unit to the treatment or with the reaction of a unit to the level of treatment given to another unit. Violations of stability (spillovers) are therefore important here, as is ignorable treatment assignment, which is the equivalent of randomization. Nevertheless, estimating the effectiveness of a specific policy, program, rule, law, or constitution is the goal of much of the empirical research in political science, public policy, and law. In general, estimating the effect of some “treatment” variable is the explicit or implicit goal of much research.
Rosenbaum (2002): An observational study concerns treatment effects. A study without a treatment (often called an intervention or a program) is neither an experiment nor an observational study. Most public opinion polls, forecasting of financial returns, investigations of fairness and discrimination, and many other important empirical studies are neither experiments nor observational studies. Observational studies can employ data from nonexperimental, nonobservational studies as long as the focus is on assessing a treatment.
We will begin with a discussion of the Neyman-Rubin-Holland counterfactual framework, starting with the Rubin Causal Model. We will relate the Rubin Causal Model to the Campbell Causal Model, the Heckman Selection Model, the Pearl Causal Model, and the White Causal Model to analyze both similarities and differences. As Shadish, Cook, and Campbell (2002, pp. 13–14) described:
Quasi-experiments share with all other experiments a similar purpose (to test descriptive causal hypotheses about manipulable causes) as well as many structural details, such as the frequent presence of control groups and pretest measures, to support a counterfactual inference about what would happen in the absence of treatment. But, by definition, quasi-experiments lack random assignment. Assignment to conditions is by means of self-selection ... or means of administrator selection ... others decide which persons should get which treatment.
The Rubin Causal Model (RCM) begins with the fundamental problem of causal inference (FPCI). The RCM assumes that each unit being studied has two potential outcomes: one if the unit is treated and the other if untreated. A causal effect is defined as the difference between the two potential outcomes. The problem lies in that only one of the two potential outcomes is observed. Rubin and others developed the model into a general framework for causal inference with implications for observational research. FPCI is fundamentally a missing data problem and all of the techniques discussed in this course are research designs, given the study, that allow us to come up with a proxy value for the missing data.
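The missing-data character of the FPCI can be illustrated with a small simulation. The sketch below is not part of the course materials; the function name simulate, the effect size of 2.0, and the selection rule (units with a positive unobserved trait u choose treatment) are all assumptions made for illustration. It generates both potential outcomes for every unit, reveals only one of them according to the assignment, and compares a randomized assignment with a self-selected one.

```python
import random
import statistics


def simulate(n=20000, seed=0, effect=2.0):
    """Return (true ATE, estimate under randomization, estimate under self-selection).

    All names and parameter values are hypothetical and for illustration only.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)        # unobserved trait that also drives selection
        y0 = u + rng.gauss(0.0, 1.0)   # potential outcome if untreated
        y1 = y0 + effect               # potential outcome if treated
        units.append((u, y0, y1))

    # Known only because this is a simulation: both potential outcomes exist here.
    true_ate = statistics.fmean(y1 - y0 for _, y0, y1 in units)

    def diff_in_means(assignment):
        # The FPCI: each unit reveals exactly one of its two potential outcomes.
        treated = [y1 for (_, _, y1), t in zip(units, assignment) if t]
        control = [y0 for (_, y0, _), t in zip(units, assignment) if not t]
        return statistics.fmean(treated) - statistics.fmean(control)

    randomized = [rng.random() < 0.5 for _ in units]   # assignment ignores u
    self_selected = [u > 0 for u, _, _ in units]       # assignment depends on u
    return true_ate, diff_in_means(randomized), diff_in_means(self_selected)


if __name__ == "__main__":
    ate, est_random, est_selected = simulate()
    print(f"true ATE:                  {ate:.3f}")
    print(f"randomized assignment:     {est_random:.3f}")
    print(f"self-selected assignment:  {est_selected:.3f}")
```

Under randomization the control group's mean outcome is a reasonable proxy for the treated group's missing untreated outcomes, so the difference in means lands near the true effect. Under self-selection the same comparison also picks up the difference in u between the groups, so it overstates the effect.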
We will have plenty of topics and not much time: only 20 hours spaced out over the fall 2013 and spring 2014 semesters (8/15; 8/16; 8/17; 8/19; 8/20; 8/21 in 2013 and 2/15; 2/17; 2/18; and 2/19 in 2014). Beyond a basic discussion of the scientific method, construct validity (and how to demonstrate it), causality (and how to demonstrate it), and the various models used to derive estimates that allow for causal inferences, we will cover counterfactuals and the potential outcomes framework; RCTs and randomization inference; propensity scores; matching estimators; instrumental variables; selection models; time-series cross-sections; panel studies; event studies; regression discontinuity designs; selection bias and error bounds; and synthetic controls. We will relate the classic threats to internal validity from the Campbell Causal Model to the Rubin Causal Framework: ambiguous temporal precedence; selection; history; maturation; regression; attrition (or mortality); testing; instrumentation; as well as interactions of selection with the other threats.
To build a science, the contributors to that science must present results that are replicable and reliable. The research behind any project must be transparent. Hence, we will spend a good part of the term discussing good research conduct, including matters of construct validity. Construct validity is all but ignored in political science and economics. Some data sets are very widely used in political science and economic studies, including the Penn Tables, the World Bank’s Development Data, the Correlates of War (COW) dataset, the World Justice Project (WJP), Professor Keith Poole’s Voteview, and so on. All of these data sets are commendable for providing documentation sufficient that, if one has the resources to carry out the project, it may be possible to replicate the original data sets. This is rare in any science.
The problem then lies in that many scholars modify the downloaded data without regard to the definition of the construct used by those who collected the data and without any concrete knowledge of how the data were collected. Often, scholars combine variables from two, three, or more studies. Further, variables are stitched together by combining indexes to create larger master indexes, as is done in creating most of the indexes of the Rule of Law, Civil Rights, or Democracy, and in other fields such as accounting and finance. We see scholars in the latter fields creating firm governance indexes with up to 39 separate variables, downloaded from the “WWWOracle.” The WJP, for example, collected data on 9 factors (and 48 sub-factors, as well as sub-sub-factors): limited government powers; absence of corruption; order and security; fundamental rights; open government; regulatory enforcement; access to civil justice; effective criminal justice; and informal justice (from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1966257). The WJP itself combines 400 categorical variables derived from polling 66,000 people the world over together with a dataset of questions given to 2,000 local experts in the same countries. These indexes, if not constructed correctly, clearly violate the tenets of construct validity. There are five tests for construct validity, and we will spend some time applying these, retroactively, to the data sets listed above. Each student will be required to send their favorite dataset to the class and give a presentation on it, with demonstrations of how it has been used in the literature.
Learning Objectives:
This is a course on research design for causal inference. The course will cover how to design compelling research, the focus of which is on causal inference. We will cover the design of true experiments, observational studies (quasi-experiments), and contrast these intervention studies to classroom or computer simulations. We will cover extensively the design of quasi-experiments where the researcher controls neither the assignment of cases into groups nor the administration of the treatment being studied. Our approach will follow the Neyman-Rubin-Holland counterfactual or potential outcomes framework. This perspective has become increasingly popular in many fields including statistics, medicine, economics, political science, sociology, law, finance, accounting and marketing. The framework assumes that each unit being studied has two potential outcomes, one if the unit is treated and the other if untreated. A causal effect is defined as the difference between the two potential outcomes, but only one of the two potential outcomes is observed. Rubin and others developed the model into a general framework for causal inference with implications for observational research.
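In symbols, the framework just described can be written compactly. The notation below (unit index i, treatment indicator T_i, potential outcomes Y_i(1) and Y_i(0)) is a common convention assumed here for illustration and is not taken from the syllabus itself.

```latex
\begin{align*}
\tau_i &= Y_i(1) - Y_i(0)
  && \text{(unit-level causal effect)} \\
Y_i^{\mathrm{obs}} &= T_i\,Y_i(1) + (1 - T_i)\,Y_i(0)
  && \text{(only one potential outcome is observed)} \\
\tau_{\mathrm{ATE}} &= \mathbb{E}\bigl[\,Y_i(1) - Y_i(0)\,\bigr]
  && \text{(average treatment effect)}
\end{align*}
```

Because the observed outcome reveals Y_i(1) only when T_i = 1 and Y_i(0) only when T_i = 0, the unit-level effect is never observed directly, which is why the designs in this course focus on recovering averages such as the ATE.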
Required Materials:
All articles are in a zip file (1.8GB) that can be downloaded from: www.mccubbins.us/DukeSummer2013ResearchDesign.zip
The books listed below are required and available at various online bookstores.
To Purchase:
Rubin, Donald (2006) Matched Sampling for Causal Effects
Guo, Shenyang and Mark W. Fraser (2010) Propensity Score Analysis: Statistical Methods and Applications. Thousand Oaks, CA.: Sage Publishers. (hereinafter called G&F)
Copies of some chapters are available from Professor McCubbins (I encourage students to buy these books; they are readily available and very useful):
Curd and Cover (1998) Philosophy of Science: The Central Issues
Imbens and Rubin (2013) Causal Inference in Statistics and Social Sciences [draft] (hereinafter called I&R)
Chapters from Trochim and Donnelly (2007) The Research Methods Knowledge Base, 3rd Edition. Atomic Dog. You can get most of the chapters from this book at http://www.socialresearchmethods.net/kb/
Chapters from Cameron and Trivedi, Microeconometrics: Models and Applications.
Prerequisites and/or Recommended Preparation:
Must be a Ph.D. student or faculty at Duke or UNC; exceptions by permission of instructor. You may attend without signing up for course credit.
Grading Policies:
There is one assignment: a 10-page term paper and presentation. Presentations on February 19, 2014; papers due April 16, 2014.
Assignment Submission Policy:
Assignments must be turned in on the due date/time electronically or in hard copy directly to the professor. Late assignments receive no credit.
COURSE CALENDAR
There are a vast number of additional readings (some on additional topics not listed on this syllabus) that aren’t in the zipped folder of readings for this course. These are typically good (or instructively bad) examples of research design in business, the social sciences, medicine, and public policy. I include the additional readings on the syllabus and in the folder, so that as we go through the course, if we decide it would be useful to look at some of these, we can either change the course outline or meet for extra hours. We will meet 10 times, for two hours at a time, on the following dates: August 15, 16, 17, 19, 20, and 21 of 2013, and February 15, 17, 18, and 19 of 2014.
Lecture 1: The Scientific Method
Required:
1.McCubbins and Lake: Chapter on the Scientific Method [unpublished]
2.Lakatos, , available at www.lse.ac.uk/lakatos
3.Curd, Martin and J. A. Cover. 1998. Philosophy of Science: The Central Issues. New York: W. W. Norton, pp. 1–26, 675–694
4.Leamer, E. E. (1983). Let's take the con out of econometrics. The American Economic Review, 31–43.
Recommended:
1.Friedman, Milton, 1953. The Methodology of Positive Economics, in Friedman, Essays in Positive Economics. Chicago, IL: University of Chicago Press.
Additional:
1.Schwartz, Thomas. 1980. . New York: Random House. Pp. 3–53.
Lecture 2: Construct Validity: The Theory of Measurement, Sampling, and Scaling
Required:
1.Trochim and Donnelly, Chapters 2, 3, and 5.
2.Curd, Martin and J. A. Cover. 1998. Philosophy of Science: The Central Issues. New York: W. W. Norton, Chapter 7.
3.Adcock and Collier (2001). Measurement validity: A shared standard for qualitative and quantitative research. American Political Science Review, 95(03), 529–546.
Additional:
1.King, Gary, Robert O. Keohane, and Sidney Verba. 1994. . Princeton, NJ: Princeton University Press. Chapters 4–5.
2.Goertz, Gary. 2006. . Princeton, NJ: Princeton University Press, Chapter 1.
Lecture 3: Validating Constructs
Required:
1.Trochim and Donnelly, Chapters 2, 3, and 5, continued.
2.Gleditsch, Kristian S. and Michael D. Ward (1997). “Double Take: A Reexamination of Democracy and Autocracy in Modern Polities.” JCR.
3.Daniel Kaufmann and Aart Kraay. “Governance Indicators: Where Are We, Where Should We Be Going?” The World Bank Policy Research Working Paper 4370
4.Botero, Juan C. and Alejandro Ponce (2010). “Measuring the Rule of Law.” World Justice Project WPS No. 001.
Additional:
CAPM
1.Eugene F. Fama and Kenneth R. French (1993) Common risk factors in the returns on stocks and bonds. Journal of Financial Economics 33: 3–56.
2.Eugene F. Fama and Kenneth R. French (2004). “The Capital Asset Pricing Model: Theory and Evidence” Journal of Economic Perspectives, Volume 18, Number 3, Pages 25–46.
Papers on G-Index
1.Gompers, Ishii, and Metrick 2003 QJE
2.Ravina and Sapienza 2010 RFS
3.Bebchuk, Cohen, and Ferrell (2009) RFS
4.DeFond, M., R. N. Hann, and X. Hu. 2005. Does the market value financial expertise on audit committees of boards of directors? Journal of Accounting Research.
Papers on Democracy, the Rule of Law, Knowledge and other Common Indexes
1.Agrast, et al. and The World Justice Project (2012–13). “The World Justice Project: Rule of Law Index.”
2.Vreeland, James Raymond (2008). “The Effect of Political Regime on Civil War: Unpacking Anocracy – The Web Appendix,” JCR.
3.Hollyer et al. (2012). “Measuring Transparency.” SSRN.
4.Banuri, Sheheryar and Catherine Eckel (2012). “Experiments in Culture and Corruption.” SSRN.
5.Keefer, Philip (2010). “The Ethnicity Distraction? Political Credibility and Partisan Preferences in Africa.” SSRN
Lecture 4. Causality: The potential outcomes framework for causal inference.
Required:
1.G&F Chapters 1 & 2.
2.I&R Ch. 1: The Basic Framework: Potential Outcomes, Stability & Assignment Mech.
3.I&R Ch. 3: A Taxonomy of Assignment Mechanisms
4.William R. Shadish (2010) “Campbell and Rubin: A Primer and Comparison of Their Approaches to Causal Inference in Field Settings”
5.Pearl, Judea.
Recommended:
1.Morgan and Winship (2007) Counterfactuals and Causal Inference, Chapters 1 and 2
2.Donald B. Rubin (2008) For Objective Causal Inference, Design Trumps Analysis. The Annals of Applied Statistics, Vol. 2, No. 3, pp. 808–840
3.Winship and Morgan (1999) “The Estimation of Causal Effects from Observational Data”
4.William R. Shadish and Kristynn J. Sullivan “Theories of Causation in Psychological Science” APA Handbook of Research Methods in Psychology, Vol 1: Foundations, Planning, Measures, and Psychometrics, pp. 23–52.
5.White, H.
Additional:
1.Holland (1986) “Statistics and Causal Inference”
2.Little and Rubin (2000) “Causal Effects in Clinical and Epidemiological Studies via Potential Outcomes”
3.Sekhon (2004): “Quality Meets Quantity: Case Studies, Conditional Probability and Counterfactuals”
Lecture 5: Randomized Controlled Trials
Required:
1.Trochim and Donnelly.
2.Duflo, E., Glennerster, R., & Kremer, M. (2007). Using randomization in development economics research: A toolkit. Handbook of Development Economics, 4, 3895–3962.
Recommended:
1.I&R Ch. 4: A Taxonomy of Classical Randomized Experiments
2.I&R Ch. 5: Fisher’s Exact P-values for CREs
3.I&R Ch. 6: Neyman’s Repeated Sampling Approach to CREs
4.I&R Ch. 7: Regression Methods for CREs
5.I&R Ch. 8: Model-based Inference in Completely Randomized Experiments [Lalonde experimental data]
6.I&R Ch. 9: Stratified Randomized Experiments [Project STAR data]
7.I&R Ch. 10: Paired Randomized Experiments [tv data]
Additional:
1.Neyman (1923 [1990]): “On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9.” Statistical Science 5, 465–472.
2.Cox (1958): Planning of Experiments. Chapters 1 and 2.
3.David Sears. 1986. "College Sophomores in the Laboratory: Influences of a Narrow Data Base on Social Psychology’s View of Human Nature."
4.Lupia, Arthur and Mathew D. McCubbins. 1998. The Democratic Dilemma. New York: Cambridge University Press. Pp. 1–14, 101–148.
5.Miguel, Edward and Michael Kremer. 2004. Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities. Econometrica 72: 159–217.
Lecture 6. Observational Approximations to the RCT
Required:
1.I&R Ch. 12: Unconfounded Treatment Assignment
2.Cook, Thomas D. and Donald T. Campbell. 1979. . Boston, MA: Houghton Mifflin. Chapters 1 and 2.
3.I&R Ch. 29: Regression Discontinuity Designs
Recommended:
1.Guido Imbens (2010) Better LATE Than Nothing: Some Comments on Deaton and Heckman and Urzua
2.Thomas D. Cook (2008) “Waiting for Life to Arrive”: A history of the regression-discontinuity design in Psychology, Statistics and Economics
3.Rubin (1990) “Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies,” Statistical Science 5, 472–480.
4.Cochran and Rubin (1973) “William G. Cochran’s Contributions to the Design, Analysis and Evaluation of Observational Studies”
Additional:
1.Cochran, William G. and Donald B. Rubin. 1973. “Controlling Bias in Observational Studies: A Review.” Sankhya, Ser. A 35: 417–446.
2.Cochran (1965): “The Planning of Observational Studies of Human Populations”
3.Cochran (1983): Chapters 1 and 7
Lecture 7. Randomization Inference
Required:
1.I&R Ch. 14: Assessing Overlap in Covariate Distributions
2.Rosenbaum (2002a): “Covariance adjustment in randomized experiments and observational studies.” Statistical Science 17 286–327 (with discussion).
3.Carlos A. Flores, Estimation of Dose-Response Functions and Optimal Doses with a Continuous Treatment
Recommended:
1.Cameron and Trivedi (2005) Ch. 24 Stratified and Clustered Samples
2.Fisher (1935, ch 1–2): Design of Experiments
3.Jean-Pierre Florens, James Heckman, Costas Meghir, and Edward Vytlacil. 2008. “Identification of Treatment Effects Using Control Functions in Models with Continuous, Endogenous Treatment and Heterogeneous Effects.” NBER Working Paper 14002.
Lecture 8. Heckman Sample Selection Model, Treatment Effect Model & Instrumental Variables
Required:
1.G&F Chapter 4
2.I&R Ch. 24: Instrumental Variables Analysis of Randomized Experiments with One-sided Noncompliance
3.I&R Ch. 25: Instrumental Variables Analysis of Randomized Experiments with Two-sided Noncompliance
4.I&R Ch. 26: Model-based Analyses with Instrumental Variables
5.I&R Ch. 27: Instrumental Variables in Observational Studies
Recommended:
1.Angrist and Krueger (2001): “Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments”
2.Imbens and Rosenbaum (2005): “Randomization Inference with an Instrumental Variable,” Journal of the Royal Statistical Society, Series A, vol 168(1), 109–126.
3.Angus Deaton (2010) Instruments, Randomization, and Learning about Development. Journal of Economic Literature 48: 424–455
4.Angrist, Imbens, and Rubin (1996) “Identification of Causal Effects Using Instrumental Variables”
Additional:
1.Heckman (1997) “Instrumental Variables: A Study of Implicit Behavioral Assumptions Used in Making Program Evaluations”
2.Angrist and Krueger (1991): “Does compulsory school attendance affect earnings?” QJE 1991; 106: 979–1019.
3.Bound, Jaeger, and Baker (1995): “Problems with Instrumental Variables Estimation when the Correlation Between the Instruments and the Endogenous Regressors is Weak,” JASA 90, June 1995, 443–450.
Lecture 9. Propensity Score Matching
Required:
1.G&F Chapters 3 & 5.
2.I&R Ch. 13: Estimating the Propensity Score
3.I&R Ch. 17: Subclassification on the Propensity Score
Recommended:
1.Rubin (2006) Ch. 10: Central Role of the Propensity Score in Observational Studies
2.Crump, R.K., V.J. Hotz, G.W. Imbens and O.A. Mitnik (2008) "Dealing with limited overlap in estimation of average treatment effects", Biometrika, Vol. 96, Number 1, 187–199, March 2009.
3.Kosuke Imai and David A. Van Dyk. (2004) “Causal Inference with General Treatment Regimes: Generalizing the Propensity Score” Journal of the American Statistical Association, Vol. 99, No. 467
Additional:
1.Rubin (2006) Ch. 11: Assessing Sensitivity to an Unobserved Binary Covariate
Lecture 10. Univariate Matching Methods for Controlling Bias in Observational Studies
Required:
1.Rubin (2006) Ch. 3–5
Recommended:
Lecture 11. Multivariate Matching
Required:
1.Rubin (2006) Ch. 8–9
2.Rosenbaum, Paul R. 1989. “Optimal Matching for Observational Studies.” Journal of the American Statistical Association 84 (408): 1024–1032.
Recommended:
1.Rosenbaum, Paul R. 1991. “A Characterization of Optimal Designs for Observational Studies.” Journal of the Royal Statistical Society, Series B 53 (3): 597–610.
2.Hansen (2004) “Full Matching in an Observational Study of Coaching for the SAT”
Additional:
1.LaLonde (1986)
2.Dehejia and Wahba (1999)
3.Smith and Todd (2001)
4.Rubin (2001): “Using Propensity Scores to Help Design Observational Studies: Application to the Tobacco Litigation”
5.Galiani, Gertler, and Schargrodsky (2005): “Water for Life: The Impact of the Privatization of Water Services on Child Mortality”
6.Imbens, Rubin, and Sacerdote (2001): “Estimating the Effect of Unearned Income on Labor Earnings, Savings, and Consumption: Evidence from a Survey of Lottery Players”
7.Angrist (1998): “Estimating the Labor Market Impact of Voluntary Military Service Using Social Security Data on Military Applicants.”
Lecture 12. Time: Event Studies, Panel & Time-Series Cross-Section
Required:
1.Craig MacKinlay (1997) “Event Studies in Economics and Finance” JEL
2.Cameron and Trivedi (2005) Ch. 21 Linear Panel Models: Basics
3.Cameron and Trivedi (2005) Ch. 23 Nonlinear Panel Models
Recommended:
1.Den Hartog, Christopher and Nathan W. Monroe. “The Value of Majority Status: The Effect of Jeffords’s Switch on Asset Prices of Republican and Democratic Firms.”
2.Donald T. Campbell et al. 1968. “Analysis of Data on the Connecticut Speeding Crackdown as a Time-Series Quasi-Experiment,” Law and Society Review 3, 1: 55–76.
Lecture 13. Difference in Differences
Required:
1.Cameron and Trivedi (2005) Ch. 3.4.2, 22.6, and 25.5 Difference in Differences
Lecture 14. Matching Estimators: Synthetic Controls
Required:
1.I&R Ch. 18: Matching Estimators
2.I&R Ch. 15: Design in Observational Studies: Matching to Ensure Covariate Balance
3.Abadie, Diamond, and Hainmueller. 2007. “Synthetic control methods for comparative case studies: estimating the effect of California’s tobacco control program.”
4.Rubin (1980) “Bias Reduction Using Mahalanobis-Metric Matching”
5.Rubin (1979) “Using Multivariate Matched Sampling and Regression Adjustment to Control Bias in Observational Studies”
Recommended:
1.Diamond and Sekhon (2005): “Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies”
2.Sekhon (2006): “Alternative Balance Metrics for Bias Reduction in Matching Methods for Causal Inference”
3.Adam Fremeth, Guy Holburn, Brian Richter. 2012. “Did Chrysler Benefit from Government Assistance? Making Causal Inferences in Small Samples using Synthetic Control Methodology”
4.Abadie and Gardeazabal (2003): “The Economic Costs of Conflict: a Case-Control Study for the Basque Country”
Lecture 15. Regression Discontinuity Design
Presentation by students of their final papers. Pick any topic and write a 10-page paper using one of the techniques discussed in class. Pay particular attention to construct validity and threats to internal validity. Prepare and present a 10-minute talk on your paper. This will be followed by 10 minutes of class questions and discussion.
Required:
1.Hahn, Todd, and van der Klaauw (2001): “Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design”
2.Imbens and Lemieux (2007) Regression discontinuity designs: A guide to practice
Recommended:
1.Lee (2008): “Randomized Experiments from Nonrandom Selection in U.S. House Elections”
2.Devin Caughey and Jasjeet Sekhon (2009) Regression-Discontinuity Designs and Popular Elections: Implications of Pro-Incumbent Bias in Close U.S. House Races
3.Dunning (2008): “Improving Causal Inference: Strengths and Limitations of Natural Experiments.” Political Research Quarterly 61(2): 282–293.
4.Richard A. Berk and Jan de Leeuw. 1999. “An Evaluation of California's Inmate Classification System Using a Generalized Regression Discontinuity Design,” Journal of the American Statistical Association 94, 448, pp. 1045–1052
5.Thistlethwaite and Campbell (1960): “Regression-Discontinuity Analysis: An alternative to the ex post facto experiment”
Lecture 16. Missing Data
Required:
1.Cameron and Trivedi (2005) Ch. 27 Missing Data
References
Abadie, Alberto and Javier Gardeazabal. 2003. “The Economic Costs of Conflict: a Case-Control Study for the Basque Country.” American Economic Review 93 (1).
Angrist, J and AB Krueger. 1991. “Does compulsory school attendance affect earnings?” Quarterly Journal of Economics 106: 979–1019.
Angrist, Joshua D. 1998. “Estimating the Labor Market Impact of Voluntary Military Service Using Social Security Data on Military Applicants.” Econometrica 66 (2): 249–288.
Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin. 1996. “Identification of Causal Effects Using Instrumental Variables.” Journal of the American Statistical Association 91 (434): 444–455.
Angrist, Joshua D. and Alan B. Krueger. 2001. “Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments.” Journal of Economic Perspectives 15 (4): 69–85.
Bound, J., D. Jaeger, and R. Baker. 1995. “Problems with Instrumental Variables Estimation when the Correlation Between the Instruments and the Endogenous Regressors is Weak.” Journal of the American Statistical Association 90: 443–450.
Cochran, William G. 1965. “The Planning of Observational Studies of Human Populations (with discussion).” Journal of the Royal Statistical Society, Series A 128: 234–255.
Cochran, William G. 1983. Planning and analysis of observational studies. New York: John Wiley and Sons. Edited posthumously by L. E. Moses and F. Mosteller.
Cochran, William G. and Donald B. Rubin. 1973. “Controlling Bias in Observational Studies: A Review.” Sankhya, Ser. A 35: 417–446.
Cox, David R. 1958. Planning of Experiments. New York: Wiley.
Dehejia, Rajeev and Sadek Wahba. 1999. “Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs.” Journal of the American Statistical Association 94 (448): 1053–1062.
Diamond, Alexis and Jasjeet S. Sekhon. 2005. “Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies.” Working Paper.
Dunning, Thad. 2008. “Improving Causal Inference: Strengths and Limitations of Natural Experiments.” Political Research Quarterly 61 (2): 282–293.
Fisher, Ronald A. 1935. Design of Experiments. New York: Hafner.
Freedman, David A. 2008a. “On Regression Adjustments in Experiments with Several Treatments.” Annals of Applied Statistics 2 (1): 176–196.
Freedman, David A. 2008b. “On Regression Adjustments to Experimental Data.” Advances in Applied Mathematics 40 (2): 180–193.
Freedman, David A. 2008c. “Randomization Does not Justify Logistic Regression.” Statistical Science 23 (2): 237–249.
Galiani, Sebastian, Paul Gertler, and Ernesto Schargrodsky. 2005. “Water for Life: The Impact of the Privatization of Water Services on Child Mortality.” Journal of Political Economy 113 (1): 83–120.
Gilligan, Michael J. and Ernest J. Sergenti. 2008. “Evaluating UN Peacekeeping with Matching to Improve Causal Inference.” Quarterly Journal of Political Science 3 (2): 89–122.
Hahn, Jinyong, Petra Todd, and Wilbert van der Klaauw. 2001. “Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design.” Econometrica 69: 201–209.
Hansen, Ben B. 2004. “Full Matching in an Observational Study of Coaching for the SAT.” Journal of the American Statistical Association 99: 609–618.
Heckman, James. 1997. “Instrumental Variables: A Study of Implicit Behavioral Assumptions Used in Making Program Evaluations.” The Journal of Human Resources 32 (3): 441–462.
Holland, Paul W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical Association 81 (396): 945–960.
Imbens, Guido W. and Paul Rosenbaum. 2005. “Randomization Inference with an Instrumental Variable.” Journal of the Royal Statistical Society, Series A 168: 109–126.
Imbens, Guido W., Donald B. Rubin, and Bruce I. Sacerdote. 2001. “Estimating the Effect of Unearned Income on Labor Earnings, Savings, and Consumption: Evidence from a Survey of Lottery Players.” American Economic Review 91 (4): 778–794.
LaLonde, Robert. 1986. “Evaluating the Econometric Evaluations of Training Programs with Experimental Data.” American Economic Review 76 (September): 604–20.
Lee, David S. 2008. “Randomized Experiments from Nonrandom Selection in U.S. House Elections.” Journal of Econometrics 142 (2): 675–697.
Little, Roderick J. A. and Donald B. Rubin. 2000. “Causal Effects in Clinical and Epidemiological Studies via Potential Outcomes: Concepts and Analytical Approaches.” Annual Review of Public Health 21: 121–145.
Morgan, Stephen L. and David J. Harding. 2006. “Matching Estimators of Causal Effects: Prospects and Pitfalls in Theory and Practice.” Sociological Methods & Research 35 (1): 3–60.
Neyman, Jerzy. 1923 [1990]. “On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9.” Statistical Science 5 (4): 465–472. Trans. Dorota M. Dabrowska and Terence P. Speed.
Rosenbaum, Paul R. 1989. “Optimal Matching for Observational Studies.” Journal of the American Statistical Association 84 (408): 1024–1032.
Rosenbaum, Paul R. 1991. “A Characterization of Optimal Designs for Observational Studies.” Journal of the Royal Statistical Society, Series B 53 (3): 597–610.
Rosenbaum, Paul R. 2002a. “Covariance Adjustment in Randomized Experiments and Observational Studies (with discussion).” Statistical Science 17: 286–327.
Rosenbaum, Paul R. 2002b. Observational Studies. New York: Springer-Verlag, 2nd edition.
Rosenbaum, Paul R. and Donald B. Rubin. 1983. “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika 70 (1): 41–55.
Rosenbaum, Paul R. and Donald B. Rubin. 1984. “Reducing Bias in Observational Studies Using Subclassification on the Propensity Score.” Journal of the American Statistical Association 79 (387): 516–524.
Rubin, Donald B. 1973a. “Matching to Remove Bias in Observational Studies.” Biometrics 29: 159–184.
Rubin, Donald B. 1973b. “The Use of Matching and Regression Adjustment to Remove Bias in Observational Studies.” Biometrics 29: 185–203.
Rubin, Donald B. 1979. “Using Multivariate Matched Sampling and Regression Adjustment to Control Bias in Observational Studies.” Journal of the American Statistical Association 74: 318–328.
Rubin, Donald B. 1980. “Bias Reduction Using Mahalanobis-Metric Matching.” Biometrics 36 (2): 293–298.
Rubin, Donald B. 1990. “Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies.” Statistical Science 5 (4): 472–480.
Rubin, Donald B. 2001. “Using Propensity Scores to Help Design Observational Studies: Application to the Tobacco Litigation.” Health Services & Outcomes Research Methodology 2 (1): 169–188.
Rubin, Donald B. 2006. Matched Sampling for Causal Effects. New York: Cambridge University Press.
Rubin, Donald B. and Neal Thomas. 2000. “Combining propensity score matching with additional adjustments for prognostic covariates.” Journal of the American Statistical Association 95: 573–585.
Sears, David O. 1986. "College Sophomores in the Laboratory: Influences of a Narrow Data Base on Social Psychology’s View of Human Nature." Journal of Personality and Social Psychology 51: 515–530.
Sekhon, Jasjeet S. 2004a. “The 2004 Florida Optical Voting Machine Controversy: A Causal Analysis Using Matching.” Working Paper.
Sekhon, Jasjeet S. 2004b. “Quality Meets Quantity: Case Studies, Conditional Probability and Counterfactuals.” Perspectives on Politics 2 (2): 281–293.
Sekhon, Jasjeet S. 2006. “Alternative Balance Metrics for Bias Reduction in Matching Methods for Causal Inference.” Working Paper. http://sekhon.berkeley.edu/papers/SekhonBalanceMetrics.pdf
Sekhon, Jasjeet S. In Press. “Matching: Multivariate and Propensity Score Matching with Automated Balance Search.” Journal of Statistical Software. Computer program available at http://sekhon.berkeley.edu/matching/.
Smith, Jeffrey A. and Petra E. Todd. 2001. “Reconciling Conflicting Evidence on the Performance of Propensity Score Matching Methods.” AEA Papers and Proceedings 91 (2): 112–118.
Thistlethwaite, Donald L. and Donald T. Campbell. 1960. “Regression-Discontinuity Analysis: An alternative to the ex post facto experiment.” Journal of Educational Psychology 51 (6): 309–317.
Winship, Christopher and Stephen Morgan. 1999. “The estimation of causal effects from observational data.” Annual Review of Sociology 25: 659–707.
Supplementary materials available on the Internet:
An excellent introduction to statistics and research design is Statistics at Square One 
http://bmj.com/collections/statsbk/index.shtml, see especially Chapter 5 
http://bmj.com/collections/statsbk/5.shtml
Good websites on statistics, econometrics, including free downloadable software for data entry, data analysis, research design, hypothesis testing, document preparation and presentation include:
http://davidmlane.com/hyperstat/index.html
http://members.aol.com/johnp71/javasta2.html#Freebies
http://www.american.edu/econ/notes/soft.htm
Online readings on the scientific method:
http://www.lse.ac.uk/collections/lakatos//
http://galileoandeinstein.physics.virginia.edu/lectures/lecturelist.html
http://teacher.nsrl.rochester.edu/phy_labs/AppendixE/AppendixE.html
http://plato.stanford.edu/entries/popper/
http://www.brint.com/papers/science.htm
http://www.emory.edu/EDUCATION/mfp/Kuhnsnap.html
http://wwwcdf.pd.infn.it/~loreti/science.html
Useful online articles on qualitative research:
http://bmj.com/cgi/reprint/325/7357/210.pdf
http://bmj.com/cgi/reprint/320/7226/50.pdf