### Load Libraries and Packages
library(foreign)
library(AER)

# To obtain a consistent estimate of the impact of kids on labor supply,
# some authors have suggested using whether a mother had twins on her
# first birth as an instrument for the number of children in the household.
# Twins are in many respects random and, by definition, the realization of a
# twin increases the number of children in the household. Using data from
# the 1980 Public Use Micro Sample 5% Census data files, a sample of women aged
# 21-40 with at least one kid is constructed. The 1980 PUMS identifies a
# person’s age at the time of the census and their quarter of birth.
# Because the census is taken on April 1st, we know a person’s year and quarter
# of birth and we can infer that any two kids in the household with the same
# age and quarter of birth are twins. Roughly 6,000 first births are twins.
# There are over 800,000 observations in the original data set, so to make the
# problem manageable, a random sample of approximately 6,500 non-twin births is
# selected, for a total of about 12,500 observations.
# The data file is called twins1st.dta.

### Load and Confirm Data
df <- read.dta(file.choose())  # select twins1st.dta (file.choose() works on all platforms)
head(df)
tail(df)
dim(df)
class(df)
attach(df)

#--------------------------------------- 1. ---------------------------------------#
# What fraction of women work?
# What is average weeks worked among women?
# What is median log of labor earnings for women who worked?

mean(worked)
mean(weeks)
median(lincome)

# Two equivalent ways to restrict the median to women who worked
median(subset(df, worked == 1)$lincome)
median(lincome[worked == 1])

# ANSWER:
# 60.4% of women worked last year, and average weeks worked is 23 weeks.
# The median log labor income for women is $1,005 and median log labor income
# for women who worked is $5,505.

#--------------------------------------- 2. ---------------------------------------#
# Construct an indicator that equals 1 for women that have a second child.
# Call this variable SECOND.
# What fraction of women had a second child?
# Consider a simple bivariate regression where WEEKS of work (Y) is regressed
# on SECOND (X), Y = β0 + β1*X + u. What is the estimate of β1 in this
# regression, and how should it be interpreted?

second <- kids > 1
summary(second)
mean(second)  # fraction of women with a second child

reg.ols <- lm(weeks ~ second)
summary(reg.ols)

# ANSWER:
# The coefficient on second is -6.8, meaning that, among women with one or
# more kids, those with a second child work on average 6.8 fewer weeks per
# year than those without one.

#--------------------------------------- 3. ---------------------------------------#
# Because of the concern that X and u are correlated, use twins on 1st birth (Z)
# as an instrument for X in an instrumental variables model. What are the
# first-stage and reduced-form estimates for this model? Interpret these
# coefficients, that is, what do these coefficients measure? Consider the
# regression of X on Z. Why is the coefficient on Z not 1; i.e., don't twins
# increase the number of kids in the house by 1? Run the first stage: does having
# a twin (Z) increase the kids in the home (X)?

# run the 1st stage model, impact of twins (Z) on having a second child (X)
reg.1st <- lm(second ~ twin1st)
summary(reg.1st)

# run the reduced form, impact of twins (Z) on weeks worked (Y)
reg.red <- lm(weeks ~ twin1st)
summary(reg.red)

# run the 2sls model
iv <- ivreg(weeks ~ second | twin1st)
summary(iv)

### Hausman test "by hand" for endogeneity of regressors
cf_diff <- coef(iv) - coef(reg.ols)   # difference in coefficient vectors
vc_diff <- vcov(iv) - vcov(reg.ols)   # difference in covariance matrices
x2_diff <- as.vector(t(cf_diff) %*% solve(vc_diff) %*% cf_diff)
pchisq(x2_diff, df = 2, lower.tail = FALSE)

# Tests: Weak Instruments, Wu-Hausman, Sargan
# (the Sargan test is not available here because the model is just identified)
summary(iv, vcov = sandwich, diagnostics = TRUE)

# On average, the presence of a twin increases the probability of having a second
# child by 27.5 percentage points, ceteris paribus. Why is this coefficient not 1?
# At the time of the birth, the presence of the twin increases family size from 1
# to 2.
# However, many of the women who had a twin on the 1st birth would have had
# a second child anyway, which is why the twin1st coefficient is less than 1.

# Notice that the reduced-form regression (weeks worked on twin1st) produces
# a coefficient of -0.99. Women assigned a twin on the first birth work about
# 1 week fewer per year. Notice that -0.99/0.2746 = -3.605, which is exactly the
# 2SLS estimate.

# According to the OLS model, the presence of the 2nd kid reduces work by almost
# 7 weeks per year. In the 2SLS model, however, this number falls to -3.6. The
# OLS estimate is too large in magnitude by roughly a factor of 2, suggesting
# substantial omitted-variable bias in the OLS model.

#--------------------------------------- 4. ---------------------------------------#
# In this model, we run an OLS model similar to that in part 2 but we add
# additional covariates. First, generate dummy variables for mothers that are
# black and other_race. Notice that the estimated impact of having a second
# kid increases in magnitude from -6.8 to -9.26, providing strong evidence that
# the observed characteristics of the mother are correlated with whether the
# mother had a second child.

summary(race)  # coded 1-3
black <- race == 2
other.race <- race == 3
summary(black)
summary(other.race)

# OLS with Covariates
reg.ols.2 <- lm(weeks ~ second + agem + agefst + black + other.race + educm + married)
summary(reg.ols.2)

#--------------------------------------- 5. ---------------------------------------#
# Now, use twin1st as an instrument for the second child in the model above.
# Compare these estimates to the results in part 4. Next, compare these results
# to the simple 2SLS estimates. What has happened to the labor supply
# impacts of having a second child?

# IV with Covariates: exogenous covariates appear on both sides of "|",
# and twin1st replaces second in the instrument list
iv.2 <- ivreg(weeks ~ second + agem + agefst + black + other.race + educm + married |
                agem + agefst + black + other.race + educm + married + twin1st)
summary(iv.2)
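
#------------------------------- Illustrative sketch -------------------------------#
# A minimal sketch, on SIMULATED data (not twins1st.dta), of the point made in
# part 3: with one instrument and one endogenous regressor, the reduced-form
# coefficient divided by the first-stage coefficient reproduces the 2SLS estimate.
# Every name prefixed "sim." and every parameter value below is hypothetical and
# chosen only for illustration; none of it comes from the assignment data.

library(AER)  # provides ivreg()

set.seed(123)
sim.n <- 5000
sim.u <- rnorm(sim.n)              # unobserved factor that affects both X and Y
sim.z <- rbinom(sim.n, 1, 0.5)     # binary instrument, independent of sim.u

# Endogenous binary regressor: more likely when the instrument is on, and also
# driven by the unobserved factor (this is what biases OLS)
sim.x <- as.numeric(0.3 * sim.z + 0.5 * sim.u + rnorm(sim.n) > 0)

# Outcome: assumed true effect of sim.x is -4; sim.u lowers the outcome directly
sim.y <- 25 - 4 * sim.x - 3 * sim.u + rnorm(sim.n)

sim.ols <- lm(sim.y ~ sim.x)             # biased by the omitted sim.u
sim.1st <- lm(sim.x ~ sim.z)             # first stage
sim.red <- lm(sim.y ~ sim.z)             # reduced form
sim.iv  <- ivreg(sim.y ~ sim.x | sim.z)  # 2SLS

# Ratio of reduced form to first stage (the Wald estimator)
sim.wald <- coef(sim.red)["sim.z"] / coef(sim.1st)["sim.z"]

# Compare the three estimates side by side
c(ols = unname(coef(sim.ols)["sim.x"]),
  wald = unname(sim.wald),
  tsls = unname(coef(sim.iv)["sim.x"]))

# The ratio and the 2SLS coefficient should agree up to floating-point error
all.equal(unname(sim.wald), unname(coef(sim.iv)["sim.x"]))

# The same ratio could be formed from the fitted models in part 3, e.g.
# coef(reg.red)["twin1st"] / coef(reg.1st)["twin1st"], assuming twin1st is
# stored as a numeric 0/1 variable so that the coefficient carries that name.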