I need help answering the following questions. My Stata, for some reason, is NOT working.
Complete Questions 1 AND 2.
Please record any work that you undertake in STATA in a log file (commands entered, variables generated etc.) and append a copy of this. Similarly, append a copy of any “do files” that you construct.
QUESTION 1. The gender wage gap. (The dataset is a cross section of wage data.)
i) Describe the data and generate some preliminary descriptive statistics which you think might be useful and informative for an analysis of a wage offer equation of the ‘Mincer’ type and the influence of gender and occupational choice on wages (you may want to use the ‘graph’ function in STATA, or the ‘hist’ function. The ‘gladder’ function might also be useful. For information on these commands type e.g.: help gladder).
ii) Once you are satisfied with the nature of the data, undertake a ‘Mincer’ type wage regression using the cross-sectional data in WAGE1.dta. Include the following explanatory variables:
educ exper tenure female
Motivate this choice of dependent variable and each of the explanatory variables by reference to human capital theory and the literature on the gender wage gap, some papers from which have been provided.
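For anyone without a working Stata install, the mechanics of the regression in part ii) can be sketched in plain Python. This is only an illustration under stated assumptions: the data are synthetic (not WAGE1.dta), the "true" coefficients are invented for the simulation, and `ols` is a hypothetical helper that solves the normal equations. In Stata the equivalent step would be a `regress` of log wage on the four variables.

```python
import math
import random

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic cross section; every number below is an assumption for illustration.
random.seed(0)
rows, lwage = [], []
for _ in range(500):
    educ = random.randint(8, 18)
    exper = random.randint(0, 40)
    tenure = random.randint(0, min(exper, 20))
    female = random.randint(0, 1)
    y = (0.5 + 0.08 * educ + 0.005 * exper + 0.015 * tenure
         - 0.30 * female + random.gauss(0, 0.1))
    rows.append([1.0, educ, exper, tenure, female])
    lwage.append(y)

names = ["const", "educ", "exper", "tenure", "female"]
coef = dict(zip(names, ols(rows, lwage)))
for name in names:
    print(f"{name:>7}: {coef[name]: .4f}")

# With log wage as the dependent variable, the exact percentage gap
# implied by a dummy coefficient b is 100 * (exp(b) - 1).
gap_pct = 100 * (math.exp(coef["female"]) - 1)
print(f"implied female/male wage gap: {gap_pct:.1f}%")
```

Because the dependent variable is in logs, the coefficient on educ reads as an approximate proportional return to one more year of schooling, and the coefficient on female as an approximate proportional wage difference holding the other regressors fixed.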
iii) Undertake any diagnostic or other tests that you feel are important to ensure that the test statistics reported by STATA are valid, that you are able to use the asymptotic approximations for hypothesis testing, and that the assumptions underpinning OLS hold.
iv) Explain your regression results in the context of the previous tests. Comment, using asymptotic tests, on the statistical significance of the explanatory variables in the model above. Explain what is meant by a large sample test.
v) In particular, explain in detail the meaning of the parameter on the dummy variable ‘female’ in this context. To what extent, in your opinion, is this approach a good way to estimate gender discrimination in the workplace?
vi) Undertake a new regression which includes individual variables: nonwhite married numdep, the squared terms: expersq tenursq educsq, and the industry and occupational variables: construc ndurman trcommpu trade services profserv profocc clerocc servocc (as described in the appendix).
vii) Perform the same diagnostic tests as you performed above. Then, using a test which is robust to heteroskedasticity, test the joint significance of the occupational variables profocc, clerocc and servocc. What are your conclusions?
viii) Test the regression model for heteroskedasticity using the Breusch-Pagan test. State clearly your null and alternative hypotheses and the outcome of the test.
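The logic of the Breusch-Pagan test can be sketched as follows. This is a minimal illustration, not the assignment's answer: it uses the n·R² (studentized) form of the LM statistic, one regressor so that the chi-squared reference distribution has 1 degree of freedom, and synthetic data whose error variance is deliberately made to grow with x. The helper names `ols` and `breusch_pagan_lm` are hypothetical.

```python
import math
import random

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y (Gauss-Jordan)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def breusch_pagan_lm(X, y):
    """LM = n * R^2 from regressing squared OLS residuals on the regressors."""
    n = len(y)
    b = ols(X, y)
    u2 = [(yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
          for r, yi in zip(X, y)]
    g = ols(X, u2)  # auxiliary regression of u^2 on X
    fitted = [sum(gj * xj for gj, xj in zip(g, r)) for r in X]
    mean_u2 = sum(u2) / n
    ssr = sum((a - f) ** 2 for a, f in zip(u2, fitted))
    sst = sum((a - mean_u2) ** 2 for a in u2)
    return n * (1 - ssr / sst)

# Synthetic data (assumption): the error standard deviation is proportional to x.
random.seed(0)
n = 400
x = [i / 10 for i in range(1, n + 1)]
y = [1 + 0.5 * xi + 0.5 * xi * random.gauss(0, 1) for xi in x]
X = [[1.0, xi] for xi in x]

lm = breusch_pagan_lm(X, y)
# H0: homoskedasticity (variance unrelated to x); H1: variance depends on x.
# Under H0, LM is asymptotically chi-squared with 1 df here, so the
# p-value is P(Z^2 > lm) = erfc(sqrt(lm / 2)).
p_value = math.erfc(math.sqrt(lm / 2))
print(f"LM = {lm:.2f}, p-value = {p_value:.4g}")
```

With more than one regressor in the auxiliary regression, the degrees of freedom equal the number of slope coefficients and the p-value needs a general chi-squared distribution function rather than the 1-df shortcut used here.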
ix) Employ Feasible Generalised Least Squares (FGLS) in order to correct for heteroskedasticity, being careful to explain and report your method clearly. Briefly describe the pros and cons of FGLS.
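One common textbook FGLS recipe assumes the error variance has an exponential form and estimates it from the log of the squared OLS residuals. The sketch below follows that recipe on synthetic data; the variance function, sample size, and helper names (`ols`, `predict`) are assumptions made for illustration and need not match the method expected in the assignment.

```python
import math
import random

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y (Gauss-Jordan)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(b, row):
    return sum(bj * xj for bj, xj in zip(b, row))

# Synthetic data (assumption): Var(u | x) = exp(-2 + x), true slope = 2.
random.seed(0)
x = [random.uniform(0, 4) for _ in range(500)]
y = [1 + 2 * xi + math.exp(-1 + 0.5 * xi) * random.gauss(0, 1) for xi in x]
X = [[1.0, xi] for xi in x]

# Step 1: OLS and its residuals.
b_ols = ols(X, y)
resid = [yi - predict(b_ols, r) for r, yi in zip(X, y)]

# Step 2: regress log(residual^2) on the regressors to model the variance.
g = ols(X, [math.log(u * u) for u in resid])

# Step 3: estimated variance function h_hat = exp(fitted value).
h_hat = [math.exp(predict(g, r)) for r in X]

# Step 4: weighted least squares, i.e. OLS after dividing every variable
# (including the constant) by sqrt(h_hat).
Xw = [[v / math.sqrt(h) for v in r] for r, h in zip(X, h_hat)]
yw = [yi / math.sqrt(h) for yi, h in zip(y, h_hat)]
b_fgls = ols(Xw, yw)

print("OLS  slope:", round(b_ols[1], 4))
print("FGLS slope:", round(b_fgls[1], 4))
```

The point estimates from OLS and FGLS are both consistent when the conditional mean is correctly specified; the potential gain from FGLS is efficiency, and it depends on how well the assumed variance function matches the data.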
x) Propose and undertake a regression which would allow you to get a clearer picture of how the gender wage gap varies with occupation and industry. Explain your results.
xi) Note how the estimated parameter on the gender dummy variable female has changed from the initial regression in part ii) [hint: when using the est tab function after your regression results have been stored using est store, use the option keep(female) to generate a table which compares the estimated coefficients]. Explain why this change might be occurring as a result of the regressions in parts ii), vi) and x). (Compare FGLS and OLS!!) Consider the article by Ronald Oaxaca (referenced below). Given that we have controlled for many different characteristics, is the negative coefficient on female evidence of discrimination?
QUESTION 2. (The impact of union membership on wages). The dataset is a sample of 545 males from 1980-1987. That is, this is a panel dataset in which each cross-sectional unit has 8 observations over time. Consider the papers by Jakubson (1991) and Vella and Verbeek (1998).
i) Undertake a pooled OLS wage regression which, among other important explanatory variables has the following features:
a. Controls for the secular changes over time in each year.
b. Allows an analysis of the return to education in each time period.
c. Allows an analysis of the effect of union membership on wages.
Interpret the coefficients on union and educ and interpret the estimated changes over time. Explain clearly why OLS is not BLUE in this case.
ii) Jakubson (1991) estimated the impact of union membership using panel data methods to control for unobserved heterogeneity.
a. Why might there be reason to believe that union membership is endogenous (i.e. correlated with the unobserved heterogeneity)?
b. Explain how the first differences (FD) estimator can deal with individual heterogeneity bias. Undertake the regression above using the first differences (FD) estimator. Why does exper only enter as a squared term in this regression?
c. How do the estimates differ from the OLS estimates? In particular, what do you notice about the coefficients on union and educ? In relation to union, explain what this could suggest about the correlation with the individual fixed effect.
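The way first differencing removes a time-invariant individual effect can be illustrated on simulated data. Everything in the sketch below is an assumption made for the illustration: the panel is synthetic, the true union effect is set to 0.10, and union status is constructed to be positively correlated with the unobserved effect a_i. It says nothing about what the real dataset shows.

```python
import random

random.seed(1)
N, T, beta = 300, 4, 0.10   # people, periods, assumed true union effect

panel = []
for _ in range(N):
    a = random.gauss(0, 1)  # unobserved, time-invariant heterogeneity a_i
    # Union status depends on a_i, so it is endogenous in pooled OLS.
    union = [1 if a + random.gauss(0, 1) > 0 else 0 for _ in range(T)]
    lwage = [beta * u + a + random.gauss(0, 0.1) for u in union]
    panel.append((union, lwage))

def slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Pooled OLS in levels: a_i sits in the error term and biases the slope.
x_lvl = [u for un, _ in panel for u in un]
y_lvl = [w for _, lw in panel for w in lw]
b_pooled = slope(x_lvl, y_lvl)

# First differences: y_it - y_i,t-1 = beta * (x_it - x_i,t-1) + (e_it - e_i,t-1),
# so a_i cancels out of every differenced observation.
dx = [un[t] - un[t - 1] for un, _ in panel for t in range(1, T)]
dy = [lw[t] - lw[t - 1] for _, lw in panel for t in range(1, T)]
b_fd = slope(dx, dy)

print(f"pooled OLS: {b_pooled:.3f}   first differences: {b_fd:.3f}")
```

The same differencing also wipes out any regressor that does not change over time, and turns a variable that rises by exactly one each year into a constant, which is worth keeping in mind when reading the FD output for educ and exper.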
iii) What other estimator can be used to control for the effects of heterogeneity bias? How are we to tell which approach is most appropriate? Undertake this alternative estimation and test to see whether it is more appropriate than the FD estimator. What do you conclude?
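One candidate for the alternative estimator is the fixed effects (within) estimator, which subtracts each individual's time average instead of the previous period's value. The sketch below computes it on a synthetic panel and also shows one diagnostic sometimes used to choose between FE and FD: the first-order autocorrelation of the FD residuals. If the idiosyncratic errors in levels are serially uncorrelated, the differenced errors have autocorrelation of about -0.5, which favours FE; a value near 0 points towards FD. The data-generating process and all names are assumptions for illustration only.

```python
import random

random.seed(2)
N, T, beta = 300, 4, 0.10   # people, periods, assumed true union effect

panel = []
for _ in range(N):
    a = random.gauss(0, 1)  # unobserved individual effect
    union = [1 if a + random.gauss(0, 1) > 0 else 0 for _ in range(T)]
    lwage = [beta * u + a + random.gauss(0, 0.1) for u in union]  # iid errors
    panel.append((union, lwage))

def demean(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    xd, yd = demean(x), demean(y)
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

# Within (FE) estimator: demean each person's series, then pooled regression
# through the origin on the demeaned data.
xw = [v for un, _ in panel for v in demean(un)]
yw = [v for _, lw in panel for v in demean(lw)]
b_fe = sum(a * b for a, b in zip(xw, yw)) / sum(a * a for a in xw)

# FD estimator, kept per person so residuals can be lagged.
dx = [[un[t] - un[t - 1] for t in range(1, T)] for un, _ in panel]
dy = [[lw[t] - lw[t - 1] for t in range(1, T)] for _, lw in panel]
flat_dx = [v for row in dx for v in row]
flat_dy = [v for row in dy for v in row]
b_fd = slope(flat_dx, flat_dy)
c_fd = sum(flat_dy) / len(flat_dy) - b_fd * sum(flat_dx) / len(flat_dx)

# Regress each FD residual on its own lag within the same person.
res = [[y - c_fd - b_fd * x for x, y in zip(rx, ry)] for rx, ry in zip(dx, dy)]
cur = [r[t] for r in res for t in range(1, T - 1)]
lag = [r[t - 1] for r in res for t in range(1, T - 1)]
rho = slope(lag, cur)

print(f"FE: {b_fe:.3f}   FD: {b_fd:.3f}   rho of FD residuals: {rho:.3f}")
```

A formal version of this check would attach a standard error to rho and test the two benchmark values; the sketch only reports the point estimate.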
iv) Explain the Random Effects Model and undertake a Random Effects (RE) estimation of the model in levels. Compare and contrast the RE model and the Fixed Effects (FE) Model in their assumptions and show how they relate to OLS. Undertake a Hausman test to see whether the FE estimator or the RE estimator is to be preferred, stating clearly the null and alternative hypotheses and interpreting the result.
v) What is your conclusion concerning the impact of union membership on wages?