R-Programming Project

Find a dataset with a categorical response variable that has exactly two categories, several (3 to 6) potential predictors, and a large enough sample size. At least two of the predictors should be quantitative variables; the others can be 0/1 indicators or categorical. Do not use a dataset from the textbook or the Stat2Data package.

Data Preparation:

1) Briefly describe the variables in your dataset and where you found the data.

2) Was your data in a format that you could immediately upload into RStudio and analyze, or was some manipulation of the data needed? If you manipulated the data, explain what you did and how it was done.
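
As an illustration of the kind of manipulation item 2 asks about, here is a minimal sketch in base R. The data frame `raw` and its column names (`outcome`, `age`, `income`, `smoker`) are hypothetical stand-ins; with a real dataset the first step would typically be a call such as `read.csv()` on your own file.

```r
# Hypothetical raw data standing in for a file you would normally read in.
raw <- data.frame(
  outcome = c("yes", "no", "yes", NA, "no", "yes"),
  age     = c(34, 51, NA, 42, 29, 63),
  income  = c(48.5, 61.2, 39.9, 55.0, 72.3, 44.1),
  smoker  = c("Y", "N", "N", "Y", "N", "Y")
)

# Keep only complete rows so that every model uses the same observations.
dat <- raw[complete.cases(raw), ]

# Recode the two-category response and the text indicator as 0/1.
dat$y      <- ifelse(dat$outcome == "yes", 1, 0)
dat$smoker <- ifelse(dat$smoker == "Y", 1, 0)

# Inspect the result before modeling.
str(dat)
table(dat$y)
```

Dropping incomplete rows is only one option; whichever choice you make is part of what the write-up should explain.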

Single Predictor:

1) Choose a single quantitative predictor and construct a logistic regression model.

2) Plot the raw data and the logistic curve on the same axes.

3) Construct an empirical logit plot and comment on the linearity of the data.

4) Use the summary of your logistic model to perform a hypothesis test to determine if there is significant evidence of a relationship between the response and predictor variable. State your hypotheses and conclusion.

5) Construct a confidence interval for the odds ratio and include a sentence interpreting the interval in the


6) Compute the G-statistic and use it to test the effectiveness of your model.

7) Repeat (1)-(6) for a second model with a different single quantitative predictor.

8) Compare the effectiveness of your two models to each other (a formal test is not required).
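
The steps above can be sketched in base R roughly as follows. This is one possible outline, not a required solution: the data are simulated, the names `y`, `x1`, `x2`, and `g_test` are hypothetical, the empirical logit plot uses 8 equal-count bins as an arbitrary choice, and the odds-ratio interval is a Wald interval from `confint.default()` (a profile-likelihood interval from `confint()` is a common alternative).

```r
set.seed(1)

# Simulated stand-in data: binary response y, quantitative predictors x1, x2.
n   <- 400
x1  <- rnorm(n, mean = 50, sd = 10)
x2  <- rnorm(n, mean = 0, sd = 1)
y   <- rbinom(n, size = 1, prob = plogis(-5 + 0.1 * x1 + 0.3 * x2))
dat <- data.frame(y, x1, x2)

# (1) Logistic regression on a single quantitative predictor.
fit1 <- glm(y ~ x1, family = binomial, data = dat)
b    <- coef(fit1)

# (2) Raw (jittered) 0/1 data and the fitted logistic curve on the same axes.
plot(dat$x1, jitter(dat$y, amount = 0.03),
     xlab = "x1", ylab = "y and fitted probability")
curve(plogis(b[1] + b[2] * x), add = TRUE, lwd = 2)

# (3) Empirical logit plot: bin x1, then plot the adjusted log odds per bin.
grp    <- cut(dat$x1, breaks = quantile(dat$x1, probs = seq(0, 1, 0.125)),
              include.lowest = TRUE)
succ   <- tapply(dat$y, grp, sum)
size   <- tapply(dat$y, grp, length)
xmean  <- tapply(dat$x1, grp, mean)
elogit <- log((succ + 0.5) / (size - succ + 0.5))
plot(xmean, elogit, xlab = "mean of x1 in bin", ylab = "empirical logit")
abline(a = b[1], b = b[2], lty = 2)  # fitted line on the logit scale

# (4) Wald z-test of H0: slope = 0 versus Ha: slope != 0.
wald1 <- summary(fit1)$coefficients["x1", ]

# (5) 95% Wald confidence interval for the odds ratio per one-unit increase.
or_ci1 <- exp(confint.default(fit1)["x1", ])

# (6) G-statistic: drop in deviance from the null model, chi-square reference.
g_test <- function(fit) {
  G  <- fit$null.deviance - fit$deviance
  df <- fit$df.null - fit$df.residual
  c(G = G, df = df, p = pchisq(G, df, lower.tail = FALSE))
}
g1 <- g_test(fit1)

# (7) Second model with a different quantitative predictor.
fit2 <- glm(y ~ x2, family = binomial, data = dat)
g2   <- g_test(fit2)

# (8) Informal comparison: larger G and smaller AIC suggest the better model.
print(rbind(x1 = g1, x2 = g2))
print(AIC(fit1, fit2))
```

For step (7), the plots, Wald test, and odds-ratio interval would be repeated for `fit2` in the same way; only the G-statistic is shown for both models here to keep the sketch short.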

Skills: Statistical Analysis, R Programming Language, Statistics, SPSS Statistics, Data Science

