Assignment 2

This assignment gives you practice exploring data and running a linear regression on your own using statistical software. You are welcome to use any statistical software you wish, and you may work in groups of up to 3 for this assignment. If you work in a group, please submit one completed assignment per group on ICON. Please make sure to add everyone’s name to the submission, either as a comment on ICON or on the document itself.

Instructions

What to turn in

Please turn in a document that contains the following:

  1. answers to the questions below
  2. any relevant statistics/figures that support your answers.

Finally, upload the final document to ICON.

Including Statistical Evidence

This assignment is worth a total of 10 points spread over 7 questions. Please make sure to include statistical evidence to support your statements; this could include graphics or statistics. This assignment also gives you a choice about which question you are interested in exploring within these data, so including the statistical evidence is extremely important. Failure to include statistical evidence to support claims will result in a 1 pt deduction.

Due Date

Due around April 15th, 2024. There is no penalty for late submissions as long as they are submitted by May 9th.

Data

The data for this assignment are tuition and salary data for higher education institutions. The data originally come from Tidy Tuesday. I have done some processing for you to join two tables together using fuzzy string matching.

Data to use: The data can be obtained in csv format from GitHub. The data are also posted to the data folder within the IDAS. A short description for each attribute is as follows.

variable                   class      description
name                       character  Name of school
state_name                 character  State name
early_career_pay           double     Estimated early career pay in USD
mid_career_pay             double     Estimated mid career pay in USD
make_world_better_percent  double     Percent of alumni who think they are making the world a better place
stem_percent               double     Percent of student body in STEM
type                       character  Type: Public, private, for-profit
degree_length              character  4 year or 2 year degree
room_and_board             double     Room and board in USD
in_state_tuition           double     Tuition for in-state residents in USD
in_state_total             double     Total cost for in-state residents in USD (room & board + in-state tuition)
out_of_state_tuition       double     Tuition for out-of-state residents in USD
out_of_state_total         double     Total cost for out-of-state residents in USD (room & board + out-of-state tuition)
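Because the two total columns are described as sums of other columns, a quick consistency check can be a useful first step when exploring the data. The following is a minimal sketch in Python using only the standard library. The three rows are made up for illustration and are not taken from the real file, and the assumption that missing values appear as NA or blank cells may not match the actual csv.

```python
import csv
import io

# Made-up rows for illustration only; they are NOT taken from the real file.
SAMPLE_CSV = """name,room_and_board,in_state_tuition,in_state_total,out_of_state_tuition,out_of_state_total
Example College A,10000,12000,22000,30000,40000
Example College B,NA,4000,4000,9000,9000
Example College C,9000,8000,17500,20000,29000
"""


def to_number(text):
    """Convert a csv cell to float; treat blank or NA as missing (assumption)."""
    text = text.strip()
    return None if text in ("", "NA") else float(text)


def total_mismatches(rows, tuition_col, total_col):
    """Return names of schools whose total differs from room & board + tuition."""
    bad = []
    for row in rows:
        board = to_number(row["room_and_board"])
        tuition = to_number(row[tuition_col])
        total = to_number(row[total_col])
        if None in (board, tuition, total):
            continue  # rows with a missing piece cannot be checked
        if abs(board + tuition - total) > 0.5:
            bad.append(row["name"])
    return bad


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
in_state_bad = total_mismatches(rows, "in_state_tuition", "in_state_total")
out_of_state_bad = total_mismatches(rows, "out_of_state_tuition", "out_of_state_total")
print("in-state mismatches:", in_state_bad)
print("out-of-state mismatches:", out_of_state_bad)
```

To run the same check on the real data, replace the in-memory sample with the downloaded file opened through csv.DictReader. Rows skipped for missing values are worth counting as well, since they will also drop out of any regression that uses those columns.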

Note

This assignment builds on assignment 1, so please complete assignment 1 first. Review assignment 1 briefly before starting this assignment.

Questions

Note: Each question is worth 1 pt unless otherwise specified.

  1. Based on your first assignment, are there attributes that may have been missing from the linear regression fitted in assignment 1? That is, is the assumption that all relevant attributes are included in the model appropriate? Why or why not? If not, what other attributes may be of interest?

  2. Add one or more of the attributes identified in question 1 to create a multiple regression model. That is, add one or more of the attributes from question 1 while keeping the attribute from assignment 1 in the regression model. Summarize the interpretation of the regression coefficient estimates for this multiple regression model.

  3. Estimate model fit indices to compare the model from assignment 1 to the multiple regression fitted in question 2. Does the multiple regression improve model fit relative to the model from assignment 1? Why or why not?

  4. Explore and evaluate the model assumptions regarding the residual/error term. Summarize how well the model is meeting the assumptions, citing specific statistics or visualizations to justify your conclusions. In addition, do the model assumptions seem better met compared to those from the regression in assignment 1? (2 pts)

  5. Summarize the statistical evidence surrounding the null or alternative hypotheses that are being explored for the coefficients entered into the model. Note that I did not explicitly ask you for null/alternative hypotheses, but you may want to write those down for your reference. In a few sentences, describe whether the evidence provides support for or against the null hypothesis. Provide specific statistical evidence to support your justification. (2 pts)

  6. Create confidence intervals for the coefficients in the multiple regression model. Justify your confidence level and interpret the confidence intervals in the context of the data. What do these confidence intervals suggest about the magnitude of effect? (2 pts)

  7. Based on the justification in #5 and #6, what practical implications does this result have? That is, are the statistical results that you have been describing/summarizing throughout this assignment practically relevant? Be as specific as you can in your discussion about why you believe the results are practically useful or not.
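Whatever software you choose, the quantities asked for in questions 2 through 6 are the same: coefficient estimates, fit indices, residuals, and confidence intervals. The following is a minimal sketch of those calculations in Python using only the standard library, not a required approach. Several things in it are assumptions made for illustration: the numbers are made up (in thousands of USD) and are not from the real data, stem_percent is a hypothetical stand-in for the attribute used in assignment 1, the helper names solve and ols are invented here, the confidence intervals use a normal critical value as a large-sample stand-in for the t value most packages report, and the AIC is the Gaussian form up to an additive constant, so only differences between models are meaningful.

```python
import math
from statistics import NormalDist


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(predictors, y, level=0.95):
    """Least-squares fit with an intercept; returns estimates and fit summaries."""
    n = len(y)
    X = [[1.0] + list(row) for row in predictors]  # first column is the intercept
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    rss = sum(e * e for e in resid)
    ybar = sum(y) / n
    tss = sum((yi - ybar) ** 2 for yi in y)
    sigma2 = rss / (n - k)  # residual variance estimate
    # Diagonal of (X'X)^-1, obtained by solving against each unit vector.
    inv_diag = [
        solve(xtx, [1.0 if i == j else 0.0 for i in range(k)])[j] for j in range(k)
    ]
    se = [math.sqrt(sigma2 * d) for d in inv_diag]
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation, not t
    return {
        "beta": beta,
        "se": se,
        "ci": [(b - z * s, b + z * s) for b, s in zip(beta, se)],
        "r2": 1 - rss / tss,
        "adj_r2": 1 - (rss / (n - k)) / (tss / (n - 1)),
        "aic": n * math.log(rss / n) + 2 * k,  # up to an additive constant
        "resid": resid,
    }


# Made-up values for illustration only; NOT from the assignment data.
stem = [5, 12, 20, 8, 30, 45, 15, 25, 38, 10]    # stem_percent
cost = [22, 30, 41, 25, 48, 60, 35, 39, 55, 28]  # out_of_state_total, thousands USD
pay = [42, 47, 52, 44, 58, 69, 49, 54, 63, 46]   # early_career_pay, thousands USD

small = ols([[s] for s in stem], pay)   # one-predictor model, as in assignment 1
full = ols(list(zip(stem, cost)), pay)  # multiple regression with an added attribute

for label, fit in (("simple", small), ("multiple", full)):
    print(label, "adj R2:", round(fit["adj_r2"], 3), "AIC:", round(fit["aic"], 2))
for b, (lo, hi) in zip(full["beta"], full["ci"]):
    print("estimate:", round(b, 3), "interval:", (round(lo, 3), round(hi, 3)))
```

R squared can never decrease when a predictor is added, which is why the sketch also reports adjusted R squared and AIC for the comparison in question 3. The values in resid are what you would plot against the fitted values, or in a histogram or Q-Q plot, for question 4. With a real data set of modest size, prefer the t-based intervals and tests from your statistical software over the normal approximation used here.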
