Assignment 1

This assignment aims to give you practice exploring data and running a linear regression on your own using statistical software. You are welcome to use any statistical software you wish, and you may work in groups of up to 3 for this assignment. If you work in a group, please submit one completed assignment per group on ICON. Please make sure to add everyone’s name to the submission, either as a comment on ICON or in the document itself.

Instructions

What to turn in

Please turn in a document that contains the following:

  1. answers to the questions below
  2. any relevant statistics/figures that support your answers.

Finally, upload the final document to ICON.

Including Statistical Evidence

This assignment is worth a total of 10 points spread over 9 questions. Please include statistical evidence, such as graphics or statistics, to support your statements. Because this assignment lets you choose which question to explore within these data, including the statistical evidence is extremely important. Failure to include statistical evidence to support claims will result in a 1 pt deduction.

Due Date

Due around March 25th, 2024. There is no penalty for late submissions as long as they are submitted by May 9th.

Data

The data for this assignment are tuition and salary data for higher education institutions. The data originally come from Tidy Tuesday. I have already joined two tables for you using fuzzy string matching.

Data to use: The data can be obtained in csv format from GitHub. The data are also posted to the data folder within the IDAS. A short description for each attribute is as follows.

| variable | class | description |
|---|---|---|
| name | character | Name of school |
| state_name | character | State name |
| early_career_pay | double | Estimated early career pay in USD |
| mid_career_pay | double | Estimated mid career pay in USD |
| make_world_better_percent | double | Percent of alumni who think they are making the world a better place |
| stem_percent | double | Percent of student body in STEM |
| type | character | Type: public, private, or for-profit |
| degree_length | character | 4 year or 2 year degree |
| room_and_board | double | Room and board in USD |
| in_state_tuition | double | Tuition for in-state residents in USD |
| in_state_total | double | Total cost for in-state residents in USD (sum of room & board + in-state tuition) |
| out_of_state_tuition | double | Tuition for out-of-state residents in USD |
| out_of_state_total | double | Total cost for out-of-state residents in USD (sum of room & board + out-of-state tuition) |
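
As a starting point for exploring data with these columns, here is a minimal sketch in Python using only the standard library. The rows in `sample_csv` are made up for illustration and are not from the real data set. The pairing of `stem_percent` with `early_career_pay` is just one example relationship, and `pearson_r` is a helper defined here, not part of any library. Any statistical software can produce the same summaries.

```python
import csv
import io
import statistics

# Made-up rows for illustration only; they are NOT from the real data set.
# In practice, replace `sample_csv` with an open handle to the CSV you downloaded.
sample_csv = io.StringIO(
    "name,type,stem_percent,early_career_pay,out_of_state_total\n"
    "School A,Public,10,45000,30000\n"
    "School B,Private,25,52000,48000\n"
    "School C,Public,15,47000,32000\n"
    "School D,Private,40,60000,55000\n"
)

rows = list(csv.DictReader(sample_csv))

# csv.DictReader returns text, so convert the numeric columns to floats.
stem = [float(r["stem_percent"]) for r in rows]
pay = [float(r["early_career_pay"]) for r in rows]


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# A few descriptive statistics for the two attributes of interest.
summary = {
    "n": len(rows),
    "mean_stem_percent": statistics.mean(stem),
    "mean_early_career_pay": statistics.mean(pay),
    "correlation": pearson_r(stem, pay),
}
print(summary)
```

A scatterplot of the two attributes would complement these numbers, since a correlation alone cannot show nonlinearity or outliers.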

Questions

Note: Each question is worth 1 pt unless otherwise specified.

  1. Identify a research question of interest that can be answered using linear regression. Please do not include both early and mid career pay in the research question.

  2. Explore the research question you identified in #1 descriptively. In a few sentences, summarize any potential relationship, including statistical evidence (i.e., figures or statistics) to support your statements.

  3. Fit the linear regression to answer your question from #1. Interpret the regression coefficients in the context of the problem at hand. That is, what do the coefficients mean for the attributes included in the model?

  4. Is the model intercept as interpretable as it could be? What could be done to enhance the interpretation of the intercept? Summarize in a few sentences any theory or data elements that may help to decide how to improve the interpretation of the intercept.

  5. Interpret the model estimates that show how well the model is performing. That is, what model statistics help to understand how well the model represents the outcome attribute? What do these statistics mean in the context of the problem?

  6. Explore and evaluate the model assumptions/data conditions regarding the residual/error term. Summarize how well the model is meeting the assumptions, citing specific statistics or visualizations to justify your conclusions. (2 pts)

  7. Write out the null and alternative hypotheses based on your research question from #1.

  8. Summarize the statistical evidence surrounding the null or alternative hypotheses from question 7. In a few sentences, describe whether the evidence provides support for or against the null hypothesis. Provide specific statistical evidence to support your justification.

  9. Based on the justification in #8, what practical implications does this result have? That is, are the statistical results that you have been describing/summarizing throughout this assignment practically relevant? Be as specific as you can in your discussion about why you believe the results are practically useful or not.
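
The quantities the questions above ask about (coefficients, a more interpretable intercept, fit statistics, residuals, and the evidence for a slope) can be sketched as follows. This is a minimal illustration in Python with NumPy, not a required approach. The data are synthetic and generated inside the script, and the choice of `stem_percent` predicting `early_career_pay` is only an example research question. `fit_simple_ols` is a hypothetical helper written for this sketch.

```python
import numpy as np

# Synthetic data for illustration only (NOT the assignment data).
rng = np.random.default_rng(0)
stem_percent = rng.uniform(0, 60, size=200)
early_career_pay = 42000 + 250 * stem_percent + rng.normal(0, 4000, size=200)


def fit_simple_ols(x, y):
    """Fit y = b0 + b1 * x by least squares and return summary statistics."""
    X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)            # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)       # coefficient covariance matrix
    se = np.sqrt(np.diag(cov))                  # standard errors
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return {
        "beta": beta,            # [intercept, slope]
        "se": se,
        "t": beta / se,          # t statistics for H0: coefficient = 0
        "r2": r2,                # proportion of variance explained
        "sigma": np.sqrt(sigma2),  # residual standard error
        "fitted": fitted,
        "resid": resid,
    }


# Raw predictor: the intercept is the predicted pay when stem_percent is 0.
raw = fit_simple_ols(stem_percent, early_career_pay)

# Mean-centered predictor: the intercept becomes the predicted pay at the
# average stem_percent, which is usually easier to interpret.
centered = fit_simple_ols(stem_percent - stem_percent.mean(), early_career_pay)

print("raw coefficients:     ", raw["beta"])
print("centered coefficients:", centered["beta"])
print("R-squared:", raw["r2"], "residual SE:", raw["sigma"])
print("slope t statistic:", raw["t"][1])
```

Centering changes only the intercept. The slope, R-squared, and residuals are identical in both fits. For the assumption checks, plotting `raw["resid"]` against `raw["fitted"]` and inspecting a histogram or Q-Q plot of the residuals are common ways to look for nonconstant variance and non-normality.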
