Practice Problem 1

This practice problem is meant to give you practice exploring data and running a linear regression on your own using statistical software. You are welcome to use any statistical software you wish, and you may work in groups of up to 3. If you work in a group, please have every member of the group complete the ICON survey.

Instructions

What to turn in

Please turn in a document that includes any relevant statistics and figures you created. You will also be asked to complete a graded survey on ICON as part of this practice problem.

Finally, upload the final document to ICON and complete the graded survey.

Due Date

Due around February 26th, 2024. There is no penalty for late submissions as long as they are submitted by May 9th.

Data

The data for this activity come from Kaggle. The data contain 104 rows and 14 columns describing possums collected in Australia. A description of each column in the data is shown below.

The data can be obtained in CSV format and are also found within the “data” folder inside the IDAS. A short description of each attribute is as follows.

variable  class      description
case      integer    Observation number.
site      integer    Site.
Pop       character  Population, either Vic (Victoria) or other (New South Wales or Queensland).
sex       character  Sex of possum, either m (male) or f (female).
age       integer    Age.
hdlngth   integer    Head length, in mm.
skullw    integer    Skull width, in mm.
totlngth  integer    Total length, in cm.
taill     integer    Tail length, in cm.
footlgth  integer    Foot length, in mm.
earconch  integer    Ear conch length, in mm.
eye       integer    Distance from medial canthus to lateral canthus of right eye, in mm.
chest     integer    Chest girth, in cm.
belly     double     Belly girth, in cm.
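
If you choose Python as your software, reading the CSV might look roughly like the sketch below. It is only an illustration under some assumptions: the column names are taken from the table above, the two data rows are made-up placeholders (not real observations), and the handling of empty or "NA" entries is a precaution in case the file has missing values.

```python
import csv
import io

# Stand-in for the real file: the header follows the attribute table,
# but the two data rows are made-up placeholder values for illustration.
SAMPLE_CSV = """case,site,Pop,sex,age,hdlngth,skullw,totlngth,taill,footlgth,earconch,eye,chest,belly
1,1,Vic,m,8,94,60,89,36,74,54,15,28,36.0
2,1,Vic,f,6,92,57,91,36,72,51,16,28,33.0
"""

# Columns described as integer or double in the attribute table.
NUMERIC_COLUMNS = [
    "case", "site", "age", "hdlngth", "skullw", "totlngth", "taill",
    "footlgth", "earconch", "eye", "chest", "belly",
]


def read_possums(handle):
    """Read rows from an open CSV handle, converting numeric columns to float."""
    rows = []
    for record in csv.DictReader(handle):
        row = dict(record)
        for name in NUMERIC_COLUMNS:
            value = row[name]
            # Assumption: missing entries, if any, appear as "" or "NA".
            row[name] = None if value in ("", "NA") else float(value)
        rows.append(row)
    return rows


rows = read_possums(io.StringIO(SAMPLE_CSV))
print(len(rows), "rows read; first totlngth =", rows[0]["totlngth"])
```

With the actual file, the same function could be called as `read_possums(open("possum.csv", newline=""))`, where `possum.csv` is an assumed file name; use whatever name your downloaded copy has.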

Questions

  1. Explore the distribution of the totlngth attribute. Summarize key elements of the distribution, for instance discussing elements related to shape, center, variation, and/or extreme values.

  2. Visually explore the bivariate association between tail length (the taill attribute) and the total length from question 1. From the figure, estimate the bivariate association (for example, its direction and strength). Does the association appear to be linear?

  3. Compute the bivariate association (i.e., the correlation) between tail length and total length. What is the best interpretation of the correlation?
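
As a rough guide to the kinds of numbers these questions ask for, the sketch below computes basic summary statistics and a Pearson correlation using only the Python standard library. The measurements are made-up placeholders, not values from the possum data, and the helper names `describe` and `pearson_r` are hypothetical. Any statistical software gives the same quantities. The plots for questions 1 and 2 (such as a histogram and a scatterplot) need a plotting tool and are not shown.

```python
import math
import statistics

# Made-up placeholder measurements in cm, for illustration only.
totlngth = [89.0, 91.5, 95.5, 92.0, 85.5, 90.5]
taill = [36.0, 36.5, 39.0, 38.0, 36.0, 35.5]


def describe(values):
    """Return center, spread, and extreme values for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson_r(x, y):
    """Pearson correlation: the co-deviation sum S_xy divided by sqrt(S_xx * S_yy)."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


summary = describe(totlngth)
r = pearson_r(taill, totlngth)
print(summary)
print("correlation:", round(r, 3))
```

The correlation always falls between -1 and 1. Its sign gives the direction of the linear association and its magnitude gives the strength, which is what question 3 asks you to interpret.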
