1. For this project you must find some published or existing data. Possible sources include: almanacs, magazines and journal articles, textbooks, web resources, athletic teams, newspapers, professors with experimental data, campus organizations, electronic data repositories, etc. Your dataset must have at least 250 cases, two categorical variables, and two quantitative variables. It is also recommended that you choose a dataset whose subject matter interests you.

2. Using technology, perform exploratory data analysis (EDA).
   (a) For at least one of the quantitative variables, include summary statistics (mean, standard deviation, five-number summary) and graphical displays (histogram, box plot, and Q-Q plot). Are there any outliers? Is the distribution normal, symmetric, skewed, or some other shape?
   (b) Create a graphical display looking at multiple variables and their correlation.
   (c) For at least one of the categorical variables, include a frequency and relative frequency table.
   (d) Include a two-way table for two of the categorical variables and discuss any relevant proportions. Describe any possible relationship between the two variables.
   (e) Include a side-by-side plot for at least one categorical and at least one quantitative variable. Describe any association between the two variables. Use summary statistics to compare groups.
   (f) Create a visualization or perform a statistical computation that is appropriate to your data but not already included.

3. Write your report!
   (a) Introduce your dataset, including a reference to where it can be found. Describe all relevant variables that you will use in your analysis.
   (b) Describe all steps that were necessary to clean and organize the data.
   (c) Include all items requested above. Include graphs and text about each.
   (d) Write a brief conclusion highlighting the most interesting features of your data.

The report will be graded by the following criteria:
• Statistical analysis – 30 points. The statistical tests are all provided.
• Graphical representations – 30 points. The requested graphical displays are made and included in the report.
• Data collection – 15 points. The data is gathered in a responsible way. The method of collection is clearly stated and the variables are all explained. If a sample of the data is used, it is taken in a proper way.
• Interpretations – 15 points. The results of the statistical analysis are clearly explained and interpreted in the context of the problem. The conclusions accurately reflect the analysis and are well supported.
• Writing quality – 10 points. The paper is readable and clearly written. There are few, if any, grammatical or spelling errors, and they do not interfere with the clarity of the paper.

The numbering in this document is not to be used in the report in any way.
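
The numerical parts of item 2 (summary statistics, an outlier check, a frequency table, and a two-way table) could be sketched roughly as follows. This is only an illustration using the Python standard library, not a required tool or method. The function names and the toy data are hypothetical, the outlier check uses the common 1.5 × IQR rule of thumb, and quartiles are computed with the "inclusive" method, so other software may report slightly different values. The graphical displays are not covered here.

```python
import statistics
from collections import Counter


def summarize(values):
    """Mean, sample standard deviation, and five-number summary."""
    # quantiles() sorts the data itself; n=4 returns Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


def iqr_outliers(values):
    """Values more than 1.5 * IQR below Q1 or above Q3 (a rule of thumb)."""
    s = summarize(values)
    iqr = s["q3"] - s["q1"]
    low, high = s["q1"] - 1.5 * iqr, s["q3"] + 1.5 * iqr
    return [v for v in values if v < low or v > high]


def frequency_table(labels):
    """Map each category to (count, relative frequency)."""
    counts = Counter(labels)
    total = len(labels)
    return {category: (n, n / total) for category, n in counts.items()}


def two_way_table(row_labels, col_labels):
    """Count of cases for each (row category, column category) pair."""
    return Counter(zip(row_labels, col_labels))


if __name__ == "__main__":
    # Hypothetical toy data; a real project needs at least 250 cases.
    heights = [150, 152, 155, 158, 160, 161, 163, 165, 170, 210]
    sport = ["soccer", "track", "track", "swim", "soccer",
             "swim", "track", "soccer", "swim", "soccer"]
    year = ["first", "first", "second", "second", "first",
            "second", "first", "second", "first", "second"]

    print(summarize(heights))
    print(iqr_outliers(heights))
    print(frequency_table(sport))
    print(two_way_table(sport, year))
```

Comparing groups for item 2(e) could reuse `summarize` on the quantitative values within each category.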


