For this portion of the project, you will examine your dataset for incorrect data. Any incorrect data should be removed, corrected, or imputed. Follow these steps:

  • Remove irrelevant data. If you are unsure whether something is irrelevant, keep it.
  • Remove duplicate records.
  • Make sure numbers are interpreted as numerical data types.
  • Fix typos.
  • Standardize inconsistent values (for example, capitalization, units, and date formats).
  • Investigate outliers.
  • Check and manage missing values.
  • Format and normalize data if needed.
  • Change categorical values into numbers if needed.
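
The steps above can be sketched in code. The following is a minimal illustration using only the Python standard library, not a required method. The records, the column names (id, city, age, notes), the typo table CITY_FIXES, the 1.5 × IQR outlier rule, median imputation, and min-max scaling are all assumptions chosen for the example; your dataset may call for different choices.

```python
import statistics

# Hypothetical toy records; every name and value here is made up for illustration.
raw = [
    {"id": "1", "city": "Boston ", "age": "34",  "notes": "n/a"},
    {"id": "2", "city": "boston",  "age": "29",  "notes": "n/a"},
    {"id": "2", "city": "boston",  "age": "29",  "notes": "n/a"},  # exact duplicate
    {"id": "3", "city": "Bostn",   "age": "",    "notes": "n/a"},  # typo, missing age
    {"id": "4", "city": "Chicago", "age": "41",  "notes": "n/a"},
    {"id": "5", "city": "CHICAGO", "age": "410", "notes": "n/a"},  # suspicious age
    {"id": "6", "city": "Chicago", "age": "38",  "notes": "n/a"},
]

# Remove irrelevant data: "notes" is constant here, so it carries no information.
rows = [{k: v for k, v in r.items() if k != "notes"} for r in raw]

# Remove duplicate records, keeping the first occurrence of each.
seen, unique = set(), []
for r in rows:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        unique.append(r)
rows = unique

# Interpret numbers as numerical types; an empty string becomes None (missing).
for r in rows:
    r["id"] = int(r["id"])
    r["age"] = int(r["age"]) if r["age"].strip() else None

# Standardize text (trim whitespace, lowercase), then fix known typos.
CITY_FIXES = {"bostn": "boston"}  # hypothetical lookup table of known misspellings
for r in rows:
    city = r["city"].strip().lower()
    r["city"] = CITY_FIXES.get(city, city)

# Investigate outliers: flag ages outside 1.5 * IQR of the quartiles.
ages = [r["age"] for r in rows if r["age"] is not None]
q1, _, q3 = statistics.quantiles(ages, n=4, method="inclusive")
low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
outlier_ids = [r["id"] for r in rows
               if r["age"] is not None and not low <= r["age"] <= high]

# Assumption: after review, flagged values are treated as missing.
for r in rows:
    if r["id"] in outlier_ids:
        r["age"] = None

# Manage missing values: impute with the median of the remaining ages.
median_age = statistics.median(r["age"] for r in rows if r["age"] is not None)
for r in rows:
    if r["age"] is None:
        r["age"] = median_age

# Normalize: min-max scale age into the range [0, 1].
min_age = min(r["age"] for r in rows)
max_age = max(r["age"] for r in rows)
for r in rows:
    r["age_scaled"] = (r["age"] - min_age) / (max_age - min_age)

# Change categorical values into numbers: one integer code per distinct city.
city_codes = {c: i for i, c in enumerate(sorted({r["city"] for r in rows}))}
for r in rows:
    r["city_code"] = city_codes[r["city"]]

for r in rows:
    print(r)
```

In this sketch the text is standardized before typos are fixed so that the typo table only needs lowercase keys, and outliers are flagged before imputation so that an extreme value does not distort the median used to fill gaps. Each decision of this kind (what was dropped, flagged, or imputed, and why) is worth recording in your summary document.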

Once you have completed these steps, provide a Word document summarizing the pre-processing steps you performed on your dataset.

