Question

Data Clean-up

The dataset to be cleaned up can be downloaded from Google Drive:

https://drive.google.com/file/d/0B1D66qK8jxd0YTI3cUU4R1VDa0U/view?usp=sharing

1. Create a separate repository and push the attached dataset (dirty_data.csv) to it.

2. Populate the missing values in the Area variable with appropriate values (Birmingham, Coventry, Dudley, Sandwell, Solihull, Walsall or Wolverhampton).

3. Remove special characters and padding (the white space before and after the text) from the Street 1 and Street 2 variables. Make sure the first letters of street names are capitalized and the street denominations follow the same standard (for example, all streets are indicated as “str.”, avenues as “ave.”, etc.).

4. If the value in Street 2 duplicates the value in Street 1, remove the value in Street 2.

5. Remove the “Strange HTML” column.

Complete the cleanup code and push the changes to the repository.

Submit a link to the repository. The repository will contain:

Combined code (.r or .rmd)

Original (dirty) dataset

New (clean) dataset

Explanation / Answer

main.R
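
The body of main.R is not shown above, so what follows is only a minimal base-R sketch of how steps 2 to 5 might be approached, not the original answer. Several things in it are assumptions: the column names ("Area", "Street 1", "Street 2", "Strange HTML") are taken from the task wording and may not match the real CSV headers; a blank Area is assumed to inherit the nearest non-blank Area above it (fill-down); the list of denominations (str., ave., rd., ln.) is illustrative; and the helper names `fill_down`, `clean_street` and `clean_data` and the output file name clean_data.csv are hypothetical.

```r
# Sketch of a possible main.R. Column names and the fill-down rule are
# ASSUMPTIONS; check them against the real dirty_data.csv before use.

# Step 2: carry the last non-missing value forward.
# ASSUMPTION: a blank Area belongs to the same area as the row above it.
fill_down <- function(x) {
  x <- trimws(as.character(x))
  x[!is.na(x) & x == ""] <- NA
  idx <- cumsum(!is.na(x))               # position of the latest non-missing value
  c(NA_character_, x[!is.na(x)])[idx + 1]
}

# Step 3: strip special characters and padding, capitalize each word,
# then standardize the trailing denomination.
clean_street <- function(x) {
  x <- as.character(x)
  x <- gsub("[^A-Za-z0-9 ]", " ", x)     # drops anything that is not an ASCII letter, digit or space
  x <- trimws(gsub("\\s+", " ", x))      # collapse runs of spaces, remove padding
  x <- gsub("(\\w)(\\w*)", "\\U\\1\\L\\2", x, perl = TRUE)  # "STATION RD" becomes "Station Rd"
  # ASSUMPTION: illustrative mapping; extend it for the denominations in the data.
  denominations <- c("street|str|st" = "str.", "avenue|ave|av" = "ave.",
                     "road|rd" = "rd.", "lane|ln" = "ln.")
  for (p in names(denominations)) {
    x <- gsub(paste0("\\b(", p, ")$"), denominations[[p]], x,
              ignore.case = TRUE, perl = TRUE)
  }
  x[!is.na(x) & x == ""] <- NA           # treat emptied cells as missing
  x
}

# Steps 2 to 5 applied to a data frame with the assumed column names.
clean_data <- function(df) {
  stopifnot(all(c("Area", "Street 1", "Street 2", "Strange HTML") %in% names(df)))
  df[["Area"]] <- fill_down(df[["Area"]])
  df[["Street 1"]] <- clean_street(df[["Street 1"]])
  df[["Street 2"]] <- clean_street(df[["Street 2"]])
  # Step 4: blank out Street 2 where it repeats Street 1 after cleaning.
  dup <- !is.na(df[["Street 1"]]) & !is.na(df[["Street 2"]]) &
    df[["Street 1"]] == df[["Street 2"]]
  df[["Street 2"]][dup] <- NA
  # Step 5: drop the HTML column.
  df[["Strange HTML"]] <- NULL
  df
}

# Made-up rows for a quick check (not taken from the real dataset).
demo <- data.frame(
  Area = c("Birmingham", NA, "", "Coventry"),
  `Street 1` = c("  high street ", "STATION RD", "park#avenue", "mill lane"),
  `Street 2` = c("High Str", NA, "Church St.", "  MILL LN  "),
  `Strange HTML` = c("<br>", "<br>", "<br>", "<br>"),
  check.names = FALSE, stringsAsFactors = FALSE
)
cleaned <- clean_data(demo)
print(cleaned)

# Run on the real file only if it is present in the working directory.
if (file.exists("dirty_data.csv")) {
  dirty <- read.csv("dirty_data.csv", check.names = FALSE,
                    stringsAsFactors = FALSE, na.strings = c("", "NA"))
  write.csv(clean_data(dirty), "clean_data.csv", row.names = FALSE, na = "")
}
```

Comparing Street 2 with Street 1 only after both have been cleaned means values that differ just in case, padding or abbreviation are still recognized as duplicates. Whether fill-down is the right way to populate Area depends on how the real file is laid out, so it is worth inspecting the rows with missing Area before relying on it.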