Programming Language: R Programming Language Data Set: https://archive.ics.uci.e
Question
Programming Language: R Programming Language
Data Set: https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data
Use the adult.data from the University of California at Irvine repository to answer this question.
4a) For this question, only consider the people making less than or equal to fifty thousand dollars. What percent of the people making fifty thousand dollars or less (<=50K) are male? In other words, what is the number of males making less than or equal to fifty thousand dollars divided by the number of all people making less than or equal to fifty thousand dollars?
4b) What is the percent of people making fifty thousand dollars or less (<=50K) that are female? (Once again the denominator should be the total number of people making less than or equal to fifty thousand dollars)
4c) Repeat questions 4a and 4b, by ignoring all the people who worked less than 40 hours per week.
4d) Only consider the people who worked at least 40 hours per week. What is the percent of people making more than fifty thousand dollars that are female? What percent of these people are male? (Here, the denominator should be the number of people who worked at least 40 hours per week and made more than fifty thousand dollars.)
4e) For this question only consider the people who worked at least 40 hours per week. Of the people with a bachelors degree, what is the percent making more than fifty thousand dollars? (i.e. (num working full time with Bachelors making more than 50 thousand) divided by (num working full time with Bachelors)) Of the people with less than 14 years of education what is the percent making more than fifty thousand dollars? Of the people with at least 16 years of education, what percent are making more than fifty thousand dollars? How many people have at least 16 years of education?
USE R PROGRAMMING LANGUAGE
Explanation / Answer
#We will download the .data file from the URL using the fread function.
#For this we need to install the data.table and curl packages.
#If R asks whether to install into a personal library, answer yes.
#The package names must be passed as one character vector; a second positional argument would be read as the library path.
install.packages(c("data.table","curl"))
library(data.table)
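#We read the file directly from the URL into a data.table using fread.
#fread strips the whitespace around each field by default (strip.white=TRUE), so values such as "<=50K" can be compared exactly.
mydata <- fread('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data')
#Using head we look at the first 6 rows to check that the data is fine and whether there is a header row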
head(mydata)
#As there are no column names, we will add our own column names
#The third column is fnlwgt (a sampling weight) in the data set description, not a date; it is not used below
colnames(mydata) <- c("Age","WorkClass","Fnlwgt","Degree","YearOfEducation","MarriageStatus","Job","Relationship","Race","Gender","CapitalGain","CapitalLoss","HoursWorked","Nationality","Income")
#Q4a: For this question, only consider the people making less than or equal fifty thousand dollars.
#Find percent of people making fifty thousand dollars or less (<=50K) that are male
#First we create a table lessThan50kTotal of the people whose income is less than or equal to 50K
lessThan50kTotal <- mydata[mydata$Income=="<=50K"]
#Using the subset function we get all rows in lessThan50kTotal where Gender is Male, then count these rows using nrow.
#We store this count in lessThan50kMale. Using nrow again we get the total number of rows in lessThan50kTotal.
#We compute percentLessthan50kMale by dividing lessThan50kMale by the total rows of lessThan50kTotal and multiplying by 100
lessThan50kMale <- nrow(subset(lessThan50kTotal, lessThan50kTotal$Gender=="Male"))
percentLessthan50kMale <- 100*lessThan50kMale/nrow(lessThan50kTotal)
percentLessthan50kMale
#Q4b: Find percent of people making fifty thousand dollars or less (<=50K) that are female
#Using the subset function we get all rows in lessThan50kTotal where Gender is Female, then count these rows using nrow.
#We store this count in lessThan50kFemale. Using nrow again we get the total number of rows in lessThan50kTotal.
#We compute percentLessthan50kFemale by dividing lessThan50kFemale by the total rows of lessThan50kTotal and multiplying by 100
lessThan50kFemale <- nrow(subset(lessThan50kTotal, lessThan50kTotal$Gender=="Female"))
percentLessthan50kFemale <- 100*lessThan50kFemale/nrow(lessThan50kTotal)
percentLessthan50kFemale
#Q4c: Repeat questions 4a and 4b, by ignoring all the people who worked less than 40 hours per week
#Here we create a sub table lessThan50kActiveWorkers from lessThan50kTotal
#where hours worked is more than or equal to 40 and do the same computations
lessThan50kActiveWorkers <- lessThan50kTotal[lessThan50kTotal$HoursWorked >= 40]
lessThan50kMaleActiveWorkers <- nrow(subset(lessThan50kActiveWorkers, lessThan50kActiveWorkers$Gender=="Male"))
percentLessthan50kMaleActiveWorkers <- 100*lessThan50kMaleActiveWorkers/nrow(lessThan50kActiveWorkers)
percentLessthan50kMaleActiveWorkers
lessThan50kFemaleActiveWorkers <- nrow(subset(lessThan50kActiveWorkers, lessThan50kActiveWorkers$Gender=="Female"))
percentLessthan50kFemaleActiveWorkers <- 100*lessThan50kFemaleActiveWorkers/nrow(lessThan50kActiveWorkers)
percentLessthan50kFemaleActiveWorkers
#Q4d: Only consider the people who worked at least 40 hours per week.
#Find the percent of people making more than fifty thousand dollars that are female?
#What percent of these people are male?
#Now we will create a sub table from our original table mydata
#where hours worked are more than or equal to 40 and income is more than 50 k
activeWorkersHighIncome <- mydata[mydata$HoursWorked >= 40 & mydata$Income == ">50K"]
#Now we compute the female and male percentage as before
activeWorkersHighIncomeFemale <- nrow(subset(activeWorkersHighIncome, activeWorkersHighIncome$Gender=="Female"))
percentActiveWorkersHighIncomeFemale <- 100*activeWorkersHighIncomeFemale/nrow(activeWorkersHighIncome)
percentActiveWorkersHighIncomeFemale
activeWorkersHighIncomeMale <- nrow(subset(activeWorkersHighIncome, activeWorkersHighIncome$Gender=="Male"))
percentActiveWorkersHighIncomeMale <- 100*activeWorkersHighIncomeMale/nrow(activeWorkersHighIncome)
percentActiveWorkersHighIncomeMale
#4e: For this question only consider the people who worked at least 40 hours per week.
#Of the people with a bachelors degree, what is the percent making more than fifty thousand dollars?
#(i.e. (num working full time with Bachelors making more than 50 thousand) divided by (num working full time with Bachelors))
#Of the people with less than 14 years of education what is the percent making more than fifty thousand dollars?
#Of the people with at least 16 years of education, what percent are making more than fifty thousand dollars?
#How many people have at least 16 years of education?
#For the whole 4e question only consider the people who worked at least 40 hours per week
activeWorkers <- mydata[mydata$HoursWorked >= 40]
#num working full time with Bachelors
bachelorDegreeActiveWorkers <- nrow(subset(activeWorkers,activeWorkers$Degree=="Bachelors"))
#num working full time with Bachelors making more than 50 thousand
bachelorDegreeHighIncomeActiveWorkers <- nrow(subset(activeWorkers,activeWorkers$Degree=="Bachelors" & activeWorkers$Income == ">50K"))
#Of the people with a bachelors degree, what is the percent making more than fifty thousand dollars
percentBachelorDegreeHighIncomeActiveWorkers <- 100*bachelorDegreeHighIncomeActiveWorkers/bachelorDegreeActiveWorkers
percentBachelorDegreeHighIncomeActiveWorkers
#Of the people with less than 14 years of education
lessThan14Education <- nrow(subset(activeWorkers,activeWorkers$YearOfEducation<14))
#Of the people with less than 14 years of education and making more than fifty thousand dollars?
lessThan14EducationHighIncome <- nrow(subset(activeWorkers,activeWorkers$YearOfEducation<14 & activeWorkers$Income == ">50K"))
#Of the people with less than 14 years of education what is the percent making more than fifty thousand dollars?
percentLessThan14EducationHighIncome <- 100*lessThan14EducationHighIncome/lessThan14Education
percentLessThan14EducationHighIncome
#Of the people with at least 16 years of education
atleast16Education <- nrow(subset(activeWorkers,activeWorkers$YearOfEducation>=16))
#at least 16 years of education and making more than fifty thousand dollars
atleast16EducationHighIncome <- nrow(subset(activeWorkers,activeWorkers$YearOfEducation>=16 & activeWorkers$Income == ">50K"))
#Of the people with at least 16 years of education, what percent are making more than fifty thousand dollars?
PercentAtleast16EducationHighIncome <- 100*atleast16EducationHighIncome/atleast16Education
PercentAtleast16EducationHighIncome
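#How many people (among those working at least 40 hours per week) have at least 16 years of education?
#The question asks for a count, which was already computed above as atleast16Education
atleast16Education
#For reference, the same group as a percent of all people working at least 40 hours per week
percentAtleast16Education <- 100*atleast16Education/nrow(activeWorkers)
percentAtleast16Education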
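Every part of question 4 follows the same pattern: restrict the rows to a group, then report the share of that group that meets a condition. As a rough sketch of that pattern (not the method used in the answer above), the mean of a logical vector gives the share directly. The helper name percent_where and the toy data frame are made up for illustration and are not taken from adult.data; only the column names match the ones assigned above.

```r
# Hypothetical helper (an assumption, not part of the original answer):
# percent of the rows selected by `group` for which `condition` is TRUE.
percent_where <- function(condition, group = rep(TRUE, length(condition))) {
  100 * mean(condition[group])
}

# Made-up rows for illustration only; these values do not come from adult.data.
toy <- data.frame(
  Gender      = c("Male", "Female", "Male", "Male", "Female", "Female"),
  HoursWorked = c(40, 35, 50, 45, 45, 20),
  Income      = c("<=50K", "<=50K", ">50K", "<=50K", "<=50K", "<=50K"),
  stringsAsFactors = FALSE
)

lowIncome <- toy$Income == "<=50K"   # group used in 4a and 4b
fullTime  <- toy$HoursWorked >= 40   # extra restriction used in 4c

# 4a-style: percent of people earning <=50K who are male
pctMaleLow <- percent_where(toy$Gender == "Male", lowIncome)

# 4c-style: the same percent, ignoring people who worked less than 40 hours
pctMaleLowFullTime <- percent_where(toy$Gender == "Male", lowIncome & fullTime)

print(pctMaleLow)
print(pctMaleLowFullTime)
```

With the real data, the same calls would take mydata$Gender, mydata$Income and mydata$HoursWorked in place of the toy columns. The subset and nrow steps in the answer should give the same numbers.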