Requirements : You are to create a program in Python that performs the following
Question
Requirements :
You are to create a program in Python that performs the following using the pandas package:
1. Loads the ss13hil.csv file that contains the PUMS dataset (assume it's in the current directory) and create a DataFrame object from it.
2. Create 3 tables:
TABLE 1: Statistics of HINCP - Household income (past 12 months), grouped by HHT - Household/family type
Table should use the HHT types (text descriptions) as the index
Columns should be: mean, std, count, min, max
Rows should be sorted by the mean column value in descending order
TABLE 2: HHL - Household language vs. ACCESS - Access to the Internet (Frequency Table)
Table should use the HHL types (text descriptions) as the index
Columns should be the text descriptions of ACCESS values
Each table entry is the sum of the WGTP column for the given HHL/ACCESS combination, divided by the sum of all WGTP values in the data. Entries need to be formatted as percentages.
Table should include marginal values ('All' row and column).
Any rows containing NA values in HHL, ACCESS, or WGTP columns should be excluded.
TABLE 3: Quantile Analysis of HINCP - Household income (past 12 months)
Rows should correspond to different quantiles of HINCP: low (0-1/3), medium (1/3-2/3), high (2/3-1)
Columns displayed should be: min, max, mean, household_count
The household_count column contains entries with the sum of WGTP values for the corresponding range of HINCP values (low, medium, or high)
3. Display the tables to the screen as shown in the sample output on the last page.
Additional Requirements
1. The name of your source code file should be tables.py. All your code should be within a single file.
2. You need to use the pandas DataFrame object for storing and manipulating data.
3. Your code should follow good coding practices, including good use of whitespace and use of both inline and block comments.
4. You need to use meaningful identifier names that conform to standard naming conventions.
5. At the top of each file, you need to put in a block comment with the following information: your name, date, course name, semester, and assignment name.
6. The output should exactly match the sample output shown on the last page.
What to Turn In
You will turn in the single tables.py file
HINTS
To get the right output, use the following functions to set pandas display parameters:
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
To display entries as percentages, use the applymap method, giving it a string conversion function as input. The string conversion function should take a float value v as input and output a string representing v as a percentage. To do this, you can use formatting strings or the format() method.
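For example, a minimal sketch of such a conversion function might look like the following (the names to_percent and table2 are illustrative, not part of the assignment; match the number of decimal places to the sample output):
def to_percent(v):
    # v is a float fraction of the total, e.g. 0.1234 -> '12.34%'
    return '{:.2%}'.format(v)
# table2 = table2.applymap(to_percent)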
The output should be the same as the sample program output shown on the last page.
Introduction
For this assignment, you will work with a survey dataset and use the matplotlib package to visualize data. The dataset you will be working with comes from the 2013 American Community Survey (ACS). According to census.gov, the ACS "is a mandatory, ongoing statistical survey that samples a small percentage of the population every year - giving communities the information they need to plan investments and services." [see http://www.census.gov/acs/www/] More specifically, you will be using the ACS Public Use Microdata Sample (PUMS), which census.gov describes as "files [that] are a set of untabulated records about individual people or housing units." You can download the 2013 ACS 1-year PUMS data for Illinois Housing Unit Records here: http://www.census.gov/acs data documentation/pums data. You can also access documentation for the PUMS dataset, including the Data Dictionary, here: http://www.census.gov/acs/www/data documentation/pums documentation
Explanation / Answer
NOTE: Take care with Python indentation; it may have been altered when this answer was pasted into the submission form.
import pandas as pd
import numpy as np
# Load the CSV file into a DataFrame (pd.DataFrame.from_csv is deprecated/removed;
# read_csv with index_col=0 uses the first column as the index in the same way)
df = pd.read_csv('ss13hil.csv', index_col=0)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
# TABLE 1
# Text descriptions for the HHT codes (index 0 is a placeholder so that
# positions 1-7 line up with the HHT codes after the groupby)
hht_labels = [
    "Dummy row",
    "Married Couple Household",
    "Other family household:Male householder, no wife present",
    "Other family household:Female householder, no husband present",
    "Nonfamily household:Male householder:Living alone",
    "Nonfamily household:Male householder:Not living alone",
    "Nonfamily household:Female householder:Living alone",
    "Nonfamily household:Female householder:Not living alone",
]
df_HHT = pd.DataFrame(hht_labels)
# Per-HHT statistics of household income
df_mean = df.groupby('HHT')['HINCP'].mean()
df_std = df.groupby('HHT')['HINCP'].std()
df_count = df.groupby('HHT')['HINCP'].count()
df_min = df.groupby('HHT')['HINCP'].min()
df_max = df.groupby('HHT')['HINCP'].max()

# Combine the label column with the statistics, drop the dummy row,
# and sort by mean household income in descending order
frames = [df_HHT, df_mean, df_std, df_count, df_min, df_max]
result = pd.concat(frames, axis=1)
result.columns = ['Household/family type', 'mean', 'std', 'count', 'min', 'max']
result = result.drop(result.index[[0]])
result = result.sort_values(['mean'], ascending=False)
print(result, '\n')
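# (Side note, a hedged alternative not used here: the five separate groupby calls
#  above could be collapsed into a single aggregation, e.g.
#      stats = df.groupby('HHT')['HINCP'].agg(['mean', 'std', 'count', 'min', 'max'])
#  followed by the same HHT label mapping, dummy-row drop, and descending sort by 'mean'.)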
# TABLE 2
# Text descriptions for the HHL codes (the 'ALL' entry labels the margins row)
hhl_labels = [
    "English only",
    "Spanish",
    "Other Indo-European languages",
    "Asian and Pacific Island languages",
    "Other language",
    "ALL",
]
df_HHL = pd.DataFrame(hhl_labels)
# Sum WGTP over each HHL/ACCESS combination and pivot ACCESS codes into columns
df2 = df.groupby(['HHL', 'ACCESS'])['WGTP'].sum().unstack(level=1)
s = df2.sum().sum()  # grand total of WGTP over all HHL/ACCESS combinations

# Append an 'All' row built from the column sums; the leading 0 lands in an
# extra column that is dropped later as 'Dummy'
df3 = df2.sum()
l3 = [0]
for i in range(len(df3)):
    l3.append(df3[i + 1])
df4 = pd.DataFrame([l3])
df2 = pd.concat([df2, df4], ignore_index=True)

# Append an 'All' column (row sums), convert entries to percentages of the
# grand total, and format them as strings
df2 = pd.concat([df2, df2.sum(axis=1)], axis=1, ignore_index=True)
df2 = df2.apply(lambda x: 100 * x / s)
df2 = df2.applymap(lambda x: '%.2f%%' % x)

# Attach the HHL text descriptions and the final column headers; the 'Dummy'
# column absorbs the placeholder position and is dropped
frames = [df_HHL, df2]
result = pd.concat(frames, axis=1, ignore_index=True)
result.columns = ['Household Language', 'Dummy', 'Yes w/ subs.', 'Yes wo/ subs.', 'No', 'All']
result = result.drop(result.columns[[1]], axis=1)
print(result, '\n')
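# (Side note, a hedged alternative not used here: pd.crosstab can build the weighted
#  frequency table, including the 'All' margins, in one call, e.g.
#      tbl2_alt = pd.crosstab(df['HHL'], df['ACCESS'], values=df['WGTP'],
#                             aggfunc='sum', margins=True) / df['WGTP'].sum()
#  The HHL/ACCESS codes would still need their text descriptions, and the entries
#  would still need percentage formatting to match the sample output.)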
# TABLE 3
# Row labels for the three income quantile groups
quantile_labels = pd.DataFrame(['low', 'medium', 'high'])

# Boundaries of the low / medium / high HINCP ranges (0-1/3, 1/3-2/3, 2/3-1 quantiles)
m11 = df['HINCP'].quantile([0, 0.33333333]).min()
m12 = df['HINCP'].quantile([0, 0.33333333]).max()
dft31 = df[(df['HINCP'] >= m11) & (df['HINCP'] < m12)]
m21 = df['HINCP'].quantile([0.3333334, 0.66666667]).min()
m22 = df['HINCP'].quantile([0.3333334, 0.66666667]).max()
dft32 = df[(df['HINCP'] >= m21) & (df['HINCP'] < m22)]
m31 = df['HINCP'].quantile([0.6666666667, 1]).min()
m32 = df['HINCP'].quantile([0.6666666667, 1]).max()
dft33 = df[(df['HINCP'] >= m31) & (df['HINCP'] <= m32)]

# min, max, mean of HINCP and the weighted household count (sum of WGTP) per group
dfc2 = pd.DataFrame([m11, m21, m31])
dfc3 = pd.DataFrame([m12, m22, m32])
dfc4 = pd.DataFrame([dft31['HINCP'].mean(), dft32['HINCP'].mean(), dft33['HINCP'].mean()])
dfc5 = pd.DataFrame([dft31['WGTP'].sum(), dft32['WGTP'].sum(), dft33['WGTP'].sum()])

# Assemble the final table with the quantile labels as the first column
frames = [dfc2, dfc3, dfc4, dfc5]
dft3 = pd.concat(frames, axis=1, ignore_index=True)
dft3 = dft3.applymap(lambda x: '%.4f' % x)
dft3 = pd.concat([quantile_labels, dft3], axis=1, ignore_index=True)
dft3.columns = ['HINCP', 'min', 'max', 'mean', 'household_count']
print(dft3)
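As an aside, a hedged alternative sketch for the TABLE 3 bucketing (the names income_level and tbl3_alt are illustrative, not part of the assignment) uses pd.qcut to split HINCP into three equal-frequency bins and a single grouped aggregation; the row labels and number formatting would still need to match the sample output:
df['income_level'] = pd.qcut(df['HINCP'], 3, labels=['low', 'medium', 'high'])
tbl3_alt = df.groupby('income_level').agg(
    min=('HINCP', 'min'), max=('HINCP', 'max'),
    mean=('HINCP', 'mean'), household_count=('WGTP', 'sum'))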