Create a data frame called Indego.df and read the contents of the csv file into it
Question
Create a data frame called Indego.df and read the contents of the csv file into it.
Indego.df = read.csv("Indego_trips.csv")
Get summary information on this data frame. What are the different pieces of information provided in the file? How many trips are there total? How many of these trips are One Way? How many are Round Trip?
What are the different types of users? What percentage of the users had the Indego30 pass? What percentage had the IndegoFlex pass? What percentage were Walk-up customers?
Indego30, Walk-up, IndegoFlex pass
Look at trip #5. How long did this trip last? (The duration data is given in seconds.) What is the station ID where the trip started? What is the station ID where the trip ended? Using the station list provided on the above website, can you find the actual address where the trip started and where it ended?
If you look at the summary information for this data frame, you will see that there are mins, means, etc provided for the ID number of the starting station and the ID number of the ending station. However, mean ID number is not a useful quantity. Convert both of these ID vectors into factors. Then, rerun the summary() function and report on the top 5 most frequently used starting and ending stations, and how many trips originated or ended at each one.
Look up the help file for the plot() function. Using the information that you just learned, plot the latitude vs. longitude for the starting point of each trip (start_lat and start_lon).
Set a vector t equal to the trip durations. Then, set tmin equal to t in minutes and thr equal to t in hours.
Create a new data frame called indegomktg.df which includes the trip durations in hours and the passholder type. Output its summary information.
Create a vector called shortpass that includes the passholder types for all the trips that lasted less than 1 hour. What percentage of the short trips were completed by Indego30 pass holders?
Assume that you are looking to market podcasts and audiobooks that are stored on the bikes themselves (disposable earphones also available). The podcasts tend to be around 30-40 minutes in length, whereas an audiobook is generally around 2-3 hours. There will be short ads in each file which will generate revenue for Indego. In order to increase the chances that the customers will listen to the offerings (and therefore the ads), figure out a good way to market podcasts or audiobooks at the beginning of the trip based on the passholder type.
Explanation / Answer
# -*- coding: utf-8 -*-
"""
Created on Mon Apr 10 20:52:19 2017
@author: raska
"""
import pandas as pd
#FilePath=r"C:\Users\raska\Indego_trips.csv"
Indego_df=pd.read_csv(r"C:\Users\raska\Indego_trips.csv")
print(" Get summary information on this data frame")
print(Indego_df.describe())
print(" Different pieces of information provided in the file")
print(list(Indego_df))
print(" How many trips are there total? ")
print("Total rows: {0}".format(len(Indego_df)))
print(" How many of these trips are One Way?")
print(len(Indego_df[(Indego_df['trip_route_category']=='One Way')]))
print(" How many are Round Trip?")
print(len(Indego_df[(Indego_df['trip_route_category']=='Round Trip')]))
print(" What are the different types of users?")
print(Indego_df.passholder_type.unique())
print(" What percentage of the users had the Indego30 pass?")
print(round(len(Indego_df[Indego_df.passholder_type=='Indego30'])/len(Indego_df)*100,5))
print(" What percentage had the IndegoFlex pass?")
print(round(len(Indego_df[Indego_df.passholder_type=='IndegoFlex'])/len(Indego_df)*100,5))
print(" What percentage were Walk-up customers?")
print(round(len(Indego_df[Indego_df.passholder_type=='Walk-up'])/len(Indego_df)*100,5))
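The counts and percentages above can also be computed in one step per column with value_counts. The following is a minimal sketch, assuming the column names used above (trip_route_category, passholder_type); sample_df holds made-up rows for illustration, not the real Indego data. Note that each row is a trip, so these are shares of trips rather than of distinct users.

```python
import pandas as pd

# Made-up sample rows for illustration only; the real data comes from Indego_trips.csv.
sample_df = pd.DataFrame({
    "trip_route_category": ["One Way", "One Way", "Round Trip", "One Way"],
    "passholder_type": ["Indego30", "Walk-up", "Indego30", "IndegoFlex"],
})

# Number of trips in each route category.
route_counts = sample_df["trip_route_category"].value_counts()
print(route_counts)

# Share of trips for each passholder type, expressed as a percentage.
pass_pct = sample_df["passholder_type"].value_counts(normalize=True) * 100
print(pass_pct.round(2))
```

On the real data frame, the same two calls would replace the six separate len(...) expressions.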
print(" ##Look at trip #5.")
dfRow5=Indego_df.iloc[4]
print(dfRow5)
print(" #How long did this trip last? (The duration data is given in seconds.)")
print(str(dfRow5['duration']))
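Because the duration is stored in seconds, the later parts of the question (t, tmin, thr, indegomktg.df and shortpass) come down to unit conversions and a boolean filter. The following is a rough sketch under stated assumptions: the rows are made up, the column names duration and passholder_type follow the code above, and duration_hr is a hypothetical column name chosen for this example.

```python
import pandas as pd

# Made-up sample rows; 'duration' is in seconds, as stated in the question.
sample_df = pd.DataFrame({
    "duration": [600, 5400, 1800, 9000, 1200],
    "passholder_type": ["Indego30", "Walk-up", "Indego30", "IndegoFlex", "Walk-up"],
})

t = sample_df["duration"]   # trip durations in seconds
tmin = t / 60               # the same durations in minutes
thr = t / 3600              # the same durations in hours

# Data frame holding the duration in hours and the passholder type.
indegomktg_df = pd.DataFrame({
    "duration_hr": thr,
    "passholder_type": sample_df["passholder_type"],
})
print(indegomktg_df.describe(include="all"))

# Passholder types of all trips that lasted less than one hour.
shortpass = indegomktg_df.loc[indegomktg_df["duration_hr"] < 1, "passholder_type"]

# Percentage of those short trips made by Indego30 pass holders.
pct_indego30 = (shortpass == "Indego30").mean() * 100
print(round(pct_indego30, 2))
```

Grouping duration_hr by passholder_type (for example with groupby and describe) would be one way to approach the marketing question, since it shows which pass types tend toward podcast-length or audiobook-length trips.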
#Stations Dataframe
Indego_stations_df=pd.read_csv(r"C:\Users\raska\indego_stations.csv")
print(Indego_stations_df)
print(" #What is the station ID where the trip started?")
print(str(dfRow5['start_station_id']))
print(" #What is the station ID where the trip ended?")
print(str(dfRow5['end_station_id']))
#
#Using the station list provided on the above website, can you find the actual address where the trip started and where it ended?
# Join each trip to the station table on its starting station only.
start_st_df=pd.merge(Indego_df, Indego_stations_df,
left_on=['start_station_id'],
right_on=['Station ID'],
how='inner')
end_st_df=pd.merge(Indego_df, Indego_stations_df,
left_on=['end_station_id'],
right_on=['Station ID'],
how='inner')
Station_nm_withID=pd.merge(start_st_df,end_st_df,on='trip_id')
Station_nm_withID=Station_nm_withID.rename(columns = {'trip_id':'trip_id'
,'start_station_id_x':'start_station_id'
,'Station Name_x':'start_station_name'
,'end_station_id_x':'end_station_id'
,'Station Name_y':'end_station_name'
})
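StationNmandID=Station_nm_withID[['trip_id','start_station_id','start_station_name','end_station_id','end_station_name']]
# Show the start and end station names for trip #5 (matched by trip_id, since merges may reorder rows).
print(StationNmandID[StationNmandID['trip_id']==dfRow5['trip_id']])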
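A lighter alternative to merging the two tables is to build a look-up from station ID to station name and map it onto both ID columns. This is only a sketch: the rows below are made up, the column names 'Station ID' and 'Station Name' are taken from the code above, and the real station list may carry the address in a different column.

```python
import pandas as pd

# Made-up rows for illustration; the real data comes from the two CSV files.
trips = pd.DataFrame({
    "trip_id": [101, 102],
    "start_station_id": [3001, 3002],
    "end_station_id": [3002, 3002],
})
stations = pd.DataFrame({
    "Station ID": [3001, 3002],
    "Station Name": ["Station A", "Station B"],
})

# Look-up table: station ID -> station name.
id_to_name = stations.set_index("Station ID")["Station Name"]

# Translate both ID columns into names; row order is left unchanged.
trips["start_station_name"] = trips["start_station_id"].map(id_to_name)
trips["end_station_name"] = trips["end_station_id"].map(id_to_name)

# Inspect a single trip by position (iloc[4] would be trip #5 in the full data).
print(trips.iloc[0][["start_station_name", "end_station_name"]])
```

Because map keeps the original row order, positional access such as iloc[4] still refers to the same trip as in the original data frame, which is not guaranteed after an inner merge.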