Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

MySQL Homework: Write queries to solve the following questions (I\'ll give acces

ID: 3890609 • Letter: M

Question

MySQL Homework:

Write queries to solve the following questions (I'll give access to the database connection info.):

MySQL connection information

user name: MSBAC

password: spark

The database hostname/address is: mysql-instance1.cho77jnglrlv.us-west-2.rds.amazonaws.com

1. (3 pts) What are the top lists? (as indicated by the column: list)

2. (3 pts) The data in top300 was collected on a daily basis. Can you tell the data collection period?

3. (3 pts) On Aug 31, do we miss any data for any of the top lists? Please provide evidence to support your answer.

4. (3 pts) Is the data for the list "Top Free" complete?

5. (3 pts) There are apps that appeared in the list of "Top Free" every single day for the whole period. How many such apps are there?

6. (3 pts) How many apps have been ranked number 1 in the “Top Free” list?

7. (3 pts) In the "Top Free" list, which two primary categories appear most often?

8. (3 pts) What is the shortest time in number of days between an app’s release date and the date an app makes to the top list? What do you think about this information?

9. A. (2 pts) During the data collection period, there are 300 apps per top list per day per Apple Store. Is the data in top300 complete? How do you know?

B. (2 pts) If the data is not complete, can you make an estimate about the amount of data missing in the top300 table?

10. (3 pts) Is the apps table complete? That is, do we have data on all apps that appear in the top 300 list? Please provide evidence supporting your answer.    

11. Bonus question (2 pts): Have you noticed any other missing data or data of error? If so, please indicate them here.  

CIS 505: Introduction to Enterprise Analytics Relational Model of the apps, app reviews, appcategories, and top300 Tables | app-categories id INTI 12) ? developer BIGINT(15) category VARCHAR(5) primary TINYINT(1 ) id INT 12) name-length SMALLINT(5) release-date DATE devices SMALLINT(5) game center TINYINT(1) screenshots TINYINT(3) screenshots ipad TINYINT(3) developer BIGINT ( 1 5) + developer-name VARCHAR(100) seller-name VARCHAR(100) seller-url VARCHAR(50) price DECIMAL(52) 300 idx INT(11) insert-time TIMESTAMP num-of-categories TINYINT(3) category-primary VARCHAR(30) languages TINYINT(3) filesize BIGINT(15) description-length INT(10) rating oount INT(10) average-rating FLOAT content rating VARCHAR(3) Inde x xes Apps table -Description_length: number of characters in the app's Name: app name - - Content rating: the age group Name length: number of the app belongs to characters in app name - Release date: date of release -Devices: number of device models where the app can be installed -Id: app id (referencing id column in -Game_center: whether the app apps table) -Rating count: number of ratings - Average_rating: the average rating supports game center (0: no, 1: -Screenshots: number of for the app screenshots on iPhone -Screenshots_ipad: number of -d: app id (referencing id column in -Developer: id of developer -Developer_name: name of apps table) Developer: developer id developer Primary: whether the category in the row is the primary category for the app (0: no, 1: yes) - .Continue Apps table -Seller name: name of seller - Seller_url: seller's website URL - Idx: the primary key - Insert time: the time when the -Number of categories: the number of categories the app data of the row was collected -List: which top list("top free" - Category_primary: the "top paid", "top grossing", "top free ipad", "top paid ipad, "top "primary category of the app -Languages: number of languages the app supports Filesize: number of bytes of the -Id: app id (referencing id - column in apps table)

Explanation / Answer

--1. (3 pts) What are the top lists? (as indicated by the column: list)
select distinct list from top300;

--Explanation : list column of top300 contains different top list like "top paid", "top free". We can use distinct to fetch type of list available in top300

--2. (3 pts) The data in top300 was collected on a daily basis. Can you tell the data collection period?
select to_char(max(insert_time),'hh:mm:ss') as 'start_time' to_char(min(insert_time),'hh:mm:ss') as 'end_time' from top300
--above query will provide max time period in terms of hours minutes and second of the day
select max(insert_time) as 'start' min(insert_time) as 'end' from top300
--above query will provide time period in terms of day

--3. (3 pts) On Aug 31, do we miss any data for any of the top lists? Please provide evidence to support your answer.
Select count(*) from top300 where insert_time='08/31/2017' and list is null

--Explanation : As per the table structure idx(primary key) and id(foreign key) can't be null. The only column which can be null is insert_time. The above query will return the recrd count for such record having insert_time as '08/31/2017' and list is null. If we are getting count more that 0 then data is missing for 31 Aug.


--4. (3 pts) Is the data for the list "Top Free" complete?
Select count(*) from top300 where list='Top Free' and insert_time is null

--Explanation : As per the table structure idx(primary key) and id(foreign key) can't be null. The only column which can be null is insert_time. The above query will return the recrd count for such record having list as 'Top Free' and insert_time is null. If we are getting count more that 0 then data is incomplete

--5. (3 pts) There are apps that appeared in the list of "Top Free" every single day for the whole period. How many such apps are there?


--6. (3 pts) How many apps have been ranked number 1 in the “Top Free” list?
select count(*) from top300 where list='Top Free'

--7. (3 pts) In the "Top Free" list, which two primary categories appear most often?
select top 2 from (select list,count(list) as 'total_count' from top300 group by list order by total_count desc)

--8. (3 pts) What is the shortest time in number of days between an app’s release date and the date an app makes to the top list? What do you think about this information?
select id,name,min(top_time) from (select id,name,datdiff(apps.release_date,top300.insert_time) as top_time from apps join top300)