Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

In this part of the project, you will various queries using Hive, Pig and Hadoop

ID: 3900091 • Letter: I

Question

In this part of the project, you will various queries using Hive, Pig and Hadoop streaming. The schema is available below, but don’t forget to apply the correct delimiter:
http://rasinsrv07.cstcis.cti.depaul.edu/CSC555/SSBM1/SSBM_schema_hive.sql
The data is available at:
http://rasinsrv07.cstcis.cti.depaul.edu/CSC553/data/ (we will use Scale4)

Use Lineorder.tbl

Implement the following query: select sum(lo_revenue), p_brand1 from lineorder, part, supplier where lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_category = 'MFGR#12' and s_region = 'EUROPE' group by p_brand1;

using Hive and Pig .

Explanation / Answer

Step 1: Configure Hadoop and PigStorage

Step 2: Load data into the storage

Step 3: Load data for part and supplier tables as well

Step 4: Execute below in grunt

> lineparts = JOIN lineorder BY lo_partkey, part BY p_partkey;
> linesupply = JOIN lineorder BY lo_suppkey, supplier BY s_suppkey;
> filterlineparts = FILTER lineparts BY p_category = 'MFGR#12';
> filterlinesupply = FILTER linesupply BY s_region = 'EUROPE';
> filterData = JOIN filterlineparts BY lo_partkey, filterlinesupply BY lo_partkey;
> Group_data = GROUP filterData BY p_brand1;
> revenue_sum = foreach Group_data Generate
(p_brand1.name,SUM(lo_revenue);

Query in HiveQL:

select sum(lo.lo_revenue), p.p_brand1
from lineorder lo
join part p on (lo.lo_partkey = p.p_partkey)
join supplier s on (lo.lo_suppkey = s.s_suppkey)
where p.p_category = 'MFGR#12' and s_region = 'EUROPE'
group by p_brand1;

Hire Me For All Your Tutoring Needs
Integrity-first tutoring: clear explanations, guidance, and feedback.
Drop an Email at
drjack9650@gmail.com
Chat Now And Get Quote