
Question

I need help on this Scala / Spark homework:

Please build an RDD using sc.textFile(…) for reading the words in. Given a text file “basketball_words_only.txt”, complete the following tasks:

1. Write a MapReduce program in Scala to find 1) the words that account for at least 3% of the document “basketball_words_only.txt”, 2) the 4 most frequent words in the document.

Remember that Apache Spark evaluates RDDs lazily. While this has many advantages, one disadvantage is that the same RDD may be recomputed every time an action uses it. Please avoid this kind of recomputation in your program.
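A small experiment makes the recomputation visible. This is a minimal sketch, assuming a local SparkContext: an accumulator counts how often the map function runs while two actions use the same RDD. Without cache() the map runs once per element for each action. With cache() the second action reads the stored partitions instead. RecomputeDemo and mapCalls are hypothetical names. Spark may re-apply accumulator updates made inside transformations (for example after a task failure), so the numbers are only exact in a simple local run like this one.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RecomputeDemo {
  // Runs two actions on the same mapped RDD and reports how many times the map function ran
  def mapCalls(sc: SparkContext, words: Seq[String], cached: Boolean): Long = {
    val calls = sc.longAccumulator("mapCalls")
    val pairs = sc.parallelize(words).map { w => calls.add(1); (w, 1) }
    if (cached) pairs.cache()   // keep the mapped partitions after the first action
    pairs.count()               // first action: always runs the map
    pairs.collect()             // second action: reruns the map unless the RDD is cached
    calls.value
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RecomputeDemo").setMaster("local[*]"))
    val words = Seq("the", "ball", "the", "court")
    println(s"without cache: ${mapCalls(sc, words, cached = false)} map calls")
    println(s"with cache:    ${mapCalls(sc, words, cached = true)} map calls")
    sc.stop()
  }
}
```

The uncached run should report about twice as many map calls as the cached one, which is exactly the recomputation the assignment asks you to avoid.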

The correct output is shown below:

Words that account for at least 3% are "the","is","basketball","and",

the appears 10 times

basketball appears 8 times

is appears 6 times

and appears 6 times
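Task 1 can be sketched as a standalone Spark application. This is a minimal sketch, assuming whitespace-separated words and a local master. The (word, count) RDD is cached because three actions read it: the total word count, the 3% filter, and takeOrdered for the top 4. FrequentWords and its methods are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object FrequentWords {
  // (word, count) pairs, cached because several actions below reuse them
  def wordCounts(lines: RDD[String]): RDD[(String, Int)] =
    lines.flatMap(_.split("\\s+")).filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .cache()

  // Words whose share of all word occurrences is at least `fraction`, most frequent first
  def atLeastShare(counts: RDD[(String, Int)], fraction: Double): Array[(String, Int)] = {
    val total = counts.values.reduce(_ + _)            // action 1: total number of words
    counts.filter { case (_, n) => n >= fraction * total }
      .collect()                                       // action 2
      .sortBy { case (w, n) => (-n, w) }
  }

  // The k most frequent words, highest count first, ties broken alphabetically
  def topK(counts: RDD[(String, Int)], k: Int): Array[(String, Int)] =
    counts.takeOrdered(k)(Ordering.by[(String, Int), (Int, String)] { case (w, n) => (-n, w) })  // action 3

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("FrequentWords").setMaster("local[*]"))
    val counts = wordCounts(sc.textFile("basketball_words_only.txt"))
    val frequent = atLeastShare(counts, 0.03)
    println("Words that account for at least 3% are " + frequent.map(p => "\"" + p._1 + "\"").mkString(","))
    topK(counts, 4).foreach { case (w, n) => println(s"$w appears $n times") }
    sc.stop()
  }
}
```

On the text given below, this should report the same four words and counts as the sample output. The tie between "is" and "and" (6 each) is broken alphabetically here, so their order differs from the sample.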

2. Still using “basketball_words_only.txt” as input, write a MapReduce program in Scala to find, for each word in the file, the word that follows it most often.

For example, in “basketball_words_only.txt”, the word “basketball” is followed by

- “is” five times
- “has” two times
- “court” once

Then “is” is the word that follows “basketball” the most.

As another example, the word “the” is followed by

- “ball” three times
- “court” twice
- “most” once
- “basket” once
- “end” once
- “game” once
- “team” once

If there are multiple such followers that appear the most, pick any one arbitrarily.
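The follower counts listed above can be checked without a cluster. This is a minimal sketch in plain Scala, assuming the whole text is one string split on whitespace. Zipping the word list with itself shifted by one gives the consecutive (word, next) pairs, which are then grouped and counted. FollowerCheck and its methods are hypothetical names.

```scala
object FollowerCheck {
  // For every word, how often each other word appears directly after it
  def followerCounts(text: String): Map[String, Map[String, Int]] = {
    val words = text.split("\\s+").filter(_.nonEmpty).toList
    words.zip(words.drop(1))                          // consecutive (word, next) pairs
      .groupBy { case (w, _) => w }
      .map { case (w, pairs) =>
        w -> pairs.groupBy { case (_, next) => next }.map { case (next, ps) => next -> ps.size }
      }
  }

  // One follower with the highest count; ties go to whichever entry the map iterates first
  def mostFrequentFollower(counts: Map[String, Int]): (String, Int) = counts.maxBy(_._2)
}
```

Because maxBy keeps whichever maximal entry it meets first, a tie is resolved arbitrarily, which the assignment explicitly allows.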

Finally, display the “most frequent” follower for “basketball”, “the”, and “competitive” as follows:

"basketball" is followed by "is" 5 times.

"the" is followed by "ball" 3 times.

"competitive" is followed by "basketball" 2 times.

The text file (Basketball_words_only.txt):

basketball is a team competitive sport in which two teams of five active players each try to score points against one another by throwing a ball through a 10 foot high hoop under organized rules basketball is one of the most popular and widely viewed sports in the court points are scored by passing the ball through the basket from above the team with more points at the end of the game wins the ball can be advanced on the court by bouncing it dribbling or passing it between teammates disruptive physical contact fouls is not permitted and there are restrictions on how the ball result be handled violations through time basketball has developed to involve common techniques of shooting passing and dribbling as well as players positions and offensive and defensive structures while competitive basketball is carefully regulated numerous variations of basketball has developed for casual play in some countries basketball is also a popular spectator sport while competitive basketball is primarily an indoor sport played on a basketball court less regulated variations have become exceedingly popular as an outdoor sport among both inner city and rural groups
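Task 2 can be sketched in Spark as follows. This is a minimal sketch that assumes whitespace-separated words and treats the first word of a line as following the last word of the previous line. zipWithIndex numbers every word in file order so that neighbours can be matched with a join. One reduceByKey counts each (word, follower) pair, and a second keeps a single most frequent follower per word. The object name is hypothetical. When followers tie, the survivor depends on the order in which Spark combines values, which the assignment permits.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MostFrequentFollower {
  // For every word: one follower seen most often right after it, with its count
  def mostFrequentFollowers(lines: RDD[String]): RDD[(String, (String, Int))] = {
    // Number every word in file order so neighbours can be matched across line breaks
    val indexed: RDD[(Long, String)] = lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .zipWithIndex()
      .map { case (word, i) => (i, word) }
      .cache()                                              // read twice by the join below
    // Re-key each word by the index of the word before it
    val next = indexed.map { case (i, word) => (i - 1, word) }
    indexed
      .join(next)                                           // (i, (word, follower))
      .map { case (_, pair) => (pair, 1) }
      .reduceByKey(_ + _)                                   // ((word, follower), count)
      .map { case ((word, follower), n) => (word, (follower, n)) }
      .reduceByKey((a, b) => if (a._2 >= b._2) a else b)   // keep one top follower; ties arbitrary
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MostFrequentFollower").setMaster("local[*]"))
    val best = mostFrequentFollowers(sc.textFile("basketball_words_only.txt")).cache()
    for (w <- Seq("basketball", "the", "competitive"); (f, n) <- best.lookup(w))
      println(s""""$w" is followed by "$f" $n times.""")
    sc.stop()
  }
}
```

For the text above, the three printed lines should match the expected output in the question. The result RDD is cached because each lookup call is a separate action.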

Explanation / Answer

To solve the first task, we start by counting how often each word occurs. In spark-shell, where the SparkContext sc is already defined, the word count can be written as:

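val text = sc.textFile("basketball_words_only.txt")  // one RDD element per line of the file

// Split each line on whitespace, pair every word with 1, and add up the 1s per word
val counts = text.flatMap(line => line.split("\\s+")).filter(_.nonEmpty).map(word => (word, 1)).reduceByKey(_ + _)
counts.collect().foreach(println)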
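A common pitfall in this word count is the separator passed to split. Splitting on the empty string, as in split(""), breaks a line into single characters rather than words. A quick check with plain Scala strings (no Spark needed) shows the difference. One reasonable separator is the regular expression \s+ (written "\\s+" in a Scala string), assuming words are separated only by whitespace. SplitCheck is a hypothetical name.

```scala
object SplitCheck {
  val line = "basketball is a team competitive sport"

  // Empty-string separator: every character becomes its own element
  val chars: Array[String] = line.split("")

  // Whitespace separator: one element per word
  val words: Array[String] = line.split("\\s+")

  def main(args: Array[String]): Unit = {
    println(chars.take(5).mkString("[", ", ", "]"))   // [b, a, s, k, e]
    println(words.mkString("[", ", ", "]"))
  }
}
```

On Java 8 and later, split("") produces no leading empty string, so chars has exactly one element per character of the line.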

Save these lines in a script file, for example WordCountScala.scala, and run it with:

spark-shell -i WordCountScala.scala

When the script runs, it prints every (word, count) pair in the file.

According to the question, the words that account for at least 3% are "the", "is", "basketball", and "and":

the appears 10 times, basketball appears 8 times, is appears 6 times, and "and" appears 6 times.
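The 3% cut-off can be checked by hand. This assumes the file holds exactly the text reproduced in the question, split on whitespace, which gives 189 words (the token "10" counts as a word):

```latex
\text{threshold} = 0.03 \times 189 = 5.67
\quad\Rightarrow\quad \text{a word qualifies if and only if its count} \ge 6

\frac{10}{189} \approx 5.29\%\ (\text{the}),\quad
\frac{8}{189} \approx 4.23\%\ (\text{basketball}),\quad
\frac{6}{189} \approx 3.17\%\ (\text{is},\ \text{and})
```

The next most frequent words, "a" and "of", occur 5 times each (about 2.65%), so they fall just short of the threshold.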

In the counting step, each word becomes a key and its number of occurrences becomes the value, so reduceByKey produces one (word, count) pair per distinct word.

For example, "basketball appears 8 times" corresponds to the pair ("basketball", 8): the key is "basketball" and the value is 8. The count stored under a single key can be read back with lookup:

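val text = sc.textFile("basketball_words_only.txt")

val counts = text.flatMap(line => line.split("\\s+")).filter(_.nonEmpty).map(word => (word, 1)).reduceByKey(_ + _)
// lookup returns every value stored under the key: here a single count (8 for the text above)
counts.lookup("basketball")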

When the script runs, lookup returns the number of times "basketball" occurs in the file, which is 8 for the text given in the question.

Similarly, the count of any other word can be retrieved in the same manner.
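When several individual counts are needed, calling lookup once per word starts a separate Spark job each time, and the RDD is recomputed for each one unless it is cached. This minimal sketch assumes the words of interest are known in advance. It keeps only those words and brings the small result to the driver with a single collectAsMap() action. SelectedCounts and countsFor are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SelectedCounts {
  // Counts of the requested words, computed with one Spark action
  def countsFor(lines: RDD[String], wanted: Set[String]): Map[String, Int] =
    lines.flatMap(_.split("\\s+"))
      .filter(wanted.contains)        // keep only the words we care about
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collectAsMap()                 // single action returns the small result to the driver
      .toMap

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SelectedCounts").setMaster("local[*]"))
    val counts = countsFor(sc.textFile("basketball_words_only.txt"), Set("the", "basketball", "is", "and"))
    counts.foreach { case (w, n) => println(s"$w appears $n times") }
    sc.stop()
  }
}
```

Words that never occur are simply absent from the returned map, so callers can use getOrElse(word, 0) when a zero count is needed.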
