Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

A website streams movies to customers’ TVs or other devices. Movies are in one o

ID: 3835022 • Letter: A

Question

A website streams movies to customers’ TVs or other devices. Movies are in one of several genres such as

action, drama, mystery, etc. Every movie is in exactly one genre (so that if a movie is an action movie as

well as a comedy, it is in a genre called “action-comedy”). The site has around 10 million customers, and

around 25,000 movies, but both are growing rapidly.

The site wants to keep track of the most popular movies streamed. You have been hired as the lead engineer

to develop a tracking program.

i) Every time a movie is streamed to a customer, its name (e.g. “A Very Harold & Kumar 3D Christmas”)

and genre (“Comedy”) is sent to your program, which then updates the data structures it maintains. (Assume

your program can get the current year with a call to an appropriate Java class, in O(1) time.)

ii) Customers want to know the top k most streamed movies in genre g in year y. (If y is the current year,

then accounting is done up to the current date of the year.) For example, what were the top 10 most streamed

comedy movies in 2013? Here k=10, g=”comedy” and y=2013. This query is sent to your program which

should output the top k movie names, in descending order of number of times streamed.

a) Describe the data structures used to implement the above.

(Please give a description not code and highlight your main points and why it works)

Explanation / Answer

Description

-------------------

To maintain the such kind of data as given in above question, we need following two major things.

1. Fast retrival ( i.e Hashing so that we can directly fetch the item or records on the basis of index from an array).

2. Records should be stored into sorted order either asending or desending order so that we can fetch top k most records from array).

We can use best use of Trie and Min heap algorithum which caters above two properties.

The idea is

1. Whenever any movie is being inserted, it will get one index and stored in array ( like a hashmap in java, every element is stored in array on particular index and that index is calculated by applying hashing technique). We can use trie here.

2. Element should be stored in array in sorted order so in that case we can use min heap.
Min Heap is nothing but type of binary search tree the only difference is BST is not stored in array where in min heap, element gets stored in array or cotinues memory.
we are taking min heap here just beacuse of we want to store element according to most streamed or used movie in array ( i.e maintian integer count for the same)

By doing this we can store movies in array according to most streamed format or frequent watching or any other charactersticks.


Trie and Min Heap are linked with each other by storing an additional field in Trie ‘indexMinHeap’ and a pointer ‘trNode’ in Min Heap.
means every element of array has one node which maintains ‘indexMinHeap’ means index of the movie. The value of ‘indexMinHeap’ is maintained as -1 for the movies which are currently not in Min Heap (or currently not among the top k frequent streamed movies). For the movies which are present in Min Heap, ‘indexMinHeap’ contains, index of the movie in Min Heap.
The pointer ‘trNode’ in Min Heap points to the leaf node corresponding to the movie in Trie.


Read all movies one by one. For every movie, insert it into Trie. Increase the counter of the movie, if already exists. Now, we need to insert this movie in min heap also. For insertion in min heap, 3 cases arise:

1. The movie is already present. We just increase the corresponding frequency value in min heap and call minHeapify() for the index obtained by “indexMinHeap” field in Trie. When the min heap nodes are being swapped, we change the corresponding minHeapIndex in the Trie. Remember each node of the min heap is also having pointer to Trie leaf node.

2. The minHeap is not full. we will insert the new movie into min heap & update the root node in the min heap node & min heap index in Trie leaf node. Now, call buildMinHeap().

3. The min heap is full. Two sub-cases arise.
….3.1 The frequency of the new movie inserted is less than the frequency of the movie stored in the head of min heap. Do nothing.

….3.2 The frequency of the new movie inserted is greater than the frequency of the movie stored in the head of min heap. Replace & update the fields. Make sure to update the corresponding min heap index of the “movie to be replaced” in Trie with -1 as the word is no longer in min heap.

4. Finally, Min Heap will have the k most frequent movies of all movies present in given arraY. So we just need to print all movies present in Min Heap.


In brief, what we are doing here is whenever movie comes for insertion, increase the count and reorder the poistion in heap so that sorted order can be maintianed every time and maintiaing node. Heap is resposible to mainting ordering and trie is responsible for maintianing index.

Below data structure will be using both trie and Min heap

TrieNode
------------
   indexMinHeap; // the index of the word in minHeap
   TrieNode* movie; // represents movie object.

MinHeapNode
--------------

   TrieNode* root; // indicates the leaf node of TRIE
   unsigned frequency; // most streamed count

Please let me know if you face any difficulties to make understanding on above algorithum.

Hire Me For All Your Tutoring Needs
Integrity-first tutoring: clear explanations, guidance, and feedback.
Drop an Email at
drjack9650@gmail.com
Chat Now And Get Quote