Search engines often index their collections of documents so that they can easil
ID: 3888608 • Letter: S
Question
Search engines often index their collections of documents so that they can easily return an ordered set of documents that contain a given query word, w. Such a data structure is known as an inverted file. In order to construct an inverted file, we might start with a set of n triples of the form, (w, d, r), where w is a word, d is an identifier for a document that contains the word w, and r is a rank score for the popularity of the document d. Often, the next step is to sort these triples, ordered first by w, then by r, and then by d. Assuming each of the values, w, d, and r, are represented as integers in the range from 1 to 4n, describe a linear-time algorithm for sorting a given set of n such triples.
Explanation / Answer
The algorithm used for this is called the radix sort.
Since, we already know the range of the elements, we can use this knowledge to make a linear time sorting algorithm specifically for this case.
ALGORITHM:
Step 1: Create an array of size same as the known range.
Step 2: Go through the values in w and insert them into the array created in step 1.
Example: wi is inserted into array[i]
Finally just read out the array from index 0 to index 4n leaving the gaps. This requires a total of n basic operations. So, time complexity is linear.
Step 3. Repeat step 2 with d and r respectively.
Finally we end up with sorted triplet.
Total complexity for each step is linear. So for 3 steps, it is still linear.
If you have any query, you can in the comment section.
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.