Data Warehousing 2- Indexes are optional structures associated with tables or cl
Question
Data Warehousing
2- Indexes are optional structures associated with tables or clusters, and exploiting indexes to reduce scanning of data is of crucial importance. a) Describe the role of indexing in data warehousing. b) Explain the different types of indexing methods that are used in data warehousing. c) State the advantages and disadvantages of each method. d) Provide an example for each indexing method. e) Discuss an indexing method that you would recommend if you were a data warehouse designer.
Explanation / Answer
WHAT IS AN INDEX?
An index is a small table having only two columns. The first column contains a copy of the primary or candidate key of a table, and the second column contains a set of pointers holding the address of the disk block where that particular key value can be found.
The advantage of using an index is that it makes search operations much faster. Suppose a table has many rows of data, each 20 bytes wide. To find record number 100 without an index, the management system must read every preceding row; only after reading 99 x 20 = 1980 bytes does it reach record number 100. With an index, the management system searches the index instead of the table. The index, containing only two columns, may be just 4 bytes wide in each of its rows. After reading only 99 x 4 = 396 bytes of index data, the management system finds the entry for record number 100, reads the address of the disk block where that record is stored, and goes directly to the record on the physical storage device. The result is much quicker access to the record (a speed advantage of 1980:396, or 5:1).
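The arithmetic above can be checked with a short sketch. The row and entry widths come from the text; the function name `bytes_read_before` is only illustrative, and the model assumes a purely sequential scan with no block-level effects.

```python
ROW_WIDTH = 20    # bytes per table row (figure from the text)
ENTRY_WIDTH = 4   # bytes per index entry (figure from the text)


def bytes_read_before(target, width):
    """Bytes scanned sequentially before reaching record number `target`."""
    return (target - 1) * width


# Scanning the table directly versus scanning the smaller index.
table_scan = bytes_read_before(100, ROW_WIDTH)    # 99 * 20
index_scan = bytes_read_before(100, ENTRY_WIDTH)  # 99 * 4
ratio = table_scan / index_scan

print(table_scan, index_scan, ratio)
```

Under these assumptions the ratio depends only on the row width divided by the entry width, so it is the same for any record number greater than 1.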
The speed advantage discussed above is a software one: searching a small index is faster than searching the larger table. There is also a hardware advantage. Because the index is much smaller than the main table, it can be kept in high-speed memory (primary or cache memory) rather than on the hard disk, which speeds up searching even further.
The main disadvantage of using an index is that it takes up some additional space on top of the main table. In addition, the index must be updated whenever records are inserted into or deleted from the main table. In most cases, however, the advantages outweigh these costs.
Types of Indexing
Primary
Dense
Sparse
Clustering
Secondary
Primary Index
In a primary index, each index entry corresponds to exactly one record in the main table, because the index is built on a unique key. A primary index can be of two types.
Dense primary index: the number of entries in the index table is the same as the number of records in the main table. In other words, every record in the main table has an entry in the index.
Sparse or non-dense primary index: for large tables, the dense primary index itself grows large. To keep the index small, it does not point to every record in the main table; instead it points to records at intervals, for example to the first record of each disk block.
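The difference between the two primary index types can be sketched as follows. This is a minimal in-memory illustration, not how any particular database stores its indexes; the block capacity of four records, the sample keys, and the names `dense_lookup` and `sparse_lookup` are all assumptions made for the example.

```python
from bisect import bisect_right

RECORDS_PER_BLOCK = 4  # assumed block capacity

# Main table: keys sorted and split into disk blocks.
keys = list(range(10, 130, 10))  # 10, 20, ..., 120
blocks = [keys[i:i + RECORDS_PER_BLOCK]
          for i in range(0, len(keys), RECORDS_PER_BLOCK)]

# Dense index: one entry per record.
dense_index = {key: block_no
               for block_no, block in enumerate(blocks)
               for key in block}

# Sparse index: one entry per block (the block's first, or anchor, key).
sparse_index = [(block[0], block_no) for block_no, block in enumerate(blocks)]


def dense_lookup(key):
    """Return (block number, offset in block) or None."""
    block_no = dense_index.get(key)
    if block_no is None:
        return None
    return block_no, blocks[block_no].index(key)


def sparse_lookup(key):
    """Find the last anchor <= key, then scan that single block."""
    anchors = [anchor for anchor, _ in sparse_index]
    pos = bisect_right(anchors, key) - 1
    if pos < 0:
        return None
    block_no = sparse_index[pos][1]
    block = blocks[block_no]
    return (block_no, block.index(key)) if key in block else None
```

Here the dense index holds 12 entries and the sparse index only 3, at the cost of one extra scan inside the selected block for each sparse lookup.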
Clustering Index
Sometimes we need to create an index on a non-unique key, such as Dept-id. There could be several employees in each department. Here we use a clustering index, where all employees belonging to the same Dept-id are considered to be within a single cluster, and the index pointers point to the cluster as a whole.
As an example of how a clustering index is stored in disk blocks, suppose the clustering index contains entries for five departments. The pointers of these departments point to the anchor record of the disk blocks. Each disk block can contain four records. If the number of records in a cluster exceeds four, a separate disk block is allocated for the cluster. A pointer associated with the cluster (much like the link pointer or next pointer in a linked list) points to the subsequent disk block belonging to the same cluster.
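The block chaining described above can be sketched with a small linked structure. This is only an illustration under the stated four-records-per-block assumption; the `Block` class, the function names, and the sample employee rows are hypothetical.

```python
BLOCK_CAPACITY = 4  # records per disk block, as in the text


class Block:
    """A disk block holding up to BLOCK_CAPACITY records of one cluster."""

    def __init__(self):
        self.records = []
        self.next = None  # link to the next block of the same cluster


def build_clustering_index(rows):
    """Map each dept_id to the first block of its cluster."""
    index = {}   # dept_id -> first block of the cluster
    tails = {}   # dept_id -> last block of the cluster (for appends)
    for dept_id, employee in rows:
        if dept_id not in index:
            index[dept_id] = tails[dept_id] = Block()
        tail = tails[dept_id]
        if len(tail.records) == BLOCK_CAPACITY:
            # Cluster overflows its block: chain a new block onto it.
            tail.next = Block()
            tail = tails[dept_id] = tail.next
        tail.records.append(employee)
    return index


def employees_in(index, dept_id):
    """Follow the block chain and collect every record in the cluster."""
    result = []
    block = index.get(dept_id)
    while block is not None:
        result.extend(block.records)
        block = block.next
    return result


# Hypothetical data: six employees in department 1, two in department 2.
rows = [(1, f"emp{i}") for i in range(6)] + [(2, "emp6"), (2, "emp7")]
index = build_clustering_index(rows)
```

Department 1 has six employees here, so its cluster spills into a second block that is reached through the `next` link, while department 2 fits in a single block.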
Secondary Index
When an index is created, the index table is generally kept in primary memory (RAM), while the main table, because of its size, is kept in secondary memory (hard disk). A table may, however, contain millions of records (like the telephone directory of a large city), for which even a sparse index becomes too large to keep in primary memory. If the index cannot be kept in primary memory, the speed advantage is lost. For a very large table, it is therefore better to organize the index in multiple levels.
Multilevel Index
The multilevel index extends the secondary index scheme: if the table is larger still, more index levels can be added.
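One way to picture a multilevel index is as a stack of sparse indexes, each one indexing the level below it, until the top level fits in a single block. The sketch below assumes a fan-out of four entries per index block and sorted keys; the names `build_levels` and `lookup` are illustrative, and real systems typically use B-tree variants rather than this static layout.

```python
from bisect import bisect_right

FANOUT = 4  # assumed number of entries per index block


def build_levels(keys, fanout=FANOUT):
    """Return index levels, top level first; the last level is the keys."""
    levels = [list(keys)]
    while len(levels[-1]) > fanout:
        below = levels[-1]
        # Keep the first key of every block of `fanout` entries below.
        levels.append(below[::fanout])
    return levels[::-1]


def lookup(levels, key, fanout=FANOUT):
    """Return the position of `key` in the bottom level, or None."""
    block_no = 0
    entry = None
    for level in levels:
        # Read exactly one block per level.
        block = level[block_no * fanout:(block_no + 1) * fanout]
        pos = bisect_right(block, key) - 1
        if pos < 0:
            return None
        entry = block_no * fanout + pos
        block_no = entry  # entry j at this level covers block j below
    return entry if levels[-1][entry] == key else None


# Hypothetical sample: 100 even keys, 0 through 198.
keys = list(range(0, 200, 2))
levels = build_levels(keys)
```

With 100 keys and a fan-out of four, this produces four levels, so a lookup reads four small blocks instead of scanning up to 100 entries.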
Which method to recommend depends on what data is stored in the warehouse. In general, I would recommend secondary or multilevel indexing, depending on the size of the warehouse.