Suppose a data file for Student (Name, SSN, Address, Phone, Email…). Given that
ID: 3916654 • Letter: S
Question
Suppose a data file for Student (Name, SSN, Address, Phone, Email…). Given that the data file’s record size is 400 bytes, disk block size is 1024 bytes, and total number of records are 3000000. For index entry, the field size is 9 bytes, and the pointer size is 8 bytes. Answer the following questions:
a. What will be the average linear search cost using data file?
b. What will be binary search cost using data file?
c. What will be binary search cost using index file (dense index)? Discuss why using indexes is almost always better.
Explanation / Answer
In our data we have block size is 1024Bytes and each record size is
400Bytes. we cant store 3 records in a same block because 400*3 = 1200Bytes
Now we strategies to store records in a block.
Strategies for storing a record file into blocks:
1.Spanned strategy: It allows partial part of record to be stored in a block.
here we store 1024/400 records in a block
i.e. 2 full records +partial 3rd record
Advantage : Space used efficiently..
Disadvantage: we need to search 2 blocks for 1 record sometimes.
It is suitable for variable length records.
2.UnSpanned Strategy:
No record can be stored in more than one block.
Here space was wasted.But number of block access reduced.
IT is Suitable for fixed length records.
Given Data:
Record Size =400Bytes
Block size =1024Bytes
Number of records is 3000000
In index
Key size = 9Bytes
Block pointer size = 8Bytes
1.Linear Search:
In Linear search we dont use index file so we linearly search on Data file.
Time complexity is depends on number of blocks in a data file.
Fixed record size so go for unspanned .
no of blocks = (Total Records )/(number ofrecords in a block)
= 30,00,000/2
= 15,00,000 Blocks
So we need to search 1.5 million blocks in linear search.
2.Binary Search:
To apply the binary search Reacords are must be in Sorting order.Otherwise
This strategy can't work.Assume it is in sorting Order Then Time complexity
is follows like this.
Here also we dont use the index file.We directly apply binary search on
data file.
Number of blocks need to search = Log2(number of blocks )
= Log2(15,00,000)
= 20.5165 = 21 block accesses.
3.Binary Search using Index file(dense):
In Index file we put Key value and Block pointer for each record.
Dense means for every key value one entry in index file.
So record size will be 9+8 =17Bytes
number of record entries in each block = 1024/17 = 60.23 = 60 record entrie per block
number of blocks in index file = 30,00,000/60 = 50000 blocks
We use binary search on this index file So number of block accesses is
= Log 2(50000) + 1 Here 1 for data file access.
= 16+1 = 17 block accesses.
Why Indexes is alway best:
Without index file we need to apply search on Data file here number of records
Block is very less . If we want to search on more records then we need to access
more blocks . It is costly.
If we have index file .Index file contain only key value and direct address of
That record address.So here number of records per block increase it reduces the number of block accesses.
Index file is already in sorted based on key values.
i.e. Indexes is always best.
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.