Question
Why is the cache hit rate for this column_major code 0%
for both direct mapped and fully associative?
################################################################
#
# Column-major order traversal of 16 x 16 array of words.
#
#
#
# To easily observe the column-oriented order, run the Memory Reference
# Visualization tool with its default settings over this program.
# You may, at the same time or separately, run the Data Cache Simulator
# over this program to observe caching performance. Compare the results
# with those of the row-major order traversal algorithm.
#
# The C/C++/Java-like equivalent of this MIPS program is:
#   int size = 16;
#   int[size][size] data;
#   int value = 0;
#   for (int col = 0; col < size; col++) {
#     for (int row = 0; row < size; row++) {
#       data[row][col] = value;
#       value++;
#     }
#   }
#
# Note: Program is hard-wired for 16 x 16 matrix. If you want to change this,
# three statements need to be changed.
# 1. The array storage size declaration at "data:" needs to be changed from
# 256 (which is 16 * 16) to #columns * #rows.
# 2. The "li" to initialize $t0 needs to be changed to the new #rows.
# 3. The "li" to initialize $t1 needs to be changed to the new #columns.
#
.data
data: .word 0 : 256 # 16x16 matrix of words
.text
li $t0, 16 # $t0 = number of rows
li $t1, 16 # $t1 = number of columns
move $s0, $zero # $s0 = row counter
move $s1, $zero # $s1 = column counter
move $t2, $zero # $t2 = the value to be stored
# Each loop iteration stores the incremented $t2 value into the next element of the matrix (in column-major order).
# Offset is calculated at each iteration. offset = 4 * (row*#cols+col)
# Note: no attempt is made to optimize runtime performance!
loop: mult $s0, $t1 # $s2 = row * #cols (two-instruction sequence)
mflo $s2 # move multiply result from lo register to $s2
add $s2, $s2, $s1 # $s2 += col counter
sll $s2, $s2, 2 # $s2 *= 4 (shift left 2 bits) for byte offset
sw $t2, data($s2) # store the value in matrix element
addi $t2, $t2, 1 # increment value to be stored
# Loop control: If we increment past bottom of column, reset row and increment column
# If we increment past the last column, we're finished.
addi $s0, $s0, 1 # increment row counter
bne $s0, $t0, loop # not at bottom of column so loop back
move $s0, $zero # reset row counter
addi $s1, $s1, 1 # increment column counter
bne $s1, $t1, loop # loop back if not at end of matrix (past the last column)
# We're finished traversing the matrix.
li $v0, 10 # system service 10 is exit
syscall # we are outta here.
Explanation / Answer
Hit Ratio: The hit ratio is the fraction of memory accesses the cache can satisfy (hits divided by total accesses). You want the cache to contain the addresses the processor asks for as often as possible; otherwise you lose much of the benefit of caching, because too many accesses miss and fall through to the slower main memory.
Direct Mapped Cache: The direct mapped cache is the simplest form of cache and the easiest to check for a hit. Since there is only one possible place that any memory location can be cached, there is nothing to search; the line either contains the memory information we are looking for, or it doesn't.
Unfortunately, the direct mapped cache also has the worst hit ratio, because, again, there is only one place where any given address can be stored. Consider, for example, a 512 KB level 2 cache in front of 64 MB of system memory. With 32-byte cache lines, this cache has 16,384 lines, so each line is shared by 4,096 memory addresses (64 MB / 16,384). In the absolute worst case, imagine that the processor needs 2 different addresses (call them X and Y) that both map to the same cache line, in alternating sequence (X, Y, X, Y). This could happen in a small loop if you were unlucky. The processor loads X from memory and stores it in the cache. Then it looks in the cache for Y, but Y uses the same cache line as X, so it isn't there. So Y is loaded from memory and stored in the cache for future use. But then the processor requests X again and finds Y in that line instead. This conflict repeats over and over. The net result is a hit ratio of 0%. This is a worst-case scenario, but in general direct mapping gives the lowest hit ratio of the mapping schemes.
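To make the X/Y ping-pong concrete, here is a minimal sketch of a direct-mapped lookup in C. The parameters are deliberately tiny (8 lines of 32 bytes, not the 512 KB cache above) so the conflict is easy to see; the index formula, index = (address / line_size) mod num_lines, is the standard direct-mapped placement rule.

    #include <stdio.h>

    #define LINE_SIZE 32   /* bytes per cache line */
    #define NUM_LINES 8    /* deliberately tiny so the conflict is easy to see */

    int main(void) {
        long tags[NUM_LINES];
        for (int i = 0; i < NUM_LINES; i++)
            tags[i] = -1;                           /* start with an empty cache */

        /* X and Y are exactly NUM_LINES lines apart, so they map to the
         * same cache index -- the worst-case alternating pattern. */
        long pattern[] = { 0, NUM_LINES * LINE_SIZE,
                           0, NUM_LINES * LINE_SIZE,
                           0, NUM_LINES * LINE_SIZE };
        int accesses = (int)(sizeof pattern / sizeof pattern[0]);

        int hits = 0;
        for (int i = 0; i < accesses; i++) {
            long block = pattern[i] / LINE_SIZE;    /* memory block number */
            int  index = (int)(block % NUM_LINES);  /* the one line it may occupy */
            if (tags[index] == block)
                hits++;                             /* line already holds the block */
            else
                tags[index] = block;                /* miss: evict whatever was there */
        }
        printf("hits: %d / %d\n", hits, accesses);  /* prints: hits: 0 / 6 */
        return 0;
    }

Every access evicts the block the very next access needs, so the hit count stays at zero no matter how long the loop runs.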
Fully Associative Cache: The fully associative cache has the best hit ratio because any line in the cache can hold any address that needs to be cached. This means the problem seen in the direct mapped cache disappears, because there is no dedicated single line that an address must use.
However (you knew it was coming), the fully associative cache pays for this flexibility when searching the cache. If a given address can be stored in any of 16,384 lines, how do you know where it is? Even with specialized hardware to do the searching in parallel, a performance penalty is incurred, and it applies to every memory access, hit or miss, because determining a hit requires the search. In addition, extra logic is needed to decide which line to use when a new entry must be added (usually some form of "least recently used" algorithm picks the line to evict). All this overhead adds cost, complexity, and execution time.
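Tying this back to the program above: the column-major traversal strides 64 bytes (one full 16-word row) between consecutive stores, so each store lands in a different cache block, and a single column touches 16 distinct blocks. If the simulated cache holds fewer than those 16 blocks, every block is evicted before the traversal comes back to it: under direct mapping the column's blocks conflict with one another, and under fully associative LRU the least recently used block is always the one about to be needed again. Either way, every access misses. Below is a minimal sketch in C that replays the store pattern against a fully associative LRU cache; the parameters (8 lines of 4 words each) are an assumption chosen to be in the spirit of a small simulated cache like the MARS Data Cache Simulator's, not its exact settings.

    #include <stdio.h>

    #define NUM_LINES 8    /* assumed: small fully associative cache      */
    #define LINE_SIZE 16   /* assumed: 4 words of 4 bytes per cache line  */
    #define SIZE 16        /* 16 x 16 matrix of words, as in the program  */

    int main(void) {
        long tags[NUM_LINES];       /* which memory block each line holds */
        long last_used[NUM_LINES];  /* timestamp for LRU replacement      */
        for (int i = 0; i < NUM_LINES; i++) { tags[i] = -1; last_used[i] = -1; }

        long clock = 0;
        int hits = 0, accesses = 0;

        for (int col = 0; col < SIZE; col++) {
            for (int row = 0; row < SIZE; row++) {
                /* Same offset formula as the MIPS code: 4 * (row*#cols + col). */
                long addr  = 4L * (row * SIZE + col);
                long block = addr / LINE_SIZE;
                accesses++;
                clock++;

                int found = -1, lru = 0;
                for (int i = 0; i < NUM_LINES; i++) {
                    if (tags[i] == block) found = i;   /* any line may hold it */
                    if (last_used[i] < last_used[lru]) lru = i;
                }
                if (found >= 0) {
                    hits++;
                    last_used[found] = clock;
                } else {
                    tags[lru] = block;                 /* evict the LRU line */
                    last_used[lru] = clock;
                }
            }
        }
        /* Each column touches 16 distinct blocks but only 8 fit, so LRU
         * always evicts a block just before the traversal needs it again. */
        printf("hits: %d / %d\n", hits, accesses);     /* prints: hits: 0 / 256 */
        return 0;
    }

Replaying the same access stream against a direct-mapped cache of the same size gives the identical 0 / 256 result, which is why both organizations report a 0% hit rate for this program. By contrast, the row-major version visits the four words of each block consecutively, so with 4-word blocks roughly three of every four accesses hit.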