Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

1. There are 4 caches with the organization and block size as indicated below. A

ID: 3778651 • Letter: 1

Question

1.   There are 4 caches with the organization and block size as indicated below. All caches hold 128 words where each word is 4 bytes. Assuming a 32-bit address.

a. A direct-mapped cache with block size of 16 words

b.   2-way set-associative cache with block size of 8 words

c. 4-way set-associative cache with block size of 4 words

d.   A fully associative cache with block size of 32 words.

Complete the table for each cache.

Cache a

Cache b

Cache c

Cache d

total # bits need for word + byte

displacement

32

   32

    32

32

# bits needed for index

   4

2

   2

7

# bits needed for tag

   22

25

26

18

# bits (total) per set (including valid

and dirty bits)

56

59

70

52

2.   Here is a series of addresses in hexadecimal:

20(w), 3C(r), 10(r), 16(w), 20(r), 04(w), 28(r), 6(r), 10(w), 17(w)

Assume a LRU replacement algorithm. For the four caches in problem 1, draw each cache as it would appear at the end of the series of references. Include the valid bit, dirty bit, and tag. Show the contents of the memory block using the byte address range such as M[20-23] for the word with address 22.

I already completed question 1. Just need help with question 2.

Cache a

Cache b

Cache c

Cache d

total # bits need for word + byte

displacement

32

   32

    32

32

# bits needed for index

   4

2

   2

7

# bits needed for tag

   22

25

26

18

# bits (total) per set (including valid

and dirty bits)

56

59

70

52

Explanation / Answer

2.

The key to the problem is to note that two addresses that map to the same set in a set-associative cache may map to a different block in a direct-mapped cache. A simple example is with a four-word cache and 1 word blocks. The example sequence is addresses 0, 2, 4, 0, 2. In the direct-mapped cache, 0 and 4 map to block 0, while 2 maps to block 2. Accesses to 0 and 4 miss because they conflict in block 0, but the second access to 2 hits. The hit rate is 1/5

With a 2-way set-associative cache, all three address map to the first set. Thus after the first two misses, 4 kicks out 0, 0 kicks out 2, and 2 kicks out 4. The hit rate is 0/5.

Memory-stall clock cycles = Instructions/ Program * Misses/Instruction * Miss penalty

Misses/Instruction = Instruction miss rate + (Data miss rate* Data references/Instruction)

Miss penalty = 6 + Block size in words

Data references/Instruction = 50%

Cache 1: Block size = 1 word

Miss penalty = 6+1 =7 cycles

Instruction miss rate = 4%

Data miss rate = 8%

CPIstall

= (Instruction miss rate + (Data miss rate* Data references/Instruction) * Miss penalty

= (4% + (8%*50%)) * 7 = 0.56

CPI = CPIstall + CPIperfect = 2.0

CPIperfect = 2.0 - 0.56 = 1.44

Cache 2:

Block size = 4 words

Miss penalty = 6+4 = 10 cycles

Instruction miss rate = 2%

Data miss rate = 5%

CPIstall

= (Instruction miss rate + (Data miss rate* Data references/Instruction) * Miss penalty

= (2% + (5%*50%)) * 10 = 0.45

Cache 3:

Block size = 4 words

Miss penalty = 6+4 = 10 cycles

Instruction miss rate = 2%

Data miss rate = 4%

CPIstall

= (Instruction miss rate + (Data miss rate* Data references/Instruction) * Miss penalty

= (2% + (4%*50%)) * 10 = 0.4

So cache 1 spends the most cycles on cache misses.

NOTE:      A direct- mapped cache

64 Kbytes = 16 K words = 2^14 words = 2^14 blocks

block size = 4 bytes => offset size = 2 bits,

#sets = #blocks = 2^14 => index size = 14 bits

tag size = address size - index size - offset size = 32 - 14 - 2 = 16 bits

bits/block = data bits + tag bits + valid bit = 32 + 16 + 1 = 49

bits in cache = #blocks x bits/block = 2^14 x 49 = 98 Kbytes

for a direct- mapped cache with 64 KBytes of data and 8 word blocks, assuming a 32-bit address

64 Kbytes = 2^14 words = (2^14)/8 = 2^11 blocks

block size = 32 bytes => offset size = 5 bits,

#sets = #blocks = 2^11 => index size = 11 bits

tag size = address size - index size - offset size = 32 - 11 - 5 = 16 bits

bits/block = data bits + tag bits + valid bit = 8x32 + 16 + 1 = 273 bits

bits in cache = #blocks x bits/block = 2^11 x 273 = 68.25 Kbytes

4-way set associative cache

block size and #blocks does not change

#sets = #blocks/4 = (2^14)/4 = 2^12 => index size = 12 bits

tag size = address size - index size - offset = 32 - 12 - 2 = 18 bits

bits/block = data bits + tag bits + valid bit = 32 + 18 + 1 = 51

bits in cache = #blocks x bits/block = 2^14 x 51 = 102 Kbytes