© Distribution of this video is restricted by its owner
00:00 Yes, let's start, um, the lecture. So, actually,

00:07 let's see if we can figure it out, via the chat or raising hands or something:

00:16 has anyone here had a computer architecture course?

00:34 One. Alright, that's OK. Today I'll talk about the memory system,

00:46 which is a little bit, um, something that will probably be repetition for those of

00:53 you who have had computer architecture, since a lot of performance issues have

01:01 to do with the memory system, and to understand performance one needs to understand the memory

01:09 system. So that's why I'm going to spend today basically talking about it, the

01:15 top two levels of the hierarchy from this perspective: the caches and the local memory.

01:20 I'll try to explain how they work, because that helps in trying to understand

01:28 what your code must do in order to get good performance. And

01:34 the reason is that the memory system tends to be the limiting factor for so many

01:40 codes. So, first caches, and then I'll talk about main memory, and then examine what you

01:47 saw, uh, doing the memory benchmarks, STREAM and GUPS. For those of

01:54 you who actually ran them and measured the gigabytes per second,

02:00 you probably discovered that there was a factor of 10 or more difference in the

02:07 amount of bandwidth you got in the two cases. And the question is really:

02:12 why? So I'll try to give insight into why that is in this

02:16 lecture. First, caches. Some terms, hopefully familiar to most of

02:21 you, even if you haven't taken such classes: hits and misses in caches. A

02:27 hit is when the processor tries to retrieve data (read data, say)

02:35 from the cache, and it is present there. The nice part is one doesn't have to go to

02:42 main memory to find the data, and a miss is the opposite, when things are not there.

02:48 So that means that you have to load the data from main memory into the

02:55 cache. And I'm just talking about this in the simplest context, in which there

02:59 is just one level of cache,

03:05 even though it's all effectively the same when it's more complicated, with several levels of cache. But for now,

03:10 the concepts of hit and miss and penalties. The miss rate is simply one

03:20 minus the hit rate, and the hit rate is, um, the frequency of instructions that actually,

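The hit-rate and miss-rate bookkeeping here fits the standard average-memory-access-time formula. A minimal sketch in Python; the formula is the usual textbook one, and the numbers below are made up for illustration, not taken from the lecture:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time for a single cache level: every
    access pays the hit (lookup) time; misses additionally pay the penalty."""
    return hit_time + miss_rate * miss_penalty

# Illustrative numbers: 1-cycle hit, 5% miss rate, 100-cycle penalty.
print(amat(1, 0.05, 100))  # 6.0 cycles on average
```

This also reflects the point raised in the question below: the lookup cost is paid whether or not the access hits, so it sits in the hit time, not in the miss penalty.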
03:28 or data references, that hit in the cache. Then there's the cache line concept, which has been

03:34 talked about: when I talked about the processor architectures, I talked about cache

03:39 sizes, and I talked about cache line sizes, about latencies and bandwidths, and

03:45 I'll try to shed some more light on why these things are important. And there's...

03:51 yes? "Dr. Johnson? Could you go back a slide? Um,

03:55 on the miss penalty: so, does that not include the time it takes to determine

04:00 whether it's a hit or a miss, right? It's just the actions thereafter?" Good point.

04:06 To be precise: um, the miss penalty, yes, should not include the

04:13 time taken to discover whether the item is there or not. And I'll talk a

04:18 little bit about that on several slides later today: how misses are actually

04:25 handled. Um, so there is kind of an overhead, regardless of whether it's a hit

04:31 or a miss, that is related to figuring out whether things are present or not.

04:40 And for that overhead, I'm not sure how you would actually measure, ah, a

04:53 hit and a miss separately; you can count them through hardware counters, for instance.

05:00 So we'll see if it gets a little bit clearer, but please otherwise come

05:05 back to the question. "Got it, thank you." So then there is

05:11 the notion of locality, which has been talked about a lot, in terms of both architecture

05:16 and caches and codes. And there are two kinds: there's one known as temporal,

05:22 and one known as spatial. Temporal locality is simply that a data item, mhm,

05:32 is used again, whether it's read or written; the same item is accessed. You touch

05:38 it again before too long. So it's not precise exactly what short or long means

05:46 here, but the idea is that it shouldn't be too many instructions before it is touched

05:55 again, and we'll talk about that in some future lecture. There's something called

06:02 reuse distance that tells how many instructions go by between touches of the same data item.

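The reuse-distance idea can be sketched in a few lines of Python. Note this toy version counts all accesses between touches of the same item; some definitions instead count only distinct items in between (stack distance). The trace below is invented for illustration:

```python
def reuse_distances(trace):
    """For each access, the number of accesses since the same item
    was last touched (None on first touch)."""
    last_seen = {}
    out = []
    for i, item in enumerate(trace):
        out.append(i - last_seen[item] if item in last_seen else None)
        last_seen[item] = i
    return out

# 'a' is touched again 3 accesses later: a short reuse distance,
# i.e. good temporal locality.
print(reuse_distances(["a", "b", "c", "a"]))  # [None, None, None, 3]
```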
06:07 And again, spatial locality is, uh, locality with respect to addresses

06:16 that are nearby, address-wise so to speak. So it's not the same item being referenced

06:22 again, but it's like, you know, walking down a line, and when

06:27 you touch things down the line, the ones that are next will be addressed soon,

06:32 unlike if you jump around. So spatial locality is what you have in the STREAM benchmark,

06:37 when you add the fact that the stride is one. So you go to the next element

06:41 in the next instruction, whereas in GUPS you don't have spatial locality, because you

06:47 jump around all over memory. Yeah. But, um, there are two

06:55 or three types of misses, and one normally talks about compulsory, capacity and conflict.

07:01 I'll talk a little bit more about the latter two, not so much about the compulsory,

07:06 since that is simply that data has to be, at least once, loaded into the cache, because

07:16 processors in principle do not reach for things past the first level of cache. So

07:25 even though there is a little bit of, uh, a grey area, compulsory simply means data

07:31 has to be read at least once from, or written to, main memory. So

07:37 there's no way around it: main memory has to be accessed at least once for each

07:42 item being used in the computation. So that's the compulsory part. The capacity part is

07:48 that caches are small compared to main memory, and they're small because they're expensive

07:55 relative to how main memories are designed, and towards the end of the lecture I hope to get

08:00 to that. So caches are small, so there's typically no way your

08:06 data set could fit in cache. That's why you run out of cache capacity.

08:14 Then, when you load new data not currently in cache, it basically overwrites what

08:20 is in cache. So that's the capacity miss: there's no room for

08:26 the data. And then, of course, if you overwrite some data, you have

08:30 to decide how to do it, um, how to deal with the overwritten

08:34 data, in order to preserve correctness. But that's a different story. So when it

08:40 comes to what to do, which piece of, um, which cache line,

08:49 to overwrite, then there is a replacement policy in place that I'll

08:55 talk about. And then there's yet another concept: conflict misses. That

09:03 has to do with where the cache line from memory is being placed in the cache.

09:12 So I'll talk about the strategies for doing that, as well as what's known

09:19 as associativity, on the next few slides. Um, so this has to do

09:26 with, you know, the placement part. So where does a cache line, which is again a

09:32 group of memory locations treated as one unit when you reference memory, so where

09:40 does it end up in the cache? In the direct-mapped cache, there's

09:45 only one place, a unique place in the cache, a piece of memory can go. It has

09:50 no choice. Um, in the fully associative cache, a cache line from memory can

09:58 go anywhere in the cache, and then the most common ones are the set-

10:06 associative caches. That means there is a group of cache locations to which the cache

10:17 line can be mapped. So it has a few choices. And, as when I talked

10:23 about the processors: if you go back and look at those slides,

10:27 you will find that most of them have eight-way or 16-way or four-way

10:34 set-associative caches. That means, for a k-way cache, there are k different locations in the

10:41 cache to which a particular cache line in memory can, um, go. So,

10:51 as I said: at one extreme, uh, the direct-mapped cache

10:57 is the one-way associative cache, because there's only one place things can go. A

11:02 fully associative cache is simply a k-way cache with a k that is equal to the number

11:07 of cache lines that can fit in the cache. Um, so when it comes

11:15 to replacement strategies and associativity: for a direct-mapped cache there's really no choice,

11:22 so it's not an issue to be decided. Whereas when you have fully associative or

11:28 set-associative caches, there is more than one place, uh, a cache line can

11:34 go to. So then you have to decide which cache line should be over-

11:42 written, or evicted, as it is sometimes called. That means that you will have to make sure

11:47 that the content of that cache line, if it's not already up to date there,

11:51 gets updated in memory or lower-level caches. So the commonest, most

12:00 common strategy, yeah, I would say, is what's known as LRU, right?

12:04 So that means the thing that has been, um, touched the longest time

12:11 ago, so least recently used, is the thing that is selected for eviction or replacement.

12:20 Other ones that are not uncommon: sort of randomly pick one of the candidates,

12:24 regardless of whether it was recently used or used quite some time ago, and

12:31 a few other ones. In practice, things are not quite as simple and clean as these,

12:39 um, strategies, at least in principle; implementations tend to have all kinds of

12:48 heuristics to try to improve program behavior. So I'll talk a little bit more and then

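The LRU eviction rule just described can be sketched with an ordered container. This is a toy software model, not how hardware implements it (real caches use cheap approximations of LRU, as hinted above):

```python
from collections import OrderedDict

def lru_insert(cache_set, line, ways=4):
    """Insert a cache line into one set kept in LRU order (front =
    least recently used); returns the evicted line, if any."""
    victim = None
    if line in cache_set:
        cache_set.pop(line)              # hit: just refresh the order
    elif len(cache_set) == ways:
        victim, _ = cache_set.popitem(last=False)  # evict the LRU line
    cache_set[line] = True               # back of the dict = most recent
    return victim

s = OrderedDict()
for n in (0, 16, 32, 48):                # fill a 4-way set in this order
    lru_insert(s, n)
print(lru_insert(s, 64))  # set is full: evicts 0, the least recently used
```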
12:56 I'll stop for more questions. OK. So that was kind of the loading part:

13:03 where does it go? And when you have choices, how do you

13:08 choose where it goes? Then there's the other side, when you want to write data.

13:14 Yeah: in today's world of multi-core systems, at least

13:23 level one, definitely, those caches are private to each core. Level two tends to be private to a core too;

13:31 level threes are not. But if a cache line is in your level one, and you

13:36 write to it, that means, if it so happens that that data is also

13:43 residing in some other cache, then, when you write to it in your cache,

13:50 it becomes invalid in all the other caches. Or, if somehow it gets

13:59 modified in memory, then whatever is in the caches is no longer valid. And,

14:06 of course, conversely for a particular core: if somebody else writes in their

14:11 copy of a cache line that you hold too, then your copy becomes invalid, outdated. So the

14:18 cache coherence mechanisms are trying to deal with this, to make sure that data, if

14:24 it lives in several caches, stays consistent, and that everybody accesses correct

14:34 data. We'll talk more about that when we talk about OpenMP, because it's

14:38 an issue, uh, in multi-core systems and sharing of cache lines. Uh-

14:47 huh. There are then policies associated with writes: do you write to the cache, and how do you handle

14:56 that? These approaches are, more or less, I would say, equally frequent.

15:03 It depends; in fact, in systems with a cache hierarchy it's not

15:09 necessarily the same policy for all levels of the cache. But the basic approaches are

15:17 what's known as write-through, and write-back. Write-through is, as one

15:22 kind of intuitively can think of it: when you write to the cache, main memory

15:26 also gets updated. So the write is kind of written through to main memory.

15:32 So that's kind of nice, in the sense that everything is in sync. On the

15:37 other hand, it can generate a lot of excess traffic to main memory, and, as we

15:43 know, main memory is very slow compared to the cache. So that's why it tends not

15:52 to be used as much as write-back. And with write-back, the cache line is only written

16:00 to main memory when it needs to be evicted

16:05 or overwritten, so that saves a lot of traffic to main memory.

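The traffic difference between the two policies can be captured in a toy model. The scenario (repeated stores that all hit the same cached word) and the numbers are assumptions for illustration:

```python
def memory_writes(stores, policy):
    """Toy model: how many times main memory is updated for a run
    of stores that all hit the same cached word."""
    if policy == "write-through":
        return stores              # every store is pushed to memory
    if policy == "write-back":
        return 1 if stores else 0  # one write-back, at eviction time
    raise ValueError(policy)

print(memory_writes(1000, "write-through"))  # 1000 memory updates
print(memory_writes(1000, "write-back"))     # 1 memory update
```

The three-orders-of-magnitude gap in this (deliberately extreme) case is exactly the "excess traffic" argument made above.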
16:12 Of course, that causes a little bit of a problem, because with write-through

16:18 one can watch the bus: other caches can watch the bus and figure out what's

16:23 happening, and that tells them whether their copy of the cache line is invalid or not. With

16:31 write-back, other caches in principle don't know, except through coherence

16:36 mechanisms, what has happened to a cache line. Then there is even more to the

16:43 story: there's something known as write-allocate and no-write-allocate. So,

16:51 as I think I mentioned when I talked about the processors: most processors, in fact, have a

16:57 write-allocate policy. So what that means is: when you want to write

17:03 to memory, then, if the cache line that holds the address

17:10 is not present in the cache, you first have to load that cache line into

17:16 the cache, update it in the cache, and then potentially write it back, depending on whether

17:23 it's a write-through or write-back cache. So that means, with write-allocate,

17:30 there may be a potential extra load of cache lines from memory. So that's

17:37 what happens, for instance, in STREAM, where there's no data reuse. So the target

17:43 cache line is not present when you write the output from STREAM, so it has

17:47 to go ahead and load it first, and that load STREAM does not account for.

17:54 So that's why, when you look at the max STREAM performance, it's not likely you

18:03 ever get to the peak memory bandwidth; there's an extra load that is not accounted for.

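The accounting gap just described can be made concrete for a copy kernel, c[i] = a[i]. This is a sketch under the simplifying assumption that all traffic is full cache-line transfers and nothing else interferes:

```python
def stream_copy_traffic(n, word_bytes=8, write_allocate=True):
    """Bytes a STREAM-style copy kernel (c[i] = a[i]) counts, vs the
    bytes a write-allocate cache actually moves: reading each line of
    c before overwriting it adds a third stream of traffic."""
    counted = 2 * n * word_bytes                     # read a, write c
    extra = n * word_bytes if write_allocate else 0  # allocate loads of c
    return counted, counted + extra

counted, actual = stream_copy_traffic(1_000_000)
print(actual / counted)  # 1.5: the reported bandwidth is ~2/3 of the real traffic
```

So even a perfectly streaming copy reports only about two thirds of the bandwidth the memory system is actually delivering, which is the motivation for the cache-bypassing stores discussed next.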
18:10 But there are, uh, tricks around that,

18:20 which many processors use, to get better performance for streaming-type applications. And that's

18:27 known as non-temporal stores, which I have kind of listed under the no-write-allocate policy,

18:32 where you basically bypass the cache. So, through some mechanism or other, you don't

18:41 have to load something into cache, modify it, and write it back to the

18:47 addressed memory; you bypass that process. "Dr. Johnson? Yes:

18:55 would that be used in a situation where you're fairly certain that the overhead introduced by

19:00 the cache is going to be more than, yeah, what it's saving us?" If so...

19:05 so that's decided... so one can find out from the instructions that the compiler generates:

19:15 in the analysis that the compiler does of your source code, it may have figured out that

19:20 it's safe to use these, uh, bypass instructions instead of the typical instruction that

19:28 accesses the cache. Okay. And so Intel has what they call a non-temporal

19:37 store, the Arm processors have it, and most processors today have it, because of the

19:42 need to get good performance when it is fairly clear and safe to bypass

19:51 the cache. So, a little bit more then about how the caches are kind of

20:03 architected, or designed, and I'll try to answer, just now, a little bit more of the

20:12 question that was asked before. Um, so the structure of the memory address

20:23 is as shown on this slide, when you have a cache system, which pretty

20:28 much every computer today has. So there's a cache tag, a cache index, and a cache

20:35 line offset, and the cache line offset is perhaps the easiest to understand. With

20:45 respect to the memory system, there's a group of memory locations that is treated atomically,

20:52 but obviously you don't need all the bits to address within the cache line. Many of

21:02 the processors today have, in the cache, most of them, 64-byte cache lines;

21:07 some have 128 bytes. And if you use single precision, that's four bytes, or

21:13 double is eight bytes, so that means there are a few words in each cache

21:18 line, so you need a way to tell which one you are interested in.

21:22 So that's the cache line offset: to figure out where you want to go within a

21:29 cache line. The other two I'll try to explain on the next slides, what they

21:34 do. Yeah. The cache index has to deal with figuring out where in

21:46 the cache a particular cache line goes, and the cache tag is the part

21:53 of the address that is used to figure out if the data item is somewhere in the

22:00 cache or not. And hopefully the next several slides will give some, uh,

22:07 at least pictorial, understanding of how this all kind of fits together. So here's a

22:14 little bit more on the same theme. So, in the little image box on the

22:22 left-hand side, the picture below the middle, it shows that the cache line is

22:29 kind of one part. And then there's the cache tag: the cache tag

22:33 is, uh, just the part of what's sent to the memory system to retrieve the

22:38 cache line, and then the cache address is interpreted in these two pieces,

22:44 the tag and the index, where the index tells where in the cache things go.

22:53 So the tag goes into what's known as a directory; that is the

22:56 thing that is searched when trying to figure out if the data is in the cache,

23:03 and then the index is used to figure out where it lives in the cache.

23:09 So here are a couple of slides that try to explain this mapping business,

23:18 and I'm afraid that "cache line" will be used somewhat ambiguously in the talk. I've tried

23:26 to keep it to one sense and refer to it, use it, for the piece of memory,

23:33 or collection of memory locations, in main memory that has the actual content.

23:42 And then, sort of, that content gets loaded into the cache. And on this slide,

23:47 the notion of a frame is used for the locations in the cache where a cache line

23:52 from memory goes. But it's sometimes hard to keep them separate, or sometimes "cache line"

23:58 may mean the data in the frame of the cache. Now, so here is the

24:07 first illustration, of the fully associative cache, which in this case, the cache, can

24:17 hold eight cache lines. So in this case, called frame locations here,

24:25 there are eight different places for the cache lines. And since it's fully associative, that

24:31 means any one of the cache lines in memory can use any one of the frames,

24:37 0 to 7. In the direct-mapped cache, and I'll show on the next slide

24:43 how these things actually work out, a cache line is just... any line

24:51 in memory can only use one of the eight locations; it cannot be mapped to

24:59 any one of the other seven. So that's the other extreme, from the fully associative to the direct-mapped.

25:06 Then the set-associative cache, in this case, is illustrated by having four

25:12 sets, and within each set you have two options, two frames. So this

25:20 is kind of a two-way, if you like, set-associative cache. So the

25:27 next slide shows kind of an example. If we take the cache line

25:33 with address 15 in this simple example: it can choose, as I mentioned, any

25:39 of the eight locations in the fully associative cache. In the direct-mapped cache,

25:45 it only has one choice, and that is figured out by just taking the cache line

25:53 address and doing a mod with the size of the cache. So if you

26:00 take 15, in this case the cache line address in main memory, mod the number of

26:05 locations, eight, then you get your seven. So that's its only choice,

26:12 and that figures out the direct-mapped case. For the set-associative cache, now, what you do is sort of mod with

26:23 the number of sets instead, and within its set you have a free choice among the frames.

26:29 So if you do 15 mod 4, you get the number three, so that means it

26:33 goes to set three. But it's not determined from this exercise which one of the

26:39 locations in that set is going to be used. That comes down to using the replacement

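The mod-based placement just walked through can be sketched as follows. The frame numbering within a set (set-major order) is an assumption of this sketch, not something fixed by the slide:

```python
def allowed_frames(line_addr, num_frames, ways):
    """Frames that may hold a given memory cache line. ways=1 is
    direct-mapped; ways=num_frames is fully associative."""
    num_sets = num_frames // ways
    s = line_addr % num_sets          # the mod from the slide
    return [s * ways + w for w in range(ways)]

print(allowed_frames(15, 8, 1))  # direct-mapped: only frame 7
print(allowed_frames(15, 8, 2))  # two-way: set 3, i.e. frames 6 and 7
print(allowed_frames(15, 8, 8))  # fully associative: all 8 frames
```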
26:46 policy to figure that out. So, now, um, one more comment I

26:55 want to make on this slide, and then I'll take questions, if there are questions on

27:02 it. But in both the direct-mapped and the set-associative caches, this restrictive placement can

27:15 be quite a drawback if the code has a very regular memory access pattern.

27:31 My canonical example is the Fast Fourier Transform, for those of you who are familiar with

27:39 it. Um, the data access pattern tends to have strides that are powers

27:45 of two, and that means, in a direct-mapped cache, it tends to be that many,

27:55 um, data requests made, in fact, map to the very same cache

27:58 line. So then you get poor cache utilization. You get that a little

28:03 better in the set-associative case, but it could still be the case that, um,

28:10 a whole range, or sequence, of accesses ends up coming to the

28:15 same set, and that means very poor cache behavior.

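The power-of-two-stride problem can be illustrated with a 16-set cache like the one in the example later in the lecture; the stride value of 16 below is chosen to show the worst case, it is not a claim about any particular FFT size:

```python
def sets_touched(stride, count, num_sets=16):
    """Set indices hit by `count` cache-line accesses at a fixed
    stride (measured in cache lines)."""
    return {(i * stride) % num_sets for i in range(count)}

print(len(sets_touched(1, 64)))  # 16: unit stride spreads over every set
print(sets_touched(16, 64))      # {0}: stride 16 piles everything onto one set
```

With all 64 accesses landing in one set of limited associativity, almost every access conflicts, which is the poor cache behavior described above.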
28:21 So, as I said, I'll take questions at this point. If there are none, I will do an example

28:31 shortly. Hmm? Seeing the chat there: was there something in the chat? Can

28:38 you tell me? "No. As I said, there is nothing in the chat."

28:42 All right. So I guess, before doing the example, I will illustrate how

28:49 this address interpretation works in the three cases: direct-mapped, fully associative and the

28:54 set-associative. Um, so in the direct-mapped case, we had these three fields:

29:03 the tag, the index and the offset. And, um, when it's direct-

29:09 mapped, the index basically just tells you what cache line, or what location in

29:18 the cache, the cache line is mapped to. So when you try to figure

29:22 out whether the data you want is actually in the cache or not, the only

29:31 thing you have to do is to inspect the tag for that particular cache location and compare it

29:41 to the tag associated with the data you want to use. So it's a

29:45 single simple comparison to figure out whether the data is indeed present in the cache or

29:52 not. And then the offset within the line is pretty straightforward. So this is the

30:00 direct-mapped cache. Now, so, as we said, it's very simple and it's inexpensive,

30:07 but, as already mentioned, it has potential problems in being able to benefit from the cache.

30:18 Take the other extreme, and that's the fully associative cache: then there is no,

30:25 um, basically, location field in the address that needs to be interpreted. So the cache

30:32 tag is all the bits, in addition to the bits used to find

30:39 where in the cache line your data item is. So then the picture looks more like

30:44 this. What's supposed to happen is: now, since the cache line can

30:48 be placed anywhere in the cache, one has to look up all the different tags,

30:57 for all the frames or locations in the cache where a cache line can go, to

31:06 figure out if any one of those happens to hold the tag associated with the cache line

31:11 you're trying to access data from. So basically, you have

31:18 to do a search, a comparison of your target tag with all the tags.

31:26 So this is a lot more expensive, in terms of, um, deciding whether

31:35 it's there or not. So it takes more time; it's more cumbersome and expensive.

31:41 So, yes, and then, I guess, the set-associative cache is kind of

31:48 a trade-off between the fully associative one, offering some flexibility compared to the

31:56 direct-mapped cache, but limiting the search that is required, ah, compared to

32:04 the fully associative cache. So here the cache index is the number of bits

32:12 used to identify which set is going to be used to map the cache line

32:23 in memory. So then the picture is somewhat like this, where you have

32:29 a tag, and the search for whether the cache line is present in the cache

32:34 or not is just searching the tags associated with the cache lines, or the frames,

32:42 of the set that is targeted by the cache index. So, right, so, any questions

32:55 on that? So it's just these trade-offs and this overhead that, sort of,

33:07 yeah, depending upon what the processor designers decided to do and how they do

33:15 the search mechanisms, determine how many cycles it takes to figure out what's in cache

33:22 or not. No? Thanks. So, um, a little bit of comments here on

33:32 what one can do to improve cache performance, and I'll do a

33:37 little bit of a walkthrough of an example, um, going through the hit

33:44 plus miss accounting and so on. Of course, since caches are there to help performance, you

33:50 know, one wants to reduce the time for a hit, one wants to reduce

33:55 the miss rate, one wants to reduce the miss penalty, all of these things. But

34:01 how do you do it? And there's a trade-off, so I think on

34:03 the next slide I try to comment on a few of those. So, of course,

34:08 larger cache sizes: that is beneficial in many regards, um, because that means

34:16 more data is closer to the functional units. So, you know, it reduces the miss

34:27 opportunities, right? Well, not opportunities, the chances, because there's more stuff in the

34:33 cache. On the other hand, the larger the cache is, the more the

34:38 hit time increases, because the lookup process becomes longer. On the

34:47 other hand, you can also, similarly, increase associativity; that reduces the conflict misses,

34:55 because there are more places a cache line can go to. Um, but then

35:02 the search increases, to figure out whether it's there or not. Another thing

35:09 is, you can commit to increasing the cache line size, so you load a larger

35:15 chunk at a time. And so that means the number of misses may decrease if

35:27 you get more... so if you have spatial locality, yeah, clearly, if

35:35 all of the data that comes in, in a given cache line fetch, gets used. On the

35:42 other hand, for a fixed-size cache, that means fewer cache lines fit

35:47 in the cache, so it may increase the conflict misses. And of course,

35:55 since you load a larger number of data items, the miss penalty is a

36:01 bigger figure for large cache lines. So that's why it tends to be that things

36:07 have stayed, for a very long time now, at 64-byte cache lines.

36:13 And if you go back, though, and look: I think it was the

36:19 POWER9 that actually used, for some of, uh, the levels of the cache,

36:24 128 bytes. This is a little bit just of the trade-off between size

36:35 and distance. Um, note the trade-off between L1 and L2 here: level

36:42 one tends to have stayed rather small, even though you can, in that gap,

36:48 um, ah, uh, build caches of larger sizes. On the whole,

36:56 the level ones tend to have stayed as they are in size, and that is for

37:02 speed reasons: by being small in size, they are close to the logic units, a short

37:09 distance away, and distance matters in this case (and I'll come to that more

37:17 towards the end of today's lecture, why that matters; it's not quite speed-of-light

37:22 related), so small tends to mean fast. And so that's why L1s have stayed,

37:30 um, rather small, and what they have to do is not too

37:36 far away. And the penalty for a miss in an L1 is modest. So the

37:42 speed of the L1 has been kind of the driving factor. Yeah, and the other thing to

37:49 be aware of is that there are moves between the levels of cache, unless you have bypass

37:57 mechanisms, which do exist, but that's not the normal mode of business. Okay,

38:06 on to the example. Any questions? ... All right. So I'll try to

38:18 work through an example of how the cache and LRU kind of work,

38:27 and I think it is useful to understand how the caches work. So in

38:32 this case, the word size is not typical: it's a 16-bit, or two-

38:39 byte, word. So this is kind of a word-addressable little system that has 256K words,

38:49 an idealistic one. Okay, and capital K: that means things are, instead of k as

38:56 powers of 1000, okay, capital K is, um, sort of, two to the 10, 1024.

39:05 So, in this case, 256K, uh, words in main

39:10 memory. Then it has a 4K-word cache, and it is set-associative with

39:18 four cache lines per set, and the cache line size is 64 words. With

39:27 my word again being two bytes, that means a 128-byte cache line in

39:32 this little example. And then we're going to do an exercise to figure out the

39:37 benefit of the cache, by assuming that the cache is 10 times faster than memory.

39:46 That's way underestimated; it's in fact more like a factor of 100.

39:54 But then the numbers become smaller, and it works for this illustration's purposes.

40:00 The code in this case is kind of simple: the loop loops 15 times, and in

40:08 each loop it goes and gets 4352 words from successive memory locations. So it's kind

40:16 of simple, like used in STREAM. We have little a's, b's and c's

40:26 going through this example. So, first, we're going to try to understand how the address is

40:35 going to be partitioned. So we'll see if this can work by question and answer,

40:44 uh, so hopefully, you can suggest, let me see, or you guys can just pitch in.

40:51 So the first question is: how many address bits do we need? And then I'm...

41:01 "The 256K, is that just the memory? I

41:07 mean, the memory address range, right, the 256K?" Yes, it is. And then:

41:14 how many bits is the address, to be able to address the 256K words?

41:21 So how many bits? "It is... so, 16-bit words?"

41:32 So the address... "so, 1600..." What you're saying would be how

41:41 it would be divided, uh, how many words could fit in

41:45 that many bits. Mhm. So there are 256K items that can

41:57 be retrieved, right, each 16 bits. So that doesn't tell you how

42:03 many bits there are in memory, but you need to be able to distinguish between

42:08 256K addresses of words. "It should be the log of 256

42:19 times 1024, right? Log base two." Yeah, I guess

42:28 so. So that's 18, since 256 is two to the power eight, and

42:35 the K is two to the 10. So there are 18 bits. So,

42:44 so the next question, then: we need the offset field in terms of the address.

42:52 So how many bits do we need to find a word within the cache line?

43:03 "Okay. How many choices are there within the cache line? 256?" Not

43:32 within the cache line. "I'm sorry... four, from... right? So we would

43:40 need two bits if we have four." The cache line has 64 words. "I'm

43:49 sorry." Yes, it's confusing; that's why I was going through this example.

43:53 That, uh... then there are a few different things to keep track of.

43:57 One is the number of memory addresses, which we went through and figured out:

44:03 two to the 18 different memory addresses that we need to keep track of. Then each

44:11 cache line has 64 words, so we need to be able to distinguish between 64

44:16 items. "So, okay, it should be log base two of 64 this time." Yes,

44:23 correct. So that's two to the six. Sorry, I'm so used to

44:28 using these things; I'm always thinking of things in powers of two,

44:32 so I have them easily in my head. So then, so, that's part

44:40 of it: the whole address field is 18 bits. Now we have one part of

44:46 the address field that is the offset; that is six bits. Then the next

44:50 thing we're going to try to figure out is how many bits we need to

44:57 figure out which set a cache line is going to be mapped into.

45:08 So how many sets do we have? Well, it wasn't stated, so we have

45:22 to figure it out. I mean, that's in order. And how do we figure

45:26 out how many sets there are? Well, we do have some information, because

45:36 we know that... "One cache line goes... yeah, four cache lines per set, and

45:44 64 cache lines, so 64 divided by four: that's how many sets there would be,

45:51 because four cache lines occupy one set." Yes. So, say that again:

46:03 how many words are in one set? How many would that be? It is...

46:17 "...the way the sets are organized: each set holds four cache lines, each

46:37 of which is 64 words." So a set, in fact, has four times

46:46 64, or 256 words. So the set size is 256 words. And

47:04 now, the total number of words in the cache is four times 1024, which

47:14 is 4096. So 4096 is the total, and each set, we figured out, holds

47:21 256. So that means, in fact, that there are 16 sets, which

47:31 is 4096 divided by 256. So that means sixteen sets, and that's four bits.

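The field widths just derived (an 8-bit tag, a 4-bit set index and a 6-bit offset, out of 18 address bits) can be checked with a few lines of Python. The example address, 70, is made up for illustration:

```python
OFFSET_BITS = 6   # 64 words per cache line
INDEX_BITS = 4    # 16 sets
TAG_BITS = 18 - INDEX_BITS - OFFSET_BITS  # 8 bits left over for the tag

def split_address(addr):
    """Split one of the example's 18-bit word addresses into
    (tag, set index, offset within the cache line)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# Word address 70 is word 6 of cache line 1, which maps to set 1.
print(split_address(70))  # (0, 1, 6)
```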
47:43 So here is a little picture of what it looks like. So, you know,

47:51 I hope everyone is with me on how this exercise works. "Yeah, whenever we say

47:57 'word', that's the unit that we cannot break down any further, right?"

48:02 Yes. Okay, that's... yeah, in this context. Most words in most computers,

48:15 actually... so, like, single precision: the word would be 32 bits.

48:25 It doesn't mean that the computer... it could be byte-addressable, or bit-addressable even,

48:30 but in this simple example the word is the smallest addressable unit. All right,

48:40 so now, on the same part, just follow it with me, no surprises.

48:46 And then we're going to figure out how this cache works with the code.

48:53 So, no cache is very simple: given that a memory access takes 10 times as long

49:00 as the cache access, with no cache every reference is a main memory reference, and

49:08 we had 15 iterations, and each iteration does 4352 accesses. And so you

49:16 just multiply the numbers together, and we get something like 652,800...

49:24 652,800, ah, times t, whatever that time unit is. So that's simple

49:33 and not so interesting. So now, trying to figure out how the said cache helps

49:36 you, how it actually works. So the issue is that the cache line size

49:48 was 64. Um, and as we figure it out:

50:00 going through these 4352 references, you know, a cache line

50:07 at a time, that means that we have to retrieve 68 cache lines to cover,

50:20 or load, all the 4352 words. And now there was, ah, the

50:33 cache... So this slide reads wrongly, I'm sorry. So there's, um... but the

50:46 slides are consistent, just not matching the text on the previous slides. And now I'm

50:50 using cache lines, and earlier 64 was the cache line size in bytes; it's

50:58 not that here. And now I'm just going to use, for this exercise, a

51:04 cache line size of, um, 64 items, and not worry too

51:13 much about how this works out; you'll see on the next slide, I'll

51:21 make it more clear. What we have is four sets... and, I guess, 64...

51:32 16 sets, with four cache lines per set, and 64 words per cache

51:46 line. So, is the picture on the slide straight now? Sorry, I got

51:50 myself confused. So the time for data fetches from the cache is very simple:

51:57 a hit in the cache costs whatever the cache access time is, and then

52:04 there's the number of accesses per iteration, and then the number of iterations. So

52:10 it's just a simple product. Then the next issue is figuring out how many misses are

52:17 going to happen, when the cache doesn't fit, or hold, all the data.

52:27 So we'll figure these things out. So in this case, the miss penalty

52:33 is again the block size, in this case our cache line size; actually,

52:39 "block" is very often used as a synonym for cache line. So,

52:46 basically, the cache line size, 64, times the memory access time, which was 10.

52:54 So that's the penalty for each miss. Then we need to figure out the

52:58 number of misses, and this is what's on the next few slides here. So,

53:07 basically, first you get the cold, or compulsory, misses, because there's nothing in

53:13 the cache when you start; you have to read everything. And so in

53:17 this case, we had the 16 sets that we figured out, where each, um,

53:25 set held four, um, cache lines, and each cache line had, I think,

53:34 um, the 64 words. So, you know, as you read this data,

53:48 we have the first 64 loadings of cache lines. So, what you see in black:

54:04 you start to load the first cache line, and it goes into set zero,

54:10 and then the next cache line goes into set one. And after you have read 16 cache

54:19 lines, you have put one cache line into each one of the 16 sets,

54:25 and then you use kind of the next location, or frame, in a given set.

54:31 So then the subsequent cache lines go into, um, the second frame,

54:39 so to speak, in each one of the cache sets, and it's all working

54:46 fine until you start to run out, and you have basically filled up all

54:52 the slots in the cache. So when you read the 65th item, or cache line number

55:00 64, then you have to overwrite something. So in this case, what does it look

55:08 like? So you overwrite, um, cache line zero. And then 65

55:15 overwrites cache line one. And so you have four such misses when you finish

55:24 the first iteration of the loop, and the next time around, you miss

55:33 again. So now you no longer have cache line zero in the cache,

55:42 because that was overwritten towards the end of the first iteration. So now you

55:48 need to reload cache line zero. And now again, with the LRU

55:57 policy, you replace cache line one. You go to basically sort of the next

56:05 slot, in a sense. So again you end up overwriting cache lines

56:12 zero, one, two, three, overwriting the second slot in the first four sets, and

56:20 you're good again for a while until you need to load cache line number 16.

56:28 Well, 16 was in turn overwritten, um, now by cache

56:36 line zero. So now you need to reload cache line 16, and with

56:41 the LRU policy, it evicts cache line 32. So now you

56:47 get a sequence of four more misses, since 16 through 19 were overwritten. And then

56:56 you're OK again for 20 to 31, and then you miss again. So

57:04 in this case, uh, in fact, you miss on most of the cache

57:13 line loads in this iteration. You miss the first four, and then you miss

57:19 another four because, um, the second slot in these sets was overwritten because of

57:29 LRU. You keep doing this. And then again, like in the

57:33 first iteration, you again have to, uh, find a place for cache

57:42 lines 64 through 67. And then it again goes by the LRU policy. So

57:50 it again overwrites what is in the second, um, slot in the

57:57 first four sets. So you basically get this behavior because of LRU, this worst

58:05 possible behavior. So in the end, you figure out that even when

58:11 you add it all up in terms of the number of hits, and this is,

58:16 yes, things improved, but even though you were only four cache lines short in

58:22 this simple example, and the cache is 10 times faster, ah, it

58:28 didn't improve the performance by more than a factor of two, because of the expense of

58:32 cache line misses and the replacement policy. So, any questions on that?
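The thrashing behavior in this walkthrough can be reproduced with a small simulation: a cache with 16 sets of 4 ways each (64 lines total), LRU replacement, looping repeatedly over 68 cache lines. This is a sketch of the lecture's scenario, not code from the slides:

```python
from collections import OrderedDict

def simulate(num_sets=16, ways=4, num_lines=68, iterations=3):
    """Count misses per pass over `num_lines` cache lines in a
    set-associative LRU cache (line i maps to set i % num_sets)."""
    sets = [OrderedDict() for _ in range(num_sets)]  # insertion order = LRU order
    misses_per_pass = []
    for _ in range(iterations):
        misses = 0
        for line in range(num_lines):
            s = sets[line % num_sets]
            if line in s:
                s.move_to_end(line)        # hit: mark as most recently used
            else:
                misses += 1
                if len(s) >= ways:
                    s.popitem(last=False)  # evict the least recently used line
                s[line] = True
        misses_per_pass.append(misses)
    return misses_per_pass

print(simulate())  # [68, 20, 20]
```

The first pass has 68 cold misses; every later pass misses on all 20 accesses that map to sets zero through three (lines 0-3, 16-19, 32-35, 48-51, 64-67), exactly the LRU thrashing pattern described above.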

58:40 Otherwise, I will talk about main memory. So this was what I had planned to talk

58:45 about in terms of caching and an illustration of how things work. So moving

59:03 on then. So a little bit about memory. This is a bit of an old slide, and I

59:09 haven't found any good, you know, newer data. But I'm sure things have not

59:13 been more relaxed than they were at the time this slide was done, in

59:19 terms of the importance of speed. So Amazon claimed 100 milliseconds cost them 1% in

59:25 sales, in terms of delay or latency, and at the time half a

59:31 second cost Google 20% in, uh, traffic. And brokers' stock traders, you guys

59:43 know that, you know, milliseconds can cost them millions, uh, of

59:50 dollars. So in some businesses, delays are very expensive. This is just a slide,

59:58 oh, a slide like that. I saw it somewhere, I think from John McCalpin,

60:03 basically pointing out again the gap in speed between CPUs and main memory. And I'll

60:11 try to just illustrate both, um, what that is and how it's trying to

60:18 be addressed to keep the gap as small as possible. So in terms

60:28 of bandwidth, I talked about it before. It is that processors over time have

60:34 gotten more and more memory channels. So basically the bandwidth to main memory has

60:42 increased by adding sort of parallelism in terms of memory accesses. Another big part of why memory

60:52 tends to be slow relative to processors is how things are packaged and connected,

61:00 and I will talk about that as well in the next several slides. So

61:07 GPUs have had an advantage compared to server processors because they have had,

61:12 um, wider paths to memory and use a different kind of packaging known as

61:20 GDDR, as opposed to the DDR memories, and I'll talk more about that

61:25 later. And more recently, one has again tried to increase the width of the

61:32 data paths to memory by using these so-called high bandwidth memories, and I will

61:37 be talking about that. So I'll focus a little bit on servers, but also point out things

61:43 related, too, to other types of computing devices, and then talk about the memory chips.

61:48 So first, servers, and the thing one talks about, and the way memory

61:54 in server-based computers is packaged, is known as DIMMs. And that's what's known

61:59 as dual in-line memory modules. So that's basically packaging of memory chips in

62:06 order to, um, match, I would say, the memory bus width. Today,

62:14 most memory buses are 64 bits, whereas memory chips rarely on their

62:21 own output 64 bits, so they output much fewer, typically four, eight, or 16,

62:27 but sometimes 32 bits. So that means you need a number of memory chips in

62:31 order to match the width of the memory channels. And what's on this slide

62:37 from top to bottom is different generations, and the current generation is known as the

62:43 DDR4 memories, and I'll talk more about, uh... You see these kinds of

62:49 DIMMs, and it is just a little circuit board with a bunch of chips on it

62:55 that makes up memory, and I'll come into how these things are structured on the

63:01 little circuit board. And there is a key, perhaps more than anything else, so that in order

63:08 not to plug the wrong kind of memory into your memory slot, there

63:13 is a little notch that makes things sit in properly. Yeah, there

63:20 is also a concept that is important to understand in terms of these DIMMs, and that is

63:25 something known as ranks, and a rank is the collection of memory chips that

63:34 matches the width of the memory bus, as I said, today pretty much,

63:41 or 64 bits. So a rank is a collection of memory chips that

63:48 ends up being able to accept or deliver 64 bits at a time in

63:56 terms of data. These chips can be mounted on one side of the circuit board,

64:04 or they can be mounted, uh, on both sides. So sometimes,

64:10 uh, the chips on one side alone are not sufficient to match the

64:15 memory bus, so they put some of them also on the back side, so

64:19 to speak, of this, um, circuit board. So that's coming to

64:25 why I said double-sided card, where the rank covers both sides, as opposed to a

64:30 single side of the card. So even though they have chips on both sides of

64:35 the card, it doesn't mean that it's a single rank, because the chips on each

64:40 side may be fully capable of matching the memory bus width. So in that case,

64:46 it becomes a double-sided, dual-rank or two-rank memory module. And

64:54 there's also these DIMMs that basically have two of these ranks on the same side of

65:00 the card. So this tells a little bit of how these things work. The point

65:06 of these ranks is that it's partially a packaging thing, but only one rank

65:14 at a time can talk to the memory bus. So when you have several ranks

65:21 in a given DIMM, you need to select which one of these ranks you want to

65:29 communicate with. So there's a selection of chips that is necessary, known as chip select.

65:38 How many chips you need depends on the width of each one of these chips;

65:42 the width is the number of bits it outputs, and as I mentioned, typically four

65:46 through 16 are the most common. And that's why, you know, it takes eight

65:55 chips that are eight bits wide to make up for a 64-bit bus. So I

66:01 think, if you look on this slide at the DIMM, you can see one, two, three, four, five, six, seven, and it's

66:11 eight or nine, actually, um, with a big thing in the middle, because

66:15 there's four equal-sized pieces of silicon, effectively, on the right and four

66:23 on the left. So that is most likely something that is using x8

66:30 memory chips. So that means eight chips. And that also means this DIMM uses

66:37 error-correcting code that uses eight bits. So 64 plus eight, that gives you eight

66:44 or nine chips. Um, here is a little bit of how this is

66:50 used in configuring your computer systems. So typically the number of bits that is

66:59 stored on the memory chip is noted relative to the width of the chip. So,

67:08 all right, like in this case, a two-gigabit chip may be times

67:12 four or times eight. And that tells you how many gigabytes of memory your DIMM

67:21 holds. Because if you have times-four memory chips, then for a 64-bit bus,

67:29 then you need 16 of them to match the bus. And if it's two

67:33 gigabits per chip, then you get four gigabytes. On the other hand,

67:38 if you use times-16 chips, you only need a quarter as many chips.

67:44 So for the same size memory chip, you only get a quarter of the amount of

67:49 memory. So these are things that one chooses among when one configures, uh, systems.
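The capacity arithmetic just described can be written out directly. The parameters below are the lecture's example values (a 64-bit bus, two-gigabit chips, x4 versus x16 widths):

```python
def dimm_capacity_gb(bus_width_bits=64, chip_width_bits=4, chip_gigabits=2):
    """Capacity of one rank: enough chips to fill the bus, times chip density."""
    chips_per_rank = bus_width_bits // chip_width_bits
    total_gigabits = chips_per_rank * chip_gigabits
    return total_gigabits / 8  # 8 bits per byte

print(dimm_capacity_gb(chip_width_bits=4))   # 16 chips * 2 Gbit = 4.0 GB
print(dimm_capacity_gb(chip_width_bits=16))  # 4 chips * 2 Gbit = 1.0 GB
```

With x16 chips you need only a quarter as many chips per rank, so for the same chip density the module holds a quarter of the memory, exactly as stated above.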

67:54 And here is a little bit more how things may look. This is a picture of

68:00 a circuit board, and it shows basically two CPUs and the whole socket thing, and the

68:07 blue things that are framed in red are where these DIMMs go on the circuit

68:13 board. In this case, each one of the two CPUs can host four DIMMs. It

68:21 just so happens that they are placed two on each side of the CPUs in

68:28 this case. So how do you configure these things? I'll go through it quickly. It's

68:34 not something that you do when you write your code. But if you configure

68:38 a computer system, it's something one should be aware of, how things work. So

68:44 these are examples with a different number of memory channels for, um, CPUs.

68:50 So on the top is basically four memory channels, and it shows what's really the

68:55 typical logic: you can either have two or three of these DIMMs per memory channel,

69:01 but there's also cases where you can only have one. And there is more to

69:09 this story because, as it tries to illustrate on the top row here,

69:17 when you have two of these DIMMs, so that's, that's, their speed is like

69:24 1333 megahertz, whereas on the right-hand side, where you have three DIMMs

69:30 per channel, it is 800 megahertz. That's a reflection of the fact that the

69:36 more DIMMs you put on a memory channel, the harder it gets to service them

69:43 fast. So it tends to be that the more DIMMs you put on the channel,

69:49 or even the more ranks you put on a channel, um, the lower the clock

69:56 rate ends up being. So here's another slide from HP just showing different configurations that

70:03 show, again, the tradeoff. Um, with one type of memory you can have

70:09 three per channel, and in total 18 DIMMs. This is what they call maximum capacity,

70:16 the maximum number of DIMMs. And at the bottom you have something where speed is

70:21 more important, but it also limits the number of, uh, DIMMs you can put in.
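This capacity-versus-speed tradeoff can be roughly quantified: peak channel bandwidth is the transfer rate times the bus width. Treating the quoted DDR speeds as transfer rates and assuming the 64-bit (8-byte) channel discussed earlier (both are assumptions on my part, DDR marketing "MHz" usually means mega-transfers per second):

```python
def channel_bandwidth_gb_s(megatransfers_per_sec, bus_bytes=8):
    """Peak bandwidth of one memory channel: transfer rate * bus width.
    Assumes the quoted DDR speed is a transfer rate in MT/s."""
    return megatransfers_per_sec * 1e6 * bus_bytes / 1e9

print(channel_bandwidth_gb_s(1333))  # ~10.7 GB/s with two DIMMs per channel
print(channel_bandwidth_gb_s(800))   # ~6.4 GB/s when populating drops the rate
```

So populating a channel heavily enough to drop it from 1333 to 800 costs roughly 40% of that channel's peak bandwidth.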

70:28 And here is just another example that shows that typically the number of ranks it can

70:34 support per channel is eight, and it doesn't matter how the ranks are distributed

70:40 among DIMMs, and that has in part to do with the addressing of

70:46 which rank you want. So basically addressing ranks is three bits. So there's what

70:55 I pretty much said already. So here is a little bit more with

71:02 a few pictures showing you the chips on these

71:06 DIMMs, and how this, uh, selection of,

71:14 um, ranks is being carried out. There's just a few fairly simple

71:22 illustrations of how these things work, and a summary of the number of memory channels on some

71:27 of these processors out there. And now one more important aspect in trying to understand

71:35 how things actually work, um, when it comes to the main memory system,

71:40 and that is that, it depends a bit, there's more than one memory controller,

71:46 and it's also typical that a memory controller serves more than one memory channel on these chips.

71:58 Uh, except in the case where there's a memory controller per memory

72:02 channel, which is not typical; it's more typical that the memory controller serves two

72:12 to three memory channels. For instance, Intel has two memory controllers for their six memory

72:21 channels, and I'll come back to that in a little bit. This is kind of

72:24 the NUMA aspect that I talked about before in terms of the processors in the same

72:29 socket, and why some, um, memory is a bit further away, because in

72:38 order to get to another socket's memory, even though the address space is shared,

72:43 this shared memory, in that sense, it's still not equal distance in time or bandwidth.

72:51 So here is kind of what I was referring to, that I was coming to,

72:55 in terms of what the poor memory controllers, or if it's just a single one,

73:01 need to deal with. So there is basically a number of cores, each of which

73:06 has a number of threads running, and they need to access parts of memory. So

73:14 they all send their requests to the memory controller. That then needs to try

73:19 and figure out on which channel is this data located; then, on that

73:26 channel, on which rank is it; and then within that rank, where is it

73:33 located. And that comes to something about banks, rows, and columns that has to do

73:39 with how memory chips are actually designed. So there's a whole lot of

73:44 selection that needs to be handled. And then the memory controller also needs to worry

73:48 about latencies and outstanding requests, keeping track of what belongs to whom as

73:55 things come back from memory. Um, all right, well.
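A toy sketch of the kind of decoding a memory controller does when a request arrives. The bit layout below (channel, rank, bank, row, column fields) is entirely invented for illustration — real controllers use vendor-specific, often interleaved mappings — but the three-bit rank field matches the "three bits to address ranks" mentioned above:

```python
def decode_address(addr):
    """Split a physical address into hypothetical channel/rank/bank/row/column
    fields. Field widths are invented for illustration only."""
    column = addr & 0x3FF  # 10 bits of column
    addr >>= 10
    bank = addr & 0x7      # 3 bits: 8 banks
    addr >>= 3
    rank = addr & 0x7      # 3 bits: up to 8 ranks per channel
    addr >>= 3
    channel = addr & 0x3   # 2 bits: 4 channels
    addr >>= 2
    row = addr             # remaining bits select the row
    return {"channel": channel, "rank": rank, "bank": bank,
            "row": row, "column": column}

print(decode_address(0x12345678))
```

Every cache-line fetch goes through a decomposition like this before any DRAM chip is touched, which is part of why the controller is such a busy piece of logic.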

74:02 Then, this is just another way of dealing with memory that is more common

74:07 in embedded and mobile systems, where you don't have slots for modular memories like DIMMs

74:17 to plug in. Instead, you take the memory chips and basically solder them

74:21 onto, um, the circuit board. So here, the white boxes are in fact memory

74:28 chips that are soldered onto the board, and the bigger chips

74:33 are processors. Another thing that is now used to get high bandwidth is to

74:39 take memory chips, stack them on top of each other, and then you get faster

74:46 communication paths to get data from the memory chips that are stacked onto each other,

74:51 uh, through something known as through-silicon vias. So this is

74:58 a relatively expensive way of doing things, and the technology hasn't been affordable until the

75:05 last couple of years, but it's now being used for high-end processors, in

75:10 particular for GPUs. And here is a little bit of that, so if you go back and

75:18 look at the processor lecture slides for GPUs, you will get some of the

75:23 data for the rates for HBM. I will try

75:29 to cover a couple more things in the last few minutes here, and then, I

75:36 guess, follow up next lecture to finish things up. But memory is using the same

75:43 technology as the processors are. Everything today uses the thing known as

75:49 CMOS, or complementary metal-oxide semiconductor. There's two different designs, one for dynamic

75:58 random access memory, DRAM, and one for static random access memory, which is SRAM.

76:04 Um, and so here, um, the thing is that for density, main memory uses the DRAM

76:11 type of memory cells, and SRAM is the cache type of memory cells. So here

76:19 are the DRAM cells, essentially one transistor per cell, and, uh, SRAM is

76:27 six transistors, so that means it's considerably larger than the DRAM. So that

76:34 means it's not as dense, part of the reason why it's more expensive. The other

76:40 one, since it's also used for caches, where you want speed, is also designed

76:47 for speed, so other design constraints also make it more expensive. But

76:53 the perhaps more well-known property is that the DRAM,

77:03 um, actually forgets, which is the dynamic part, so it doesn't keep the information

77:09 for long. So that's why, in order for the DRAM to retain information,

77:17 it needs to go through what's known as a refresh cycle.
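To make the refresh idea concrete: a common JEDEC-style figure is that every DRAM row must be refreshed within about 64 ms. With, say, 8192 rows per bank, that works out to one row refresh roughly every 7.8 microseconds. Both numbers are typical values assumed here, not taken from the slide:

```python
# DRAM cells leak charge, so every row must be rewritten (refreshed)
# before its contents decay. Typical JEDEC-style numbers, assumed here:
retention_ms = 64  # each row must be refreshed within ~64 ms
rows = 8192        # rows per bank (assumed)

# Spreading the refreshes evenly gives the per-row refresh interval:
interval_us = retention_ms * 1000 / rows
print(f"one row refresh every {interval_us:.4f} us")  # 7.8125 us
```

During each of those refresh operations the bank cannot service normal reads or writes, which is one of several reasons DRAM access is slower than its raw cell speed would suggest.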

77:22 Let me see, my next slide was supposed to show that. So, so this is the way things are

77:30 being laid out. So each one of these cross points is in effect kind of a

77:34 memory location in the DRAM. So it's organized as a matrix. So you use

77:42 both row and column addresses of, um, an item. But my time is

77:49 about up. I would have liked to start to try to cover that and how it's

77:54 addressed, and this is part of the reason why the DRAM is so slow.

78:03 I'll try to explain that next lecture, and then I'll take questions.

78:15 You're welcome to obviously ask questions next time when I continue talking about memory.

78:25 Huh. Okay. If you have no questions this time, then I will stop
