© Distribution of this video is restricted by its owner
00:00 Okay. So I'll continue talking about tasks in OpenMP, and then I'll talk about the affinity concepts. But before I actually start the lecture: any comments or questions? No? Okay, so let me start.

00:48 Last time I started talking about this concept of tasks, which OpenMP has supported since version 3.0 of the OpenMP standard. I talked about the construct itself, the scope of variables, and what I like to call the inheritance rules: what gets propagated downward, so to speak, in the task tree.

01:20 Today I'll talk a little more about synchronization, and then I'll continue with the other patterns. So, the barrier as well as the taskwait: that again is synchronization, and taskwait is, for tasks, kind of the counterpart of the barrier for parallel regions, a way of making sure that things get synchronized again before execution moves on. I gave examples last time. So today I'll start with

01:56 this notion of a task group, which is another way of having tasks synchronize. So this is simply what it looks like to declare a task group: there is a potential collection of tasks, and being inside a taskgroup construct means that all the tasks generated within that code segment get synchronized before things move on. So at the end of the taskgroup, all the tasks will be completed, and execution does not pass the end of the construct before all the tasks are complete. So in that sense the construct is not too different from the barrier, but it applies to tasks, and potentially you can have a hierarchy of tasks being synchronized.

03:11 There was one example; I forgot to show it. So this is just another example of, again, a taskgroup, applied to a segment of the code. It is similar to what's shown on the previous slide, but it does not apply to all tasks: there are some tasks generated before this taskgroup starts, and those are not covered by it.

03:46 So there are lots of different ways of controlling what gets done in parallel, and of, again, coordinating and synchronizing tasks. So it's quite rich, and it's richer than using the fork/join model of parallel regions that we covered before the tasks. But of course it also gets more complicated to keep track of the logic of what happens.

04:16 So there's also a worksharing construct that is simply taskloop, and in the first instance it acts much like a parallel for construct for regions, but it has more flexibility in how the iteration range is divided up among tasks. I think I have examples on the coming slides.

04:56 So, yes, to simplify: one can control the grain size, or the number of tasks generated. So it's just a little bit more flexibility, again, in how to divide up the iteration domain.

05:20 And, yes: it wasn't on this slide, it will be on the next slide, but taskloop can also have clauses like the for construct has, controlling how the variables are then shared between the tasks and the generating region; those are of course taskloop-specific in form. So I will try, hopefully in the next few sessions, to cover these various clauses used to manage tasks.

06:04 So this is the grain size. In this example there's a loop, and the clause says what the minimum number of iterations in a block that gets assigned to a task is. So that's the notion of the grain size; the grain size then can be chosen by the runtime in a range between the minimum specified and no more than twice that minimum. So that's the only special thing about this particular code segment: it just illustrates that the grainsize clause can be used for the taskloop.

07:03 Next, scheduling, and this is where I think there's a lot more flexibility than in the fork/join style of regions. I'll try to cover each one of these scheduling options in the next few slides.

07:34 So at first, I guess, I should stress what it says on the slide: that tasks can be executed in any order. I think that was on the last slide too, but what I also tried to stress last time is that the tasks may be deferred. So there is no guarantee of the time at which a particular task is started or gets executed; it is up to the runtime system and the operating system to figure out when to take on tasks, and which ones to take on. So tasks, as it says at the top of the slide, could be executed immediately, but there's no guarantee that that's the case: they can be deferred, which means essentially that the task is basically placed in a pool, and then threads, whenever they get to it (or rather, whenever the runtime system decides it's time to do something about the tasks) go and take tasks off from the pool of tasks to be executed.

09:00 It is also the case that it's not guaranteed that a single thread will do all the work for a particular task. So unless one kind of controls this, the runtime system may switch a task between threads. So that's what's said toward the bottom of the slide: one can apply 'tied' to a task to make sure that the thread that starts it is used to complete it, that is, it will do the entire work for the task; or, again, a task can be 'untied', which is the opposite.

10:02 So, compared with dealing with parallel regions, that is another way in which tasks differ. Then, yes, I guess, just some cautions when dealing with tasks, because of the deferral and what might happen. So, I don't think, if I remember correctly, the assignments are doing too much with tasks; is that correct? [TA] Yes. [Instructor] So this is just for you to be aware of, and maybe it's something for you if you do a project for the end of the course that uses OpenMP; at that point there may be some of these concepts that are worth exploring. And that's why, since it is an important part of OpenMP, I just want you to be aware of the cautions for this task construct.

11:09 Okay, so this is an if clause. The if clause is similar to what we saw in an example for parallel regions: there, one can say that if there are enough iterations in the loop, for instance, then it's worth parallelizing the loop; but if there are only a few iterations, it's probably not worth the overhead of parallelizing, of setting up a parallel region and having threads deal with it. And similarly we can do this with tasks: with the if clause one can control whether a task should really be generated as a deferred task, basically depending on whether the if expression evaluates to true or false.

11:53 And here again, in terms of the scheduling: as was mentioned before, tasks may be deferred or not, and with the if clause you also get the option that, when the expression is false, the task is in that case handled immediately by the thread that creates the task. So I think that's enough commentary on this point.

12:43 And here's where I think it gets more interesting in terms of flexibility. The next construct is taskyield. So this is a way of basically telling the runtime, again, that the task can effectively be suspended in favor of another task. So, depending on what the logic of your code is, some task may be critical in some sense, more so than some other tasks, and one may want in that case that a thread gets to deal with another task, so that a task that in a sense has higher priority gets executed. And here is just a code example where the taskyield is used in a particular function.

14:02 So I will talk about one more construct, and then I'll stop and see if there are questions and comments on the task constructs. So, well, the next one is the priority clause; sorry, I forgot that one. This again is advisory in some sense: it's a relative expression of importance, but the runtime system doesn't necessarily obey the desire. A task basically gets a priority, and depending upon the resources available at that given time, the runtime will most likely try to comply with the priority given through the clause; but, like with the number of threads requested, as we talked about several times, these are suggestions to the runtime about what the programmer believes is the right thing to do.

15:10 And then there is the depend clause that I mentioned, with which one can specify that tasks are dependent on other tasks. That means a task cannot proceed until the task on which it depends has been completed. So it's a more flexible mechanism to order tasks than the fork/join model in the standard parallel region: one can basically build a dependence graph, made more flexible through the taskyield and depend clauses. And I think there's an example in this case showing which tasks depend on which: here, task one produces output that then also serves as input to the other tasks, so in that way one builds sort of the logical order for how the tasks are supposed to execute. The depend clause can take a list of items, so it doesn't need to be just a single one.

16:49 Right. So there is a third example, just to look at in case you get to the point where you want to experiment with tasks and, again, dependence graphs. The taskgroup, the tasks, and the task clauses and arguments are all good things to play around with. Yes, so I'll stop for a minute here and see: do you want to comment on more of the task constructs and how to manage them, or not? [TA] Not really; nothing new.

17:45 Okay, so I think I'm going to skip these examples for the moment, and I'm going to come back to them at the end of the lecture, depending upon how quickly we go, because I want to talk about, last but not least, thread affinity; and that is an important concept whether one uses tasks or not. So I'll skip through a bunch of slides here to get to that part now, and I will start on affinity.

18:27 So I think, once I've talked about thread affinity, I hope I'll have answered the question that I think was asked in the very first lecture, about how one controls what threads do. In part we talked about using the thread ID as a way of controlling what a thread ends up doing when it works. But, as I also said, probably the most critical thing is the relationship between the threads and where the data lives that they operate upon, as well as how threads are allocated, depending, of course, upon whether the application is compute bound or memory bound. And that was part of the early portion of the lectures, and of the assignments too: matrix multiply, which is usually compute limited, and matrix-vector and STREAM and some of the others, which are memory bound. So there, thread allocation becomes important, and

19:58 this is now what I'm going to get into. So first I have two examples; the first one is matrix multiplication again. So, yes: one can do things, as usual, in what is known as scattered, or spread; it goes by a couple of different names that mean the same thing. The vocabulary is not kind of unified, but it means spreading things out, whether it's called scatter or spread. And compact, sometimes also called close, means trying to kind of pack things and make threads execute as close to each other as possible.

20:49 So now, in this matrix multiplication example: this was run on a single server, and in this case there were two versions of it; one is an older example, so there were only four cores in one case and six cores in the other, and there are a couple of graphs. The upper one is pretty much the compute rate measured for this server, and there is a blue line, which is for the scattered allocation, and a red line, which is for the compact allocation. So with the scatter, basically all cores were covered by the four threads, and in this case four cores, so basically each core got one thread; and that got much better performance, as we can see, even more than double the performance. 21:59 With the compact allocation, one instead used the hyperthreading option: it gives two threads to each core using the hyperthread option. But as you can see from the performance graph, that was not very successful in terms of performance. 22:42 The point of hyperthreading is that it can be quite useful when the functional units, the arithmetic units, are not saturated; but since functional units are shared among the threads in a hyperthreading environment (or simultaneous multithreading, as it is called by some other vendors), you don't really get more compute capability by using hyperthreading. So in this compute-limited example, by using the compact allocation, the number of functional units that got used was basically cut in half. So, yeah, that's why it's important to

23:36 understand, for your application in general, what's critical. Now, in terms of the lower graph: it looks at the energy efficiency of the computation, and unfortunately the color coding these folks did in their plot got a little bit messed up, so pay attention to the following point: blue is still the scatter, but compact ended up being yellow. And in this case lower is better; less energy to do the job is clearly better. So that says that the scatter was both faster, with more than twice the performance, and also more energy efficient: it's about 60% savings on the energy using the scatter allocation compared to the compact allocation.

24:55 Any questions on this? So: understanding how to use thread allocation is important, both for performance and for energy efficiency in a computation. [Student] One question: in the case of compact, what would happen if the core has multiple AVX units, say on Skylake? In that case shouldn't compact improve as well?

25:35 [Instructor] Yes, yes. So if you have replicated functional units, like in the case that was just mentioned, where you have two AVX units per core, for instance, then it may be a little bit more complex than just using the wholesale compact representation; you would like to have a thread allocation that uses both AVX units in the core. I'll try to make some comments on that in a bit, but one comment I will make now: if one is just trying to make sure one uses all the functional units, when there is more than one in each core, what might happen is that the clock rate gets reduced, because the chip gets too hot, so power capping may play a role. So one might well get more than a single functional unit's worth of performance, but one is not likely to get double. So it's probably advisable first to use one AVX unit on all of the cores, and to go back and fill up with the second one if one has more threads. All right.

27:22 up with the second one if one more threats and Mhm. All

27:31 So this we're in terms of energy in terms of charismatic work for what

27:42 fox. But what in the two ? So scatter was almost four times

27:52 good. Yeah, is made to multiplication and that behaves quite differently as

28:02 can see. So in this case compact was the better choice except for

28:11 clock right soil front. So they guess I should have said that on

28:14 horizontal axis but the corporate um so this because the low clock rate scatter

28:24 still okay but at a higher cooperate actually the compact representation is the winner

28:31 terms of performance and it's also they they're in terms of the energy efficiency

28:42 this case, forget it. Yes. So, so it again

28:53 because maybe you expect from application tends be limited by memory accesses, not

28:59 the functional units. And that's why must use the different behaviors in this

29:04 that the compact in fact ended up better. And so this is measurements

29:11 this is from the work group game group at UC Berkeley. And the

29:18 in part, why the compact also wins from the energy point of view, is that the firmware in modern processors, if cores are not used, shuts them down or puts them in a very low power state. So that reduces the power consumption for those cores, compared to the case where you just put one thread on each core: in that case you didn't get a performance benefit from scattering things out, so it was better off, in this case, to save energy by shutting down cores and just limiting the number of cores engaged in this job, since again it's not generally compute bound.

30:08 So, any questions? And so these are again two very simple computations that illustrate the difference in how one would manage thread allocation, both for performance and for energy efficiency. And this next slide is just a summary of what I already said.

30:42 Okay, so now I'm going to talk more about how you can control not only the thread allocation but also the data allocation part. And I think, as was mentioned in the last lecture, or the lecture before, there is this so-called first-touch principle. That is simply that the thread and the data kind of get allocated together: the data gets allocated close to the thread that first touches it, that is, the data ends up in the memory of the core to which the thread that first works on the data is allocated.

32:02 single tragic cold, that's the right to do. But when you have

32:12 threads then it may be an So if you later on in this

32:19 were to have a parallel region that on the data but and then the

32:23 for when you increase the number of gets allocated to other course then the

32:31 is longer farther away and it gets slower to access and more energy consuming

32:39 access because all of them that needs talk to a single uh memory

32:49 So Now in case one work to in this case two threaded core and

32:56 you have the for loop private That now has two threads and then

33:02 uh huh allocation of era is them between where the two threads are

33:16 So this is kind of a better . So this this is what there

33:22 systems do, too. So this is an example. Yeah, so that's the thing: as I think I did say, if a single thread initializes all the data, then it ends up in the memory associated with the core where that thread runs, and then, as I said, everybody else will have to go there to grab and work with that piece of data. So obviously the thing that was kind of done in the second part of the previous slide is then to make sure that the initialization of the array is done in the parallel region, since then it automatically is split up among the places where the threads themselves are allocated. 34:14 So this is one of the things to pay attention to and do in even a basic OpenMP assignment: to keep track of where data is and where the threads end up being allocated. And here is just another example, again, where the data gets initialized in the sequential part; then it's just in a single place, and so things are not very efficient. Now, as I said, there's nothing wrong logically with such code; it will run. But, as I said, it's not optimal, because the data is allocated where the single thread was running that generated the data. So in this case one can then, again, similar to the other example, do the initialization in parallel regions with multiple threads.

35:26 And then there was just an example of commands one can use to get an idea of what happens. And this is showing, in this case, how things look: if I use this particular command, numactl, it even shows the expected relative latency between the different cores, or sockets, and how they're connected. So there are ways of getting a sense of how good the thread allocation is by using some of these commands; numactl is the one used here, I believe, and there are others as well.

36:32 So now I'll try to talk about how you control where the threads end up. But first: any questions or comments so far about this first-touch principle, and about how data and threads get allocated relative to each other if one doesn't explicitly control

37:12 talk about uh about scatter and compact spread them close. But the first

37:23 then is and then uh one prevents from wandering around which is binding.

37:35 open. Empty. This used on notion of places for to which threats

37:46 be allocated and to which they can be bounce. So one thing is

37:54 allocate things, but Unless one also things to be bound to the particular

38:02 , they always may have its own of doing the execution of the cold

38:09 move threats around. Mhm. So standard places uh to which threats can

38:27 executed, it's kind of um confusing some degree, but there is also

38:35 threat concept in terms of the So we have seen that many designs

38:45 like Intel's with hyperthreading, have up to two hardware threads per core. And so one can assign the threads being executed to one of these two hardware threads, if you want; but one can also allocate simply to cores, the higher-level concept, and not bother too much about which of the hardware threads gets the executing thread; or we can simply allocate per socket. So those are the standard targets for the placement of threads for execution: hardware threads, cores, and sockets. 39:29 Then one can define places, and I will give examples that I think are more illustrative, but this is from the documentation of how places might be specified: one can give specific numeric identities for places, one can make a list of places, and one can also group places into partitions.

40:06 So, the place partitions: the numbering there is in order, and it is contiguous; you cannot have gaps in the places, so it's a range from start to finish with a constant stride. And, let's see, so basically, in principle, like for many other constructs, essentially a triplet notation is being used, but one can omit part of the triplet notation. So there is basically, if one looks toward the bottom of the slide, a lower bound, a length, and a stride. If you omit the stride, it is assumed that the stride is one; and of course, if you omit the length as well, it's just a single place. So there's a

41:10 just a single place, so there's few different ways of Looking at the

41:22 thing and again, the last Yeah, is basically first the Middle

41:29 . Is that the partition? So basically Each partition has four places and

41:35 there is four different partitions and if look at it there's continuous range

41:42 3 and 4567. So these are gun continues for complying with the notions

41:49 place petitions. But then one can use this triplet notation. So and

41:57 second from the bottom is just zero and that means there was no third

42:06 of Australia's one. So the best 01, 2, 3 and there's

42:11 of them. So it was against location, the length and then potentially

42:18 strike. So that's what the last uses this try notation that vessels as

42:26 between different instances Is four. So basically start zero. The next one

42:33 at four and the next one starts eight. Yeah, so that was
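The forms being described might be written as follows (my own rendering of the notation, assuming the machine exposes hardware-thread IDs 0 through 15; only the last assignment stays in effect):

```shell
# Three equivalent ways to define four place partitions of four IDs each.

export OMP_PLACES="{0,1,2,3},{4,5,6,7},{8,9,10,11},{12,13,14,15}" # explicit lists
export OMP_PLACES="{0:4},{4:4},{8:4},{12:4}" # {lower-bound:length}; stride defaults to 1
export OMP_PLACES="{0:4}:4:4"                # place : count : stride -> starts 0, 4, 8, 12

echo "$OMP_PLACES"
```

The abstract names `threads`, `cores`, and `sockets` are usually the safer choice, since they do not hard-code any machine-specific numbering.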

42:41 So that was how places are defined. And then there's the bind clause, which allows you to prevent the operating system from moving the thread during the execution. And there's also the 'master' affinity policy, as I mentioned: if, for instance, the place is a socket, that means there are many hardware threads that can execute in the same place as the master thread, so in that case it potentially makes sense to use the master affinity policy; that is, when the place is kind of rich enough to manage several threads concurrently. Okay, so I will go on, and if there are

43:39 particular questions, I'll take them as we go; a slide or two more first. So, yes, I think the next slide is the graphical illustration of all of this. Note the terms here: what the OpenMP standard uses instead of 'compact' ('compact' was what my colleagues at Berkeley used) is 'close'; that is the official OpenMP name for placing threads as close together as possible. And it has 'spread' instead of 'scatter'.

44:27 Now, 'close'. So it depends: if there are fewer threads than the number of places, then it is, as previously, kind of straightforward; one just allocates threads to the places, and one starts with allocating threads to the place where the master thread is currently executing. Right, so I'll illustrate that on the next slide, since it's confusing in the text. On the other hand, if there are more threads than places, then one basically does a blocking of threads in order to match the number of places. And I'll also show something on that on the next slide.

45:36 case the parents or master thread was and running place number five. And

45:48 that means in this case there was threads. So the thread allocation starts

45:55 plays five and then it goes through places in Iran Robin fashion because there

46:04 fewer threats and places, that means will be in this case a couple

46:10 places that don't get assigned in the , there is no kind of

46:17 On the other hand, on the when they're small threats and places and

46:25 we have these clothes allocation policy in case there were 13 I guess threads

46:39 there are eight places, so there small threads some places, so that

46:45 in this case on the front assign to threats to places than my could

46:53 old 16 uh blocks of threats. , of course there was only

47:02 So that doesn't quite work out, it means well first go through the

47:08 two threads per place or again starting the master or parent threat was

47:17 So the first two threads gets allocated place, five connects to play six

47:23 and then run still try to use the places. So that means some

47:31 places only get the single threat but can be implementation dependence, that could

47:37 be the case that place, number threads, 10 and 11. And

47:43 there's the remainder part of in that would get allocated to place, number

47:50 Okay, any questions on that? Spread, then, is trying to spread things out, and there's also a distinction depending on whether there are fewer threads than the number of places, or more threads than the number of places; in the latter case the blocking also takes place, and I think I have another graph for that too. 48:22 So, when I want to do this spread-type allocation and there are few threads, in this case I took an example with three threads, then the eight places get partitioned up into place partitions. In this case one would need three partitions, since there were three threads, and three threads is kind of fewer than the number of places. And in this case it is not an even divide, so the place partitions are not all the same size. 49:09 And then the allocation starts, again, with the partition in which the parent, or master, thread runs; that one gets thread number zero, and then it continues in a round-robin fashion, so the partitions each get one thread per place partition. And at the bottom is the other case, when there are more threads than partitions. In this case it in fact doesn't end up being all that different, not different at all really, compared to the close partition case, because of how the relationship between the number of places and threads works out. But I think the

50:13 basic idea is perhaps best illustrated in the top one. And I have a few more examples showing, again, this close and spread allocation of threads; so, you know, you may want to play with them in the MPI assignment that you do have.

50:38 MPI assignment that you do have. . Um yes. So that's what

50:49 said really when things are not. then there's a buying clothes that

50:54 prevent they're always from moving threats around I know that at some point,

51:05 you're experimenting with bind. So I know if you have any comments to

51:09 with the bind. Yeah, Thank . Obviously by definition it prevents moving

51:19 trends within a place. Uh You have some outputs but I need

51:26 find them. Maybe I can show in next picture. Okay.

51:32 I'm sorry. I've been think of but I know we did talk about

51:36 and I can't remember what example you play with. We'll try to demonstrate

51:42 next time. Um, somewhere sorta instead of the homemade examples. So

51:51 is I think it was done on old version of stampede. Not the

52:00 one I believe. So in this , as you can see there was

52:11 socket server, 12 cores per socket hyper threading. So two threads per

52:21 and so on the right hand side , see the kind of illustration with

52:26 kind of rounded corner boxes being the . And then you have the course

52:32 the soccer than the brownie squiggles. don't say hyper threads in each corner

52:43 the bomb. You can also see you can get the listing of what

52:50 call the physical idea that has done socket I'd and then within each socket

52:56 have the core idea that again is or it's unique for each soccer.

53:02 it starts the lower for its And then you can see the threat

53:10 that in this case goes through which common default allocation of threads at.

53:21 there are other versions too. But this case basically you can see that

53:27 get allocated one threat per core, moves from um Score zero and then

53:37 next gets in core, one on same socket and it goes through until

53:44 headache allocated a threat to each one the core on the first socket and

53:48 they moved on to the second But they're also allocation schemes where you

53:55 between the sockets first and I'll come to that later. So there's ways

54:01 again figuring out how you do the and the coach, you know,

54:09 examples on the next slide. So here there is something illustrating different ways of specifying the places to which one wants things allocated. This one had no explicit places statement, so by default it does the cores in this case. So it's a couple of different examples. In this case there was the bind close clause, so the allocation is basically going through the first cores in order, in this case on the first socket, I should say; whereas in the spread case, in this case again, it was going through spreading the threads out among all the cores, though not alternating by the sockets, but by making place partitions of size two and placing one thread in each partition of size two.

55:25 So this example here is another way, more forcefully listing the particular places. So it gives the same result as the previous ones, but instead of having just the close clause, it explicitly assigns threads to cores. So there's nothing unique otherwise about that. 55:59 And here is an example of the triplet notation without the stride argument, so you assume the stride is one. So it said, well, the starting place is zero and I have four threads to allocate, that's the colon four, so it just increments by one, because the stride is implicitly one. And then, on the right-hand side, one can create the spread allocation by using, again in this case, the triplet notation, and then have the stride be two.

56:49 So, any questions so far? [Student] This is not playing with hyperthreads? [Instructor] Well, no, you have to do that explicitly. Okay, so dealing with the hyperthreads: now it says that the hardware thread is the target. And then one wanted eight threads, and in this case the default was to take the first hardware thread in each core, and then just go through all the cores, in this case each getting one thread, to get the eight threads allocated. And one could also instead have used 'cores', with the same outcome, in which case the system doesn't care which one of the two hardware threads gets allocated; or one takes control of which hardware thread one wants allocated in the core.

58:16 This is, I guess, the only interesting part: if I look at the second place statement, the notation there starts at, yeah, eight, right, and then it takes eight instances. So in that case you get the second statement here to allocate threads to the second hardware thread in each core, I think. Or one can instead, using 'cores', which is the right-hand side, make your places out of the zeroth and the eighth hardware threads. So that means each place now basically gets one thread, and there are eight threads, so it just creates eight such instances onto which the threads then get allocated.

59:22 So here are just more examples, playing around: in this case, again, the places on the left-hand side place these threads, and then there's also the bind clause. So if it's 'close', they will all sit in the first socket, and if it's 'spread', then they get allocated across the sockets. So there are many different ways you can specify basically the same thing; but the point is that one can also control things down to which hardware thread gets used.

60:02 more. This is just the printer get that demonstrates which fred X to

60:18 executed is allocated where in terms of hardware threats. So these are our

60:28 different ways of illustrating the same So in this case and looks up

60:35 left most illustration here, there was of eight threads and um it was

60:45 cores and Listed to surpluses, corporate or 16 cores. So in this

60:54 there is um the first threat gets , disappear and then the second threat

61:05 advocated to the next core etcetera. basically one thread per core um in

61:13 case and this is just a different of printer for the same thing and

61:21 certainly the more readable. So I this option are worked. Um oh

61:28 , 1 things printed out. So investment system and sort of eight cores

61:36 socket. So this is what you see in the right most one that

61:41 huh since they wanted A threads but 16 course they got spread out using

61:48 other course. So this is, know, core zero, got 246

61:53 eight and then it repeats for the socket. Any questions, nope too
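That "every other core" pattern is just evenly spaced placement. A hypothetical Python sketch of the arithmetic (an illustration of the spread layout only, not how any runtime is actually implemented):

```python
def spread_places(num_threads, num_cores):
    """Model OMP_PROC_BIND=spread-style placement: num_threads threads
    spaced evenly over num_cores cores, so each thread lands
    num_cores // num_threads cores after the previous one."""
    step = num_cores // num_threads
    return [t * step for t in range(num_threads)]

# 8 threads on 2 sockets x 8 cores = 16 cores: every other core is used,
# four threads per socket.
print(spread_places(8, 16))   # [0, 2, 4, 6, 8, 10, 12, 14]
```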

62:14 Okay? Um, so this is another slide; I thought I wanted to take a

62:24 look at it, but I think I'll skip talking about it, since I wanted

62:30 to talk about something else. Let's see. Yeah, I think I have more or less

62:41 talked about these things. So let me get to what I wanted to talk about

62:46 next. So there are plenty of examples, some I put out, in terms of seeing how one

62:51 can specify where things run. Now, um, as I said, sometimes, well, one can alternate

62:59 between sockets, um, during thread allocation when there are several sockets, in

63:09 a round-robin fashion across the sockets; or one can do, as was done in the

63:15 example, one thread per core in the socket, until every core

63:22 has got one thread, and then move on to the next, um, socket.

63:28 So that can be controlled. So, mm, Intel has a fairly sophisticated scheme

63:39 for how to control the allocation of threads. So this is vendor specific; IBM has one as

63:48 well. I haven't seen what they indeed use, and I'm using this example from Intel,

63:55 but IBM has a similar, though not identical, way of managing how

64:04 execution threads are allocated to the hardware — and in the case of IBM they have

64:15 four threads per core, so that's a bit more options than Intel's two.

64:22 Okay, so they have what they call the type, the permute and the offset

64:28 attributes, and I'll try to illustrate what these attributes are in the next couple of slides.

64:34 The offset simply just states the position at which the thread allocation

64:41 starts. Um, the type — in this case they call it compact, the way the Intel

64:51 folks do, not "close" as the OpenMP standard, um, talks about it — and there is scatter,

64:59 and then they have a couple more ways of controlling the thread location,

65:06 like permute. Yes, I will illustrate them. But permute is related to what I

65:13 said about alternating between sockets, or going kind of core-first. So

65:25 they have basically, as I'll show on the next slide, mm, a hierarchy of entities,

65:35 and then they can kind of change the order of the levels in the hierarchy.

65:44 This is more explanation of what the attributes are for the type. But let me

65:54 show this thing that I wanted to show you. So this is kind of

66:02 the default. All right. It starts as a node, and the node has a number

66:11 of sockets — and I don't know why they use "package 3" instead of "package 1";

66:16 it's package zero and three, and I have no idea why. Maybe it's a

66:20 typo in the slide I borrowed this from, from the Intel

66:26 folks. But anyway: so logically a node consists of sockets, and each socket has a

66:34 number of cores, and then for each core there are two hardware threads, if hyper-

66:41 threading is enabled. So sure, we have that. And then there's this compact, or close,

66:53 attribute, and the thing that happens is what I illustrated on the previous slides too.

66:59 Okay. In this case — well, in this case — eight threads get

67:09 allocated, starting with core zero, and first in core zero, hardware thread zero.

67:16 And then the next execution thread gets allocated to the other hardware thread in core

67:23 zero. And then you move on to the next core, and then you move on to the next

67:28 socket and fill up the hardware threads core by core in the next socket. So that

67:35 is exactly how compact works. Okay, and this next one is the scatter type.

67:45 So in this case the threads are, one by one, spread through the cores in order. So

67:52 first socket zero and its core zero, then core one in the same

67:58 socket, and if there are more cores to fill in the socket, you move one core at

68:03 a time, and then move on to the next socket. So that means you go

68:13 through and basically get one thread per core, per socket, and then you come back

68:22 and use the other hardware thread in each core, in order, from socket zero onto the other

68:35 sockets. So as you can see, the two are different. So one goes, actually —

68:44 this one, sorry. And this one alternated between sockets; I'm sorry for confusing you.

68:51 I misread the slides. So as you can see, first, in this case,

68:58 this is alternating between sockets for the allocation. Right, so again — oh, I wasn't

69:08 reading it right, I'm sorry. The first thread is in socket zero, core zero. Then the next

69:17 thread is in the next socket, and then it goes back to the first socket and takes

69:24 the next core, and then back to the other socket and takes the next core.

69:32 And then it goes back again to the first socket, but then uses the other hardware

69:39 thread, the second thread, in each core.
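The compact and scatter orderings just described can be sketched in a few lines. This is a toy model, assuming 2 sockets, 2 cores per socket, and 2 hardware threads per core — an illustration of the enumeration order, not Intel's actual implementation:

```python
SOCKETS, CORES, HWTHREADS = 2, 2, 2   # toy topology: 8 hardware threads total

def compact(n):
    """Compact: fill both hardware threads of core 0, then core 1,
    then move on to the next socket."""
    order = [(s, c, h) for s in range(SOCKETS)
                       for c in range(CORES)
                       for h in range(HWTHREADS)]
    return order[:n]

def scatter(n):
    """Scatter: alternate between sockets first, then cores, and only
    come back for the second hardware thread once every core has one."""
    order = [(s, c, h) for h in range(HWTHREADS)
                       for c in range(CORES)
                       for s in range(SOCKETS)]
    return order[:n]

# four threads, shown as (socket, core, hwthread) triples
print(compact(4))   # [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
print(scatter(4))   # [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
```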

69:47 Oh, and then there is another example here, from the Intel folks, that uses the offset; that's

69:56 why I wanted to show it. So in this case there is also an offset of

70:00 3. So in this case, um, it starts basically at the third hardware thread;

70:15 that is, in this case, where the first thread is allocated, and then

70:20 things progress from there. So this is just illustrating the offset, where you shift

70:28 the thread allocation. And then — so, unlike this, the first compact example didn't have

70:33 the offset attribute; it started back at position zero. But then I think the

70:43 last one is where they also permuted, promoting the most important level in this hierarchy

70:54 of places to which things can get allocated, um, and with the offset again.

71:02 So it starts, in this case, at 3. And again, four threads.

71:09 But even though it is compact, the next thread is not allocated to the

71:18 other hardware thread in the same core; instead it went on to the next

71:26 core. So basically a level got promoted up — that is what permute does, and that's the most important part.
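In Intel's syntax these attributes appear as `KMP_AFFINITY=<type>[,<permute>][,<offset>]`, e.g. `compact,0,3`. The offset behaves like a rotation of the compact enumeration; the sketch below models just that, on an assumed toy topology of 2 sockets, 2 cores each, 2 hardware threads per core (an illustration, not Intel's implementation):

```python
SOCKETS, CORES, HWTHREADS = 2, 2, 2   # assumed toy topology

def compact_with_offset(n, offset=0):
    """Compact placement with a KMP_AFFINITY-style offset: hardware
    threads are enumerated in compact order, and allocation starts
    `offset` positions in, wrapping around at the end."""
    flat = [(s, c, h) for s in range(SOCKETS)
                      for c in range(CORES)
                      for h in range(HWTHREADS)]
    rotated = flat[offset:] + flat[:offset]
    return rotated[:n]

# offset=3: the first OpenMP thread lands on the fourth hardware thread,
# and allocation proceeds compactly from there.
print(compact_with_offset(4, offset=3))
# [(0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0)]
```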

71:33 So there are many different ways that the vendors have provided to, um, let users allocate threads

71:48 to cores to get the maximum out of the resources, basically — whether it's, again, memory-bandwidth

71:54 limited or compute limited — and where data is allocated, to make sure that there is locality

72:03 between the execution and most of the data that threads interact with or use for their

72:11 execution. And I think that's it. Then there are examples of performance tuning, but

72:23 time is pretty much up, so we won't go through them; but the thing I skipped I'll point

72:30 out for those interested in the, um, task stuff. So I'll just tell you where

72:37 it is. The standard examples — yeah, uh huh, it

72:44 is further back than I thought. So, mhm, what it is, is

72:52 three examples. I encourage anyone interested in the task construct, as well as OpenMP

72:59 coding in general, to look at them. In addition to matrix multiplication, a

73:04 second one is the Jacobi method: a five-point stencil on the grid, nearest-neighbor communication or updates

73:14 in the 2-D array. It's a common pattern, and there are a few slides:

73:18 the first shows just the standard way of doing it, then how to use the taskloop

73:24 construct, and then this question of how one can group blocks — whether you

73:32 do it by rows or do it by columns — to reduce the number of tasks and

73:38 get better performance and less overhead.
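For reference, the basic Jacobi update that those slides start from looks roughly like this — a plain sequential Python sketch of the five-point stencil; the slides then layer the OpenMP task constructs on top, which are not shown here:

```python
def jacobi_sweep(u):
    """One Jacobi iteration on a 2-D grid (list of lists): every interior
    point becomes the average of its four nearest neighbors, reading
    only from the previous iterate u (boundary values stay fixed)."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (u[i-1][j] + u[i+1][j]
                                + u[i][j-1] + u[i][j+1])
    return new

# tiny 3x3 grid: the single interior point becomes the neighbor average
grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
print(jacobi_sweep(grid)[1][1])   # 0.25 * (2 + 8 + 4 + 6) = 5.0
```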

73:45 And then the other example: the Gauss-Seidel method is another iterative method

73:51 for solving linear systems of equations, and they use that to create flexible — yeah —

74:00 update rules that they call wavefronts. If anyone is familiar with Jacobi,

74:04 it's a very simple method, but it's not very efficient, so it doesn't converge

74:10 all that quickly. This Gauss-Seidel converges more quickly, but then one

74:16 basically gets what's known as a wavefront, so that's what is often used to figure

74:24 out how to order computations or schedule them. And here's an example, then, of how to

74:30 create the wavefronts. And then there's also, uh, some use of

74:38 the task construct, and eventually it evolves into figuring out how to get enough parallelism,

74:44 and the parallelism between wavefronts. So one

74:52 can have these so-called wavefront diagrams, where you have these colored diagonal lines

75:00 that basically show how one wavefront can start, um, before the previous wavefront is

75:05 finished: as long as they basically trail each other, you can have several wavefronts going on concurrently, updating the grid.
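The wavefront idea can be made concrete: in a Gauss-Seidel sweep, updating point (i, j) needs the already-updated (i-1, j) and (i, j-1), so all points with the same i + j are mutually independent and form one wave. A small Python sketch of that grouping (illustrative only; the lecture's slides express it with OpenMP tasks instead):

```python
def wavefronts(n, m):
    """Group the interior points of an n-by-m grid into anti-diagonal
    wavefronts: every point on wave w = i + j depends only on points
    from wave w - 1, so each wave can be updated in parallel."""
    waves = {}
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            waves.setdefault(i + j, []).append((i, j))
    return [waves[w] for w in sorted(waves)]

# 4x4 grid: the four interior points fall into three successive waves
for wave in wavefronts(4, 4):
    print(wave)
# [(1, 1)]
# [(1, 2), (2, 1)]
# [(2, 2)]
```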

75:15 And the other generic test case — I believe one of the examples, if I remember correctly, though I don't

75:22 remember which one, is using LU. Well, maybe I'm just misremembering.

75:29 Yeah, but there is also a task version of an LU decomposition among the slides

75:35 I sometimes skip for time. So I leave those for you to peruse yourselves, and if

75:43 you have questions on them — if you do take time to look at them —

75:47 we'll be able to answer questions on them. Okay, my time is up.

75:58 So that is a fair amount of things I have brought up in terms of OpenMP.

76:07 Um, but much of it is something that could be of interest to

76:15 you in terms of the projects, if that turns out to be of interest;

76:22 otherwise, um, you don't need all of it for doing your assignments. Any

76:32 comments from you? Yeah. Looking to stop the recording.
