00:04 Yeah. So today I continue to talk about OpenMP, and I will leave room for the demo to continue with some features of OpenMP as well. I didn't cover all the slides I had for the last lecture, so I will start with some of the slides that were left from the last lecture and then add some points for OpenMP.

00:45 So there was, I think, a bit of discussion toward the last examples about work scheduling. I'll talk about that first, then some more about runtime functions, and I'll walk through a simple example in OpenMP and hopefully get into a bit of something known as tasks, which is a more flexible way of dealing with parallelism using OpenMP. All right.

01:25 We were on scheduling; I reused some slides about scheduling, but this is in terms of parallel regions: how to divide up the work among threads in a parallel region. It's mostly thought of in terms of loop constructs with a number of iterations, and in the example I showed earlier, I think it was the spectrum application maybe, there were 1000 iterations and there were four threads, and each thread got a block of 250. So that's kind of the default way of doing business. But one can control what's known as the chunk size.

02:17 The amount of work, in the context of a loop, that is assigned to a thread is known as a chunk. So the chunk size is just the number of iterations in the case of a loop, or the amount of the kind of atomic piece of work, a sort of block of work, that is assigned to a thread, and one can introduce a static or dynamic assignment of the chunks to the threads.

02:55 If there are more chunks than there are threads, in the static case the chunks, or blocks of iterations, are assigned in a kind of round-robin manner. Whereas in the dynamic case, whichever thread gets its job done first gets to grab a new chunk. So basically the chunks are kind of placed on a queue, and the threads pick off the next chunk in the queue.

03:27 I have some pictures later that give some more intuition for how it actually works. Then there is also another version known as guided, in which case the chunk size is not fixed, but it tends to start out with big chunks, and one can bound the chunks; exactly how depends on how the compiler folks decided to implement the guided procedure. The chunk sizes tend to decrease from the largest chunk as the remaining work gets subdivided. And then one can also let the runtime system decide what is a good chunk size.

04:19 So the later options do more work at runtime; they are more flexible, with less sort of control from the user, and you kind of use those later ideas when things may be highly data dependent. When the loop makes a simple partition and there are no conditionals in the loop, the work can be divided at compile time, particularly if the loop size is known and there is no data dependency. But generally that's not true, so in that case one may want to choose the more flexible scheduling and hope that the runtime system has good guidance, when it knows more, for how to assign the chunks.

05:18 So there is kind of a little trade-off in terms of when each scheme is best. As I said, when things are known at compile time, static is probably the best option, because then the overhead, or the effort of dividing up the work, is done at compile time, so at run time things can be very efficient. Whereas when there starts to be unpredictability in the amount of work in a loop iteration, the dynamic scheme ends up being the better version. And the more one goes down from the top towards guided and towards runtime, the more work the runtime system has to do, and sometimes state has to be kept around. So for, like, the auto option, the runtime can use what it sees from previous executions, or from how the work on the various chunks has been done, to try to make some predictions of what a good allocation of work to threads might be.

06:35 Okay. So here is one of the graphical illustrations of these concepts of scheduling. At the top is the static case, and as you can see, the chunks are fixed in size and the threads, in a round-robin way, grab the chunks as they are, and the work is divided accordingly. In the dynamic case, which thread gets which chunk is not known ahead of time; it depends on how quickly the work is finished, and the threads may have started at different times. So in this case thread number three, I guess, was ready to grab a chunk first, and then thread two, and then thread one. You can also see that threads three and four got the chunks they grabbed done very quickly and went back to grab another chunk, and in fact in this case the fourth thread ended up in the end with three chunks.

07:48 In terms of the guided case, there are a couple of options for how things may have been implemented, and as you can see it starts out with the biggest chunks, and then in the next round of grabbing chunks the chunk sizes have been reduced, and they keep getting successively reduced. The idea is to try to reduce the overhead of scheduling by starting with large chunks and then shrinking the chunk size, so in the end there are only a few rounds of grabbing chunks.

08:28 I think there are a couple of other examples. In this case I don't know exactly what the code was for this, but as an illustration of what happened in a particular case: using the static scheduling, it's not uncommon that the first thread, which may be used in a particular way, ends up getting to do more work than what most of the other threads are doing. In this case there is almost a factor of two difference between the thread with the maximum amount of work and the one with the least amount of work. On the other hand, the same code using dynamic scheduling ended up actually being very nicely load balanced between the threads. So again, this is about how work is assigned to the threads, and how to control how the work is assigned by using the scheduling features in OpenMP.

10:01 And there is just another graph of the different ways of scheduling, in this case using chunks of size five for dynamic. Note that it's not showing time; it's basically showing how the iterations in the loop are divided up among the threads. Any questions on the scheduling?
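
For reference, here is a minimal sketch in C of how the schedule clause discussed above is typically written; the loop bodies, the array, and the chunk sizes are placeholders of my own, not code from the slides.

```c
#include <omp.h>
#include <stdio.h>

#define N 1000

int main(void) {
    double work[N];

    /* Static: iterations split into fixed chunks, dealt out round-robin. */
    #pragma omp parallel for schedule(static, 100)
    for (int i = 0; i < N; i++)
        work[i] = i * 0.5;

    /* Dynamic: each thread grabs the next chunk off a queue when it finishes one. */
    #pragma omp parallel for schedule(dynamic, 100)
    for (int i = 0; i < N; i++)
        work[i] += 1.0;

    /* Guided: chunks start large and shrink over time. */
    #pragma omp parallel for schedule(guided)
    for (int i = 0; i < N; i++)
        work[i] *= 2.0;

    printf("%f\n", work[N - 1]);
    return 0;
}
```

There is also schedule(runtime), which defers the choice to the OMP_SCHEDULE environment variable, and schedule(auto), which leaves it entirely to the implementation.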

10:44 Okay, now on to the runtime functions. Both of us have used them in other examples before, and they were used in the demo as well. So, a few comments: I just wanted to comment particularly on omp_get_max_threads, and I'll talk a little bit more about that now, and on the thread-ID function on the next slide or two.

11:12 omp_get_max_threads is a little bit tricky. If nothing has been requested in terms of threads for the region, then it actually returns the maximum number of threads available for the region. On the other hand, if, before one gets to this runtime function, one has in one way or another specified or requested a particular number of threads, then it takes its value from that specification or request for a number of threads. So it doesn't really tell you what actually happened; omp_get_max_threads, called after a set-number-of-threads has been issued, reflects that request. So what it returns is not necessarily the maximum available; it can also be the number of threads requested, if that has happened before the runtime function is called.

12:35 And regarding requesting a specific number of threads: I think the demo also used omp_get_num_threads, which returns the number of threads that has been assigned to the parallel region, and there were examples that also got the thread ID by using omp_get_thread_num, which gets the ID for a particular thread at runtime. These are the query functions.

13:12 We discussed this last week: there are three different ways in which one can request threads. One is to set what's known as the environment variable OMP_NUM_THREADS, but it has the lowest priority. Then the runtime function omp_set_num_threads takes priority over the environment variable, and the num_threads clause on a parallel statement has the highest priority, so it overrides the other two if they are not the same. And then, as I've said a few times, a set number of threads is not a guarantee of the number of threads you get; it's a request.
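
As a concrete illustration of that priority order, here is a small sketch; it assumes OMP_NUM_THREADS was exported in the shell, and the printed counts are only "likely" because all of these are requests, not guarantees.

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    /* Suppose the shell did:  export OMP_NUM_THREADS=8   (lowest priority) */

    omp_set_num_threads(4);            /* runtime call overrides the environment variable */

    #pragma omp parallel
    {
        #pragma omp single
        printf("region 1: %d threads\n", omp_get_num_threads());   /* likely 4 */
    }

    #pragma omp parallel num_threads(2)   /* the clause has the highest priority */
    {
        #pragma omp single
        printf("region 2: %d threads\n", omp_get_num_threads());   /* likely 2 */
    }
    return 0;
}
```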

14:10 Now, the next slide is a busy one. I encourage you to take a look at it, and I'll try to zoom in on it, because it's probably not feasible to read on your screen, but maybe it is on the projector screen. But I think I'll start from the bottom, in fact: there is an if-else close to the bottom of this slide.

14:39 So the number of threads for the region can be kind of direct and exact, or it can be somewhat flexible. If this dynamic-threads setting is false, that means you want a fixed number of threads for the execution of the parallel region, and as I said, that is the last else-if on this slide. So it says: if I ask for more threads than what's available, and I'll come back to "threads available" in a bit, then it is up to whoever implemented the runtime system what you are going to get. But if you request fewer threads than are available, then you get what you requested.

15:46 Now, on the other hand, if it is okay that the number of threads may vary, then if you ask for fewer than are available you may get what was requested, but you may also get fewer. And if you ask for more threads than is available, in the dynamic-threads scenario you get a number of threads that is at most as many as the number of threads available.

16:20 Now, "the number of threads available" is itself defined as a function; it's written slightly above the middle of the slide. Threads available is defined as the maximum number of threads that the hardware can support, minus the threads that have already been used; essentially the free capability of the hardware. So, what the hardware is capable of as a maximum, minus what has already been taken up by thread assignments to other regions in the code. So it is a somewhat elaborate procedure used by the runtime system to respond to a set number of threads, and what I try to stress about it is that it is a request, not a guarantee of what you get. So, any questions on runtime functions or thread assignments to parallel regions?

17:52 Okay, so here is just a simple example; I think it's similar to ones we already talked about, so I will not comment further on it. And here is an example, again, illustrating that you turn the dynamic mode off, so that now it's a fixed number of threads, and in this case the code asked for as many threads as the system can offer in terms of the number of processors. That's the crucial thing. Then it gets the ID for the threads, and the thread with ID zero is the one that assigns to the global variable num_threads; so in this case it is a single thread, just one, that updates it. But this is a good way of actually making certain, which may be useful hopefully not for correctness, but in terms of understanding the performance of the code, that you check how many threads were actually assigned, because, as I said, it may not be what was requested. Okay. And in the assignments it has been suggested to do this kind of procedure: you request a number of threads and then you check what the runtime system actually gave you.
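
A minimal sketch of that request-and-verify pattern; the variable names are my own and not necessarily the ones on the slide.

```c
#include <omp.h>
#include <stdio.h>

int num_threads = 0;          /* global: how many threads we actually got */

int main(void) {
    omp_set_dynamic(0);                        /* ask for a fixed team size           */
    omp_set_num_threads(omp_get_num_procs());  /* request one thread per processor    */

    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        if (id == 0)                           /* only one thread writes: no race     */
            num_threads = omp_get_num_threads();
    }

    printf("requested %d, got %d threads\n", omp_get_num_procs(), num_threads);
    return 0;
}
```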

19:40 Just a quick comment on the little distinction between the master thread, which is purely a logical entity, and the initial thread, which is the one that is sort of the continuation of the parent thread into the parallel region. We talked about this last time, in the demo. In that context, thread IDs are local to the parallel region.

20:12 And this is just, given the complexity of OpenMP in its current version, a good cheat sheet, so to speak, of the most commonly used OpenMP functions. One can get a long way with the roughly 21 functions that are on this slide; they are the core of what one would need to know to do a decent job with an OpenMP code. Now I'm going to talk about this pi example.

20:51 So, any questions? Okay. So this example is about computing pi, and it turns out there is a simple equation: if one integrates 4/(1+x^2) on the interval from 0 to 1, the result is exactly pi. And on the computer we need to do a discrete version of this integral, since it isn't an easy function to handle exactly.

21:34 So a simple numerical approximation of the integral is this rectangle, or midpoint, approach, basically approximating the integral as the area under the curve. The curve is defined by the function 4 divided by 1 plus x squared, and the area under that curve between zero and one turns out to be pi. So we approximate the area under the curve by using this collection of rectangles, the lighter blue regions. Of course it's not exact, but it's a decent approximation, in particular if the rectangles are narrow in the x direction.

22:34 So I'm going to use this collection of rectangles and the discrete version of the sum, and here is the sequential code. In this case the interval between zero and one was divided into 100,000 rectangles.

23:04 What the code does is pick the midpoint for each of these rectangles. So if the left coordinate of rectangle i is x_i, then it takes the midpoint between x_i and x_{i+1}; that's the mid point. And this is the integrand: it computes the height of the rectangle at this midpoint of the rectangle, and then it sums up all the rectangle areas. And since the rectangles are all of the same width, you can, instead of summing the areas, sum up all the heights, and then, since they have the same width, at the end do one multiplication by the step size, the width of the rectangles. So this is just adding up the heights of the rectangles, and this is the sequential code.
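
The sequential code itself isn't captured in the transcript; the standard midpoint-rule version looks roughly like this (the step count and variable names are assumptions about the slide).

```c
#include <stdio.h>

static const long num_steps = 100000;

int main(void) {
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;

    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;        /* midpoint of rectangle i */
        sum += 4.0 / (1.0 + x * x);         /* height of the rectangle */
    }

    double pi = step * sum;                 /* one multiplication by the width */
    printf("pi = %.15f\n", pi);
    return 0;
}
```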

24:07 So now on to the OpenMP version. Any questions so far? Okay. So in this case this code was hard-wired to use two threads to work on the rectangles, and as we can see, there are a bunch of variables that are global here: the number of threads, and also an array of sum values that one generates. There are as many elements as there are threads, so basically each thread gets its own element for its partial sum of the rectangle heights. So in the two-thread case you can see the blue and reddish rectangles, and one thread gets the blue ones and one thread gets the red ones; we'll talk more about that. But since each thread has its own sum element, there are really no race conditions to worry about from the threads trying to update the same sum variable.

25:31 And then the number of threads requested is two, the two threads that get to do the work, and we have the parallel region here for the OpenMP code. In this case we have id and nthrds, "nthrds" without the "ea", so there are two variables which, when I say them, sound the same, but pay attention: there is a difference. The nthrds in this code, the one without the "ea", is local to each thread in the parallel region. The usual thing: each thread gets its own id, and it also sets this local, private nthrds by reading the number of threads that was assigned. But then we have this global variable nthreads, defined up here, which only gets updated by one of the threads, so there are no race conditions.

26:51 And then we have this loop that each one of the threads gets to work on. Note that there is no parallel-for here, so basically all the code in this block is executed by all the threads, in this case two threads. But since, again, they have their own partial-sum element here, there is no race condition in updating the partial sums either. And in this case, and this was used in a previous example too, each thread increments its loop index by the total number of threads, so it goes through things in a round-robin fashion over the total range of loop iterations. Then it's the same kind of code as before: get the height of the respective rectangle. So after this is done we have as many partial sums as the number of threads, and then we need to add them up. In this case that is outside the parallel region, which goes from here to here, so it is basically sequential code; there is no issue here in terms of race conditions, and the same thing goes for updating pi. All right, so any questions on this first OpenMP version versus the serial one?
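
A sketch of that first OpenMP version, in the spirit of the classic SPMD formulation; the names NUM_THREADS, sum[], nthreads and nthrds are assumptions about what is on the slide.

```c
#include <omp.h>
#include <stdio.h>

#define NUM_THREADS 2
static const long num_steps = 100000;

double sum[NUM_THREADS];          /* one partial sum per thread (global)  */
int nthreads;                     /* how many threads we actually got     */

int main(void) {
    double step = 1.0 / (double)num_steps;

    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        int nthrds = omp_get_num_threads();   /* local copy, one per thread      */
        if (id == 0) nthreads = nthrds;       /* only thread 0 writes the global */

        sum[id] = 0.0;
        for (long i = id; i < num_steps; i += nthrds) {   /* round-robin over iterations */
            double x = (i + 0.5) * step;
            sum[id] += 4.0 / (1.0 + x * x);
        }
    }

    double pi = 0.0;                          /* sequential final sum, outside the region */
    for (int i = 0; i < nthreads; i++)
        pi += step * sum[i];

    printf("pi = %.15f\n", pi);
    return 0;
}
```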

28:48 Okay, so now let's see what happens. Obviously the hard-wired number of threads is two, and then there is a table that has 1, 2, 3 and 4 threads; obviously the code was recompiled for the different thread assignments, so in addition to two it was run for one, three and four. And then we have the execution time in the right column. In going from one to two threads there was obviously a good reduction in the execution time, but then it didn't really pan out too well after that. So, we have the code here: any suggestion why it doesn't scale beyond two threads in this case?

29:54 Sorry, could you please repeat that? I wasn't very clear on my audio. Okay, the suggestion was that if you're not getting the same number of iterations per thread, it is a little bit trickier. That was a good thought, but the race condition, right, was avoided because each thread has its own sum element. So there is no competition, and everything is perfectly load balanced in this case as long as the number of iterations is a multiple of the number of threads. In this case there were a few threads and 100,000 iterations, so even if one thread gets one more iteration than the others, that wouldn't really matter; with so few threads they are basically even, and with four threads they have 25,000 iterations each, which is actually perfectly even.

31:28 So the tricky part is this. In this case, with four threads, because we are using an array, a sum array, to store the one partial sum for each one of the threads, that means that the four elements of the sum array are in the same cache line. And the level-one caches are where the line has to be in order to get updated. Different cores don't share the level-one cache, and the threads are on different cores, so the cache line basically has to, in a way, move, or be copied, between the different level-one caches. The cache line gets invalidated when a thread updates, say, sum[0], so when another thread on a different core wants to update the same cache line, it needs to grab it and copy it from, in this case, the first core. That's why every time a thread on a different core wants to do something, it has to get the line from whichever core happens to have the current version; the line basically gets ping-ponged back and forth all the time. So this is what is known as false sharing, in the sense that there is no sharing of any data items, but the data items happen to be in the same cache line.

33:33 So this is the notion of false sharing, and that's why this notion of ownership of cache lines matters. Of course, a cache line in, you know, L2 or L3 may be shared among a few cores, but not in level one. So knowing the computer architecture, in terms of which caches are private to each core, and also the size of the cache line and how data is allocated to cache lines, is important when it comes to understanding performance. Any questions on that? So this is part of the reason for spending time early on talking about processor architecture and typical sizes of caches and cache lines. I didn't talk too much about cache coherence, but this is part of what the system needs to do.

35:04 So how does one avoid this? There are a few ways. The most naive and straightforward way is to keep this sum array but basically pad it: add a padding dimension, generously, to the array. In that case each element fills out a cache line with things we don't care about, so the different sum variables that we are interested in end up in distinct cache lines, and then there won't be any false sharing anymore. But it's kind of an ugly way of doing it, because it means that your code will depend on the particular cache-line size of the platform you're using. So if you move to a different platform, a processor that has a different cache-line size, you need to change your padding. So it works, but it isn't good.

36:18 As for the fact that it works: we can see on this slide, now with the padding added to make sure that the sum variables are on different cache lines, that in this case it cured the problem, because now we get a good speedup, an even reduction in execution time; not quite a factor of four, but not too bad, it's close. So we do benefit from using more threads, and the code scales, at least up to four threads, in this case. Any questions on that?
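
A minimal sketch of the padding trick; the PAD value of 8 doubles assumes a 64-byte cache line, which is exactly the platform dependence being criticized here.

```c
#include <omp.h>
#include <stdio.h>

#define NUM_THREADS 2
#define PAD 8   /* 8 doubles = 64 bytes: one cache line on many CPUs (assumption) */

static const long num_steps = 100000;
double sum[NUM_THREADS][PAD];     /* each thread's partial sum now sits on its own line */

int main(void) {
    double step = 1.0 / (double)num_steps;
    int nthreads = 0;

    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        int nthrds = omp_get_num_threads();
        if (id == 0) nthreads = nthrds;

        sum[id][0] = 0.0;
        for (long i = id; i < num_steps; i += nthrds) {
            double x = (i + 0.5) * step;
            sum[id][0] += 4.0 / (1.0 + x * x);   /* only [id][0] is used; the rest is padding */
        }
    }

    double pi = 0.0;
    for (int i = 0; i < nthreads; i++)
        pi += step * sum[i][0];
    printf("pi = %.15f\n", pi);
    return 0;
}
```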

37:07 Okay, so what does one do to try to make the code more portable? So, as I said, one way is to avoid the sum array and basically have a local sum variable for each thread. Now there is no variable sharing, and there's no concern about a race condition for updating the variable, because each thread has its own sum variable. But it means that at the end one needs to have a global variable, pi, that gets updated from the local per-thread sum entities, and in this case this is done inside the parallel region, but using the critical construct. In that case pi only gets updated by one thread at a time, but all threads get to update pi. So this is the way, then, to make the code not dependent on a particular architectural feature of the processor; it's a much more portable version of the code, and in this case it performs pretty much as well as the padded version. And since it is a single variable, one could also use the atomic statement; atomic works on a single variable, whereas critical can protect a collection of statements. So in this case either one will work.

39:09 And, since this is a reduction operation, we can also use the OpenMP reduction clause to have the system take care of avoiding the race condition, and also to potentially do the summation of the thread-wise sum variables in parallel instead of as a sequential addition. So that's another option for this simple code. In this case it turned out it was not quite as efficient as using critical, but it should scale better. For four threads in this case, with atomic or critical we are doing a serial addition of the four partial results, which didn't cost too much overhead; whereas the reduction may not have been all that well implemented here, but it should scale better to a larger number of threads than doing the final sum serially.

40:29 And I think that was more or less what I wanted to say. The take-home message about this code is: avoid false sharing by instead using local variables, and then either use atomic or critical for combining the partial sums, or use the reduction clause.
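
Two sketches of those portable variants: the first accumulates each thread's local sum into pi under a critical section, the second lets the reduction clause manage the partial sums. Variable names are assumptions.

```c
#include <omp.h>
#include <stdio.h>

static const long num_steps = 100000;

int main(void) {
    double step = 1.0 / (double)num_steps;
    double pi = 0.0;

    /* Version 1: private partial sum, combined under a critical section. */
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        int nthrds = omp_get_num_threads();
        double sum = 0.0;                       /* local to each thread: no false sharing */
        for (long i = id; i < num_steps; i += nthrds) {
            double x = (i + 0.5) * step;
            sum += 4.0 / (1.0 + x * x);
        }
        #pragma omp critical                    /* or: #pragma omp atomic on the update */
        pi += step * sum;
    }
    printf("pi (critical)  = %.15f\n", pi);

    /* Version 2: let the reduction clause manage the partial sums. */
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    printf("pi (reduction) = %.15f\n", step * sum);
    return 0;
}
```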

41:06 Now I want to switch topics, unless there are comments on that, and later we'll hand it over for the demo. So, any questions so far today, or on the previous lecture? Okay.

41:35 So I mentioned the demo too, but some comments on tasks first. There are a number of different aspects of tasks; they are, in a way, a more general version of parallel regions. I'll talk through some examples and see how far I get, but I want to leave, I believe, 20 minutes or so for continuing the demo from last time. All right. So, as I said, the idea is to kind of move away from the strict fork-join type of model and have a more flexible way of creating, synchronizing and scheduling tasks than what is done in the parallel regions' fork-join model. I will probably not be able to cover all of it today, but I will then do it next lecture.

43:06 And as you will see, at first it looks, in the examples I have, fairly similar to sections, as it says on this slide, but it is a more flexible construct than sections. So, tasks are initiated in the parallel region, and a task has its associated code and also its data environment and its own local variables; it packages up everything that is normally in a parallel region, but tasks are then independent of each other, and I'll show examples of that. And I think this is pretty much just a comment that, in a way, parallel regions were already implemented as a form of tasks, without opening it up to the programmer to actually initiate and manage tasks him- or herself. But then the task construct came, in, I think, what was it, OpenMP version 3, and it has been used because it solves some of the issues that the early fork-join version did not address very well. It starts with a pragma, "#pragma omp task", simply, and it has a number of clauses that control what happens. And I will talk about these various aspects of tasks; that's the goal.

45:12 So the way it tends to be used, I guess, most of the time, is that you have a single thread create the tasks, but the tasks are then independent of each other, and whatever threads are in the parallel region can grab them. So basically, in the nominally simplest incarnation, a thread is assigned to a task by the runtime system; which thread gets which task is something the runtime system decides, and if there are more tasks than threads, a thread may end up getting more than one task.

46:18 So now to a simple example that I am going to get through. This is just the "race car" code; it's pretty simple, and let's see what happens.

46:34 So here is standard OpenMP without tasks: in this parallel region we have, in this case, the print statements. Now, if one has two threads, as we know, there is no particular order between threads, so a number of things can happen in terms of what gets printed. I will just continue on my own here and say, you know, with two threads, in this case this is what happened. That was pretty much what I had guessed would happen, but it's not necessarily guaranteed. It could also have been, again with the two threads, that the prints interleave; since it's replicated code, remember, both threads have all three print statements, so that means both thread one and thread two could print "a" before anything else. So there could have been all kinds of different combinations of the three words "a", "race", "car", depending on which thread gets to execute what when; but serially they will obviously be printed in the order that they are written.

48:25 So here, maybe you can see, this one should be easy enough. Any takers for this example? Obviously, since it's just one thread, it will go through the statements in order, no problem.

48:51 Right, so now we're going to try this with the task construct. In this case there is one thread that does the print of "a", and then it generates two tasks: one task that prints "race" and one task that prints "car". So now, what will we get? Any suggestions? Okay, it was a little bit hard to hear, it kind of echoes at my end, so thanks for the answer even though I can't comment on it directly.

49:52 But yes, obviously, what happens in this case is that "a" is printed first; that is guaranteed. Then that thread generates these two tasks, but there is no particular order in which the tasks are executed. They are not subject to the single-thread construct; they are managed by the number of threads in the parallel region. In this case, as it says at the bottom of the slide, two threads were specified; it doesn't say so in the code example, but it was run in this case with just two threads. And it may well happen that the thread that gets the "race" task gets to its print statement first, but it could also be different: the thread that got the "car" task prints first. So there is no order between "race" and "car", even in this task example. Later on, next time, I will tell you about ways in which you can organize or sequence tasks as you may want, because there are ways for controlling whether tasks wait for each other.
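
A sketch of what that slide's code likely looks like; the printed strings come from the discussion, and the single/task structure is the standard pattern rather than a copy of the slide.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel num_threads(2)
    {
        #pragma omp single            /* one thread creates the tasks */
        {
            printf("a ");             /* always first: printed before the tasks exist */

            #pragma omp task
            printf("race ");          /* these two tasks may run in either order,     */

            #pragma omp task
            printf("car ");           /* picked up by any thread in the team          */
        }
    }
    printf("\n");
    return 0;
}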

51:29 So here's another one: we've added another print statement, "is fun to watch". So, any expectations of what this code might print? Well, we have the same situation again: there are two threads, but this single thread has "print a" and then "print is fun to watch", and then we have the two tasks that are executed independently and in parallel by, in this case, different threads. So clearly we get "a" first, because it is printed before either of the tasks is generated, and then, as we already discussed on the previous slide, "race" and "car" can come in any order. And then the single thread again: the two tasks are completed at some point, and we get "is fun to watch".

52:50 But here is what could actually be happening: the single thread that prints "a" and "is fun to watch" may get its job done before either of the tasks gets its job done. So there is no guarantee that the tasks will be completed before the single thread that has "a" and "is fun to watch" finishes. Tasks can be initiated at any time the runtime system decides, and it may not be in the order of the sequence of the code as you see it.

53:36 Okay, so I think, well, let's start the demo, and if there's time left I will talk about the next item for tasks. Yeah, I think you can share the screen. Right, okay.

54:03 For some reason my ssh session here keeps dropping off.

54:49 Okay, so, just a couple of examples that were left over from last time. I guess pretty much everyone now knows how the scheduling works here.

55:15 So obviously we have requested a number of threads; just assume that we get what we asked for from the operating system. Anyone, any takers: what will happen if we have 16 iterations for this for loop, if you use dynamic scheduling, and if you use static scheduling?

55:40 Yes, with static you get an even distribution over all the threads, and with dynamic it will depend on when each thread finished its current iteration; if there is any other iteration available, it will take it, and the chunk size defines how many contiguous iterations each thread will get at a given time. There's one more thing: you can use static with a chunk size as well. In that case the difference will be, let's say, you have 16 iterations and you had two threads; that means you get 8 iterations per thread, right? So iterations 0 to 7 would be executed on thread number zero in the normal case, if you don't specify any chunk size, and the rest of them would go to thread one. But if you specify a chunk size, say two, then iterations zero and one will be executed on thread zero, two and three will be executed on thread one, and so on. Then the distribution will be more in a round-robin fashion rather than one even block per thread. So there's a difference if you use a chunk size with static and if you don't.

57:04 So, just an example here. As you can see, with static we would expect, with four threads and 16 iterations, an even distribution: four iterations per thread. And with dynamic you can expect an execution such as this: thread zero executed quite a few of them, and the last thread executed two of the iterations; there's an uneven distribution between the threads regarding their iterations.

57:42 Yeah, that's a good question. It's not a... The use case for dynamic is mostly when you're uncertain of the amount of time that it will take for each iteration. Let's say, in a simple matrix multiplication case, or matrix-vector multiplication: there you are expected to work on the same piece, or the same amount of data, with each thread, so you can expect the iterations to take the same amount of time. But if you're performing some other work that depends on how much data each thread gets, in that case dynamic may be a little bit faster, because if one thread has finished, then it's better that it executes some other piece of the iterations as well, rather than waiting on the thread that actually might have gotten the heavier iterations. So it depends on what your application is.
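
A small sketch of that static-versus-chunked-static difference, assuming 16 iterations and two threads as in the discussion.

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    /* Default static: thread 0 gets iterations 0-7, thread 1 gets 8-15. */
    #pragma omp parallel for num_threads(2) schedule(static)
    for (int i = 0; i < 16; i++)
        printf("block static  : iter %2d on thread %d\n", i, omp_get_thread_num());

    /* static,2: chunks of two iterations dealt out round-robin:
       thread 0 gets 0-1, 4-5, 8-9, 12-13; thread 1 gets 2-3, 6-7, 10-11, 14-15. */
    #pragma omp parallel for num_threads(2) schedule(static, 2)
    for (int i = 0; i < 16; i++)
        printf("chunked static: iter %2d on thread %d\n", i, omp_get_thread_num());

    return 0;
}
```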

58:36 All right, I think everyone knows the single construct as well. So let's see: with single, I just have two questions on this example. Where do you think the implied barriers are in this scope — is it here, or is it the second one? All right.

59:28 So single, if you remember, comes with an implied barrier after the single section that it executes. So you can expect all the threads to synchronize with each other: no thread will go to the second single statement — well, at least to that line in the source code — before one of the threads has finished performing the first print. Now, with the single statement you can add a nowait clause, and what that's going to do is tell all the other threads that they don't really need to wait for the thread that might be performing the second statement to finish; the other threads can just simply carry on with whatever is the next piece of work.

60:16 So you can expect an output something like this. For the first print, you're guaranteed that no other thread will be performing the second or third print statement until the first print statement is executed, because there's an implied barrier after it. But in the case of the second print statement, since we did specify the nowait clause on the single statement, some of the threads went ahead and printed the third print statement even before the thread that got the second print statement printed it. So there's an implied barrier after the single construct, but you can skip it by adding this nowait clause after it.
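
A sketch of the kind of example being discussed: three prints, with the second single carrying nowait. The exact messages are my own.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel num_threads(4)
    {
        #pragma omp single
        printf("first: all threads wait at the implied barrier after this\n");

        #pragma omp single nowait
        printf("second: the other threads do NOT wait for this one\n");

        /* executed by every thread; may appear before "second" because of nowait */
        printf("third: thread %d moved on\n", omp_get_thread_num());
    }
    return 0;
}
```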

61:09 Okay, this one is a tricky one. All right, so take a minute to look at this code here. The question is: what happens if you specify j as private, and what's going to happen if you specify j as shared — or rather, don't declare it as anything and let it be shared by default, because it's defined outside the parallel region? Good. So would you expect the correct output for this program if you set j private, or would you expect the correct output if you just let it be shared?

61:51 While you're thinking, I'll just quickly comment: we have four iterations in the outer loop, because we have N defined as four, and then in the inner loop we have five iterations, and we're setting two threads here. In total we obviously have four times five, that's 20 iterations, but you can expect 10 of those iterations to go to each of the two threads. So, what might happen if we set j to private or to shared?

62:40 Well, there's no nested parallelism here; we're just parallelizing the outer loop in this case. So two of the four outer iterations will each be distributed to one of the two threads, and then on the inner side, for the inner loop, you would expect both of the threads to iterate over the five iterations for j. But would that happen if you have a shared j? I don't think so, because when you have a "#pragma omp parallel for" in front of a loop — say for this outer loop here — by default that makes the outer loop variable private to each thread. But if you have another loop nested inside that same loop, that privateness does not get inherited by the inner loop index. So by default, j will be shared if you don't declare it private. And what could happen is: sure, both the threads might run the outer loop, but because they are sharing the j variable, one thread might update it to some value and the other one might read some value that it hasn't yet worked on. Say thread one increased j from 1 to 2, and then thread zero comes in and reads the 2, and it starts from 2 to 5 rather than going from 1 to 5. So that means they're sharing that value, correct? So you may not get all the 20 iterations printed for these nested loops of four times five.

64:58 So here's a sample output; you can expect something like this. As you can see, here we got i equal to 0 going from j equal to 1 to j equal to 5, then i equal to 1 went from j 1 to j 5. But again, because j was shared, you apparently ended up running j equal to 2 again with i equal to 1, and that might have happened because the i equal to 0 iteration updated j to 2 at the same time. So that means j is being shared, and you may not get all 20 iterations; if you look at this count here, it won't go all the way to 20 — I think it just ended up executing 18 iterations in total. But if you set j to be private explicitly, then you can expect a correct output, and then you get all the 20 iterations here. So you have to be careful in terms of the loop indices: they're not private for an inner loop if the parallel for is only on the outer loop.
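
A sketch of the kind of code being shown, with the fix spelled out; N, the loop bounds and the counter are assumptions about the demo.

```c
#include <stdio.h>
#include <omp.h>

#define N 4

int main(void) {
    int j;
    int count = 0;

    /* 'i' is made private automatically by the parallel for;
       'j' is NOT — declare it private (or define it inside the loop). */
    #pragma omp parallel for num_threads(2) private(j)
    for (int i = 0; i < N; i++) {
        for (j = 1; j <= 5; j++) {
            #pragma omp atomic
            count++;                       /* should reach 4 * 5 = 20 */
            printf("i=%d j=%d\n", i, j);
        }
    }
    printf("count = %d\n", count);         /* 20 with private(j); fewer if j is shared */
    return 0;
}
```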

66:12 Okay — about that variable: k will be private, because it's defined inside the parallel region. So if you want to check the scope of a variable, just check where it is defined: if it's defined inside the parallel region, then it will be private to all the threads; if it's defined outside the parallel region, then by default it will be shared, unless you explicitly set it to be private, firstprivate, lastprivate, any of those. Okay. So yes, k is private in this case.

66:58 All right, let's see. Yes, I think I have the same example for tasks as the professor just showed on the slide; this is what he is going to cover in the coming slides. But one thing that I wanted to mention is: let's say you had a variable declared as private for the parallel region — say a variable named j declared private for this parallel region. Then all the threads, whether you work on j or update j, whatever it is you do, can do that with their own copy of j. But when the tasks are created, that private variable is upgraded to firstprivate for the task. So if you had a private j and you did not make any updates, that means you may have a garbage value, or whatever value that thread may have set for that variable; that value will be copied into the tasks. So any private variable is upgraded to firstprivate here.
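
A tiny sketch of that point: the value printed inside the task is whatever the creating thread's private j held when the task was created; if it was never assigned, it is indeterminate.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    int j;
    #pragma omp parallel private(j) num_threads(2)
    {
        j = omp_get_thread_num();          /* each thread's own private j */

        #pragma omp task                   /* j is implicitly firstprivate in the task: */
        printf("task sees j = %d\n", j);   /* a copy of the creating thread's value     */
    }
    return 0;
}
```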

68:24 All right, I think that's most of the examples I had to show. Professor, maybe you can continue from here. Okay, so I want to see if there were any questions before I share my screen. No questions on this? Okay. Yeah, this example will be a little bit about exactly how variables are inherited in tasks, which is similar but not identical to the case for parallel regions, as was just mentioned. So, global variables behave the same as in parallel regions: they are shared. And local variables — there it changes, and they become firstprivate, as was just mentioned.

69:46 So here is an example; I think it's a little bit contrived, in the sense that there are two OpenMP parallel regions, and the second one declares B as private. And we can see that A and B and C are defined before one gets to either one of the parallel regions. Of the other variables, D is defined inside the second parallel region, so it's clearly private, per thread, in this parallel region. And then in this region there is a task as well, and E is defined and assigned within the task. So now the question is: what's the scope of the five different variables defined here?

71:06 The scope of A is pretty straightforward: it's only defined outside, and it doesn't appear anywhere else in this code. So within the task it inherits, as I said, the global property, and it's shared. How about B, then, in terms of the task? Is it still firstprivate? Yes, I think I heard that correctly. B is firstprivate, because inside the task it's both private and it also inherits whatever value it has in the enclosing region; that's why it's firstprivate, not just private.

72:14 How about C? Well, C is defined before either of the parallel regions, so it's shared, right — similar to A. How about D, then? Well, what can we say about D in the task? D is defined in the parallel region, so it is private to each thread in the parallel region, and in the task it is therefore firstprivate. And E is simple: it's just private, because it's declared within the task. Okay.

73:30 So now, what are the values? A is pretty straightforward: it's shared, and it was assigned the value of one. So what's B — what's the value of B? Could you repeat? Oh, I'm sorry — yes, it's some junk value, since there's no assignment carried out; well, the audio doesn't work too well today, so thank you. In the code that is visible, B was never assigned in the second region, even though it's firstprivate in the task, because after the second parallel region started, B was never given a value there. Because it's declared private, it's not the B from outside: the B in the private clause is a different B, new local storage for each thread, and it wasn't firstprivate on the parallel statement, so it doesn't inherit the outer value; it's a new memory location and it was never assigned. And the fact that it becomes firstprivate within the task just imports the value it had before entering the task, and since it was never assigned a value, it's basically undefined.

75:11 C is pretty straightforward: three. D is defined too, because even though it's firstprivate, it was actually assigned a value, so we know what it is, right. And E is assigned in the task region, so that's no problem. So, as was pointed out, and as I've tried to stress all the time, it's really tricky; one needs to be very careful to keep track of the status of variables: whether they are shared or private, and whether they are initialized or not. So there is ample opportunity to make mistakes, unfortunately.
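
A simplified, single-region reconstruction of the kind of example being described; the values 1 and 3 come from the discussion, the rest of the structure is an assumption.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    int A = 1, B = 2, C = 3;

    #pragma omp parallel private(B)   /* private(B): new, uninitialized storage per thread;
                                         the outer value 2 is never seen inside            */
    {
        int D = 42;                   /* defined in the region: private per thread */

        #pragma omp task
        {
            int E = 7;                /* declared in the task: private to the task */
            /* A, C: defined before the region and not made private, so shared.
               B, D: firstprivate in the task (copies of the thread's private values);
               B's copy is indeterminate because the thread never assigned it.       */
            printf("A=%d C=%d D=%d E=%d (B is indeterminate)\n", A, C, D, E);
        }
    }
    return 0;
}
```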

76:13 All right, let's see. Yeah, this is just a comment on defaults. So, as I said earlier on, in order to manage, or be clear about, whether variables are private or shared, it's good to always state exactly what you want or intend the variables to be, and not rely on the default status of the variables.

76:46 Okay, then I had something about synchronization. I have a couple of minutes, so I'll just mention, probably, the first ones only. So barrier is nothing other than what we're used to, and taskwait is, I guess, the opposite of the nowait clause in the parallel region statements, which was also shown for the single statement.

77:24 So here is the old example again, now with this taskwait statement. In this case, even though we don't know in which order "car" and "race" will be executed, we know that they will be executed before "is fun to watch" is printed, because of the taskwait statement. So in this case the output should show exactly that: "is fun to watch" is always the last thing to get printed, because of this taskwait statement. Taskgroup is... yes.
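
The same race-car sketch as before, now with the taskwait added; only the position of the last print changes, and it is now guaranteed to come after both tasks.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel num_threads(2)
    {
        #pragma omp single
        {
            printf("a ");

            #pragma omp task
            printf("race ");

            #pragma omp task
            printf("car ");

            #pragma omp taskwait           /* wait here until both tasks have completed */
            printf("is fun to watch\n");   /* now guaranteed to be printed last */
        }
    }
    return 0;
}
```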

78:18 And yes, I talked about this one earlier too, so now maybe we can see — or at least I'll wait and see — if anyone can tell me what gets printed in this case. Okay. So now we have x as a global, shared variable; then we have the single thread that generates the first task, then a taskwait, and then it generates the second task. Both tasks access the shared variable x. Now, what do you expect to get printed? Any takers?

79:30 Okay, so the first task generated in this case increments x; it started out at zero and gets incremented. So when the first print is encountered, it's after task one has incremented x, and the second task doesn't get initiated before that because of the taskwait statement. So task one is going to print x equals one, and then the second task increments x again, so we get that task two actually prints x equals two.

80:15 On the other hand, if we change the code to use the firstprivate declaration for x on the tasks, then what would one expect? Yes, I think I heard "one and one", and that's what's going to happen. So firstprivate means each task gets initialized with x equal to zero, and that's why each print statement, even though there is a taskwait, is going to print x equal to one. Excellent.
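
A sketch of that last example: with the shared global x the two tasks print 1 and then 2; with firstprivate(x) on both tasks, each task would increment its own copy, captured while x was still 0, and both would print 1.

```c
#include <stdio.h>
#include <omp.h>

int x = 0;                                      /* global, so shared by default */

int main(void) {
    #pragma omp parallel num_threads(2)
    {
        #pragma omp single
        {
            #pragma omp task                    /* shared x: increments the global */
            { x++; printf("task 1: x = %d\n", x); }

            #pragma omp taskwait                /* task 2 is not created until task 1 is done */

            #pragma omp task
            { x++; printf("task 2: x = %d\n", x); }
        }
    }
    /* With  #pragma omp task firstprivate(x)  on both tasks, each task increments
       its own copy (initialized from x at creation time) and prints 1.            */
    return 0;
}
```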

81:17 Let's see... right. So the thing is that, in this case, because the first task initializes its x, which is also private to that task, if a task finishes at some other time it's not clear what would get printed without the synchronization. So I think my time is up, and the next thing I would talk about... but that's for next time.

82:11 Okay. Any questions? All right, so I'll stop the recording now.

82:23
