© Distribution of this video is restricted by its owner
00:01 All right. Now it should be — yes. So I wanted to be

00:06 clear about lastprivate; there wasn't a clear explanation in the last lecture.

00:18 It is well defined what lastprivate actually does: what gets exported out

00:27 of the parallel region is the x value that is associated with the

00:37 last iteration of the for loop, regardless of when it happened to be computed.

00:47 It's not the last time x was possibly touched, but the x value from

00:55 the last iteration. So we ran an experiment, yeah, after class last

01:06 time. So, mhm, this may be hard to display, but it's uploaded. Um,

01:22 so basically it shows, on the right there, what the thread IDs are of the

01:34 threads in this particular case, and it's, yeah, two iterations per thread. And, well, one

01:43 can see how the threads executed relative to the order

01:57 that this block assignment of iterations would suggest, in that thread number three would have the

02:09 last, three I guess, iterations in this case. Um, but that thread got done early, therefore its

02:25 x value was not the one touched last. But nevertheless, if one looks at the bottom, you can see that

02:31 what was exported was in fact the value that was created in the iteration whose index

02:40 was max, or n, in this case, I guess. So, the order in

02:50 which the iteration indices were treated does not have an effect; it's well defined what gets exported

03:01 out of the region. So, any questions on that? Yeah. Okay.

03:16 I'll stop sharing my screen and I'll switch to today's lecture. So, uh,

03:28 it's like the last time, yeah, I think so. Was there

03:49 some question I didn't answer? You have it? Uh huh. Okay,

03:58 then we'll continue with OpenMP where we left off. More or less.

04:05 I encourage you to do the example that was — I didn't get to it last

04:09 time. I think the slides are hopefully self-explanatory, and I think it's a good exercise to

04:15 go through it and try to make it work. And if you have questions about it,

04:20 then let us know, one way or another. Oh, this is kind of a recap.

04:29 The grayed-out first bullets are really what was covered last

04:35 time. So we'll move on to some more work-sharing constructs — the constructs that are

04:41 listed here — and I do not think we will get time to do the example.

04:47 Instead I'll leave time for the demo. Okay. So yes, there's

04:58 again this kind of structure of OpenMP: um, throughout, the control of

05:04 creating parallel regions, then the work-sharing constructs. And, as we talked about to

05:12 some extent, this notion of shared and private — you have to be careful about

05:19 race conditions. And then there is synchronization, which we didn't talk much about last

05:26 time. So that's what I want to talk about today, among other things.

05:33 So here, the kind of bluish items are the ones I did not cover last

05:37 time, and I'll cover most of them to some degree today. Except I will not

05:45 cover SIMD, and tasks will be in a future lecture. All right.

05:53 So, additional work-sharing constructs. There are these three that I will talk about

05:59 as work-sharing constructs: the master construct, the single construct, and the sections construct —

06:07 even if, for the 2nd and 3rd, the master and single, the label is

06:14 kind of a misnomer, because there's actually no sharing of work; it's rather the opposite. But they're still in the group of constructs

06:21 under this label of work sharing. So, what the master construct does is basically say that a,

06:32 um, part of the code should be executed by the master thread

06:44 only. So, after, in this case, this pragma omp master inside the parallel

06:48 region — that means that the code within these curly braces will then be

06:57 executed by the master thread only. Sometimes — like this example, another

07:08 one that I did; those of you on this side are familiar with something that's almost as over-

07:17 used as matrix multiply or matrix-vector multiply — it is to do a Jacobi iteration on a

07:26 mesh, and in that case the boundaries are treated differently from the internal points on the mesh.

07:33 And in this code, it was chosen that the master thread is what handles the

07:40 boundaries, and that means that the other threads do not bother working with that

07:54 piece of code that is dedicated to the boundaries. Now, for this particular example —

08:03 as I mentioned, a typical Jacobi iteration on a mesh — for correctness of the iteration,

08:09 it's important that the boundaries get updated before threads, for instance, deal with the internals, and that's

08:17 why there's also the barrier statement after the pragma omp master part

08:27 of the code: because the master construct does not have an implicit barrier. So if

08:37 one wants things to be synchronized, one has to explicitly require it by having a barrier,

08:45 because otherwise other threads will escape past the code that is assigned to the master and move

08:52 on to what follows. So, and part of the thing that I'm pointing out in this figure

09:02 is that it shows that the master thread is sort of always alive through the whole program.

09:13 So whereas other threads are not — they are basically confined to be created and destroyed

09:25 at the end of the region — the master thread continues through. And that's,

09:31 that's important: it's the reason why the master construct doesn't have an, mm, implicit barrier at

09:40 the end of the code section to which it is applied. So that

09:47 is what I said.

09:54 So then there was the single construct — also kind of a misnomer as a work-sharing construct — which basically says that a piece of code, like for the master, should be

10:01 executed by a single thread. And it's different from the master in the sense that any

10:11 thread could take the job of executing the code which is declared to, um, be executed

10:25 by just a single thread. But it can be — it does not have to be —

10:30 it could be the master thread, but it could also be any other thread. Now,

10:34 single, unlike the master construct, does have a barrier at the end of the section

10:42 of the code to which it's applied. So in this case,

10:49 this code will behave exactly the same, logically, as the one for the master after

10:57 putting in the barriers, because the barrier is implicit with the single.

11:07 So that was the master and the single; that's what it says here. So, because of the

11:13 barrier, what it means is: when one thread executes this code segment, the other

11:21 ones potentially don't do anything, depending on where they are and what work is assigned to

11:28 the other threads. Mm. Then there was the sections construct, which is a way of

11:40 dividing up work among the threads. Mhm.

11:52 So in this case, each piece of code that follows an omp section, uh, construct is executed by one thread

12:03 only, but the different sections can be executed in parallel. So in this

12:11 case the x, y and z calculations can run concurrently, on different threads, with each one

12:20 executed by a single thread. And then, as it says, there is the barrier at

12:31 the end of the — from the pragma omp sections construct, which initiates the part of

12:41 the code for which the division of work is explicitly managed through these section pragmas.

12:49 So, what happens if one has more sections than threads? Then a thread takes

13:05 on more than one section. But the way this code is written,

13:10 there is no particular thread that takes on a particular section. So

13:16 it's the runtime that decides which thread will take on which section;

13:16 that's what's going to happen. So, there is a question in the chat.

13:23 Yeah, well, I see — so, um, the difference is: in the single construct

13:42 you cannot have several threads working on different sections of code at the same time.

13:51 The sections construct allows you to: each section behaves like the single,

14:01 um, construct, but unlike the single construct, you can have several threads working on

14:10 different pieces of code at the same time. Did that answer the question?

14:23 Okay, yeah. So I think these are the work-sharing constructs I had planned to

14:31 talk about. There is no nowait here; the barrier is at the very end.

14:39 So, the barrier matches the end of the sections construct. So in this case there

14:50 will be three threads executing, uh, each section, most likely. And the synchronization happens at

15:01 the end, when these three threads have finished their respective code segments. Mm.

15:16 Okay, so, a little bit about synchronization. The barrier I'd rather not talk through —

15:24 I think that's fairly clear — but nowait, critical and atomic may not be known to you as

15:31 much, so I'll go through them. What's your question? Oh, okay.

15:39 All right, so that's just pretty much the obvious thing: the barrier basically ensures

15:45 that everything in the loop assigning things to a is done before anything has work

15:54 done to assign things to b. Uh, that's pretty much the obvious function of

16:02 barriers. So then, a little example that's worth spending a little bit of time on, as

16:09 you'll see. Yeah, maybe I'm going to ask questions, and we'll

16:15 see if I hear you — or if not, then you have to repeat. But,

16:21 so, we can see in this code here x is a global variable, and hence shared

16:28 among all the threads using it — and explicitly so,

16:37 by having the shared attribute on the parallel section.

16:42 So, I mean, all threads that execute the region then have access to x. And then, what happens in that region

16:52 is that thread number zero updates x's value, but none of the other

17:00 threads touch the x variable. So

17:08 then there is a printf. So now the question is: what do the

17:19 different threads print in the printf — what would they print? Someone says it should print

17:35 two, because it's shared. Correct — well, correct in the sense that x is shared. And it depends on

17:48 whether thread zero updated x before the other thread looked at it. So if

18:00 a thread that is other than thread zero is slower in this race, then it

18:09 will print five. If it gets to the printf before thread zero updates x, it

18:14 will print two. So this is a race; it depends. But after

18:19 the barrier, because things are in sync, that

18:25 means at that point x is definitely equal to five. So then all threads will

18:33 print five. Oh, so

18:40 it's just important to keep in mind that threads can execute things in an, in an

18:48 arbitrary order, and there's no guarantee of which one makes the most progress; the pace can differ at any time between different threads.

18:59 And I'll talk more about this flush in a moment; that's what the mechanism is for, when values

19:08 are synchronized between threads. Right? So, critical is the form of mutual exclusion in

19:22 this context: the piece of code to which the critical construct applies can

19:32 only be executed by one thread at a time. Uh huh. But all threads

19:41 will eventually execute that piece of code. So, it's not like a master or single

19:48 construct in the sense that just one thread does it — all will do it,

19:52 but they will do it at different times. Yeah. So that's what it means:

20:00 one at a time. Okay. So, uh, looking at this critical-

20:08 section example, um, I wanted to point

20:18 out how this for loop is parallelized. So in this particular case, um, the

20:40 iterations of the for loop are divided among the different threads, and a given thread

20:48 starts at whatever its thread ID

20:59 is, and then the next loop index it does is something that takes the index

21:04 of the current one and increments it by the number of threads. So basically it's

21:17 kind of, um, picking out iteration counts in a round-robin way. So if you

21:29 have a range of iterations — n iterations in this case — and you have, say,

21:40 10 threads, then thread number zero does iteration number zero, then it does iteration

21:47 number 10, etcetera. And thread number one does one, and the next one it

21:56 does is 11. So it kind of takes the range of iteration indices and,

22:03 going in a round-robin way, assigns loop indices to threads. So this is

22:16 a common way that people sometimes decide to parallelize a loop, explicitly controlling which iteration indices get assigned to a particular thread. Mm.

22:31 Okay, so the next construct is kind of similar to, but still different from, critical: atomic. And the difference is that

22:42 atomic applies to variables, or memory locations, as opposed to code segments. So it

22:54 ensures that a particular variable is only updated by one thread at a

23:06 time. So it prevents any kind of confusion or race conditions, and it sequences updates one at a

23:15 time. So that's sort of the difference: critical is one thread at a time

23:25 for a code segment; atomic — one at a time for a variable update. Any

23:36 question on the difference between critical and atomic? Okay.

23:49 The next one is the if clause, that can be used as is shown in this particular example, where the segment of

23:57 code is parallelized only under certain conditions: if the specified condition is true, then the parallel region is

24:07 generated, and a bunch of threads is generated for treating the code segment. But

24:15 if it's false, the region will not be created, and that means that the section

24:22 of code will just be executed by one thread — well, then it's just sequential. So the

24:30 reason for doing things like this is, as it says here and as I said before:

24:36 generating a parallel region and a bunch of threads has a certain amount of overhead, and if

24:41 the work is small, then one may not really care to try to parallelize

24:50 it. Actually, then it's better to just sequentially go through the loop.

24:55 Mm. So, and sometimes in the code you don't necessarily know — what, in this

25:02 case, the end value n is. So when you have this sort of data-dependent situation, whether

25:08 you parallelize things or not, you can potentially use this if expression to control whether

25:17 a parallel region with multiple threads should be created.

25:27 Okay, so then there is nowait. That means that, normally, there is this

25:37 implicit barrier at the end of the for loop, and, you know, what it means is

25:43 that, um, progress is limited, in that every thread gets to the end of

25:52 the for loop before moving on. Of course nowait can be very useful, but,

26:00 again, it needs to be used carefully, depending on whether, for correctness, one needs things

26:06 to be in sync or not. So I guess here's just an example; you can see

26:15 what goes on here: at the top there is basically the creation of a parallel

26:23 region, with maybe a, b, c being shared and the thread ID being private. For each

26:31 thread, the code places the thread ID in b. That means, you know, a different piece

26:39 of b gets updated by different threads, so there are no race conditions. But then, before

26:47 moving on to using b, one has a barrier, and then there is the for loop with nowait.

26:55 So in this case c gets generated, and again one would like to

27:01 have that all done. But then, for the last for loop, highlighted in red, it's

27:10 okay not to have been synchronized after the previous for loop, but then it gets synced

27:18 at the very end anyway, because that's where the overall parallel region is

27:23 done. So it depends on what the code's logic is, where there is

27:32 synchronization needed, or not. But nowait is, again, a way of

27:39 avoiding, or canceling, the default implicit synchronization. Yeah. Mhm.

27:50 All right. So the next thing I, I want to talk about is

27:58 reduction. Mhm. So, reduction is a very common operation, as you probably do

28:04 know. This is a very simple sequential example: you're summing up the elements of the

28:09 array a, and then computing their average at the end. So, sequentially, that's fine.

28:17 And here's a kind of straightforward parallelization that is shown now on this slide. Then

28:30 we'll see what happens. So, is there any problem with doing it this way?

28:41 So, the average — av — yeah. Yeah. Oh, is

28:55 av being outside the region the issue? All right. Yeah. So, well —

29:17 mhm. So, in a way, the problem stems from, I guess, two things,

29:26 but one being that the av variable that's going to hold the average at the end

29:34 is the global one. So that means, in the parallel region, the different threads

29:44 are then updating av by adding their respective a values. They have, you know,

29:55 each their own a values, because the loop indices are split among the threads, so they all add

30:05 different a values. So that's fine. But the problem is that av is global

30:12 and shared. So if two threads happen to get to the statement at the same

30:19 time, the update will not be correct. So one gets a race condition by

30:30 doing it this way. So there are a few ways — and I think a few are shown

30:37 here — of working around this; here is one. So in this case one avoids the

30:53 problem of having a global shared accumulation variable by having a local accumulator for each

31:02 one of the threads. So there's a local sum variable that is private for each

31:15 thread. So that means that each thread's summation is, so far, just fine; there are

31:24 separate locals for the different threads, and one gets partial sums, correct for each

31:32 one of the threads. But then one needs to add up the partial sums

31:39 that were generated by the different threads. But now, to avoid the race condition

31:49 again, one can use the OpenMP critical. So that means each thread gets its

31:54 turn to add its partial sum to the global. So, so: does this now

32:08 generate a correct code? Any questions? And are there other suggestions for how to do this?

32:25 Mhm. That's about all I just talked about, then.

32:36 I guess I have an example of what happened in this case, and that

32:42 is something I did just to verify it in a simple case, and I can

32:48 show that the local partial sums are here. Um, so the thread number is in the

32:58 right-hand column — the thread ID. And a was basically running from 1 to

33:05 8 — listen, eight elements in the array. So that means thread zero takes

33:15 a of one. So that means its local sum becomes one. Then it also takes the

33:22 second index assigned to that thread. So that means it adds a of one,

33:29 basically the second element of a, which is two. So it adds those two

33:33 up; that's what one gets after that thread takes the second iteration that was assigned to it,

33:40 as in 2 plus 1. So its local sum is three. Um, we can

33:47 just jump to the last thread, thread number three, that has the last two iterations

33:53 of the loop. And that means, I guess, the second-to-last element of a is

34:00 7. And then the second index value it gets on that thread is 8.

34:07 So then it has a local sum of 15. And then, what happens in the critical

34:12 region is that the final value of each of the, of the locals — so that

34:20 is 3, 7, 11 and 15 — gets added up. And it doesn't show it

34:26 here, but we can do it: the total actually is 36. So it's 3

34:34 plus 7, that is 10, plus 11, that is 21, plus 15, or

34:41 36. And then there were eight, uh, elements, so basically 36 divided by eight, which

34:48 is 4.5. So it ended up being correct. So, well, this —

34:59 so, so, yes — what is the, so, one flip side of this?

35:07 And I'll come to that in a moment — um, it is that adding up the partial sums

35:15 from the threads... There's a question about the critical statement: one could also

35:21 use atomic, that I mentioned before, because in this case it's just a single variable,

35:27 the av variable, that is updated. So instead of critical one could have used atomic

35:32 for this particular example. But it doesn't change the fact that adding the partial sums is

35:41 sequential. That's a sequential operation. So, there was a question in the chat.

35:51 Okay. Mhm. Yes. And so nothing — yes, nothing proceeds until

36:15 all the threads are done with the critical. Good question. I should mention —

36:31 so, that was atomic. Uh, so now, what I said I would talk about is the

36:42 reduction clause. So, using the reduction clause, um, both manages the parallelism among the

37:01 threads — the work assignment to threads — as well as synchronization and correctness. And it

37:14 does not force anything to be sequential, in principle at least; it depends on how this reduction

37:25 construct is implemented, in order to make sure it's correct. But, we all —

37:32 no — so, how fast can you... one talks about algorithms in terms of parallel time

37:41 in this case. So how many steps does it take, at the minimum, if

37:53 you have, say, as many threads as you like, to add up eight numbers? Or, nobody bites? Well,

38:09 you guys get the idea from the question. So, in parallel algorithms, basically one

38:29 can show that the least number of steps it takes to, say, add all the

38:36 elements of an array is kind of a sequence of pairwise summations. So

38:45 you have kind of a tree, with the number of elements to be added at the

38:51 leaves, and then you combine them pairwise, and you get — basically the minimum number of

38:59 steps, if you like, is the height of the tree, so it's basically log n. So

39:08 that's the idea behind this: by doing the pairwise additions in

39:19 the correct way, then going up, up the tree from the leaves, you can

39:30 sort of sum, going up the tree, each pair of leaves, and as long as

39:37 there are only sync points when you come to join the left and right branches of

39:44 the tree, the different sides of the tree can operate in parallel. So that's

39:52 how the reduction — or the reason for the inclusion of the reduction clause — is that what was

40:00 the sequential part, that was in the critical or atomic piece of the code, becomes

40:09 this reduction operation. So here is what it does. In this case you

40:16 just have this additional clause for the omp for construct that claims: yeah, this is

40:24 going to be a reduction loop. And in this case, again, the final sum

40:31 should be in the variable av; in this case it's a plus reduction — we

40:36 add up the values. So in this case one doesn't have to declare the local

40:44 variables for each one of the threads, and one doesn't need to worry about adding up

40:51 the local values of the reduction. The implementation takes care of all of that.

41:01 Any questions on that? So, here are the typical reduction operators. Yeah, so

41:15 then they have some initial values: so if you add things — it's clearly add, or

41:20 subtract, say — then the initial value assigned to the summation variable is

41:26 basically initialized to zero; something sets that up. You can also have multiplication

41:32 being the reduction operator, and in this case it starts with one. And then there are similar

41:39 things for the logical operators as to what the initial values are. So this is available in

41:48 OpenMP, and in most parallel languages or toolkits, because it's an important operation in so many

41:56 applications that having it be efficient is good for performance — critical, sometimes. And this

42:08 is, I guess, where I would say something about the flush, mhm, operation. It

42:14 basically says that OpenMP has what's known as this relaxed-consistency shared-memory model, so

42:21 that threads can have a local view of a variable for a while, but when you come to a synchronization

42:26 point, then it makes sure that, um, the view of variable states is consistent among threads. One

42:38 might enforce it by using an explicit flush directive. So here is kind of where

42:46 flushes happen — some of it is implicit, and some you can also make explicit

42:52 to make sure that a thread sees the values it should expect to

42:57 see. But you don't necessarily, um, need it for correctness in every program, and flushing all

43:06 the time is kind of overkill and slows things down. Okay, so, looking —

43:16 we're at a good time here. Um, so let's see. Yes. Mhm. I'll start the

43:24 demo, and with the time left, we will continue. Oh, okay.

43:32 Mhm. Mhm. I'll start sharing my screen then, yep.

43:47 Mhm. Mhm. All right. Oh. Okay, so I'm

44:09 just trying to keep it interactive and to answer, uh, questions. What's

44:24 that? Yes — my session ended for some reason; my, uh, session ended on Stampede or

44:35 something. Okay. Okay. I assume it's a new session, because I can see your screen.

44:41 Yeah, apparently that, as I said, the session got terminated. Okay.

44:52 Yeah, so I thought that we'd be logged in on the tools; hopefully it

45:02 won't take a long time to get access again. You said this year's Stampede is fast?

45:09 Yes. And you can see it gives me access very quickly.

45:17 No. Okay. Yeah. Right. Mhm. Okay.

45:29 So, to start off: so, by now we all know how an OpenMP

45:36 program looks like. So you have the omp.h header

45:41 file, uh, included in your program, and once you have that in your source,

45:45 you can use all the OpenMP constructs that we've been seeing throughout the, throughout the

45:51 lectures. Um, before getting into what the program does: if you want to compile

45:58 code, um, for, with OpenMP, with the gcc compiler, you don't have to

46:03 do anything special; you just need to add an extra flag to your compilation command, and then it's

46:12 just like, uh, any, uh, compilation of any other program. If you are using the

46:19 intel compiler, as in icc, then you just need to switch this flag

46:24 from -fopenmp to -qopenmp. And that's the, that's the only

46:28 difference between using gcc and the intel compiler. And once you do that, it should produce

46:35 the, uh, executable as you were expecting. All right.

46:40 So, first question: in this program, how many threads will be reported by, uh, this omp_get_num_threads call

46:47 in the serial region? Mm. How many threads will it report?

46:59 Yes. So, how many in, in the serial region? How many

47:05 will it report? It's a serial region. Yes. So, yes. In the serial

47:14 region it will report, uh, one thread. How many threads will it

47:18 report if, if I do not set the number of threads explicitly, in the parallel

47:25 region? How many threads will be reported if I do not explicitly set the number

47:33 of threads? Anyone? It's that simple. Okay,

47:42 yeah. All right. So, it depends on the environment variable called

47:50 OMP_NUM_THREADS, and its value depends on how the, uh, OpenMP runtime has

47:57 been set up. So, on Stampede, if you check the value of OMP_NUM_THREADS,

48:02 it is by default one, I believe — or is it 28? That corresponds

48:06 to the number of cores on each of the processors, or, I think, on

48:11 the whole node — you know what I mean by that? It will — even if you

48:21 ask for a smaller number of processors on a, on a node, it will still

48:25 be that default value. Oh, or how was I running this, Martin? That

48:35 is, it is — you can, you can request more threads than even the cores,

48:40 as well. Yes, in that case multiple threads will run on the same core.

48:50 Yes, funny part of it. Right: by default, it's that

48:55 this environment variable is set to one. Mm. Um, what happens if I —

49:03 Yeah, so, in the slide deck — I'll comment on it later, but there is

49:14 a slide that details exactly how the runtime figures out how many threads to create.

49:20 Um, so you can also look in the slide deck, a few slides ahead of

49:25 where I stopped, under the runtime section, and there you will find the OpenMP

49:33 spec rules for how threads are assigned. Yep. All right, so the

49:46 question: how many threads will you get if I explicitly ask — um, so there was a question

49:54 about that. Well, um, there was something in the chat about this: can you explain what you

50:03 meant? Yes, I — either, I didn't — I'm not sure it

50:11 matters. Yeah, maybe I heard it wrong: uh, you said that if the number of,

50:16 um, requested threads is more than the number of cores, then — but maybe

50:23 I've heard it just wrong. Oh, yeah, so, um, yes: that slide

50:31 details why, how it decides, depending on a number of conditions, what number of

50:39 threads actually is given to the request of "give me x number of threads" —

50:48 but it should not assign more threads than you asked for. Right.

51:00 When you, when you set, uh, any number of threads using this, oh,

51:05 a certain, um, omp_set_num_threads call — that's a request to the operating system.

51:10 It's not a guarantee that it will give you the same number of threads. On your,

51:15 your, on your private laptops or any dedicated machine you may get the same number,

51:19 but on shared machines it's not always a guarantee. So it's always a good

51:24 idea to check, once you are in the parallel region, how many threads you actually

51:29 got for your execution. And it's a good, a good idea to maybe write

51:35 codes that do not depend on the number of threads you got, so they will

51:39 still be able to give you the correct answer. Okay. When the environment variable

51:47 is being set — right, yes, that's, that's what I was going to say.

51:52 So the environment variable: it sets what's called, in the OpenMP run

51:58 time, an internal control variable, an ICV. So the environment variable

52:04 sets its value. But when you call omp_set_num_threads,

52:09 that function call actually updates the value of that internal control variable. So whatever you

52:15 ask in omp_set_num_threads takes priority over, uh,

52:20 the environment variable. Yes. Okay. All right.

52:32 Yes — you're right about it? Yes. Right. You're, you're not telling, you're requesting

52:39 threads? Yes. Just make sure you remember that. Okay. And, yes, the

52:44 4th one is, I think, pretty clear; I think we already discussed that:

52:48 if you are asking for eight threads, you may get less. The OS

52:52 is not always going to give you the number of threads that you ask for.

52:58 I think if I just run it now — it is because I did not set the

53:02 number of threads right now. So it's giving me one thread outside the parallel

53:06 region, and one thread inside the parallel region as well. If I just quickly uncomment

53:15 this part here and — I'm sorry — if I recompile, mhm, then now you have —

53:25 your operating system gave you eight threads, and every, every thread executed this same piece

53:32 of code. So the code gets replicated across the threads. So,

53:39 on the clusters — so, the program — your question: you don't get — you

53:56 know, it depends how the operating system schedules it. Right, correct, correct.

54:06 The other person's program, and the, the one that is using most of the resources —

54:12 yeah. Right. Yeah, that's right, it depends how the operating system sets the

54:19 priority of the different processes that it gets asked for. So, yeah, it may not give

54:27 you the same number of threads that you ask for, even, it seems, if it has

54:31 enough resources to do that. All right. So, assuming that you

54:42 know how many — whichever number of threads you ask for, you get that

54:46 from the OS. So how many total threads will be in this program?

54:52 Mhm, mm hmm. Four. Okay, right. And how many iterations

55:02 per thread will be assigned? Let me kill that, in fact — let's see, this is our little

55:16 — huh, this part, which is supposed to be — like, no. So if

55:25 the OS always gave you the four threads, it will not, it will not take

55:28 away any threads from you afterwards. Yes, yes, for example. So

55:34 each, uh, thread will get four iterations each. Correct, because this is static scheduling by

55:41 default; that's the static scheduling. So it will try to evenly distribute all the

55:45 loop iterations over all the, all the threads. So, yeah, if I

55:50 simply run this, I get an output like this. Again, uh, notice here

55:54 that there is no implicit order in the execution of the iterations or of any thread, so they

55:59 can interleave. But that's one thing; one thing is for sure, that each thread got four

56:04 iterations. Yeah. Okay, okay. Mhm. All right. And

56:16 again, let's assume that the OS gave you the number of threads that you requested.

56:21 How many, um, how many threads will be generated in this, uh, in this case?

56:31 Yes, there is a class going on, uh, up till 5:30, yes.

56:42 Mhm. Uh huh. So, how many threads in this program? 48?

56:52 Okay, and how many iterations per thread? Mm. Yeah, yeah, yeah. It says

57:10 zero or one in the chat; that's, that's the correct answer. So, if

57:16 you ask for more threads than you have the work available for, yeah, the runtime,

57:22 OpenMP, will again try to distribute the work evenly. And the threads that

57:26 do not get any work, they'll just sit idle, but they will be created —

57:30 that's for sure — if, if the OS gave you that many threads.

57:35 Uh, so again, if you run it here, each thread got one iteration each, and

57:41 the other threads pretty much did, didn't do anything, you know?

57:49 Right — so, far more expensive, yes. Yes: if you, if you know

57:56 how, how much, how many threads you need for your work — if you are aware

58:00 of the number of loop iterations and so on — correct. Yes, you may otherwise be

58:07 hogging unnecessary resources that your program may not end up using. Yeah. And, all

58:15 right. Uh, take a few seconds to look at this program. And the

58:21 question here is: can you expect a correct output from this, uh, parallelized loop section?

58:29 So, what this code basically is doing: it has two arrays, A and B.

58:34 You initialize the A array somehow. Then we have four threads, and we

58:42 have, uh, eight elements in the code — eight elements for the arrays — and we parallelize

58:49 this loop with four threads here, and it's basically just: element of B equals element

58:58 of A plus element of B for i minus one — B[i] equals A[i] plus

59:07 B[i-1]. Can you expect a correct output from this if you parallelize

59:14 this for loop? Why? B is — okay, I need help talking here.

59:33 Mhm. Yes. So, what you're seeing here is called what we all know as a

59:38 data dependency, because the, a thread that may be accessing, let's say, some element of

59:45 B, um, may ask for an element of B that is being worked on by some

59:52 other thread in the execution. So there's a data dependency across the, across the loop

59:57 iterations, mainly. All right. One way of saying it is: across

60:01 threads as well. Yeah. So, in that case, because B is shared, uh, different threads

60:07 may be updating the value at the same time, and if you run this program there

60:12 is no guarantee that you will get the right output. Uh, in this case, one way

60:19 or the other, it may have been, uh, correct in this case — all the iterations ran

60:25 in a sequence, so we got lucky — but in some, some execution of the program

60:31 the operations may be jumbled between threads; as you know, there's no implicit order, so you

60:34 may get an incorrect output here. So you need to be careful about whether there are

60:39 any data dependencies, or any other kind of dependencies, in your program that you're trying to

60:43 parallelize. There was also a correct answer given in the chat, in case you didn't see

60:52 it. Oh, mm — correct. Jordan's answer is right there, yeah.

61:01 Mhm. All right. Uh, this is a bit of a tricky one. So we

61:09 have set eight threads at the top here. Then we have a parallel region that

61:17 spawns two threads and sets the outer variable, which is an integer, as private for that

61:24 section. Then, for this outer parallel region, we get the thread ID using

61:30 omp_get_thread_num, the thread ID, and we print that. And then, inside that

61:36 parallel region, we have another parallel region that spawns two threads — and I don't think we've

61:43 discussed this clause yet, but if you add a num_threads clause in front of an

61:49 omp parallel for, then for that particular parallel section, even if you, let's say, asked

61:54 for eight threads and you need only two threads, you can specify that you want

61:59 only two threads for this particular parallel region. But, yes. So the important thing

62:04 here is — we have it implicit in what I said — that the number of threads set in

62:12 a parallel construct has the highest priority; it kind of over- — it is the setting

62:19 that overrides the environment variable setting. Sorry. One. Yeah. Okay.

62:28 So we have nested parallel regions in this case. The

62:34 first question is, how many threads, let's say, are in the outer parallel region?

62:44 Then, how many total threads do we have in the inner parallel region? Uh

62:54 Yeah. So it's in the chat. We have one answer, if you can see

63:00 it. Okay, four. Okay, that's correct. Yes. All right,

63:07 now this is the tricky one: uh based on question one and question two, how

63:13 many printouts do you think will be in the output? So we have two

63:17 threads in the outer region and four in the inner region. Eight. Yes.

63:31 Okay. One more answer also in the chat. Yes. Well, I'm not

63:42 sure if it's correct or not, so let's just run it. Uh All

63:48 right. So let's run it. So what did we get

63:53 here? We got one printout per outer thread, so we got two printouts

63:57 from the outer region. We got, right, uh one inner printout for outer thread

64:07 zero and one printout for outer thread one. So we got only two printouts

64:12 from the inner regions. Thank you. The reason for that is there is another environment

64:20 variable in uh OpenMP, which is called uh OMP_NESTED.

64:30 Yeah. And right now it doesn't have any value. So by default it

64:34 is set to false. If you want to have nested regions, then you need to

64:41 set this particular variable to true. And once you set that, now, when

64:48 you run it, you get two printouts for the two outer threads, and

64:56 each of them had two inner threads. So we get basically four printouts for the

65:02 inner region. Does that make sense? All right. Now the final question is,

65:10 in total, how many threads do you have in this program? Six?

65:21 No, no. There are four threads in this program. The reason is that

65:29 OpenMP uses something called, uh, thread pooling.

65:35 So what it does is, when you asked for two threads here, it did spawn

65:40 two threads; that's expected behavior. But when both threads went into this inner region,

65:48 they each asked for two threads. So that means in total now we needed four

65:52 threads. Yes. For, okay, for the outer region we needed two

66:05 threads, right? Now, each of these two threads for the inner region needed two threads

66:11 each. Right, so four: you needed four threads for the inner

66:15 regions. Right? But if you think about it in a simple way, you needed

66:19 two threads for the outer region and four threads for the inner regions. So

66:23 in total you should expect six threads. Right. But what happens is,

66:28 OpenMP uses thread pooling. So it reuses the two threads that were running

66:33 in the outer region. So it does not spawn two extra threads. It uses the ones

66:38 it spawned previously for the outer region as well. So in total, rather

66:42 than having six threads, you end up with only four threads. So you

66:49 save yourself, maybe not performance exactly, but at least you spare the

66:57 overhead of uh spawning two more threads. Yeah, yeah, yeah. There was

67:05 also a question in the chat: uh if nesting in OpenMP is set

67:13 to false, do the outer threads not enter the nested loops? No, they

67:18 do enter the nested region, but they do not spawn extra threads. They still

67:25 run with a single thread. So the inner parallel region is basically ignored as such,

67:33 and considered just as a sequential section of the outer parallel region. Yeah.

67:43 Mhm. Alright. So yes, I mention this because in your assignment you are

67:48 asked to parallelize uh nested loops, at least the outermost loops. So for that you

67:54 will need to set this particular variable to true. Otherwise you may

67:59 not get uh the nested regions to be working. Mhm. All

68:08 right. That was it for that. All right. That was just the basic um

68:20 examples with the parallel regions. Now a few examples with the critical region. Um

68:27 All right. So I think this time you guys can tell me: can you

68:32 expect the correct result from this program? Uh, one, if you have a critical

68:39 section construct, or if you do not have the critical construct, in either of the cases,

68:45 do you expect a correct result here? What we're doing is we're just trying

68:48 to find, uh, find the maximum value from the array A. So what do you think:

68:56 if I remove the critical section, do you think we'll get the correct

69:00 output? Mhm. Mhm. You might get the wrong one, you know?

69:19 Well, max is uh shared. So it is a shared variable.

69:29 Yes. So here the problem is that the max variable is, as I said,

69:36 shared. And, excuse me, it has a race condition on it.

69:43 So if I remove critical, then every thread will be trying to update it with whatever

69:48 maximum value that it gets. So in that case you may not get the correct output.

69:53 When you add critical, then each, uh, only one thread can enter this critical

69:58 section at a given time. So in that case you're guaranteed to set the correct

70:03 value for each thread for the max variable. So a sample output may look something

70:10 like this. So, if you have, let's say, four elements in the array,

70:16 you may get 12 without the critical. Okay, well, 12 is not the max value out

70:22 of those four. But if you run it with the critical, you're guaranteed to get

70:28 the correct output. Uh The sleep here is uh something I just added

70:39 to make sure that there is some delay between the thread executions. Yeah,

70:44 even if you don't have that, let's say, you can, yeah, let's say

70:49 you have something, a large piece of computation other than the sleep function; it's just

70:54 to simulate something like that. Mhm. All right. Where was I? Okay.

71:10 All right. I think everyone here knows how the uh scope of the variables

71:17 works. So here are the questions. What would happen if we use

71:23 shared, private, or firstprivate for the variable i in this program? And in this program

71:32 every thread is just trying to add 1000 plus the thread ID

71:38 to the i variable. So what would you expect as the output for

71:45 this, um, if you have shared, private, or firstprivate? Um Yeah, in

71:57 the chat. Okay. And that one is right. Mhm. Yeah. Uh

72:14 Okay, take your time. Think about it. And Dr. Johnson, let

72:27 me know if I'm going over time, if you want to. No, no, no, I

72:30 think it's good. So I think you can just go through what you planned.

72:37 I'll let you know when time is up. You have about five more minutes.

72:41 Okay, sure. Mm. All right. So for now let's just show what the sample

72:52 output may look like. You guys want to guess maybe? Yes. Yeah, that's

73:02 right. So what's going to happen with shared? If we set i as shared, then

73:09 every thread will obviously get, will be working with the shared variable, and whichever thread

73:18 made the last update to the shared variable, you will see the output reflected for

73:26 that particular thread, because it is a shared variable. Everyone is trying to update it, but in

73:33 the end, whoever ran last will be the one setting the final value of i.

73:40 So here thread six ran last and it updated the value: 1000 plus

73:48 six. So that's 1006 that you get here in the end. Yeah, if

73:54 we set it to private, then we initialize i with ten, but since uh we set it

74:02 to private here, then none of the threads gets this 10 value here. They

74:08 get either some garbage value, or in this case the OpenMP runtime decided to

74:12 initialize it to zero. So that's why everyone gets a zero value in this case.

74:17 And in the end, because it is a private variable, you don't get the

74:22 updates outside the parallel region, because private does not carry the outputs outside the parallel

74:27 region. So you still get 10 outside the region. Okay, in the case of

74:32 firstprivate, it's pretty much the same. But now every thread sees the initial value

74:39 of variable i, because with firstprivate it gets carried inside the parallel region.

74:45 So rather than having a zero, threads get a 10 inside the parallel region,

74:49 but still, outside the parallel region, you get a ten, because you don't carry out the

74:53 final value outside the region. So, just a simple example to show the scoping

75:01 of the variables. It depends on how you scope them.

75:15 Mhm, there is a class going on. Yeah. Alright. I think I'll stop

75:23 with this example. Sure. No worries. All right. So, um,

75:37 all right. So with this example we are using the uh the schedule clause.

75:45 And the question here is, since we were on these nested loops: uh, I

75:53 don't know if we covered it, but do they always also have the right thread IDs,

76:01 local to the regions? Yes, that was part of it; I forgot to

76:07 mention it during that example. Yes. Yes. Coming back to that example,

76:13 yes. Inside the nested region, the thread IDs again start

76:20 from zero, and they are private to the nested region. So you would not

76:26 see, in the inner region, IDs go from zero to three for each region. For outer thread

76:32 zero, you get an inner zero and an inner one, and the same for outer one: you get

76:37 zero and then one. Yes, you can do that. Mhm.

76:50 Yes. Uh huh. Yes, and uh I think, yes, the schedule

77:05 clause, I think, follows the hierarchy of the nested region. So yes, that would

77:09 be carried out into the inner region. Yes. Yeah. Right.

77:19 I'm sorry. I forgot to mention that when I was talking about it. That is

77:22 an important point too. Then one has to keep track: when you spawn your

77:30 threads and get the thread ID, uh, that ID is local. So it

77:37 depends also on which outer thread it is related to. One has

77:45 to keep track of the thread numbering when things are nested. Right? If your

77:53 code depends on that for correctness. Yes. So uh, so, I think

78:02 our time is pretty much up for us to, okay, finish today, so we'll continue

78:09 with your demos next time. Sure, yeah, uh huh, and I will resume

78:16 where I stopped in the slides next time. Yep. Um, so one thing uh I

78:24 wanted to ask, since I will be away also next week: whether this format

78:33 is fine, or you would rather have online lectures. It's just for my, it's my

78:44 own preference for this format. Okay, so that is, I guess, partially my hope

78:50 and expectation; even though it's not ideal, I think it's better than just online,

78:55 personally, so I appreciate that. Okay, so it can take just some more minutes

79:03 before people are coming in, but I think we have at least a couple of

79:08 minutes if anyone has any questions about the slides I didn't get to.

79:22 There was also a bit of explanation on the things that we demoed today, so we

79:31 have a backup, both in terms of the video and in terms of the slides

79:35 that are on Blackboard for today. If not, I will just stop recording at this

79:55 point.
