© Distribution of this video is restricted by its owner
00:04 So, um, we'll continue with OpenMP that I started last

00:13 lecture. So now, discussing or talking about the various constructs, um, the basic ones

00:23 as well as what's known as clauses, which are a way of telling the compiler

00:31 a bit about what should be done in response to the constructs, and

00:39 if there's time, which I probably won't have, I'll get to go through an example

00:44 to dissect. I will stop a little bit early to give

00:50 a demo of basic OpenMP and how to use it, and

00:57 we'll talk a little bit more about it next time.

01:03 First, just a quick recap of what OpenMP is all about. It's a

01:07 shared memory programming model, so practically it's for single-node multithreaded or multi-core

01:18 programming, and that's what it's aimed at. For this class, just think of it

01:24 as something for programming individual nodes, and it works through compiler directives.

01:31 Then it has the support of a runtime, and one can define the execution environment

01:38 through environment variables, and it is a fork-join type model. Uh,

01:44 conceptually it is similar to the master-slave model that was discussed last lecture, but

01:51 different in the sense that for OpenMP it's just one address space, not

01:58 separate address spaces, and by having one address space, it can be

02:05 done at the thread level. So it is a lot less heavyweight, so to

02:11 speak, than typical master-slave processes spawned on potentially different nodes or

02:23 cores. The very basic one is the main parallel construct, which we'll talk

02:30 about today, and the other point brought up last time is that it is

02:36 shared memory. So, uh, it requires a fair amount of attention to what

02:43 one wants shared or not between the threads in parallel regions. And we're talking about

02:49 that today. And this is a bit of the structure brought up before:

02:54 there are the control structures and work-sharing constructs. There are things to

03:02 manage data sharing, and, of course, since there is parallelism, we need ways

03:07 of synchronizing threads. And then there are runtime functions. We'll cover a little bit of

03:14 most of these aspects today. And a little bit of history of

03:22 OpenMP: it started out very simple, uh, about

03:27 20 years ago now. It was just a way of trying to make programming parallel,

03:34 uh, shared memory systems easier to use by not having to deal with

03:39 threads directly, but creating an abstraction on top of them. And it was a

03:44 fairly simple standard, as it shows on the graph to the right, that was

03:49 no more than about 15 pages, and now it's more than 500 pages

03:54 just for the specification of the standard. In some ways it's lucky that there are

04:00 not that many constructs one needs to know in order to get a decent program to

04:04 work. So it says here there are 21 functions or constructs one needs to be

04:11 aware of, and we'll cover most of them today and the rest of them next

04:16 lecture. So here's something that I showed at the end of the last lecture,

04:22 which, uh, showed the typical structure of the code. There is the

04:30 include statement for the OpenMP, uh, libraries. And then there is the

04:38 very basic command or directive, I should say: the pragma omp parallel. That basically says

04:47 that the next code block, in this case enclosed within the curly braces,

04:54 is something that the user wants parallelized. So in this case,

05:02 it doesn't specify a particular number of threads to use for this parallel region.

05:08 And that means the operating system has freedom in deciding how many threads it

05:13 wants to assign to this parallel region. Uh, then it shows one of the,

05:23 yeah, runtime commands that we'll talk more about towards the end of today's lecture:

05:30 the command that gets the identity of each thread, since they all have a

05:38 unique identity, which is many times useful, so you can make things conditional depending on

05:45 which thread is doing what. So there is omp_get_thread_num,

05:51 which gets the thread id, as opposed to the number of threads;

05:55 there's a command for that too that we'll talk about. And then there's the simple,

06:00 you know, Hello World program that everybody uses as the canonical example for, um,

06:07 an introduction to programming in anything. So in this case, you can see what the

06:12 output happened to be, and what it sort of illustrates is that the different threads

06:21 are not ordered in any way. So whoever gets to anything first gets the

06:27 chance, basically, in this case to print out whatever the statement says it

06:31 should do. So in this case, uh, the thread that had the identity

06:40 one was the one that first reached the print-hello statement. But before it

06:46 managed to print the "world" part of "hello world", thread number zero actually ended up

06:52 printing out "Hello". And so things get sort of jumbled, and it depends on

07:00 how the different threads progress through the code they're executing. So there is no

07:05 implied synchronization or ordering between threads unless one forces it. And the other part is

07:14 that each thread basically executes a copy of the code in the parallel region;

07:23 they all have, as was also said last lecture, they all have their own program

07:28 counters, stacks, registers, etcetera. So they are each executing

07:37 the code that has been assigned, and if there is workload sharing,

07:45 that happens through the mechanisms provided. In principle each thread gets a copy of the

07:52 code to execute and, um, in several slides to come we'll
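The behavior described above, with each thread executing its own copy of the region and printing in an unpredictable order, can be sketched roughly as below. This is a hedged reconstruction, not the lecture's exact code; the function name is illustrative, and the `#ifdef _OPENMP` stubs are only there so the sketch also compiles when OpenMP is not enabled.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Stubs so the sketch also builds without OpenMP support enabled. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Each thread runs its own copy of the block; the prints from different
   threads may interleave, exactly as in the jumbled output discussed. */
int hello_region(void)
{
    int nthreads = 1;
    #pragma omp parallel
    {
        int id = omp_get_thread_num();   /* unique id per thread */
        printf("Hello (%d) ", id);
        printf("world (%d)\n", id);      /* may mix with other threads' output */
        if (id == 0)
            nthreads = omp_get_num_threads();
    }
    return nthreads;                     /* however many threads the runtime chose */
}
```

With the pragma compiled in, the number of threads and the ordering of the lines are up to the runtime; only the set of printed lines is fixed.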

08:01 talk about what is shared or not when it comes to,

08:07 uh, what happens in the parallel region. Mhm. So here is, again,

08:15 an overview of the types of constructs: there are the directives and constructs, that is, the control

08:21 flow constructs, and then there are attribute clauses that are kind of modifiers to the constructs,

08:29 instructing the compiler, and through what gets compiled into the runtime,

08:38 what one wants to happen within the parallel regions in response to the construct.

08:45 I will cover most of the constructs in these three columns on this slide as

08:52 we go, so I will not dwell on them here, but as we go

08:56 through them. So first, the work-sharing constructs, and there is the

09:03 one that is very commonly used, which is how to, um, parallelize loops.

09:11 And in this case, one can use the OpenMP parallel construct to

09:18 directly tell the compiler to try to parallelize, in this case, the for loop. And there are

09:23 a couple of ways of doing it. One can use omp parallel for

09:27 to specifically direct the compiler that the following loop is the thing that should be

09:36 parallelized. So that means each thread in this case gets a copy of the, um,

09:42 loop statement, and on the next few slides I'll have a little bit more

09:47 on this. The loop work is divided among the threads, but after the parallel region it's

09:55 back in sequential code. So the figure on the right basically shows that:

10:00 a serial part first, then follows the loop, which then engages

10:09 a number of threads. Again, this does not specify the number of threads,

10:15 so the degree of parallelism is something determined by the operating system.

10:23 We'll talk about how that happens; hopefully we'll get to it later today. And then

10:30 there's a sequential part, and then a new kind of loop doing something in parallel

10:34 again, with a serial part that follows. So there's a concrete example that tries to show a

10:41 little bit of all these things. So first I only have the sequential version, which

10:46 is a simple loop with a fixed number of iterations. And now, to create an OpenMP

10:56 version of this one, one adds the includes for the libraries necessary for OpenMP,

11:03 and then the pragma that says the for loop now should be parallelized.

11:11 And that means the compiler then generates code for a number of threads, as was

11:16 illustrated in the last lecture. So the programmer doesn't really have to worry
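A minimal sketch of the kind of loop being described: one pragma parallelizes a fixed-count loop, and the compiler and runtime handle thread creation. The function name is illustrative, and the reduction clause (a construct not covered at this point in the lecture) is added so that the shared-update hazard discussed later does not arise.

```c
/* Sum 0 + 1 + ... + (n-1) with the loop iterations divided among threads. */
long sum_first_n(long n)
{
    long sum = 0;
    /* reduction(+:sum) gives each thread a private partial sum and
       combines them at the end, so the result matches the serial loop. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += i;
    return sum;
}
```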

11:23 about either creating threads or synchronizing them. So another point, I guess, to talk about

11:34 later, is the so-called SPMD programming model, of which this is a good

11:42 example. So SPMD means single program on multiple data, and that's the

11:48 most common way of parallel programming, whether for shared memory systems or distributed memory systems.

11:54 It doesn't mean that, um, every thread follows the same execution path through the

12:00 program, because there are conditionals that can cause the execution path to depend on the

12:08 data that is being worked on. So even though the program is a single copy

12:14 of the same program for all threads, it doesn't necessarily mean that exactly the same

12:19 instructions are being executed by each thread. Oh, a question? Yes.

12:25 Is this, of course, another name for single instruction, as opposed to single program?

12:29 Are those two terms interchangeable? No. So there is SIMD, that is, single

12:42 instruction multiple data. So that means, as the generic case: when you add two arrays, it is in principle a

12:50 single add instruction that works on all the pairs of elements, say, of the two arrays

12:58 being added. So it's much more restricted than single program. So in the single program

13:05 model, as I said, the code may look the same or is identical, but

13:10 it enables conditionals and is much more flexible than SIMD. We'll talk about vectorization

13:18 on top of, um, compilation in some later lecture. There are, you know,

13:24 conditionals also in SIMD, but it used to be that, um, branches that

13:31 should not be taken are still being executed, and then you just ignore the results.

13:37 That's not the case when you have a single program; it just executes

13:41 the one path. So coming back to the particular example, then, so here is

13:51 what happens. So in this case, the loop has 1000 iterations, and by

13:59 default, unless you ask for anything else, it gets split up evenly among the

14:04 threads. So in this case, it shows that the first thread, yeah,

14:08 does the first 250 iterations of this loop, and the next thread does the next

14:14 250, the next one the next 250, and the last one the remaining 250. So it does,

14:21 unless otherwise told anything, uh, just split up the loop index range evenly among

14:29 the threads; several times on this slide it's four threads. So that was what I wanted
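The even split just described can be written out as index arithmetic. This is only an illustration of the block partitioning behavior; the OpenMP standard leaves the default schedule to the implementation, so the exact ranges are not guaranteed.

```c
/* With 1000 iterations and 4 threads, a block split gives each thread a
   contiguous chunk of 250: [0,250), [250,500), [500,750), [750,1000). */
typedef struct { long begin, end; } range_t;   /* half-open range [begin, end) */

range_t static_chunk(long n_iters, int n_threads, int tid)
{
    long base = n_iters / n_threads;
    long rem  = n_iters % n_threads;   /* first `rem` threads get one extra */
    range_t r;
    r.begin = tid * base + (tid < rem ? tid : rem);
    r.end   = r.begin + base + (tid < rem ? 1 : 0);
    return r;
}
```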

14:37 to say. And this slide basically shows that one can do the loop parallelization

14:44 by doing a kind of, um, general OpenMP parallel instruction first to create

14:51 the region and then parallelize the for loop inside the parallel region. Or, like on

14:59 the right hand side, combine the two, parallel and for, into "parallel for",

15:04 if it's just a single loop body that one wants parallelized. So let's see, are there
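The two equivalent forms just mentioned can be sketched as follows (illustrative names, not the slide's code; the array here is scaled once by each form, so twice in total):

```c
/* Form 1: open a parallel region, then work-share the loop inside it.
   Form 2: the combined "parallel for", equivalent when the region holds
   nothing but that single loop. */
void scale_both_ways(double *a, int n, double s)
{
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
            a[i] *= s;
    }

    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] *= s;
}
```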

15:14 questions on that before I talk a little bit on this data scoping business?

15:23 Uh... "So whenever you say it makes a copy, since that's,

15:28 um, a preprocessor directive, are we saying that it makes literal copies of the

15:33 code, right?" That's a good question. I've actually been trying

15:38 to find out the real answer to that and have failed. So I am not sure

15:44 whether the standard forces the copying of the code, but, uh, it seems

15:49 a bit unnecessary, since it is a shared memory system, so every thread in

15:55 principle has, or should have, access to the code, and then it has its own instruction pointer,

16:02 so it knows what instruction it actually executes. So it's not clear to me that

16:09 it's necessary to copy the code. I'm sorry, I was trying but so far

16:14 failed in finding a concrete answer whether it's an implementation option for the compiler

16:20 writers or it is required that the code be copied. So sorry, I don't have

16:30 a better answer at this point. Any other question? It doesn't matter for the correctness

16:43 of the execution of the program, but it obviously has an impact on the memory footprint,

16:52 and it could have a minor impact on execution if several threads try to access the

17:00 same instruction at the same time, in which case they have to be serialized,

17:10 or not, depending upon how the memory system works. Uh huh.

17:18 So, yes, about the scopes. So the default rule is that anything declared in the

17:28 sequential region is available to any thread in the parallel region. Conversely, anything declared

17:43 inside the parallel region is private to the threads in the region, so things do not

17:52 kind of get exported out of parallel regions into sequential regions. But

17:58 parallel regions inherit things from the, hmm, sequential region.

18:08 Mhm. And then there's also the rule that for function calls or subroutines inside the parallel region, things are private,

18:16 um, whatever goes on in the function or subroutine. So one needs to be

18:22 very careful in managing what is shared and what is not, to make sure one gets

18:30 both correctness as well as, uh, the more performance-related behaviors of the code.

18:38 To me, this is the most tricky part in terms of

18:44 OpenMP: to get it right and have it perform well. "You say shared:

18:51 does that mean it's not that each has its own, just one copy amongst all of

18:55 them?" Yes, shared in this case means there's just one storage location for the variable that

19:01 anyone can get to, and it's shared; and private means there is a separate memory location

19:07 for each thread for the variable. So this is what I said. So here are

19:16 the attribute clauses, then, that tell how the variables should be treated in the parallel

19:26 region, so I'll talk about them on the following slides. So shared is the default:

19:31 that's something where there's one storage location that just gets accessed by the different threads.

19:40 It's, ah, the storage associated with the sequential parts. Private means there's a new,

19:48 um, memory location being allocated for the variable, even if it has

19:54 the same name as in the sequential part. I'll show examples of how some of these

19:59 things work. And then there are the other two following here: firstprivate and last-

20:06 private. And I'm sorry... what happened there?

20:27 So you guys have my shared screen, right? So I just want to make

20:29 sure that I haven't lost that stuff. Okay. So where were we:

20:40 so the firstprivate and, um, lastprivate. It changes

20:49 a little bit how things are managed versus the private clause. In effect, first-

20:57 private essentially has to do with how variables are initialized, and last-

21:03 private variables are kind of exported out of parallel regions. And then the default,

21:12 um, rule for what's shared and what's private can be overridden

21:17 by having the default statement that then defines what the default should be, as opposed

21:22 to the built-in default. So shared, um, is again, as

21:36 I said already, and I think I responded to the question, a storage location;

21:41 there's a single one that everybody can read or write. But the fact that everyone

21:47 can deal with it, since threads are kind of unordered in their execution,

21:54 means they can also try to read or write something at the same

22:01 time. And then it could also be that the outcome is undetermined in terms of who

22:06 got access to do what. So a race condition is one problem when there are shared,

22:16 uh, variables, and it is such that it is the user's responsibility to make sure

22:24 that things end up being correct. So here is just a tiny example: in this

22:32 case, x is defined in the sequential region, so by default it's a shared

22:38 variable. So that means all the threads in the parallel region defined through the pragma

22:43 omp parallel can now update this variable x, you know, increment it by one,

22:51 um, and then, what x ends up being afterwards: if one is

22:57 lucky, the threads do it one at a time, and then it just becomes

23:05 the value of five plus the number of threads being used. But it

23:09 can also be that they're trying to update it at the same time, in which

23:13 case only one gets access, and the final value may not reflect the number of threads that

23:19 were actually being used in the parallel region. So this is what it says.
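The hazard described above can be sketched as below. The atomic directive used here to repair it is not part of the lecture's example; it is one standard way to make each increment indivisible so that x reliably ends up as its starting value plus one per thread.

```c
/* Every thread increments the same shared x. Unprotected, the result is
   indeterminate; with an atomic update each increment is indivisible. */
int increment_per_thread(int start)
{
    int x = start;          /* declared outside the region: shared by default */
    #pragma omp parallel
    {
        #pragma omp atomic
        x += 1;             /* one atomic increment per thread */
    }
    return x;               /* start + (number of threads in the region) */
}
```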

23:25 So now, when one declares a variable to be private, that means a

23:35 variable that, most likely, then exists in the sequential part with the

23:41 same name, but you want to use that same name in the parallel region

23:47 and not share it with the other threads, so you declare it to be private. That

23:53 means a new memory location is allocated for the variable there, for each thread in the

24:00 parallel region. The point to be made: although it's allocated, it's

24:07 not initialized, so one has to take care to initialize it in one way

24:12 or another. We'll talk about ways to do it. Um, one can have

24:18 it inherit the value that exists in the sequential section, or otherwise, in the region,

24:28 do some form of assignment. And, as I said, because it's a new

24:32 memory location, the same is true at the other end: it will

24:37 be deallocated at the end, and if you want to keep the value at the

24:41 end of the parallel region, that is up to you. So if you want

24:46 to export it, one uses the lastprivate clause that I'll be talking about

24:53 here in the next couple of slides, I think. So here is now,
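The private semantics just described can be sketched as follows (illustrative code, not from the lecture). Each thread gets its own uninitialized scratch copy of `temp`, so it must be assigned before use; the copies disappear at the end of the region, and only the reduction result survives.

```c
/* Per-thread scratch variable via private(); the total is combined with a
   reduction so the result matches the serial loop. */
double sum_of_squares(int n)
{
    double total = 0.0;
    double temp  = 0.0;     /* sequential temp; each thread gets its own copy */
    #pragma omp parallel for private(temp) reduction(+:total)
    for (int i = 0; i < n; i++) {
        temp = (double)i * (double)i;  /* assign before use: private copies
                                          start out with undefined values */
        total += temp;
    }
    return total;           /* the private temp copies are discarded */
}
```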

25:04 to be clear, this is not a good example; it's not something you should

25:08 want to copy, rather to the contrary. So given what it says at the

25:15 top and what I have said, maybe one sees, uh, the issues with this

25:20 particular code: a piece of code is accessing a variable that has been de-

25:28 allocated, right? "It's been flushed by the time you leave the... or, I

25:33 guess it'll print the original one, because the one that you created

25:37 will have been destroyed, temp, when you exit the parallel region." Again, it

25:49 says omp parallel for, so only the for loop is

25:53 parallelized; at the end of the for loop, the temp that was used inside the loop

26:00 is lost, and it is the temp that was defined in the sequential region that

26:10 is still valid. So that's why it prints out temp as it was initialized; it's that value,

26:17 and whatever happened in the parallel region is lost. Any other comments on this

26:26 particular code? So it follows the standard scoping rules. Yes? Okay:

26:35 "the other thing is that the temp inside is not initialized. So that means who knows what the

26:39 value is; each thread will increment it and add 1000, or whatever value it

26:46 was having." Correct. You know, sometimes, um,

26:52 by default, compiler writers may actually make sure that things are initialized to zero, but there's

26:57 absolutely no guarantee what that value might be. All right, so this is the

27:06 way, as I mentioned, to inherit the value if you use the same variable

27:12 name in the parallel regions. On this slide, increment is used both inside the parallel

27:21 region and outside. Um, inside the region, each thread gets its

27:28 own, uh, memory location for increment; everyone has its own increment variable. And

27:35 by the firstprivate clause added to the parallel for construct, that means the

27:43 memory location for increment for each one of the threads in the parallel region

27:48 gets initialized to the value it had in the global region. So there it's 600.

27:55 So anyone can go through and see what the loops are doing here and the effect

28:01 of the firstprivate. Okay, lastprivate is the opposite: that's in order
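The firstprivate behavior described above, initializing each thread's copy from the sequential value, can be sketched like this (names illustrative):

```c
/* firstprivate(offset): each thread's private copy of offset starts with
   the value it had in the sequential part; with plain private() it would
   start out undefined. Here it is only read inside the loop. */
long shifted_sum(long n, long offset)
{
    long total = 0;
    #pragma omp parallel for firstprivate(offset) reduction(+:total)
    for (long i = 0; i < n; i++)
        total += i + offset;
    return total;
}
```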

28:11 to export. So this takes the value of something that has been unique to

28:24 a thread inside the parallel region, to get it out. So with

28:28 lastprivate, it is the last value of x among all the

28:33 threads in that parallel region that gets, um, then copied

28:41 into the global variable x, or the sequential variable x. They're not storage-

28:52 associated, but that's why it's kind of a copy from the parallel region back

28:57 to the sequential. All right, so that's what I said. So now I'll try to
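And the export in the other direction, via lastprivate, might look like this sketch: the value from the sequentially last iteration is copied back to the sequential variable, which is exactly what a serial execution of the loop would leave behind.

```c
/* lastprivate(x): each thread works on its own x, and the copy from the
   sequentially last iteration (i == n-1) is written back afterwards. */
double last_square(int n)
{
    double x = -1.0;                 /* sequential x */
    #pragma omp parallel for lastprivate(x)
    for (int i = 0; i < n; i++)
        x = (double)i * (double)i;
    return x;                        /* (n-1)^2, from the last iteration */
}
```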

29:06 ask a few more questions about this little example. So there are three variables,

29:13 a, b and c, that are initialized in the sequential region. And then there is a

29:19 parallel region started with clauses such that b is private and c is a first-

29:27 private variable. So now let's see if you guys can help me decide what the values

29:36 are in the parallel region. So let's start with, I guess, a, and what

29:46 a might be in the parallel region. "Well, the default is

29:56 shared, right? So it would be shared amongst all the threads." Right,

30:03 in general, that's the case. Next, can we do b and

30:07 c, and any comments on those two, guys? No? No one volunteering

30:25 this time? Anyone commenting? Yeah, then you're welcome to

30:35 continue. "b: each one gets its own copy of b, which is

30:38 initialized... I'm sorry, uninitialized, because it doesn't have the 'first' before it, right?"

30:44 Yeah. Okay. Yes, correct. So what's c then? So that

30:49 means, um... Now, let's see: once one is done with the parallel

30:56 region, then what are the values of a, b and c? Let's see,

31:10 after the parallel region. "Uh, a would... uh, I guess it

31:16 is dependent on the operating system, on the last one that wrote a, which is an

31:22 arbitrary order unless you specified it. So I think it would be, uh, I

31:26 can't exactly say. Um, for b, it would be one. And

31:36 for c, it would also be one, because we didn't use any of the last-

31:40 private clauses." Correct. So this is correct, because nothing

31:48 was exported for b and c out of the parallel region. That means whatever

31:53 values they may have had are kind of lost, and what remains is what

31:59 existed in the sequential region. And for a, correctly, depending on what happened to it

32:06 in the parallel region: if nothing happened, it will retain its value.

32:10 Otherwise, it will be whatever was last assigned to it in the parallel

32:20 region. So this is, um, a critical aspect of, um, OpenMP.

32:36 So the rule is that the loop variable, in this little example

32:47 the i index in the for loop, is private to each thread. That is

32:57 not something that has to be explicitly declared because, uh, hopefully you can see

33:03 that if it ended up being a shared variable, things would be a mess.

33:11 Even if the loop is parallelized, uh, different threads, you know...

33:17 any thread that's supposed to, you know, work on a range within the loop's,

33:24 uh, iteration count, it wants to have its own notion of

33:30 which i's it handles. But if some other thread has another idea of what i should be,

33:37 then things would become a royal mess. So for that reason, the loop

33:42 indices, um, for loops that are parallelized by specification have to be private. But

33:54 in this case, everything else, since nothing is defined in the parallel

34:00 region, is all shared. And, um, so this means, I

34:09 guess, a point that is perhaps easy to miss: in this

34:18 case, it's the j loop that is parallelized, if one has nested loops.

34:27 If there were, uh, inner loops, so to speak, in the nested loops to

34:32 be parallelized, one has to explicitly say so. Otherwise, it parallelizes the loop that

34:38 follows the parallel for directive. So that means, in this particular case, that the

34:47 thread that executes some range of j indices is going to carry out all the

34:55 iterations of the i's as well. That kind of makes sense from what the

35:05 programmer intended, right? For each j index, you're supposed to go through the whole

35:10 range of i indices. But in this case, everything is, um...

35:22 maybe not literally copied, as we discussed earlier, but it's executed as if

35:29 each thread has a copy of the inner loop, though its index is not a private variable.

35:40 So this is what I'm trying to say. So now I think we

35:49 have an example here that we'll spend a little bit of time on. Try to

35:53 figure out if this thing may work or not. I think it's a good

35:59 thing to start with trying to see which variables are shared and which

36:05 are private to threads. So let's see if anyone is volunteering

36:19 this time, too, to try. You can start with i, j and n:

36:28 whether they are shared or private when it comes to the parallel regions. "So

36:50 they are defined in the serial region. So they are shared: i, j

36:59 and n." Then we'll come to the loop. As we just went through,

37:03 the i in the for loop is private to its thread, in order

37:10 not to create a total mess. And what is defined in the private clause,

37:18 that is private to each thread. And we have the for-j loop.

37:27 Any volunteers on whether j is private or shared? Mhm. "That should be shared, because it

37:39 doesn't declare a new j." And yes, correct, because it's not the

37:44 loop that is parallelized, and that's why it ends up being a shared variable.
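The nested-loop pitfall above can be avoided either by listing the inner index in a private clause or, more simply, by declaring it inside the loop, as in this sketch (illustrative code; only the outer loop is work-shared, matching the point being made):

```c
/* Only the loop immediately following "parallel for" is divided among
   threads; the inner index j is private here because it is declared
   inside the region, so no thread can reset another thread's j. */
long nested_count(int n)
{
    long count = 0;
    #pragma omp parallel for reduction(+:count)
    for (int i = 0; i < n; i++) {        /* i: automatically private */
        for (int j = 0; j < 5; j++)      /* j: private by declaration */
            count += 1;
    }
    return count;                        /* n * 5 inner-body executions */
}
```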

37:51 So here is a little bit of what we listed. The point,

37:54 hopefully, is to sit down on your own and look at this particular code

37:59 and see what's shared and what's private. So, um, as I pointed out,

38:09 since j is shared, as commented on the previous slide, that means the

38:15 threads, even though they have their own specific range of i indices that they

38:24 work on, they all share j. Therefore, in the j loop, when j

38:31 is a shared variable, all of the threads can, uh, read and

38:38 update the j variable. So that's why things become a mess. So this is a

38:46 little bit hard to see; hopefully you can see it on your screen,

38:52 it's a little bit faint. So here is, um, the suggestion:

38:58 it's useful to run this code as it is, with two threads, and put n equal to

39:03 four, and just take it as it is. Now, if you look at this output:

39:13 so in principle we had n as four. That means, uh,

39:19 the i in the loop goes from zero through three. For each of these i in-

39:26 dices, we have five j indices. So that means, in total, the

39:36 function being called should be called four times 5, 20 times. But as you can see,

39:43 things terminated after seven calls, but there was a 10 also in there, right?

39:50 So they're not ordered, as we said, in terms of what gets printed, and

39:54 it's not 20. So that is the different threads at work. And, you know,

39:59 there is no kind of order. Somebody did, um... there are somewhere

40:07 j equals one and five, and then a four or something. But then some other

40:14 thread may reset the j, so then it goes back. And if

40:17 you look at the sequence of j values for a given i iteration, it's

40:26 not necessarily incremental. Um, in this case, they, um, what they did...

40:39 And there is one i that only got j equals six, but again,

40:45 also some i iterations didn't get to do all the j iterations as

40:50 expected, right? So it's kind of useful to take a look at it and see how

40:56 the i's and j's behave. As I said, um, the

41:05 run in this case had two threads. So thread zero gets, uh, i equal to zero

41:10 and one, and thread one gets i

41:17 indices two and three. And, you see, for i equal to two, there was

41:25 just one call, which got j equal six and immediately got terminated, right? And

41:34 then what? So this is what we just went through. And then, so, I

41:39 just did this thing also, then, changed it to make j private. And then

41:46 you can, in fact, see that things are getting orderly. So for each

41:54 i index, all the five j

42:01 iterations were done. Um, whether they are sorted from 1 to 5, I

42:05 did check it. So I think in this case, everything got done as

42:11 was most likely intended; I didn't think they wanted it to be random, after all. So this is it,

42:22 then: when j is private, each i index got its five iterations of the j loop. Ooh, any other thing

42:30 you might spot, if you can see it on your screen? It's weak, but

42:38 there's actually one more problem: these things show that a was not

42:44 initialized, and it turns out, in this situation, it's whatever was there before;

42:47 it's not guaranteed, as I mentioned, to be a nice value, like

42:52 zero. It could be anything. All right, any questions, comments on that?

43:09 But again, it shows how easy it is potentially to make mistakes in writing OpenMP

43:15 code in terms of variable sharing among threads, and it can be kind of hard

43:24 to debug. That's why it's often recommended that one is very explicit in declaring

43:31 what variables are private and shared. If nothing else, it should make de-

43:37 bugging somewhat easier. Hmm. Declaring things to be private, or making things private

43:48 just to be safe, however, does have some costs, because it means that new

43:53 memory locations get allocated for things that don't necessarily need to be private. So it

43:59 may have costs in memory. And this is just, uh, what this default

44:07 clause does: it makes it possible to override whatever the compiler writers have chosen

44:13 to be the default. Make things, say, by default

44:20 be shared, and then you don't need to specify it at each of the constructs that

44:27 may involve the variable. So this is pretty much what this clause does.

44:46 private. So you said that private that the variable is private to each

44:53 . In a parallel region, the private is somewhat different. So it

45:01 that variables declared to be third private private, too. Um, each

45:10 so he's threatened, gets a Remember location for that variable, but

45:16 is preserved outside the primary region, that's the main difference. So,

45:32 , to some degree behaves a za variable, but it's not identical.

45:38 this sense. There are separate memory for each one of the threads,

45:42 it's not just a single variable in sequential region, and then they also

45:51 to be initialized. And then we use, um copy and as a

45:58 off initializing, for example. And also data statement that we'll talk

46:04 uh, later. So this one can to the traditional fort call that

46:12 would use if you were doing this my noi. Yes, The thing

46:17 that when you create a parallel you have a bunch of friends,

46:23 ? So all of a sudden, you have a bunch of copies off

46:31 with the same variable name as in sequential region. So maybe, you

46:36 , sequential region with one variable name and in apparel region than you

46:41 say, 10. Um And now turns out in this particular case that

46:52 master threat private variable is storage associate with the sequential region. But the

46:57 nine are not on. They still to be initialized somehow, except the

47:05 Fred. So, um, so has to deal with again management of

47:12 data. And the forking does not , uh, anything explicitly about how

47:21 various, uh huh kind of replicated are allocated or initialized around. So

47:34 have an example. Hopefully, it'll it shed some light on what this

47:37 means. So I guess, example. So in sequential religion,

47:46 defined maybe variable I and the third d and the Variable X and then

47:55 have this threat private statement for A X. That means now an

48:00 Um, we'll have dedicated copies for threat inside the parallel region that

48:09 but they will not be the allocated the exit off the parallel region.

48:20 then there is some statement here, I can have the fragment open

48:25 parallel, private. In that for B and the Fed, I'd

48:30 now local to the first parallel A, um is also has a

48:39 copy. Um, in the parallel , Be definitely because it was declared

48:48 be private and X is a The global guy through the threat private

48:56 though each threat has his own So here is now what's the Prince

49:01 is generating, So that prints out A, B and X values.

49:11 , Hopefully, there's kind of no . So both a and B get

49:16 i. D and X. And one plus 1.1 multiplied by the

49:22 I'd again. Whichever thread prints out reaches the prince step and first way

49:33 know. So it's so the jumble terms off the ordering of what the

49:37 are. And it's not just 123 then because the ordering is not

49:44 But we can see a B and third IEDs, and there are zero

49:51 30 and then they are kind of . And then look at the X

49:57 . That also makes sense given the I D. That fed zero.

50:02 , it's basically one plus zero. it's the one and the other one's

50:07 probably incurred. Well, explain this , properly, not supplied by the

50:14 idea and added the one. So not two. Remarkable, I guess

50:22 . Then there is a sequential and then I guess it's the more

50:26 part what happened and, uh, second probably region. So in

50:39 willing to comment a little bit on this print out for the second private

50:45 makes sense. So now, since and X were decided to be or

51:09 sorry to be threat private, that they're not a copy for each

51:17 And because it was thread private and private, they are preserved after the

51:24 private region. So that means all and X values that ended up

51:32 um, the value for each one the threads at the end off the

51:38 part of the region are still in . So that's why if you look

51:45 the A and X values for the threads, they are exactly the same

51:51 they were and the first panel So more or less just the difference

51:59 private and direct private is persistence among regions graph. And however for

52:09 that was the private variable to be the allocated and basically value is unknown

52:17 you come to the second parallel So whatever happens to be in that

52:23 , uh, at the time. so it Zira. But it depends

52:30 actually is there. So it's not guarantee that it would be zero.

52:38 So this is just showing the cop just initializing by the Ted privates variable

52:48 using the copy, and they're being value from the global to each,

52:55 , one of the threats. And there is another one that allows Fred's

53:04 share values assigned so one thread can . Ah, it's trend private

53:15 too. Yeah or other Should not private threat private but private variables to

53:21 other threats. So there is a off copying or broadcasting values among threads

53:30 what's known as a team that I mentioned all that much. But I

53:34 talk a little bit more about Maybe not so much today, but

53:38 time. But it's basically the team the collection of threads created for a

53:44 region when you have nested things that a little bit more complicated. But

53:51 more so when we talked about Andi, that's what we're talking

53:57 hopefully next time. Otherwise the lecture next there is more work sharing constructs

54:09 beyond the parallel for we talked about. There are three more that are good to

54:13 know: master, single, and sections. The master construct simply says that whatever piece

54:21 of code is designated — the structured block, um, possibly modified by

54:32 clauses; what they call the pragma in this case is the OpenMP master

54:35 directive — that piece of code should only be executed by the master

54:41 thread. There may be reasons why one wants to designate some piece of code

54:49 to be effectively sequential, but more, more specifically than that, um, with master

54:55 it is only the master thread that can execute it. If one is more flexible about which

55:00 one of the threads executes it — even though only one at a time, or rather only a single

55:05 thread; it's not one at a time, sorry, I shouldn't have said that —

55:09 the other threads ignore it, just skip that piece of code, and continue.

55:16 So if one doesn't care which thread executes it, as long as it's just one, one uses the single

55:22 construct, which is similar to the master, except it doesn't earmark the master thread to

55:28 be the one. So in this case, any one of the threads can execute it,

55:33 and the sections construct is for giving out blocks of code to different

55:40 threads. So this shows an example with three pieces of code — the X, Y, and

55:45 Z calculations — and it designates the X calculation to one thread, Y to another

55:52 thread, and Z, the calculation, to a third thread. And as it says here,

55:58 if there are, um, more sections than threads, then the threads take turns until

56:03 they are exhausted in terms of sections. Conversely, if there are fewer sections than

56:10 threads, then some threads become idle, because they don't get a section assigned.

56:22 Then there are the flow-control constructs. Uh, and maybe I'll go through

56:31 one, and then I will probably hand over for the demo. So among

56:39 the flow-control constructs there is, at least as the first one, the barrier, which

56:47 may be obvious in parallel execution contexts: we need the

56:55 threads, occasionally, depending upon the logic of the program, to synchronize before anything else

57:01 happens. And the barrier is the most common way of doing that. Basically,

57:07 uh, um, it's a synchronization point: threads that get there earlier wait

57:16 until the last thread arrives at the same point in the code. Critical, and

57:23 atomic, have to do with things that should only be executed by one thread. We'll

57:28 talk more about what these things are. And then a conditional if, and nowait —

57:35 nowait is for if synchronization isn't required. Um, one can change the

57:42 default behavior for some other constructs, i.e., not have an implicit barrier. More

57:52 about these things later. So for the barrier, a simple example here, with one,

57:59 uh, parallel region. Um,

58:07 in that case, before I get to do the B computation, I want A to be fully computed.

58:14 So that means all threads working on A have to have done their work before you

58:19 do anything with B. So that's what the barrier says. So I think there's supposed to be

58:27 no magic, um, about this one. Let's see what I had in

58:33 mind for this one. Yes. So in this case, um, there is

58:41 — so there's a conditional if: only the, um, thread with ID zero is the one

58:49 that increments the flag, which is a global variable. So depending upon what happens between

58:58 the threads before the print statement, uh, it generates two sorts of outputs,

59:07 depending upon whether thread zero has incremented the flag or not. Whereas after the barrier it

59:16 is guaranteed that things will be fine.

59:27 Critical, I think, is a mutual-exclusion type construct. So that means one piece of code

59:34 can only be executed by one thread at a time. So in this case,

59:41 it's not the case that all threads except one get to do it —

59:46 uh, all threads get to do it, but only one at a time. So

59:53 in this case, res is a shared variable, so everybody wants to update

60:02 it. So what this construct, um, enables — or it gives — is correctness:

60:10 in this case, the race condition is eliminated. Otherwise, all the different

60:15 threads may, or at least some of the threads may, try to update res at the

60:19 same time, and only one will succeed. In this case, all threads do

60:24 the increment of the res variable, thanks to the critical, but only one at a time.

60:31 And then this one is probably — I'll talk about that next time. I'll just finish with

60:39 the atomic one, and then I'll hand over to the demo. So the atomic

60:45 is somewhat similar to the critical, except it refers to individual memory locations. Critical can

60:56 be used for a code segment that is executed by just one thread at a time,

61:01 whereas atomic is, yes, um, for updates or reads — our actions on a

61:11 single memory location. So it could have been used in the other example, since

61:16 it was only one variable, the res variable. But that, I guess, maybe

61:23 was just an example. So I will, then, uh, stop here and

61:29 talk about this next time, so we get time to do the demo, and

61:38 I'll take comments and questions while, so, the demo is getting set up.

61:44 So you talked about the difference between a process and a thread yesterday?

61:49 Uh, but what kind of, uh, what should we be aware of? When, when —

61:55 um, in the sense that when I was going through operating systems, they

62:00 would distinguish the ID — the PID, for process ID — versus the thread ID. Are

62:05 there any nuances, aside from the ones you mentioned on Monday, that we should be

62:10 aware of, in terms of the difference from a process? — If you will,

62:15 the only thing I can think of is that threads have a lot less baggage than

62:20 processes. So that's why they tend to, um, be more efficient in execution, and

62:29 also in terms of memory required, because a process has a lot more information associated with

62:36 it than a thread. — Okay, so part of the lightweight thing that you were talking

62:43 about last time. — Yes. So that's why, for instance, one can

62:51 use what I was talking about in earlier lectures, this message-passing programming interface, or

62:57 MPI for short, which is process-based. So one can certainly use,

63:04 um, MPI for doing node-level programming and have multiple processes doing

63:13 parallel processing, by using MPI instead of OpenMP, but it is at the process

63:19 level. So there's much more overhead than using OpenMP. — Okay. — I

63:31 can take more up on that, uh, next time. And let's just do the demo

63:38 thing here. It's, um — so, can everyone see my screen? Yeah.

63:48 Okay, great. So today I will just give some basic demos, uh,

63:54 for the OpenMP constructs; in the next lecture, we'll see some of the

64:00 more advanced constructs that Johnsson talked about towards the end, like critical and atomic.

64:05 All that stuff we'll see in the next lecture again. So this demo will

64:10 be mostly Q-and-A based. So I will ask questions based on the

64:17 codes I will show you, and feel free to guess what the —

64:21 what the outputs will be. Eso, before we start with that, uh, some basic

64:27 information about the code, when you want to have OpenMP, uh,

64:33 support with it and run multithreaded code. So with OpenMP, the first thing

64:38 you need to know is you need the header file up here. So in

64:43 your C programs, that's omp.h; you need to include that, and then

64:50 you can start using all the OpenMP constructs and OpenMP function calls. To

64:58 compile a program with OpenMP support, you can use either the Intel compiler

65:05 or the GNU GCC compiler. With the Intel compiler, you can, uh, compile

65:12 your code just like you normally would, but you need to add one extra flag,

65:16 which is the -qopenmp flag. If you happen to use GCC,

65:24 then this flag becomes -fopenmp. So that's just the only difference

65:30 between the Intel and, uh, GCC compilers. We'll use icc with

65:39 -qopenmp, provide our source code and the output binary name, and just run it.
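The compile lines look roughly like this (the file and binary names are made up for illustration; the flags are the standard ones):

```shell
# Intel compiler: regular compile plus one extra flag
icc -qopenmp hello_omp.c -o hello_omp

# GCC: same idea, different spelling of the flag
gcc -fopenmp hello_omp.c -o hello_omp

# then just run the binary
./hello_omp
```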

65:46 So, first question here: uh, I'd like to have all of you guess

65:53 what will be the output of this first print statement, and what — and how many —

66:01 will be the outputs for the second print statement? Notice that this, this call here

66:09 has been commented out. So, any guesses what the output would be? — For the

66:19 first print statement, uh, it will get the number of threads that the

66:28 machine is capable of, right? Since it's in the serial, um, portion

66:33 of the program. — Okay; I'm not saying whether it's right or wrong, but,

66:38 uh, but what about the second print statement? — Um, not sure exactly how it

66:45 works, but assuming parallel defaults to the max number of threads — okay — then it

66:50 would print the number of the thread out of the total. So you should, you should —

66:57 the number of print statements should match the output that you got from the serial print

67:02 statement. — Okay. So, as you can see, I'm giving this demo on

67:07 the Bridges compute nodes, which have, uh, two —

67:11 two processors with 14 cores each. So in total, we have 28 hardware threads.

67:18 So let's see what happens when we've done this. So this is the output,

67:27 with the first print statement. So the serial region will always print the number of threads as

67:33 one when you call omp_get_num_threads, because,

67:38 uh, in the serial region, your program — or process, if you want to call it that — only

67:43 has the master thread. So whenever you call it in a serial section, it

67:47 only returns one. For the parallel region, where you have this pragma,

67:55 uh, it actually depends. It depends upon what the value of, uh, this environment

68:08 variable, OMP_NUM_THREADS, is. So see here that this

68:15 environment variable is not set on Bridges. On Stampede, if you go and check,

68:19 it will be set to one. So since here it is not set, the

68:26 OpenMP parallel section will take the default number of threads to be the total

68:31 number of hardware threads. That's why, in this case, that's 28. On

68:38 Stampede, if you just simply go and run it, it will likely print the

68:43 second print statement only once, because by default, OpenMP has been configured to

68:49 have this environment variable as one on it. So it depends on how

68:55 the OpenMP installation has been configured.

69:03 Now, let's say I go ahead and remove, uh, this comment here, and let's set the number of

69:09 threads explicitly. Now, what that does is — the, uh, omp_set_num_threads

69:18 call, it supersedes the environment variable, so it can override the value that's in

69:26 the environment variable. And now, if you run it, your parallel region will only spawn

69:32 that many threads. So it does not matter if your environment variable had a different value;

69:39 the omp_set_num_threads call will always override the value held by that environment

69:45 variable. Does that make sense? — Does this change if the node, or — I'm not sure —

69:53 if it just has hyper-threading enabled? — Bridges does not have hyper-threading

69:58 enabled. So on Stampede, it will print 96, I believe,

70:06 if you don't set anything. Yeah. The second example here exercises

70:16 a few OpenMP calls. So omp_get_max_threads —

70:23 it prints the maximum number of threads that, that you can have. omp_get_num_procs

70:30 here is, um, ah, equal to the number of cores that you —

70:40 that you have. Then we get the thread limit: that's the maximum number of threads that OpenMP can

70:46 spawn. And the number of places: so OpenMP defines places by default as a —

70:55 so they can either have a value of thread, a socket — oh, or rather,

71:04 thread, core, socket, or node. So by default that's, uh, set to one

71:10 level, and since we have only one node to access right now, it will count the number of

71:17 places as one. And then, inside of, uh, that place, it'd — it

71:27 will then tell you the number of procs, the number of cores, that are present. So

71:32 compiling this one is again the same as the previous one, and as you can see:

71:38 max threads, we have 28; hardware threads, we have 28 cores; this

71:45 is the thread limit supported by the OpenMP runtime; we have the number of

71:51 places — so since we have only one node — and the ID of the current

71:56 place, the place number in the whole place list. So, since we only have one place,

72:00 its ID is, uh, set to zero. Now, the next example is this one.

72:11 So again, a question for you. Uh, we have this call

72:18 setting the number of threads here to 28, and we have this for loop that's parallelized,

72:25 with 16 iterations. So my first question: how many threads will be spawned in this

72:32 case? — Since the index is lower than the number of threads, it'll probably spawn 16

72:45 of them and give one iteration to each one, right? — So that's, uh,

72:51 actually not true, but close — because it will spawn 28 threads. And when it

72:59 comes to — or reaches — this parallel section, it — the OpenMP runtime — will determine

73:03 how much work can be, uh, distributed amongst the threads. So

73:10 in this case, only 16 threads will get to perform the work, because it tries

73:15 to evenly distribute work across the threads, and the rest of the threads will sit

73:22 idle — or, depending on what the OpenMP runtime does with them, they may,

73:28 uh, just get deleted. So let's go ahead and run it.

73:36 As you can see, we have 16 threads that are performing work, and each

73:42 thread has been assigned one iteration of those 16 iterations of the for loop.

73:53 Now, let's say we set the number of threads lower than the number of iterations. Now what's

74:01 going to happen? — It'll, it'll give four iterations per thread. — Correct.

74:14 So it will spawn four threads, and again it will try to evenly distribute

74:19 the work across all the threads. So in this case, uh,

74:25 that, um, there will be four threads, and each will perform four iterations of the

74:31 for loop. And notice that here, uh, the iterations that are assigned to a

74:38 thread are sequential: so 0, 1, 2, and 3 were assigned to thread zero, and so

74:44 on — 4, 5, 6, 7 to thread one. — Eso, for the previous example,

74:50 I noticed the iteration and the thread IDs were matching, but I'm assuming that's implementation-

74:54 defined, right? Not part of the standard — it doesn't matter which thread gets which

74:58 portion of the work? — Yes. But there is an option, uh, to

75:06 modify this behavior, uh, which, uh, can be set by the clause that's called

75:12 schedule, which we will discuss, I believe, in a coming lecture. So by default,

75:18 it tries to do a, a static scheduling. So in that case, the iteration

75:25 numbers and threads match. But if you set the scheduling to dynamic, that can

75:31 change. — Okay. — Well, so we'll see that clause next time.

75:41 Uh, so yeah, so this next example is a little bit interesting, so

75:45 I'll give one minute for everyone to, uh, look through it, um, and

75:51 see what's going on. So to just give a summary: we have, ah, we

75:57 have two arrays, A and B. Uh, the first element of B is

76:04 zero; this is the initialization loop where we initialize A with some values. Then

76:11 we come out of that and set the number of threads as four. Uh, then

76:17 we have our for loop, which is parallelized. In our case, the iteration count is

76:22 eight. So four threads, and eight iterations. Now, the interesting part is this

76:28 operation. So can anyone tell me what might be wrong with this, with this

76:38 operation here? — Eso, it seems B[i] is a shared variable, so,

76:49 mhm, so, uh, it will be — it will be updated in an unexpected order,

76:57 because, for all we know, the access to A[i] might be ahead of —

77:01 behind the assignment to B[i], right? — Right. So i actually is a

77:07 private variable, because, since we use it as the loop index, loop indexes are private to threads.

77:16 But yes, your answer is partly right. Because let's say A[0],

77:24 uh, went to thread zero — or, let's say B[0] and,

77:34 B[1] — B[0] went to thread zero, and B[2] went to the

77:42 next thread. But the next thread would also need access to B[1] — that's

77:48 B[i minus 1] — and in case that thread happens to execute this instruction

77:58 before the first, first thread, then the output may be messed up. To simply put it:

78:04 there is a data dependency amongst the loop iterations, and if the

78:12 threads do not operate in sequence, then, uh, your output may not

78:17 be correct. Let's just go ahead and run it. So, as you can see,

78:25 just in the first, first, uh, line, you can see that, for the

78:32 fourth thread, it required — it calculates B[5] using A[5] and B[4]. But

78:40 if you notice, B[4] has not been calculated yet; it gets calculated down here by

78:46 thread one, and the value that the thread took for B[4] was zero, which should

78:54 not be — which is not correct. So that's why our output B values did

78:59 not come out correct. So the takeaway here is that if you have data dependencies

79:04 in your loop iterations and you are not careful about that, your outputs may not

79:11 come out correct. Does that make sense? — Yes. So you said the loop

79:19 indices are always private, right? In this case, i is private. But if

79:27 one thread tries to access, in, um, an element that belongs to some other

79:33 thread, then there is a data dependency between those two threads. Okay? And

79:39 that happens in this case because of the i minus one. — Right.

79:46 Uh, uh, the next example is this one. So we have,

79:55 in this case, a simple nested, uh, parallel region. So,

80:02 as you can see, uh, there are two main threads that are

80:07 spawned for, for the outer region, and then each of the, uh, threads in the

80:15 outer region spawns two more threads using a clause, which is num_threads.

80:23 So my first question is: how many threads, uh, do you think are spawned

80:29 in total? — I want to say four, but I'm not sure if that's

80:39 too obvious. — Mhm. Anyone else? — So, intuitively — say

80:53 it again? — Six. — Uh, okay, so — uh, the first answer

81:00 was right: there will be four threads spawned.

81:05 But yes, logically, if you just look at the code, you would, you would say six

81:08 threads. But that's not how the OpenMP runtime works. Eso, if —

81:13 if you just think logically, there's, uh, two outer threads, each spawning two

81:17 threads. So that's four more threads, uh — plus, except, except the two

81:24 main threads — that becomes six threads. But the OpenMP runtime, what it

81:30 does is — or, I should say, fortunately, what it does is — since it

81:35 tries to reuse the already spawned threads: since it already spawned two threads, uh,

81:41 for the outer section, it just spawns two more threads to fulfill the requirement of

81:46 four threads inside, uh, this nested section. So what it does is it

81:52 saves all the software stack, or context, and everything for the outer, outer region, and

82:00 it reuses those two threads that it spawned for the outer, outer region for the inner

82:07 region as well. So the answer is, in total, there will be

82:11 four threads. Now, a second question. What do you think will be the

82:18 output of, of this program if I just run it like this? How many print

82:26 statements will you see? There is one print statement here and one print statement there.

82:31 So how many print statements will — will come out on the console? — In this

82:40 case it would be six, right? — Hmm. Uh, I wouldn't have asked if

82:47 it was so simple. So in this case, what happens is: you got out

82:54 zero, out one. So you got the, uh, two print statements for the two outer

82:59 threads. But you only got, uh, two print statements for the inner parallel

83:07 region. The reason being, uh: to enable working of nested parallel regions,

83:14 you need to set, uh, this environment variable, which is OMP_NESTED,

83:19 to true. Otherwise, nested regions will not work, and this is important because

83:25 it will be used in your next assignment. So make sure you do that when

83:30 you work with nested parallel regions. Now, once you set that, you will

83:34 see the correct output. So you get out zero and out one for the

83:39 print statement for the outer two threads, and then you get — for outer zero,

83:47 you see in zero and in one, and the same for the outer first thread: you

83:53 see, uh, in zero and in one. Uh, one more thing to

83:59 take away from this example is that the thread IDs inside a, ah, parallel region —

84:07 a nested parallel region — are private and relative. So, as you can see, outer

84:12 zero has in zero and in one, as well; and outer one has in zero

84:16 and in one as well. These IDs are not two and three; the IDs are zero

84:20 and one. So, uh, the IDs that are assigned to threads inside a nested

84:27 parallel region, uh, are unique within each parallel region. Any questions on that?

84:35 All right, we're kind of out of time. So, thank you. — No

84:40 problem; we can — it was too much; we can really do it a little more

84:45 next time. — Uh, unless there's a quick one that you want to

84:51 comment on? — I had one more question about the assignment. Um, so RAPL measures,

84:58 uh, both CPU energy and main-memory energy, DRAM. Um, and

85:07 the — it asks us to compare it to the thermal design power that's provided by Intel.

85:14 Um, and Intel provides it at the package level. So are we more or less

85:18 throwing away our value for DRAM and just using the one for the CPU? I

85:26 guess it provides it for the package, but we're using one core, right? That

85:31 being said, if we're trying to make accurate measurements, we would divide that

85:37 CPU thermal design power by 24, since we're only using one core on our machine?

85:48 Yes. And that's, um — okay, uh — it's fine to do that, as

85:55 long as you explain what you do. Yeah — later on, I think in

86:04 some of the coming assignments, it will occur, of course. So I guess the

86:08 long way of saying it is that, in addition to the core energy, there are also other

86:14 parts of the chip that consume energy. So the energy consumption, um, of

86:20 the package, in the Intel sense, is not, um, directly, um, proportional

86:30 to the number of cores; there's a bunch of, off, um, what's

86:35 known as uncore — non-core parts of the chip — that, uh, consume power that

86:43 is not proportional to the number of, uh — of active cores that are being used.

86:47 So — but for now, for this assignment, you know, as long as you tell

86:52 us what your reasoning is, it's fine. Later on, we'll kind of refine

86:56 it a bit. — Okay. — In terms of the DRAM, that is a

87:02 separate product, or — Intel doesn't say — Intel doesn't give you any power number, a

87:08 max-power number, for the memory. So it's, at this point, mostly for your

87:15 information, to see a little bit the difference between the processor-chip power and

87:23 the memory power. And it's possible, if one goes on the web, to find what

87:30 the power consumption is for one of the DIMMs. But, uh, then you

87:35 need to go and look at how many DIMMs there are on this particular platform and

87:40 find the specs for those DIMMs. And that was more than we intended — or I expected —

87:46 you to do for this assignment. But, yes, they have their own power budget.

87:51 Depending on what generation of memory chips one is using, um — then it's not

87:57 just the size, how many gigabytes it is that you have, but also — I've

88:02 seen, for most technology generations being used, it's typically in the 4-to-8-

88:08 watt range for a DIMM when they are active; the idle energy for DIMMs is considerably

88:18 lower. — Okay. Yeah, it's true, the DRAM part today took

88:24 more than expected. Yeah, that's a mouthful, but it's good to know.

88:31 I had a question following that. So, does it depend upon just the

88:38 data miss rate — the total cache misses? It doesn't depend upon the loads

88:45 or, the loads or stores' misses — so, separately, the cache load misses and the store misses?

89:01 — Um, I'm trying to see if I got your question correct. So the

89:06 question was how to estimate the, the memory traffic — is that it? —

89:18 Effectively, the question was — yeah, the miss rate of the cache: it depends on

89:25 the total misses, right? The total cache misses. They don't — they don't depend upon

89:31 the load or the store misses? — Well, the total cache misses depend upon

89:39 the load and store misses, since it's kind of the aggregate, including instructions.

89:44 That is so. That's why I think the advice is to do last-level-cache — level-

89:51 three — cache misses, and look at the total number of misses, because each one of

89:57 them, whether it's for instructions or for loads or stores, will cost memory traffic.

90:05 — We are just going to consider the total one, right? Because it considers all

90:10 the other misses — right, it's included in the total. — And for the level-three cache

90:18 on, um, the Stampede processors, as far as I recall, PAPI can get the total

90:27 number of misses, but it couldn't separately count instruction, store, and, ah, read

90:33 misses — but it presumably collects the total number. — Mhm. Okay. — No,

90:47 it doesn't — yes — it doesn't give the separate statistics for loads, reads, and

90:51 instructions. It just gives the total. But it's not bad, because again, each

90:59 one of the misses generates memory traffic. It's a little bit different, eso, in that a

91:11 write may actually cause more memory traffic, with the allocate. So it's not exactly capturing the

91:22 memory traffic, because, when I talked about caches, I have said that most, uh,

91:29 cache policies today have a so-called write-allocate. So that means a write miss

91:36 in itself causes both the read and the write. So in that sense, a

91:43 write miss, then, may generate more memory traffic than an instruction read or data read

91:49 miss. — Okay. Okay. Thank you.

92:08 And with that question, that was all for today.
