00:04 Yeah. So today I continue to talk about OpenMP, and I will leave room for the demo to continue with some features of OpenMP as well. I didn't cover all the slides I had for the last lecture, so I will start with some of the slides that were left from the last lecture and then add some points for OpenMP.

00:45 So there was, I think, a bit of discussion toward the last examples about work scheduling. I'll talk about that first, then some more about runtime functions, and I'll walk through a simple example in OpenMP and hopefully get into a bit of something known as tasks, which is a more flexible way of dealing with parallelism using OpenMP. All right.

01:25 We were on scheduling; I reused some slides about scheduling, but this is in terms of parallel regions: how to divide up the work among threads in a parallel region. It's mostly thought of in terms of loop constructs with a number of iterations, and in the example I showed earlier, I think it was the spectrum application maybe, there were 1000 iterations and there were four threads, and each thread got a block of 250. So that's kind of the default way of doing business. But one can control what's known as the chunk size.

02:17 The amount of work, in the context of a loop, that is assigned to a thread is known as a chunk. So the chunk size is just the number of iterations in the case of a loop, or the amount of the kind of atomic piece of work, a sort of block of work, that is assigned to a thread, and one can introduce a static or dynamic assignment of the chunks to the threads.

02:55 If there are more chunks than there are threads, in the static case the chunks, or blocks of iterations, are assigned in a kind of round-robin manner. Whereas in the dynamic case, whichever thread gets its job done first gets to grab a new chunk. So basically the chunks are kind of placed on a queue, and the threads pick off the next chunk in the queue.

03:27 I have some pictures later that give some more intuition for how it actually works. Then there is also another version known as guided, in which case the chunk size is not fixed, but it tends to start out with big chunks, and one can bound the chunks; exactly how depends on how the compiler folks decided to implement the guided procedure. The chunk sizes tend to decrease from the largest chunk as the remaining work gets subdivided. And then one can also let the runtime system decide what is a good chunk size.

04:19 So the later options do more work at runtime; they are more flexible, with less sort of control from the user, and you kind of use those later ideas when things may be highly data dependent. When the loop makes a simple partition and there are no conditionals in the loop, the work can be divided at compile time, particularly if the loop size is known and there is no data dependency. But generally that's not true, so in that case one may want to choose the more flexible scheduling and hope that the runtime system has good guidance, when it knows more, for how to assign the chunks.

05:18 So there is kind of a little trade-off in terms of when each scheme is best. As I said, when things are known at compile time, static is probably the best option, because then the overhead, or the effort of dividing up the work, is done at compile time, so at run time things can be very efficient. Whereas when there starts to be unpredictability in the amount of work in a loop iteration, the dynamic scheme ends up being the better version. And the more one goes down from the top towards guided and towards runtime, the more work the runtime system has to do, and sometimes state has to be kept around. So for, like, the auto option, the runtime can use what it sees from previous executions, or from how the work on the various chunks has been done, to try to make some predictions of what a good allocation of work to threads might be.

06:35 Okay. So here is one of the graphical illustrations of these concepts of scheduling. At the top is the static case, and as you can see, the chunks are fixed in size and the threads, in a round-robin way, grab the chunks as they are, and the work is divided accordingly. In the dynamic case, which thread gets which chunk is not known ahead of time; it depends on how quickly the work is finished, and the threads may have started at different times. So in this case thread number three, I guess, was ready to grab a chunk first, and then thread two, and then thread one. You can also see that threads three and four got the chunks they grabbed done very quickly and went back to grab another chunk, and in fact in this case the fourth thread ended up in the end with three chunks.

07:48 In terms of the guided case, there are a couple of options for how things may have been implemented, and as you can see it starts out with the biggest chunks, and then in the next round of grabbing chunks the chunk sizes have been reduced, and they keep getting successively reduced. The idea is to try to reduce the overhead of scheduling by starting with large chunks and then shrinking the chunk size, so in the end there are only a few rounds of grabbing chunks.

08:28 I think there are a couple of other examples. In this case I don't know exactly what the code was for this, but as an illustration of what happened in a particular case: using the static scheduling, it's not uncommon that the first thread, which may be used in a particular way, ends up getting to do more work than what most of the other threads are doing. In this case there is almost a factor of two difference between the thread with the maximum amount of work and the one with the least amount of work. On the other hand, the same code using dynamic scheduling ended up actually being very nicely load balanced between the threads. So again, this is about how work is assigned to the threads, and how to control how the work is assigned by using the scheduling features in OpenMP.

10:01 And there is just another graph of the different ways of scheduling, in this case using chunks of size five for dynamic. Note that it's not showing time; it's basically showing how the iterations in the loop are divided up among the threads. Any questions on the scheduling?
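
For reference, here is a minimal sketch in C of how the schedule clause discussed above is typically written; the loop bodies, the array, and the chunk sizes are placeholders of my own, not code from the slides.

```c
#include <omp.h>
#include <stdio.h>

#define N 1000

int main(void) {
    double work[N];

    /* Static: iterations split into fixed chunks, dealt out round-robin. */
    #pragma omp parallel for schedule(static, 100)
    for (int i = 0; i < N; i++)
        work[i] = i * 0.5;

    /* Dynamic: each thread grabs the next chunk off a queue when it finishes one. */
    #pragma omp parallel for schedule(dynamic, 100)
    for (int i = 0; i < N; i++)
        work[i] += 1.0;

    /* Guided: chunks start large and shrink over time. */
    #pragma omp parallel for schedule(guided)
    for (int i = 0; i < N; i++)
        work[i] *= 2.0;

    printf("%f\n", work[N - 1]);
    return 0;
}
```

There is also schedule(runtime), which defers the choice to the OMP_SCHEDULE environment variable, and schedule(auto), which leaves it entirely to the implementation.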

10:44 Okay, now on to the runtime functions. Both of us have used them in other examples before, and they were used in the demo as well. So, a few comments: I just wanted to comment particularly on omp_get_max_threads, and I'll talk a little bit more about that now, and on the thread-ID function on the next slide or two.

11:12 omp_get_max_threads is a little bit tricky. If nothing has been requested in terms of threads for the region, then it actually returns the maximum number of threads available for the region. On the other hand, if, before one gets to this runtime function, one has in one way or another specified or requested a particular number of threads, then it takes its value from that specification or request for a number of threads. So it doesn't really tell you what actually happened; omp_get_max_threads, called after a set-number-of-threads has been issued, reflects that request. So what it returns is not necessarily the maximum available; it can also be the number of threads requested, if that has happened before the runtime function is called.

12:35 And regarding requesting a specific number of threads: I think the demo also used omp_get_num_threads, which returns the number of threads that has been assigned to the parallel region, and there were examples that also got the thread ID by using omp_get_thread_num, which gets the ID for a particular thread at runtime. These are the query functions.

13:12 We discussed this last week: there are three different ways in which one can request threads. One is to set what's known as the environment variable OMP_NUM_THREADS, but it has the lowest priority. Then the runtime function omp_set_num_threads takes priority over the environment variable, and the num_threads clause on a parallel statement has the highest priority, so it overrides the other two if they are not the same. And then, as I've said a few times, a set number of threads is not a guarantee of the number of threads you get; it's a request.
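
As a concrete illustration of that priority order, here is a small sketch; it assumes OMP_NUM_THREADS was exported in the shell, and the printed counts are only "likely" because all of these are requests, not guarantees.

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    /* Suppose the shell did:  export OMP_NUM_THREADS=8   (lowest priority) */

    omp_set_num_threads(4);            /* runtime call overrides the environment variable */

    #pragma omp parallel
    {
        #pragma omp single
        printf("region 1: %d threads\n", omp_get_num_threads());   /* likely 4 */
    }

    #pragma omp parallel num_threads(2)   /* the clause has the highest priority */
    {
        #pragma omp single
        printf("region 2: %d threads\n", omp_get_num_threads());   /* likely 2 */
    }
    return 0;
}
```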

14:10 Now, the next slide is a busy one. I encourage you to take a look at it, and I'll try to zoom in on it, because it's probably not feasible to read on your screen, but maybe it is on the projector screen. But I think I'll start from the bottom, in fact: there is an if-else close to the bottom of this slide.

14:39 So the number of threads for the region can be kind of direct and exact, or it can be somewhat flexible. If this dynamic-threads setting is false, that means you want a fixed number of threads for the execution of the parallel region, and as I said, that is the last else-if on this slide. So it says: if I ask for more threads than what's available, and I'll come back to "threads available" in a bit, then it is up to whoever implemented the runtime system what you are going to get. But if you request fewer threads than are available, then you get what you requested.

15:46 Now, on the other hand, if it is okay that the number of threads may vary, then if you ask for fewer than are available you may get what was requested, but you may also get fewer. And if you ask for more threads than is available, in the dynamic-threads scenario you get a number of threads that is at most as many as the number of threads available.

16:20 Now, "the number of threads available" is itself defined as a function; it's written slightly above the middle of the slide. Threads available is defined as the maximum number of threads that the hardware can support, minus the threads that have already been used; essentially the free capability of the hardware. So, what the hardware is capable of as a maximum, minus what has already been taken up by thread assignments to other regions in the code. So it is a somewhat elaborate procedure used by the runtime system to respond to a set number of threads, and what I try to stress about it is that it is a request, not a guarantee of what you get. So, any questions on runtime functions or thread assignments to parallel regions?

17:52 Okay, so here is just a simple example; I think it's similar to ones we already talked about, so I will not comment further on it. And here is an example, again, illustrating that you turn the dynamic mode off, so that now it's a fixed number of threads, and in this case the code asked for as many threads as the system can offer in terms of the number of processors. That's the crucial thing. Then it gets the ID for the threads, and the thread with ID zero is the one that assigns to the global variable num_threads; so in this case it is a single thread, just one, that updates it. But this is a good way of actually making certain, which may be useful hopefully not for correctness, but in terms of understanding the performance of the code, that you check how many threads were actually assigned, because, as I said, it may not be what was requested. Okay. And in the assignments it has been suggested to do this kind of procedure: you request a number of threads and then you check what the runtime system actually gave you.
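
A minimal sketch of that request-and-verify pattern; the variable names are my own and not necessarily the ones on the slide.

```c
#include <omp.h>
#include <stdio.h>

int num_threads = 0;          /* global: how many threads we actually got */

int main(void) {
    omp_set_dynamic(0);                        /* ask for a fixed team size           */
    omp_set_num_threads(omp_get_num_procs());  /* request one thread per processor    */

    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        if (id == 0)                           /* only one thread writes: no race     */
            num_threads = omp_get_num_threads();
    }

    printf("requested %d, got %d threads\n", omp_get_num_procs(), num_threads);
    return 0;
}
```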

19:40 Just a quick comment on the little distinction between the master thread, which is purely a logical entity, and the initial thread, which is the one that is sort of the continuation of the parent thread into the parallel region. We talked about this last time, in the demo. In that context, thread IDs are local to the parallel region.

20:12 And this is just, given the complexity of OpenMP in its current version, a good cheat sheet, so to speak, of the most commonly used OpenMP functions. One can get a long way with the roughly 21 functions that are on this slide; they are the core of what one would need to know to do a decent job with an OpenMP code. Now I'm going to talk about this pi example.

20:51 So, any questions? Okay. So this example is about computing pi, and it turns out there is a simple equation: if one integrates 4/(1+x^2) on the interval from 0 to 1, the result is exactly pi. And on the computer we need to do a discrete version of this integral, since it isn't an easy function to handle exactly.

21:34 So a simple numerical approximation of the integral is this rectangle, or midpoint, approach, basically approximating the integral as the area under the curve. The curve is defined by the function 4 divided by 1 plus x squared, and the area under that curve between zero and one turns out to be pi. So we approximate the area under the curve by using this collection of rectangles, the lighter blue regions. Of course it's not exact, but it's a decent approximation, in particular if the rectangles are narrow in the x direction.

22:34 So I'm going to use this collection of rectangles and the discrete version of the sum, and here is the sequential code. In this case the interval between zero and one was divided into 100,000 rectangles.

23:04 What the code does is pick the midpoint for each of these rectangles. So if the left coordinate of rectangle i is x_i, then it takes the midpoint between x_i and x_{i+1}; that's the mid point. And this is the integrand: it computes the height of the rectangle at this midpoint of the rectangle, and then it sums up all the rectangle areas. And since the rectangles are all of the same width, you can, instead of summing the areas, sum up all the heights, and then, since they have the same width, at the end do one multiplication by the step size, the width of the rectangles. So this is just adding up the heights of the rectangles, and this is the sequential code.
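
The sequential code itself isn't captured in the transcript; the standard midpoint-rule version looks roughly like this (the step count and variable names are assumptions about the slide).

```c
#include <stdio.h>

static const long num_steps = 100000;

int main(void) {
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;

    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;        /* midpoint of rectangle i */
        sum += 4.0 / (1.0 + x * x);         /* height of the rectangle */
    }

    double pi = step * sum;                 /* one multiplication by the width */
    printf("pi = %.15f\n", pi);
    return 0;
}
```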

24:07 So now on to the OpenMP version. Any questions so far? Okay. So in this case this code was hard-wired to use two threads to work on the rectangles, and as we can see, there are a bunch of variables that are global here: the number of threads, and also an array of sum values that one generates. There are as many elements as there are threads, so basically each thread gets its own element for its partial sum of the rectangle heights. So in the two-thread case you can see the blue and reddish rectangles, and one thread gets the blue ones and one thread gets the red ones; we'll talk more about that. But since each thread has its own sum element, there are really no race conditions to worry about from the threads trying to update the same sum variable.

25:31 And then the number of threads requested is two, the two threads that get to do the work, and we have the parallel region here for the OpenMP code. In this case we have id and nthrds, "nthrds" without the "ea", so there are two variables which, when I say them, sound the same, but pay attention: there is a difference. The nthrds in this code, the one without the "ea", is local to each thread in the parallel region. The usual thing: each thread gets its own id, and it also sets this local, private nthrds by reading the number of threads that was assigned. But then we have this global variable nthreads, defined up here, which only gets updated by one of the threads, so there are no race conditions.

26:51 And then we have this loop that each one of the threads gets to work on. Note that there is no parallel-for here, so basically all the code in this block is executed by all the threads, in this case two threads. But since, again, they have their own partial-sum element here, there is no race condition in updating the partial sums either. And in this case, and this was used in a previous example too, each thread increments its loop index by the total number of threads, so it goes through things in a round-robin fashion over the total range of loop iterations. Then it's the same kind of code as before: get the height of the respective rectangle. So after this is done we have as many partial sums as the number of threads, and then we need to add them up. In this case that is outside the parallel region, which goes from here to here, so it is basically sequential code; there is no issue here in terms of race conditions, and the same thing goes for updating pi. All right, so any questions on this first OpenMP version versus the serial one?
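
A sketch of that first OpenMP version, in the spirit of the classic SPMD formulation; the names NUM_THREADS, sum[], nthreads and nthrds are assumptions about what is on the slide.

```c
#include <omp.h>
#include <stdio.h>

#define NUM_THREADS 2
static const long num_steps = 100000;

double sum[NUM_THREADS];          /* one partial sum per thread (global)  */
int nthreads;                     /* how many threads we actually got     */

int main(void) {
    double step = 1.0 / (double)num_steps;

    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        int nthrds = omp_get_num_threads();   /* local copy, one per thread      */
        if (id == 0) nthreads = nthrds;       /* only thread 0 writes the global */

        sum[id] = 0.0;
        for (long i = id; i < num_steps; i += nthrds) {   /* round-robin over iterations */
            double x = (i + 0.5) * step;
            sum[id] += 4.0 / (1.0 + x * x);
        }
    }

    double pi = 0.0;                          /* sequential final sum, outside the region */
    for (int i = 0; i < nthreads; i++)
        pi += step * sum[i];

    printf("pi = %.15f\n", pi);
    return 0;
}
```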

28:48 Okay, so now let's see what happens. Obviously the hard-wired number of threads is two, and then there is a table that has 1, 2, 3 and 4 threads; obviously the code was recompiled for the different thread assignments, so in addition to two it was run for one, three and four. And then we have the execution time in the right column. In going from one to two threads there was obviously a good reduction in the execution time, but then it didn't really pan out too well after that. So, we have the code here: any suggestion why it doesn't scale beyond two threads in this case?

29:54 Sorry, could you please repeat that? I wasn't very clear on my audio. Okay, the suggestion was that if you're not getting the same number of iterations per thread, it is a little bit trickier. That was a good thought, but the race condition, right, was avoided because each thread has its own sum element. So there is no competition, and everything is perfectly load balanced in this case as long as the number of iterations is a multiple of the number of threads. In this case there were a few threads and 100,000 iterations, so even if one thread gets one more iteration than the others, that wouldn't really matter; with so few threads they are basically even, and with four threads they have 25,000 iterations each, which is actually perfectly even.

31:28 So the tricky part is this. In this case, with four threads, because we are using an array, a sum array, to store the one partial sum for each one of the threads, that means that the four elements of the sum array are in the same cache line. And the level-one caches are where the line has to be in order to get updated. Different cores don't share the level-one cache, and the threads are on different cores, so the cache line basically has to, in a way, move, or be copied, between the different level-one caches. The cache line gets invalidated when a thread updates, say, sum[0], so when another thread on a different core wants to update the same cache line, it needs to grab it and copy it from, in this case, the first core. That's why every time a thread on a different core wants to do something, it has to get the line from whichever core happens to have the current version; the line basically gets ping-ponged back and forth all the time. So this is what is known as false sharing, in the sense that there is no sharing of any data items, but the data items happen to be in the same cache line.

33:33 So this is the notion of false sharing, and that's why this notion of ownership of cache lines matters. Of course, a cache line in, you know, L2 or L3 may be shared among a few cores, but not in level one. So knowing the computer architecture, in terms of which caches are private to each core, and also the size of the cache line and how data is allocated to cache lines, is important when it comes to understanding performance. Any questions on that? So this is part of the reason for spending time early on talking about processor architecture and typical sizes of caches and cache lines. I didn't talk too much about cache coherence, but this is part of what the system needs to do.

35:04 So how does one avoid this? There are a few ways. The most naive and straightforward way is to keep this sum array but basically pad it: add a padding dimension, generously, to the array. In that case each element fills out a cache line with things we don't care about, so the different sum variables that we are interested in end up in distinct cache lines, and then there won't be any false sharing anymore. But it's kind of an ugly way of doing it, because it means that your code will depend on the particular cache-line size of the platform you're using. So if you move to a different platform, a processor that has a different cache-line size, you need to change your padding. So it works, but it isn't good.

36:18 As for the fact that it works: we can see on this slide, now with the padding added to make sure that the sum variables are on different cache lines, that in this case it cured the problem, because now we get a good speedup, an even reduction in execution time; not quite a factor of four, but not too bad, it's close. So we do benefit from using more threads, and the code scales, at least up to four threads, in this case. Any questions on that?
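
A minimal sketch of the padding trick; the PAD value of 8 doubles assumes a 64-byte cache line, which is exactly the platform dependence being criticized here.

```c
#include <omp.h>
#include <stdio.h>

#define NUM_THREADS 2
#define PAD 8   /* 8 doubles = 64 bytes: one cache line on many CPUs (assumption) */

static const long num_steps = 100000;
double sum[NUM_THREADS][PAD];     /* each thread's partial sum now sits on its own line */

int main(void) {
    double step = 1.0 / (double)num_steps;
    int nthreads = 0;

    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        int nthrds = omp_get_num_threads();
        if (id == 0) nthreads = nthrds;

        sum[id][0] = 0.0;
        for (long i = id; i < num_steps; i += nthrds) {
            double x = (i + 0.5) * step;
            sum[id][0] += 4.0 / (1.0 + x * x);   /* only [id][0] is used; the rest is padding */
        }
    }

    double pi = 0.0;
    for (int i = 0; i < nthreads; i++)
        pi += step * sum[i][0];
    printf("pi = %.15f\n", pi);
    return 0;
}
```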

37:07 Okay, so what does one do to try to make the code more portable? So, as I said, one way is to avoid the sum array and basically have a local sum variable for each thread. Now there is no variable sharing, and there's no concern about a race condition for updating the variable, because each thread has its own sum variable. But it means that at the end one needs to have a global variable, pi, that gets updated from the local per-thread sum entities, and in this case this is done inside the parallel region, but using the critical construct. In that case pi only gets updated by one thread at a time, but all threads get to update pi. So this is the way, then, to make the code not dependent on a particular architectural feature of the processor; it's a much more portable version of the code, and in this case it performs pretty much as well as the padded version. And since it is a single variable, one could also use the atomic statement; atomic works on a single variable, whereas critical can protect a collection of statements. So in this case either one will work.

39:09 And, since this is a reduction operation, we can also use the OpenMP reduction clause to have the system take care of avoiding the race condition, and also to potentially do the summation of the thread-wise sum variables in parallel instead of as a sequential addition. So that's another option for this simple code. In this case it turned out it was not quite as efficient as using critical, but it should scale better. For four threads in this case, with atomic or critical we are doing a serial addition of the four partial results, which didn't cost too much overhead; whereas the reduction may not have been all that well implemented here, but it should scale better to a larger number of threads than doing the final sum serially.

40:29 And I think that was more or less what I wanted to say. The take-home message about this code is: avoid false sharing by instead using local variables, and then either use atomic or critical for combining the partial sums, or use the reduction clause.
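
Two sketches of those portable variants: the first accumulates each thread's local sum into pi under a critical section, the second lets the reduction clause manage the partial sums. Variable names are assumptions.

```c
#include <omp.h>
#include <stdio.h>

static const long num_steps = 100000;

int main(void) {
    double step = 1.0 / (double)num_steps;
    double pi = 0.0;

    /* Version 1: private partial sum, combined under a critical section. */
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        int nthrds = omp_get_num_threads();
        double sum = 0.0;                       /* local to each thread: no false sharing */
        for (long i = id; i < num_steps; i += nthrds) {
            double x = (i + 0.5) * step;
            sum += 4.0 / (1.0 + x * x);
        }
        #pragma omp critical                    /* or: #pragma omp atomic on the update */
        pi += step * sum;
    }
    printf("pi (critical)  = %.15f\n", pi);

    /* Version 2: let the reduction clause manage the partial sums. */
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    printf("pi (reduction) = %.15f\n", step * sum);
    return 0;
}
```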

41:06 Now I want to switch topics, unless there are comments on that, and later we'll hand it over for the demo. So, any questions so far today, or on the previous lecture? Okay.

41:35 So I mentioned the demo too, but some comments on tasks first. There are a number of different aspects of tasks; they are, in a way, a more general version of parallel regions. I'll talk through some examples and see how far I get, but I want to leave, I believe, 20 minutes or so for continuing the demo from last time. All right. So, as I said, the idea is to kind of move away from the strict fork-join type of model and have a more flexible way of creating, synchronizing and scheduling tasks than what is done in the parallel regions' fork-join model. I will probably not be able to cover all of it today, but I will then do it next lecture.

43:06 And as you will see, at first it looks, in the examples I have, fairly similar to sections, as it says on this slide, but it is a more flexible construct than sections. So, tasks are initiated in the parallel region, and a task has its associated code and also its data environment and its own local variables; it packages up everything that is normally in a parallel region, but tasks are then independent of each other, and I'll show examples of that. And I think this is pretty much just a comment that, in a way, parallel regions were already implemented as a form of tasks, without opening it up to the programmer to actually initiate and manage tasks him- or herself. But then the task construct came, in, I think, what was it, OpenMP version 3, and it has been used because it solves some of the issues that the early fork-join version did not address very well. It starts with a pragma, "#pragma omp task", simply, and it has a number of clauses that control what happens. And I will talk about these various aspects of tasks; that's the goal.

45:12 So the way it tends to be used, I guess, most of the time, is that you have a single thread create the tasks, but the tasks are then independent of each other, and whatever threads are in the parallel region can grab them. So basically, in the nominally simplest incarnation, a thread is assigned to a task by the runtime system; which thread gets which task is something the runtime system decides, and if there are more tasks than threads, a thread may end up getting more than one task.

46:18 So now to a simple example that I am going to get through. This is just the "race car" code; it's pretty simple, and let's see what happens.

46:34 So here is standard OpenMP without tasks: in this parallel region we have, in this case, the print statements. Now, if one has two threads, as we know, there is no particular order between threads, so a number of things can happen in terms of what gets printed. I will just continue on my own here and say, you know, with two threads, in this case this is what happened. That was pretty much what I had guessed would happen, but it's not necessarily guaranteed. It could also have been, again with the two threads, that the prints interleave; since it's replicated code, remember, both threads have all three print statements, so that means both thread one and thread two could print "a" before anything else. So there could have been all kinds of different combinations of the three words "a", "race", "car", depending on which thread gets to execute what when; but serially they will obviously be printed in the order that they are written.

48:25 So here, maybe you can see, this one should be easy enough. Any takers for this example? Obviously, since it's just one thread, it will go through the statements in order, no problem.

48:51 Right, so now we're going to try this with the task construct. In this case there is one thread that does the print of "a", and then it generates two tasks: one task that prints "race" and one task that prints "car". So now, what will we get? Any suggestions? Okay, it was a little bit hard to hear, it kind of echoes at my end, so thanks for the answer even though I can't comment on it directly.

49:52 But yes, obviously, what happens in this case is that "a" is printed first; that is guaranteed. Then that thread generates these two tasks, but there is no particular order in which the tasks are executed. They are not subject to the single-thread construct; they are managed by the number of threads in the parallel region. In this case, as it says at the bottom of the slide, two threads were specified; it doesn't say so in the code example, but it was run in this case with just two threads. And it may well happen that the thread that gets the "race" task gets to its print statement first, but it could also be different: the thread that got the "car" task prints first. So there is no order between "race" and "car", even in this task example. Later on, next time, I will tell you about ways in which you can organize or sequence tasks as you may want, because there are ways for controlling whether tasks wait for each other.
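
A sketch of what that slide's code likely looks like; the printed strings come from the discussion, and the single/task structure is the standard pattern rather than a copy of the slide.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel num_threads(2)
    {
        #pragma omp single            /* one thread creates the tasks */
        {
            printf("a ");             /* always first: printed before the tasks exist */

            #pragma omp task
            printf("race ");          /* these two tasks may run in either order,     */

            #pragma omp task
            printf("car ");           /* picked up by any thread in the team          */
        }
    }
    printf("\n");
    return 0;
}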

51:29 So here's another one: we've added another print statement, "is fun to watch". So, any expectations of what this code might print? Well, we have the same situation again: there are two threads, but this single thread has "print a" and then "print is fun to watch", and then we have the two tasks that are executed independently and in parallel by, in this case, different threads. So clearly we get "a" first, because it is printed before either of the tasks is generated, and then, as we already discussed on the previous slide, "race" and "car" can come in any order. And then the single thread again: the two tasks are completed at some point, and we get "is fun to watch".

52:50 But here is what could actually be happening: the single thread that prints "a" and "is fun to watch" may get its job done before either of the tasks gets its job done. So there is no guarantee that the tasks will be completed before the single thread that has "a" and "is fun to watch" finishes. Tasks can be initiated at any time the runtime system decides, and it may not be in the order of the sequence of the code as you see it.

53:36 Okay, so I think, well, let's start the demo, and if there's time left I will talk about the next item for tasks. Yeah, I think you can share the screen. Right, okay.

54:03 For some reason my ssh session here keeps dropping off.

54:49 Okay, so, just a couple of examples that were left over from last time. I guess pretty much everyone now knows how the scheduling works here.

55:15 So obviously we have requested a number of threads; just assume that we get what we asked for from the operating system. Anyone, any takers: what will happen if we have 16 iterations for this for loop, if you use dynamic scheduling, and if you use static scheduling?

55:40 Yes, with static you get an even distribution over all the threads, and with dynamic it will depend on when each thread finished its current iteration; if there is any other iteration available, it will take it, and the chunk size defines how many contiguous iterations each thread will get at a given time. There's one more thing: you can use static with a chunk size as well. In that case the difference will be, let's say, you have 16 iterations and you had two threads; that means you get 8 iterations per thread, right? So iterations 0 to 7 would be executed on thread number zero in the normal case, if you don't specify any chunk size, and the rest of them would go to thread one. But if you specify a chunk size, say two, then iterations zero and one will be executed on thread zero, two and three will be executed on thread one, and so on. Then the distribution will be more in a round-robin fashion rather than one even block per thread. So there's a difference if you use a chunk size with static and if you don't.

57:04 So, just an example here. As you can see, with static we would expect, with four threads and 16 iterations, an even distribution: four iterations per thread. And with dynamic you can expect an execution such as this: thread zero executed quite a few of them, and the last thread executed two of the iterations; there's an uneven distribution between the threads regarding their iterations.

57:42 Yeah, that's a good question. It's not a... The use case for dynamic is mostly when you're uncertain of the amount of time that it will take for each iteration. Let's say, in a simple matrix multiplication case, or matrix-vector multiplication: there you are expected to work on the same piece, or the same amount of data, with each thread, so you can expect the iterations to take the same amount of time. But if you're performing some other work that depends on how much data each thread gets, in that case dynamic may be a little bit faster, because if one thread has finished, then it's better that it executes some other piece of the iterations as well, rather than waiting on the thread that actually might have gotten the heavier iterations. So it depends on what your application is.
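
A small sketch of that static-versus-chunked-static difference, assuming 16 iterations and two threads as in the discussion.

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    /* Default static: thread 0 gets iterations 0-7, thread 1 gets 8-15. */
    #pragma omp parallel for num_threads(2) schedule(static)
    for (int i = 0; i < 16; i++)
        printf("block static  : iter %2d on thread %d\n", i, omp_get_thread_num());

    /* static,2: chunks of two iterations dealt out round-robin:
       thread 0 gets 0-1, 4-5, 8-9, 12-13; thread 1 gets 2-3, 6-7, 10-11, 14-15. */
    #pragma omp parallel for num_threads(2) schedule(static, 2)
    for (int i = 0; i < 16; i++)
        printf("chunked static: iter %2d on thread %d\n", i, omp_get_thread_num());

    return 0;
}
```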

58:36 All right, I think everyone knows the single construct as well. So let's see: with single, I just have two questions on this example. Where do you think the implied barriers are in this scope — is it here, or is it the second one? All right.

59:28 So single, if you remember, comes with an implied barrier after the single section that it executes. So you can expect all the threads to synchronize with each other: no thread will go to the second single statement — well, at least to that line in the source code — before one of the threads has finished performing the first print. Now, with the single statement you can add a nowait clause, and what that's going to do is tell all the other threads that they don't really need to wait for the thread that might be performing the second statement to finish; the other threads can just simply carry on with whatever is the next piece of work.

60:16 So you can expect an output something like this. For the first print, you're guaranteed that no other thread will be performing the second or third print statement until the first print statement is executed, because there's an implied barrier after it. But in the case of the second print statement, since we did specify the nowait clause on the single statement, some of the threads went ahead and printed the third print statement even before the thread that got the second print statement printed it. So there's an implied barrier after the single construct, but you can skip it by adding this nowait clause after it.
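
A sketch of the kind of example being discussed: three prints, with the second single carrying nowait. The exact messages are my own.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel num_threads(4)
    {
        #pragma omp single
        printf("first: all threads wait at the implied barrier after this\n");

        #pragma omp single nowait
        printf("second: the other threads do NOT wait for this one\n");

        /* executed by every thread; may appear before "second" because of nowait */
        printf("third: thread %d moved on\n", omp_get_thread_num());
    }
    return 0;
}
```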

61:09 Okay, this one is a tricky one. All right, so take a minute to look at this code here. The question is: what happens if you specify j as private, and what's going to happen if you specify j as shared — or rather, don't declare it as anything and let it be shared by default, because it's defined outside the parallel region? Good. So would you expect the correct output for this program if you set j private, or would you expect the correct output if you just let it be shared?

61:51 While you're thinking, I'll just quickly comment: we have four iterations in the outer loop, because we have N defined as four, and then in the inner loop we have five iterations, and we're setting two threads here. In total we obviously have four times five, that's 20 iterations, but you can expect 10 of those iterations to go to each of the two threads. So, what might happen if we set j to private or to shared?

62:40 Well, there's no nested parallelism here; we're just parallelizing the outer loop in this case. So two of the four outer iterations will each be distributed to one of the two threads, and then on the inner side, for the inner loop, you would expect both of the threads to iterate over the five iterations for j. But would that happen if you have a shared j? I don't think so, because when you have a "#pragma omp parallel for" in front of a loop — say for this outer loop here — by default that makes the outer loop variable private to each thread. But if you have another loop nested inside that same loop, that privateness does not get inherited by the inner loop index. So by default, j will be shared if you don't declare it private. And what could happen is: sure, both the threads might run the outer loop, but because they are sharing the j variable, one thread might update it to some value and the other one might read some value that it hasn't yet worked on. Say thread one increased j from 1 to 2, and then thread zero comes in and reads the 2, and it starts from 2 to 5 rather than going from 1 to 5. So that means they're sharing that value, correct? So you may not get all the 20 iterations printed for these nested loops of four times five.

64:58 So here's a sample output; you can expect something like this. As you can see, here we got i equal to 0 going from j equal to 1 to j equal to 5, then i equal to 1 went from j 1 to j 5. But again, because j was shared, you apparently ended up running j equal to 2 again with i equal to 1, and that might have happened because the i equal to 0 iteration updated j to 2 at the same time. So that means j is being shared, and you may not get all 20 iterations; if you look at this count here, it won't go all the way to 20 — I think it just ended up executing 18 iterations in total. But if you set j to be private explicitly, then you can expect a correct output, and then you get all the 20 iterations here. So you have to be careful in terms of the loop indices: they're not private for an inner loop if the parallel for is only on the outer loop.
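
A sketch of the kind of code being shown, with the fix spelled out; N, the loop bounds and the counter are assumptions about the demo.

```c
#include <stdio.h>
#include <omp.h>

#define N 4

int main(void) {
    int j;
    int count = 0;

    /* 'i' is made private automatically by the parallel for;
       'j' is NOT — declare it private (or define it inside the loop). */
    #pragma omp parallel for num_threads(2) private(j)
    for (int i = 0; i < N; i++) {
        for (j = 1; j <= 5; j++) {
            #pragma omp atomic
            count++;                       /* should reach 4 * 5 = 20 */
            printf("i=%d j=%d\n", i, j);
        }
    }
    printf("count = %d\n", count);         /* 20 with private(j); fewer if j is shared */
    return 0;
}
```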

66:12 Okay — about that variable: k will be private, because it's defined inside the parallel region. So if you want to check the scope of a variable, just check where it is defined: if it's defined inside the parallel region, then it will be private to all the threads; if it's defined outside the parallel region, then by default it will be shared, unless you explicitly set it to be private, firstprivate, lastprivate, any of those. Okay. So yes, k is private in this case.

66:58 All right, let's see. Yes, I think I have the same example for tasks as the professor just showed on the slide; this is what he is going to cover in the coming slides. But one thing that I wanted to mention is: let's say you had a variable declared as private for the parallel region — say a variable named j declared private for this parallel region. Then all the threads, whether you work on j or update j, whatever it is you do, can do that with their own copy of j. But when the tasks are created, that private variable is upgraded to firstprivate for the task. So if you had a private j and you did not make any updates, that means you may have a garbage value, or whatever value that thread may have set for that variable; that value will be copied into the tasks. So any private variable is upgraded to firstprivate here.
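
A tiny sketch of that point: the value printed inside the task is whatever the creating thread's private j held when the task was created; if it was never assigned, it is indeterminate.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    int j;
    #pragma omp parallel private(j) num_threads(2)
    {
        j = omp_get_thread_num();          /* each thread's own private j */

        #pragma omp task                   /* j is implicitly firstprivate in the task: */
        printf("task sees j = %d\n", j);   /* a copy of the creating thread's value     */
    }
    return 0;
}
```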

68:24 All right, I think that's most of the examples I had to show. Professor, maybe you can continue from here. Okay, so I want to see if there were any questions before I share my screen. No questions on this? Okay. Yeah, this example will be a little bit about exactly how variables are inherited in tasks, which is similar but not identical to the case for parallel regions, as was just mentioned. So, global variables behave the same as in parallel regions: they are shared. And local variables — there it changes, and they become firstprivate, as was just mentioned.

69:46 So here is an example; I think it's a little bit contrived, in the sense that there are two OpenMP parallel regions, and the second one declares B as private. And we can see that A and B and C are defined before one gets to either one of the parallel regions. Of the other variables, D is defined inside the second parallel region, so it's clearly private, per thread, in this parallel region. And then in this region there is a task as well, and E is defined and assigned within the task. So now the question is: what's the scope of the five different variables defined here?

71:06 The scope of A is pretty straightforward: it's only defined outside, and it doesn't appear anywhere else in this code. So within the task it inherits, as I said, the global property, and it's shared. How about B, then, in terms of the task? Is it still firstprivate? Yes, I think I heard that correctly. B is firstprivate, because inside the task it's both private and it also inherits whatever value it has in the enclosing region; that's why it's firstprivate, not just private.

72:14 How about C? Well, C is defined before either of the parallel regions, so it's shared, right — similar to A. How about D, then? Well, what can we say about D in the task? D is defined in the parallel region, so it is private to each thread in the parallel region, and in the task it is therefore firstprivate. And E is simple: it's just private, because it's declared within the task. Okay.

73:30 So now, what are the values? A is pretty straightforward: it's shared, and it was assigned the value of one. So what's B — what's the value of B? Could you repeat? Oh, I'm sorry — yes, it's some junk value, since there's no assignment carried out; well, the audio doesn't work too well today, so thank you. In the code that is visible, B was never assigned in the second region, even though it's firstprivate in the task, because after the second parallel region started, B was never given a value there. Because it's declared private, it's not the B from outside: the B in the private clause is a different B, new local storage for each thread, and it wasn't firstprivate on the parallel statement, so it doesn't inherit the outer value; it's a new memory location and it was never assigned. And the fact that it becomes firstprivate within the task just imports the value it had before entering the task, and since it was never assigned a value, it's basically undefined.

75:11 C is pretty straightforward: three. D is defined too, because even though it's firstprivate, it was actually assigned a value, so we know what it is, right. And E is assigned in the task region, so that's no problem. So, as was pointed out, and as I've tried to stress all the time, it's really tricky; one needs to be very careful to keep track of the status of variables: whether they are shared or private, and whether they are initialized or not. So there is ample opportunity to make mistakes, unfortunately.
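
A simplified, single-region reconstruction of the kind of example being described; the values 1 and 3 come from the discussion, the rest of the structure is an assumption.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    int A = 1, B = 2, C = 3;

    #pragma omp parallel private(B)   /* private(B): new, uninitialized storage per thread;
                                         the outer value 2 is never seen inside            */
    {
        int D = 42;                   /* defined in the region: private per thread */

        #pragma omp task
        {
            int E = 7;                /* declared in the task: private to the task */
            /* A, C: defined before the region and not made private, so shared.
               B, D: firstprivate in the task (copies of the thread's private values);
               B's copy is indeterminate because the thread never assigned it.       */
            printf("A=%d C=%d D=%d E=%d (B is indeterminate)\n", A, C, D, E);
        }
    }
    return 0;
}
```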

76:13 All right, let's see. Yeah, this is just a comment on defaults. So, as I said earlier on, in order to manage, or be clear about, whether variables are private or shared, it's good to always state exactly what you want or intend the variables to be, and not rely on the default status of the variables.

76:46 Okay, then I had something about synchronization. I have a couple of minutes, so I'll just mention, probably, the first ones only. So barrier is nothing other than what we're used to, and taskwait is, I guess, the opposite of the nowait clause in the parallel region statements, which was also shown for the single statement.

77:24 So here is the old example again, now with this taskwait statement. In this case, even though we don't know in which order "car" and "race" will be executed, we know that they will be executed before "is fun to watch" is printed, because of the taskwait statement. So in this case the output should show exactly that: "is fun to watch" is always the last thing to get printed, because of this taskwait statement. Taskgroup is... yes.
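
The same race-car sketch as before, now with the taskwait added; only the position of the last print changes, and it is now guaranteed to come after both tasks.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel num_threads(2)
    {
        #pragma omp single
        {
            printf("a ");

            #pragma omp task
            printf("race ");

            #pragma omp task
            printf("car ");

            #pragma omp taskwait           /* wait here until both tasks have completed */
            printf("is fun to watch\n");   /* now guaranteed to be printed last */
        }
    }
    return 0;
}
```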

78:18 And yes, I talked about this one earlier too, so now maybe we can see — or at least I'll wait and see — if anyone can tell me what gets printed in this case. Okay. So now we have x as a global, shared variable; then we have the single thread that generates the first task, then a taskwait, and then it generates the second task. Both tasks access the shared variable x. Now, what do you expect to get printed? Any takers?

79:30 Okay, so the first task generated in this case increments x; it started out at zero and gets incremented. So when the first print is encountered, it's after task one has incremented x, and the second task doesn't get initiated before that because of the taskwait statement. So task one is going to print x equals one, and then the second task increments x again, so we get that task two actually prints x equals two.

80:15 On the other hand, if we change the code to use the firstprivate declaration for x on the tasks, then what would one expect? Yes, I think I heard "one and one", and that's what's going to happen. So firstprivate means each task gets initialized with x equal to zero, and that's why each print statement, even though there is a taskwait, is going to print x equal to one. Excellent.
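
A sketch of that last example: with the shared global x the two tasks print 1 and then 2; with firstprivate(x) on both tasks, each task would increment its own copy, captured while x was still 0, and both would print 1.

```c
#include <stdio.h>
#include <omp.h>

int x = 0;                                      /* global, so shared by default */

int main(void) {
    #pragma omp parallel num_threads(2)
    {
        #pragma omp single
        {
            #pragma omp task                    /* shared x: increments the global */
            { x++; printf("task 1: x = %d\n", x); }

            #pragma omp taskwait                /* task 2 is not created until task 1 is done */

            #pragma omp task
            { x++; printf("task 2: x = %d\n", x); }
        }
    }
    /* With  #pragma omp task firstprivate(x)  on both tasks, each task increments
       its own copy (initialized from x at creation time) and prints 1.            */
    return 0;
}
```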

81:17 Let's see... right. So the thing is that, in this case, because the first task initializes its x, which is also private to that task, if a task finishes at some other time it's not clear what would get printed without the synchronization. So I think my time is up, and the next thing I would talk about... but that's for next time.

82:11 Okay. Any questions? All right, so I'll stop the recording now.

82:23
