© Distribution of this video is restricted by its owner
00:02 So, last time, I was trying to convince you that power and energy consumption

00:08 is in fact a big deal in many respects. And that's why I brought

00:13 it into the scope of this HPC course for students. Also, the goal is

00:17 to, um, not only learn about power and energy consumption, but to

00:24 learn how you can get some information on what the codes do in that regard

00:32 . So today, as mentioned last time, I want to try to wrap this up

00:36 , and I will talk about two things in particular: something known as DVFS, or

00:42 dynamic voltage and frequency scaling and control, which is something that all the processors have

00:48 these days, and that can affect, um, benchmarking, in the sense

00:54 that you can get unrepeatable benchmarks unless one kind of locks things down. And

01:01 I'll mention a particular standard that is being used, uh, as an interface

01:08 between what the processor does and the operating system, known as ACPI. And

01:16 then, depending upon how quickly I go, I'm gonna talk a little bit about

01:19 the data center case study from Facebook that I think is interesting, but, depending

01:26 on time and so on, I may not say that much about it, in order

01:30 to move on to OpenMP — um, not to shortchange it, but trying to, er,

01:40 pace this right. So now let's go on to dynamic voltage and frequency scaling. What

01:51 is it? So, I think at least once before, I talked about CMOS

01:57 properties: the power consumption is proportional to the voltage squared times the

02:04 frequency. That's the dominating dynamic energy term for CMOS technology, whether it

02:10 is for processors, memory, or some other sort of functional unit. So the point

02:18 is, as the kind of dashed line shows here: if

02:23 the application is not particularly sensitive to the clock frequency of the processor, then one

02:31 may want to keep it relatively low. Because then it would not affect

02:37 the runtime much, but it reduces the power consumption, so you gain in energy efficiency

02:44 by slowing down the clock a bit. And that's clearly the case if you have

02:49 code that is memory bound in terms of the application; and that's partially why I

02:54 have, ah, repeatedly sort of mentioned trying to get an assessment for the

03:00 code whether it's memory limited — and not in terms of size, but in terms of

03:07 bandwidth and latency — or whether it is CPU limited.

03:12 I'll come back to that in a moment. So this is the basic idea

03:16 of controlling voltage and frequency, which is done dynamically on all modern processors, whether

03:22 from Intel or AMD or IBM or some others. Here is kind of a

03:29 little bit of an old slide that was done many years ago by the person

03:34 who later started something known as the Green500 list, as a way of promoting energy

03:41 efficiency in computing. He did this, I think, on the Green Destiny cluster he

03:48 built when he was at Los Alamos National Labs. And what this slide kind of

03:53 shows is that the light colored bar is the slowdown, or the performance loss, caused

04:04 by applying this dynamic voltage and frequency scaling idea. So most of the time, you can

04:11 see, for the codes they tried, the impact was less than 5%, whereas

04:20 the taller, um, dark red bars show an energy efficiency gain of, in many cases,

04:30 10 to 20 plus percent, depending on what the application was. And the upper

04:36 left hand graph shows kind of true applications, I would say, more than

04:45 benchmarks. Whereas the top center is codes from the NAS Parallel Benchmarks — that is,

04:52 kernel benchmarks. Those are more kernel codes, pieces of code developed by NASA

05:00 many years back. And it includes CG, the conjugate gradient method; FFT, if that's

05:06 familiar to some of you; and, I seem to recall,

05:11 LU, which is Gaussian elimination, effectively. So a reasonable range of kernels, or underlying

05:19 algorithms, that are used in many scientific and engineering applications. But I'm not going to

05:26 go further into this; it just shows experiments done on a particular processor, maybe

05:32 15-20 years ago. So this is a little bit of a conceptual view of

05:35 what I talked about. And this is an example that was actually running on

05:41 Intel processors. So Intel has algorithms that set frequencies and voltages on their chips as

05:50 a function of how the workload behaves, and what they chose to show with this red

05:58 curve is how the CPU power consumption, ah, is affected by the frequency —

06:13 well, it says energy; so let's assume fixed time. It doesn't say that,

06:16 but then it is effectively the power as a function of the clock frequency,

06:21 again using the so-called square law for CMOS. The dashed bluish curve

06:30 that sort of declines with increasing frequency shows behavior that is typical for applications that are

06:40 CPU, or clock frequency, limited. So it tends to be that, um

06:47 , the runtime decreases more quickly than the power consumption goes up, at low

06:55 clock frequencies. So that is the case for a well optimized matrix multiply code

07:01 , for instance. And so one algorithm that has been used to control voltage

07:10 and frequency for such applications is known as, um, run to the halt,

07:17 or race to the halt: basically, you run as fast as you can,

07:20 and then you stop. And for compute bound applications, that's a good strategy,

07:29 whereas for memory bound applications, that's not the good strategy. So this particular algorithm

07:35 that they called EARtH — for Energy Aware Race to Halt —

07:41 will try, based on real time measurements as the application executes, to decide on

07:51 the optimal strategy for clock frequencies, and that is what one of the greenish curves tries

07:58 to illustrate. So that's something that is operational, unless you try to

08:02 prevent it, when you run code on Skylake and subsequent processors from Intel.

08:12 So I'll talk a little bit about this standard that has been developed as

08:22 an interface between the hardware and the operating system. And this is, first, just a

08:28 very global view: the application runs on top of the operating system, which

08:32 then has an idea of what the hardware can do and what the controls

08:41 are. And, as I mentioned before, all processors today have

08:50 temperature sensors built into them, and those sensors are being used in controlling

08:56 , er, voltages and frequencies, in order to make sure that things don't overheat.

09:03 And some algorithms, like RAPL — that is what you're using for Assignment 3 —

09:09 take thermal inertia into consideration, so they control and allow, um, sort of higher

09:20 , uh, power dissipation for short periods if the chip is relatively cool. So it

09:26 has fairly sophisticated thermal models to make sure that things don't overheat.

09:34 And this is something that happens, again, in hardware. So there is a —

09:38 I mentioned that before — there are separate processors, er, not the cores, but separate

09:44 kinds of, um, microcontrollers, that decide on how to manage power and clock frequencies on

09:52 the chip. So this is a sort of high level diagram of how this

10:00 ACPI is architected, if you like. And that's something that primarily,

10:09 well, started to be developed for mobile processors, because they are operating most of the time

10:17 on batteries, so that's a very limited energy resource. So power management has been a

10:26 big deal there for a long time. So, as it says on this

10:33 slide, there are many different kinds of states: what they call the global system states, first at the

10:37 top. And then there are a number of system sleep states, as they are called

10:43 , and I'll mention those a little bit. And then I will spend some more time

10:48 talking about C-states and also P-states, which don't show on this particular slide. So

10:56 here is a little bit — so the C-states and P-states are the ones that

10:59 are particularly relevant for this course, where you're dealing with CPUs and memory. And

11:10 as this very basic cartoon shows, the operating system communicates with

11:17 this ACPI subsystem, which hardware vendors do support, and with the system's firmware; and the

11:24 vendors decide on what things they want to manage themselves and which things they want to

11:30 leave, uh — or let the operating system also help manage. So here's a

11:39 little bit more detailed picture of how these things work. So the yellow box shows

11:44 the system states, and the way this works for all of them — whether it's

11:50 CPU states, or system states, or any states of the other, er, components in the system —

11:56 is that the index, or number, after the letter determines something: index zero is fully

12:04 operational, and then, as the digit after the letter increases, it's an increasingly deeper sleep

12:16 state. And I'll talk a little bit more about that when it comes to C-states

12:19 , and — yes, most of these states; I'll talk about that

12:26 also for P-states, but those are not sleep states. So this just lists a little

12:34 bit of what the system states actually mean. In terms of the system states, I will

12:40 not talk much about them, but you can see, again, that S0 is

12:46 the fully operational state: nothing is, ah, turned off, slowed down, or anything.

12:54 So in S0, everything is kind of fully powered up and fully ready to go

12:59 and do things. And then, if you go to some higher system state index

13:06 , then things are not ready to run, in terms of CPUs or memory. As

13:14 I said, I'll talk a little bit more about the C and the P states,

13:20 and this slide just tries to show what's kind of used to control the states

13:27 we expect to see — the C and P states. And so first, uh, the formula

13:36 here, on the upper half on the left side, clearly shows the,

13:43 um, square law again, as mentioned before: C times V squared times f. And that's the

13:49 dynamic part that is affected by the voltage and frequency control — that is, the DVFS

13:58 control of power dissipation and energy consumption. On this slide you also see mentioned, er,

14:10 a few terms: the clock gating and clock modulation. I'll just explain them on the

14:14 next slide very quickly; I won't go deep into it. The clock gating

14:18 basically means you turn off the clock; the clock modulation is a little bit refined version of clock gating. Power

14:26 gating means that you turn off power to some subsection of the chip.

14:33 In one of the previous lectures, I showed that today's chips have a number

14:38 of what's known as power islands, that allow individual control of the power on the chip

14:45 ; typically, each core is its own power island, so they can be turned on

14:50 and off individually. And then there are power domains for other parts of the chip

14:58 that are common, of course. And the bottom left hand table

15:06 shows what is affected by clock gating — or clock modulation, for that matter — and what's

15:12 affected by the power gating. And here's the slide that explains the difference.

15:19 The clock gating simply stops the clock for a period of time, by

15:24 issuing a, er, "stop clock" assertion — and then it stops the clock, and

15:30 that's it. Whereas the clock modulation is a bit, er, more refined, as I said: you

15:35 have basically one signal that enables you to turn off and on the clock for short

15:40 periods of time. So if you want a little bit more flexible, dynamic control

15:45 of the clock, one may use clock modulation. But in actual life, clock gating,

15:51 um, was more frequently used, or often implemented in chips, before dynamic voltage and frequency scaling

16:02 became the norm. So it may still be used in certain cases, and it's

16:06 certainly implemented on chips. But DVFS is the dominating control mode for power

16:12 and energy consumption today. So here is a little bit of an illustration to try to

16:18 give you some idea of what happens when you enter some form of sleep state. So

16:27 the reason we're having several levels of sleep state is that the various levels of

16:37 sleep states are, again, used dynamically; depending upon the period of inactivity,

16:46 the processor, or core, may enter, quote unquote, a deeper sleep state. But a

16:53 deeper sleep state is also associated with a longer time to get back to the fully

16:59 operational mode. So, as the graph on the kind of center

17:07 right of the slide shows, um, the C0

17:12 is the fully operational state; at the top, in fact, as it says in the

17:20 first column, everything is kind of on. But when you enter this C

17:25 1 state, that is a light sleep state, if you like, then you basically

17:32 turn off the clock to the core. But everything else is on, so caches

17:37 are retaining their information, or data, so nothing really happens to the cache sub-system

17:46 on the chip. However, when you enter a deeper C-state, then you may actually flush

17:55 the lower level caches, like L1 and L2. And if you have an

17:58 L3 cache, it typically means that the content of L1 and L

18:03 2 is written to L3, the outer cache. And eventually, as you enter

18:10 the deepest sleep state — that is, C7 on Intel processors — then all the caches

18:18 are flushed, and the cores are kind of fully turned off. And that results

18:24 in the highest energy savings, but also the highest latency

18:31 in coming back into business. So the next couple of slides show a

18:39 little bit of the complexity that has evolved. Processors are now no longer

18:47 single-core chips, so it means each of the cores has its own C-state, and it

18:58 kind of follows the labeling that I showed on the previous slide;

19:07 it's just, as it says, another C in front of the C-state, for "core",

19:13 meaning the core C-state. When you get the double Cs,

19:18 they signify that it is unique for each of the cores. And otherwise the slide

19:27 pretty much says what was on the earlier slide, except a little bit of description

19:33 of the C6 and C7 that wasn't on the previous slide. And,

19:41 um, then — now the cores are on a single processor, or "package",

19:49 as tends to be the word that is being used for a complete processor that gets

19:54 inserted into the socket on the board. And then the package also has C-states, and

20:00 the package C-state depends, um, on the C-states of the cores on

20:07 the processor. So it is essentially that the package assumes the lowest C-state among

20:17 the cores on the processor. The cartoon, er, to the right of the

20:23 graph basically shows that, while, if all the cores enter C

20:27 3, for instance, then the package is also allowed to go into its C3

20:33 state — and correspondingly with C6. So it's good to be aware that the

20:44 C-states, for cores, are individually managed, and then, depending upon what the

20:49 situation is on the chip, um, common features in the package, shared by all

20:54 the cores, are, uh, adjusted according to what's needed by the most active

21:02 core — or the least deeply sleeping core. So I took this as an example. This

21:11 is from an Intel architecture manual you can find on the Web, and it shows the

21:22 , er, nominal time to exit — or entry, for that matter — for the different

21:28 levels of sleep states. So here

21:34 , you can see, it's in the, you know, microsecond range — up to a couple of hundred microseconds if you

21:40 are in the C7 state. And, um

21:49 , it's good to think about: you

21:53 know, a microsecond doesn't sound like much, and it isn't. But in terms of

21:59 CPU cycles, it's actually quite a bit of work that could have been done while

22:04 you have to wait for things to wake up. So it is good to keep in mind the clock frequency of the chip, in understanding what

22:10 the penalty is for being in a sleep state.

22:21 Then I should also talk about the P-states. These are, yes, performance states, and

22:27 this is actually the clock frequency of the CPUs. So that means the

22:35 , uh, C-state has to be C0 — fully active — and that's the

22:41 only case in which the P-states apply. And, you know, turbo

22:46 mode is part of the P-states, too. So, and again, in this

22:58 numbering, P0 is kind of the top clock frequency, and then it goes

23:03 down from there — er, I'm sorry, it is the opposite of that: at least for

23:10 the Intel parts, um, the highest number is the lowest clock. And this shows,

23:17 for a couple of the Intel processors, what the clock frequencies are for the different

23:26 P-states. And, as you can see, it's not continuous —

23:30 so it is not a knob you turn; you basically can dial in discrete frequencies,

23:36 and that comes from the fact that those are generated by some kind of, yeah,

23:43 frequency divider or frequency multiplier that operates in discrete intervals. For these

23:50 processors, it's about 132-140 megahertz for each jump in clock frequency.

24:00 And the data here is, again, for a particular part —

24:05 it's kind of an old processor at this point, but it's kind of hard to

24:10 find this information, so I'm sorry I don't have an update. But the frequency

24:14 intervals haven't changed much: it's typically 100-plus megahertz steps between each of,

24:22 uh, these P-states. And, again, the P-state, for instance, is managed

24:28 by algorithms like the EARtH algorithm. But it's also possible from user space to switch

24:37 to a particular clock frequency — but usually that requires some privileges on the system, to be

24:46 able to do that. Mhm. This is kind of a summary of what

24:57 I was saying: there is power management. There is, um, an ability

25:05 to go in and control P-states, but normally it is done by firmware that takes

25:14 temperatures on the various parts of the chip into, um, consideration to determine voltages and

25:22 frequencies. So it is the case for many of the current generation chips that they have

25:31 a few hundred temperature sensors on these pieces of silicon, to understand what part

25:41 is kind of hot, or what parts of the chip are relatively cool, so one can safely

25:49 raise the clock frequency. And things are managed by the operating system, too:

25:54 as I mentioned in a comment in an earlier lecture, the operating system may

26:00 choose to move workloads from a hot area to a cool area on the chip, unless

26:07 it's restricted from doing that. And I think that was the comment on that. Er,

26:17 and, yes — so, earlier, when I talked about the need for managing power,

26:27 um, uh, there was this Google paper just over 10 years ago, where they

26:34 complained that the power consumption of standard servers was not proportional to the workload.

26:42 So, as they sort of showed, even when servers were idle, doing pretty much nothing,

26:48 the chips, when idle, still consumed more than 50% of the power. So there was

26:56 not much correlation between power consumption and workload. Since then, as

27:04 I mentioned, the voltage control has moved onto the die; it used to be

27:11 outside the die, on the circuit board, but now it's on the chip itself, and

27:15 there are now power domains for each one of the individual cores and other key parts

27:20 of the chip. And, also, only a few years back, individual clock domains

27:27 for the frequency control of the individual cores — that came a few years after the

27:34 per-core power control. But these efforts have caused the power consumption to make

27:44 , ah, a big advance towards being more proportional to the workload. So, in

27:51 this case, the idle chip may consume around 20-25% of the full power at no

28:00 work. So it's more proportional to the workload, but it's still not

28:05 fully proportional. And that's, by and large, because of the power that is drawn even without activity.

28:18 So with that I will stop. That offered some idea about this kind of RAPL idea — or

28:24 showed you that there is dynamic control that happens, and anyone wanting to

28:34 do more advanced work can also go in and control it oneself. And

28:39 it is a good thing to be aware of in how you develop code. So, if there are

28:55 no more, er, no questions, then I will just very quickly illustrate something that will

29:04 give just a little bit of insight. I'm not going to go into detail

29:08 , but for anyone interested, there is, on Blackboard, a paper from Facebook, from

29:19 which the next two slides are taken, that discusses how Facebook controls the power dissipation in

29:32 its data centers. And I think it's quite an interesting article that I, uh, put in. So —

29:41 where it is: I uploaded it, actually, I think, for the first lecture, when

29:48 I started to talk about power. So it's in under lecture eight; you can

29:51 find this article from Facebook there. Um, so the way things work —

30:04 not for a small cluster, but for the big data centers at, mm

30:09 , such sites — is that they, as I mentioned, draw power that is equivalent to a

30:19 small town. So they may draw 10, 20, 30 megawatts, or even more sometimes, for

30:26 the giant data centers. And that means you have, pretty much, at least for

30:34 the, quote unquote, last mile, dedicated power lines. And those are expensive, the build-out

30:43 . So you don't over-provision: you build the supply infrastructure according to

30:51 the expected demand, and that means you can't really exceed it, because then the equivalent of

31:00 fuses blows at the network, er, substation level. And then, if one of these

31:09 big data centers goes offline, it's a significant load that disappears from the net, and

31:20 that can actually cause instability in the power grid. So this is serious business at

31:31 this level. So controlling the power is important. Of course, the data


31:42 center owners don't want to, er, buy a whole lot of excess infrastructure capacity. So they try

31:49 to buy just the right capability from the infrastructure — so they don't want to, again, pay

31:57 for over-provisioning. It's also the case that if you ask for — if, er,

32:05 the utility builds out infrastructure and spends a lot of money on the build-out

32:11 , and then it turns out you're not using it much — then there are, in

32:17 contracts with utilities, there are also kinds of penalties if you don't use as much power

32:23 as you contracted for. And, by the way: several years back, there was an article

32:33 in The New York Times about Microsoft, that, I guess, misjudged. They built a

32:42 new data center in a smaller community, for which the utility needed to build new power

32:51 lines. And then it turned out that Microsoft did not use nearly as much power

33:01 as they had contracted for, so there were penalties in the contract. And it

33:07 turned out that the penalties were stiff enough, er, that Microsoft chose to basically run at

33:18 full blast for nothing, just in order to avoid the penalty — but with no

33:23 useful work, or useful load, being carried out. So, anyway, this is a big

33:31 deal — managing power. Mhm. So here is a little bit of

33:40 how these things tend to work, given what I just said about infrastructure. And this slide

33:47 kind of points out the ability of the infrastructure, at various levels, to sustain overload. So if you

33:56 look at the red line on this graph, what it shows is that one has about 30

34:03 seconds, if the load is about 30% above the designated, or contracted, load,

34:14 before the main circuit breaker trips towards the utility network. So it's not much time

34:22 to do adjustment of power at the entry point to the data center. At the rack

34:32 level — that is the dark green line — you have a lot more time. Because

34:37 there's, you know — when you think of the central limit theorem — er,

34:42 if you have a bunch of independent loads, then they tend to, er — the

34:48 worst case doesn't tend to happen at the same time across the whole ensemble. So

34:51 you get more leeway, in terms of using too much power, at the rack level. So

35:01 these slides show a little bit of how the power variation behaves within certain time

35:12 windows, at different levels — again, in terms of the data center: leftmost is the rack level

35:17 , and the right is the total data center entry. And it shows that if you take

35:24 the entry point to the data center — the main switchgear breaker — then, within

35:32 one-second windows at that level, the variations are not very large. Most of the

35:39 variations appear within just three seconds; whereas if you allow, say, a ten-minute window

35:50 , most of the variations are, you know, within 5%. But

35:56 it gets larger: the larger the time window, the larger the variation. And then, to take one interesting

36:03 example — here, in terms of the different types of applications that run: how much variation

36:08 happens in various periods of time. I'm not going to go into detail

36:13 , but this specifically shows what happens within a minute — how much of the variation you

36:19 see. The, er, cumulative distribution functions show you, uh, the total number of

36:26 power variations that happen. So, in this case, pretty much everything happens

36:34 within an, uh — um — the, um — all right: the power variation —

36:49 it's, um — take the blue line; that's some storage applications, or servers

36:56 : most of the variations happen within, or are less than, about 10-ish percent.

37:04 Um, now — so what do they do? They have a hierarchical,

37:11 er, structure for managing things. So they manage things at the rack level — and if you

37:16 go back and look at the first slide, of the network diagram, or the data center

37:21 power feed structure, you see it is hierarchically structured. Um, so they basically talk to

37:31 RAPL to get the information at the server level. And then, er,

37:38 the strategy is basically to have power thresholds. So they have a threshold

37:45 , as this slide says — the capping threshold. And then, they allow power,

37:52 for a short time period, to, er, potentially exceed that a little bit. But when

37:59 it reaches this threshold, then they do turn off services — and they

38:07 turn off enough services that they hope to get to the capping target. And

38:13 they keep the capping in effect until the load has dropped below this uncapping threshold

38:21 , and then they remove it — and then, typically, the load

38:24 increases again.

38:31 And this is kind of just the same thing at the rack

38:37 level. And then they have a couple of examples here, where they show a little bit of

38:43 the various services: on the blue curves — how many of the,

38:50 uh, web servers, and feed (newsfeed) servers — how

38:54 many of these servers were operating at a given power level. And then the red

39:01 line shows kind of the capping targets: how many were affected by capping.

39:06 And this shows a little bit of, er, the performance they lost —

39:11 the loss in useful production. And what I wanted to show with the next couple of

39:17 slides, again quickly, was just to get an idea of, um, how this capping

39:28 works in practice. Um — the greenish line shows the actual power. And then you have, um, power limits set for this group of servers, and

39:35 the blue curves show how many servers were actually capped when the green line crossed

39:41 the, er, capping target. And, as one can see in the

39:45 figure — follow the first blue rectangle — when the green line drops enough that it

39:52 gets below the, er, uncapping target, then the servers are no longer capped,

39:57 and then it comes back and goes on. And I had a couple of other examples

40:02 showing, in more detail, what happened; these are in the article that I pointed

40:07 at. And then I had a slide on the case study, that I thought was

40:12 interesting, where they show how things happened when things went bad — and how, by this automated

40:20 power management that they have for their data centers, when something went wrong, how they

40:25 actually then tried to recover; and the recovery didn't fully work, so they had to

40:30 do several kinds of restarts. But at no time did it get to the point — thanks to the power

40:38 capping — that the service totally failed; the data center did not go offline altogether. But

40:46 anyway: so, anyone interested in that type of thing can see how they actually do things in

40:51 real life — it's interesting. So, this is a little bit of what they

40:59 claim. And there are many — Google says the same: they all

41:04 use, now, automated power management in their data centers, and they all claim it saves a

41:10 significant amount of energy. Uh huh. And that's what

41:18 I had in mind for talking about power, and with that we'll stop on that. One

41:26 extra slide you can look at. I'll stop the screen sharing for a moment and move

41:34 on to, um, finding my OpenMP slides, while I'm happy to take

41:51 questions. Mm — no questions? Er, so that's it, in terms of hardware, and how

42:07 to manage hardware — the controls that are actually there, in terms of performance and power. So now we'll

42:17 switch to talk about OpenMP. Er, um, I'm sure most of you have

42:31 probably heard about OpenMP, and some of you may have used it. Um, what

42:38 I will talk about is: what is the target kind of system — what are the systems

42:48 for OpenMP; uh, what is the basic paradigm for using OpenMP

42:56 — that is, um, the programming model and the memory model.

43:05 And I'll start to talk about OpenMP constructs, but I will mostly cover those

43:13 next lecture, and maybe a bit of the lecture following the next one. So this

43:22 , I guess, is again a reminder of what we talked about: servers. So this is

43:29 one indirect way of saying that OpenMP, primarily, you should use

43:36 for programming individual multicore servers. And the typical server, you know — like the

43:47 Stampede2 servers, which are dual socket servers — those are NUMA servers. So

43:55 we mentioned before that there is memory physically associated with each one of the sockets,

44:01 um — but then the entire server has a shared address space. So

44:11 , regardless of which processor, or socket, a core is in, it knows about the address space

44:20 that corresponds to, uh, the memory chips associated with the entire server,

44:28 regardless of which socket they are attached to. So that means that they are kind of

44:36 the NUMA architecture type variety: access to the local memory is faster than

44:46 access to the remote memory on the other socket. Um — and then this — I have

44:56 shown that before. So — and again: OpenMP is for shared memory systems. It's

45:04 not the only way of doing it, but it's probably the most commonly used one

45:12 . Later on, we'll talk about MPI, the Message Passing Interface, which

45:17 you can also use on shared memory systems. But it typically is more heavyweight

45:25 — more overhead comes with it. So it's not necessarily what you want to use on

45:32 a single server. And then one can more directly use POSIX threads for,

45:40 er, shared memory programming. And, anyway, OpenMP is kind of a convenience layer on top of

45:45 , er, POSIX threads, so it makes dealing with, ah, parallel programming easier. Then

45:55 , of course, there's automatic parallelization: you take the code, um, that

46:01 is using standard programming languages — say C, C++, Fortran, what have you —

46:09 so that you would need no notion of threads, or structures of memory, or anything of that

46:16 nature, and hope that the compiler is able to figure everything out. And, so

46:23 far, the success of automatic parallelization has been limited. So that's why one

46:30 uses, um, some form of added information; um, in terms of Open

46:39 MP, it is by using directives — as it says on the next slide. So when Open

46:47 MP is employed, you use one of the standard programming languages, and then one

46:55 adds, er, compiler directives to it, er, which inform the compiler of what one

47:04 wants to happen in certain pieces of code. And, in order for things to work

47:09 , then there is a collection of runtime routines that supports, er, OpenMP. And

47:17 then there are environment variables that one uses to define the execution context of the

47:26 OpenMP code.

47:35 Um — a common source of misunderstanding, in the early days when we learn about parallel programming, is to not be aware of the fact

47:42 that OpenMP has no notion of anything beyond the shared address space. And typically that

47:50 is only existing on a single server — no way of getting it beyond that. And I'll

47:54 mention that briefly; but in terms of Stampede2, I don't think there's a

48:00 layer — you know, a software layer — that would, kind of, um, give the illusion

48:05 of a true shared memory across nodes. So, essentially, the first order of business:

48:11 it's something you can use for programming single nodes. And I advise you to get

48:23 acquainted with the OpenMP website. It has lots of very useful information, such as

48:31 hints for compilers and tools, lots of presentations and videos, tutorials, and,

48:38 er, lots of very useful information. So I encourage you to go to this website

48:45 . Um, it also, of course, has the current standard, and sometimes

48:52 it's very helpful to go and figure out more details about how things actually work,

48:58 or at least what the standard says. Because there are a number of — it leaves

49:08 degrees of freedom to compiler writers, er, in how to implement things; not

49:15 everything is totally locked down in the standard. And here are suggestions for tutorials:

49:26 I think they are a few years old, but they're still highly relevant, so they

49:31 don't necessarily contain the latest improvements in the standard. But, please — for the, er,

49:39 assignment you're going to get, these tutorials are perfectly fine. They cover everything you

49:46 will need to know about. And they're very good videos — and I watched them,

49:52 so I know what they are, and what's in them. Yeah. So this is a

49:59 little bit of a repeat of what was brought up both today and earlier, about this notion of

50:08 NUMA: that is one aspect of shared memory processors, and that's the case for most

50:18 servers today — they have memory associated with each socket. So that causes the

50:29 non-uniform access, depending on where things are in the address space. There is

50:35 symmetry about the processors, but access is no longer uniform. This is basically the software

50:43 structure: it shows the application is a mix, again, of code with compiler directives, and

50:51 one may use, uh, a compiler that understands the directives — an OpenMP compiler — and

50:59 that then has with it libraries to support some of the constructs. And,

51:06 as I mentioned, environment variables that help you define the execution context. And then there are

51:13 runtime libraries to help manage threading, er — and that's kind of a joint effort

51:20 between the specific OpenMP runtime libraries and the operating system: who does what. And

51:32 then — we mentioned that a little bit, that access question. So, again: standard servers

51:37 today are NUMA shared memory systems, and there's lots of compilers available to support OpenMP

51:47 . There is also what's known as cache-coherent distributed memory systems, er —

51:54 some clusters have had this ability. I'm not so sure, er,

52:03 how many vendors support it today. Say, SGI — Silicon Graphics — used to be the dominating supplier of

52:10 , uh — but that — Silicon Graphics does no longer exist; it was swallowed up

52:16 , if I remember correctly. And I'm not sure that it's supported any longer in

52:21 any one of the new products; there may still be systems out there that have it

52:29 . And then, as I mentioned, one can create a software layer on top of

52:33 distributed memory systems — distributed memory systems being just a collection of servers, each

52:41 with its own shared memory, connected over a network. And then we can create the illusion of

52:48 shared memory by a software layer. And that is, well, sometimes called distributed shared memory

52:55 , DSM — or virtual shared memory. So, um — and then, of

53:02 course, the multithreading systems that we talked about. So, yeah, as I

53:12 I mentioned before, OpenMP is kind of a layer on top of POSIX threads

53:18 to make programming easier, or more high-level, than directly using POSIX threads. And the

53:30 idea is that the programmer, or the user, makes the strategic decisions of what you

53:41 want to be executed in parallel and which parts you may decide to keep sequential.

53:53 And then, for these so-called parallel regions of the code, the

54:00 compiler is supposed to be able to figure out how to do the details of,

54:08 you know, parallelizing it and sharing the work among threads, and we'll talk about

54:15 that when it comes to the various constructs being used in OpenMP.

54:22 So here's a little bit more detail on what the division of labor is between the programmer

54:28 and the compiler. Um, so, as I said, the programmer gives hints, or directives,

54:39 about what the compiler should try to figure out in terms of how to

54:45 parallelize things and how to generate multiple threads for those sections. And that's not

54:55 all. There are then what's known as clauses, allowing the programmer, also,

55:05 to decide how the workload is supposed to be shared among threads. And,

55:11 um, there are synchronization primitives. There are also constructs in the OpenMP

55:22 standard that have implicit synchronization built in, so the programmer doesn't explicitly need to put synchronization

55:29 calls into the code. One of the more tricky and difficult parts about

55:41 OpenMP, in particular when it comes to performance, but sometimes also with respect

55:48 to correctness, is to make sure that the data sharing is correct — first, with respect

56:00 to the correctness of the program and, second, with respect to the performance. So

56:07 we'll talk more about that as we go through the constructs. But that's one thing

56:13 that is non-trivial in terms of OpenMP. And as

56:22 it says here, there is no automatic parallelization in OpenMP. That means it

56:30 doesn't take a sequential code that is not annotated through directives and make it parallelizable,

56:41 if you like. So it doesn't take a sequential piece of code and try to

56:46 figure out what can safely be executed in parallel. It's only in response

56:54 to directives from the programmer. So the compiler then takes these directives and tries

57:06 to parallelize: basically, it generates a number of threads, and we'll talk about that

57:15 in more detail — you know, both the syntax being used and

57:21 what happens. And then it does the load sharing: either it has some

57:27 default rules, or it follows the instructions given with the constructs for how to share

57:33 the load among the threads. And, as I said, there is some synchronization.

57:46 Now, I think I have mentioned threads a few times, and I

57:52 may even have, uh, defined what a thread is before. But just

57:59 in case anyone is uncertain: a thread is kind of the atomic, or minimal,

58:08 unit in this case that is an independent flow of control. So it's an execution

58:15 stream, if you like, that has its own program counter and has its own register

58:22 state. It has its own associated stack, but there's a little bit of subtlety

58:28 when it comes to OpenMP and what that means, and we'll talk about it in

58:31 terms of the memory model. But it executes a stream of instructions that

58:37 are to be executed with its own program counter and registers. So it's more

58:51 fine-grained than the process. A process may have one or many threads to carry

58:58 out the task, and it has its own address space, where threads may share the address

59:07 space, uh, designated or assigned for the process. So the process is sort

59:13 of a higher-level, or broader, concept that has threads, um, to use

59:23 to carry out the work. And this is just, maybe, you know, illustrating

59:30 a little bit: the program, or the process, has pretty much everything, including

59:36 the stack pointer, the program counter, and registers, whereas the individual threads have

59:43 their own stack pointers, program counters, and registers, but they do share things

59:50 that are common for all the threads in the process, like the total address space

59:57 and user IDs and other things, too, that may be required for

60:05 I/O and all kinds of other purposes. So why would one use OpenMP

60:15 ? So the idea is that it should be a portable program, and it is,

60:24 as long as OpenMP is supported on the platform, and the version of the

60:39 OpenMP standard is again supported on that platform. So there's a little bit of trickery, in

60:48 that not all compilers support the same standard version. So if you use,

61:00 say, a version of the standard that is more recent — something

61:07 that is supported by one compiler, like ICC, for

61:12 instance — it may not be supported by the compiler on another platform, or by GCC. So there's

61:20 a little bit of a gray zone in terms of the portability when it comes to

61:25 functionality. But the main idea is that the same source code, with

61:31 the given set of directives, should be compatible with any platform out there.

61:41 But there is no guarantee that you also get good performance. If one has optimized an

61:50 OpenMP code for a particular platform — that means taking into account the features of

61:57 that platform — it may not work all that well on another platform that doesn't have the

62:04 same set of features. And the way one uses directives is typically to start with

62:13 a sequential code, and then you add directives. So again, you use

62:21 a profiling tool to figure out what part of the code is most time-consuming, and

62:28 if you then also realize that those pieces of the code could very well be executed in

62:37 parallel, you can start annotating the program, trying to parallelize those sections of the code,

62:43 and then, if successful, move on to parts of the code that, after parallelization, are

62:52 maybe new pieces of the code that are now the most time-consuming. But again

63:03 , optimizing things for performance is not quite so trivial. All right,

63:13 so we'll talk about that more. So, here's the basic idea of how

63:23 OpenMP kind of works. As mentioned, there are these constructs,

63:31 in the form of directives in the source code, that inform the compiler that certain segments of

63:40 the code are things that one would like to see parallelized, and those are known as parallel

63:47 regions. And when the region comes to an end, then one is back in the

63:55 sequential processing domain. And so there is one thread that runs through all the

64:04 sequential regions that you may have in the code, and that is known as the master thread.

64:09 And it always kind of lives on. Then, in a parallel region, more

64:15 threads are spawned, as illustrated by the pieces in this diagram.

64:23 — Yes? So is this more or less a client-server model for OpenMP?

64:30 Yeah, well, one typically views it more as a fork-

64:34 join. Since it's not different devices — client-server tends to mean that there is one server

64:41 that can then farm out work to different clients. In this case, it's

64:50 the same computing device, that node or even chip, that can then support more

64:59 than a single thread. So, logically, it's typically viewed as fork-join

65:06 parallelism. Okay, so now, it could be that the threads that are created

65:15 in this forking action get handed out to clients. But that is kind of rare

65:23 — that's more often done at the process level. So it's then a

65:32 lot more heavyweight than just, kind of, allocating a separate set of registers

65:42 and stacks and program counters, because the threads may grab instructions from the same place as

65:52 the master thread. So there is not necessarily any program handed over to some

65:57 other server. So that's why I hesitated, because conceptually it's similar.

66:08 I hesitated because of the difference: this is a lot more lightweight than what

66:14 tends to happen in a client-server model. Um, so this is again

66:26 the basic idea in shared memory programming: that there is this fork

66:32 and join of threads. There is synchronization that happens, and we'll talk more

66:37 about all these different constructs. But, you know, once you have created a

66:43 number of threads, then the question is, yeah, how is the work that is

66:47 supposed to be carried out in the region — how is it going

66:53 to be shared among the threads? We will be talking about more of the

67:02 details in the next lecture. I'll talk a little bit more about what happens in the next

67:10 few minutes. So here's the basic way to make things happen.

67:17 You create a parallel region, and one uses, as I mentioned a few times, an

67:23 OpenMP construct, and there's slightly different syntax in Fortran and C. So most

67:35 students today, they know C and C plus plus and, uh, maybe

67:40 somebody also knows Fortran. So it has a slightly different format in Fortran that I

67:48 won't show on this slide. So one has this pragma that is an

67:56 input to the compiler; then comes the OpenMP construct, which has a name and

68:05 a collection of clauses. And I will talk about a number of these constructs,

68:10 and show one today, I think, and the clauses then have to

68:13 deal with how you want the construct to be carried out. But the basic idea

68:20 is you create this region through this pragma, and then the region is defined

68:27 and enclosed within the curly brackets here. Then, at the end of this parallel region

68:34 , there is an implicit synchronization. One doesn't need to explicitly synchronize the threads that are

68:41 working on this parallel region. So, again, the clauses can specify the number of

68:49 threads and other behaviors, and we'll talk about that. Here is, I

68:56 think, one example that I'll show today: the parallel construct in its simplest form to start a region. Now,

69:06 there are different forms of the parallel construct, and this is the very generic one that

69:10 says: what comes next, I want to be a parallel region, with a number of threads

69:17 . It doesn't tell how many threads to use, but you can do that, and I'll

69:21 show how that can be done later. Uh, if you don't specify it,

69:28 you leave it entirely to the operating system to decide how many threads to use by

69:35 whatever algorithm it chooses, and it can choose to do it based

69:40 on whatever number of threads is available at that particular instant in time on the

69:48 server you're running on. And the availability may not just be based on which

69:57 threads are in use. But, as we discussed, it also takes, um, temperature and

70:03 heat state into account, so it also may limit the number of threads based on what it

70:09 thinks the hardware can take without getting too hot. So here is just a

70:17 little bit of an illustration. What the left-hand side shows is

70:26 OpenMP code that, as in this case, actually specifies

70:31 , um, that for the parallel region that will be entered after this

70:39 pragma, the programmer wanted four threads, and then, on the right-hand side,

70:45 it shows what the compiler does in inserting code in terms of generating the

70:50 four threads, getting the particular code to then make sure that the four threads are

70:56 generated and eventually synchronized again.

71:08 Then, as I said, there's also the memory model, which is important, and then we're going

71:17 to talk about various constructs. But the shared memory model is the default unless things are specified otherwise.

71:30 All threads share the same address space. So, potentially, you start the program

71:41 — it is sequential, and the program has an address space — and then you enter a

71:46 parallel region. Yeah, uh, a number of threads are generated, and the

71:52 threads can access any part of memory unless things are specified otherwise, and that can

72:01 cause headaches, in that you don't necessarily control the order and the progress of individual threads.

72:11 So there is no guarantee that threads execute statements in the order you expect

72:19 them to. That can drive you crazy because, um, things no longer appear

72:28 to be deterministic compared to the sequential code, because things are not enforced in any particular

72:33 order. So, for that reason, one may choose to have certain data be private

72:43 to threads. And in that case, only that thread can access the

72:49 data. And I'll talk more about how these things work. There are, then, clauses

72:55 that you have in order to describe what the rules for the

73:02 data accesses should be. Um, so: is the private data in its own address

73:10 space, or is it still in the same global address space, just inaccessible to other

73:15 threads? What happens in general, in practice, is that it, um, stays

73:25 within the address space of the process. So it's in the global address space

73:30 , but those memory locations are only accessible to the thread. Okay, so the

73:38 new memory gets allocated within the global address space for the program, so it's in

73:46 the program's address space, but it's unique to the thread. So I'll talk

73:57 more about that. So that, uh, is the safe condition. Now,

74:06 if you declare data to be private, it's initially uninitialized, unless you specify

74:13 it otherwise. So you need to be careful and not believe, even if you

74:19 use the same name, that it automatically makes things from the global address space accessible to

74:24 everybody, or that at the end the results become accessible — because then, again, the memory is gone

74:33 when the thread dies; it's all gone. So there is trickiness. We'll talk about that.

74:41 So this is a little bit more of the structure. So, I mentioned so far

74:46 the one parallel construct, basically to create the parallel region. Then there are work-sharing

74:53 constructs that I haven't talked about, but we'll talk about them. I just mentioned

74:58 this data model for the data environment, and we'll talk about how that works in

75:03 terms of shared and private variables. Synchronization I mentioned. And then there are environment

75:10 variables that I was talking about. And I thought I had a

75:17 few minutes left, so I'll show a quick example. And then I'll

75:22 do a repeat next lecture, just to show this non-determinism that happens.

75:27 Basic blocks I talked about before. A basic block is a piece of code —

75:31 for those who are not familiar with, I guess, compiler terminology — it's a maximal

75:37 set of instructions that has just a single entry point at the beginning and an exit

75:43 point at the end, and no entry points in the middle. So here's an example

75:48 of a piece of code. And here's now one basic block.

75:57 In this particular code it is labeled B1. And the reason that statement five is not included

76:02 is because, if you look at the statements, there is a goto statement targeting five,

76:08 so you cannot jump into a basic block. So that's why five starts a new block

76:14 . And then there's nothing jumping into any of statements six and seven or eight either

76:19 . So this is now the second basic block, and then we can look at

76:25 the next basic block. There's nothing jumping into it, and then we come

76:29 to nine. So that again starts a block, since nine is a jump target, as

76:36 well as the goto five statement ending that basic block. And then we have the

76:42 statement 13 that goes to 23, so that definitely ends the block. And if we

76:48 go to 23, we see it's basically an exit, and nothing happens there and nothing else jumps

76:54 into it. Then statement 22 has a goto five, so that's —

77:00 sorry, this one is its own block — and we have an exit and a goto five

77:06 . There was just one statement there. Then there are these two other basic blocks.

77:11 So here is basically what the code looks like in terms of basic blocks

77:17 . And I will just stop here, but I will repeat it next time.

77:22 On these next couple of slides — so here, um, I took a generic Hello World

77:28 . And I will leave that in the slides on the website, by the way

77:32 . This will start a parallel region through the pragma omp parallel. It doesn't specify

77:37 any number of threads that I want to have, um, but then I

77:43 want to print out Hello World, and it asks for the thread ID first

77:53 . So that's what you get: each thread first gets its ID, and

77:58 then you run this code, and here's the output. And, as you can see

78:03 , it is kind of jumbled, because there is no particular order enforced in which the threads

78:09 get to print out where they are in the execution of the code. But we

78:14 will talk about that next time. My time is up, so I'll take questions.

78:26 Yeah. So next time I will

78:34 talk more again about the execution model and how to control it, the

78:44 management of what's shared and what's private, among other things. Because that's, to

78:52 me personally, the most tricky part, and it affects both correctness and performance. Yeah,
