00:03 So last time I sort of brought you to MPI and

00:12 its intention and scope in terms of programming, and also gave a little bit of

00:20 background on the basic structure of MPI. And

00:28 it is for clusters, so there is no global memory. So I will

00:41 just briefly remind you about the context in the next couple of slides. But

00:46 the focus of today's lecture will actually be talking about the message passing

00:54 in MPI in a couple of different contexts. First, uh,

01:01 the sends, the sending and receiving, what was mentioned last time as point-to-

01:07 point communication, as opposed to the collectives that deal with groups of processes.

01:19 Then we'll probably, hopefully, get a little into the one-sided communication. We'll

01:23 see how far I get with that. I wanted Yash to put together kind of a

01:30 demo of MPI at work, so I will leave some time for him to

01:37 demo MPI. So here's again the context: the collection of independent,

01:44 complete computer systems, each with its own memory, its own disk, and everything else, that

01:52 are basically put together to form a cluster through some form of interconnection network.

02:00 So it forms, again, a hierarchy of parallelism, so to speak: not only

02:06 within a node, but potentially across nodes, depending on

02:12 how the network is put together. But it also means that

02:18 all the different capabilities need to be managed, which is again reflected in the design of MPI itself.

02:26 All right, so this is the point I mentioned:

02:32 no global memory. There are only independent address spaces, and each

02:40 is related to the individual node's memory being used. And the dominant model is

02:48 that the same program runs on all of the nodes. How to distribute your data,

02:57 and how things get spread across the cluster, is something

03:02 that the programmer has to decide and figure out how to realize, as well as the

03:10 associated communication, to get the processes to cooperate to solve the problem

03:15 one is trying to get solved. I also said that the message structure is kind of typical of what

03:22 one sees in a networking class. It's sort of fairly intuitive: you have a payload, and

03:28 then you have a header of some sort, which MPI calls an envelope, that

03:33 carries a little bit of source and destination information; we'll talk about that more.

03:41 Um, and there is this notion of the communicator that is used to form

03:57 groups of processes that one wants to allow to exchange messages between them without interfering with other

04:08 groups of processes. So that's the role of the communicator, and we'll talk a

04:16 bit more about them today. And here is the picture, I think, that we more or

04:21 less finished with last lecture, that shows how these communicators can be defined. I also pointed

04:31 out that the numbering of processes, known as ranks, is local to each communicator, and

04:42 they will start from zero and, as I said, are assigned sequentially. And the question came

04:51 up about the ordering, or labeling, and I will not address that much

04:58 more today than I did last time. But I will talk more about it next

05:04 lecture. So now to what's new today, and that is the details of

05:14 messaging. So there are four modes of sends, and corresponding receives, and I'll say

05:27 a little bit more about these four modes: there is the standard,

05:32 synchronous, buffered, and ready mode. And here is a little bit more description. Again, yes,

05:40 I admit it's a text-heavy, a little busy slide for presentation purposes, but it

05:45 describes a little bit what the features are of, in this case, the

05:53 standard mode, and the synchronous mode. And, um, the idea is

06:08 that a process wants to send to another process, and the send

06:18 then needs, on one hand, a matching

06:32 receive in the receiving process. But there are some details that are important to understand.

06:42 So what it tries to do, for the standard mode, is that

06:48 one can, um, post the send, or run the call

06:56 to the send routine, without first checking whether the receiving process has actually gotten to the point

07:08 where it has executed a corresponding receive call, uh, for the message that is being sent

07:18 to it. So the point is that the process that calls the MPI send does not

07:29 have to wait until the receive is posted to the communication library by the receiving process.

07:41 The other thing, which is a little bit tricky, is that

07:52 when the send completes depends on how the communication library is implemented. So

08:04 it completes at the point where the communication system takes full responsibility,

08:14 sort of, for the message: it is either delivered or properly buffered. So the send buffer,

08:26 where the message was initially placed, can be reused. So there is a

08:33 little bit of gray in terms of what the status actually is of the message

08:38 that is being sent. Um, next to the standard send, the synchronous send is

08:51 more of a real handshake. So it says that it will not,

09:02 um, complete before there is a handshake with the receiving process. Whether the message has actually been

09:10 received at the other end is a different matter, but at least it is

09:15 clear that there has been some acknowledgment, and the process of getting the message delivered

09:23 has started. The buffered mode is basically handing the message to a buffer, and then,

09:34 um, the buffered send has sort of done its job, so to

09:42 speak, and the code can proceed beyond the buffered send. Whereas the ready mode

09:54 means that there needs to be a receiving call, that is, an

10:02 MPI receive call in the receiving process, already in place before the ready send can

10:09 be executed. So there is a kind of association there. There will be a little

10:17 more; there's more to it in practice, which will come on the next few

10:22 slides. So that's concerning the modes. The point is, one has to be very careful

10:26 and take a good look at exactly what the conditions are for completion of these

10:34 modes, in order to both count on messages being received or not, and know when

10:44 one can proceed to potentially reuse buffers. Um, and there's also this notion of

10:54 blocking and non-blocking communication. And in blocking communication, something that calls any one of these

11:02 routines will not complete until the conditions I just talked about have been met. And

11:13 non-blocking communication means that, you know, you request the communication, or the

11:23 sending or receiving, and then you initiate it and hope that things go well, and

11:29 the code can proceed beyond, uh, you know, this library call. Now

11:38 you are, uh, dependent, for correctness of the code, on things actually being properly

11:47 completed. So if one uses non-blocking communications, there are ways of trying to check

11:52 that, I think, the correct things actually happened. But before I talk about this, I

11:57 guess I have somewhere a slide that shows a little bit the different versions, the

12:07 blocking versions and non-blocking send functions; the corresponding things for the receives I

12:12 didn't put on the slide. But the non-blocking ones have an I, for "immediate",

12:18 in front of the letter for the communication mode of the send, so that you can tell

12:27 whether it is a standard, synchronous, buffered, or ready send. Mhm. Um, and this slide just describes

12:38 a little bit, uh, the structure of the send, in this

12:44 case; the receive, I believe, is on the next

12:49 slide. And so, if you look at the argument list for the call to

12:54 the library routine, you will see that the first three arguments in this case are

13:03 describing the payload, by pointing to the

13:10 starting address in memory where to find it, and then, um, giving

13:18 how many counts of the particular data type is being used for putting together

13:28 the message, all the data that will be sent. So last time we talked

13:33 about the different data types, and how data types can be user-defined.

13:41 So it doesn't necessarily need to be a contiguous block of memory of, say, integers

13:49 or doubles or something. But it can be, um, a collection of memory addresses being defined

13:57 as one of the data types, that gets repeated a count number of times. Then it also identifies, uh,

14:07 through the envelope information, uh, the destination process, which is in the

14:17 same communicator as the sending process. So this is where the communicator plays its role. Because every

14:29 process runs this code, and the library itself uses this information.

14:35 So when messages come in, the library then knows which communicator the message

14:46 arrives in, and makes sure it is delivered properly. And then, I guess, the next slide

14:57 is for the receive, and that's kind of the same thing. Of course, it first describes the

15:01 payload, and secondly describes, uh, the envelope aspect. And then it has a status

15:09 argument that allows you to figure out whether things went well or not.

15:20 Um, hmm, yeah, this slide is a bit of a repeat, again to specify that the

15:30 blocking send does not necessarily guarantee that things have been received; it just asserts,

15:40 before it proceeds, that the communication system has taken the responsibility for the message

15:49 getting delivered, and that it may itself have used buffer spaces, so that it is not the responsibility

15:59 of the application to make sure that, from the application's point of view, your

16:05 send buffer is safe for reuse. One more thing, yes: here comes again

16:16 a caveat. Um, the counts in the sends and receives don't necessarily need

16:26 to match. But if you send more items than the receiver is prepared to

16:40 receive, then things don't go too well. On the other hand, if you

16:45 send less, then the receiver is fine.

16:55 And then, I guess, there is this example, uh, of the simple send and receive, where you

17:07 can see the two calls. Again, for sending and receiving there have to be matching sends

17:11 and receives; otherwise the code may hang, or things go bad. What's shown

17:22 is, uh, the various parts in terms of pointing to the locations in memory

17:32 in the respective, uh, processes, the sending and receiving process: where in memory you

17:40 want either to retrieve the data from, or to start

17:47 depositing the received data. Then there is the count, and then there

17:50 is a double, or in this case whatever the data type is, to say exactly how

17:55 much there is and potentially where things are supposed to be deposited. And then,

18:05 in this case, we have the destination part, where the sender says "I want

18:12 to send this data to process 1." And then, um, in this

18:20 case, as each process runs its own code, it is the

18:25 responsibility of the programmer to figure

18:32 out which processes are meant to send and receive the data. So in

18:38 this case, only one of the processes, in this case the one with rank

18:43 zero, will send a message, and only one of them will receive that

18:48 message, namely the one with rank one. And then there has to be a

18:55 matching source and destination being used in the calls to these library routines. And then you

19:07 also specify the communicator in which these

19:16 processes live. In this case, it was the default MPI_COMM_WORLD communicator that includes

19:24 all the processes, but it could be any other user-defined communicator that is

19:30 used, to which the sending and receiving processes, uh, both

19:37 belong. And then there is the tag, which, as I said, lets sender and receiver, uh,

19:43 keep track of which message is which between the two, in case several

19:49 messages are being sent. It gives you an identity, sort of, for each of the

19:57 messages. Otherwise, it's just an amount of data and the communicator and source and

20:04 destination, but not necessarily what the attributes are for these messages; the tag allows you to do that.
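To make the argument lists concrete, here is a minimal sketch of the send/receive pair just described: the payload arguments come first (address, count, data type), followed by the envelope (destination or source rank, tag, communicator). The variable names are illustrative, not taken from the slide, and the sketch assumes an MPI installation and two processes, e.g. `mpirun -np 2 ./a.out`.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    double x = 0.0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        x = 3.14;
        /* payload: &x, 1, MPI_DOUBLE; envelope: dest 1, tag 0, communicator */
        MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status status;  /* lets the receiver inspect source, tag, errors */
        MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %f\n", x);
    }
    MPI_Finalize();
    return 0;
}
```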

20:11 Well, I can stop there. But first I'll talk about deadlock, because you will see on the next

20:14 slide that it is a serious problem, in terms of, or can be in terms of,

20:21 MPI. So, any questions on this basic message model, on sending

20:29 messages and describing, uh, the payload, and the source and destination, that again are local to

20:45 the same communicator? Okay. So then, the deadlock. So I guess I

21:02 can ask you — or not: the heading on the slide tells you that there is a deadlock

21:08 in this code, so that doesn't count. But one can perhaps,

21:17 can someone tell me why this may deadlock, or will

21:29 deadlock? "You have a circular dependency, because they both call receive first."

21:35 Yeah, so they're both going to be waiting for something, right? Because these are blocking communication routines. So the receive in process zero will sit

21:43 there and not proceed until, uh, it gets something from process one. And

21:55 in this case, process one also starts out, in this case,

22:03 uh, with a receive call, and waits to get something from process zero.

22:12 So that means that, since process zero is sort of waiting on the receive for something to

22:19 be received before it can proceed, it never gets to the call to send

22:25 something to process one, and similarly process one doesn't get to send something to process

22:33 zero. So this is the thing one has to watch for. That's why this particular code

22:40 deadlocks: because, again, these are blocking communication library routines, so they wait for

22:49 something, which is their completion. So this is the one thing one has to be

22:56 careful about. Um, so, as this slide talks about, someone has to kind of order

23:04 things, like what was done in this case: the order in which one does the

23:09 library calls on the two communicating processes matters, to make sure that the code can proceed.

23:17 In this case, things are fine, because we now have process zero

23:25 wanting to get something from process one, while process one starts out by

23:31 trying to send something to process zero. So that kind of message exchange

23:40 should proceed. And when the first exchange is done in the proper way, then the second pair

23:44 is also properly ordered, in terms of the sends and receives, in the proper order. And

23:53 so this one, uh, is just fine. So that's one thing.
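A sketch of the two orderings discussed: with blocking calls, both ranks posting their receive first would deadlock, while the ordering below (receive-then-send on rank 0, send-then-receive on rank 1) lets the exchange complete. This is an illustrative reconstruction of the slide's idea, not the slide's exact code, and assumes two processes under `mpirun -np 2`.

```c
#include <mpi.h>

int main(int argc, char **argv) {
    int rank;
    double in, out = 1.0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* rank 0: receive first, then send */
        MPI_Recv(&in, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&out, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* rank 1: send first, so rank 0's receive can complete */
        MPI_Send(&out, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        MPI_Recv(&in, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}
```

If both branches instead began with `MPI_Recv`, each blocking receive would wait on a send that is never reached: the circular dependency described above.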

23:57 When you write your send-receives, be careful in terms of the ordering on the

24:05 sending and receiving processes, so that you don't, uh, call the library sends and receives in an

24:15 order that causes things to hang. Now, one can gamble, as this slide

24:25 shows. Because, as was, uh, mentioned in terms of the

24:33 standard send, it can proceed if the communication library has taken care of business and

24:40 lets the application code proceed. Then it may work out fine, but the

24:48 library may not, uh, behave that way on the platform you're

24:54 using; that behavior, that is, is not guaranteed in the general case. And then it will

25:00 deadlock. So this is what one may call MPI's way of avoiding the deadlock: again, because of the

25:09 uncertainty about whether the send can proceed, the MPI library routines help out.

25:22 And so there is this kind of combined send-and-receive routine, where the MP-

25:33 I library then takes care of avoiding deadlock. So in this case, I think

25:42 there's an example on the next slide. So, by using the send-receive routine, which

25:49 handles both receiving and sending on the involved processes, MP-

25:58 I can make sure that the matching is correct, so to speak, between the sends

26:05 and receives. So it combines the sends and receives into a single statement, and allows, in

26:12 this case, a message exchange to take place without problems.
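A sketch of the combined routine, MPI_Sendrecv, for a two-process exchange: each rank sends its value to its partner and receives the partner's value in one call, with the library responsible for avoiding deadlock. The variable names are illustrative, and the sketch assumes two processes.

```c
#include <mpi.h>

int main(int argc, char **argv) {
    int rank;
    double mine, theirs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    mine = (double)rank;
    int partner = 1 - rank;                            /* 0 <-> 1 */
    MPI_Sendrecv(&mine,   1, MPI_DOUBLE, partner, 0,   /* send part  */
                 &theirs, 1, MPI_DOUBLE, partner, 0,   /* recv part  */
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* both ranks now hold each other's value, with no ordering to get wrong */
    MPI_Finalize();
    return 0;
}
```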

26:24 And message exchange, you know, is really not uncommon. And I guess next

26:30 time, I think, you will have an example where you use message exchange.

26:41 Um, and the example will be, uh, why not, again one of the canonical examples that comes

26:47 back again and again: the Jacobi iterative solver, which has shown up a

26:53 couple of times already in this course. So then you can imagine that if you

26:59 have different of these grid points being mapped to different processes, then, since the

27:08 Jacobi iteration computes each point from its nearest-neighbor points' values, that means,

27:19 uh, there will be, in fact, exchange of data between neighboring grid points.

27:25 So if you look at the grid, each point sort of grabs things from its

27:31 neighbors. That means one point grabs something from its right neighbor. On the other

27:37 hand, that right neighbor also needs something from its left neighbor. So that becomes an exchange.

27:42 So exchanges are common in kind of many algorithms. So send-receive is a

27:48 very good, um, way, in this case, to accomplish it without risking dead-

27:54 lock. And I think... uh huh, a duplicate. Sorry, I'm just going

28:16 back. Sorry. Right now I don't see the difference in these two slides;

28:24 not sure why that happened. Anyway, so a bit more about this: if you want to

28:32 use non-blocking communications, in that case the code proceeds without necessarily knowing that

28:40 it's safe to do so. So it is really up to the programmer to

28:44 make sure that the code returns the correct result. And, as something I mentioned,

28:56 then, if you depend on things, um, being completed in the correct way before

29:05 you do certain operations or actions in the code, then, um, one can

29:13 use what I'll say on this slide; I'll talk to that slide in a second.

29:17 You can use this MPI wait call. Because, when you do an

29:24 immediate send, yeah, you get a handle for that send, and then you can

29:34 test later on whether that particular send call has been completed properly or not. And

29:45 this is the way that you can assure correctness of the code, if it depends

29:52 on the send having been completed: by waiting until the communication library tells you that it

30:01 is safe to proceed. And the point is, as I said on the previous slide, that

30:11 you potentially can get better performance by not waiting some time until the communication library tells you that it's safe to proceed.
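A sketch of the immediate send with MPI_Wait: MPI_Isend returns a request handle, work that does not touch the send buffer can proceed, and MPI_Wait blocks until the library says the buffer is safe to reuse. The names are illustrative, assuming two processes.

```c
#include <mpi.h>

int main(int argc, char **argv) {
    int rank;
    double x = 1.0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Request req;   /* handle used to test or wait on the send */
        MPI_Isend(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
        /* ... other work that does NOT modify x ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* completed: x may now be reused */
        x = 2.0;
    } else if (rank == 1) {
        MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}
```

MPI_Test is the non-blocking counterpart of MPI_Wait when you only want to check completion without stalling.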

30:20 So, I guess, the other way

30:30 to synchronize routines is that MPI also has a barrier, which then forces all the processes in a communicator

30:39 to not proceed past the barrier until they all get there. So it's very similar

30:46 to the barrier we talked about for OpenMP. And so: any questions so

31:02 far? "Um, yeah. I'm not quite understanding why an M-

31:09 PI send is blocking." Yeah, so that's... funny you ask. It's kind of

31:17 subtle, and it confused me a little bit too. The point is that it does not proceed until

31:26 the implementation of the, uh, message passing, basically the

31:34 library, frees the buffer. "Okay, so it's the buffer that it makes the

31:45 send from." Okay, the key is that the communication, the underlying communication software,

31:53 so to speak, has either internally buffered the message, so you can reuse that buffer and,

32:01 say, overwrite it, the application can reuse it. And depending upon how long

32:07 that takes, that is what the wait is. It may sit there until it has

32:13 got a signal from the actual communication software saying "I now have it", or "it is

32:20 now delivered", depending upon how the library is implemented. When you do a non-blocking,

32:30 immediate send, it doesn't care. And you then

32:34 have to be careful not to overwrite the buffer before the communication routines take the first version

32:46 of the thing that you wanted to send. Does that clarify?

32:54 "Uh, yeah. I'm still grappling with the immediate send now. Um, but it's much more clear than it was before." The blocking

32:59 send, as I said, tells the communication software there is data and where it is.

33:13 So it may copy what you send from memory — the communication library,

33:22 when MPI send is called, may copy it for you — or it may not generate a copy

33:29 and, yes, just point to the data in memory and assume the communication library will

33:35 get it from there. And if you use the blocking version of send, your

33:45 code, that may otherwise be working on that piece of memory, will not try to

33:53 modify the content of that memory until, uh, the communication library tells you "I have either

34:01 delivered it to the other end, or I have it in a buffer somewhere."

34:09 With the immediate send, then, once the call is done, whether the library has actually

34:16 retrieved the data from memory or not, you don't know. So if you then

34:20 change, say, the data in memory before the communication library goes and gets

34:27 it, it may be different from what you intended to send.

34:36 "So, with an immediate send: since this is all happening within... I mean, one process is sending, the

34:40 other one is receiving. But within the process that's sending, does it spawn another thread

34:45 that does the sending, so that it happens simultaneously, um, actually in parallel,

34:50 after the immediate send?" Yes. So there is parallelism going on within

34:55 a single process. "Okay, that makes sense." It may be independent threads, or

35:03 maybe more heavyweight processes, depending upon how the communication libraries are implemented. But there

35:09 are concurrent operations, potentially. So the only thing that limits the concurrency is

35:19 this blocking, or waiting until things are again safe to be reused. "Okay, great.

35:27 Thank you, Dr. Johnson." No, no, it's very good

35:30 that this came up; these are the shades of gray here. So it's nontrivial to totally

35:36 know what actually happens underneath. All right, so, yes. So this was...

35:50 there is one other aspect that is important: that, for as long as the processes are,

36:02 um, single-threaded, things are kind of guaranteed to be in order. But otherwise it

36:07 may require that you use more information to figure out what order messages are in, and

36:18 which messages are received. That is, in, uh, multi-threaded codes you may need,

36:24 uh, to take actions to make sure that the order that you want to happen is

36:30 actually what happens. If it's a single thread, the order of delivery is guaranteed. Um,

36:44 otherwise not. And fairness is not guaranteed, so things can, or processes may, get

37:00 a very unfair share, or use, of communication actions or messages.

37:18 Um, so that's what I had in terms of the point-to-

37:23 point aspect of MPI. Um, and so, in terms of eventually coming

37:37 to understand the code behavior: it's not so simple, given that, in terms of

37:47 the blocking, it's not exactly clear what the code is going to do; it

37:53 depends again, as we talked about, on how the communication library does things. The

37:59 blocking ones... the non-blocking ones are, in a way, much more clear about what happens,

38:06 but then correctness of the code becomes an issue for the programmer, if it depends on

38:16 certain behaviors in the sends and receives. Um, so it's a little

38:27 different, I guess; not exactly the same issues as what we just talked about for the sends

38:33 and receives; there it was all about sharing. And, um, that was also, I

38:41 think, for me, one of the trickier, more subtle parts of OpenMP: in

38:47 terms of shared variables versus private variables, making sure that things are handled correctly.

39:00 Now, the next thing is what is known as collective communication, and reduction. Reduction, again, was something

39:10 we talked about in terms of OpenMP: that was getting data from multiple threads,

39:17 in that case, and doing some form of reduction, or combining, um, the values

39:24 from them. And, uh, for MPI there is also then

39:32 the corresponding thing between processes. But there are also a few things that I don't think

39:38 we talked about when it came to OpenMP, or that it doesn't have.

39:42 Broadcast we did mention, uh; as such, there is also functionality there to

39:50 copy things to multiple threads. So there is broadcast and reduction, and then there

39:59 are also the gather and scatter. Uh, maybe I did not cover it all: when we

40:05 talked about OpenMP, I talked about gather and scatter in the context of vectorization as a

40:10 concept. And then MPI has also what's known as all-to-all

40:15 routines, which I will talk about in a bit. Um, and we're probably not going to

40:22 get to the one-sided communication, because I wanted Yash to have time

40:27 to do some demos of MPI today; and then I will continue next

40:34 time. So now on to the collective communication. And so that's between groups of

40:44 processes within a communicator. So here is, uh, a sort of pictorial illustration of

40:55 them, um, these collective communications. The broadcast is basically copying data from one node,

41:05 or process, the root, and sharing it with each of the other processes within the communicator. Scatter is a

41:17 way of dividing up what is on the root process and distributing the pieces to the receiving processes.

41:28 Reduction is the kind of inverse, or converse, of broadcast, collecting things up, as

41:34 we talked about before. And gather and scatter are ways in which,

41:43 yeah, gather is just collecting things from the different processes into a single process,

41:47 and scatter is in a sense the opposite. Also, I'll talk about

41:53 what "all" means: if you do an all-broadcast, which we'll talk about,

41:59 for instance, that means that everybody sends to everybody else. Okay. So now,

42:06 again, to the broadcast; the structure is as illustrated here: you have one process that

42:14 wants to share, in this case, the variable A with all the other processes,

42:19 and then you can use an MPI broadcast to do so, pointing,

42:24 in this case, um, to the starting address in memory where you have the data you

42:30 want to share, and, on the other processes, where to put it. You tell the data type, and how many of

42:36 those you want to share, with the count. And then there is the communicator

42:43 that again defines which group of processes, which processes, will share this data in

42:52 the MPI broadcast, as well as what the source is for the data. So all processes call

43:01 these routines, and again, every one of them then, uh, knows the root.

43:07 Those that have the same communicator as the one generating the message then will receive what is being broadcast.
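A minimal sketch of the broadcast just described: only the root (rank 0 here) initializes the variable, and after the call every process in the communicator has the value. Variable names are illustrative.

```c
#include <mpi.h>

int main(int argc, char **argv) {
    int rank;
    double a = 0.0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) a = 42.0;          /* only the root sets the value */
    /* buffer, count, datatype, root, communicator; all ranks make this call */
    MPI_Bcast(&a, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    /* every rank in the communicator now holds a == 42.0 */
    MPI_Finalize();
    return 0;
}
```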

43:16 Yeah, and here is just a very simple example: instead of

43:23 just initializing on every process, there is just one process that initializes, and

43:28 instead of it being set on every process, it gets broadcast from there. This is many times more of a

43:33 realistic thing. Um, because that is, for instance, the situation in lots

43:49 of algorithms we have had: I guess the value doesn't just depend on what any one process

43:56 knows. You can think of iterative methods: in that case, one process is forming

44:00 some values that then need to be broadcast out to everybody else. So

44:05 it's not that you set something from the start; rather, for instance, you use reductions, and

44:10 then the result of the reductions is what gets shared back out to the other

44:15 processes. So it's useful. Okay. Yes, here is the reduction part.

44:26 In that case, again, the, er, the communicator is specified. Um, there are

44:34 the local memories for each one of the processes, where you want, um, the data

44:41 to be retrieved from, and how much data it is; in this case it's a simple

44:47 case, just one floating-point value. And then where you want the results to end up:

44:56 the, uh, sort of target, or receiving, process for the reduction. So that, again,

45:09 it's within the same communicator that you can do the reduction operation. And here are

45:20 the kinds of reduction operators that are typically

45:27 supported. There is, you know, the sum, the product (multiplication), and logical operations, as well as potentially finding the minimum

45:36 and maximum, including the particular process rank that has this maximum value, which is

45:46 sometimes useful to know. Um, and I think this is, um,

45:57 another little reduction example that I put in, showing how one can do, in this

46:04 case, just local sums for the computation within each process, and then form

46:12 the, uh, the global sum by summing up the partial local sums. And then, I

46:21 think, yes, here I also put in the more concrete example.

46:25 This is the pi example I used for OpenMP, but now using MPI

46:33 instead of OpenMP to do the exact same thing. And if you

46:40 look at the code, which I will not talk through in detail, it is

46:45 just the same idea: a lot of rectangles distributed, using the

46:52 simple integration method, in round-robin fashion. And, uh, the for loop that does that

47:00 sits there. And then, uh, that generates some local sums

47:07 of rectangles, and then MPI reduce is used to sum up, um, the sums

47:14 of the local rectangles across all of the processes.
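A sketch of the pi computation along the lines described: each rank accumulates its round-robin share of rectangles, and MPI_Reduce sums the partial results onto rank 0. The rectangle count and midpoint-rule evaluation here are illustrative choices, not necessarily the slide's exact code.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, n = 1000000;           /* number of rectangles (arbitrary) */
    double h, local = 0.0, pi = 0.0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    h = 1.0 / n;
    for (int i = rank; i < n; i += size) { /* round-robin distribution */
        double x = h * (i + 0.5);          /* midpoint of rectangle i */
        local += 4.0 / (1.0 + x * x) * h;  /* local partial sum */
    }
    /* sendbuf, recvbuf, count, type, op, root, comm */
    MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("pi is approximately %.6f\n", pi);
    MPI_Finalize();
    return 0;
}
```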

47:27 Um, and then here are the gather and scatter routines. And this is again, in terms of the gather: you

47:36 collect data from the various processes into one of the processes, that is the kind of

47:42 the root, the receiving process, specified by the rank attribute. Or, if you

47:51 do the opposite, then you distribute a collection of variables or data types from one

47:59 process to all the other processes. So that is, again, the gist of the gather and

48:12 scatter operations. And I should have said a little bit more specifically here, on

48:18 this slide, what the different, uh, attributes are, or the fields, in the gather call;

48:26 correspondingly, again, here is just an illustration of the gather, more concretely, and

48:34 for the scatter the same thing, mhm, what it does and what the different arguments in

48:42 the call describe. And there's just a simple example: again, the communicator, and, in this case,

48:54 the counts. And then, I also mentioned that there is an all-to-all. Um,

49:02 well, what it does: as I said, if you're doing an all-

49:09 reduce, for instance, that means that at the end, on the

49:16 call, when it's complete, they all have it:

49:28 every process in the communicator will have the result of the reduction operation. So, as it

49:36 says, effectively it kind of is like an MPI reduce, followed by an MP-

49:42 I broadcast. Now, the implementers of the MP-

49:53 I library may have been lazy, and may, in fact, implement the all-reduce, sort of, as two

49:59 independent calls: an all-reduce as a reduce, and then followed

50:06 by an MPI broadcast by the proper root. But, depending upon the platform on which

50:18 this code is being run, there may be much more efficient ways of implementing the MP-

50:26 I all-reduce than a reduce followed by a broadcast. So, and this is

50:32 something one can try to figure out when you run it on some platform: to

50:39 gauge whether the all-reduce is performing better than the separate calls. And the

50:51 all-gather is the corresponding thing that also results in every process getting data from every other process.
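A minimal sketch of the all-reduce just described: every rank contributes its value, and every rank (not just a root) ends up with the reduced result. Values here are illustrative.

```c
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, sum;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* like MPI_Reduce, but with no root: every rank receives the result */
    MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    /* e.g., with 4 ranks, every process now has sum == 0+1+2+3 == 6 */
    MPI_Finalize();
    return 0;
}
```

Whether the library implements this as reduce-plus-broadcast or as something smarter (e.g., a recursive-doubling exchange) is exactly the platform-dependent question raised above.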

51:01 And, um, then there is, I guess, the scan operation I haven't talked much about, which

51:10 is also known as a generalized, or as the parallel prefix, operation. And what it

51:21 does is that, in the prefix, you, uh,

51:34 get sort of a running sum, if the

51:47 reduction operation is plus. So, whatever variables you are trying to, let's say, add

51:58 together: then process one gets, uh, the sum of what it has, as well

52:06 as that from process zero; process two gets the sum of itself plus the sums from zero

52:12 and one; and so forth. So you get the successive accumulation of stuff, if it is a

52:21 plus for the operator. For this kind of operation, there have even been programming languages defined that

52:30 use the parallel prefix as a primitive operation, or language attribute. Um, and one can write pretty nice code if scans are implemented well.
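As an illustration of these semantics, independent of MPI, here is a plain-C version of the inclusive prefix computation that MPI_Scan performs with a plus operator; the array stands in for the per-process contributions, and the function name is a hypothetical helper, not an MPI routine.

```c
/* Plain-C illustration of what MPI_Scan with MPI_SUM computes:
   result[i] is the inclusive prefix sum of the values held by
   "ranks" 0..i, i.e. rank i receives vals[0] + ... + vals[i].
   For example, vals {5, 1, 2, 4} yields result {5, 6, 8, 12}. */
void inclusive_scan(const int *vals, int *result, int n) {
    int running = 0;
    for (int i = 0; i < n; i++) {
        running += vals[i];   /* rank i contributes its value */
        result[i] = running;  /* rank i receives the running sum so far */
    }
}
```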

52:42 So, I guess, here was just a code example of the all-gather that was

52:49 talked about on the previous slide; this piece of code shows, I think, how one can

52:56 use it, right? And you can take a look at those. Uh, and on this

53:07 slide, um, yes, there is also the all-to-all, which is the scattered version of the corresponding,

53:20 er, the counterpart to the all-gather. And I think I was about to stop there. Yes, I'll

53:25 stop and take questions, and I would then suggest we move on to the demo. Oh,

53:35 and, questions first, before Yash takes over. Then, I don't know if you

53:49 had joined: one of the first things I said in the lecture was that I wanted

53:55 to get back a little bit to the process ordering, and I said I would talk more

54:03 about that today; but we will be talking about it next time, okay? "You know,

54:10 for the shared memory... like, how the rank assignment works; it's more complex than thread

54:17 assignment in OpenMP." I'll talk about that. Next time we talk about one-

54:30 sided communication, and then I'll talk a little bit more about how processes are

54:37 allocated and ordered, next time. "Okay." So, yeah, right. "You

54:50 are still sharing your screen." Oh, sure. Okay. All right.

55:00 So, yeah. So this will be a demo of the basics of the MPI

55:09 functions that you need to use in the assignment. It is pretty much what was

55:15 discussed; after having learned it in the lecture, you get to see it more in an

55:20 actual code format now. Um, so, as you can see, uh, this demo

55:26 will be given on the Bridges computer. I'm already on a compute node; this is

55:32 a simple compute node that you can get access to using the, uh, "interact" command.

55:38 Uh, now, the first thing on Bridges that I've noticed recently is that you

55:45 need to do this particular, uh, module load, uh, for the module that's called something

55:55 like intel_mpi; I think that is the one that we will

55:59 be using, uh. And, um, so that can be done with

56:10 the MPI compiler wrapper, which we will use to compile the code. Um, the first example

56:17 is basically just a sort of a hello-world program, and it basically shows,

56:21 uh, simply the skeleton of an MPI program that you need.

56:29 Um, in any simple MPI program, the first thing you need is to include

56:35 the header mpi.h, so that you are including it in your programs. Um, the next thing

56:41 that you need is, uh, to call the function MPI

56:46 _Init. As of now, we are not passing any parameters; we are basically initializing, like,

56:51 with all of the default parameters. Once you have done that, basically, you can

57:01 do any MPI operations. The second most important thing in the MPI, uh,

57:08 programs is this function towards the end, which is the MPI

57:14 _Finalize. So this function needs to be called from all the processes; it

57:18 basically cleans up what's going on on the stack, and all the state that may have been created

57:26 for your MPI processes. Um, remember: you have processes, not, uh, threads;

57:34 all the number of processes that you will spawn are actual processes

57:41 on the node, and they should not be confused with threads. Uh, now, here is

57:47 the most basic function that's provided by MPI, MPI_Comm_size, which lets you

57:55 figure out the number of processes in your runtime environment. It takes a

58:03 particular constant that is provided by, uh, by the MPI library,

58:10 MPI_COMM_WORLD. I think this constant, uh, gets initialized when you call MPI_Init,

58:17 and it contains, uh, information about all the processes that were spawned.

58:24 And this is just a variable that will get the output of this function and

58:30 will contain the number of processes. You can also call

58:38 MPI_Comm_rank, again using the communicator MPI_COMM_WORLD. So this communicator, as

58:44 I said, uh, includes all the processes in the global space, and in that global

58:52 space, you can request the rank of, uh, your particular process. Now,

59:00 one thing you notice here is that we haven't, uh, spawned any, uh,

59:06 parallel section like we do in OpenMP. Uh, the reason is that, as

59:12 soon as you call MPI_Init, everything inside, or after,

59:18 the MPI_Init call will be executed by all the processes that were spawned.

59:22 So you need to have conditions, uh, such as checks on the rank of the

59:29 process, so that the processes only do the work that you want them to do.

59:33 So it's basically a counterpart of pragma omp parallel, which means that all

59:40 the processes will be doing a lot of redundant work, basically replicating the whole computation, which

59:47 you don't want to happen. Then, uh, the next function here is MPI_Get_processor_name.

59:55 What this function does is it goes to the host file, which

60:01 is managed mostly by the operating system. It contains the names of the particular

60:09 hosts, and the name, in our case, will be, um, uh, the name of

60:15 the particular node. There is also a length argument, which I'm not

60:22 entirely sure about. Anyway, this function fills this variable with the name

60:27 of the node. And at the end, as said earlier, uh, you need to call

60:33 MPI_Finalize. Now, to compile an MPI program, we can use the compiler

60:40 wrapper mpicc for C programs, simply just providing the source code

60:53 file. And once you've done that, to execute your program, you can use

61:00 mpirun. It provides lots of different flags that you can use.

61:06 So there are different flags for process placement, uh, quite a lot of flags

61:15 you can use, which are used for different purposes. But for now, we

61:18 just use the flag -np, which stands for the number of processes, and just

61:29 one node for now, because for this demo I got access to only one node on

61:34 Bridges, which contains 14 cores. So this is just for the demo, but

61:38 usually for MPI processes you will use, uh, more than one node.

61:44 Generally, in this case, I have got access to one node. And once you have

61:50 provided the number of processes with the flag, you just give your executable name.
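The compile-and-run steps just described might look like this in practice (the file and executable names here are illustrative assumptions, not from the demo itself):

```shell
# Compile the C source with the MPI compiler wrapper.
mpicc hello_mpi.c -o hello_mpi

# Launch 4 processes; -np sets the number of processes to spawn.
mpirun -np 4 ./hello_mpi
```

On a cluster like Bridges the MPI module would need to be loaded first so that mpicc and mpirun are on the path.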

61:58 When you run it, the program prints out the processor names. In this case the

62:05 names happen to be the same for all the cores, because these, uh,

62:10 cores, uh, belong to one node. And as you can see,

62:16 uh, like in OpenMP, there is no specific ordering amongst the,

62:22 uh, the different processes that were spawned. You can see that rank one,

62:29 uh, executed before rank zero. Any questions on this? This is just the basic skeleton of an

62:39 MPI program. Mhm. Okay, let's move to the second example.
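For reference, the hello world skeleton just described might look like the following as a complete program. This is a sketch assuming a standard MPI installation; the exact demo code was not shown verbatim in the transcript:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    // Initialize the MPI runtime; must come before any other MPI call.
    MPI_Init(&argc, &argv);

    int size, rank;
    // Number of processes in the global communicator.
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    // This process's rank (0 .. size-1) within MPI_COMM_WORLD.
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    // Name of the host node this process runs on, plus its length.
    MPI_Get_processor_name(name, &name_len);

    printf("Hello from rank %d of %d on %s\n", rank, size, name);

    // Clean up the MPI runtime; must be the last MPI call.
    MPI_Finalize();
    return 0;
}
```

Compiled with mpicc and launched with, e.g., `mpirun -np 4 ./hello_mpi`; as noted above, the ranks may print in any order.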

62:50 So here, this is the example of point to point communication. As you can

62:55 see, again we have the basic skeleton: MPI_Init, Comm_size and Comm_rank.

63:00 Uh, what I'm doing in this example is I have set the source to

63:06 be rank zero and the destination to be rank one. I have just one integer that I'll

63:13 be sending from the source to the destination. The count is the number of elements that

63:22 we will be sending, and the tag is left with some value. But again, as I

63:26 also mentioned, the tag is just a way of communicating, amongst the processes, what,

63:32 uh, what kind of data is being sent. The main part of the

63:37 program is, uh, in this section here. So in this section, um,

63:46 we compare the rank that we got using MPI_Comm_rank to the source.

63:52 So again, as I said, everything after MPI_Init is executed by all the

63:58 processes that were spawned. So we need to make sure that the functions that we

64:03 want to execute are being executed by the processes that we want. So in this

64:09 case, uh, we first check if the world rank, uh, matches our source

64:15 process ID, and only in that source process do we make

64:21 our, our data available and then execute MPI_Send. This is the typical

64:30 syntax for MPI_Send: provide the address of your buffer, provide the count of the elements

64:38 you want to send. You also need to provide the data type. In

64:42 this case, MPI has built-in data types so that there is

64:46 no mismatch amongst the different architectures. If there are different architectures for the different processors

64:53 you're using in your MPI world, so to say, then MPI has,

64:59 uh, defined general data types so that everyone can match what data is

65:07 being sent. So that's MPI_INT in this case. Then the destination process rank, and this

65:17 is the tag, and then you need to also provide the communicator, uh, which in

65:21 this case happens to be the global communicator, MPI_COMM_WORLD, because the communicator is what is

65:27 used to identify the ranks, uh, inside, inside a group of processes.

65:37 And then, once you have implemented MPI_Send for your source process, you

65:43 also, you need to also have MPI_Recv for your destination. So

65:49 again, another condition here makes sure that we only, uh,

65:54 uh, call MPI_Recv for our destination process, and the syntax is very much the same.

66:02 Uh, the same syntax here. MPI_STATUS_IGNORE is a constant provided by

66:09 MPI, which basically says that we do not care whether the, uh, the

66:16 transfer happened correctly or not, which in most cases is, uh, not

66:24 recommended. You can use it, but ideally you should not, because you should

66:30 make sure that you received the correct data. And in the end, you can

66:35 have the, you need to have the MPI_Finalize. And again, compilation is

66:41 pretty much the same. Uh, in this case, I'm just spawning two processes.

66:51 Once you execute it, you can see that process zero sent the data, five, to

66:58 process one, and process one correctly received the data from process zero. Any questions?

67:06 If not, remember that MPI_Send and MPI_Recv are blocking calls.
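A minimal sketch of the send/receive demo just described, under the same assumptions (two processes, rank 0 sends a single integer with value 5 to rank 1):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int source = 0, dest = 1, tag = 0, count = 1;
    int data;

    if (rank == source) {
        data = 5;
        // Blocking send of one MPI_INT to the destination rank.
        MPI_Send(&data, count, MPI_INT, dest, tag, MPI_COMM_WORLD);
        printf("Process %d sent data %d to process %d\n", rank, data, dest);
    } else if (rank == dest) {
        // Blocking receive; MPI_STATUS_IGNORE skips the status object,
        // which is convenient here but not recommended in general.
        MPI_Recv(&data, count, MPI_INT, source, tag, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("Process %d received data %d from process %d\n",
               rank, data, source);
    }

    MPI_Finalize();
    return 0;
}
```

Run with, e.g., `mpirun -np 2 ./send_recv`. Only the source executes the send and only the destination executes the receive, because everything after MPI_Init runs on every process.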

67:13 So, uh, the execution will not move forward until, uh, the

67:20 transfer is finished. Any questions on that? If not, uh, here's an

67:36 example of the deadlock that we saw in, in the slides. MPI_Recv and

67:43 MPI_Send, these are blocking calls. So if you so happen to have the receive

67:49 calls before the sends, then, as we just discussed, there's a circular dependency and

67:55 your program sits in a deadlock. Each of the processes is waiting for the

68:00 other process to send something. And if you see that

68:06 this is happening, you would not see, uh, any output, or maybe get stuck

68:14 for some reason. Oh, yes. As you can see, it's

68:22 just, uh, sitting there: both processes are sitting and waiting for the other

68:27 process to send, to send the data, and the program never completes. Yeah.
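A minimal sketch of that deadlock pattern, assuming two ranks that each post a blocking receive before their send:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank, other, sendbuf, recvbuf;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other = 1 - rank;   // partner rank, assuming exactly 2 processes
    sendbuf = rank;

    // Both ranks block here waiting for a message that the partner
    // has not sent yet -> circular dependency, the program hangs.
    MPI_Recv(&recvbuf, 1, MPI_INT, other, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    MPI_Send(&sendbuf, 1, MPI_INT, other, 0, MPI_COMM_WORLD);

    printf("rank %d got %d\n", rank, recvbuf);  // never reached
    MPI_Finalize();
    return 0;
}
```

Swapping the order on one rank (send first on rank 0, receive first on rank 1) breaks the cycle, which is the reorganization option mentioned next.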

68:42 So the next example is, uh, a way of removing these deadlocks. So either

68:50 you can reorganize your MPI_Sends and MPI_Recvs so that this does not

68:55 happen, or if you really want to keep this order for some reason, you

69:01 can use the non-blocking calls. That is MPI_Irecv and

69:07 MPI_Isend. The syntax is, both are pretty much the same, except

69:13 in this case you have one extra parameter, which is the request handle,

69:19 uh, which has the type MPI_Request. So you need to define,

69:24 uh, uh, declare another variable of that type. Um, and so in the case

69:34 of non-blocking communication, you can either have these calls, MPI_Isend

69:40 and MPI_Irecv, and keep doing your work, but at some point you

69:44 obviously want to make sure that your transfers did finish. In that

69:50 case, you can use MPI_Wait and provide the request for which you

69:57 want to make sure that things did indeed finish correctly. So you can have

70:04 MPI_Wait right here. Just to show you the difference, I've put MPI_Wait

70:09 after the last print for one of them, and in the other case I put MPI_Wait before

70:17 the last print, so you can see there will be a, a little bit

70:22 different ordering in which both processes, uh, print the final output. For

70:37 example, as you can see here, uh, for rank one, the final print

70:44 actually finished after this, uh, receive was done for process zero, you know?

70:54 So that's why this final print came after the wait was done for process zero. Any

71:02 questions on that? Yeah. Okay. If not, then let's move on.
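For reference, the non-blocking pattern just described might be sketched as follows: the same exchange that deadlocks with blocking calls, but with MPI_Irecv/MPI_Isend and an MPI_Wait before the received value is used (buffer values here are illustrative assumptions):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank, other, sendbuf, recvbuf;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other = 1 - rank;   // partner rank, assuming exactly 2 processes
    sendbuf = rank * 10;

    MPI_Request recv_req, send_req;

    // Non-blocking calls return immediately, so posting the receive
    // first no longer causes a circular wait.
    MPI_Irecv(&recvbuf, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &recv_req);
    MPI_Isend(&sendbuf, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &send_req);

    // ... useful work could overlap with the transfer here ...

    // Block until the transfers have actually completed.
    MPI_Wait(&recv_req, MPI_STATUS_IGNORE);
    MPI_Wait(&send_req, MPI_STATUS_IGNORE);

    printf("rank %d received %d\n", rank, recvbuf);
    MPI_Finalize();
    return 0;
}
```

Moving the MPI_Wait relative to the final print, as in the demo, changes only the ordering of the output, not the correctness of the transfer, as long as the wait happens before the received buffer is read.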

71:14 So those were three examples for point to point communication. Now, uh, here I have

71:20 one example for, uh, collective communication, as we just saw in the

71:26 slides: the MPI broadcast and MPI reduce. We'll go through both of them.

71:31 Uh huh. So, MPI broadcast, as we saw in the slides, is one of

71:37 the collective operations. So in this case you do not need to have a condition

71:42 for executing MPI_Bcast, because it needs to be executed by all the processes that are

71:50 participating in the broadcast operation. In this case, it so happens that we are

71:55 broadcasting to all the processes in the global group, which is, which is defined

72:02 by MPI_COMM_WORLD. If, let's say, we had the broadcast for a certain subset

72:09 of processes, then we would need to add a condition, uh, to make sure

72:15 that only the ranks that belong to that certain subset of processes execute the

72:22 broadcast. And what this call does is it takes, uh, an array or a buffer

72:29 which contains some data, the number of elements in that particular buffer, and the

72:35 type of that buffer. So we have our buffer defined as a character array,

72:41 and MPI has an MPI_CHAR data type defined, uh, inside it as well.

72:47 You also need to provide the, um, source of this broadcast. So,

72:53 uh, in this case, it happens that we use rank zero as the broadcaster, and

73:00 then you also need to provide the communicator. And once you do that,

73:05 this call, uh, this, uh, broadcast is enabled for all the processes in

73:12 the MPI_COMM_WORLD global group. Um, in terms of MPI

73:19 reduce, so here you can see that I was calculating the time taken by the

73:25 broadcast, and the motive here was to, uh, get the maximum time out

73:33 of all the processes for performing the broadcast, the minimum time, and the sum of all the

73:38 time from all the processes. So, as you can see, this local time,

73:44 uh, variable will be, uh, set for all the processes, because it's in,

73:50 uh, the global code space. We did not put any conditions around measuring the

73:56 time, so it just, I mean, gets executed by all the processes. And

74:02 then what we can do is we can use the MPI_Reduce function. And

74:07 so we provide, um, the variable that needs to be reduced from all

74:13 the processes. And then there's the global time, which will be, which will hold

74:20 all the reduced values. Uh, this again is the size of the,

74:27 uh, the data elements that, uh, we're reducing, or the number of elements that

74:33 we're reducing. This is the data type. MPI_MAX is the operation that we

74:39 want to perform, uh, on the data elements. And zero is, this is

74:46 the rank of the, of the process that will get all the final reduced values.

74:52 And again, we also need to provide the communicator. Okay. And so

74:59 in this case, uh, we are applying a max reduction, um, a min reduction

75:05 and a sum reduction on the local time, so that we get all three values from

75:11 all the processes. And from the sum reduction, uh, you can

75:19 also, uh, you can also compute the average time over all the processes.

75:30 Um, so in the end, we again put a condition that if the rank

75:36 of the process running this part of the code is zero, that is the rank

75:40 that we reduced the values to, then it can print out, uh, the

75:47 final reduced values. That's the global max time, the global min time, and we

75:53 can use the sum to, uh, compute the average time as

76:00 well. So let's run it. It takes quite, uh, a moment,

76:14 you know? So this was the maximum time, out of all the eight processes, that

76:18 was taken for broadcasting. This is the minimum time, and this is the average

76:23 time. So a question for all of you: why is there so much difference between

76:32 the max time and min time among the processes? Any guesses? So, in

76:43 the MPI broadcast, did every process send it, um, send it to every other

76:53 process? No, only, only one process, rank zero, broadcasted its

77:00 data to all the other processes. Okay, but the other processes had to wait until

77:11 they received everything in the end? Uh huh. The main reason is because

77:27 of the NUMA architectures. So it's not guaranteed that all the processes have the same

77:37 distance, uh, to the rank that is the root process. They are located,

77:42 uh, slightly, um, further from or closer to, uh, to rank zero.

77:51 So it's just the latency, uh, that is the reason you see such,

77:56 uh, such a difference. And when you execute these programs for, let's say,

78:02 more than one node, you will actually see, uh, a more significant difference in

78:08 the timings, because it can happen that, you know, the nodes that you got access

78:12 to could be, uh, in different corners of the cluster. So there

78:18 will be the time that it takes to go from one node through the interconnect of

78:24 the cluster to the other node, which can be very large. It's

78:30 also a question of how broadcast is implemented. Uh, um, in the, quote unquote,

78:39 naive case, it will be as many send operations as there are receivers.

78:52 Right, because, um, there is one source and a bunch of receivers,

78:59 right? So the value has to be replicated, and the most naive way of doing

79:06 it is just to, uh, do as many sends, with the same source and different

79:15 destinations, and that will implement the broadcast. So then the, the sends are sequential, because there is

79:23 one source, and that means that it takes from start to finish quite a while. I mean,

79:30 the message passing costs, not just because of the latency, but because all the replication is

79:42 done at the source. So, depending on, and in what way, you build things, like in

79:52 some networks, you can build the broadcast tree in the network, so that messages get replicated

79:57 in the network instead of being replicated at the source. So it depends on how

80:05 it's implemented, also, whether it becomes serialized or there's some parallelism going on in

80:12 the broadcast. Yes. Okay, thanks. Uh, any other questions?
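A minimal reconstruction of the broadcast-and-timing demo discussed above; the buffer contents and sizes are illustrative assumptions, and the timing uses MPI's wall clock, MPI_Wtime:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char buf[64] = "hello from the root";

    // Every process times its own participation in the broadcast;
    // no condition is needed since Bcast is collective over the group.
    double start = MPI_Wtime();
    MPI_Bcast(buf, 64, MPI_CHAR, 0, MPI_COMM_WORLD);  // rank 0 is the source
    double local_time = MPI_Wtime() - start;

    double global_max, global_min, global_sum;
    // Reduce the per-process timings down to rank 0 with three operations.
    MPI_Reduce(&local_time, &global_max, 1, MPI_DOUBLE, MPI_MAX, 0,
               MPI_COMM_WORLD);
    MPI_Reduce(&local_time, &global_min, 1, MPI_DOUBLE, MPI_MIN, 0,
               MPI_COMM_WORLD);
    MPI_Reduce(&local_time, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0) {
        printf("max %.9f  min %.9f  avg %.9f seconds\n",
               global_max, global_min, global_sum / size);
    }

    MPI_Finalize();
    return 0;
}
```

The average is computed on rank 0 from the sum reduction, since MPI's built-in reduction operations cover max, min, sum and the like.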

80:35 Okay. Now I will come to my last example for today. Uh, so in this

80:40 example, what I try to do is spawn eight processes and try to divide them

80:51 into two local groups. So until now, all the examples that you saw had

80:56 all the processes as part of one global group, but it can happen that you may want

81:03 to divide them, uh, into smaller subgroups, as was shown on one of the

81:11 later slides in the lecture as well. So that is also provided by MPI.

81:20 So in this case, again, we start with MPI_Init, we just

81:25 get the rank, we get the size from the communicator, uh, that's the global

81:31 communicator. Uh, then this example, basically what it does is,

81:37 um, every process stores its rank in a send buffer, and towards the end

81:45 we apply Allreduce, uh, within the subgroup of the rank. So in

81:52 the end, all the subgroups will have a sum of, of all the ranks in

82:00 their, in their own group. Now, for this, first you need to call

82:07 MPI_Comm_group, which gives, returns, uh, all the, the ranks of the

82:16 processes that are, that are spawned inside the global communicator. Once you've done that,

82:26 uh, you can provide these conditions so that all the ranks that belong to the

82:32 first half of the IDs, so in this case it will be 0, 1, 2

82:37 and 3, uh, need to be included in a new group. And all the

82:45 ranks, uh, in the second half, that's 4, 5, 6, 7, they will be included in

82:51 another new group. And that is done by calling this function, MPI_Group_incl,

82:58 group-inclusive. It takes as parameters the global group, then it needs the number of

83:09 processes that will be part of the new group that it will create. Then it

83:15 needs the IDs of those, of those processes, and then it returns all the info

83:24 of that new group in, in, uh, a new group variable. And

83:30 this is done for the second half of the, of the processes as well.

83:37 And again, uh, so these were the only process specific operations for which you needed

83:44 the conditions. The rest of the calls need to be executed by all the processes.

83:52 And so next is that we need to create a new communicator for each

83:57 group. And that is done by, uh, calling MPI_Comm_create. It takes,

84:04 uh, again, the global communicator and the new group. And so, since this will

84:11 be executed by all the processes, each process will have only the reference to the

84:17 variable for their particular group. So ranks 0 to 3 will only have information

84:25 about, uh, ranks 0 to 3, and ranks 4 to 7 will have a new communicator that

84:30 will have information about only those ranks. And so once the new communicators and

84:37 groups have been created, now we can use MPI_Allreduce, which basically

84:43 reduces the values from all the, all of, all the processes in that particular

84:52 group. Uh, but Allreduce is a little bit different: so it reduces

84:57 the values from all processes and puts the result inside all the processes. So with a normal reduce,

85:07 only, the final result would come to only one of the processes, but with Allreduce

85:12 the result will be available to all the processes. And what this

85:17 basically does is it computes the sum of the ranks of those particular groups, and it uses the new

85:25 communicator. And in the end, uh, each rank can print what it

85:32 received, uh, the value it received, of the, the reduced values. You can

85:43 see they have the global ranks, and, as you can see, the global and

85:48 group ranks are, uh, different. So the global rank goes from 0 to

85:56 7, and since we divided the groups into two groups of four each, each group has

86:01 ranks from 0 to 3 and 0 to 3, and in the receive buffer

86:05 it's just a sum of the local ranks for each of the, each of the groups.

86:13 That's pretty much it, unfortunately. Any questions? It was just a small demo structure.
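A minimal reconstruction of that last demo under the assumptions just described: eight processes split into two groups of four, where each process contributes its world rank and every member of a subgroup receives the group's sum via MPI_Allreduce:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);   // assumed to be 8

    // Get the group underlying the global communicator.
    MPI_Group world_group;
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);

    // First half keeps ranks 0-3, second half keeps ranks 4-7.
    int half = world_size / 2;
    int lo[4] = {0, 1, 2, 3}, hi[4] = {4, 5, 6, 7};
    MPI_Group new_group;
    if (world_rank < half)
        MPI_Group_incl(world_group, half, lo, &new_group);
    else
        MPI_Group_incl(world_group, half, hi, &new_group);

    // Create a communicator for the new group; called by all processes,
    // each passing the group it belongs to.
    MPI_Comm new_comm;
    MPI_Comm_create(MPI_COMM_WORLD, new_group, &new_comm);

    int group_rank, sendbuf = world_rank, recvbuf;
    MPI_Comm_rank(new_comm, &group_rank);

    // Allreduce: every member of the subgroup gets the sum of the
    // contributed values, not just one root as with MPI_Reduce.
    MPI_Allreduce(&sendbuf, &recvbuf, 1, MPI_INT, MPI_SUM, new_comm);

    printf("world rank %d, group rank %d, sum %d\n",
           world_rank, group_rank, recvbuf);

    MPI_Finalize();
    return 0;
}
```

With world ranks as the contributed values, the first group's sum is 0+1+2+3 = 6 and the second group's is 4+5+6+7 = 22, while the group ranks in both subgroups run from 0 to 3.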

86:28 Thank you. Yeah, thanks. Thank you. Any questions? We'll

86:50 continue to talk about MPI next time. No questions? Okay, then let's

87:03 first stop the recording.
