00:04 So, today a little bit more about MPI. I realized I may not

00:11 cover everything I wanted to talk about MPI today, so there may be

00:15 a little bit next lecture too. Um, I'll start out talking about

00:27 one-sided communication today, and various aspects of it, and hopefully also

00:34 about something that's known as MPI virtual topologies. And, um, so in the

00:45 lectures about MPI so far, it has been what's known as two-sided communication, where

00:53 there are sender and receiver processes, um, in one way or another, whether

01:00 blocking or non-blocking. But there is always a symmetry between sender and

01:08 receiver. This is not the case when it comes to one-sided communication,

01:15 and we'll talk about the differences and how the one-sided version works and why

01:21 it was, uh, introduced. All right, so,

01:34 as is said, remember that MPI processes are very much self-contained, unlike

01:42 when it comes to OpenMP. In this case, each process has

01:48 its own private memory. There isn't anything shared, really, between processes.

01:54 They have their own code and their own data, in the way we do things.

02:00 They follow the SPMD model that is used for this class: they all run the same

02:10 program, and they all have their own data. And as we talked about so

02:16 far, it was message passing that is the option for exchanging information or data between

02:23 the various processes. Now, to do this one-sided communication, one actually needs

02:31 some notion of, at least, remotely accessible memory, so one process can either

02:42 do a get to retrieve data from some other process's memory space, or write into it.

02:50 So this is the notion of, um, having a mechanism to create remotely accessible

02:58 memory, as it's kind of illustrated on this slide. And as you will see

03:04 when we talk about it, that is again done within, uh, communicators.

03:12 So it is local to communicators that these things are being done.

03:21 So this is basically then creating, uh, a notion of a global address space

03:30 for a particular communicator, and in the vocabulary, in terms of MPI,

03:40 it's called a window. So all processes within the communicator should have

03:50 access to this created global address space, and we'll talk about how this is

03:56 created in the next several slides. This slide just shows what you can do:

04:04 one can go and retrieve data from other processes' globally accessible memory, or one

04:13 can write to, or update, that part of memory. It still means that you

04:17 decide what part of your private memory to make globally accessible, but it

04:25 doesn't have to be all of it. So here is a little bit of a

04:31 dynamic, click-through illustration of the difference between the two-sided communication

04:37 and the one-sided. So this is the two-sided first. What happens is,

04:42 again, that there is maybe the receiver that posts a receive, and then the

04:51 sender says: I want to send. And then these buffers, on the side of

04:56 the sender, get handed over to the MPI communications library, which takes the data

05:05 from the send buffer, or is told the pointer to where the data should come

05:11 from, and then it gets sent via a receive buffer to the receiving process.

05:20 So this is again illustrating a little bit of the synchronous behavior, in terms of how

05:29 the communication library manages the data that the send and receive, uh, statements in the

05:36 code ask to happen. And that relates a little bit to the

05:41 discussion of non-blocking versus blocking, and why things can proceed, even

05:47 in the blocking version: one has to wait until, again, the

05:52 MPI library has taken care of the buffers so that things can be overlapped. Now, in

06:01 terms of the one-sided communication, it's done differently. In this case the

06:09 sender side is the same, but rather than the receiver receiving the data, the sender

06:14 can directly deposit the data without a receive being posted on the receiving process. So this

06:26 is the basic difference. As one-sided says, only one process is involved, and

06:33 in this case it was illustrated for, um, the send action, or the desire to send to

06:40 or update the memory in the remote process's space. This just shows a little

06:50 bit, kind of illustrates, the difference. One part of why the one-sided model

07:00 was introduced is performance on the sender side: if you have a blocking send,

07:08 now there's only one process involved, so you don't need to in any way

07:13 wait for the receiving process to be ready to receive the data. It's also the

07:22 case that when we have the two-sided kind of communication, there always need to be matching

07:29 sends and receives, and there is no such requirement in the one-sided, um,

07:34 communication. So, depending on what you're interested in, the code on the receiving

07:43 process, when it comes to one-sided communication, does not need to have an

07:49 explicit expectation or, um, have matching sort of receive statements in the code,

08:00 because you don't necessarily know how much data or how many sends to expect.

08:06 So it's kind of difficult to figure out how many receives you need to insert.

08:12 So that's the other part: creating flexibility in how processes interact by having

08:18 the one-sided communication. Dr. Johnson, a question, going back two slides, to the

08:26 one-sided communication example. So, yes, the arrows were on the diagram.

08:32 Um, so those are the buffers, correct? They represent

08:37 what the MPI communications library does? Okay, so I

08:47 guess it could be a buffer, but not necessarily. Right,

08:51 so that, um, that's the data that was sent, and

08:58 that's there all the time. So that's what makes it possible for

09:03 the receiving process, um, to say: hey, I'll get it whenever I want,

09:08 but I don't need to signal the other one, as opposed to two-sided.

09:15 Now, it doesn't signal anything. It's kind of like a normal store

09:25 statement that just updates memory. It just happens to be in another process instead of

09:33 in the process memory of, in this case, the sending process. Okay, so

09:41 it has no knowledge of whether or not it has data in the buffer until

09:46 it's, uh, requested from it? It's no buffer, basically; it is

09:52 a memory segment, so it doesn't wait on a receive buffer or wait

09:58 for the data. This is just accessing memory, and it may have been updated or

10:04 not. And that's why you have to make sure, if you depend on it being updated,

10:09 there is another mechanism you need to use in order to make sure that the message has

10:14 gotten there. Okay. So, in this one-sided communication example,

10:21 can we think of it as sort of a way that you can make promises to

10:25 the compiler, to guarantee that something is going to happen? Ah, good question.

10:33 I need to think about that. These are independent processes. Um, they have their

10:47 own program counters. And if you need synchronization and special ordering, you need to

10:53 be very explicit about it in these cases. So, I think, to put

11:05 it more concretely: I'm trying to think of the differences between the guarantees you

11:12 have whenever you're doing two-sided versus the loss of guarantees whenever you have one-

11:17 sided communication. All right. Um, if, again, you need the logic

11:31 of the code to depend on data having been updated before you proceed to do something,

11:41 then the compiler does not automatically help you out by inserting stuff for

11:49 you. You have to make it explicit, and I'll talk a little bit about

11:54 sort of how you can guarantee ordering or synchronization when it is not built into

12:01 the communication action itself. Okay. So, because it's natural to

12:17 kind of think of the code running at a very similar rate on every process, it's intuitive to think

12:27 of them as similarly capable processes, but that is now not guaranteed. So one

12:39 process, even though it's running the same code, may run, so to speak, many times

12:44 as fast as the receiving or the sending process. So it may end up,

12:50 you know, going through a lot more instructions in a unit of time than the other

12:55 one does. And if you need to ensure that, at a certain action in the code, things

13:04 need to be updated, then you need to make sure that that has in fact happened

13:11 by some mechanism; the library will not do it for you. Okay, that makes

13:19 sense. I was just commenting, because obviously the one-sided communication has fewer,

13:27 um, handles associated with it, if you will. So that implies other stuff that you

13:31 have to do as a programmer, to have guarantees, um, to be

13:35 able to take advantage of the one-sided communication to begin with,

13:39 right? Yeah, not so much at that level. Well, for correctness,

13:45 yes. I think you have to guard against that: make sure that when there is,

13:51 um, ordering required for correctness of the code, one has to be explicit about

13:58 it in how the code is written. Um, in terms of the mechanics,

14:05 it's the headache of the implementers of the MPI communications library to make

14:10 sure that the handshake between sender and receiver, and access to the particular receiver

14:22 process's memory, um, can be done safely. So that, um, kind

14:33 of, let's say, low-level synchronization, that is all handled inside

14:40 the MPI communication libraries; it's not something that one would need to

14:47 worry about in the application code. But, as I said, the logical ordering,

14:54 it's not built into the library; the level of synchronization you want to

15:02 happen between the sender and receiver processes is now decoupled. So ordering has to

15:08 happen, then, explicitly, when and to the extent it's necessary. Okay,

15:17 thanks. Thanks. Okay. You're welcome; good question. So,

15:24 uh, correctness is guaranteed only when you use these mechanisms. All right, so

15:31 we're on this slide. So I think I pretty much said that,

15:36 uh, it increases the flexibility and decouples, um, communication from the

15:45 ordering and synchronization that may be required for correctness of the concurrent processes.

15:53 Uh huh. What MPI provides, um, I think I pretty

16:01 much also said this already, so maybe I will, uh, move

16:09 to the next slide. So the next set of slides is about

16:15 how you, um, basically declare, all right, uh, which

16:22 parts of the process memory are accessible from other processes. So on this slide,

16:34 I said something pretty much like this already, so I'll move quickly to it: MPI's terminology for this

16:48 collection of process-local memories that is kind of pooled up, that is

16:57 exposed by the processes in a communicator, is known as a window.

17:07 Um, so maybe it will start to make sense; this shows a little bit of what the

17:11 structure is. So that means the MPI library needs to know what the

17:18 addresses are in the different processes that are part of this quote-unquote global address space,

17:27 so when there's, um, communication going on, it happens in the right part

17:36 of the globally accessible memory space. So there are a few of the

17:48 MPI functions that, when you call them, um, within quotes, generate this global address

17:59 space. So it depends, as I tried to state on this

18:07 slide here, whether the memory for this globally accessible space is already

18:17 allocated or not. So the allocate version, the second line on the slide, both

18:27 allocates the globally accessible memory as well as makes it immediately available. The other option

18:39 is that one can also create this, um, space if memory already is allocated

18:50 and you want to include it in the global address space. And then there

18:55 is a dynamic version where one can attach and detach, um, memory to

19:02 this globally accessible address space, and I'll talk about these a little bit

19:09 on the next few slides. So the create call tries to basically tell, again, for which communicator the

19:20 window, the globally accessible address space, you want to

19:26 create, or to which communicator it applies. So there's the name of that global address

19:37 space, and the rest of the arguments define that particular address space. So there

19:47 is some flexibility, you know: the base address, and the size, and the displacement unit from the

19:53 starting address. And there is flexibility in terms of how you do

19:59 it, in that it doesn't necessarily need to be the same size or displacement on all the processes

20:07 involved in setting up this global, the globally accessible address space within the

20:16 communicator. So there's some flexibility in, um, how this is

20:24 done. But all the participating processes in the communicator need to have

20:35 the call in the code. And then, I guess, it's just listed; I'm not going

20:43 through it. But, so, this info argument is the same as for the routines in the last

20:49 lecture: to inform MPI about how you want things managed, hints that MPI

20:54 can use to, uh, tailor access. Then there is just a simple example

21:08 code, and in this case, creating a window where the variable, or the array is,

21:14 a memory space defined by a size and a starting address. Uh, in this

21:21 case, the communicator is the one with all the processes. So it's not restricted to some subset

21:27 of processes. In this case, it was the entire collection of processes, the

21:34 MPI_COMM_WORLD communicator. And then, as it says here, one gets the allocation, and

21:42 eventually the, uh, memory space should also be freed up properly. Um,

21:51 the allocate version is similar, and it does the creation and the allocation of memory if it wasn't

21:57 already done, and then it becomes immediately available as well in the global

22:04 address space. Let's see. And here is an example of that invocation, with the

22:10 arguments; again, all of the processes are involved, using MPI_COMM_WORLD as the communicator.

22:24 So this is the dynamic, um, version, which then can be used at

22:33 some later point to attach, or detach after it's attached, if you want

22:43 to detach allocated space from this globally accessible space. So initially it creates the window,

22:53 but it doesn't initialize it in any way, so it's kind of empty. And

22:57 then, um, one can use attach; and again, to use an example that

23:04 says attach, it comes here also: first create the dynamic, um, address space, and

23:14 then later on a particular piece of allocated memory is then attached to that particular window.

23:27 So, um, this just gives a sense of how the actual function calls look

23:33 like, and what the arguments are.
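
As a point of reference, here is a minimal, hedged C sketch of the three window-creation variants just described; the buffer names and the size N are illustrative, not taken from the lecture's code.

```c
#include <mpi.h>
#include <stdlib.h>

#define N 1000   /* number of doubles each process exposes (illustrative) */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Variant 1: MPI_Win_create -- expose memory we already allocated. */
    double *buf = malloc(N * sizeof(double));
    MPI_Win win;
    MPI_Win_create(buf, (MPI_Aint)(N * sizeof(double)), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    MPI_Win_free(&win);      /* windows must eventually be freed */
    free(buf);

    /* Variant 2: MPI_Win_allocate -- MPI allocates and exposes in one call. */
    double *abuf;
    MPI_Win awin;
    MPI_Win_allocate((MPI_Aint)(N * sizeof(double)), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &abuf, &awin);
    MPI_Win_free(&awin);

    /* Variant 3: dynamic window -- created empty, memory attached later. */
    MPI_Win dwin;
    MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &dwin);
    double *dbuf = malloc(N * sizeof(double));
    MPI_Win_attach(dwin, dbuf, (MPI_Aint)(N * sizeof(double)));
    /* ... one-sided communication on dwin would go here ... */
    MPI_Win_detach(dwin, dbuf);
    MPI_Win_free(&dwin);
    free(dbuf);

    MPI_Finalize();
    return 0;
}
```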

23:41 This goes beyond what you have to do for your assignments; these are typically more

23:46 advanced concepts, and things I talk about mostly so that you are aware of

23:53 these one-sided communications, which have gotten a lot of traction. But,

23:59 um, in this course I chose not to make you use it. But if

24:05 you want to use it, or at least end up using MPI in other

24:12 contexts, of course you should know about it and have some starting point for using this one-

24:18 sided communication. Okay, so that was a little bit about how to create

24:27 and initialize this globally accessible memory as a fraction of each process's, um,

24:40 local memory. And what's not in this global memory part is then still private

24:49 to each one of the processes; other processes cannot reach or update that part

24:55 of the memory. Now, the actions, um, typically used for this one-sided

25:04 kind of communication are what's known as gets and puts, and sometimes with some associated

25:14 operation. And so this is also what has been common in what's known as shared-

25:20 memory programming. There are machines that indeed do have shared memory, as has been

25:32 mentioned, that actually physically implement what is then called a distributed shared memory.

25:41 That has explicit mechanisms and, you know, hardware and firmware that give the illusion

25:49 of a shared memory to the programmer, so you can write and retrieve variables as

25:57 if it was, say, a single shared address space. Now, as we've talked about before,

26:04 in the shared memory programming models, of which there are many, the put basically deposits data

26:11 into the shared memory. And, um, the get receives data from the shared memory.

26:18 And I will talk about them in the next few slides, trying to cover

26:23 them, and here is kind of a summary of them. Uh huh.

26:29 So here is, then, the semantics of this put: it's used when one process

26:36 wants to write into another process's memory. And as long as it is in the

26:42 globally accessible part of the memory in the remote process, the write, or put,

26:53 should proceed. There is no notion of time, so to speak. But,

27:02 um, it gets handed over to the MPI library, and the, uh huh, process that

27:10 executes the put can continue, and the origin process has no idea when it's

27:20 eventually happening or going on. So it depends again on synchronization, which I will talk

27:31 about in a bit. So, um, uh huh, one needs to

27:38 be careful in using this, to make sure that one gets the correct behavior of the

27:43 codes. Yeah, the get is basically the opposite: the process requests data from,

27:51 you know, the other process. And as long as the request addresses something in the

27:56 globally shared memory of the other process, eventually the data will arrive at

28:05 the requesting process. Mhm. Um, then, uh, there is also the accumulate,

28:17 which is kind of the put with an operation. So that means, uh, that the

28:28 target address then gets updated with the value that is being put, which, in the case of

28:47 a plus operation, is added to whatever is in that memory location.

28:54 So it writes, and in the add case it effectively updates, what's in the target address. And

29:04 there's a number of op codes, like add, subtract, multiplication. And then

29:14 there is the reverse thing, the get-accumulate, where you also get the data returned

29:20 into, uh, whatever is your local buffer. Yeah. Yeah. And then there are

29:31 a couple of other ones: compare-and-swap, and fetch-and-op; and fetch-and-op

29:36 is, you know, pretty much similar to the get-accumulate, and compare-and-swap

29:43 takes the comparison and does a conditional swap. So these are the

29:48 basic kind of shared-memory-like operations that the one-sided communication model enables.
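
To make these concrete, here is a hedged C sketch of the calls just listed. It assumes a window `win` of doubles created as in the earlier sketch; the rank, displacement, and values are illustrative, and each call also has to sit inside one of the synchronization epochs discussed next.

```c
/* Assumes a window 'win' exposing doubles, created as sketched earlier,
 * and that these calls are made inside a synchronization epoch. */
double x = 3.14, y, old, cmp = 0.0;
MPI_Aint disp = 0;   /* offset into the target window, in disp_units */
int target = 1;      /* rank whose window memory we access */

/* Put: write x into the target's window at displacement disp. */
MPI_Put(&x, 1, MPI_DOUBLE, target, disp, 1, MPI_DOUBLE, win);

/* Get: read one double from the target's window into y. */
MPI_Get(&y, 1, MPI_DOUBLE, target, disp, 1, MPI_DOUBLE, win);

/* Accumulate: add x into the value stored at the target. */
MPI_Accumulate(&x, 1, MPI_DOUBLE, target, disp, 1, MPI_DOUBLE,
               MPI_SUM, win);

/* Get_accumulate: like accumulate, but also returns the old value. */
MPI_Get_accumulate(&x, 1, MPI_DOUBLE, &old, 1, MPI_DOUBLE,
                   target, disp, 1, MPI_DOUBLE, MPI_SUM, win);

/* Fetch_and_op: single-element special case of get_accumulate. */
MPI_Fetch_and_op(&x, &old, MPI_DOUBLE, target, disp, MPI_SUM, win);

/* Compare_and_swap: write x only if the target value equals cmp;
 * the previous target value comes back in old. */
MPI_Compare_and_swap(&x, &cmp, &old, MPI_DOUBLE, target, disp, win);
```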

30:00 Okay. Any questions on that so far? So,

30:14 so these, um, communication actions: the standard says that they behave as if,

30:23 um, you are kind of the only one acting on the global memory. Now, as I

30:34 said, depending upon what the logic of the code requires, there's the issue of

30:42 in which order things happen. And as this says, and I think the next couple of

30:46 slides may illustrate it better, because it's kind of pictures instead of text,

30:51 um, it tells when there is an ordering implied between subsequent communication requests, and when

31:03 there is not. So, to talk to it: this is the documentation,

31:10 the text to the pictures, and it says, uh, it illustrates what was said on

31:16 the previous slide: that, basically, if you have some sequence of put statements,

31:23 there is no guaranteed ordering in which memory will be updated, or data delivered, at the target

31:33 process. So there can be many reasons why things arrive out of order, or land

31:41 out of order, on the receiving process. Just as an example, again: if these puts

31:51 run to different nodes in the network, there's no guarantee that the paths that the

32:04 puts, the put statements, take in the network are the same, and that the,

32:13 um, time it takes for the first put to reach the receiving process

32:24 is any shorter; it could be longer than perhaps what happens to be the

32:30 case for the second put statement. So, again, there is no guarantee at all of the

32:38 order at the receiving process. And mixing puts and gets, there's no guarantee

32:48 in terms of the ordering either. But when it comes to a sequence of the accumulate

32:56 statements, apparently the standard says that, for that case, the ordering is

33:04 to be respected, in terms of the order in which the accumulate communication statements are

33:11 executed. So the next set of slides I have is showing a few of the

33:23 different ways of, mhm, enforcing, I would say, ordering or synchronization between the

33:34 events that the processes execute. And I guess one of the important notions

33:45 is that there's this notion of an epoch, which is basically a window of time, if

33:50 you like, between the synchronization events; I think that's the best way of describing

34:01 it, and hopefully the next few slides make it somewhat more clear. So

34:10 one mechanism, uh, that makes sense, uh, intuitively, is this

34:18 notion of the fence, kind of like fence posts: you start one of these epochs

34:26 and say, okay, from now on there is, you know, communication activity,

34:34 and by some later point in the code, you want to make sure that all the

34:39 communication actions between these two fence posts have completed. So, within this epoch,

34:50 as was said before, there is no particular ordering in terms of successive

34:54 puts and gets, but there is a guaranteed order when it comes

34:59 to the accumulate communication instructions, or function calls. So, in this case, all the

35:11 processes involved in this window have to execute these two fence instructions. So this is

35:19 one way in which one can guarantee, for instance, that the put action has completed

35:26 on the receiving side before one starts to access that memory location that presumably was

35:34 to be updated by another process.
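
A minimal sketch of the fence mechanism, assuming the window `win` and a value `x` as in the earlier sketches:

```c
/* Active-target synchronization with fences: collective over the window. */
MPI_Win_fence(0, win);                 /* opens the epoch on all processes */
if (rank == 0)
    MPI_Put(&x, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
MPI_Win_fence(0, win);                 /* closes it: the put has completed */
/* Rank 1 can now safely read the updated location in its own window.    */
```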

35:46 Now, this next mechanism is, um, a little bit more flexible, in the sense

35:54 that the target and the origins each execute their own epoch definition, their own start and finish

36:04 statements. So, in this case, uh, the target starts an epoch with

36:13 the post and completes it with a wait, um, and on the origin there are actually

36:21 separate calls: an epoch is started by the start, and then there is a completion, the complete.

36:30 And if I remember correctly, these epochs are not allowed to be nested;

36:41 they have to be sort of matching between targets and origins.
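
A hedged sketch of this post/start/complete/wait pattern; the groups `origin_grp` and `target_grp`, and the variables `rank`, `target`, `x`, and `win`, are illustrative assumptions.

```c
/* Post/start/complete/wait: only the listed processes synchronize.
 * 'origin_grp'/'target_grp' are MPI_Group handles built beforehand,
 * e.g. with MPI_Comm_group + MPI_Group_incl (names are illustrative). */
if (rank == target) {
    MPI_Win_post(origin_grp, 0, win);    /* expose my window to origins  */
    MPI_Win_wait(win);                   /* block until their epochs end */
} else {
    MPI_Win_start(target_grp, 0, win);   /* open access epoch to target  */
    MPI_Put(&x, 1, MPI_DOUBLE, target, 0, 1, MPI_DOUBLE, win);
    MPI_Win_complete(win);               /* end epoch: put has completed */
}
```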

36:54 Then, what I talked about so far were kind of the active modes, where both target

37:00 and origin are participating in defining the epochs. Whereas in the passive target mode,

37:11 it's just that, in this case, the origin process wants to make

37:19 sure a certain set of communication actions has completed before it moves on in the code.

37:28 So, in that sense, you kind of create these two fence posts locally, with

37:36 the lock and unlock pair, and this slide is, again, just the text that describes

37:47 what the arguments are. And again, it applies to the window, uh, that

37:55 is, um, the globally shared memory. And I think there's another one, the lock-all, that's again

38:03 similar; it defines what is involved in the locking and unlocking, or the guarantees of sequencing,

38:16 of the actions in the epoch. So this is it; that's everything on the mechanisms.
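
And a minimal sketch of the passive-target variant, under the same illustrative assumptions:

```c
/* Passive target: only the origin takes part; the target does nothing. */
MPI_Win_lock(MPI_LOCK_EXCLUSIVE, target, 0, win);   /* open the epoch   */
MPI_Put(&x, 1, MPI_DOUBLE, target, 0, 1, MPI_DOUBLE, win);
MPI_Win_unlock(target, win);        /* put guaranteed complete from here */

/* MPI_Win_lock_all/MPI_Win_unlock_all open a shared epoch to all ranks. */
```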

38:27 But just to summarize and illustrate: for the correctness of the code, sequence may be

38:34 important, and when you use one-sided communication, synchronization has to be made explicit

38:42 via, uh, one of the synchronization function calls, where there is the kind of

38:54 rigid, uh, fence function call; the purely local lock and unlock; or the

39:03 somewhat more flexible, all right, post and start function calls that are separated between

39:17 origin and target processes. So, I don't know; even though it wasn't very polished,

39:27 my description, maybe it at least gives some idea, a better idea, of how the interaction

39:41 between sending and receiving processes can be managed in the one-sided communication.

39:56 That was a question raised by one of the cohort. So, did it help

40:05 you? Yeah, of course. Okay, so the next example is not so much

40:09 about using synchronization; the next examples are trying to illustrate some of the performance benefits

40:17 in terms of using one-sided communication. But, uh, I would say

40:28 that the flexibility has been the main driving force why puts and gets, the

40:35 one-sided communication, were introduced in MPI. It wasn't part of

40:39 the first, or I think first few, standards releases. So this is trying to

40:58 . And there is a bunch of here, and I guess what they

41:03 the first one sided N p That was the one sided N p

41:07 . And the other ones are different off, I will say once at

41:13 to started the FBI, Okay, a computer company that has been particularly

41:23 on high performance computing systems if you know it. But for those who

41:30 know crazy, synonymous with supercomputers and kind of spin there very much on

41:38 product line, Um, so when comes to the put on the left

41:46 on this graph, you can see Leighton see that is remarkably lower for

41:57 particular for small messages. And they they, I think both access or

42:08 longer ethnic. So even though it not show that Georgia difference, it

42:13 algorithmic such a reason with fraction. when it comes to potentially wanting to

42:23 processes than there's not usually a whole of data, so the messages or

42:30 data that is being communicated maybe quite . So in that case puts and

42:36 maybe quite beneficial in particular for the once cleared a message body, it's

42:45 than it's not so apparent. What gayness, Because then the payload this

42:53 again, apparently for the get, is a round trip you have to

42:59 and received. The data was not as pronounced as or the foot

43:04 and and then there's another one for here, um, in terms of

43:11 message rate, and in that it's also noticeable again. This case

43:18 look a lot of factors, for the smaller messages on then it's

43:26 that marked for very large messages. and then I think there was a

43:35 of other also examples here off potential Um and I guess one thing that

43:47 the left hand silence that's higher is . In that case, it was

43:52 search. And I think that's one that may be good to be aware

44:01 in terms off the blue curve in graph on the left hand side that

44:12 communications software, depending on walked is being used for actually sending

44:20 Um, the Macklin the mechanisms used be quite different for small messages compared

44:35 large messages. And And I wanna taking you networking class. Uh,

44:46 have become familiar in terms of TCP in terms of jumbo packets or

44:52 So the mechanisms that may be used again small and small messages may have

44:59 buffering mechanisms. Confirmed a large And that's my guest is the reason

45:04 this very significant drop in terms off bandwidth. I once you get,

45:11 guess past 32 or someone's most 64 whatever in the number off elements are

45:22 terms of this, uh, Well, that is just just one

45:28 thio caveat. So to speak toe aware of that underlying communication mechanisms being

45:33 for M p I libraries, um end up that if you do some

45:43 , you'll receive behavior like the blue instead of a nice smooth one

45:51 uh, this is again guests, other more application oriented benchmark. So

46:00 cases, you know, the benefits marginal. In some cases, it's

46:04 measurable. And I think that waas have an example. I remember just

46:16 using, um, the one sided . So any questions on this,

46:24 , performance graphs or benchmarks? If , I'll talk a little bit more

46:35 an explicit way in which, the get them put in prison.

46:46 The example I'm going to talk about, it just uses puts, as far as I

46:50 remember. Now, um, it is the Jacobi example you have seen a couple of

46:58 times before already. So the notion of this computation should be

47:05 familiar. So the basic operation is a relaxation operation in which a node

47:16 is updated. It is slightly different, I think: in the previous Jacobi example,

47:22 um, the one that was used for OpenMP, it was, yeah, a five-point

47:27 stencil, but at that point it was only updating the center point and

47:35 just used the four neighboring points in that case, and this five-point stencil is

47:41 actually also using the center point's value. But that's kind of a minor difference. The

47:47 point is that one needs to access data from four neighbors. Now the thing is

48:00 that now we're trying to use, um, not one process, but several processes.

48:07 And the reason for several processes is we want to use several nodes in a cluster.

48:12 So now things get partitioned, um, into submeshes, or subgrids, per

48:22 process, and those processes are most likely not going to be on the same node.

48:29 Some may be, uh, but depending on how you want things to be done,

48:37 I'd think of these processes, for this exercise, I would say, as being on different

48:44 nodes. It doesn't necessarily matter, because this is only about the logical interaction between processes.

48:54 But as we can see on the right side on this graph, when it

49:01 comes to mesh points that are at the edge of the subdomain, of the sub-

49:06 mesh allocated to a process, it would need data from another process. So this is

49:14 where the one-sided communication comes in: a process would need to get data from processes with

49:25 adjacent submeshes. So, um, yes, I said that before. I

49:41 think that, in terms of processes, the typical rule is what's called owner computes:

49:49 each process owns the submesh assigned to it, and that process also does all the computations

49:59 needed to do the update on that submesh. So, in that case, in this case,

50:06 that means that, sort of, updating the white submesh requires the yellow mesh points

50:17 or cells in order to do the full update of the white submesh. So,

50:26 sometimes these are called boundary cells, sometimes shadow regions or halos; there are many

50:32 names for these yellow pieces. So what typically is being done, then, is that the

50:42 process allocates a somewhat larger submesh that can hold the yellow pieces of the adjacent

50:54 processes' data. So those yellow pieces are basically copies, or replicas, of what an adjacent,

51:04 uh huh, process owns. So I guess this is what it says here:

51:10 one adds these yellow shadow regions, and they can now be holding the data

51:22 the white pieces need to update their own cells or grid points. Yeah, and

51:32 this slide, based on the previous one, shows what a process needs to send to its

51:43 adjacent processes, because the green ones are the things that are needed by the yellow halos of

51:53 the four adjacent processes to the one that has the green in the box. So

52:04 now one can use this one-sided communication to do this. One first sets up

52:11 the origin and target, the window, for the one-sided communication, and then one

52:19 can use the put statement to update those regions. And let's see if

52:30 there was another slide for this. Yes, this is just the code

52:36 that does this thing. But before I talk about the code:

52:44 any questions on this example, of how one tends to, in many codes, use

52:51 this notion of halo regions? Now, this was just a five-point stencil,

52:58 so there was kind of only one layer this deep, or so, needed from each adjacent process. Now, if

53:06 yours has, um, a bigger stencil... um, so go on a couple of

53:16 slides here to show the stencil, a higher-order stencil. So if you, for

53:21 instance, have something that goes two grid points away from the center point in

53:29 each direction, um, then you need to reach deeper into your neighbors and

53:36 also send more of your own grid points to adjacent neighbors. And it also doesn't have

53:44 to be just following the coordinate axes. You can also have diagonals and more

53:51 complex stencils, and anyone that does numerical analysis knows that the, um, formulas

54:00 for how you use grid point values can be considerably more complex. And this is

54:11 kind of what is known as higher-order stencils, and it tends to potentially give you a more

54:19 accurate result, or fewer iterations, if you do an iterative algorithm. And,

54:30 for that matter, there is, um, fundamental physics, like quantum chromo-

54:40 dynamics: they also model interactions using more complicated stencils for the interaction, and they come from

54:50 the discretization used to represent the physics. Yes, on that, that picture up

55:00 there: this is, of course, happening for all the stencils, or the cells,

55:06 right? Right. The second row, all the way to

55:09 the right, has put arrows left, up, and down, right? So the idea is

55:16 that, you know, basically every process executes these put statements; there's only

55:25 one being shown here. But if you now take the middle node here,

55:29 it means the process on the left, correspondingly, is, um, executing puts

55:39 to its neighbors. So the center process on this picture will also be the target

55:47 of puts from its four neighbors. Okay, perfect. All right; now,

55:55 let me make one more statement relating to what was asked about earlier. So,

56:06 depending on what the algorithm is, um: if you use this kind of,

56:21 uh, iterative Jacobi algorithm, in that case you may need to do, as was

56:27 done in an example in some early lecture, that you have old and new copies,

56:37 so the green doesn't kind of overwrite your cells. So they kind of

56:44 do things in a sort of lockstep in terms of updating. But there's also something

56:53 that is known as chaotic relaxation, where you don't really care. So in

57:03 that case, uh, the receiving side of the put statement, when

57:10 it iterates, just uses whatever value it happens to have, whether the center point is

57:17 several iterations behind or ahead; it doesn't care. So it is an example, depending

57:27 on what the iterative algorithm is, of whether you want to enforce an ordering with respect

57:38 to the global iteration steps, if you like, or if you don't care.

57:43 So whether you use some form of fencing synchronization, lock and unlock, or

57:51 not, it depends, um, on the numerical algorithm you're actually implementing,

58:04 and maybe I'll get to talk about that before the course is over. But it

58:09 relates to, uh, asynchronous algorithms. And to elaborate a little bit further on

58:15 that: if you run this on, uh, a very large cluster,

58:22 the processes that are the neighbors of the center point may not be

58:29 allocated to any nodes in the physical machine that are anywhere close to

58:41 where you happen to be. So the communication time for some neighbors may be very

58:48 short, and to other neighbors it may be very long. And so that's why,

58:59 in terms of parallel computing, sometimes asynchronous algorithms have gotten a revival:

59:06 instead of trying to enforce synchrony, if you can use an asynchronous algorithm,

59:14 it may, in the end, require a lot less time per iteration, and

59:22 you can reach the answer, to the precision you want, more quickly.

59:30 So here is just a little bit to illustrate the code for this particular example.

59:39 And you can see there are just four put statements, for the four borders:

59:48 that is, two for the, sort of, north and south cells, and then there are two for east and

59:54 west, in terms of what the target addresses are and what, for the send side of

60:01 the puts, you know, the source is and what the destination is.

60:05 And it uses the, uh, different, um, windows assigned to the,

60:16 um, different boundary cells, and so on.
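
As an illustration of what such a halo exchange with puts can look like, here is a hedged C sketch; the layout macro IDX, the HALO_* displacements, the column datatype `coltype`, and the neighbor ranks are all illustrative assumptions, not the lecture's actual code.

```c
/* Hedged sketch of a 2D Jacobi halo exchange using puts.
 * Assumptions (illustrative):
 *  - each process stores (ny+2) x (nx+2) doubles in 'u' (interior
 *    plus one halo layer), exposed through the window 'win';
 *  - north/south/east/west hold neighbor ranks (e.g. from MPI_Cart_shift);
 *  - HALO_* are the displacements of the halo rows/columns in the
 *    neighbor's window. */
#define IDX(i, j) ((i) * (nx + 2) + (j))

MPI_Win_fence(0, win);

/* My top interior row goes into the south halo row of my north neighbor. */
MPI_Put(&u[IDX(1, 1)], nx, MPI_DOUBLE,
        north, HALO_S, nx, MPI_DOUBLE, win);

/* My bottom interior row goes into the north halo of my south neighbor. */
MPI_Put(&u[IDX(ny, 1)], nx, MPI_DOUBLE,
        south, HALO_N, nx, MPI_DOUBLE, win);

/* Columns are strided: coltype = MPI_Type_vector(ny, 1, nx+2, ...). */
MPI_Put(&u[IDX(1, 1)],  1, coltype, west, HALO_E, 1, coltype, win);
MPI_Put(&u[IDX(1, nx)], 1, coltype, east, HALO_W, 1, coltype, win);

MPI_Win_fence(0, win);   /* halos updated everywhere; safe to compute */
```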

60:36 Okay, so I think these codes are available on Blackboard, or, I forgot to check if

60:42 they're on Blackboard. Um, let's see, I think they are available.

60:49 So if you want to look at the complete codes, they are there,

60:53 not just this code segment for the put statements; I'll double-check. But I think

60:58 it's under the contents section of the MPI material. Yeah, I think there should be

61:04 example codes for everything, and I'll verify that it's there. If it's

61:09 not already, I'm sorry, but remind me to check where they are.

61:18 But you can also, um, in fact, Google it, putting in the routine

61:25 name; in most instances you will find the website and the source as well. So this

61:40 is just a quick example of how you can use it for an inner product. Um, if

61:47 you do matrix-matrix multiplication using the accumulate, where, in this case,

61:54 um, they used the multiply and add operations to do the, um,

62:04 inner product.
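
A hedged sketch of that idea: each rank computes its local multiply-and-add part and contributes it with MPI_Accumulate to a single double exposed by rank 0. The window and names are illustrative assumptions.

```c
/* Hedged sketch: distributed inner product via MPI_Accumulate.
 * Assumes rank 0 exposes one double, initialized to 0.0, through 'win';
 * a, b, and nlocal are this process's slices (illustrative names). */
double local = 0.0;
for (int i = 0; i < nlocal; i++)
    local += a[i] * b[i];              /* the local multiply-and-add part */

MPI_Win_fence(0, win);
MPI_Accumulate(&local, 1, MPI_DOUBLE,  /* my partial sum...               */
               0, 0, 1, MPI_DOUBLE,    /* ...added at offset 0 on rank 0  */
               MPI_SUM, win);
MPI_Win_fence(0, win);                 /* the sum is now complete on rank 0 */
```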

62:10 Ah, and this is another slide, again, just showing where you can find the code. I hope it is there; well, we will double-

62:14 check. My apologies for not checking ahead of class. Um, so this is

62:20 a little bit of discussion we already had: when it's useful to

62:27 use the different synchronization modes. And it depends, again, on the algorithm. And it

62:35 could also depend on the environment, the platform, that you're actually using for the computation.

62:45 Let's see, what else? Okay, so I'm going to switch a little bit from

62:54 dealing with processes, so I'll stop for a second to see if there's any more

63:01 questions on one-sided communication. Then it'll be a different topic for the

63:09 rest of the class. Okay. So, this notion of virtual topologies:

63:22 is something thio that addresses how the are logically organized on. Then there's

63:37 other part that is the mapping, the processes and then mapped on to

63:45 physical machine. So the reason for notion of virtual apologists In the early

63:56 , um, of M p i , I will say to fold one

64:07 like in the Kobe example. I mentioned the from the applications perspective.

64:20 kind of beneficial to think off. process is being organized, I

64:29 In that case, a two dimensional . It could be any dimensions.

64:36 34 Whatever is useful in case of application, But in that case for

64:44 Jacoby algorithm, it's, you East West, North South. Neighbors

64:48 the grid is kind of the logical of writing was cold. So that

64:54 one reason for these virtual topology and that was the condition.

65:03 The other reason well, and that's relevant. Depending upon the application,

65:10 tends to be began in Terms then, is scientific and engineering computations

65:17 these highly structured rates may not be dominating as they once were. That's

65:24 more flexible petitioning off space today, maybe was the case in the early

65:31 . So computing. Um, but other one waas too. Matt,

65:44 processes onto the machine in a way that was beneficial with respect to how

65:51 the various nodes in the machine were connected. And we haven't talked about interconnection networks

65:59 yet, but I will in the next lecture. And for, I would say,

66:08 15, 20 years, and even, um, today, some form of multi-

66:18 dimensional mesh has been quite popular. So connecting small collections of processors into meshes was

66:32 very much used, and may still be. And, um, you may remember when

66:39 I talked about processors, the processor architecture part of the course: um, when it

66:53 comes to multicores today, the high core-count ones tend to have a

66:53 2D mesh interconnect. So there are some potential benefits, even on a single chip,

67:02 to have mappings that kind of match the physical interconnect, in order to reduce

67:13 contention in the network, depending on what your algorithm is doing. Otherwise, for

67:21 cluster machines, up to five-dimensional toruses have been used and are still being used by some

67:29 vendors. But there are also other topologies of interconnection networks for processors,

67:38 for which the Cartesian partitioning, um, may not be particularly relevant. But,

67:48 um, I will talk more about that when I talk about interconnection networks. And then the

67:51 other topology that was supported as a virtual topology is the ability to define arbitrary

67:59 graphs. All right, so the thing is, so I guess this is

68:04 what I already talked about: the, well, grand center one shows

68:12 the typical 2D regular mesh that used to be used quite frequently, and is still

68:20 relevant for two- or three-dimensional regular forms, in terms of physical things being simulated. And

68:26 the lower right-hand corner shows more of a dependence graph between, um,

68:33 you know, functions or tasks. Actually, the lower left one

68:37 shows, you know, a typical, you know, finite element grid, or

68:42 for that matter a finite volume type grid, that one may use for doing some simulations of,

68:48 say, air flow around a wing, or, uh, you know, some other thing like

68:55 your car. And I'll talk about how to carve data structures up and allocate them

69:00 to nodes in a later lecture. So now, this is what I'll

69:10 talk more about next time: the mapping, um, of the processes, um,

69:18 onto the actual machine. But let's move on to talking about the

69:25 MPI Cartesian topologies first. So here's a summary of the Cartesian routines. So what one

69:35 has, then, is MPI routines that allow you to create a Cartesian topology.

69:46 And I'll talk a little bit about the different routines and what they do.

69:53 Okay, so this is kind of a little summary of what I want to talk about,

69:57 so I think I have slides for each one of them separately. But the first

70:02 one is the cart create: it is creating a Cartesian communicator from an existing

70:12 communicator. So you have one communicator as an input to the cart create call,

70:17 and then it creates another communicator. And I'll talk about the other ones as well.

70:24 So there's that create, um, function call, and there are a couple of things,

70:38 and, I'll say, you will see them on the different lines here. There's, as I

70:43 said, one, um, input communicator, and then you create

70:57 a new one. So the input could be MPI_COMM_WORLD, that has all the processes

71:04 but has no structure, and then you want to create a sort of mesh-like

71:12 configuration of processes. And you specify the number of dimensions: it could be 2, 3,

71:18 or whatever number of dimensions you want. And then, for each of the dimensions, you

71:27 also define the extent of that dimension by putting in an argument to say how many processes

71:34 along, quote unquote, the x axis, and how many processes along the y axis, etc.

71:42 Then it comes, again, motivated by a lot of the numerical analysis codes:

71:51 sometimes you want periodic grids or meshes, and sometimes you don't. So you can then

71:57 specify, for each of the dimensions, if you want to wrap around or not. And

72:06 then there is the reorder attribute you can specify. And that is whether MPI

72:17 is told to preserve the process rank from what it is in the

72:29 input communicator, so it will keep the same rank in the output, or

72:37 if MPI is free to assign a different rank in the output

72:43 communicator. And I think that's pretty much what it says on this slide. The

72:52 reorder is a little bit, um, not so easy to understand, in the sense that

73:00 it does not in itself do any binding or mapping of processes to processors, or execution

73:17 units of any flavor. But it allows MPI, and MPI libraries that

73:22 may have information about what the machine looks like, based on other steps

73:29 in the mapping process, to come up with new ranks that could potentially benefit the application.

73:41 Um, so here's just an example showing the case where there's wraparound along one

73:48 axis and no wraparound along the other axis.
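
A small C sketch matching this description; the four-by-two shape and the periodicity choices are illustrative:

```c
/* Create a 2D Cartesian communicator: 4 x 2 processes,
 * periodic (wraparound) along x only, and MPI may renumber ranks. */
int dims[2]    = {4, 2};
int periods[2] = {1, 0};
MPI_Comm cart;
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1 /* reorder */, &cart);
```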

73:57 Okay, now there is also dims create, a routine that is trying to offer some service when you don't

74:10 quite know what you really want as the shape of the partitioned grid that is

74:20 being created. So if, in the, um, dims array, you put the

74:35 value zero for the extent, the number of processes, of

74:40 a particular dimension, MPI will take the, ah, opportunity to try

74:48 and figure out what, from a performance point of view, with respect to what it

74:57 knows about the cluster, is a good length, or number of processes. And what it

75:10 tends to do is to try to make it, as kind of what is called, as square as possible.

75:23 Whether that indeed is beneficial to the application: it doesn't really have enough information

75:29 to know about the usage patterns in the code to guarantee that that's, in

75:36 fact, the best goal. Now, the other thing I want to

75:43 say, and I was trying to figure it out: I think this is

75:51 a function that is not supported in all MPI implementations. I could not find it

76:00 in Open MPI; on the other hand, in another MPI implementation known

76:05 as MPICH, as far as I know, it's still there. Um, so here's a

76:16 little bit of a concrete example of what it does. And, as it said on the previous

76:21 slide, yeah, um, things need to be evenly divisible:

76:32 the total number of processes by the dimension lengths. So that's why things may not

76:39 come up the way you hoped, because there's this restriction that things need

76:45 to be evenly divisible, including the total number of processes that you specify.

76:54 You can also specify, through the dims array, lengths, if you like dimension

77:02 extents, whose product is smaller than the total number of nodes, so to

77:08 speak. In which case, kind of, that's fine. But, basically, it

77:13 has to fit in the communicator that you're given. So

77:20 we'll get to the details; and I encourage you to take a look at these

77:23 things. But you can see that, if you have, it was six

77:32 nodes in this case, in two dimensions in the first row, then it tries

77:36 to do it as evenly as it can, and, um, then it comes up with

77:43 a three-by-two grid for two dimensions, since, uh, it has to divide

77:51 evenly: two by two is too small and three by three is too big,

77:55 so it becomes three by two.
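
A minimal sketch of that usage, with the all-zeros dims array as the typical case:

```c
/* Let MPI pick the grid shape: a zero marks a dimension to fill in. */
int nprocs;
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
int dims[2] = {0, 0};
MPI_Dims_create(nprocs, 2, dims);   /* e.g. nprocs == 6 gives 3 x 2 */
```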

78:05 Um, and I guess this may be a good point: the ordering, in terms of processes, that it uses, as I

78:18 said, is fairly undetermined. It's not specified in the standard, and MPI is

78:25 free to use whatever it wants if reorder is set. Um, and one

78:34 way it may do it is to use row-major ordering. And if

78:38 you look at a lot of examples out there, that's what people show you on

78:45 their slides. But it's not necessarily true that that is, in fact,

78:49 what happens, because, again, MPI implementers are free to use whichever way they want,

78:55 and potentially use machine configuration info to do the mapping between the processes in

79:07 the input communicator and the processes in the output communicator. Let's see. There are

79:19 a few more examples, and we'll try to do that when I talk about the

79:23 mapping next lecture, as well as the combination of OpenMP and MPI. The

79:36 materials show where you can use mpirun and say,

79:42 you know, eight processes, and, in this case, that you want to partition them in two

79:50 dimensions, in a four-by-two grid. Then it tells you which rank gets assigned

79:56 to which coordinates in the two dimensions. And there are a couple of examples,

80:03 and you can look at them. I think there are, uh, then,

80:08 there are basically information routines that let you get information about the partitioned grid:

80:18 the number of dimensions, the extent of each of the axes, and whether they're periodic or

80:21 not. Let's see. There are routines that also do the conversions between rank and coordinates

80:34 in the communicator, and this is what cart coords is called for: you can get the, um,

80:42 Cartesian coordinates for the rank that the process has.
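
A minimal sketch of those conversions, assuming a Cartesian communicator `cart` as created above:

```c
/* Translate between rank and Cartesian coordinates in 'cart'. */
int rank, coords[2];
MPI_Comm_rank(cart, &rank);
MPI_Cart_coords(cart, rank, 2, coords);  /* rank -> (row, col) */
MPI_Cart_rank(cart, coords, &rank);      /* (row, col) -> rank */
```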

80:51 I think there's an example of this that you have. And this is, I think, the

80:57 last one I wanted to try to cover, and then I will take questions, and

81:01 then I can answer more questions about the Cartesian topologies. And I will also talk about

81:06 the graph one next time, since I didn't get time to do that. So,

81:09 also: sometimes, um, instead of using the puts of the one-sided

81:15 communication, it is sometimes useful, um, to use what's basically a shift operator:

81:22 shift left, shift right, up, down, et cetera. And there are

81:28 algorithms that kind of tend to use this coordinate organization, where one also has a shift,

81:35 and so on. This is kind of an example of that.
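
A minimal sketch of the shift calls, again assuming the Cartesian communicator `cart`:

```c
/* Neighbor ranks via shifts along each dimension of 'cart'. */
int src, dst;
MPI_Cart_shift(cart, 0, 1, &src, &dst);  /* neighbors along dim 0 */
MPI_Cart_shift(cart, 1, 1, &src, &dst);  /* neighbors along dim 1 */
/* Without periodicity, off-grid neighbors come back as MPI_PROC_NULL. */
```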

81:41 Okay, there is more I should talk about, but my time is up. So I'll make a few more

81:45 comments about the Cartesian topologies next time, and about how you define graphs. It is

81:50 in the slides I've uploaded, so I'll stop there and take questions. If there are

82:11 none, I will stop the recording.
