© Distribution of this video is restricted by its owner
00:00 mhm dr lecture time. No idea I get a green frame this

00:08 Mm So we'll see hopefully will not , you know, I don't know

00:15 it comes from. Um, so today we'll continue with another tool that is

00:25 for understanding how your code performs the you're using and that is known as

00:35 TAU, and after I give you an introduction, a little bit of the characteristics

00:42 features of TAU. So yes with demo how to use TAU as well

00:50 again demo some of the features of . Yeah. Oh okay.

00:59 So TAU stands for Tuning and Analysis Utilities. It's kind of a

01:06 kit that has a wide range of and it uses many other packages for

01:14 task. Like we mentioned last time, it uses PAPI for getting access to registers

01:23 collects data about the processor behavior and memory system for instance can tell can

01:32 more and we can also do more that. Just an example of other

01:38 that are out there used to accomplish broader scope of things that it has

01:46 so there's many ways of instrumenting, and we'll talk about that as well

01:50 depending upon what you want, it uses different underlying or supporting packages and

01:58 of what I will give examples of not the latter part of the profiling

02:04 tracing since we talked fair amount of features on Tuesday this week. Mhm

02:14 this can use the picture show before shows the kind of broad scope of

02:21 and that it uses various other packages specific functions or tasks and as it

02:31 it's a comprehensive toolkit and as you see on this coming slides It follows

02:40 methodology and talked about last time. the instrumentation part and then there is

02:46 data collection part and then there's the part. It has also a good

02:53 of tools for the analysis part. . And this this this is pretty

03:02 just examples of what I talked about time plus some things that PAPI does

03:08 do. So this you can also at I/O and other features, um, network

03:17 and so on, that PAPI doesn't support, at least not in a very wide

03:26 . Okay so I'll talk about the few slides, different ways of instrumenting

03:33 codes and as Josh mentioned last Uh huh. TAU provides nice

03:40 for instrumenting the code kind of automatically, if not automatically. Sometimes

03:50 can be done too, so instead of having to insert statements in

03:56 code like you mentioned for PAPI, you have TAU do that for you and

04:07 the instrument, the instrumentation can be so you can control how the measurements

04:12 being done. Whether you sort of measurements, we have probes or some

04:17 future or to do some more indirect and there's various kind of scopes,

04:25 can do the whole program or you do things by functions or by loops

04:31 in many ways you can control how is being collected and then as I

04:37 it has a good set of analysis for trying to make sense of the

04:41 that is being produced during the measurement and I was talking about some of

04:47 and I think she? S will them or some of the tools you

04:51 for trying to make sense of the that you collect and this is just

04:59 or less whatever. I designed this of three phases instrumentation, measurement and

05:05 and it kind of characteristic is the instrumentation can be done and working

05:11 a source code at one end, and the other, bottom here in the blue

05:16 is to directly work with the binary, the executable. So there are

05:22 levels in between in terms of using for instance or all kinds of wrappers

05:29 try to instrument the code and the touch what they're doing until I think

05:36 just using the compiler approach to instrument code. But so you can correct me

05:43 I come to that and as I we talked about events last time in

05:49 form of what would be PAPI-type events. is basically in that case it means

05:54 aspects of your code like number of instructions, number of cycles, cache

06:01 that are then labeled as events and can define exactly what type of information

06:08 would want to collect about your code then there are somewhat I guess more

06:14 level aspects of profiling and tracing that give examples of today and then again

06:23 to the analysis section And the next is a very busy one that you

06:30 if you look at it on your and you can probably make a little

06:35 sense of it. But the point essentially just to give you the impression

06:40 it is a very versatile and comprehensive tool that has support for

06:46 aspects of getting information about your code, analyzing the code and then they're analyzing

06:55 try to make sense of data collected we'll talk about some of these aspects

07:01 um, TAU is again a very comprehensive tool so we just want to in

07:09 introductory course to get to okay pay chance to pay attention and in case

07:16 use it beyond the course then you where to find more information and get

07:20 exploit more of the features that we do in assignments. So one um

07:29 I guess I highlighted that last time it, the data volume that various

07:39 give you can get overwhelming. So should both the conscientious of what tests

07:48 you use but them and also needs be pay attention to how much data

07:54 need to be able to answer questions are asking yourself about your code and

08:00 just shows that you know at one , if you look at profile information

08:06 not usually overwhelming. But if you to the other end to get trace

08:13 that means that also has the time into what happens in your code.
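A minimal Python sketch of why trace data tends to grow much faster than profile data. It rests on assumptions that are not from the lecture: the function names `compute` and `exchange`, their durations, and the record layout are all hypothetical, and this is not TAU's file format. A profile keeps one accumulated number per function, while a trace keeps a time-stamped record per event.

```python
from collections import defaultdict


def run_workload(n_iterations):
    """Simulate a loop calling two functions; record both a trace and a profile."""
    trace = []                    # one (timestamp, function, kind) record per event
    profile = defaultdict(float)  # one accumulated time per function
    clock = 0.0
    for _ in range(n_iterations):
        # Hypothetical functions and durations (in seconds).
        for name, duration in (("compute", 2.0), ("exchange", 1.0)):
            trace.append((clock, name, "enter"))
            clock += duration
            trace.append((clock, name, "exit"))
            profile[name] += duration
    return trace, dict(profile)


trace, profile = run_workload(1000)
print(len(profile), "profile entries vs", len(trace), "trace records")
```

In this sketch the profile stays at two entries no matter how long the run is, while the trace gains four records per iteration.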

08:18 been an about face, it can quite substantial in terms of the data

08:24 organ that you get. So this is just trying to make you aware

08:29 that what you're asking for consequences also but its output and then also has

08:37 for how you can actually get some information and all the data you

08:42 Mhm And there's just the instrumentation. So TAU provides means for automatically instrumenting

08:53 it. Your code uses what they call PDT, or the Program

08:58 Database Toolkit, compiler instrumentation, but basically you compile your code, um, using TAU and then

09:07 it knows about compilers for the platform you're and then it does the instrumentation for

09:12 and that's I think the way we you to do instrumentation for the assignment

09:18 of course you can always do it . But then there was also ways

09:24 linking the code to various libraries that and specific features that are interested in

09:32 you may not get to the automatic that TAU provides and then do you

09:42 a minimal impact on your code than may In fact used tools that does

09:50 the binary to then collect information from execution of the code. And so

10:03 is kind of just a little bit to just get some a picture of

10:08 automatic instrumentation, um, works when it comes to TAU, so it's based on parsing

10:20 and analyzing of the source code and it uses, um, its kind of library

10:28 instrumentation capabilities and depending upon how you instrument with particularly instrumentation, then it

10:38 an instrument. The code that you okay used for the execution. And

10:45 is kind of just an example of compiler and I think that's what we

10:50 for demo later that just shows that up. Compiling is straightforward with you

10:56 your FORTRAN compiler then you use TAU to do the compilation and you get code

11:07 that according to directive such a This particular example says MPI in

11:12 So that means this particular example happens be for somebody doing very code for

11:20 cluster using MPI, a message passing library. the principle is the same. Whether

11:28 a sequential code, a parallel code of flavor and then you have various ways

11:36 controlling of instrumentation done. And here just a list of some of the

11:41 that you have for telling the compiler about the instrumentation to done and how

11:49 information they want to the output. believe that Joshua thought it's more about

11:57 of the examples and what is the marine. Sure. So yeah.

12:07 actually welcome to interrupted their comments and of the particular slides. Yeah.

12:17 and here's just an example of some the other software packages that helped in

12:23 case. But there rewriting the binary by the compiler. So Dyninst is

12:30 one, and it stands for dynamic instrumentation; that has been around for quite

12:35 time. And then there are some other packages again for rewriting binaries

12:43 you can use. Mhm. And example on the bottom is again from

12:49 parallel code; this is mpirun, um, that's similar to srun for Slurm

13:02 that they used uh just first example assignment. Yeah. Yeah. And

13:11 is just to get an example. . Uh support an example. You

13:16 some of you do use it in similarity about you see uh in the

13:23 most codes are parallel. We'll get that later in the course. So

13:30 the measurement part is one is again do direct observation by using probes that

13:39 checks on the status of the code various parts and then it's just um

13:52 and again depending on what approach you're . You can then collect different types

13:57 information but it's your control as it Director what information will have fine instrument

14:04 your code in that way. The one is the director performance measurements And

14:10 some ways if she they just use , they don't, it's instant in

14:17 very high level kind of an interact of the code is performing. But

14:27 you can do like with PAPI, uh, also in terms of other packages that

14:34 support event based sampling or EBS that so on. A few strains will

14:41 using. So that was the two . And the other example I guess

14:50 the used to define events um begin the senate sample was just using the

14:57 set the clock before and after Figure out what it took between the

15:00 calls. Um, but generally they're monotonically increasing and except as uh so

15:10 be warned, with some of the timers they may reset in case

15:18 call them. Someone has to be . It's not necessarily true that they

15:22 all um metrics that you collect depending the function we're using is monitoring with

15:33 and then you can do just for events in the code. You can this

15:39 that's atomic events as opposed to, say, a timer for a routine that is not

15:46 atomic, or a loop nest or something. So is more capturing specific events set in

15:56 code. That's the example here particular how much memory for instance is used

16:07 particular point in the code and then you can different what I'll call refer to

16:15 you can do it for routines, can do classes, you can do

16:19 , you can do loops and sure , so different ways of doing it

16:24 so you should meet them. That's well. So, um, now

16:34 few examples of using TAU and send and then I'll give examples of outputs

16:43 TAU generates and after that I will to suggest a demo to actually use

16:52 and it's just preamble a little bit um how to both instrument and then

17:02 about the random but then how to but, um, TAU collects about your

17:12 . Mm hmm. So as I , no. Well talking about the

17:20 blocks in the class with news against features of TAU, uh, form and

17:30 essentially for providing profiling and tracing that the class here. They're using it

17:39 a complement to PAPI, and as a way of using PAPI but

17:50 instrument the code. So the difference it's not on the familiar is a

18:04 gives aggregate information about the It has no notion of time.

18:11 it's just for whatever segment of code the entire code that's running. It

18:17 you the global or total picture of happened in the execution. Whereas the

18:26 also has the event in a What happens with the code and what

18:33 called at various points in time in code. So that's why execution traces

18:43 substantially more data than just doing profiling your code and I'll talk about the

18:54 example here. So but just on slide so under the profiling column,

19:03 best to see a number of function in that particular code that was

19:09 And it tells you in this case it stands on top of the bar

19:14 that, um, the units that you see for the bars is seconds. So you

19:22 control the resolution. For instance if want to use time, whether you

19:27 seconds or milliseconds or whatever the unit time is that is relevant for the

19:33 . We can also have other things number of instructions or number of calls

19:38 many other aspects of your code but gives a total number of times spent

19:45 Time that is spent in this for in the 1st 13 on top.

19:50 this like you I case sweep Um No it's also on top since

19:59 . I'll come back to that and future slides. But you mentioned that

20:05 time, the exclusive, is just for the particular function. The unique pieces of

20:14 . So if it calls other functions of that time is not included an

20:22 type. But if you have inclusive , it would report all uh calls

20:32 by this routine. Nike sweetie. . And it's again the total time

20:40 all the different calls not just For car. So it just different instances

20:48 the call to the routine may take amounts of time. So it doesn't

20:51 you any detailed information, just the for the whole run and in some

20:57 it sorts them nicely. So in case it's easy to focus if you

21:03 to optimize the code, you go the ones most likely that spent uh

21:08 the code spends most of the time so there is just a little bit

21:16 the maxime in total. And as mentioned already you can do it for

21:22 function for basic blocks for loops it threads and processes. We haven't talked

21:29 much of a difference between threads and but we will in subsequent lectures and

21:37 you have the whole range of attributes those particular scope that you can collect

21:45 TAU, and it in turn uses either PAPI or some other tool that

21:51 haven't talked about in detail but TAU would know what to use if

21:59 give it the attributes and scope. in the profiling there is what the

22:10 that showed and talked about in terms profile that's known as a flat

22:15 Um I guess it's because it's kind one dimensional in some sense, it

22:23 give you much insight as a set what actually happens in the code except

22:28 aggregate information, whereas a call path profile gives you also the control flow information about

22:38 happens in the code. And then can also have uh to find special

22:46 profile. And I give examples of profile as well as flat profile.

22:56 so there is just another flat profile and in this case again it's time

23:06 I guess it's shown here, the exclusive/inclusive difference. And this is again another

23:12 plot. And uh I think there's not much more to comment on that.

23:18 read they did for the other flat except it's just list a bunch of

23:25 calls. And again, this is parallel code written for a cluster because

23:32 shows MPI, that is this message passing . Someone you can tell if they're

23:38 as to what that this system parallel . Mm And here is just an

23:48 of one of how, uh, you can tell TAU what instrumentation you want in

24:01 loops. And I think so, demo it later on. It also

24:06 your options in terms of what, how much output you want from TAU

24:12 terms of using the for instance, verbose, uh, option. Now there is

24:23 no, just a different type of flat profile must in a kind of

24:29 level thing. So in this case a different unit of time microseconds and

24:36 metric was used at the time all in this particular case. But it

24:40 that this, uh, loop that does the multiply of matrices for this particular code

24:47 was clearly very dominating forever the what spell. But then there's also

24:51 other loops. But yeah, that themselves don't spend too much time.

24:58 that means maybe these shallow slopes and other functions that perhaps this uh in

25:10 other bars to and I give another example of things met more based on

25:18 call graph and start to decipher what And I think this is just an

25:26 of how to control what you want terms of and this is instrumented no

25:37 video town. And I think again will spend more time I'm going through

25:44 particular or related example. Um so more uh that's another flat profile.

25:59 instead of doing time in this case can also use composite measures, sums

26:06 are just in time. In this you want it, the user wanted

26:12 get the instruction rates or invested, the number of instructions and than the

26:19 it takes. But it's an You get the average for the whole
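A small Python sketch of the kind of composite (derived) metric mentioned here, an instruction rate formed from an instruction count and a time. The function names and all numbers are hypothetical, chosen only for illustration; they are not values from the slide, and this is not how TAU itself computes the metric.

```python
def instruction_rate(instructions, seconds):
    """Return instructions per second for one profiled region."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return instructions / seconds


# Hypothetical per-function totals: (instruction count, elapsed seconds).
measured = {
    "matmul": (2_000_000_000, 4.0),
    "init": (3_000_000, 0.5),
}

# The rate is an average over the whole run of each function, not a time series.
rates = {name: instruction_rate(count, secs) for name, (count, secs) in measured.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.3e} instructions/s")
```

Because both inputs are totals for the whole run, the result is an average rate, which matches the point made in the lecture.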

26:26 . Uh huh. And otherwise there no the next example is to try

26:35 get the call path and I think next slide will basically show you what

26:39 get in that case. Um, so the call path profile in this case

26:47 gap and the fraction of total time and for this example in the barriers

27:00 and it's again exclusive. So come the inclusive example. Then I think

27:06 next slide or two. But so just shows you different things you can

27:13 out. But now here is actually representation of the consequence that's when you

27:22 get and have the call graph. it just shows in this case that

27:28 main may potentially do different calls, to function one and to function

27:34 five. Um and then how so control flow works in this particular code

27:45 what function is being called that we to call from function one, then either

27:50 four or two, and then if it called function two, then call function

27:57 Now that's the kind of simple example sometimes it's useful to get some understanding

28:02 what the control flow is as well how much time is spent in the

28:07 functions and the next line that shows little bit perhaps more typical for a

28:15 application as supposed to some simple kernel what things may look like and then

28:22 becomes obviously it lot more complex that of start to analyze it. But

28:30 TAU can help you gaps. They sequence and for you the code and

28:41 you analyze it. Um and here yet another example and I'm not so

28:56 this one, I think I have clock that is the inclusive time slot

29:03 profile. And so in this case attention to a few of these so

29:10 can show you the difference coming So here's one routine and it's cleaner

29:16 . one that consumes most of the . Here. They have active and

29:22 there is a bunch of other ones the ladder here. Now the next

29:29 it shows if you tie in the sex or profile, okay. Using

29:36 as the attributes with the code, using now the inclusive time. The

29:45 if function that was the one that most of the time if you just

29:50 at routine itself um this final most . So that means it's called somewhere

29:57 the code and even though the majority a big fraction at the time for

30:02 whole execution is spent on that routine It's called by other ones. So

30:08 is kind of the whole application and if you go down here then you

30:13 this objective function down a bit and can see that it's the same number

30:20 fact as um this time exclusives or some sense from looking at both the

30:28 of inclusive and exclusive. You can fact conclude that the addictive doesn't really

30:37 any other function but there are ways looking at it in other ways using

30:47 . So I think on this side just extracted some of the thing to

30:52 the difference between exclusive and inclusive and kind of like the blue areas pointed

31:01 what I just talked about in terms the objective function and there are other

31:09 that uh that's kind of the opposite that this disc interest function that is

31:17 the bottom of the exclusive list or profile. That in itself doesn't do

31:26 unique cold for this function. But you do the inclusive timing. That

31:33 it calls a bunch of other So in the inclusive timing, it

31:39 Actually consume a good fraction of the time, more or less, 60%

31:43 the total time. But again, because it calls other functions to do

31:49 job. So it is useful to both, I would say so exclusive

31:56 clearly a good thing to when you to figure out where you want to

32:03 the effort to improve performance. But also sometimes useful to look at the

32:09 times to narrow the search for what look for. And this graph shows

32:24 little more detail of. Now basically on the left column here.

32:30 have the nesting of the calls being used in this the application as

32:39 call. And we saw this descriptive that was at the bottom of the

32:47 profile. And see that exclusive time very small for instance, but the

32:55 time is definitely not small. And it calls uh one function space out

33:04 and then they call this objective function I highlighted a few times. Hearing

33:09 at these graphs and one can see that uh it you look at the

33:21 and inclusive time for this function, they're

33:25 you mentioned, it does not have embedded calls in it. It's just

33:33 so called that Yes, Dima can look at the numbers here in terms

33:40 see what happens if you look at discotheque is fixed function here and took

33:44 20 inclusive, 22 something um seconds believe it's the unit here and then

33:54 can see that corresponds to oh sorry if you add up certain numbers

34:06 you should get the right numbers So if you look at basically this

34:12 function here, the ad back and these two times here and then the

34:19 time for the space function that calls , they add up to basically the

34:25 time that you have here and you go through some of these other

34:29 you can decipher how the times are making sense in terms of what's inclusive

34:38 exclusive given the call structure that they have the left hand side, this output

34:45 gives you the number of calls. so in this case you can see

34:50 this event function is called a lot, in this case about 180,000 times

34:59 um going through this or completing the town. So you get a lot

35:05 information about both the total time spent the function and the number of times

35:13 called. So that on the other and in fact time per call is not very

35:17 . But so you know you may have thought of it if you just

35:23 one instance of the routine that maybe not that critical because it's fairly

35:29 But on the other hand if you at the aggregate time and you may

35:34 to pay attention to it and when talked about a little bit about how

35:42 structure code to make efficient codes and didn't already are aware of. Its

35:47 calls are expensive. So for instance you do this many function calls and

35:52 time for function call is not the per function call is perhaps not very

35:58 . You may want to look at of reducing the number of function calls
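A short Python sketch of the arithmetic behind this point: a function can be cheap per call and still matter in aggregate when it is called very often. The call count of 180,000 is the figure mentioned in the lecture; the total time is a hypothetical value picked for illustration.

```python
def time_per_call(total_seconds, calls):
    """Average time of a single call, given the aggregate time and call count."""
    return total_seconds / calls


calls = 180_000      # call count mentioned in the lecture
total_seconds = 9.0  # hypothetical aggregate (inclusive) time for that function

per_call = time_per_call(total_seconds, calls)
print(f"{per_call * 1e6:.1f} microseconds per call, {total_seconds:.1f} s in total")
```

Looking only at the per-call figure would hide the aggregate cost, which is why the profile reports both the total time and the number of calls.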

36:06 this is just an example of what can get out of TAU in terms

36:10 statistics and getting insights of what uh happens in the code. If you

36:16 the code yourself you may know it well but if you're given some codes

36:23 you try to figure out what goes with it, this type of information

36:27 very helpful to help you just so called. Um than some examples of

36:40 again it gives a time stamp in to what the profiling does and shows

36:53 sequence of events in time and what events takes for each one of the

37:00 . So you get do you remember first? It's like a shoulder of

37:06 difference between profiling and tracing and I'll you some more examples of what this

37:15 looks like but and this this is of what it does and this example

37:22 again for a more difficult situation than have done so far. That means

37:28 have a number of threads or processes system for processes. If he program

37:36 cluster is using MPI processes on this for two different processes and then TAU

37:46 tracks what happens in each process ? So you get basically traces

37:54 per process and the time stamp when events happens for that particular process.

38:04 then you can get the global view having the traces for all the different

38:09 merged. So TAU manages all these for you and again, more complex
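A minimal Python sketch of the merging idea described here: each process has its own time-ordered trace, and a global view comes from merging them by time stamp. The two ranks, their events, and the time stamps are hypothetical, and this only illustrates the concept; it is not how TAU stores or merges its trace files.

```python
import heapq

# Hypothetical per-process traces: (timestamp in seconds, process rank, event).
rank0 = [(0.0, 0, "enter main"), (1.5, 0, "send"), (4.0, 0, "exit main")]
rank1 = [(0.2, 1, "enter main"), (1.7, 1, "recv"), (3.9, 1, "exit main")]

# Each list is already sorted by time, so a k-way merge yields the global view.
merged = list(heapq.merge(rank0, rank1, key=lambda record: record[0]))

for timestamp, rank, event in merged:
    print(f"{timestamp:4.1f}s  rank {rank}  {event}")
```

The merge assumes the per-process clocks are comparable; on a real cluster that assumption needs care, which is part of what a tool handles for you.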

38:20 . Yeah. And this is just way off uh making sure you again

38:28 the trace as an output and so made hemorrhoids, I think this is

38:36 very good example but it's still there I'm going to have a better example

38:41 think next that gives you a little of the different aspects of output depending

38:48 what underlying software TAU uses for helping produce uh both the profiler and

38:59 So in this case you can see the middle here kind of trace where

39:03 are different functions are color coded and this case again it's a pretty

39:09 So you have and processes and then can have as it said here,

39:16 also for in this case, nodes the top bit mixture but also a

39:22 here. So you can follow things threads, you can follow things for

39:27 time and of course should have a high resolution time. The graphs are

39:35 somewhat nontrivial to um Interpret. So has to be again, again conscientious

39:42 what one is asking for in order not to get so much data,

39:48 can't really see what happens. And the top is kind of just the

39:53 type of information where again things are coded on their per thread on the

39:59 basis in terms of the parallel Uh huh. So I think this

40:08 what I have already I said well maybe a good as a reminder.

40:14 don't think I need to repeat it maybe go the federal cuts of

40:19 Okay. And then I think I leave it to suggest to the

40:28 I think that's the proper thing and there's time left then let me show

40:33 some more slight about visualization tools but it's because for them for them to

40:41 , so I will hand it to yes, so I will stop sharing

40:50 screen I guess. Yes, I take over I think. Okay.

40:57 , I should have done it. . Well well, uh well the

41:08 I'll be showing will be in context using TAU as, uh, a wrapper for PAPI

41:16 then I'll show some of the examples how to use ParaProf and how

41:24 navigate through it to check the profiles from TAU. Right? But first on

41:31 Stampede2, uh, what you need is because you need to be on a

41:37 nodes to do all the experiments. you can do it on login nodes

41:41 don't do it. Uh but uh starting with using TAU, first thing is to make

41:50 that you load the module, uh, for and pay close attention here. There's

41:58 two or three different versions of TAU on Stampede2, uh, and the

42:03 that we will be using is uh dot two. And that we know

42:08 doesn't have any problems. Uh working with PAPI; the other versions have had some

42:14 working with PAPI and may crash sometimes make sure you use this particular

42:19 Yeah. Uh once you have done , make sure you uh that second

42:25 loaded there, we got uploaded. the first thing uh that you need

42:30 do and to set one of the variables for TAU that's called the

42:37 makefile. And the way you do it is by simply, uh, setting

42:44 that variable to the right location the way you would know the right

42:49 is, uh, the makefile is located the location pointed by the environment variable

42:58 TAU. I think it's simply do, uh, dollar TAU, that should give you

43:05 location and if you ls that, mm, it should give you the list

43:13 that directory and it will show you makefiles here. Now the make

43:18 that you want to use is this that have intel O M.

43:24 In its name? Not the intel . I won we will we will

43:28 using intel MPI one because otherwise if use the Intel MPI one,

43:33 will require your program to be an MPI program which currently we're not using.

43:39 what you need to do here is the, uh, TAU makefile variable to

43:47 uh to point out that, uh, file so that you can do by using

43:52 command here. So TAU_MAKEFILE equal to $TAU slash the name of the make

44:03 . And once you've done that just make sure that correctly out here there

44:14 is pointing to the correct makefile. Now. The other thing that you

44:19 to do is let's say what PAPI metrics that you want to measure and

44:26 set that, what you need to is set the environment variable called TAU_METRICS

44:33 equal to the, uh, PAPI event you want to measure. So let's

44:40 we want to measure, uh, single precision ops. So I just set that

44:48 metrics environment variable to that PAPI event. uh now here the example that I'm

44:55 is the matrix multiplication code that you've for your assignment And here I have

45:04 N equals 1000. So that means matrix multiplication, the total number of

45:11 are 2N cubed. It will be two giga-operations, or two billion
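A small Python sketch of where the 2N cubed figure comes from, assuming the classic triple-loop matrix multiplication where each inner iteration does one multiply and one add. The helper name `count_matmul_ops` is made up for this illustration; it counts operations rather than doing the arithmetic, and it is checked on a small size before the formula is applied to N = 1000.

```python
def count_matmul_ops(n):
    """Count floating-point operations in a classic triple-loop matrix multiply."""
    ops = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                ops += 2  # one multiply and one add per inner iteration
    return ops


# Check the closed form on a small size, where brute-force counting is cheap.
small_n = 10
assert count_matmul_ops(small_n) == 2 * small_n**3

# Apply the formula to the size used in the demo.
n = 1000
expected_ops = 2 * n**3
print(expected_ops)
```

This gives the reference number to compare against the counter value reported for the matrix multiplication function.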

45:19 So now that you have set up two things, the TAU makefile and

45:24 metrics, what you need to do to, uh, compile programs using one of

45:32 TAU's compiler wrappers. So if usually you may do is use the compiler

45:38 gcc, and so as your source code and the, uh, the output by

45:44 The simplest thing you need to is replace gcc with the

45:51 from TAU for the C compiler, all you need to do. Once

45:56 have set it up, your makefile and the metric, uh, environment variable.

46:03 question for me if it's okay. how do you know which compiler?

46:11 uses gcc? It uses, uh, Intel, from the, uh, from the, uh, make

46:22 file name, you can tell mostly. . Yes. Okay. Yeah.

46:25 generally in the makefile name when when you configure and install it

46:30 it has to be with violent contribute when it is configured and installed generally

46:37 makefile name contains the name of the as well. And it also consists

46:43 of all the features that that particular file. Support with support,

46:48 It obviously supports the PDT, the Program, uh, or what, uh, some data

46:56 . It also supports the OpenMP programs as well using the make

47:02 Okay. Yeah, it's general Uh if the vendor compiler is available

47:09 the platform you're using, I would using the vendor compiler, it's

47:16 But no guarantee that the code will um more efficient in using the resources

47:25 Gcc. But gcc sometimes beats the compiler. So it's no guarantee.

47:29 , I said. But the starting that would advise to use the vendor

47:34 if it's available. Yeah. And case somebody doesn't know that Gcc is

47:41 GNU compiler. And if you to use Intel compiler for C it's

47:46 icc, replace the gcc with And that would be using the Intel compiler

47:53 . Good. Okay. So now , coming back to TAU. Uh

47:59 you use the TAU compiler wrappers compile your code. And when you

48:04 that, that basically invokes TAU. TAU starts instrumenting your source.

48:13 it starts with parsing your code using Program Database Toolkit. It adds all

48:20 instrumentation calls inside your source using internal module called tau_instrumentor. And

48:30 it performs the linking with all other libraries like PAPI and OpenMP

48:35 whatever you're using. And then after linking of object file you get your

48:43 name, uh whatever you provided in command be provided, uh Math molest

48:50 house. Put my limbs you get instrumented executable. So this is not

48:56 simple executable. This is an instrumented now. Yeah. Now the good

49:02 about TAU is, as you may have by now, with the demo of PAPI

49:09 you had to insert the PAPI calls your source code. And good thing

49:15 about TAU, as you may have noticed now, you did not have to change your

49:18 . So that all the only thing that you needed to do was to

49:22 your source with the TAU compilers And that also comes with the uh

49:31 that once you have compiled your , you don't necessarily need to recompile

49:36 thoughts to get data about any other events. And I'll show you how

49:41 works. But uh close getting back how to now get some data about

49:48 the event that we search for. research. Uh, PAPI single precision ops

49:56 the event that we want to Now then uh want to execute your

50:03 . You use another wrapper from TAU called tau_exec, tau underscore exec,

50:10 with a tag, with the minus capital T flag. Uh, and tell it

50:15 serial program. When we will use OpenMP or MPI or

50:22 . Uh, then we need to provide with the tag of OpenMP or MPI,

50:27 for now, since we're just dealing serial programs, tell it that the

50:31 program. Okay. And then just uh huh provide the name of your

50:39 executable and the execution generally, uh you would run your program but the

50:48 is now in the same execution you will see a new file that's

50:54 profile 0.0 dot zero dot video. this profile file, it actually contains

51:00 information about the event that we wanted collect uh information about all the functions

51:08 we have in the program uh for event. Now, the way you

51:14 this particular profile file on the console by using a console based profiler provided

51:21 taliban. Uh just spotted people and you simply call this people of profiler

51:29 the directory where these profiles are located this profile is located that should open

51:37 profile. Fine with all the information that particular event for each of the

51:46 that was executed in our program. in this program, I only enable

51:51 traffic matrix multiplication. I don't agree the interchange matrix multiplication, but data

51:58 have collected, it only contains the function the classic maximal and the initialization

52:05 function. Now, here you can see the clear difference between the

52:12 inclusive and exclusive events. The exclusive count, as we defined it,

52:19 shows only the events that were generated by the specific function itself and not by any

52:25 of its children, the functions it calls. So we know that the classic matmul

52:31 was the one that performed the two billion operations needed for the matrix multiplication.

52:38 So we see the exclusive count for matmul here is a little bit more

52:44 than two billion. And since the main function calls the classic matrix

52:53 multiplication, you see those two billion as inclusive operations for the main function as

53:02 well, although we know that the main function is not actually responsible for them. So you

53:07 need to make sure what you're looking at. And the assignment, I believe, asks you

53:13 to measure the exclusive counts for the functions, so that you get a good

53:22 idea of what those specific functions are actually doing. So you don't end up

53:28 adding whatever the initialization function or other functions might be doing.
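
The inclusive versus exclusive distinction described above can be illustrated with a small Python sketch. This is not how TAU computes or stores its profiles; it only mirrors the definition given in the lecture. The function names and the counts are made-up values chosen to resemble the demo.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Function:
    """One node of a call tree (hypothetical, for illustration only)."""
    name: str
    exclusive: int  # events generated by this function's own body
    children: List["Function"] = field(default_factory=list)


def inclusive(fn: Function) -> int:
    """Inclusive count = the function's own events plus those of everything it calls."""
    return fn.exclusive + sum(inclusive(child) for child in fn.children)


# Made-up counts: main does almost nothing itself but calls the two workers.
init = Function("initialize", 3_000_000)
matmul = Function("classic_matmul", 2_000_000_000)
main = Function("main", 1_000, [init, matmul])

for fn in (main, init, matmul):
    print(f"{fn.name:15s} exclusive={fn.exclusive:>13,d} inclusive={inclusive(fn):>13,d}")
```

In this sketch main reports a large inclusive count even though its exclusive count is tiny, which is why the exclusive numbers are the ones to compare when you want to know what each function does on its own.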

53:34 Any questions on that? Now, I have one, but let's let the students ask first.

53:40 Yeah. If not, while they may think about questions, the one question that came

53:49 to my mind was: it would be interesting to see, and I think it's supported on

53:57 the Skylake, what type of vector instructions were actually used, if any.

54:03 So whether they used, maybe, 512-bit or 256-bit, or just scalar

54:09 instructions. And I don't remember exactly what the PAPI event name was for that.

54:16 I think it's something we can get from PAPI.

54:22 Yeah, it's called PAPI_VEC_SP, the single-precision one. So if you want to be able

54:30 to see what happens. Yeah, it helps in trying to figure out, you

54:35 know, when one looks at performance and efficiency, whether it's just...

54:42 Okay, let me come back to that. Yeah. Before I go to

54:53 that, the good thing about TAU, coming back to that: now I don't

54:58 need to recompile the code. I just need to set the TAU_METRICS environment variable to be

55:02 able to collect information about a new event. And now what I need to do

55:07 is simply execute my instrumented executable. And that should replace the profile

55:18 I had in the directory with the information about this new, different event, and

55:25 it's, apparently, the single-precision vector event. But the count for single precision looks pretty

55:31 much the same. It doesn't seem to be much different from before at

55:37 all. That's good. Right. But it doesn't tell you how many instructions were

55:46 issued of that type. I think the instruction-width event...

55:56 available on the Skylakes, I... It only tells you about the...

56:02 Oh yeah. These are the instructions. Yeah. Yeah, should be.

56:08 Yeah, those are instructions. So instructions and operations apparently are equal in this case.

56:14 Yeah, that's when I saw the numbers, and that surprised me. We need

56:25 to find out exactly what that means. Yep, we've got some work

56:31 to do. Okay. So that was only collecting one

56:40 event for a given execution. You can also collect multiple events in a

56:46 single execution, and that we can do by setting the TAU_METRICS environment variable

56:53 equal to a colon-separated list of events. Now, one thing I should mention

57:00 here is that if you're collecting multiple events, you need to make sure

57:07 that you're not ending up collecting events that are not compatible with each other. So remember

57:13 that we saw a tool from PAPI last time in the demo called

57:20 papi_event_chooser. And that tells you whether two events are compatible with

57:32 each other or not. So let's do that again, just as a reminder: if we ask whether

57:36 PAPI single-precision ops and a total cache misses event are compatible with each

57:41 other, it will tell you if they are not compatible with each other and cannot

57:46 be counted together. So just make sure you don't use any events that

57:53 are incompatible. And even if you end up doing it, it's likely that when you

57:58 run the code, the tau_exec command will throw an error

58:05 saying that it can't count that event. So yes, coming back to how you can

58:12 do that for compatible events. So just simply provide, like,

58:21 PAPI single-precision ops, a colon, and let's say we want to do PAPI total cycles, because

58:29 I know these two are compatible with each other. And just set TAU_METRICS to

58:33 a colon-separated list of these metrics. And again, we don't

58:40 need to recompile your code. Just do tau_exec and run the

58:46 code. And now the difference is that it will have written two directories in

58:52 the execution directory, and it will name those by the names of the events that it

58:59 collected. So it has a directory for the single-precision ops and it has a PAPI total cycles

59:04 directory now. And if you go into one of these directories, you can see

59:11 the actual profile for that particular event, and you can read that by using

59:17 the command-line profiler, pprof. Now, that's pretty much what you

59:27 need to do in terms of getting TAU running in the background to

59:32 get all the event information. Now, if you want to

59:45 use ParaProf to get a GUI-based profile, what you need

59:54 to do is go to the visualization portal, and there's a link

60:01 for it: vis.tacc.utexas.edu. And here, when you

60:06 go there, you just go to the home page and then to the jobs page.

60:12 It shows that I currently have one of my jobs running, but it will give you

60:16 some configuration options, which are in the slides. You can follow those steps

60:21 to get one of your jobs running there. Be aware that it takes a while to get

60:26 a job for this particular VNC session, for the TACC visualization.

60:33 But once it's running, it should give you a window that looks like this, and

60:40 it logs you into one of the compute nodes. And here again, you can see

60:44 your home directory and all the files that are on the cluster.

60:50 But here it allows you to use the ParaProf GUI-based profiler,

60:57 and that's going into one of these directories again. So again, this is one

61:02 of the profiles that we just collected. Here again, you need to

61:09 load the TAU module. Since it's a different SSH session, for this one

61:16 you need to set all your environment variables and everything again, but we don't need

61:21 that for now. We only need the TAU module. All right. Now,

61:26 rather than doing pprof as we did on the command line

61:31 in the other session, here you can do paraprof and press Enter,

61:39 and that should open the ParaProf GUI. And so it tells you all the information

61:46 about the node that you're on, the call graph or the event

61:55 data for your program, and then in the smaller window here it tells you the

62:03 information about the event that we collected in a more graphical format. Now,

62:10 this is similar to one of the screenshots that you saw in the slides.

62:13 You have the metric name on the top. You have it telling you whether

62:18 it's showing you exclusive or inclusive values, and then the standard deviation, mean, max, and

62:26 min. And since it was a single-threaded program, it only has this one

62:32 row that has node 0 showing. And this has been a problem with

62:38 their visualization: over your internet connection it can lag quite a bit. But

62:45 so if you click on this node 0 name, which is for

62:51 the single thread, it should open a more detailed window that shows you

62:58 the exclusive count for each of the functions in a graphical format, similar to the pprof profiles but

63:04 in a graphical format here. So here you have the exclusive counts for the

63:09 metric. You can also choose the inclusive events by going into Options, Select Metric, and

63:19 choosing inclusive. And that way you can see the inclusive counts for your

63:25 events. Yeah. And then you can also go back and try

63:39 the Windows menu, which gives you a 3D visualization and

63:45 all the other stuff that you can see for your profile. It depends on how many events

63:52 you have collected in your profile. For example, I can show you

63:58 another program that you may get at some point as an assignment.

64:05 It won't give away anything; there's no solution jumping out there. So this

64:11 is one of the codes that you will be asked to profile, I believe, in

64:18 one of the assignments, and it has a lot more function calls as compared to the

64:26 simple code that we saw, similar to one of the screenshots.

64:33 So again, you see all the function calls here, and the exclusive counts

64:38 for PAPI single-precision ops. And by looking at this, you can tell

64:43 which function is performing more work compared to the other functions. In case you have a

64:47 really, really complicated code, this helps you sort out the most compute-intensive,

64:55 well, not compute-intensive, but the busiest functions out of your complex code.
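
The run step demonstrated above (set TAU_METRICS to a colon-separated list, then launch the instrumented executable through tau_exec with -T serial) can be sketched as a small Python helper. This is only a sketch under assumptions: the executable name ./matmul is hypothetical, and PAPI_SP_OPS and PAPI_TOT_CYC are assumed to be the PAPI preset names for the two events used in the demo. The helper only builds the environment and command; it launches tau_exec only if the tool and the executable are actually present.

```python
import os
import shutil
import subprocess


def tau_run_plan(executable, metrics, tag="serial"):
    """Return (env, command) for one tau_exec run that collects `metrics`."""
    env = dict(os.environ)
    # Several metrics are passed as one colon-separated list.
    env["TAU_METRICS"] = ":".join(metrics)
    # -T selects the kind of program; "serial" is the tag used in the demo.
    command = ["tau_exec", "-T", tag, executable]
    return env, command


# Hypothetical executable name; assumed PAPI preset event names.
env, command = tau_run_plan("./matmul", ["PAPI_SP_OPS", "PAPI_TOT_CYC"])
print("TAU_METRICS=" + env["TAU_METRICS"])
print(" ".join(command))

# Only run when TAU and the instrumented executable are really available.
if shutil.which("tau_exec") and os.access("./matmul", os.X_OK):
    subprocess.run(command, env=env, check=True)
```

Because the metrics live in an environment variable rather than in the binary, changing the list and calling the helper again needs no recompilation, which is the point made in the demo.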

65:07 Any questions? So there is a question in the chat. Yeah: is there a way to do

65:17 multiple trials? Now, I see the numbers for max, min, standard deviation.

65:24 No, unfortunately, it doesn't allow you to do multiple trials. You'll have to

65:31 collect the profile multiple times and copy the numbers from it, unless, I guess,

65:46 you were to run things in a loop. If you run it in a loop, it actually

65:53 overwrites the old profile. But then you've got the total or the average.

66:04 Yeah, but you don't get the histograms, so to speak, for instance.
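
Since each run overwrites the previous profile, one possible way to handle multiple trials is to copy the exclusive count out after every run and compute the statistics yourself. The following Python sketch assumes you have already collected the counts by hand; the numbers are made up, and nothing here reads TAU's profile files.

```python
import statistics


def summarize(trials):
    """Min, max, mean, and sample standard deviation of per-trial counts."""
    return {
        "min": min(trials),
        "max": max(trials),
        "mean": statistics.mean(trials),
        # A sample standard deviation needs at least two trials.
        "stdev": statistics.stdev(trials) if len(trials) > 1 else 0.0,
    }


# Made-up exclusive counts from three trials of two hypothetical matmul versions.
counts = {
    "classic_matmul": [2_000_100_000, 2_000_400_000, 2_000_250_000],
    "interchanged_matmul": [2_000_050_000, 2_000_070_000, 2_000_060_000],
}

for name, trials in counts.items():
    stats = summarize(trials)
    print(f"{name:20s} min={stats['min']:,} max={stats['max']:,} "
          f"mean={stats['mean']:,.0f} stdev={stats['stdev']:,.0f}")
```

Keeping the individual trial values, rather than only an average, also makes it possible to spot outlier executions.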

66:08 Yeah. Yeah, I think in that case you make a run and copy

66:25 the profile. Yeah. So what's the choice of output formats for TAU, so that you can then

66:38 have some analysis program that runs on it, so you don't have to manually do it?

66:45 Can you check quickly? Right. I don't remember the command, but I believe it allows

66:51 you to export these profiles in CSV or some other easy-to-process format.

66:58 I can look at how to do that. It's a good question, since we're asking you to do

67:05 multiple trials; there may be outlier executions. Yeah, but I can try to look up

67:11 the command, but I believe there's a way to do it. Yeah. Or

67:20 so that, you know, some other program that is good at it can do the

67:26 statistics. Right? All right. That was pretty much it for the demo.

67:39 You know, one thing I'd mention, it's not necessarily regarding the demo,

67:46 but in terms of presenting the information in your assignment report: with your assignment

67:53 you will get some of the scripts that I've written, and they

67:57 just go through different combinations of events to get your data much more easily.

68:04 But try to extract the information from that script output. Don't just

68:13 take a screenshot of the output and paste it into the report, people. That doesn't

68:19 make any sense when you try to analyze the data. Try to extract the

68:23 information, put it in a table, and then analyze it. But one thing

68:28 I would want to mention, in terms of... I don't know if you have tried

68:37 anything with the 3D visualization capabilities. I have not done that.

68:45 You can just use the 3D visualization and take a look at it, in

68:51 other words, since you're at it. It shows the various ways of doing visualization.

68:59 Yeah. It does more of a histogram, but in multiple dimensions.

69:09 Right. Yeah. There are all these that are in the ParaProf GUI, interesting

69:22 tools. The right one. Right. The ones that I showed towards the end

69:27 and the flat profile, as you call it. We can all presumably do

69:32 pretty visualizations. Okay. Any more questions? Of course, Josh.

69:49 I can just simply show a couple of the 3D plots that you can do, it seems to

69:55 me. And there's one question: the table asks for max, min, and mean over three trials for each

70:02 PAPI event, repeated for the two versions of the matmul algorithm? Yes, I believe so.

70:09 What are you expecting? We're asking you to collect information for all the

70:27 events that are available, or the ones that make sense for those programs. You may

70:37 agree it's a little bit of data, but that's why we try to provide you

70:41 with scripts to automate that, and encourage you to write your own scripts to do that.

70:55 But let me start sharing at this point. Okay. Uh huh. Mm.

71:07 Let me find where I was. Yeah, so

71:16 I'll just flip through on this. So there's stuff related to what was shown. So yes.

71:22 Mhm. So yeah, here's, I guess, things where

71:35 you can control what it does, and I just want to show a little bit.

71:40 And yeah, we were just shown the flat profile, and in this one, ParaProf again, you

71:49 can control, under the options, what you see. There's a call graph that you can show.

71:52 You know, matmul is such a simple program that it won't show much that is

71:58 useful in here. I can just show where it is, as well as that you can choose which

72:03 attributes you want on the three axes in this case, for a 3D plot.

72:08 And then there are the colors, and there are again things you may choose depending on what you want to

72:21 get out of it, you know, in the best way. So there are just different ways

72:26 of getting a 3D visualization, and that makes it quite helpful in figuring out what

72:35 to focus on. So I just wanted to highlight the various plotting options that may

72:41 not give you much in this case, but maybe later on, either in

72:48 the assignments or if you choose to do a project, you may have something more complex

72:54 that you want to be able to comprehend more easily than with just a flat

73:04 profile. So I think that's what I just wanted to show in terms of the

73:08 3D plotting capabilities. I don't think I have much more, except there are a few

73:14 slides about some of the other tools that TAU is using. So if you're interested

73:19 in some of them, then look at the last few slides, and they will have pointers

73:26 to where you can get more information on the specific tools, or you can go to

73:31 the TAU website, which has more pointers

73:41 to various aspects of TAU. Yeah, so tracing is something that's more

73:48 sophisticated, depending on what tools you use. In terms of parallel programming, traces are

73:57 actually quite useful, because then you can see things per process and per thread,

74:04 so you can get a much better idea of what is the slowest part, or what thread

74:12 or process is performance-limiting. But of course, the more degrees of freedom

74:22 you have in your code in terms of threads, processes, communication routines, and other

74:30 parts, the more it's on us to be careful in what we're asking for, and to not ask for too much,

74:37 in order to make the visualization tools able to extract or expose what you hope to

74:46 find. But the tools generally do a pretty good job of taking a multivariate data set and turning it

75:00 into something that can be represented, whether in this screen format or even as,

75:08 you know, the 3D plot, where you get, in the end, the third dimension.

75:15 It's kind of amazing, and you have to play with it and finesse. Okay, I will stop

75:23 there and see if there are any questions. Stop sharing screen. Oh, stop.

75:40
