© Distribution of this video is restricted by its owner
Transcript ×
Auto highlight
Font-size
00:02 So, um all right, so is what I'll talk about when it

00:08 to Got an elimination first, give just a couple of examples where it

00:14 be used to him a little bit and then go over the algorithm.

00:20 fairly simple. I go them so you should help be able to recognize

00:26 of what I said. And then talk a little bit more about how

00:31 kind of look at the estimate. know how much work it is,

00:34 part of the reason for that is understand some of them or computer science

00:38 aspect off constant elimination, or more less how to get the code for

00:45 the competitions to be efficient. So why it's in America. Talk about

00:52 optimization and some of the, I pitfalls in terms of, you

00:56 medical properties. So, missus, is from the book an example whether

01:06 , uh, how a system a may come up in terms all doing

01:12 type, uh, simulations or estimating flows in a net recall of,

01:20 this case, your sister's for the net berkan. Hopefully, most of

01:26 have heard about courage, of slows arms laws and again, some basic

01:31 course somewhere. But, uh, of stars is about covered on the

01:38 , then figuring out the the both drops are in the years old slow

01:42 figure out what they're both these troops . And, uh, a

01:47 So more precisely kind of looking at particular case, you can assume that

01:53 different current values in these circles are or circus indicated by is kind of

02:04 . Um, so does say my of circles. Um, so there

02:09 the current that's going around. So Monets, current is X wondering,

02:14 you follow it and go around. ex fund goes in the your sister

02:18 seven or arms and I goes in Teo and 40 met cetera. And

02:24 it comes back to the battery power of supply in this case. So

02:28 you add up, all the resistance in that particular loop, you get

02:31 number 15. So that is just some of the number of your sisters

02:36 the circuit for X one. But , um, the the sister was

02:43 . Arms is only subject to the in text one where it's the two

02:47 . It's subject to currents, both . Uh, next to so then

02:51 need to also add in the impact the current in that second kind of

02:59 for circuit and again. So you minus two x two for the current

03:04 to goes through the the sister of arms and similar, you have to

03:10 the effect of the current X three this sister off strength six or so

03:17 kind of the first equation. And so that covers the whole first loop

03:22 the X one. And in that , the battery power is 300

03:26 Basically, the cold voltage dropped difficult all their sisters. We respect the

03:31 to get the 300 on. Then you look at the other three loops

03:36 this little circuit, then there is explicit power supply or battery or

03:43 So in that case, all both these drops again, the Crestor

03:49 . That's up to zero. So is just a curse of law arms

03:53 and plug it in and go through different circus. And then you get

03:56 system of the questions that has, , four circus for unknowns in this

04:02 . Um, and then you try figure out what the actual Kearins

04:07 So then you have to solve this . So this is just a example

04:13 fairly simple how our system of the may Right. Um, there's another

04:20 that, um there's I'm not going go the details in this case because

04:26 gets very complex and understanding the physics the electromagnetic. So But you can

04:31 out this clear the radar image that particular the military does, figuring out

04:38 the signature is off aircraft. So and trying to make sure that on

04:45 left stealth aircraft that I'm sure many you have heard about that. They're

04:49 to make them invisible, though for . So you tried to look

04:54 in this case what on the greater are and try to minimize it.

05:00 in that case, you also got very large systems of equations for doing

05:08 for whole aircrafts, and it depending the method you're using, and then

05:13 may end up what people call the system of equation. that means the

05:19 Uh, A that to people right have non zero enters pretty much in

05:30 location of the Make it that this from sparse matrices. Some many of

05:36 may also have heard about or In which case, a lot of

05:40 interest in The Matrix are known to exactly zero. It's not just so

05:47 to be, but because of the on the underlying thing, Then you

05:52 tell that lots of them are zeros then you don't represent them. And

05:57 towards the end of their course, will be able to talk a little

06:02 about sports major cities. But for , we just consider major cities

06:07 Then stuff means pretty much all the in the major cities are non zero

06:14 should be treated as non zero. , so, uh, and speaking

06:23 , I guess that's a side note terms off this raid are scattered in

06:29 genetics problems. Uh, when when worked in industry, the computer company

06:34 to challenge to try to do a job. But solving this for their

06:40 first kind of self aircraft on and does the job waas to solve this

06:46 of equations on the computer systems at time at the which I would say

06:52 most powerful computer systems at that time be done in less than eight

06:58 Yes, I have been around for time, but still it was recently

07:02 resourceful computer system. So and the of the way than the Department of

07:08 and they're contractors worked is that every test their own discs pack so they're

07:15 amount or dis back and them to their things and then daily. And

07:19 have to finish the job with in eight hour shift. Otherwise you failed

07:23 . You could not deliver in their , too, and in this

07:28 the systems were best of a few . A few 1,000,000 times grows and

07:34 . It's quite large. That's not and anymore. But it was large

07:41 20 years ago, So anyway, you can solve this formally, you

07:49 write it down, you know, tell. Yeah, multiply both left

07:53 right inside with a inverse of the A. And then you get this

07:58 the identity matrix for acts. So and some formula. Just a inverse

08:04 be. The problem is computing am is usually not what you want to

08:10 , so you want to use some method to find the solution.

08:18 And there comes in two flavors. methods and gas in elimination. That

08:22 will talk about today is one of methods on. And then there's a

08:27 it reactive methods that generates essentially sequence approximations to the solution. And at

08:37 point you think that the approximations air enough when you consider yourself down direct

08:42 does not generate the sequence of So you basically have very little insight

08:49 the solution until it's all done. I'm in a you nose off it

08:56 method. Okay, that's all It will cover that towards the end

09:02 the book. It's one of the book chapter, So for now I'll

09:07 about today Yes, from one of direct methods that is, the girls

09:11 elimination method. Gaussian elimination. sometimes it's referred to as l

09:21 The composition on as I show you it works. It should be probably

09:28 . Why that is the name and in your comes from. And gas

09:37 elimination, as I said, the most common of these methods on

09:43 frequently use, but it has certain issues. So one has to be

09:47 of careful when you use it and how you can trust the answer,

09:52 is again part of the theme for class. Can you trust the answers

09:56 how do you know if you can it? So now there are some

10:00 methods that will talk about later We'll talk about that again. Valium

10:06 Because one of the procedures used in Eigen values makes use for some meth

10:11 household the transformations. There's also something given salutations. Um, which will

10:19 may mention tour see again later, of the course. But the difference

10:25 these two methods householder and givens is numerically much better behaved. So one

10:33 need to worry as much about whether got there is some a good answer

10:38 not, And that's because there are . What is mathematical unitary transformations and

10:48 turns. That means they are kind scale preserving, and the scale preserving

10:54 will be obvious when I give an of what happens in certain cases,

11:00 ghost in elimination. So basically, means you have a problem where numbers

11:05 in a certain range, and when escape preserving, they always stay in

11:10 range. With all this or let me collect your sister. So

11:21 is one case where it says on here, when gosh in elimination is

11:27 and safe to use it is' when are symmetric and diagonal dominance or diagonal

11:35 simply means that the absolute value on elements on the diagonal of the Matrix

11:43 at least as large as just some the absolute values of everything off the

11:50 . So it's kind of dominating the because it is bigger than the sum

11:55 the elements of things that are off diet in absolute terms, and why

12:00 important will become clear in the next slight. So when you have this

12:06 of the Matrix, then you don't need to worry, and you can

12:10 the simple procedure that goes into the is and you can trust the

12:15 What you get on this line business a little bit on the history and

12:22 Scott credited with the name of the . But it's it's pointed out it

12:27 not the first had the idea how to solve the system of equations in

12:30 particular manner. But I guess, , the mathematicians couldn't agree on.

12:39 know what, So they gave wrote about it and documented it.

12:44 , um, the honor of being for this month. So now,

12:52 few sights about given example of how scouts and elimination kind of things

13:00 So there's an X general example then system of in equations with and unknowns

13:08 the coefficients. In this case, the Matrix A are the A with

13:12 sub script. Uh, the first is for the row, and the

13:17 subject of A is for the Uh huh. So you can write

13:23 out for some stop or you can it compactly as you turn on the

13:28 as to some of the eye days Director X for each one off the

13:33 in the system and then just in merited for all the different rules.

13:41 now a little example. That reason it on a slide of invisible.

13:50 here's small system for questions and foreign . And, um so how many

13:59 their gods and determination? So All , so anyone want to tell me

14:06 it works? Another No, Exactly. So protecting what I did

14:34 this particular side, I used the goal as what's called the Hippodrome.

14:43 as was just said, then you your multiple suitable factor of that

14:49 multiplying that drawer the paper job with factor and subtract it. Then,

14:57 another one, you pick the So when you use a month multiplier

15:03 the first job, the entry in case, the first column turns into

15:09 . So since they're different interests in first, I call him for the

15:15 grows. You need to use a multiplier for each role in which I

15:18 to eliminate things. So in this case, so then we look at

15:23 second roll as the number 12 for one. So in order to get

15:30 coefficient zero for X one, you the multiplying the top roll with two

15:36 subtract the top row from the second and then it becomes zero times x

15:41 so that into disappears. But you to do for all the other columns

15:45 the right inside as well. So we look at the second column and

15:50 second row, it started with the . Eight. Now we took,

15:56 , the multiplier or two for the row and then subtracted it from Nurse

16:01 controls the two times two is so it's subtracted that from center was

16:07 minus two minus to times two is four and subtracted. It becomes a

16:14 . Force of the minus eight that originally now becomes a minus four and

16:19 can follow the same procedure, and can check it out. That basically

16:23 new interest in the second door uh, whatever is originally in the

16:30 roll, minus two times whatever the is in the top rope and

16:36 If you go to the third role , the numbers three in the

16:40 and there was six on top, you must apply the first drug with

16:45 of 1/2 and then 1/2 times So that works out. So then

16:50 then. She also becomes zero but kind of the procedure. So once

16:54 gone through eliminated, the coefficients are to proficient for all the roads except

17:03 first row into zero. Then they this, in fact, smaller system

17:10 now is the little three by three embedded in the original four by four

17:15 . But now that three by three now has just three questions and three

17:19 nose and that's still good enough to out three unknown. And then you

17:26 , proceed. So then, um on top is exactly what it was

17:31 the previous light. So now your with the remaining three you creations.

17:36 in that case, not it was four in the first door, minus

17:40 in the second. So if you the first job I free and then

17:48 , then becomes minus 12 and Oh, and then you I guess

17:53 was minus. So you have to the sign rights and then you basically

17:57 zero for the second and the third . Shouldn't the third robe,

18:03 in the second column and then with bottom row, the fourth throat?

18:08 the numbers should be basically half times and the proper sign you get rid

18:13 it that SEPTA So now you're in two by two system, and this

18:17 the procedure that you go on it doing it on eventually is your

18:23 This system Now that, um, kind of ready to do something

18:32 So this is was known us forward , because we'll kind of go

18:39 Um, and then you get to point. So anyone sees something interesting

18:44 this particular, uh, last sip equations here? Yes, right,

19:00 . So the last equation is just equation and one knowns. There's tribute

19:04 find out what the value next four . And once you have X

19:08 they can stick a X four back all the other few questions. And

19:13 you go kind of backwards from the , where you up? So if

19:18 so for X four in the bottom , you stick it up into the

19:23 , drove from the top, Then get an equation with only X three

19:28 , and then you can solve for three, and then you kind of

19:31 back, and eventually you get X . So that's known as the back

19:36 . Proceed. So something this, Ah, as pretty much I think

19:46 I said. You plug it in you'll get the value. So and

19:52 question on how this procedure works. is the basic Go see an

19:59 And typically people may also add without that I will come to you in

20:04 second because what was done in this consistent. They choose sort of the

20:13 role as a pivot role, as system gets smaller and smaller, but

20:19 the first role in the current system the concept of unknown. And then

20:27 this lights, there's just that the for figuring out how to up that

20:34 Dodgers in The Matrix A and hot update the right hand side. That

20:39 a forward part. And then the substitution part is two successive this all

20:44 unknowns as you go back. um, you look at this

20:54 Soul once were kind of done with forward elimination part. The Matrix A

21:03 been modified because he changed the matrix . But the shape of it is

21:08 upper triangle. Um, matrix. we're done, So this is actually

21:12 you and the value there composition. an upper triangle, um, matrix

21:16 it just the consequence off executing the elimination on the mate that gives you

21:24 you so and then we have a that is that the We're triangular

21:33 So what is that? The we're The Matrix. The lower to Wrangler

21:37 is kind of nothing but multipliers be on the pivot dro in order to

21:47 entries in columns into zero. So the first column in this l.

21:53 . Matrix and Angela Tear as kind estimate success him out. The virus

21:59 used for the rose below the diagonal I was unique in principle, at

22:06 a unique multiplier for each row. then you keep developing it, and

22:12 you got so one saves the's multipliers in place off this matrix.

22:20 so one worries about storage and don't need the matrix a anymore. You

22:28 change the matrix A. So that's upper trying the part of a becomes

22:34 and then you store them up the in place of the original interest of

22:42 make. So that's what it's thick formerly than the multiplier is to kind

22:48 use, for the people are always . So the ones on the diagonal

22:51 also kind of natural. So that's way you get both the end and

22:56 . And then you can also verify you take the proper values. Says

23:03 had them remarked upon them all together get the a bit? Yes.

23:14 , so, uh right. So what they were. So the first

23:20 , Um, this was the first , right? So that so in

23:25 case, the first column of L be on the top row of

23:31 The second will be two. Third be 1/2 and the last kind would

23:36 millions. One What questions? Any questions? So it's so in that

23:49 , if you don't need or doesn't the matrix A, then you can

23:54 the store, you and L, you don't store the one because you

24:00 values on the diagonal. So you of know that the diagonal values for

24:05 one so you don't store that Otherwise wouldn't eat additional, but by definition

24:11 one. So you can just you have to explicitly represented so And it

24:17 just feeling the strict lower triangular and is the, uh, portraying

24:22 including the diagonal, and, you , smaller matrices. Who cares?

24:29 it's I said, if you have that are a 1,000,000 by a 1,000,000

24:34 is what I tend to the 12 . You don't necessarily want to store

24:41 kind of twice. So then it serious in terms off memory. Use

24:49 . So that seems that was um all right, so next,

24:57 much as I said, you workload or trying to understand the operation

25:06 , it's a good thing, but not necessarily determining performance, but it

25:09 a good thing to understand how much is involved in doing the competition.

25:16 with this slide, I am, , developing a little bit what it

25:25 . So the factor ization is the on the matrix A. That is

25:31 the gels and the youth. That's factor. Ization point. Um,

25:41 that the big part of the factor is that, um, go back

25:50 , taking this picture, um, successively work on the submit disease.

25:57 start with baseline, um, updating n minus one by N one is

26:05 matrix Protect First role in this case using this paper and then they find

26:10 the mouth, the fires, and basically one for each row. And

26:17 it's updating all the matrix coefficients that then an N minus one times and

26:24 one. Make it so Look at estimate here. That's what it

26:31 So after you have, that's whatever case column. Then there's n minus

26:38 by n minus K sub matrix that to have his coefficients updated. So

26:44 they go through and add up all , thanks and manipulate expressions,

26:50 You get to something that says that operation count in term assuming that this

26:57 cost one and multiply cost once. the number two comes from the factors

27:02 at dinner and one for multiplying and following this expressions and you get the

27:09 to divided by three if you have going up. So basically, it

27:14 , uh, it is cubic in matrix eyes, if is an end

27:23 and make and that is important to not only because the tests you how

27:31 thinks the work grows with the size the Matrix. And again, this

27:34 that if you have really giant then it's serious work to actually do

27:40 factories issue. But that's only one of it. Um, now if

27:48 look at the other part, that also when May, in this

27:55 implicitly did the factor ization through this elimination step. Also worked on the

28:00 10 site, so they work on right hand side is cheaper. So

28:07 for each column me do eliminate, just sort of work on the column

28:14 is only one, as supposed, a square type. Yeah, so

28:19 means if you add it all it's something that is an scored

28:25 not in you and the same thing the back substitution. So the expensive

28:34 ISS to modify the Matrix or there a factor ization of the matrix and

28:39 and then use whereas then applying or with the right 10 sides. This

28:48 cheap and that's important for how things actually being done in practice so many

29:01 you have not just one right in , so the right inside may best

29:09 , uh relate to, say, loading of a structure on. You

29:14 to figure out how the structure flexes different kinds of note, but it's

29:19 same structure, so that means it's same matrix. So in that

29:24 you only need two. Factor it , and you don't need to factor

29:27 for every workload that you're Oh, off the structure that you have.

29:33 that's why codes typically work by separately in the Matrix. And then it

29:40 the l and the use in doing forward and backward, solved on one

29:46 multiple right inside. So structurally you to math libraries, you find it

29:53 ization routine. And then you find is known as triangular solvers because,

29:59 , L and you are trying them . So things are structured in terms

30:04 libraries that you likely to use in of again factor station and trying.

30:10 soldiers and the triangular soldiers are much in terms of work. Then the

30:19 ization port. Any questions on Okay, and I'll come back to

30:34 in a little bit. So a little bit when working with us

30:42 elimination, Kangol role. So here's little example for the usual thing in

30:50 . Absolutely. It's a small So we have now a little two

30:55 two system of the questions and that trying to solve it on following the

31:00 I just described on you take a of the first row and subtracted from

31:04 second row. So that means to um X one. In the second

31:13 , you have to multiply by one absolute on the first row and then

31:18 . So you do that than a for X two disappears and coefficient for

31:23 two becomes original one. That was the second question. Minors one over

31:29 Times. Extremely. I was in first equation, and then the same

31:34 for the right inside that comes to is one over absolute now and then

31:42 find X two that is more or one and then because you get to

31:52 one over absolute, absolutely slow So one over absolute is large.

31:59 if it sufficiently large to is very compared to one over emption on

32:05 you kind of lose it. So one is basically one over absolute over

32:11 over absolute that disrupted so and then plug that in into, um,

32:19 second equation of the top equation and find out X one more or less

32:24 out zero. And, um, some specific numbers. Some you take

32:32 with episode being what, 10 to minors. Nine. I think I

32:38 it on this. Yes, on a limited position on the

32:43 And so if you have 16 bit for doing the addition of Septra

32:50 Yeah, take this, um, any first to do the proper representation

32:58 terms of normalized in this case, notation and decimal numbers of binary

33:04 So a 0.1 on and then times to, uh I should,

33:10 one over absolute to tend to the . And then you had the number

33:14 and we were supposed to compute the to minors. One over, Absolute

33:19 so then, to scientific notation is times 10 to the one and then

33:27 being able to subtract the numbers, need to remind them, so they

33:31 to have the same, um, in terms of the base that they

33:38 to be shifted so they can get the same range. So that means

33:44 too, uh, gets shifted down whatever the ninth of 10 position in

33:50 thing. So then you do the and you get something that is 0.0

33:55 a bunch of nines. And then eight fortitude. Now then you come

34:00 and round it and then you get to having it as that epsilon in

34:09 case, I did it, the opposite signed by best one over

34:13 minus two. Instead, to get a good number. Then you're in

34:17 , just get one over absolute which started. It just shows you the

34:22 , the number to kind of dropped . You lost it because off the

34:28 position, your heart and how small absolute waas. So that shows you

34:34 to get this result that they had the previous slide, that X two

34:40 one and you got X one to zero, which is incorrect. So

34:46 gives you the wrong solution. So is the correct solution is, in

34:51 , one than what? No. . So now if the the question

35:03 have been written in the opposite Ah, Now you go through the

35:09 procedure, and then you got the that is at the bottom on this

35:15 . And if you saw the one one, you actually get the

35:23 So how do you know what to ? And how do you?

35:28 this was easy to fix. You manipulate the system and look at them

35:34 going to verify what's correct, but so this is an example. The

35:39 1 when he did when the top things kind of lost position because the

35:52 we used the elements of l was over absolute have a very large

35:59 And whereas the system of equations are of, the numbers are one or

36:05 coefficients for X one x two is , except for the absolute part is

36:09 small. So all the coefficients and system is between zero and one.

36:14 this case, where issue When you the gauze in elimination as the system

36:20 written, you got very large number you got one over absolute absolutely

36:26 So things kind of blew up, that's why you're lost position. So

36:32 the thing with God's an elimination in 84 is not scale preserving. So

36:40 get it to become skate, preserving kind of need to do what's done

36:45 the middle thing reorder the equation. so what is in fact happening,

36:55 up instead, off selecting the first us the system is written. You

37:03 the second roll as your pivot and that's known as partial pivoting.

37:11 , too. Yet Gostin elimination to stable and behave in American behave well

37:19 of just successively picking. The next is to go through the different stages

37:26 the elimination process. You go and in the column that you're trying to

37:34 all elements into zero except one on pick up the rule that has the

37:42 absolute value to be the pivotal. then you use the multiplier of that

37:47 and subtracted from all the other Because if you pick the lard,

37:51 role with the largest value in magnitude the coefficients, all the other ones

37:57 be less than what because you take is original coefficient for that variable and

38:04 by the largest number. So that's definition, make some lessons. So

38:08 it becomes game preserving. So that what most software for gas in the

38:16 does. They use this scheme that known, this partial pivoting and goes

38:22 searches and for each column you want turn into zero. Except for one

38:29 the questions. Then you tried to the door. Just elements or things

38:34 all those things. Multiply one numbers the range of 0 to 1 My

38:40 also looking in the complete submitters and just stop with the column. But

38:46 is usually doesn't buy you much, most soft, I guess that's

38:54 No, that's, um, in way, um, is all fine

39:03 in terms off performance, in if you work would say deep use

39:13 any form of kind of streaming or architecture. That is a problem because

39:25 things are become data dependent. So each step in the gaps in

39:31 you kind of need to do research you can't just proceed blindly and order

39:38 and stream things because there's a conditional and the computation, so it becomes

39:46 pain. The other methods that the doesn't talk much about the household.

39:53 transformation? Yep, Use a little trick. Kind of. So it's

40:02 data dependent. It's unitary transformation without dependency, so things can be scheduled

40:09 stream and that that sentence you have much better chance of getting good performance

40:15 don't have to worry about conditional And the question so tomorrow from this

40:32 gums elimination just is a standard Um, there's simple test one can

40:38 and figure out if the things without works by using diagonal dominance in symmetry

40:43 I got a dominant is enough from . So that will last you to

40:47 things that I independently, Um, , yes, you can fix it

40:52 doing the partial pivoting, but then should seriously, in my view,

40:57 the other methods that can. It's much amenable to streaming like household the

41:03 quite see and just us in this . But it does. Ah

41:10 and one interested. Instead of selecting pivotal, they actually do you,

41:20 , Alinia combination of all the potential drills and use that linear combination as

41:28 patrol. So you, instead of a a video, your computer and

41:32 you do it in the right it becomes keep preserving. They

41:37 too. In terms off operation counts the factor. But the factor to

41:44 can Certainly it's safe in America to it on and for certain architectures you

41:49 more than get the time back that takes to do the extra computation.

41:57 , talk a little bit about that . Part of this is not in

42:01 book, so this is kind of little extra for if they're computer science

42:06 . So I thought it might be to get a little bit of the

42:12 aspect. Last time I mentioned this kind of for one more or

42:16 That that's an elimination for about 20 was the only competition used to try

42:23 rate How good your computer, what's terms off or relevance for scientific from

42:31 applications? Um, people had opinions whether I was a relevant measure or

42:38 , but it waas and this is thing that get in computer vendors bragging

42:42 , you know, a powerful computer have, or how many systems that

42:46 the shell that was on the stop less than all of that. It

42:50 been extended about 23 years back by dealing with farce matrices but doesn't send

42:57 about 20 plus years. It was , um, for dance. Make

43:01 she's and I guess, a little . Um, it says the connection

43:07 there was a company was no at different name. But the product the

43:12 the computer reveals was notice the connection and we on the very first top

43:17 list that was generated. We also the very first, uh, top

43:21 on the list, the number one that was powerful computer on the most

43:25 for this post that and that was day start doing distance matrix solvers.

43:32 thing now to come into code um, a Z Paris, another

43:40 interested in it? There's something called HPC Challenge that has a bunch of

43:43 codes that both vendors and users try play around with and figure out their

43:51 systems. And I go to the and away from this is an extra

43:55 that list on my point for this simply to show you that if you

44:01 things kind of right for X 86 , actually, six instructions that intel

44:09 the, um computerised. Then the of peak you can get for doing

44:17 in elimination or matrix multiply DJM. matrix market by another position?

44:23 for most systems. Very close to . This at least you know,

44:27 was a 90. 95%. So means things are you don't miss many

44:34 you're complaining about. Use every cycle the machine if you write your

44:39 So I talk a little bit. why're, you know, coming back

44:42 the workload is just my doctor is to try to understand how to structure

44:51 . Um, so I'm sure we the, um um for the guts

45:01 elimination, they figured out the operation in terms of Adam of the Pie

45:06 to rank you over to me. . So in that case, it's

45:12 one matrix, the A matrix on X, and the bees are just

45:17 end, so they're much smaller. that's not so much an issue.

45:23 , um, but it's a little , um more. Not much,

45:28 it is a little bit more complex just the simple matrix multiply coat.

45:32 if you look at making small they have this three matrices A,

45:38 and C C was eight times a on, and then you have the

45:43 counts from it multiplies also to in , or it is too. Thank

45:49 . But if you take the racial this case, them between the operation

45:53 on the data sets ice is you that? It's on average also two

46:00 number three operations per matrix. um so the relevance off This is

46:09 . So how many of you guys writing computer architecture class? So even

46:20 you haven't to probably know about the of cash so so, um

46:30 computer systems, they have the hierarchy memories on today. Most system has

46:39 levels of cash, and then they the main memory. So they that

46:44 to start finishing main memory. And , as you need data, it

46:49 loaded into the memory hierarchy. On idea is the reason it's a

46:54 America is because fast memory is so Main member is designed to be

47:04 and lots of it on. So ocean or getting benefit from a cash

47:10 that you need to have data If you don't have any data

47:14 cash is probably would even slow down performance compared to having no cash.

47:21 to get benefits, you need to able to have data every No.

47:27 why this workload versus data sets Isis important. And that's why I didn't

47:32 through this in this lecture. So have the ghost in elimination. There

47:39 kind of an average off, and operations per matrix element, the same

47:45 as in matrix multiple. So since do on average and operations primatrix

47:52 if the code is structured right, maybe you can work out of cash

47:57 not out of May memory for most it. And then you get the

48:01 of is determined by cash and not main mint. That's a little

48:09 uh I know, um, lot people you know, love,

48:14 Nothing. Colds and vitamin and others of scripting languages, too. This

48:21 the sliding borrowed from, uh, my T class. Um, so

48:28 , it shows a little bit Much of this is about, um

48:33 . Their combined versus, um interpreted but is also in terms off.

48:39 glorious making underlying the point is simply in this case, it is about

48:47 factor of it more than 1000 5000 taking the naive code and taking it

48:54 to my school. That's not the off 10 2030%. It can be

49:01 factors or 100 in the house. this is thing I want to wear

49:06 . Make it world and talk a bit about now specifically about you,

49:11 make a six month supply what you . And it's very simple what you

49:15 to do in order to make things . Well, question So anyways,

49:24 came back from this. A set this is the typical simply thing,

49:26 this is that's a cartoon. Assume main memory and one live with a

49:32 memory called Gash on this light. the next few sizes and I guess

49:38 little bit calibration in terms of quantity , not just qualitative. What the

49:45 is in terms off. Say how or fast it is the access various

49:53 . So level one cash. The is typically one or two cycles,

49:58 maybe at most, two year, cycles away from the functional unit.

50:03 know, add a multiplier or But then, as you go for

50:07 way, it's the distances further and the bad grits of those memories are

50:14 . And, um, this is little bit of, um, showing

50:20 numbers. And I think the next through a little bit here in terms

50:27 sort of modern things that the typical module that could use in PCs or

50:33 . Today, each one has a about 2025 gigabytes per second.

50:40 now you tend to have more than . Memory channel Depends on your

50:43 High in the system. They have or eight and memory channels, so

50:47 cannot apply it on that thing. it's a four or six.

50:52 no. You should take your Matrix supply here and figure out what the

50:57 Countess on. So you have the very was A B and C and

51:04 generic expression, and then you have address is you need to fetch the

51:08 figure out where pass it into the to memory to get what you

51:14 so it made it very simplistic. everything 64 bit and figure it out

51:18 lot. And then, if you a month, Accorsi, you must

51:24 . Today in this case, is 64 threads To make something simple,

51:29 best of shows you need about nine 10 terabytes per second data right to

51:36 the computation. That's what it takes actually be able to use the functional

51:44 100% of it and has a said wide gap between 112 gigabytes per second

51:55 nine terrible. That's a Jewish So with that, how come you

52:05 95 98%? Oh, peak performance there is no way if this

52:14 and that's all based on being able get data for you. And that's

52:18 it comes in that the workload is the on average of do you number

52:24 operations for each element fetch. So the way you can get very close

52:30 peak performance. If you do that , and this is just a

52:33 is example that from all computers, depending on our code was structures.

52:38 is basket from nothing to you. 1500 not 5000 but it's not too

52:47 . So here's a little bit. point? Actually, it does on

52:50 do in order to make this thing . And it's fairly simple, so

52:56 Sure what it is. So you know, started with the Matrix

52:59 because again, it doesn't have, , his uniform computation. You do

53:03 same thing for every matrix element of to have a diminishing set that you

53:09 in everyday composition. So matrix, your old time to call in you

53:14 Ah, nickel matrix month deploy You get one element of the product

53:19 . Um so venous the dukes on , trying to figure out what happens

53:25 . So take a little bit. first, in this case, the

53:30 low, the role of a and you need to do in an element

53:34 see, Then you also need the The column will be. Then you

53:38 actually do the in the product of or any times the column will be

53:43 is that K loop on. Then get to see that. So I

53:48 figure out, you know, What this thing you in terms of memory

53:53 and arithmetic operations, It's a generic , so definitely that's it too.

53:59 you. On kind of magic operations block the volume on the add and

54:04 20 in the product. So if you look at the number of

54:09 references, so you go back look at so the b solve my

54:18 compute um se a row off. ? Then you start for the first

54:25 in the role of see you get first column will be moved to the

54:29 element to see you need to the element of be. And when you're

54:34 of go through and one row see, you have noted the entire

54:42 and then you need to move to next throw off, see on.

54:45 the whole thing starts all over So if they ask you, you

54:49 a ones on and you lowered. as many times as you have

54:55 So basically he gets loaded and Total number of loads for the thank

55:01 . Because any know, the entire take and a is only loaded one

55:07 is only loaded one So this Find gap, the number of memory

55:12 Thank you. And this particular way writing your cold. So in that

55:19 , on average, the number of for memory references to not and so

55:29 trick. And to get it to end on sort of cartoonists, simplistic

55:37 , but to show, um, clear what it is. So instead

55:42 if you do things in terms of kind of block Major C's instead of

55:48 off, Uh, the mattress is B and C. It's just being

55:53 two elements. You think of them being ah, so maybe by be

55:58 on a collection of be by the . So in this case, the

56:05 structure is still kind of in in product. But, um, Eastern

56:10 product, this now is small matrix multiply. Taking a block of a

56:15 the block could be to generate the off. See, so And the

56:22 is, in this case, you the block. Size is such that

56:29 three blocks A, B and C in the cash, so and if

56:38 do, then you can work through math on, and I won't go

56:45 the exact elevation, but hopefully you follow it. Just it's the same

56:53 operations. But in terms of the you deals with the block. So

56:56 this case, you don't, um as a function here best it says

57:02 this case, you get the ratio to memory references to be proportional to

57:11 block size, as determined by the off rose and columns in your little

57:18 . So B is the number off and rose and bi bi bi

57:23 So in fact, as be then you get more and more.

57:29 , you know, if you're it doesn't become end because then the

57:32 is air. Not that large, but it's still improves a lot.

57:37 I'm just going to this good second size of eight. Make a good

57:42 four and 16. So we're depending lot. Uh, say level of

57:49 a. Have you pick your block that gives you then a performance.

57:55 here's some numbers were very old computers , but just to show, you

58:00 if I kind of worked out depending what the Matrix eyes was, your

58:04 32 by 32 matrix and up to some of the 30. To do

58:09 , the factor of force be an for a larger magics of women were

58:13 for 3 3.5 But yes, to you an example in for this the

58:21 . And it's a little bit month player on the figure out the

58:24 But basically the number in the equation is all the three matrices needs to

58:30 . The capture that's a three b and em fastest the cash size.

58:34 that gives you best way. How your picture? Big. So I

58:43 this is pretty much and in in the one that is interested in

58:48 theory you can actually approve this particular is, uh, symptomatically optimum to

58:54 it, Um, quite well on scientist HT. Come on. One

58:59 the students young, they approve it ago in 1981 is that little table

59:05 is the paper so but it sold its own as you do you know

59:19 said today's processor typically have three levels cash, so they had registers three

59:26 of cash in the main memory and are really large that I mentioned in

59:32 computation electromagnetic greater scattering problem. Then doesn't even fit the main members.

59:39 then you also have disc system. next year is kind of six level

59:44 the memory. And in principle, do this for every level of

59:48 So you're nice, clean, Your advice too, Um, right

59:52 software engineer and becomes quite miss it you try to off the most things

59:56 it becomes no bunch of nested But that's gives you potentially an order

60:03 magnitude better performance that if you have nice, clean cold. And here

60:11 another example of some of the data , so I don't have any You

60:15 benchmark stato. So, um, it's one thing that may be good

60:22 know. So yes, the community does software, um, notes about

60:29 things on the also knows that this kind of a pain to do,

60:34 people have no So the order tunings for the effectively. Uh, some

60:40 them do the brute force. They out all combinations in in terms

60:46 ordering loops, blocking size it etcetera figure out what the best,

60:53 strategy is. And, um, course, if you're only going to

60:58 one of these Operation one goes in . You may take so much longer

61:03 figure out the Upton and actually do job in this about them away.

61:07 But, um, depending upon how works, then let me do

61:12 So there is one system that it here that at less I was developed

61:18 about Eclipse Oakridge on University Tennessee in with the old which lab that need

61:27 was one of the first to do tuning and the next side it's

61:32 a little bit old data. But shows, um, this is vendor

61:38 that basically, you know, but vendors in Come on in and see

61:44 vendors Ankle has a very strong and to do math libraries as well,

61:48 compilers and all most tools that you on. So that's some of the

61:53 vendors. I am HPI and Did you have the owner for linear

61:59 operations? They're common enough that they to provide up to my slide birds

62:04 their platforms so that this kind of reddish bars here outclasses thie blue bar

62:14 the then there's no fortune. I the broom bars. But just comparing

62:18 blue and the red bars with his in verses. Vendor optimized. You

62:24 tell that the otter tuning does very , and sometimes it even beats the

62:33 libraries and least a bunch of different on on the horizontal axis.

62:44 um, all right, So that was made too expensive, but

62:50 should say a little bit. So does it have to do them ghost

62:53 a nation? Just because on the operation count for memory reference,

63:01 would be the same. You there's a potential is the same.

63:07 now how does this thing then actually you in terms off scouts in

63:16 in the questions for a common so now So these slides are bore

63:29 from colleagues at Berkeley. So Berkeley a very strong group in terms off

63:37 analysis, and they do both suffer , uh, the math bark off

63:42 analysis. So here's kind of cartoon , off the steps in the ghost

63:49 the elimination syrup things below the diagonal successive columns, take it simple without

63:56 pivoting partners. Move on, um know it's again there pre nested loop

64:05 and doing it. So So I the bottom. It's best the

64:14 whatever is, uh, and the in the first in the column.

64:22 trying to do the elimination on divided the diagonal Element A II and then

64:27 through the elements of their own uptake role, and then you go through

64:30 different rose on that is to J on. Then you start from the

64:35 . Um, so, um, the first couple of sciences just

64:41 a little bit cold optimization don't do in the innermost look like you don't

64:45 to have their I'll just take things . So in this case and missed

64:48 multiplier, then you don't need to the multiplier as you go along the

64:53 because it's the same for all the . And they're also don't have it

64:56 them animals, loops. That's what did. Um, then the this

65:01 the, uh, dead What? , okay. Just simpler to change

65:09 loop index because you don't need to eras when you know it's going to

65:14 zero, So don't do it. , okay. And the loop index

65:18 our start from Are you start when plus one instead? That was

65:22 Um Is this little bit again storing but the players a PSA said in

65:29 Matrix. So Baskin store L in Matrix A So that's what it's a

65:34 . I is now the There's an of the lower triangular matrix else.

65:41 what they did. So this is anything. Yeah, Get into ghost

65:46 elimination, but forgetting they're really Um, no. Okay. And

65:56 , yes. So this bit this of loop on. So in terms

66:03 the last look. So desk the just go through and compute in the

66:13 look version He just compute all the piracy down the column computer the multipliers

66:19 , then the next. The other loops deals with updating the matrix that

66:24 have. So and the reason for that is what is comes on the

66:31 slide soul. There's something known that's basic linear algebra subroutines, laws for

66:39 that you find and both public Then you're are several libraries. You

66:46 it in all the vendor libraries and side putting that played doctor work that

66:51 can go on. You get very code for doing these things. And

66:55 comes. The blast routines comes in different. Um, I shouldn't say

67:02 . Spot it. Uh, depending what then You're algebra operation. It

67:08 . It gets classified as 12 or on. It's very simple. One

67:13 one group to is to look 33 so one is just needing the vector

67:19 vector on vector. So there uh to it's like matrix vector matrix

67:26 a vector too nested look. And three is three nurses hope so that

67:33 like maybe six months. So just splitting the low things allow them to

67:39 figuring out the coefficients on the multipliers the column to use blast everyone because

67:45 a common vector. So it is vector. So anything you can use

67:50 vendor library optimized or public, a just to scale back. And then

67:57 can use the matrix vector thing for they some medics in the lower right

68:05 corner. We're still didn't get tomatoes , and now it is the

68:11 How do you get to make its ? And I just This is kind

68:14 amount of motivating slide again shows again old computer. But last two is

68:20 a little bit better than the one blocks three again the courts off plus

68:24 again, eczematous might think about the . So they have the potential for

68:30 off data so you can get benefits cash. So you want to try

68:34 get to something that is, last three or similar to matrix

68:41 mate, Expect er, if you about a matrix vector each matrix element

68:46 is the end square going. It's used once. There's no they never

68:53 except for the vector X or a . There's no data use for the

68:57 eight. So major X factor is much better than simple vector operation.

69:04 huh. So, um, so I did a few more minutes and

69:12 I'll get to the punch line in case. So the point is,

69:17 , drunk and show it more in of the picture. So yes,

69:23 use with people called Delayed up So you find the multipliers for a

69:30 to turn it into zero, then supposed to update, Um, whatever

69:36 is to the right of that No, you don't really need to

69:44 all of it. That's know, you remember that you should do it

69:48 you only need two. In update the next column because you need

69:53 know those values and order computer that so you can then and say,

70:02 , I maybe I do. 234 . I do one update the next

70:09 or the next two or three columns . Then I can compute and you

70:14 them off the pliers. But then , I will have to update the

70:20 of the matrix, too. But what you in fact, have instead

70:24 having a single column? This bluish now is the bunch of columns and

70:31 . They're kind of whatever pinkish or little bit shaded block on top of

70:36 green. One is also a few , so the blue on whatever their

70:44 is on top of the green. kind of skinny. Major cities are

70:50 tall in terms of the number of and few columns in terms of the

70:54 ah, and few rolls and many in terms off the top of the

70:59 color guys. But that is major . They're not doctors anymore. So

71:06 you wanna have one column and one , it's in fact kind of a

71:14 a product or column vector times and back that as the principal Operation toe

71:20 the green. But when you have columns of blue and multiple rows of

71:27 other one, it becomes now matrix multiply that you used to up take

71:32 green. So that's where the game in trying to do the delayed

71:38 Because all of the sudden, I've turned sort of matrix for

71:44 you know, other product things into matrix. Multiple. Okay off.

71:51 this is what this set of slices to do, and you can go

71:55 them. And, um so this the whole game. And that's why

72:01 understanding how to do make it multiplication important in terms of performance and how

72:07 use memory hierarchies well and compilers can good and figuring out how to reorder

72:14 , but usually and not necessarily trying partitioned. Oops and on do

72:20 Um, so understanding it the self not conceptually difficult. So that's a

72:25 thing to keep in mind. And you can engage it in then doing

72:31 matrix operations, whether it's comes in or some of the other met the

72:36 , your household or what not. can still do this operation to call

72:41 turned things into using matrix matrix, and the next slide. Um,

72:52 , this is again old sites, the point this essential to see that

72:57 green is matrix multiplication on the blue thing is, have you got an

73:05 thing on? You can see that gods an elimination performance is pretty much

73:09 same class matrix matrix, multiplication because , and the workhorse in the guys

73:16 elimination is matrix making small application in of computational stroke. So and this

73:25 just a reminder in the same Intial Before that, you you have

73:28 performance and then I was stopped My time is up, but I

73:34 to the punch line. And how you? I guess thinking about coat

73:40 computer science folks should know where about an optimization of cold in my

73:45 Very simple and just conceptually think about underlying hardware that is being used and

73:51 to benefit from ridges data. So you very

-
+