© Distribution of this video is restricted by its owner
Transcript ×
Auto highlight
Font-size
00:02 Okay, so I will talk about fourier transforms today. And the reason

00:13 that One of the teams when I to do a project on the 50's

00:20 huh. As the acronym is for fourier transform. And uh my understanding

00:29 that the team is adventurers and don't a whole lot of background in terms

00:37 their 50s. And it's um one the most widely used computational procedures or

00:47 of anything. So I'll decide to about it and I'll try to done

00:54 once I described what it is, of the techniques were used to try

00:59 get good performance on processors. please. I love all right.

01:11 there's kind of an outline of So first introduced the discrete fourier

01:18 that is the basis for the fast transform and then look at how F

01:28 actually is working or put together. and mhm That's pretty much it.

01:41 motivation. So why one should be and quite important in many disciplines are

01:50 their 50s. This is an example original wide range of different types of

01:58 where it's used. So it on left hand side is kind of electron

02:05 microscopy that is basically dealing with They are highly very noisy images as

02:18 thing with the young go circle shows it looks like just noise but it

02:24 . And to try to make something out of these. I didn't always

02:27 images. A 15 placing key role eventually one can actually reconstruct images of

02:37 like the one at the bottom on left hand side. Another part of

02:43 you may be involved with colleagues in department they are interested in it,

02:51 resonance imaging or admire are I for uh in that case uh 50 is

02:59 the critical role because that's the way actually managed to get an image of

03:05 going on on or the structure of brain. Others are kind of more

03:11 the single processing domain and just show examples here in astronomy that tends to

03:17 very large 50s and billions of data and then on the right hand side

03:26 is what's typical for The use of . 50s and American competitions for solving

03:37 forms of for instance fluids problems are finding out what goes on and some

03:43 to make. And the slide is to illustrate that F. F.

03:53 . S. And also were used a very long time to do

03:59 So it turns out that the four basis with just two econometric functions like

04:07 and co signs are very good set basis functions as it's known in many

04:17 image processing and all some of This isn't solving Pds and many types

04:25 functions can be represented as a serious sign or co some functions and this

04:34 of shows the case of the compression upper left them E. Is an

04:44 of a bunch of pixels and then rest of them from B.

04:49 D. and E. And Is um representing sort of the fourier

04:56 on that. A set of pixels the pictures of the IV. And

05:04 upon how many components of the four serious one users from yet the not

05:11 great representation of E or one can get pretty good representation of the for

05:17 lot less than just communicating or using the pixels. So instead of during

05:25 pixels and it's not clear from the slide weather 10 24 is the size

05:35 civilian pixels that probably what it is represent the so in this case instead

05:41 a million pixels and he is pretty than Earth is indeed a better representation

05:47 E. And for F. Arizona 400, coefficients that pretty much gets

05:55 good the resolution as the pixel So that um and what the direct

06:03 that was used in the mpeg and play standards or what's known as sign

06:09 coastline transforms. Um Now therefore for transform, but it's based on before

06:16 transform, so that was a motivational or of four FTS. Um So

06:29 we talk about the discrete fourier transform the discrete fourier transform is simply a

06:35 version of The four Year Transform or trying to do before we transform that

06:43 kind of defined on the top, is based on an integral of a

06:48 that have F of X. And are these four year basis. The

06:56 and cosine is often represented by Roots unity as they're called or E to

07:01 -1 something. So so that leads to the discrete fourier transform and I'm

07:17 going through here, how you go the continuous domain to the discrete

07:22 So we just have to take it face value that if you do it

07:26 in America one ends up with is ID for your transform that is in

07:32 middle of the image and sorry for rotation. So um for the bottom

07:39 of the image F of X is with um lower case X. So

07:50 the E to the minus. And yeah. Oh my God max

07:55 replaced by this or Madam piece. that you can see that has some

08:05 , what's on top for the continuous analytical fourier transform. So um the

08:18 the collection that discreet from your transform for each capital X. Indexed or

08:26 em the right hand side is simply a product as you do that product

08:35 this over the piece. But the values to sample some one has the

08:42 then um it is simply in the and then and repeat that for a

08:53 of and values that covers A certain from 0 to the mine is wonderful

08:59 enumerated again discrete values. So it's social in the next side it ends

09:09 being simply made to expect an the inverse discrete for a transform is

09:17 very analogous as you can see it's uh maybe it's factored multiplication something so

09:26 that, let's look at that what matrix actually looks like. So the

09:33 is then the omega piece to the of M times J. So um

09:43 if M is zero, that is first drill in this matrix, The

09:50 of Omega P is zero regardless of . So that means um If the

09:57 zero then regardless of what the home that are than the exponential Haitian results

10:06 the wants, the first row is . It's also the case that the

10:12 column is a one because then Jay zero and that's a summation index of

10:19 semester is X. Of zero is multiply by one. So on her

10:27 um particular structure where now the other entries is for the second role Mm

10:38 one. So it as you go and through the inner product than the

10:45 exponents of the moment. A piece simply the j value. That then

10:51 from 1, 2 p -1 of different start in the first column from

10:55 to P -1 And then one has . Yes, plug in the

11:02 And the jenny, someone got the of this W matrix representing the uh

11:13 of unity or the omega keys in a discreet for your transform and the

11:25 of interesting property of the fourier transform the discrete fourier transforming them to be

11:33 . In this case said the matrix is simply the inverse of uh the

11:41 entries in the forward transform. And fairly easy to show that in fact

11:46 is true. Yeah, so that's it's quite easy to compute in verse

11:56 you transform if it actually has what's then called the forward transform. So

12:07 questions on the G. F. . And may be familiar to or

12:13 but it's not, you're welcome to questions If not, I will turn

12:26 one. So to the first floor transform that this thing that um used

12:36 extensively in many disciplines. So I even compete in my matrix multiply or

12:45 even it's more frequently Houston matrix So the first thing is just to

12:53 out that there is the cause of um popularity of this algorithm. The

13:01 comes from it competition all efficiency. unlike the Matrix electoral products, that

13:10 basically order P squared estimate tricks exercise times P before we transform is that's

13:21 proportion of the P times log P of P square, if the

13:27 probably nobody cares. But if p a billion or more points the difference

13:34 large as I will show on the slide. So the transport was actually

13:44 discovered by the all physicists and I uh later we discovered, I don't

13:54 the gap in years between the original the cool it. Okay. Um

14:03 on the festival you transform that somehow got the name recognition for this way

14:10 computing. The discrete fourier transform uh with the notion always can different radi

14:21 are I guess it's called so it's to four and eight and I

14:27 show you that and that's actually important understand because that's a property of deriving

14:34 algorithm that is the basis for getting efficient implementations of the Cooley to can

14:49 and the cool it to transform as will see, I want to talk

14:53 it is restricted to particular sizes of dataset or the input factor accident doesn't

15:04 for an arbitrary length factor. It fact only applies the power of two

15:10 factors so people and uh I think with it and come up with other

15:17 and which one is known as a race addict that can be adopted them

15:23 be more flexible in terms of input sizes. Another one is rader's algorithm

15:30 then there are particularly clever one for when the input factor is of a

15:40 length or can be factors in the of crimes and then continues this prime

15:48 over them. Um mhm So I going to talk about the quality of

15:57 . That's that's the simplest one. and probably no also for the project

16:06 someone you will do and there are trick Aries that is being played.

16:13 Remember from the definition of the discrete transform it had this video factors about

16:23 play the you can see here on top right hand corner. Let's eat

16:29 in this case minus to pay high where so that means the homemade apiece

16:35 complex numbers. So yeah. it turns out that the input

16:47 mhm real and not complex numbers then can actually take advantage of that To

16:55 about a factor of two in operations memory. And I may be able

17:00 make some comments on that before the is over and then there is further

17:07 uh efficiencies that can be again, the data has certain symmetry properties and

17:19 it reduces more or less by another of two in terms of work.

17:25 And on Houston so called signing coastline and that's as I mentioned, was

17:32 basis for compression algorithms that jpeg and standards for many years now I want

17:38 switch to using way that transforms into and the other two things. Self

17:48 and in place. Self sorting a bit obscure at this point but it

17:53 become clear as I talked about it place simply me in some fun um

18:02 allocate separate members for input and So which would kind of double the

18:10 requirements and if you have a billion transforms doubling it. It's kind of

18:16 necessarily what you want to do. that's what's called in place algorithms.

18:23 they worked in an array that starts the input data and when you're said

18:29 done it has the transformer, the data without using much of temp

18:40 So here is just the pointing out the means to do something that is

18:46 important benefit or the p squared versus P log p. So as I

18:54 , uh astronomers are the ones I I'm not use not just the one

19:02 to 2 30 is a billion points they use many billions of points

19:09 I happen to have work with some dana way way back In the 90s

19:18 at that point already did multiple being transforms and the difference between square and

19:29 is quite significant as you can so and the number five instead of

19:38 L O P will be fine So that's in fact what it is

19:44 call it. Take the transform. now I'm going to start to look

19:55 how this FFT is can be the and get some insight into what it

20:02 and what the cleverness is but I'll for a second before diving into detail

20:10 there are questions it's not. So So there are two versions of the

20:28 it okay, there are many Russians in terms of how they kind of

20:35 in how the memory references kind of . Yeah. Typically called the decimation

20:44 frequency and decimation in time. And will show both of them the most

20:53 the literature from anyone reads papers or know it. They tend to favor

21:00 decimation in time and they don't mentioned in terms of decimation in frequency.

21:07 it turned out it turns out that terms of parallel computing in particular if

21:14 do large FTS on clusters, it's to understand both. And hopefully pointing

21:27 out before the lecture Is over why should know both. All right.

21:36 yes, now I'm trying to show have to There, I know what

21:41 basis is for their 50 and how get from something that is or the

21:47 square to something that is full of . So it all depends on the

21:53 that this matrix the w matrix the of the matrix or interests uh unique

22:05 and that's what's being exploited. And first formula transformed. So we'll talk

22:14 how to exploit it. I'll start the thing that's true results in the

22:23 and supposed to that the destination in . So for the decimation in frequency

22:30 thing is to observe as this particular of once known as the twitter factors

22:38 the roots of unity the omega p we're probably most of the trunk.

22:45 the notion of twitter factors And that uh huh. Talking about the end

22:51 the piece and the notion why they're roots of unity is illustrated on the

22:58 hand side. So which is a of the unit circle in the

23:03 The main and the omega piece basically this unit circle and the Lord of

23:12 P. Is the finer or the the angle is for each one of

23:18 slices. Remember that on a happy it to the -2? I divided

23:26 peak. So Pete tells you how uh slices there are as you go

23:32 the human circle. No, since omegas are A to the -2 pi

23:44 over P. Um When you take power omega p To the people of

23:55 two power. Since I should have that on the slide. Sorry.

24:00 So e to the -2 Parts are about 30 times. T over two

24:07 simply E. To the minus um hi B. Two. So that

24:17 you go half away around the unit , So that part of our monarchy

24:25 simply -1. So because you think as you increase the power or made

24:34 p, you kind of add one at the time and P over

24:39 you're halfway around the unit circle. um as you keep increasing the power

24:45 you get that's the one. Now the decimation of frequency one can look

24:52 the sum that they had, that basically take a roll of the dft

25:01 , that is the inner product. the inner product then is the omega

25:08 too. The power that is a of withdrawal you're working on. And

25:15 for the, going through the columns the matrix and multiply it by the

25:20 factor. So if you do you can split up the sum In

25:25 first half of the inner product, is from 0 to Pi over 2

25:31 . Uh And then you do the half. And the way I did

25:35 second half of this line is I the same summation index. But then

25:42 the summation expression I replace jay from previous language, A plus P over

25:51 . So that covers the second part the summation range for the inner

25:58 So now I can use the properties this omega piece that when you take

26:06 the power of P over two is . So that means in this submission

26:15 You have -1. And I guess an excellent sorry, extra crime here

26:23 shouldn't be so I just discovered that . So I apologize. So,

26:31 and then so I just added these guys made it just one summation sign

26:39 it has the omega lP, that this part. Um And it's the

26:46 in the first half of the inner . And the difference for the second

26:51 is that this guy multiplies the expanse for the second part of the

26:59 so I can play uh for the here, someone consider also in an

27:05 . Values and will become clear when doing that in a little bit.

27:11 now if I look at the and even so this is best for the

27:21 here is the even ones and this be the odd ones. So if

27:28 , plug in uh to our prime of L. In the so upper

27:36 a quick slide equation. And then it's an even power of L then

27:46 to an even power is always plus . So then this expression simplifies to

27:54 part and for the other half the is odd. So then Basically

28:02 So in that case you get essentially expression and the other part that was

28:11 here since we have Formula P 2 crime one point as well. Um

28:23 at this six. Turn it into a repeat of the two as the

28:30 part. So remember um this I'm sorry, I'll need to flip

28:38 . Yes. Yeah. So here the armor to remove that. So

28:46 um if you have an even power this uh or kill me too,

28:55 can sorry if you have a two here then you can as well move

29:04 down and They would be over two the denominator, that's kind of what

29:10 exercises here. So at this point looks perhaps like totally meaningless manipulations but

29:19 not. Um So the difference why not is that we started out rested

29:33 this in a product here that is uh have p terms for every question

29:43 every L. So this is and enumerated for all the ills and we

29:48 this matrix vector application that we started . Now what they have here is

29:59 inner product. That is only of half feeling but we got two of

30:07 because there's one for all of them either. So it may not seem

30:13 much of again that I want to . So this is that it's not

30:20 have sites. So what's the Well, I guess first course,

30:27 on. So this is the notion the butterfly. So this is just

30:33 what's being done here. Right? to do each of these two have

30:39 . Dft is what's needed to be is to replace the input by um

30:46 these two input values here. Um the other part you have the take

30:53 difference between the same input values and you must apply it with this.

30:58 my God, jay a woman up to the power of Jack, this

31:05 , the Amadeus here in front is same invoked. So that's you cannot

31:10 ends up with the to do these slightly different input factors. One has

31:15 as an input factor and the other has the omega p to the power

31:20 times the difference as the imperfection. that's what's kind of illustrated on this

31:28 . So now you have the two and one can draw a picture of

31:32 and then it becomes known as saying . Uh people talk about butterfly competition

31:42 talk about butterfly networks when it comes interconnection networks and it's all because of

31:49 like the drugs relationship between the input the output there. So here is

31:59 of what we just did in terms um This generating the to have size

32:10 or uh yeah, so um and way we did it is that There

32:20 one for even components of the output there was one for the components of

32:25 output fractiousness run for the gun the X. The even ones and

32:31 there's the bottom one is the odd . So this is what was

32:40 And the first thing with all the kind of thing. And solid bullets

32:46 just this collection of butterflies. That the computation required to reduce it.

32:54 the requirement to to half size F. T. S. But

33:00 I can apply the same procedure over over again To these half size of

33:09 . So if one does that, ends up with something like this.

33:15 now we have and by doing recursive, lee dividing it while generating

33:24 size FTS that means after logarithmic number steps you're kind of done. So

33:32 why the depth in some sense of . FFT a butterfly network or stages

33:40 the competition this logarithmic and for each the number of operations is proportional

33:49 The number of data funds. So the way you want to insult.

33:55 the algorithm being of order peel api the mountain rotation. Today we're in

34:03 from using the normal thing. There's things to observe on this slide that

34:15 with the input and I normal border just taking the Integrates and you know

34:24 0 to 1615 in this case for input values. But because of this

34:32 even splitting we respect and the it turns out that the output is

34:44 referred to as scrambles. Um So means for instance if we look at

34:57 we take the lines as actual memory so in the first memory location then

35:06 the output for an in place algorithm um zero frequency components is in the

35:16 location as the zero input value but next memory location that used to have

35:24 second input value. Now it has the second frequency component but Frequency Capone

35:34 eight and it turns out that this scrambled order or lucic unscrambled order is

35:44 fact well defined as every traverse order I will talk more about that before

35:53 done and I'll probably I'll stop at point here and but let me see

36:00 my next flight is first. Um , so let me just make some

36:09 on this. And this is particularly When one does distributed the fifties and

36:24 it is looking at these so called the factors for roots of unity.

36:32 in this particular celebration did in the butterfly stage here, the roots of

36:44 corresponds to is slicing of the upper of the unit circle into peace

36:56 We only need half of it because used this property that in this decimation

37:06 frequency algorithm that they looked at data in the input, that is half

37:13 the input factor away from the So those twiddle factors, the fact

37:20 replaced by a minus times the total on the upper half of the unit

37:28 . Now as we proceed to the stages than data was of the half

37:35 size and there were two of them each half size. Fft used the

37:42 uh that the federal factors of the half it doesn't know whether it's other

37:47 input data or what not. This knows about the size and since the

37:54 now is half of the input. in terms of the power of the

38:03 subjects set the twiddle factors. There's half as many slices or in terms

38:10 exponents on his twitter factors. It go 123 anymore, but it goes

38:17 0, 2, 4 and basically twice the value of the twitter

38:26 Exponent in the 1st stage. And tell them what you're saying is also

38:33 means the people of factors used in second stage, it's a subset and

38:39 every other people in fact produced in first stage and then it goes further

38:45 . So then it becomes around only two unless every other of the

38:50 stage. So, and in the it's just a very simple bath

38:59 Yeah, So one more comment and I'll read through my comments under the

39:06 factor setting. So, um what and fft applications, one does many

39:18 f t s of the same size that's so that's what we typically

39:28 You compute the trader factor once and you re use them for every Application

39:36 50s of the same sex. So an area where you included for the

39:44 that computation in your code, but not a good way of doing it

39:50 . So you want first to uh , compute them and spend a little

39:57 of memory or the memory to um them and then just use them every

40:07 of the fft and that's where the is important to know how these things

40:15 in the context of distributed FTS because typically different little factors are allocated in

40:24 nodes of the cluster and you want be able to make sure that they

40:29 in the right place for where they needed. All right. And this

40:36 pretty much what I read is. , but they did not say is

40:43 . Well kind of if one has computed the total factors and store them

40:49 an array Because of this part or property on the cool. It took

40:54 of 50 one can use the address to figure out what the exponents are

41:00 this is simply what um, is the bullet points here says that it's

41:08 step off the most significant bit and into the race um, when you

41:16 to retrieve the proper total factor, questions on that before I talk about

41:24 decimation in time. Okay. So is also This otherwise decimation in time

41:40 tend to be the one that shows most of the time in textbooks about

41:44 50th as well as often in some people develop efficient algorithms were used

41:52 some competition. So here is now hmm. Wherever one looks since that

42:02 even input it. So in this the input factor x is petition

42:11 into looking at the even components in first submission here And help some a

42:19 of -1 such woman peanut yet. sorry about that. And then the

42:24 components of the input factor. Um then, so this is just again

42:32 inner product but rewritten in for even odd components of the input factor and

42:40 the same game. Uh Bye have . Uh Okay prime them being two

42:47 promptly even so I can take these and move it down to uh from

42:55 know from the top to making a . Over two instead. Someone now

42:59 kind of half size in a product also use some fiddle factors that are

43:05 half size half Ephesians. All So this is Oh just but now

43:09 -1 makes sense. Hopefully mm. And now one looks instead of 1st

43:18 2nd half of the output vector, capital. So under the decimation of

43:26 they love that input their points halfway and not even put values. Now

43:38 at what even input values and um values separated about half of the number

43:46 outputs families. Otherwise it's kind of in the manipulation. It's very

43:54 So if one does it this way me kind of derived the application of

44:03 divide and conquer scheme. Starting with output and now we used half

44:12 Fft is based on even or odd values but it's still kind of a

44:18 competition. The butterfly looks a little different in the sense that the complex

44:25 happens before the actual but if I on the decimation in frequency that happened

44:31 the output, one of the outputs the butterfly. So it's not particularly

44:37 but it's very similar and then one again continue to do the same exercise

44:43 and then man get something like So now, you know, we

44:50 with basic derivation from the output that had a normal order and then figure

44:58 this obvious split that happened in every step. It in fact resulted in

45:05 the kind of our input yes, to be in a bit traversed order

45:16 order for the output to be in order. There is a a few

45:28 things to comment on and I don't exactly my sights here but to be

45:35 here is now for this decimation in derivation all the twiddle factors spent along

45:45 After a 1/2 circle is used for last butterfly stage instead of the

45:54 So the kind of order in which title of factors are used is reversed

46:01 to the decimation in frequency. They're and that's kind of one important fact

46:12 in particular in the parallel computing or fFT competitions otherwise. So I think

46:26 so this is just script mentioned that and here's kind of but I also

46:32 talked about and harmon then use the address or in stage number to figure

46:42 uh huh how to index into a of physical factors. Let's see what

46:51 next time is. I guess it's the two that's already done in talking

46:58 them. Um So um the way derived it for the decimation in frequency

47:12 they input was a normal order and the output being a bit reversed

47:20 Um The things were totally opposite for decimation in time And I also commented

47:27 Howard 20 factors are being used now the fingers, right? The I

47:44 to talk about the last comment on slide, I think on the next

47:51 . So this basically showed that the on the left hand signers to instances

48:05 the destination of frequency on this there is the left side that is

48:09 what was previously in the derivation of Decimation of frequency at 15 but it's

48:17 fine to kind of commute the lions you like in the but if I

48:29 so then what happens if you were do that? Which is essentially to

48:33 well instead of having the import vector to memory in normal order, if

48:39 assign the values to memory location and d traversing the index of the input

48:48 , then you would get the output normal order. So, so the

48:55 of this is to say, well is what is often not pointed

49:02 is that one could as values the in frequency, as a decimation in

49:11 . If the data presented as its in vitro first order and the

49:21 stand up also for the decimation in even get the output in normal.

49:32 another point that I wanted to make I haven't been before. So whether

49:39 looks at the left slide, the butterflies combines input values that are half

49:49 apart. X zero and X. . Is part of one. Butterfly

49:54 and 9 is the input to another . And that obviously doesn't change in

50:00 of this, the allocation of memory the competition needs to be the

50:06 So in the first stage and the and fire and frequency one combines in

50:16 buffet computation input values. There are the input director apart from each

50:27 Now the fun. Alright. Uh I serve now if you look at

50:32 decimation in time left his side is what Peter ended up within driving the

50:40 in time. 15 and the same . Yes. To the traversal of

50:47 assignment values, the memory location and get the right hand picture that shows

50:53 . So if you have there in order has important Eucharist values the destination

50:59 tiredness. The decimation in frequency. the consequences that now the output to

51:04 traverse for So they're both are perfectly to use regardless whether the input orders

51:14 and be traversed. The other point again like I pointed out for the

51:22 in frequency on the previous slide. even for the decimation in time.

51:27 the first stage you combine input There are half of the infant sector

51:36 . So the set of data values are combined as you go through the

51:41 of the FFT follows the same Whether you use decimation in time or

51:46 in frequency. That's another part that important to understand. And so the

51:57 reference pattern does not change whether you decimation in time, the decimation in

52:03 , it depends if the input order normal or be traversed but the combination

52:13 input values that is used in the of our stages. This is the

52:19 . Yeah the butterflies are slightly So how the trailer factors are used

52:25 different but it's the same fiddle frankness just the order in which their use

52:32 different. Yeah, so that um just to comment on this and then

52:43 will stop so just to be kind obvious um just illustrating what the controversial

52:53 in terms of having a binary representational um and it just best there so

53:03 look at the binary digits in the cold column and you look at in

53:11 bitter for asbestos, if you write from left to right then it was

53:16 of write them down in the opposite when you do to be controversial.

53:19 nothing more to it than that. it's very well defined what it

53:23 Um It's yes uh kind of do reflection on the bits around the need

53:34 So now comment on this and then was gone again and see if there

53:40 questions. So the victory reversal is that has received a lot of interest

53:54 again, interferes in one of the common algorithms out there and sometimes people

54:02 to have the same input and output . That means we need to add

54:06 stage that restores the order that was in the input and that is the

54:15 . Now people also want to not have temporary race to have So they

54:22 to do it in place like they to hear 50 itself and then it

54:27 somewhat tricky. So there are many um about how to do the traverse

54:37 in place to avoid a large amount temp storage and their papers there's also

54:46 book on for fast fourier transform but band on uh on a university that

54:53 a very good reference for both the fifties. The other point to be

55:00 of As a woman look how clusters Mandelson distributed their 15 then Yeah,

55:09 type of fermentation that the V traversal often is our process a lot of

55:19 in the network, if it it tends to be the case that

55:25 transposition does. So if anyone is in trying to figure out the capabilities

55:31 the interconnection network in a cluster uh to us you two tried the controversial

55:41 matrix transposition and find out what one may work well And the other

55:46 may not and it's also good if buy something, it's a good benchmark

55:52 or acceptance criteria. If you want test what the vendor promised in terms

55:57 their network and I will stop before on to the next. Okay,

56:14 . So mean in some cases that's yes to do say informant and

56:33 So like in the case of that's good a wrong application. I

56:43 about to say in terms of Yes To do sort of the forward

56:48 in order to find out how many coefficients you need to kind of or

56:54 enough to represent the image that sort compression that happens reduction from the number

57:00 pixels in the example I talked about the number of fourier coefficients but then

57:05 probably like to restore the image and you're going to do the inverse

57:12 Um in terms of the M I. That also mentioned, that's

57:21 application. It turns out data that get from the MRI scanner as it's

57:34 . The input data is actually in full year space. So they get

57:38 image you need to run an enormous transform and that may be all that

57:45 want because you may want to understand pictures of their organs that are trying

57:51 image and then then you may work the image space. Now in terms

57:59 the american methods, it turns out people often transforms, It's a geometric

58:08 or time into the for your domain then it turns out manipulations or algorithms

58:16 faster to use in the four year and then eventually I should have done

58:22 . You want to come back to space in which you started. So

58:29 quite common at least in terms of and engineering numerical methods that You use

58:39 forward and inverse f. 50s in . Yeah. And this slide basically

58:45 that if that's the case, one necessarily need to bother about trying to

58:53 order. So the traversal is built and the reversion implied in the

59:02 But as long as you know where data is, if you do a

59:06 followed by an inverse transform the result the end is that the output is

59:12 the same order as the and so why it's good to do if one

59:20 want to see if the coals or else. There is no need necessarily

59:27 um including the traversal stage, Uh when we're getting the index memory

59:39 we use a compliment context instead of the so I was a little bit

59:47 Some NBC if it gets clear. , the compliment. So let me

60:01 the next line maybe see if that's the question is or um so there

60:10 no beach worse unnecessary. So um come back to this. So if

60:16 do. Okay. Yeah. So Oh okay. So it was

60:29 think that you said the compliment. that what I heard? That's right

60:36 compliment. In what sense? I if you're taking 40 so if you

60:43 a complemented it's very like all ones it'll be like a mirror image but

60:48 a different restaurants. Um So so you mind repeating and see if it

61:01 through more clearly? Um What is is rather than the worst thing of

61:08 ? Uh And you simply just flip bits Make a 01. Does that

61:17 not change any? Does that change alphabet? Um Okay I'll see if

61:27 following answers the question so mhm. point on this slide is essentially you

61:34 need a bit reversal if you do both. Now this line was you

61:38 the input in normal order and that eventually you get Results of the two

61:43 combination. Also in normal order. you have the bit traverse input order

61:48 works as well find them the middle the normal order. And if you

61:52 the uh members of fatigue to that you get back to Big river

61:58 That's that was the question. Right we now I don't think so.

62:19 question I think if I understood correctly rather than doing the reversal, one

62:27 the processes that showed up um if the reverse and so on. Okay

62:37 sorry the reversal of the data No it's simple. Let's simply just

62:50 the bits rather than reversing the order make a 01 and change 1-0 rather

62:57 reversing the order. Uh investor. . But even so doing in the

63:10 part to bring in a lot of for different limits. So you can

63:18 manipulate in the seas and whatever way want. But you cannot change the

63:27 items. That needs to be combined each stage. So it's important.

63:35 possibly index of the values that are by the numbers or that's just the

63:46 are there to illustrate a binary encoding the integers and the data then lives

63:58 memory locations. And there's a question retrieving getting the right address is to

64:06 the proper data. That should be in the butterflies. And if you

64:12 it in place you have no choice the output then corresponds to in

64:20 C. So after you've gone through the records of stages, in order

64:25 understand which frequency component in this case capital X. Is in that memory

64:34 . You need to basically do a reversal on the address in which it

64:42 , then you will find which frequency is in that memory location.

65:02 I'm gonna be after this. So you don't do it in place,

65:06 means you use temporary storage. So each butterflies state you choose to write

65:13 else in memory. Then you can what's known as self sorting and

65:27 So that means inherently do. Additionally in Prima Titian's into the stages in

65:34 FFT and the self sorting in place tricky but they can be done but

65:40 you built in the equivalent of the reversal into the various stages On the

65:48 15. So it's important to think terms of a data items that are

66:01 combined and where the lives in memory figure out not to address memory to

66:07 the right values into the right panel . And this slide just assumes that

66:14 have an array in memory in which is input data and in this case

66:20 take the normal and the bit reverse it if you have other ordering of

66:25 input data you need to make sure your address is such that you're all

66:31 access input data. That in terms the input data ordering is halfway through

66:40 sequence of part, depending upon the of the sequence. And then you

66:46 a different map to memory if you use normal or be Trevor story

66:53 So important to distinguish uh the combination input data and needs to take place

67:04 where they live and in this case really simple for having them in this

67:10 the input data. Okay. his addresses and memories from 0 to Oh

67:21 that help the question bomba? somewhat we can talk tomorrow one.

67:32 was saying that what the question was here you reverse the bits of the

67:41 or find the index in extras. um I'm a little bit uh hesitant

67:52 say that I reversed the bits because order to get understand what component is

67:59 bit in a particular memory location, need to reverse the bit of the

68:09 . President, the compliment devastates if did a compliment then the complement of

68:16 bits is not the same. You if you complement the address, did

68:23 get a different number than if you traversed the number if that's the

68:31 Yeah that's kind of what? Right If you take the 1st 001 and

68:41 a bit complement of that, you 1110 which is not type and that

68:50 the question. Okay, shall I , yep okay sorry Yeah and I'm

69:04 to answer when I'm back from the is easy but so this was the

69:09 that often do the inverse one doesn't to do the bit reversal as long

69:15 you know where the data lives because need to apply the right set of

69:19 factors that the combination would do Okay, let's see what happens

69:25 Mhm um Yeah so this is just different. So we found that um

69:39 Embers 50 renters just the members of total factor. So that's fairly simple

69:48 now let's see decimation in time has same property. So again, I

69:52 the example here, normal order input then things are d traversed and then

69:58 chris towards the order. So that's same. Um Now this slide is

70:10 trying to point something else out and is that if you and this is

70:19 particular for distributed Fft where this light important. So if you think of

70:31 Partitioning the input vectors that put four In each of four notes and you

70:38 the first zero for three values, a one note than the 4-70s and

70:42 different note. So you kind of uh lines across here. Then if

70:52 then look at the twiddle factor, values of things within the little square

71:00 . As you go through a line the forward to the members, you

71:06 see that effectively the same twiddle factors so being used in the same mold

71:15 the forward and the inverse. So fiddle factors are now in the right

71:20 , but it only applies if you the combination of the summation in time

71:23 decimation in frequency. Otherwise it does . So that's part of the reason

71:30 I said early on that for distributed 50s, it's essential to know about

71:37 forms of decimation in time with this frequency in order not to have to

71:43 with the factors or also communicate twitter or re compute that. So this

71:51 the most efficient type of combination and doesn't matter which one is done

71:56 Uh decimation in time and decimation in , but one of them should be

72:02 in time and one should be decimation frequency. Okay, let's see what

72:11 asked. Try to get to before time is up. So this is

72:18 just saying what I already said. now I wanted to document this once

72:24 some of you, What do you ? A 15 projects? And since

72:30 course is about making efficient use of , memory, hierarchies are particular significance

72:40 this point. It's important for So I had a few minutes and

72:46 love Okay pointed out. So now is this higher ratings topic that comes

72:55 mentioned readies for an eight early on terms of Cooley Tukey fft and the

73:02 to use the properties of the title factors that go halfway around the the

73:07 circle. It is um The power metal becomes -1. So basically the

73:17 any or meter in the upper Yeah. And you can go halfway

73:24 away from that and you have the Value except it's multiplied by -1.

73:30 that means we only needed half of trail of factors and can replace the

73:36 ones but taking the upper half And deployed on mid 90 swan. There's

73:42 one that is very convenient and that actually go a quarter way around the

73:47 circle um january reached the imaginary one I. So that's the particular

73:57 It doesn't require a and they're real . Multiply with high this change is

74:03 real and imaginary components. So that's basis for the Israelis for tight fifties

74:15 I have about five more minutes to tell you this is a bunch of

74:21 equation that looks messy and probably impossible comprehend by staring at in a

74:28 But you can see there are four that is basically going around in this

74:33 taking the input factor, young dis frequency instead of doing an index and

74:40 halfway the vector length you go a way the vector length people before and

74:46 two people before and flip you before then you do the same thing with

74:52 to the output and not just look than even but there's still a quarter

74:59 ah separation um Sorry, four sets basically four instead of two. Or

75:12 now you cover the range to the Multiple of four by adding 1,

75:17 and three. So if you do well just quickly give you the point

75:25 and in order to give you what the point of all of this

75:29 So you get sort of now ratings . But the fine looks like this

75:35 as you can see in this case is effectively four and radios to butterflies

75:43 in this rating for butterflies. Um this ratings for there is a multiplication

75:51 -1. That is just simply flipping and imaginary components with the proper

75:57 And so now you have At the three complex multiplication with privilege factors.

76:08 we had one complex multiplication from each of the four multiply. So for

76:16 so now we basically have instead of complex multiplication in the ratings too configuration

76:22 we have three instead of four. we say one complex multiplication And working

76:29 Great X four but that's not actually most important part. I will skip

76:35 , you can look at it. and this is this mission time,

76:41 can do the same thing. Um then you can even go to reading

76:47 but when you go a quarter way uh the unit circle. Yeah,

76:52 you need some numerical or uh arithmetic you end up with having a square

76:58 win there. So it's not quite cheap anymore. Um it does have

77:04 real foreign point operation but here's pictures it and I'm trying to get here

77:11 to the french sign that is important try to optimize performance and um so

77:20 the upper kind of table, refers to each one of the Butterfly

77:28 , Israelis to foreign eight. How ads? Some of subtractions multiplication

77:33 And how many storage references is and point is then that the idea is

77:39 each one of these butterfly competitions fits fast storage so you just retrieve input

77:48 output data for the butterflies. You have to do any um right thing

77:59 loading from memory to carry out uh really ate a butterfly for example.

78:05 all loaded in what? So if do that one can see that in

78:11 of storage references, if you look the right thing, column going from

78:17 to turns ratings. eight Reduces the access by the more than a factor

78:24 two. It also says a little in terms of the number of operations

78:31 in that case it goes from five roughly four. So it's about 20%

78:37 and arithmetic but it's more than a to the memory references. And since

78:43 references is usually the bottleneck and most systems today, That's why during this

78:52 ratings of 50 are critical and most use them and they the extension is

79:00 won best. It does a red . That is as large as you

79:06 do. That fits in the cash , level one cars and then level

79:11 cars and the three cash. Um means that the look for as you

79:17 from level one to level two, ratings is larger uh etcetera. And

79:23 for the same concept as for the matrix multiplication I talked about last time

79:31 now I'm sure that my time is . So I take questions and I

79:38 stop here and they have been through in detail next time the rest certain

79:51 that I have today, you can some print the picture and flip them

79:58 very quickly here. So sorry it's there as well, you can see

80:03 happens. So in this case the axis is performance in some sense and

80:09 two horizontal access is memory strikes for and output. As you can see

80:16 the strikes gets larger, then the drops and high jobs depends on the

80:23 cache policies for the particular processor that have. But keeping things small in

80:32 definitely has a significant impact on the and I think there was another sign

80:39 that shows the range, the blue , it's kind of bad performance in

80:45 of being put them up, put and kind of brownish reddish bars and

80:51 all that is when you have really strikes and managed to make good.

80:55 first use of cashes and the green are coming on the average for this

81:02 but so it's a huge difference but are slicing, it can look through

81:07 and you're gonna get an idea what is good, good if no

81:17 I will stop, you have to sharing. I was stopped recording.

81:23 huh uh huh I don't know if lost to Russia where along the way

81:31 I think uh stop

-
+