© Distribution of this video is restricted by its owner
00:02 So, uh, [inaudible] from now on.

00:09 Are you guys here? Can you hear me? Okay, uh, let me

00:19 just share the screen. All right, can you guys see my screen?

00:43 Yes, yes. Okay, I, uh, I think the [screen]

00:50 is shared already, that's nice. I'm not gonna be able to see the chat,

00:58 so feel free to unmute yourself if you

01:03 have any questions. I'm happy to help with that, but I'm not gonna be

01:07 able to see the chat. So unmute if you want to say something.

01:12 All right, so let me just move into the presentation. All right,

01:19 uh, for those who don't know who I am, I don't remember [inaudible].

01:25 My name is [inaudible]. I'm [inaudible] at the UH

01:29 computer science department, with [inaudible] Solorio. I'm in my fourth year, and I

01:34 do research on NLP. Obviously, I'm gonna talk about sequence to sequence

01:40 models. [inaudible] There are many applications of these,

01:45 and you'll see how powerful this is, actually. Um, so let's

01:50 get started. So basically, first we're gonna have a quick

01:55 review of the things that you have already covered in the course, well, at least some of

01:59 the things that you have covered. Then we're gonna motivate why we need

02:04 an encoder-decoder framework and the models that are enabled based on that.

02:10 Then we're gonna look at, you know, some mechanisms that improve over those

02:16 encoder-decoders, which are basically the attention mechanisms. There are many formulations of

02:22 attention, so we're going to cover a few of them. You know, pretty

02:27 much every day there's a new paper on [inaudible], and just like that

02:31 you will find many versions of attention and many improvements, and so

02:35 on, right? So it keeps moving. We're gonna go for the,

02:38 uh, basic intuition, like the first contributions on attention,

02:45 the ones that everyone knows. And then we're gonna

02:51 see, like, cool things that people apply these to,

02:56 like, you know, image captioning, for example. Given an

03:01 image, the model is able to convert the image features into a caption like,

03:06 you know, "a bird is flying over the sky," or something like that. So

03:11 cool, in my opinion. But you will see that there are not only

03:14 those kinds of applications; those are [just examples]. Basically, once you master

03:19 the encoder-decoder principle, right, then, you know, just like I said,

03:25 there are other methods of attention, and there's one that is very

03:30 [popular], which is self-attention. We're gonna, you know, just have

03:35 a brief overview of that, and that will be like the next step before

03:40 the next lecture. Well, not the next lecture, but next week,

03:44 I guess, because next lecture we have a coding session [inaudible]. And then

03:50 we're gonna have a part for questions. But you

03:55 can ask at any point, as I said before. Okay. So, well,

04:02 as I understand, you've seen, you know, document classification, sequence labeling,

04:07 and language modeling, right? In document classification, what you're trying to

04:12 do is to say, OK, given a sentence like "a very good movie,"

04:16 like the one presented in the picture, you're trying to say, okay, this sentence

04:20 is a positive review, right? In terms of sentiment analysis, basically, what

04:26 you're modeling is the parameterized probability of a single label y,

04:32 which is, in this case, "positive," given x1 up to xn,

04:36 right? So basically, given a sequence of tokens, essentially you're

04:40 mapping, you know, a sequence of tokens to a single label,

04:46 right? So that's document classification. You also know about sequence labeling, which you've

04:52 been applying, I'm sure, for part-of-speech tagging, but also for

04:57 named entity recognition, for sure. One of your homework assignments

05:01 was, uh, based on that. So basically, what you're modeling here is,

05:07 okay, we have the parameterized probability. When this, uh,

05:12 theta or phi, or whatever letter, appears under the probability, it's because it's

05:21 based on parameters, just to clarify. So here we're modeling

05:27 the sequence y1, y2, up to yn, given the inputs x1

05:34 up to xn, right? Something to note here is that n

05:39 is the same for both the output sequence and the input sequence, right? Because essentially,

05:44 you're mapping every word in the input to one label in the output. So

05:51 they have the same length. That's important to remember.
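
The practical difference between the two setups is the output shape: one label for the whole sequence versus one label per token. The toy sketch below is not from the lecture; the lexicon `WORD_SCORES`, the tag names, and the helper functions are hypothetical, and a real model would learn P(y | x1..xn) with parameters theta instead of using a hand-written table.

```python
import math

# Hypothetical hand-written lexicon: (negative, positive) evidence per word.
WORD_SCORES = {
    "a": (0.0, 0.0),
    "very": (0.0, 0.5),
    "good": (0.0, 2.0),
    "bad": (2.0, 0.0),
    "movie": (0.0, 0.0),
}
LABELS = ["negative", "positive"]


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(tokens):
    """Sequence classification: n tokens -> one label, via P(y | x1..xn)."""
    totals = [0.0, 0.0]
    for tok in tokens:
        neg, pos = WORD_SCORES.get(tok, (0.0, 0.0))
        totals[0] += neg
        totals[1] += pos
    probs = softmax(totals)
    return LABELS[probs.index(max(probs))], probs


def label_tokens(tokens):
    """Sequence labeling: n tokens -> n labels (a toy per-token tag)."""
    return ["SENTIMENT" if tok in ("good", "bad") else "O" for tok in tokens]


sentence = ["a", "very", "good", "movie"]
label, probs = classify(sentence)
tags = label_tokens(sentence)
print(label)                     # positive
print(len(sentence), len(tags))  # 4 4
```

The classifier collapses the whole sentence into one decision, while the labeler returns exactly one tag per input token, which is why n is shared between input and output.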

05:57 You already know these, so what am I talking about? Then there's language

06:01 modeling, right? With language modeling, I think you guys went over it in

06:05 the previous session, about ELMo. Um, so essentially here

06:12 you're trying to say, okay, we want to know what is the

06:15 probability of the next word given the previous words. So in this case,

06:21 in the graph, you can say it is the probability of "movie" given the

06:24 words so far, "a very good," right? That's essentially what we're modeling here.
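
A minimal count-based sketch of "probability of the next word given the previous words". It assumes a bigram (first-order Markov) approximation, so P("movie" | "a very good") is estimated as P("movie" | "good"); the three-sentence corpus is made up for illustration.

```python
from collections import Counter

# Tiny made-up corpus; <s> and </s> mark sentence boundaries.
corpus = [
    ["<s>", "a", "very", "good", "movie", "</s>"],
    ["<s>", "a", "very", "good", "book", "</s>"],
    ["<s>", "a", "good", "movie", "</s>"],
]

history_counts = Counter()  # how often each token appears as the left side of a bigram
bigram_counts = Counter()
for sent in corpus:
    for prev, nxt in zip(sent, sent[1:]):
        history_counts[prev] += 1
        bigram_counts[(prev, nxt)] += 1


def p_next(word, prev):
    """Maximum-likelihood bigram estimate P(word | prev)."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]


def p_sentence(tokens):
    """Chain rule over the sentence, with the bigram approximation."""
    prob = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        prob *= p_next(nxt, prev)
    return prob


# "good" is followed by "movie" in 2 of its 3 occurrences, so this is 2/3.
print(p_next("movie", "good"))
```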

06:30 As you have seen, there are applications for all of these, right? But when it comes to

06:35 other cases, we cannot model them with the ideas that we have covered so

06:40 far, right? For example, if the input length is different. So, to

06:44 recap, as I said, in sequence labeling n is the same for

06:49 inputs and outputs, right? So you can simply, you know, provide

06:54 an output label for every token you have, and that's very

06:59 simple. But when you have an input sentence that is,

07:05 you know, mapped to another sentence whose length is not the same,

07:10 then you have a problem, right? You don't, you know, have alignments.

07:14 That's the second point: the inputs and outputs are not

07:18 aligned, right? Uh, and then the other problem is that you,

07:22 you know, simply don't know the length; you don't know the length of

07:27 the output, [inaudible]. And use cases for these are,

07:32 you know, translating languages. If you have, for example, a phrase

07:36 in English and you wanna translate this phrase into Spanish, it might not

07:42 have the same number of words, or the order of the words might not be the same

07:47 as in the input sentence, right? In the expression of the phrase,

07:52 some words might be altered, the order might be shifted, or something

07:58 else, right? So you don't have a proper alignment for those. And

08:02 you have other cases, like summarization, for instance. You have

08:06 an entire book, and you want to come up with a summary of this book,

08:11 or, you know, like an article, and you generate a small passage that describes the important

08:17 points of the article. Or you can even have another case in which you're

08:23 chatting with a bot. So whatever question you ask, it picks an answer.

08:28 And not only questions and answers, but also just chatting. Like, um, I don't

08:33 know, like, "hey there," and the bot has to reply in a [natural]

08:38 way or something, not necessarily answering a question.

08:43 Uh, [inaudible]. All of those are not,

08:48 you know, covered by the scenarios that we have discussed:

08:53 document classification, sequence labeling, or language modeling, right? So we

08:58 have to come up with something else, right? And this is where sequence

09:04 to sequence models come into play. And the idea is very, very

09:08 simple, right? So since we don't know, uh, the exact [alignment]

09:13 between the input and the output at the word level, we don't

09:18 do a direct mapping. What we do is to project whatever the input is

09:24 into a latent space. And this is just, like, [inaudible].

09:32 What is a latent space, right? Like, you can think of

09:36 the latent vector z as just, like, the unobserved variable that was

09:43 extracted from the input, right, that represents the important aspects of

09:51 the input. And then this is a representation that compresses all the information

09:58 that you're going to [decode] with the other part into the output.

10:03 So basically, instead of going directly from the input to the output, what you're

10:08 gonna do is to say, I'm gonna go from the input to the

10:12 latent space vector z, and then, from z, it's gonna be [decoded]

10:19 into the output, right? So you have an intermediate step, so

10:22 you have two major components here. There's the encoder, the one on

10:26 the left, and basically the encoder is to model this probability over

10:31 z. So basically it's the probability of, uh, sorry, the parameterized probability

10:36 of this latent vector z given the input x1 up to

10:43 xn, right? Essentially, that's everything here.

10:49 And then there's the role of the decoder, which is a different model.

10:54 Notice that this has a different letter, right? So this is theta, and

10:58 this is phi. I think it's phi. Anyways, the point is

11:04 that for the decoder, what you're trying to model is, okay, we're going to

11:08 model the probability of y1 up to ym. Note that this is

11:14 m, not n, right? Because m and n can be different. I mean, they

11:20 can be the same as well, but, you know, it's not tied to

11:23 that, given that we are given, you know,

11:28 the latent vector z. So essentially you have an intermediate step here, right?

11:35 And this is what allows you to do the mapping between different lengths of inputs and outputs.

11:40 Well, I have to say, you know, this was proposed in "Sequence

11:45 to Sequence Learning with Neural Networks" by Sutskever and others. You can take a look at the

11:49 paper for more details. [inaudible]

11:53 So we're gonna cover more details. So far, I've just used [boxes]

12:00 to describe the encoder and the decoder. Does anyone have any suggestion [for what they could be]?

12:05 You can unmute yourself if you want to say something. Any suggestions for

12:10 encoders and decoders? I mean, we have these probabilities where we're trying to model the

12:15 probability of z given x, and of y given z, [inaudible], depending

12:21 on whatever we're trying to model there. So, any suggestions here?

12:29 Yeah, I have a question. So the encoder, right?

12:35 I'm guessing you train it in such a way that, um, whatever it outputs,

12:44 if you give it to some other decoder, right? Not this one, not

12:47 the one you have; if you give it to some other decoder, [would it work], because I

12:51 keep the same input I'm putting into the encoder? Do you get my question?

13:00 Like, just replicating the [input]? Yeah, yeah. What I know

13:06 about encoder-decoders is, you know, you train them so that [inaudible], and

13:12 [inaudible], um, you know, for example, there's some noise in the

13:19 input. Um, then whatever the encoder encodes, the decoder

13:25 um, removes, you know. You feed it to the encoder, it can [deal]

13:29 with the noise, and then the decoder could get the input without

13:36 the noise. You know, [inaudible]. Yeah. So basically you're referring

13:44 to, um, autoencoders. That is what it's called, and the idea about

13:49 autoencoders is that you have an input sequence, right? You're compressing it

13:54 with an encoder into a latent variable, z as well, but compressed,

13:59 and then the role of the decoder is to decompress and [reconstruct]

14:04 the original input that the encoder compressed, based on this z,

14:10 right? So, essentially, you could say, okay, what's

14:15 the point of giving an input and generating the same input, right? But there's a very

14:21 [important] aspect of the encoder: you're basically representing, in a single,

14:27 compressed vector, whatever the input is. And then out of this vector,

14:32 you're trying to decompress the information, right? So this is like a compression [scheme]

14:39 for extracting the actual features that enable the decoder to decompress [the input]. So

14:45 that's how autoencoders work. It's another [topic]. You know, with autoencoders, you

14:50 can even have variational autoencoders that have some, ah, stochastic

14:58 [inaudible] behavior, we can say, right? That, for example,

15:04 use a Gaussian distribution, so you can not only reconstruct, but you

15:09 can have some variations on the resulting output, so you can control a little

15:14 bit the generation. Some people use autoencoders for [that].

15:19 Or some people just, you know, train these autoencoders to

15:25 come up with very good encoders, you know, that can represent, in a very

15:30 [good] vector z, the input sentence. And then they just drop the decoder.

15:35 You know, they drop the decoder and just keep the encoder. And now they can,

15:39 you know, compress information, uh huh, to do feature extraction right out of

15:45 the input, if at the end you want to represent the input with some

15:49 [features], right? And then people [inaudible] the encoder so that they can

15:54 [make] these features very, you know, very polished. Yeah, that's a

15:59 different topic, although it's very related: we have the same framework, the

16:04 encoder and the decoder, and essentially they reproduce the same sequence, right?

16:08 Valid question. But ideally, you train these two models together, right?

16:15 Otherwise, we don't know how to correct the output that the encoder is generating,

16:21 that z, right? We have to have a way to correct it.

16:26 So it has to be trained jointly with the decoder. All right, so

16:31 it seems that, you know, you get this end-to-end probability, so that

16:35 the encoder will be, you know, [trained] together with the

16:42 decoder, right? Because again, we are going to backpropagate through time. But

16:46 [back to] my original question, which was: what do you guys think the encoder and

16:51 decoder can be? It's just whatever you want. Since, basically, if your

16:58 inputs are images, you can have CNNs there, or just, like, a multilayer

17:03 [perceptron] if you, you know, [inaudible]

17:09 [inaudible]. But anyways, you can put

17:14 whatever you want in the encoder, and as well in the decoder, you can put whatever you want.

17:17 That's why the presentation of the seq2seq model is just a simple

17:22 [diagram with] encoder and decoder [boxes], because it can be whatever you

17:26 want. Okay. All right, let's move to the next [slide]. Oh, by the

17:35 way, um, it's kind of hard for me to switch to the [chat], the

17:40 [inaudible]. So let me see. So if you guys can unmute yourselves, I'd [appreciate]

17:48 that. Are there any other benefits of these models

17:53 that you can figure out, the different [examples]? Yeah, aside [from what] I was

18:00 [saying], we have many examples of seq2seq, right? I was actually showing

18:06 them here. You know, we can use it for translation,

18:12 or there's an example that I'm gonna show later, which is a successful application

18:18 in which, actually, the input is not even text, it's images, it's a

18:23 single image. And based on that, you're trying to generate a caption for

18:30 such images, like describing what's inside the image. And that's really cool, because

18:36 it's even a different modality, right? A different source of information, mapping images

18:41 to text. So the applications are endless; it's up to your imagination where you

18:48 want to apply it, right? So it's not only about aligning, or just

18:56 [mapping] something to the output, but also, you know, whatever it

19:00 is you want to come up with. It's a super general framework, and powerful

19:06 at the same time. So I hope that answers the question. But I would

19:13 say that it does. Oh, so the model can be used to do [inaudible],

19:19 all right. It seems like, especially for translation, that z is the

19:28 [meaning] that's behind whatever the [sentence says], right? Exactly, exactly. And that's why people

19:34 call it latent space, because this z is like the semantic representation you have

19:40 in your mind, like the concept you project onto a language,

19:46 because, at the end, it doesn't matter how many languages you

19:50 speak, you still think of it in the same way, right? Like the same

19:55 idea you want to express is in your mind, and then you put it

19:58 into the language, right? So it's doing that. And that's why people

20:03 call it latent space. Um, all right, so I'll keep going. Ah,

20:13 okay. This is like a [high-level] view of this part, right?

20:17 If we look at these models more closely, we have an example here.

20:22 I think we're gonna walk through the model so that you guys have

20:26 a better idea of how this works. As I told you, the encoder can

20:30 be whatever model architecture you want, and the same goes for the decoder. It

20:36 can be whatever you guys want. In this case, since we're talking about

20:41 [sequences], what is natural? The natural thing to use is RNNs,

20:45 right? So we're gonna assume that it is an RNN, right,

20:49 although you'll see in future lectures that RNNs are not the default

20:56 right now, at least not the [best] one at this point. But that's

21:01 for another lecture. Anyway, let's assume RNNs. So, OK, the colors

21:08 are still reflecting what we had in the previous [slide]: encoders and

21:12 decoders. So keep that in mind. Basically, you have an encoder on the

21:16 left, right, which is this entire set of blocks. So, as I said

21:23 before, we're trying to model the probability of this vector c, given

21:27 the input x1, x2, up to xn, right? Okay.

21:32 So, essentially, what we're doing is, with an RNN, we

21:36 [inaudible], right? We process the sequence and generate the hidden state,

21:41 the hidden vector c, right? That's essentially how an RNN works: it

21:47 has a hidden state that it passes through the steps, right? You

21:52 should remember this from the previous sessions, and from neural language models.
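
A minimal plain-Python sketch of the encoder side, assuming a vanilla (Elman) RNN with h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) and c taken as the last hidden state. The weight names, sizes, and random initialization are illustrative, not from the lecture. The last lines also show a property that matters later: c has the same size no matter how long the input is.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]


def rnn_encode(xs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over the inputs; return (all hidden states, c = h_n)."""
    h = [0.0] * len(b_h)
    states = []
    for x in xs:
        pre = vec_add(vec_add(matvec(W_xh, x), matvec(W_hh, h)), b_h)
        h = [math.tanh(p) for p in pre]  # new list each step, so states don't alias
        states.append(h)
    return states, h


random.seed(0)
INPUT_DIM, HIDDEN_DIM = 3, 4
W_xh = [[random.uniform(-0.5, 0.5) for _ in range(INPUT_DIM)] for _ in range(HIDDEN_DIM)]
W_hh = [[random.uniform(-0.5, 0.5) for _ in range(HIDDEN_DIM)] for _ in range(HIDDEN_DIM)]
b_h = [0.0] * HIDDEN_DIM

short_input = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 2 time steps
long_input = short_input * 10                      # 20 time steps

_, c_short = rnn_encode(short_input, W_xh, W_hh, b_h)
states_long, c_long = rnn_encode(long_input, W_xh, W_hh, b_h)
print(len(c_short), len(c_long))  # 4 4: c has a fixed size regardless of input length
```

The full list `states` is returned as well because attention-based decoders use every h_t, not only the last one.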

21:58 Right? So [inaudible] some homework? Yeah, [inaudible]

22:03 [inaudible], right? So [inaudible]. OK,

22:07 once you have compressed this, ah, information into c,

22:14 I mean, the input information into c, now we start decoding,

22:19 which is like decompressing that, right? So now what we do

22:24 is to take these two things, the c and the start token. I want

22:30 to, okay, let's hold on to the start token. So essentially,

22:34 the way that the decoder knows when it has to start producing tokens is

22:41 a start token. And then how does the model, how do we know when the

22:45 model is done producing these tokens? With the end token; we'll see

22:51 that. So the first input to the decoder is the hidden state c, because the information

22:56 has been compressed into c. And then we say to the decoder,

23:01 okay, start now, right? That is the start token. And this is the [only]

23:05 token we know that we can provide to the decoder, because we don't [know]

23:10 the translation or anything about it, right? So the model says,

23:15 uh, "I'm taking the vector and the start token, and I

23:20 think this is gonna be this word." My first word is generated, and then the

23:26 model says, OK, now that I have one word, I use that as the input

23:31 token for the next time step. And then out of this,

23:36 it's gonna produce, well, out of this and the previous states, right? Remember

23:41 we keep this, uh, recurrent state, right? And then we produce

23:47 the next word, and so on, until we reach the end token.
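
The generation loop just described can be sketched as greedy decoding: start from c and the start token, take the highest-scoring word, feed it back in, and stop at the end token. The length cap `max_len` is an added assumption so the loop always terminates. `toy_decoder_step` is a hypothetical stand-in for a trained decoder RNN: it ignores its state and looks up a fixed next word, purely to make the control flow runnable; the Spanish vocabulary is made up.

```python
VOCAB = ["<s>", "</s>", "voy", "a", "clase"]

# Hypothetical stand-in for a trained decoder step: maps the previous token to
# scores over VOCAB. A real decoder RNN would compute the scores from its
# hidden state (initialized from c) and the previous token.
NEXT_WORD = {"<s>": "voy", "voy": "a", "a": "clase", "clase": "</s>"}


def toy_decoder_step(state, prev_token):
    best = NEXT_WORD.get(prev_token, "</s>")
    scores = [1.0 if word == best else 0.0 for word in VOCAB]
    return state, scores  # a real RNN would return an updated hidden state


def greedy_decode(c, step_fn, max_len=20):
    """Feed <s>, then each predicted token, back in until </s> or max_len."""
    state, token, output = c, "<s>", []
    for _ in range(max_len):
        state, scores = step_fn(state, token)
        token = VOCAB[scores.index(max(scores))]  # greedy: take the argmax
        if token == "</s>":
            break
        output.append(token)
    return output


print(greedy_decode([0.0, 0.0], toy_decoder_step))  # ['voy', 'a', 'clase']
```

Greedy argmax is the simplest choice; it commits to each word immediately, which is exactly why an early mistake is hard to recover from.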

23:52 So essentially, again, we're modeling the probability of every output token given

24:00 the hidden state c that we passed, basically, as well as the hidden state that

24:07 is being updated, right, because it's being maintained from step to

24:12 step, and the previously generated outputs. So essentially, based

24:18 on the previous generations, we keep going, right? All right. So

24:23 far so good? Everybody's [following]? Yes. Perfect. Yeah. Thank

24:31 you. Yeah, so the decoder stops at the end token. And how does it figure out

24:46 [when to produce it]? Yeah, that's it. So the thing is that whenever

24:55 you are training these models, you have gold data, right? Well, not

24:59 necessarily in all the cases, but that's the ideal scenario.

25:03 You have gold data. That means you have a sentence in one [language],

25:08 for the case of machine translation, that is mapped to another sentence in the

25:12 other language, right? So when you're processing the data, what you

25:16 do is that you add these start and end tokens to your target sentence, because you

25:24 need them to teach the model that it has to produce an end token.

25:28 So the end token has to be produced by the model, right?

25:32 The start token is what we give with the, you know, original output,

25:38 right, which is, like, just the trigger for the decoder model, essentially,

25:44 and then the decoder is supposed to produce the end token back. [inaudible] And

25:50 [once] the end token has been produced, then we just

25:55 terminate the generation, right? [inaudible]

25:59 So it's from the data that the decoder learns that, because you have all

26:05 your sentences, all your target sentences, with these end tokens,

26:11 right? Is that clear? Okay.
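
On the data side, the start and end tokens are added to each gold target sentence. One common arrangement, assumed here because the lecture only says the tokens are added, is to shift by one position: the decoder reads `<s>` + y and is trained to predict y + `</s>`, so `</s>` becomes something the model must learn to produce. The function name `make_decoder_pair` is hypothetical.

```python
START, END = "<s>", "</s>"


def make_decoder_pair(target_tokens):
    """Build (decoder_input, decoder_target) from one gold target sentence.

    The decoder reads <s> + y and must predict y + </s>: <s> triggers the
    first prediction, and </s> is the signal that generation should stop.
    """
    decoder_input = [START] + target_tokens
    decoder_target = target_tokens + [END]
    return decoder_input, decoder_target


dec_in, dec_out = make_decoder_pair(["voy", "a", "clase"])
print(dec_in)   # ['<s>', 'voy', 'a', 'clase']
print(dec_out)  # ['voy', 'a', 'clase', '</s>']
```

Position t of `dec_out` is the word the decoder should emit after reading position t of `dec_in`, so the two lists always have the same length.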

26:27 Can I clarify one more question? I had this doubt. So in

26:35 the [slide] where you had the start token, like, if you see that

26:40 [diagram]. Yeah. So you use [the start token], and, like, [inaudible], right?

26:51 So it takes the start token and the vector, like the latent space, and produces a word.

26:59 [inaudible] Is this how we give it to

27:13 the decoder, or do we give it the gold label? Yeah, that's a

27:20 good question, actually. I'm gonna cover that in the next

27:24 [slide], because, you know, there are issues that this architecture has,

27:31 right? I'm gonna cover that, you know. But now that you're bringing up that

27:35 point, let me just address it right away. Essentially, when you start

27:40 training a model, the first predictions will be bad, right, because it is just

27:47 [starting], right? It's random initialization; everything is mostly random at the beginning.

27:53 After some time, it gets better. So the point here is that if

28:03 you start producing bad tokens at the very beginning of the sequence, it's very likely

28:09 that the rest of the sequence won't be good, right? Like, it won't correct

28:16 the rest of the translation, right? It just starts bad, and it's gonna go

28:21 [wrong], right? So while training, we give the labels, and while testing, we just go with

28:29 the tokens we generate, using the start token and those we predict. Yeah,

28:36 so essentially, right now, right here, I'm just showing how

28:41 it should work at inference time. This is passing its own predictions, and using the

28:47 [previous] time step's output is not ideal, because the first training steps are not that

28:53 good to start with, right? So you should use the ground-truth labels during training.
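
The training/testing difference described here is usually called teacher forcing. A minimal sketch, assuming the decoder inputs for one sentence are built all at once: during training, step t is fed the gold token y_{t-1}; at test time it is fed the model's own previous prediction (in a real decoder those predictions are produced one step at a time, as in a greedy loop). The helper name and example tokens are hypothetical.

```python
def decoder_inputs(gold, predictions, training):
    """Return the token fed to the decoder at each output step.

    Step 0 always receives <s>. Step t > 0 receives the gold token y_{t-1}
    when training (teacher forcing), or the model's own prediction from
    step t-1 otherwise.
    """
    previous = gold if training else predictions
    return ["<s>"] + previous[:-1]


gold = ["voy", "a", "clase"]
early_predictions = ["la", "la", "la"]  # a barely trained model's guesses

print(decoder_inputs(gold, early_predictions, training=True))   # ['<s>', 'voy', 'a']
print(decoder_inputs(gold, early_predictions, training=False))  # ['<s>', 'la', 'la']
```

With teacher forcing, a bad early guess such as "la" never becomes the next input during training, so one mistake does not poison the rest of the sequence.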

28:59 I just wanted to chime in: you know, this is

29:03 exactly the same process as when using count-based language models for

29:09 language generation. You're training over exactly what you have

29:15 in your training sets. But when we use the language models to generate text, we

29:19 just start with, here, a start symbol. And you query the

29:24 model, and it gives you the first [token]. Then, based on that, you're

29:29 [querying] the model again, and it gives you the next one, and so on. So

29:33 whether it's an n-gram model or [another] model, it's about looking at

29:37 the previous tokens. So you [generate] the next one, and then you [continue].

29:40 But it's exactly the same process. [inaudible] It's just that here [we have] people

29:47 [trying] to get to that process. Yeah, really, thanks for bringing that up. I

29:53 didn't think about that [inaudible]. Thank you. Any

30:02 other questions? Okay. All right. I guess we've digested this.

30:14 So far, so good. So, any potential problems with this model, besides the one I was

30:21 [just mentioning]? Well, I'll tell you. I'll let you guess some

30:24 of them. Okay? I'll just disclose one, but let's have the rest of

30:29 them be given by you guys. So let me just describe the first one: compressing

30:36 long sequences into c. Remember that we process all of the steps of the sequence

30:43 one by one, and then the resulting hidden state c is the one that

30:49 is passed to the decoder. So if the sequence is super, super long, it

30:54 doesn't matter: of course, c is a fixed-size vector, right?

30:59 It's the same size all the time. So the longer the input sequence is, the

31:03 more information this c vector has to carry to the decoder, to the

31:09 output, right? So, you know, at some point, it won't be enough to

31:14 have this, uh, fixed-size vector, right? So how do you

31:21 think we could deal with that? Well, let's hold the question for

31:25 now. But, you know, any other potential problems? Ideas? I guess, if you're using

31:35 an RNN, you know, it would have the problem that, um, so you

31:45 know, an LSTM tries to forget things it doesn't need and add things it needs,

31:52 [inaudible], making it able to [inaudible], but to deal with context

32:01 that is far away, something like that? You know, right, that's a

32:05 good point, because so far our RNN was just going in one direction,

32:11 right? So whatever was in the beginning, we might end up forgetting

32:18 easily, and it keeps paying more attention to what is at the end,

32:21 because that's what, you know, an LSTM is supposed to do: it

32:25 starts forgetting some things. It's remembering some dependencies, but

32:30 it will be paying more attention to the short-term memory, of course. But [the thing]

32:38 is, um, in this [model] you first process the entire sample before you

32:45 go to [the decoder]. I mean, you first process the sequence to get the encoding, and

32:51 then go to the decoder, which means that, by nature, you can use a

32:55 bidirectional LSTM. Yes, yes. I hope in the assignment many people

33:03 were able to implement that. But, yeah, that's another [good] way [to handle]

33:08 this, because, you know, with one direction, we end up

33:12 diminishing the original information, but with the other direction, you can compensate

33:17 for that. And then the last hidden states for each direction can be concatenated into a

33:21 vector to, like, balance information from both directions.

33:28 Yeah, that's a good point. All right.
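
A minimal sketch of the bidirectional idea, assuming two independent toy RNNs (one reading left to right, one right to left) whose last hidden states are concatenated. To keep it short, each RNN's weight matrices are scaled identity matrices, an illustrative simplification rather than a real parameterization, which also forces the input size to equal the hidden size.

```python
import math


def last_state(xs, w_in, w_rec):
    """Toy RNN with W_xh = w_in * I and W_hh = w_rec * I; returns h_n."""
    h = [0.0] * len(xs[0])
    for x in xs:
        h = [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, h)]
    return h


def bidirectional_encode(xs):
    """Concatenate the last forward state with the last backward state."""
    forward = last_state(xs, w_in=1.0, w_rec=0.5)
    backward = last_state(list(reversed(xs)), w_in=0.8, w_rec=0.6)
    return forward + backward  # list concatenation: the size doubles


xs = [[1.0, 0.0], [0.0, 1.0]]  # first token fires feature 0, second fires feature 1
c = bidirectional_encode(xs)
print(len(c))  # 4
```

In this toy run the first token's feature survives more strongly in the backward half (`c[2]`, about 0.66) than in the forward half (`c[0]`, about 0.36), which is the compensation effect described above.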

33:34 So, you know, what we have been discussing is tied to this, right? The decoder [has a hard time]

33:38 finding the relevant parts of the input using c, because everything has been compressed

33:44 into c. And then, let me just go ahead to the rest.

33:50 So, the third one, which I was briefly mentioning, is

33:56 that it's hard to recover when the [first] decoded tokens are wrong.

34:00 So, since we're producing tokens with untrained models, because they are

34:06 in the process of being trained, these are bad. And if we're giving them

34:11 as input, at [later] time steps we're gonna get even worse translations. So we

34:16 [solve] this by, you know, as discussed before, just providing the gold output

34:23 [labels], right, to the decoder. Uh, yeah, you guys came up

34:29 with [inaudible]. But, you know, in essence, this is [why]

34:34 people introduced the attention mechanism, which is basically saying, OK, instead of compressing

34:41 everything... Okay, let me just move [to the next slide]. The attention mechanism: instead of

34:46 compressing everything into c, what we're gonna do is we're gonna have visibility into

34:52 all the hidden outputs, and we're gonna say, OK, for this time step

34:56 of the decoder, the most important encoder outputs are these ones, which

35:05 may or may not be the same as for the [next] time step that you will be decoding,

35:09 right? So let me put this into the diagram. I

35:13 think I have a few [slides]. So when, when the following happens,

35:18 you might need [to attend] to different [positions]. For example, in the

35:22 [sentence] we were translating, right? "Class" is going to be associated with a word

35:29 [inaudible], and they're not in the same time step, right? So let me

35:32 show you the translation from before. So, this "class" is over here,

35:40 right? So, you know, we do need to pay attention to the right [words]

35:45 and distributions along the input sequence, right? And that might not be in the

35:49 c vector, in this case for "class" at that time. Is that right? So

35:54 it might appear in c, I mean, enough information captured about

35:59 the "class" token, but for others, it might be, you know, a different

36:04 case. Um, so then [inaudible]. So we use probabilities to

36:13 [weight] all these vectors from the encoder at each decoder step, and then we

36:19 add up all these vectors and say, OK, this is my context,

36:22 my new c vector, right? And we try to exploit that information.

36:28 These are the attention steps. Right now, it looks a little bit obscure, but

36:32 let me just put in the steps, and we're going to a diagram to

36:36 see everything that happens, right? So the first step is that

36:42 we get the outputs from the encoder, right? When I say the outputs,

36:45 I mean all the hidden states, right? And then the hidden [state of the decoder]

36:51 at the time step that we're decoding. So we

36:57 have two things: the hidden outputs of the encoder and the

37:01 vector of the decoder. And then we define a scoring function, and we

37:07 [combine] these two variables in a way, you know, [inaudible]. We

37:12 [call] these the contexts, and then we have the query vector. And basically the query vector

37:17 is the hidden vector from the decoder, trying to say: what should

37:21 get more attention in the set of contexts, right? Which is all the

37:25 outputs of the encoder. And for that, we need a scoring [function]. And

37:30 then we convert these scores into probabilities and [weight] these vectors, to sum them all together and

37:36 come up with a single vector for the entire input, right, that has been prioritized

37:42 toward information associated with the query, which is the decoder hidden state at the

37:47 time step that we're facing. So, as I said, we sum up these weighted vectors,

37:54 and we combine this with the hidden [state]. That is our [inaudible].

37:58 Okay. How is it so far? If it is, you know, too confusing, that's because

38:03 I'm going to explain it in the next [slides] with diagrams. But, you

38:06 know, if you have any questions, you're welcome to ask right now. I do.

38:13 When you say encoder outputs, do you mean the [c] vector, or do you mean the

38:21 [individual] representations of the input words, right? Yeah. Let me

38:27 just move to the next slide, because I have a diagram for that. So

38:31 [this is the] encoder, right? Well, [you know] this is the encoder because of the

38:37 color. Okay, so as you can see here, you have the c vector, which

38:41 is the latent space that was fixed before. And that was the problem

38:46 with the original model, right? But now we're also taking all the

38:51 outputs of the encoder, right? So all the h1, h2, h3,

38:56 h4, right, because we're gonna prioritize some of these hidden vectors based on our query,

39:05 based on the decoding time step that we're at, right? Does that answer

39:12 [your question]? Yeah, [inaudible], but I just wanted to [confirm]. Yeah,

39:18 right? So, essentially, we're using the outputs from the encoder, not

39:23 this one anymore. We're gonna get a latent vector out of the entire

39:28 set of outputs. [inaudible] Yes. Oh, that I didn't

39:38 [get]. Okay. Yeah. [inaudible] You know,

39:42 [another] version. Yeah, but people even try to concatenate the c

39:48 with all the [context] vectors and so on, right? People try many crazy

39:53 things, but in the regular [version], yeah, it's totally dropped, and

39:57 we just use the hidden outputs. Essentially, the c vector is just a

40:04 [weighted sum] of the h vectors, right? So that's the first [step]. The

40:11 first step: we get the context vectors, which are all the h1, h2,

40:15 up to hn, right? So we are not taking this c vector anymore.

40:20 We're coming up with a c, a latent, out of these h vectors. All

40:26 right, so that's our context. Then our query. Let's take, for time step

40:32 [inaudible], it's given by the decoder, right? In this case it's just

40:37 [a hidden state]. Remember that this, too, is an RNN, right? So at

40:41 this point, the decoder has just generated a hidden state at this time step, given

40:47 the word [inaudible], right? So this is our query vector for the

40:54 set of contexts, right, which are the h vectors. So this is our

40:58 [query]. This is [from the] decoder. Based on this query, it would [say],

41:02 OK, h2 is the most important one. So we're gonna prioritize that

41:06 [with] our scoring function. Okay, so we define our score function, and

41:12 here we're using an attention mechanism that is Luong attention, which is

41:20 also known as multiplicative attention, here with its concat scoring function. [inaudible]

41:25 [inaudible] This is [the formula], right? So for the,

41:29 um, [case of] h1, we concatenate that. Oh, sorry,

41:37 I added a plus there, but it's concatenation. I should have [put]

41:40 a comma here. It's not a sum, it's concatenation. And then

41:46 we concatenate these two vectors: the query that we have at the time step

41:50 of the decoding that we're facing, concatenated with each of the four h's,

41:58 right? You can see it in the diagram here. And then we project

42:03 that into another space with W, which are parameters of the model. Then

42:10 we squash that and project it to a single scalar, right, with v,

42:16 which is a vector. This is [also] a [parameter]. And this vector allows us to have

42:20 a single scalar value, right? As I'm showing in the picture,

42:27 we concatenate these vectors for each of the four,

42:33 and then we generate these scores. Those are all scalars.

42:40 . This is coursing toe probabilities. do we used to convert those into

42:44 probabilities. Do you guys know the of peace? Right, Right.

42:53 yet subjects. So the idea is so far we have book applications between

42:59 , right? So these can be numbers and very large numbers. And

43:04 those scores, those are Rose And we cannot wait our betters with

43:10 roads cars because, you know, wouldn't have any concert over the weight

43:14 are adding toward the victims. So what we do is to convert

43:18 used colors into probability space, and do that with subjects out instead of

43:25 used those air ace right? And with the next is toe. Now
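
The softmax step described above can be sketched in a few lines of Python. This is a minimal, dependency-free sketch; the four raw scores are made-up numbers standing in for the scores computed for h1..h4.

```python
import math

def softmax(scores):
    """Map raw, unbounded scores to positive weights (the alphas) that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in math.exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for h1..h4; h2 gets the largest score.
raw_scores = [2.0, 5.0, 1.0, 0.5]
alphas = softmax(raw_scores)
print([round(a, 3) for a in alphas])

# Very large raw scores are handled too: equal scores give equal weights.
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]
```

Because softmax is monotonic, the largest raw score still receives the largest weight; only the scale changes, which is what makes the alphas usable as weights.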

43:31 Now that we have all the probabilities associated with each time step, we weight the h

43:37 vectors with these probabilities, right? And after weighting each of these vectors,

43:42 we do the sum over all the weighted vectors, and that's it. That is

43:47 a single vector, c. This thing is the context vector. That's

43:53 what people call it. They usually use c, but this is essentially

43:58 our, uh, C from before, the letter C that I showed

44:04 previously, right? But in the previous example we were compressing all the information,

44:10 right? So, as you can see, now the vector can be

44:15 different based on the query vector, because everything else will be the

44:20 same: multiplications and additions and activations, whatever. But what is different is

44:26 the query vector. So the context for q4 will, you know,

44:31 differ from other time steps, with a different query vector at time step three or time step

44:36 five, whatever. Right. So that's why c is something

44:41 dynamic, right? c for q3 is going to be based on that query.
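
The full computation walked through above (concatenate the query with each h, project with W, apply tanh, reduce with v, softmax, weighted sum) can be sketched as follows. This is a minimal pure-Python sketch under assumed toy sizes: hidden size 2, attention size 3, four encoder steps; the numbers in `W` and `v` are made up and would be learned in practice.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def concat_score(q, h, W, v):
    """Concat score: v . tanh(W [q; h]). `q + h` concatenates the two lists."""
    z = [math.tanh(a) for a in matvec(W, q + h)]
    return sum(vi * zi for vi, zi in zip(v, z))

def attention_context(q, hs, W, v):
    """Score every encoder state against the query, softmax, then take the weighted sum."""
    alphas = softmax([concat_score(q, h, W, v) for h in hs])
    dim = len(hs[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hs)) for d in range(dim)]
    return context, alphas

hs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]  # h1..h4
W = [[0.1, 0.2, 0.3, 0.4],
     [-0.2, 0.1, 0.0, 0.3],
     [0.05, -0.1, 0.2, 0.1]]                            # 3 x (2 + 2)
v = [1.0, -1.0, 0.5]

c4, alphas4 = attention_context([0.5, -0.5], hs, W, v)  # query q4
c3, alphas3 = attention_context([-0.5, 0.5], hs, W, v)  # a different query, e.g. q3
```

Only the query changes between the two calls, yet the alphas (and therefore the context vectors) differ, which is the "dynamic c" point made in the lecture.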

44:46 All right, so far is that clear, or am I just confusing everyone? I have a

44:54 question. Yeah, go ahead. So I'm trying to see, you know, the

44:59 query, right? Um, so I think... it predicts the

45:08 fourth? Um, yeah. Yes. Okay. Okay.

45:17 Yeah, so you use state three when you get the query,

45:26 and for q4 you use state four? Okay. Yeah, I know,

45:33 I know. Is there a question? Essentially here, we're producing the vector,

45:40 the hidden vector q4. Remember, this is an RNN,

45:46 right? So it's essentially taking the hidden state and the

45:53 input at this time step, right? So it uses the previous

45:59 hidden state. Yeah, I know that. This

46:04 C, we said that we dropped it, right? But I didn't mention here

46:08 that what people do is initialize this with zeros. So that's something to keep

46:12 in mind. But as I said, people try so many combinations of things

46:17 that there is no, like, you know, "this is the one true version

46:23 of the seq2seq model"; people are just trying different things. And

46:27 some people might just give the original C over here, so they might initialize with

46:33 that. They might just... you know, you guys can even try that

46:37 yourselves. So this q is generated only with information within the

46:45 decoder, right? It's basically just an RNN. And then when it comes to

46:50 the attention part, we use the outputs from the encoder. And that's

46:55 how we generate the entire thing, right? We take the query, we concatenate it

46:59 with each time step in the encoder, then we, uh, pass it to

47:06 the scoring function. We generate these raw scores. We convert those with softmax,

47:12 and then we weight the original encoder vectors with these probabilities and add them

47:17 together. So this is essentially a weighted sum of the hidden output

47:23 vectors. And this is our c vector, right? And now the c vector

47:28 is what we concatenate with the query. Right. So

47:32 the query and the context vector are both used when we're producing the final

47:39 output, the final probability over words for this time step, right?

47:46 So, a question: is W learned? And is the vector v learned, or does it just weight that

47:51 layer? Yes, it is. Oh, both. Both of

47:57 them are parameters to be learned. Yes. And then this is what

48:03 we concatenate, this, and then we feed it to a feed-forward layer,

48:08 which is just a fully connected layer, and then we produce the scores for the

48:15 most likely word, which in this case, ideally, will be "class".
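
The final prediction step just described (concatenate the context vector with the query, apply a fully connected layer, normalize over the vocabulary) might look like this. It is a sketch under stated assumptions: a single linear layer as described in the lecture (Luong et al. additionally apply a tanh layer before the output projection), and a made-up four-word vocabulary and weight matrix.

```python
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_word(context, query, W_out, vocab):
    """Concatenate [c; q], apply one fully connected layer, softmax over the vocabulary."""
    logits = matvec(W_out, context + query)  # one raw score per vocabulary word
    probs = softmax(logits)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs

vocab = ["<end>", "the", "class", "is"]      # hypothetical tiny vocabulary
W_out = [[0.0, 0.0, 0.0, 0.0],
         [0.5, 0.0, 0.0, 0.0],
         [1.0, 1.0, 1.0, -1.0],
         [0.0, 0.5, 0.0, 0.0]]              # len(vocab) x (2 + 2), learned in practice
word, probs = predict_word([0.6, 0.4], [0.5, -0.5], W_out, vocab)
print(word)  # class
```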

48:22 All right. So far so good, or do you guys have any

48:39 questions? The values of each and every... really good. So for every time

48:55 step, there will be a vector of values, the probabilities for each and every

49:04 word... So also, for the entire sequence,

49:13 it will produce many values, which you add up at the end, in the

49:20 last step. That's why you're using the sum there? So the sum...

49:24 what is the summation? What is the convention? Okay, so

49:34 let me see if I can explain. These h vectors are going to be of, you

49:41 know, 128 dimensions, right? And these alphas are scalars, right? Each of

49:47 them. Sorry, scalars. Yes. So this scalar is going to

49:51 be multiplied with all the values of the vector, for all 128 dimensions.

49:57 So if alpha one is 0.25, then you'll be multiplying 0.25 with every dimension

50:04 in vector h1, from dimension one up to 128, and then alpha

50:11 two is going to be multiplied by all 128 dimensions of

50:17 h2, and so on, right? Yeah. So this alpha is a

50:22 scalar, not a vector? It is a scalar. I mean,

50:26 you can put all of them in a vector and then do some efficient

50:31 multiplication using vectors. But just for the sake of simplification and getting the

50:38 intuition, this is a scalar; think of it as just weighting a vector.

50:42 It is the weight for the vector, which is what we're doing

50:46 here, right? So don't overthink it.

50:57 Just get the intuition right: this is a scalar that I want to

51:01 apply to all the dimensions. All right, any other questions?
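
The answer above (each alpha is one scalar that scales all 128 dimensions of its h vector, and the scaled vectors are then added) can be checked with a tiny example. The h vectors here are artificial, chosen so the result is easy to verify by hand; the same result could also be computed as a single vector-matrix product, which is the "efficient multiplication" the lecturer mentions.

```python
def weighted_sum(alphas, hs):
    """Scale every dimension of h_i by the scalar alpha_i, then add the scaled vectors."""
    dim = len(hs[0])
    context = [0.0] * dim
    for alpha, h in zip(alphas, hs):
        for d in range(dim):
            context[d] += alpha * h[d]
    return context

DIM = 128
hs = [[float(i + 1)] * DIM for i in range(4)]   # h1 is all 1s, h2 all 2s, and so on
c = weighted_sum([0.25, 0.25, 0.25, 0.25], hs)
print(len(c), c[0])  # 128 2.5
c_peaked = weighted_sum([0.0, 1.0, 0.0, 0.0], hs)
print(c_peaked[0])   # 2.0
```

With uniform alphas the context is the plain average of the h vectors; with all weight on alpha two it is exactly h2.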

51:15 Next question, go ahead. Yeah, right. Yes. Yes. Oh, I see.

51:30 Here. So for the previous... you know, the previous

51:40 step. You know, you use state three for the, you know, for the

51:45 hidden layer. You know, so this state will be

51:49 the hidden output for the next step, right? Are you talking about the

51:57 encoder? Because here, we're going to be using all the time steps, I think.

52:08 Okay? Yeah. And then we use all the time steps because

52:13 we want to get rid of the problem that we had before, that the

52:17 C vector was compressing everything. We were forgetting some stuff, but

52:22 not here. Next, I mean the next step. Yeah.

52:26 Yeah, the word "class". You mean the step after "class".

52:31 Yeah. So will it be, you know, the hidden... you

52:36 know, the hidden layer, the previous layer's value, used for the next step? Um,

52:46 when it does the prediction? Yeah, it will be used.

52:50 The decoder does this: it uses the previous state for the next step, okay?

52:58 It takes the previous state to the next step. But this, you

53:05 know, we're using a context vector c that was

53:08 computed out of the encoder outputs, right? And this is what is used

53:13 to generate the final word, not the next state. I'm just saying this is the

53:20 output for this time step, right? And it might keep going, right?

53:24 It might generate more words until it gets to the end token. Going back to my

53:30 diagram here, right? There's an arrow before that q4.

53:36 Yeah, so the next time step's state would be...

53:42 Yeah, it would be the state of the RNN.

53:48 If it's an LSTM, it will be q4 and the cell

53:52 state, you know, the short and long memory together. Right.

53:55 These are the two things that are passed along in an LSTM, the long

54:00 and the short memory. Okay, so it will be like q4 for the

54:05 next step. Yeah, if it's a vanilla RNN, then it is

54:11 going to be q4, but if it's an LSTM, it's going to be

54:15 both q4 and c4, which is the cell state. That is

54:20 what I like to call the short memory and long

54:25 memory. Right, so these are the two that are going to be passed.

54:30 Yeah. And then the output at the current time step, the predicted word,

54:41 is going to be passed to the next step. Yeah, yeah. The word predicted

54:45 at this time step is going to be passed as an input to the

54:48 next time step. Okay. Cool. Thank you. All right. Yeah, any other questions?
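
The state passing discussed in this exchange can be sketched with a single LSTM step. This is a minimal sketch assuming a hidden size of 1 so every quantity is a plain float; the weights are hypothetical (all 0.5) and the inputs stand in for embeddings of previously predicted words. A vanilla RNN would carry only `h` from step to step; the LSTM carries both `h` (the query q_t used by attention) and the cell state `c`.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One step of an LSTM with hidden size 1; p maps weight names to floats."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate cell value
    c = f * c_prev + i * g   # cell state: the longer-term memory
    h = o * math.tanh(c)     # hidden state: the query q_t the attention uses
    return h, c

# Hypothetical weights, all 0.5, just to make the example run.
params = {name: 0.5 for name in
          ["wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg"]}

h, c = 0.0, 0.0                        # zero initialization of both states
states = []
for x in [1.0, 0.5, -0.5]:             # stand-ins for embeddings of predicted words
    h, c = lstm_step(x, h, c, params)  # both h and c are carried to the next step
    states.append((h, c))
```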

54:52 Because you guys are going to implement this. I have a question. Can

54:59 you go back to the previous slide? This one.

55:04 So basically, for the encoding part, the only thing that changes is

55:13 that we're now keeping these hidden representations for each of the input

55:18 tokens, and we're forgetting about the C vector, because the alpha and the context vector

55:27 only come in by including the attention mechanism, because we need to

55:33 compute a new context vector for every time step where we're making predictions. Is

55:38 that correct? Yeah, exactly. So I should have added a superscript

55:44 to this C vector because, in fact, we are generating this vector as

55:51 many times as the decoder is going to run, right? Let's

55:57 say the output length is 10, right? So we're going to be

56:02 generating 10 c vectors, one for every query that we get from the decoder at every time step.
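
The point that attention produces one context vector per decoder step can be made concrete with a loop over 10 queries. To keep the sketch short it swaps in a plain dot-product score instead of the concat score from the slides; any score function with the same signature could be passed in. The encoder outputs and decoder states are made-up values.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def contexts_for_all_steps(queries, hs, score=dot):
    """Return one context vector per decoder query: c_1, ..., c_T."""
    contexts = []
    for q in queries:                       # one pass per decoder time step
        alphas = softmax([score(q, h) for h in hs])
        dim = len(hs[0])
        contexts.append([sum(a * h[d] for a, h in zip(alphas, hs)) for d in range(dim)])
    return contexts

hs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]             # encoder outputs h1..h3
queries = [[float(t), -float(t)] for t in range(10)]  # 10 hypothetical decoder states
cs = contexts_for_all_steps(queries, hs)
print(len(cs))  # 10
```

The first query is all zeros, so every score is equal and its context is the plain average of h1..h3; the last query strongly prefers h1, so its context is close to h1.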

56:10 That's nice. Maybe a little confusing, because for me, the confusing part is

56:14 seeing the alphas in the encoder here. I know why they're there, because they're

56:21 weighting the hidden representations, but they're not used to generate the hidden representations. And

56:30 you only use them when you go to the decoder part,

56:34 right? So, yeah, I'm putting the weighted sum here just to, you

56:41 know, represent it. Remember that it's coming from the encoder, but these multiplications,

56:46 these computations, and the whole attention mechanism happen on the decoder side, right, because

56:52 we wanted it to work this way. Yeah. And it's the same,

57:05 again, as you've seen over and over with deep learning: you're training your

57:10 model to do something, and then you throw that away and use it for something else. So

57:14 you trained the encoder to generate a C that you're tossing away. And then, since

57:21 you have the hidden representations... when you say you're tossing it

57:27 away, uh, you're not using them for what they were trained to do,

57:37 but you use them for something else, you know, whatever is useful along the

57:40 way. You just take the useful part. Okay, so now about scoring

57:53 functions. As you can see, this is a little bit arbitrary, like,

57:57 you know, why do we have to concatenate? Remember this. I have

58:00 to fix this part. This is a plus; I should be adding a comma here.

58:05 So we concatenate these and we multiply with this matrix. And then we

58:08 apply tanh to that, you know, around it. And then we multiply

58:15 with this vector v to get a single score, right? I

58:19 mean, this is super arbitrary, right? You could have come up with

58:23 a different way to get a score out of that, and that's actually

58:29 what happens. I mean, people just decided, "we're going to try this," right? Instead

58:33 of something else, they tried it, and because it worked, they went for

58:39 it, right? But as you can see, many other papers will be providing a

58:43 new version of the scoring function. But essentially, you know, the

58:48 scoring functions are essentially the first ones; those are the core ideas.

58:54 And the one that started it is this paper, you know, "Neural Machine

59:01 Translation by Jointly Learning to Align and Translate." You know, there are

59:06 famous people here, like Yoshua Bengio; you may have heard about him. This

59:12 one, too. I don't know how to pronounce his name. This is

59:16 the main person behind it. So, yeah, the Bahdanau

59:21 attention is also known as additive attention, right? And as you can

59:26 see, it's a little bit different here. Here, we have two parameter matrices,

59:32 W_a and U_a, as well as v_a. Well,

59:38 what are we taking here? We're taking the previous state of the

59:43 decoder, not the current one, the previous one. In the description

59:48 we discussed before, we were taking the current state of the decoder, right,

59:53 as q4 for time step four. In this case, for the scoring function that

59:59 they proposed, they were taking the state at time step three in the

60:05 decoder, right? So instead of q4, it would be q3 for the example

60:09 we had before, and then this is the h state. And then the scoring

60:15 function is pretty much the same. And then they use softmax to normalize

60:21 these raw scores, which in this paper are denoted as e, and then alpha.

60:27 And then they use the alphas to do the weighted sum. And remember that we do this

60:33 weighted sum, the whole attention, for every time step in the

60:39 decoder. So that's why this is indexed by i. So for decoder

60:44 step one, we have, ah, vector c1, and so

60:49 on, up to whatever length we're generating with the decoder. Okay, is that clear so far?
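
The additive (Bahdanau-style) score described above, e_ij = v_a . tanh(W_a s_{i-1} + U_a h_j), can be sketched as below. The matrices are made-up toy values; they happen to be the left and right halves of the W used in a concat-score sketch, which shows that placing W_a and U_a side by side turns the additive form into exactly the concat form v . tanh(W [s; h]). The difference the lecture highlights is which decoder state goes in: here it is the previous state s_{i-1} (called `s3` for decoder step four).

```python
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def additive_score(s_prev, h, W_a, U_a, v_a):
    """Additive score e_ij = v_a . tanh(W_a s_{i-1} + U_a h_j)."""
    pre = [a + b for a, b in zip(matvec(W_a, s_prev), matvec(U_a, h))]
    return sum(v * math.tanh(p) for v, p in zip(v_a, pre))

def additive_context(s_prev, hs, W_a, U_a, v_a):
    """Context c_i for decoder step i, computed from the PREVIOUS decoder state s_{i-1}."""
    alphas = softmax([additive_score(s_prev, h, W_a, U_a, v_a) for h in hs])
    dim = len(hs[0])
    return [sum(a * h[d] for a, h in zip(alphas, hs)) for d in range(dim)], alphas

hs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]  # encoder states h1..h4
W_a = [[0.1, 0.2], [-0.2, 0.1], [0.05, -0.1]]          # 3 x 2, decoder side
U_a = [[0.3, 0.4], [0.0, 0.3], [0.2, 0.1]]             # 3 x 2, encoder side
v_a = [1.0, -1.0, 0.5]
s3 = [0.5, -0.5]                                        # previous decoder state, used for step 4
c4, alphas = additive_context(s3, hs, W_a, U_a, v_a)
```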

60:55 It's pretty much the same thing we discussed before. But yeah,

61:01 feel free to ask any questions about everything. So something cool that they

61:09 showed in the paper is the relationship with sequence length. We mentioned that this

61:15 is a challenge in the vanilla sequence-to-sequence models, right? The longer the

61:22 sequence, the harder it is for the encoder to compress the information into a single C

61:27 vector, right? Because we have a fixed number of dimensions that we can

61:34 fit into the C vector. Um, and then when we are

61:38 using attention, we don't have that problem anymore, because we can attend to any

61:44 part of the sentence we care about, right? And this is determined by the

61:50 query vector from the decoder that we're using to predict a word for

61:55 each step. So as you can see, look at the graph. This is beautiful because

62:00 it keeps up with the sequence length. It doesn't matter how long the sentence is; at 60

62:07 words it's still doing well, right? Whereas the other models are

62:11 dropping. And the visualizations that they provided

62:19 were pretty good, because, as you can see, for the first words in this

62:24 example between French and English, uh, pretty much all the

62:29 words attended to their corresponding word. But when it comes to this name,

62:36 "European Economic Area", the way it comes out in French is in the opposite

62:43 order. So, as you can see, it attends to different words,

62:48 right? It goes ahead and takes "Area" to translate "zone", and it

62:56 goes in the opposite direction, right, instead of along the diagonal. And then everything else

63:02 is pretty much the same. But this is pretty cool, right? Because it learned

63:06 how to align the words from different languages without being explicitly told "this should be aligned

63:12 to this one", right? I have a question. Yes,

63:19 I missed what the plot on the left shows. What is it? Ah,

63:28 I don't remember these two lines; I would need to check the paper. That's

63:34 not the point. The point of the plot was the problem they had previously

63:40 with translation, right? They were not able to keep up with the sequence length.

63:47 All right, and as you can see, we have an end

63:53 token as well here, right? So, yeah, as I said

63:58 before, we add that token at the end. All right. And then, I

64:04 mean, that's Bahdanau attention, which is also known as additive

64:09 attention. But we have other score functions. We have Luong's multiplicative attention.

64:14 The third one, the concat version, is what I used. This is what I

64:19 should have shown in the slide. I will correct it. Uh,

64:25 I used the concat version, right? And there are others, like, you

64:28 know, a simple dot product between the vectors, right? Or, you know,

64:34 the general one, which maps into another space before multiplying the vectors, right? And the

64:38 concat. So what is the... You know, why do people choose

64:43 to use dot instead of general, or general instead of concat? Do you guys have any

64:49 idea? Can you guys come up with some intuition out of that? Why dot?

64:55 I mean, the first two, because they're very similar, right? Why would you choose

65:00 one instead of the next one? Yes, go ahead.

65:09 I think what the dot product does is compute the, um, the

65:20 similarity, like the similarity between the vectors there. Exactly. It's the

65:26 similarity without being normalized, right? So basically we have this dot

65:33 product, which is what you just said: the similarity between the vectors, but

65:38 without normalizing. So it is not a bounded metric like the cosine

65:43 similarity. Uh, but remember that when we do that, essentially we can come

65:48 up with values along this axis, right? If they're opposite, we

65:52 get minus 100, assuming that they are of equal length. You can

65:58 get minus 100. If they're orthogonal, this is zero, and

66:04 at 100 they are, uh, similar. Again, this is without normalizing.
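
The minus 100 / zero / 100 example above works out exactly with two vectors of length 10. This small sketch (the vectors are chosen for illustration) contrasts the unnormalized dot product with cosine similarity, which divides by both lengths and so stays between -1 and 1.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [6.0, 8.0]                       # length 10
for b in ([6.0, 8.0], [-6.0, -8.0], [8.0, -6.0], [12.0, 16.0]):
    print(b, dot(a, b), round(cosine(a, b), 6))
# [6.0, 8.0] 100.0 1.0
# [-6.0, -8.0] -100.0 -1.0
# [8.0, -6.0] 0.0 0.0
# [12.0, 16.0] 200.0 1.0
```

The last row shows why the dot product is "not normalized": doubling the length of `b` doubles the score even though the direction, and therefore the cosine similarity, is unchanged.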

66:10 The point here is: OK, that makes sense,

66:15 right? Why don't we just use this scoring function and avoid having parameters at

66:21 all? Why would you add this? For instance, you cannot learn with it,

66:27 right? You need to learn to address different circumstances. You need

66:34 to learn the W. There's something very interesting in what you

66:39 said, because you said you need to adjust to specific circumstances, right?

66:46 Like the same words in different places. That's pretty neat. But why

66:51 are those specific circumstances something you are not learning by generating these h vectors?

66:58 To generate those, you have to learn something as well, right? Because

67:02 each one is the output of an RNN, which also has parameters. So what's behind

67:09 that? Maybe, for instance, a single word in one language may map to several words in another language, or

67:17 something like that. Like, for one word, we must have a

67:21 match with three words at the same time. And for that we use the

67:27 weights of those three words? You're starting to get on the

67:33 right path, but I'm going to sidetrack a little bit. You know, the important

67:40 thing here is that if you use the dot product, as I said before,

67:45 you are getting, ah, a similarity, right? But what if the

67:50 embedding spaces are different? Remember that in machine

67:57 translation, you might have a different vocabulary for your decoder as opposed to the encoder.

68:04 You have English words and then you have French words, right? So the

68:09 embedding space is not the same. If you are trying to get similarity with the dot

68:13 product, you're trying to get similarity on representations that live in different

68:18 spaces, right? But with the second one, the W adapts, well, adjusts those. Here

68:23 I like that phrasing, because it's going to say, "Okay, we're working with two

68:29 languages, right? And we want similarity between these, but in the same

68:34 space. But we know that they are not in the same space, because they belong

68:37 to different languages." So that's why people use the second one: it allows us to

68:43 handle different embedding spaces. And then for the concatenation one, this is pretty much

68:50 the same idea, right? It's very similar to the Bahdanau one.
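
The three scoring functions compared above (dot, general, concat, as named by Luong et al.) can be placed side by side. This is a sketch with made-up numbers; it shows that general with an identity matrix reduces to dot, and that general still works when the decoder state and encoder state have different sizes, a case where a plain dot product is undefined. Different sizes are only one, exaggerated, way embedding spaces can differ; the learned matrix `W_a` adapts between spaces even when the sizes match.

```python
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def score_dot(q, h):
    """dot: q . h; only defined when q and h have the same size."""
    if len(q) != len(h):
        raise ValueError("dot score needs vectors of the same size")
    return sum(qi * hi for qi, hi in zip(q, h))

def score_general(q, h, W_a):
    """general: q . (W_a h); W_a is len(q) x len(h), so q and h may live in different spaces."""
    return sum(qi * x for qi, x in zip(q, matvec(W_a, h)))

def score_concat(q, h, W, v):
    """concat: v . tanh(W [q; h])."""
    return sum(vi * math.tanh(z) for vi, z in zip(v, matvec(W, q + h)))

q = [1.0, 2.0]
h = [3.0, 4.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(score_dot(q, h), score_general(q, h, identity))  # 11.0 11.0

# Decoder state of size 3 vs encoder state of size 2: dot fails, general still works.
q3 = [1.0, 0.0, -1.0]
W_a = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]   # 3 x 2, learned in practice
print(score_general(q3, h, W_a))              # -5.5
```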

68:54 So what are the crucial differences there? Can someone tell me? Go

69:03 ahead. One uses the time step before? Yeah, essentially that. So,

69:09 as I said at the very beginning, people just tried their ideas. And

69:13 if it works, they publish them. You see, it's a little bit arbitrary

69:18 to say, "OK, I think the previous state," and then, "no, the current state." Okay,

69:22 "I think the current one." It's not that there's, like, you know, uh,

69:27 a single question and a single answer for this, right? People just formulate experiments, and

69:33 if it works, they publish it, whatever. So, yeah, you

69:39 can see these scoring functions in this paper, "Effective Approaches to Attention-based Neural

69:45 Machine Translation." And you can see it's a year later, after Bahdanau

69:48 attention, which was in 2014, and it's, you know, very popular.

69:56 So you can rely on their approaches. All right, on

70:03 top of that, with these scoring functions, something that is very nice is that they were

70:08 able to improve where machine translation models were making mistakes, basically named

70:16 entities, right? So names should be kept the same; they should be

70:22 the same in either language. So in this case, you have

70:28 the source. Here's the source. Here's the reference, like the gold data, I

70:34 mean the labels, the actual translations into German. And then this is their version,

70:40 and they got the right entity. But the other baselines, they

70:47 were not able to produce the right entity,

70:52 right? So that's pretty cool, because it's able to "think": okay, this is

70:56 something that shouldn't be translated. All right.

71:03 And, you know, there are successful applications of these seq2seq models in the sense of

71:08 encoder-decoders. As I said, we have the same idea here: we

71:14 extract, um, features out of images, like, you know,

71:17 regions and so on, with a convolutional neural

71:22 network. So this essentially will be the encoder, right? The one that extracts

71:27 features from the image. And then they pass these to an LSTM, and

71:33 the LSTM, with the attention mechanism, is able to say, "Okay, I'm going to focus

71:37 on these areas of the image to generate 'a bird'," right? And then "flying":

71:44 you know, attending a little bit more to the background, and

71:48 the wings, and then "over", getting a little bit of,

71:53 of the water in the background, and generating "a body of water". That's, in

72:00 my opinion, amazing, because you can even combine modalities

72:07 here, meaning from an image you're generating text, right? So it's again like

72:12 I said: it's like the semantics of your thoughts, what you're representing

72:20 between the encoder and the decoder. The encoder is just capturing whatever

72:26 kind of features you're receiving as input, and then your brain is

72:30 this latent space, and then with language, uh, you

72:36 use the skills that you have, you generate some other output in the

72:40 other modality. So you get something from vision and something from language, right? So

72:44 that's very cool, in my opinion. I actually suggest reading this paper,

72:48 people. It's very popular; I guess they've gotten a lot of citations

72:55 since. And then we have... oh, by the way,

73:00 how are we doing on time? Okay. I'm

73:11 almost done. Yeah, two more. This one and one more. That's it. It's

73:16 another kind of attention mechanism that we will cover later. So,

73:23 as I said before, there are many attention mechanisms, but one that has

73:27 become very trendy lately is self-attention from the Transformer architecture.

73:34 The paper is "Attention Is All You Need", and there's a bunch of people behind

73:39 that work. Um, so, you know, we have a problem with

73:45 seq2seq models, which is that we're generating one time step at a time, sequentially. So,

73:50 you know, remember the encoder: it passes the hidden state to the next time

73:54 step, and so on, right? So even though you can parallelize a lot

74:00 of, you know, computations based on matrix multiplications on GPUs, and

74:05 so on, you still cannot parallelize across time steps, right,

74:10 because the next time step depends on the output of the previous time step.

74:15 So that's quite a bottleneck. With these Transformer architectures, they enhance

74:21 parallelization. So they get faster and more effective training, because they are able

74:27 to process the data faster, and they can process longer

74:31 sequences than before. Uh, and, you know, in this sense, they call this

74:36 self-attention. The idea is that they compute dot products between every word across

74:43 the sentence. So across the sentence. So basically, you have, you

74:48 know, in our example, the words that were in the initial example of

74:53 the translation between English and Spanish: the word "class" will have a probability

74:59 distribution over the entire sentence, including its own word. So I mean,

75:06 the word "class" will have a probability distribution over all the words. We

75:11 have that because not all the words are related... like, a single word is

75:19 not related to all the other words equally. It changes because of linguistic properties,

75:25 whatever role they have in the sentence. So this is the idea of attention,

75:30 self-attention from the "Attention Is All You Need" paper, and, you know, a little

75:35 bit of the self-attention: this is the scaled dot-product attention formulation. Essentially, we

75:43 covered that we have a query and a context and, you know, through a scoring

75:48 function, basically, we determine which are the most important parts in the context,

75:53 right? We already learned this part for seq2seq with

75:58 attention. But in this case, what they do is have

76:03 parameters, you know, matrices for the queries, for the

76:09 keys, and for the values, right? And basically, since all

76:16 of them are going to be matrices computed from the input, they will be, you know,

76:21 attending to every word in the sentence. As I said, there's a distribution

76:24 at the end. And essentially, we have this matrix with probability distributions

76:29 at the row level. You know, each word has a probability distribution, because it's saying

76:34 this word is attending to these other words; this other word is attending to all

76:39 of them, and so on, right? Each row has a probability distribution. So

76:44 once you have that, you use these as your probability weights,

76:48 and then you weight the value matrix, which is what's happening here.

76:53 So here you have the softmax that we saw in the

76:57 previous slides, to normalize the probabilities. And then they weight the value vectors.
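
The scaled dot-product attention just described, softmax(QK^T / sqrt(d_k)) V with Q, K and V all projected from the same input, can be sketched in pure Python. This is a single-head sketch with no masking; the word embeddings `X` and the projection matrices `W_Q`, `W_K`, `W_V` are hypothetical values that would be learned in a real Transformer.

```python
import math

def matmul(A, B):
    """(n x k) times (k x m) for matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def softmax(row):
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with the softmax applied row by row."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))  # n x n: every word scored against every word
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights

def self_attention(X, W_Q, W_K, W_V):
    """Queries, keys, and values are all projections of the same input X."""
    return scaled_dot_product_attention(matmul(X, W_Q), matmul(X, W_K), matmul(X, W_V))

X = [[1.0, 0.0, 1.0],    # hypothetical embedding of word 1
     [0.0, 2.0, 0.0],    # word 2
     [1.0, 1.0, 1.0]]    # word 3
W_Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 x 2 projections, learned in practice
W_K = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.5]]
W_V = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
out, weights = self_attention(X, W_Q, W_K, W_V)
```

Each row of `weights` is the probability distribution of one word over all words in the sentence, including itself, and each row of `out` is the corresponding weighted mix of value vectors.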

77:04 And yeah, I'll leave it there, because anyway, we will

77:09 cover this topic in depth. So, yeah, that's pretty much what I

77:14 have here. Here are the references if you want to read more about these papers. You

77:19 can check them out over here, and that's it. That's all I have.

77:26 Thank you. That was a wonderful introduction to sequence-to-sequence models. Very

77:33 well explained. Um, any other... Oh, I should... Okay,

77:41 stop sharing. Thank you. Okay. Thank you

78:06 very much. All right, all right. Thank you. We'll see you Monday,

78:15 next Monday. There is going to be a hands-on session.

78:24 So make sure you're here and you show up, because that's, uh, basically

78:29 what we will use for the assignment, right? It's meant to prepare you for the assignment,

78:33 the last assignment. So make sure that you attend.

78:38 You can read the papers, but make sure you're here on Monday for the

78:42 session where he's going to present. Thank you. Yes,

79:03 ... Just stop the recording; you can drop

79:08 from the meeting. It's just that I had to stop the recording.

79:14 I mean, one day, like, it's good to start, but stopping

79:19 the recording like that... Okay, so it will stop by itself.
