© Distribution of this video is restricted by its owner
00:02 | So, uh, [unclear] will take it from now on. |
00:09 | Are you guys here? Can you hear me? Okay, uh, let me |
00:19 | just share the screen. All right, can you guys see my screen? |
00:43 | Yes, yes. Okay, I, uh, I think now that I'm |
00:50 | sharing the screen, I'm not gonna be able to see the chat. |
00:58 | Yeah, so feel free to unmute yourself if you |
01:03 | have any questions. I'm happy to help with that, but I'm not gonna be |
01:07 | able to see the chat. So unmute yourself if you want to say something. |
01:12 | All right, so let me just move into the presentation. All right, |
01:19 | uh, for those who don't know who I am (I don't remember if we've met), |
01:25 | my name is [unclear], from the UH |
01:29 | computer science department, working with Solorio. I'm in my fourth year, and I |
01:34 | do research on NLP. Obviously, I'm gonna talk about sequence-to-sequence |
01:40 | models today. There are many applications of these models, |
01:45 | and you'll see how powerful they actually are. Um, so |
01:50 | let's get started. So basically, first, we're gonna have a quick |
01:55 | review of the things that you have already covered in the course, or at least |
01:59 | the things I know you have covered. Then we're gonna motivate why we need |
02:04 | an encoder-decoder framework and the models that are enabled by it. |
02:10 | Then we're gonna see some mechanisms that improve over those encoder-decoder |
02:16 | frameworks, which are basically attention mechanisms. There are many formulations of |
02:22 | attention, so we're going to cover a few of them. You know, pretty |
02:27 | much every day there's a new paper on attention, so |
02:31 | you will find many versions of attention and many improvements, and so |
02:35 | on, right? So it keeps moving. We're gonna go for the |
02:38 | basic attention, like the first contributions on attention, |
02:45 | the ones that, it seems, everyone knows. And then we're |
02:51 | going to see cool things that people apply these models to, |
02:56 | like image captioning, for example, where, given an |
03:01 | image, the model is able to convert the image features into a caption like, |
03:06 | you know, "a bird is flying over the sky" or something like that. So |
03:11 | that's cool, in my opinion. But we will see that there are not only |
03:14 | those kinds of applications, but those are cool. Basically, once you master |
03:19 | the encoder-decoder principle, then, you know, just like I said, |
03:25 | there are other attention methods, and there's one that is very |
03:30 | popular, self-attention, where we'll, you know, just have |
03:35 | a brief overview of it, and that will be the next step before you |
03:40 | see the next lecture. Well, not the next lecture, but next week, |
03:44 | I guess, because in the next lecture we have a coding session. And then |
03:50 | we're gonna have a part for questions. But you |
03:55 | can ask at any point, as I said before. Okay. So, well, |
04:02 | So, as I understand it, you know document classification, sequence labeling, |
04:07 | and language modeling, right? In document classification, what you're trying to |
04:12 | do is to say, okay, given a sentence like "a very good movie," |
04:16 | or something like the one in the picture, you're trying to say, okay, this sentence |
04:20 | is a positive review, right? In the case of sentiment analysis, basically, what |
04:26 | you're modeling is the parameterized probability of a single label y, |
04:32 | which is, in this case, "positive," given x1 up to |
04:36 | xn, right? So basically, given a sequence of tokens, essentially you're |
04:40 | mapping, you know, a sequence of tokens to a single label, |
04:46 | right? So that's document classification. |
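A minimal sketch of the n-to-1 mapping described above, p(y | x1, ..., xn): the whole sequence gets one label distribution. The vocabulary, the 2-d word vectors, the weight matrix, and the choice of mean pooling are all hypothetical toy values, not the model from the lecture.

```python
import math

# Hypothetical toy word vectors (2-d); a real model would learn these.
EMBED = {"a": [0.0, 0.0], "very": [0.5, 0.0], "good": [2.0, -1.0],
         "bad": [-2.0, 1.0], "movie": [0.0, 0.0]}
LABELS = ["positive", "negative"]
# Hypothetical output weights: one row per label.
W = [[1.0, -1.0],
     [-1.0, 1.0]]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(tokens):
    """Map a whole token sequence to ONE distribution p(y | x_1..x_n)."""
    vecs = [EMBED.get(t, [0.0, 0.0]) for t in tokens]
    # Collapse the n token vectors into a single sequence vector (mean pooling).
    avg = [sum(v[i] for v in vecs) / len(vecs) for i in range(2)]
    scores = [sum(w * a for w, a in zip(row, avg)) for row in W]
    return dict(zip(LABELS, softmax(scores)))

probs = classify("a very good movie".split())
print(max(probs, key=probs.get))  # positive
```

However long the input is, the output is a single distribution over the label set.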
04:52 | You also know sequence labeling: you have been applying it to part-of-speech tagging, and |
04:57 | to named entity recognition, for sure. One of your homeworks or assignments |
05:01 | was, uh, based on that. So basically, what you're modeling here is, |
05:07 | okay, we have the parameterized probability. When, uh, this |
05:12 | theta, or phi, or whatever letter appears under the probability, it's because it |
05:21 | depends on parameters, just to clarify. So here we're modeling |
05:27 | the sequence y1, y2, up to yn, given the inputs x1 |
05:34 | up to xn, right? Something to note here is that n |
05:39 | is the same in both the output sequence and the input sequence, right? Because essentially |
05:44 | it's mapping every word in the input to one label in the output. So |
05:51 | they have the same length. That's important to remember. |
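By contrast, sequence labeling emits exactly one label per input token, so n labels come out for n tokens. In this sketch the tag table is a hypothetical stand-in for a trained tagger, and scoring each position independently is a simplifying assumption (real taggers, such as BiLSTM or CRF models, condition each label on its context).

```python
# Hypothetical per-token tag distributions, standing in for a trained tagger.
TAG_PROBS = {
    "Alice": {"PER": 0.9, "O": 0.1},
    "visited": {"O": 0.95, "LOC": 0.05},
    "Houston": {"LOC": 0.85, "O": 0.15},
    "yesterday": {"O": 0.99, "PER": 0.01},
}
DEFAULT = {"O": 1.0}  # unseen words get the "outside" tag

def tag(tokens):
    """Map n input tokens to exactly n labels, one label per position."""
    labels, seq_prob = [], 1.0
    for tok in tokens:
        dist = TAG_PROBS.get(tok, DEFAULT)
        best = max(dist, key=dist.get)
        labels.append(best)
        # Simplifying assumption: positions are scored independently, so the
        # sequence probability is the product of the per-token probabilities.
        seq_prob *= dist[best]
    return labels, seq_prob

sentence = "Alice visited Houston yesterday".split()
labels, seq_prob = tag(sentence)
print(labels)  # ['PER', 'O', 'LOC', 'O']
```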
05:57 | You know these already, right? So, then, language |
06:01 | modeling, right? With language modeling, I think you guys went over, in |
06:05 | the previous session, ELMo, which is RNN-based. Um, so essentially here |
06:12 | you're trying to say, okay, given where we are, we want to know what is the |
06:15 | probability of the next word given the previous words. So in this case, |
06:21 | in the graph, what you can say is: what is the probability of "movie" given the |
06:24 | words "a very good", right? So essentially, that's what we're modeling here. |
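To make p(next word | previous words) concrete, here is a count-based estimate on a tiny made-up corpus. It only illustrates the quantity being modeled; the neural language models discussed in the course learn this distribution with a network instead of raw counts.

```python
from collections import Counter

# A tiny made-up corpus; real language models are trained on far more text.
corpus = [
    "a very good movie",
    "a very good book",
    "a very good movie indeed",
    "a very bad movie",
]

def next_word_prob(context, word, sentences):
    """Estimate p(word | context) by counting what follows the exact context."""
    ctx = context.split()
    k = len(ctx)
    followers = Counter()
    for s in sentences:
        toks = s.split()
        for i in range(len(toks) - k):
            if toks[i:i + k] == ctx:
                followers[toks[i + k]] += 1
    total = sum(followers.values())
    return followers[word] / total if total else 0.0

print(next_word_prob("a very good", "movie", corpus))  # 2 of 3 continuations
```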
06:30 | And you have seen applications for all of these, right? But when it comes to |
06:35 | other cases, we cannot model them with the ideas that we have covered so |
06:40 | far, right? For example, if the input length is different. So, to |
06:44 | recap, right: as I said, in sequence labeling, n is the same for |
06:49 | inputs and outputs, right? So you can simply, you know, provide |
06:54 | an output label for every input token you have, and that's very |
06:59 | simple. But when you have an input sentence that you are, |
07:05 | you know, mapping to another sentence whose length is not the same, |
07:10 | then you have a problem, right? You know, you don't have alignments. |
07:14 | That's the second point: inputs and outputs are not |
07:18 | aligned, right? Uh, and then the problem is that you, you |
07:22 | know, simply don't know the length; you don't know the length of the |
07:27 | output, and that's a problem. And use cases for these are, |
07:32 | you know, translating between languages. If you have, for example, a phrase |
07:36 | in English and you want to translate this phrase into Spanish, it might not have |
07:42 | the same number of words, or the order of the words might not be the same |
07:47 | as in the input sentence, right, depending on the expression of the phrase. |
07:52 | So some words might be altered, the order might be shifted, or something |
07:58 | like that, right? So you have to find an appropriate alignment for those. And |
08:02 | you have other cases, like summarization, for instance. You have |
08:06 | an entire book and you want to come up with a summary of this book, |
08:11 | or, you know, an article, and you get a small passage that describes the important |
08:17 | points of the article. Or you can even have another case in which you're |
08:23 | talking with a bot. So whatever question you ask, it just picks an answer. |
08:28 | And not only questions and answers, but also casual conversation. Like, um, I don't |
08:33 | know, you say "hey there," and the bot replies in some |
08:38 | way, not even answering a question. |
08:43 | Uh, [unclear]. So none of those can be, |
08:48 | you know, covered by the scenarios that we have discussed, |
08:53 | like document classification, sequence labeling, or language modeling, right? So |
08:58 | we have to come up with something, right? And this is where |
09:04 | sequence-to-sequence models come into place. And the idea is very, very |
09:08 | simple, right? So, since we don't know, uh, the exact alignment |
09:13 | between the input and the output at the word level, we don't |
09:18 | do a direct mapping. What we do is to project whatever the input is |
09:24 | into a latent space. And this is just like: instead of using the explicit |
09:32 | features, we use a latent space. What is that, right? Like, you can think of |
09:36 | the latent vector z as just the unobserved variables that were |
09:43 | extracted from the input, right, representing the important aspects of |
09:51 | it. And then this is an intermediate representation that compresses all the information |
09:58 | that you're going to need to go between the input and the output. |
10:03 | So basically, instead of going directly from the input to the output, what you're |
10:08 | going to do is to say, I'm gonna go from the input to the |
10:12 | latent space vector z, and then from z I'm gonna go |
10:19 | into the output, right? So we have an intermediate step, so |
10:22 | you have two major components here. The first is the encoder, the one on the |
10:26 | left, and basically the encoder's role is to model this probability over |
10:31 | z. So basically, the probability of, uh, sorry, the probability |
10:36 | of this latent vector z given the input x1 up to |
10:43 | xn, right? Essentially, that's everything that is here in the encoder |
10:49 | part. And then there's the role of the decoder, which is a different model. |
10:54 | Note that it has a different letter, right? So this one is theta, and |
10:58 | this one is phi. I think it's phi. Anyway, the point |
11:04 | is that for the decoder, what you're trying to model is, okay, we're going to |
11:08 | model the probability of y1 up to ym. Note that this is m, |
11:14 | not n, right? Because m and n are different, or can be different. I |
11:20 | mean, they can be the same as well, but, you know, it's not tied to |
11:23 | n. Given that we have seen, or we are given, you know, |
11:28 | the latent space z. So essentially you have an intermediate step here, right, |
11:35 | and this is what allows you to map between inputs and outputs of different lengths. |
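Putting the two components into symbols (a sketch: θ for the encoder and φ for the decoder follow the letters mentioned above, and factoring the decoder left to right, one output token at a time, is the usual choice for RNN decoders rather than something stated on this slide):

$$
p_\theta(z \mid x_1, \dots, x_n), \qquad
p_\phi(y_1, \dots, y_m \mid z) = \prod_{t=1}^{m} p_\phi(y_t \mid y_1, \dots, y_{t-1}, z)
$$

Nothing in the decoder's product depends on n, which is what lets m differ from n.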
11:40 | Well, I have to say, you know, this was proposed in "Sequence |
11:45 | to Sequence Learning with Neural Networks". You can take a look at the |
11:49 | paper for more details. So, piecing these together, |
11:53 | we're gonna cover more details. So far I've only described in general terms |
12:00 | what the encoder and decoder are. Do you have any suggestions? |
12:05 | You can unmute yourself if you want to say something. Any suggestions for |
12:10 | the encoders and decoders? I mean, we have probabilities where we're trying to say the |
12:15 | probability of this given that, right: z given x, and y given z, depending |
12:21 | on whatever we're trying to do there. So, any suggestions here? |
12:29 | Hi, I have a question. So, the encoder, right? |
12:35 | I'm guessing you're training it in such a way that, um, whatever it outputs, |
12:44 | if you give it to some other decoder, right? Not this one, not |
12:47 | the one you have. If you give it to some other decoder, would it |
12:51 | give back the same input you put into the encoder? Do you get my question? |
13:00 | Like, just replicating the input? Yeah, yeah. What I know |
13:06 | about encoders and decoders is, you know, you train them so that the encoder and |
13:12 | decoder... um, you know, for example, if there's some noise in the |
13:19 | input, um, then whatever the encoder encodes, somehow the decoder |
13:25 | um, removes it. You know, you feed it to the encoder, it can |
13:29 | deal with the noise, and then the decoder could get the input without |
13:36 | the noise. You know, [unclear]. Yeah. So basically you're referring |
13:44 | to, um, autoencoders. That's what they're called, and the idea of |
13:49 | autoencoders is that you have an input sequence, right? You're compressing it |
13:54 | with an encoder, making it a latent variable z as well, but compressed, |
13:59 | and then the role of the decoder is to decompress and |
14:04 | reconstruct the original input that the encoder compressed, based on this z, |
14:10 | right? So, essentially, you could say, okay, what's |
14:15 | the point of giving an input and generating the same thing, right? But there's a very |
14:21 | important aspect of this setup, which is that you're basically representing in a single, |
14:27 | compressed vector whatever the input is. And then out of this vector, |
14:32 | you're trying to decompress the information, right? So this is like a compression technique, |
14:39 | or extracting the actual features that enable the decoder to decompress it. So |
14:45 | that's how autoencoders work. And there are other, you know, autoencoders: |
14:50 | you can even have variational ones, where you have some, ah, stochastic |
14:58 | behavior, we can say, right? For example, with a Gaussian, |
15:04 | the reason is that you not only reconstruct, but you |
15:09 | can have some variation in the resulting output, so you can control the |
15:14 | generation a little bit. Some people use autoencoders for |
15:19 | that. Or some people just, you know, train these autoencoders to |
15:25 | come up with very good encoders that can represent, in a very |
15:30 | compact vector z, the input sentence. And then they just drop the decoder. |
15:35 | They don't need the decoder; they just keep the encoder. And now they can, |
15:39 | you know, compress information, uh-huh, to do feature extraction right out of |
15:45 | the input, if at the end they want to represent the input with some |
15:49 | features, right? And people do this encoding step so that they can |
15:54 | make these features very, you know, very polished. Yeah, that's a |
15:59 | different topic, although it's very related, because we have the same framework, the |
16:04 | encoder and decoder, and essentially they reproduce the same output sequence. |
16:08 | Valid question. But ideally, you train these two models together, right, |
16:15 | otherwise we don't know how to correct the output that the encoder is generating, |
16:21 | that z, right? We have to have a way to correct it. |
16:26 | So it has to be trained jointly with the decoder. All right, so |
16:31 | it seems that, you know, you are getting this end-to-end probability; the |
16:35 | model in the decoder will be, you know, trained together with the encoder, |
16:42 | right? Because again, we have to backpropagate through time. |
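To make the autoencoder idea concrete, here is about the smallest version possible, assuming purely linear maps: an encoder that compresses 2-d inputs to a 1-d code z and a decoder that maps z back to 2-d, trained jointly by gradient descent on reconstruction error. The data, sizes, initial weights, and learning rate are invented for the example. Note that the encoder's gradient has to flow through the decoder, which is why the two halves are trained together.

```python
# Toy 2-d inputs that all lie on the line x2 = 2 * x1, so a 1-d latent
# code z is enough to reconstruct them exactly.
data = [(t, 2 * t) for t in (-1.0, -0.5, 0.5, 1.0)]

enc = [0.1, 0.1]  # encoder weights: z = enc . x      (compress 2-d -> 1-d)
dec = [0.1, 0.1]  # decoder weights: x_hat = dec * z  (decompress 1-d -> 2-d)
lr = 0.02

def recon_loss(enc, dec):
    """Mean squared reconstruction error of decoder(encoder(x))."""
    total = 0.0
    for x in data:
        z = enc[0] * x[0] + enc[1] * x[1]
        total += sum((dec[i] * z - x[i]) ** 2 for i in range(2))
    return total / len(data)

initial = recon_loss(enc, dec)
for step in range(2000):
    g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
    for x in data:
        z = enc[0] * x[0] + enc[1] * x[1]
        r = [dec[i] * z - x[i] for i in range(2)]         # reconstruction residual
        r_dot_dec = r[0] * dec[0] + r[1] * dec[1]
        for i in range(2):
            g_dec[i] += 2 * r[i] * z / len(data)          # dL/d(dec)
            g_enc[i] += 2 * r_dot_dec * x[i] / len(data)  # dL/d(enc), via the decoder
    # Joint update: encoder and decoder move on the same loss together.
    for i in range(2):
        enc[i] -= lr * g_enc[i]
        dec[i] -= lr * g_dec[i]

final = recon_loss(enc, dec)
print(f"reconstruction loss: {initial:.3f} -> {final:.6f}")
```

After training, keeping only `enc` gives the "drop the decoder, keep the encoder" feature extractor mentioned above.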
16:46 | But going back to my original question, which was: what do you guys think the encoder and |
16:51 | decoder can be? It's just whatever you want. Basically, if you are |
16:58 | using images as inputs, you can have CNNs, or just, like, a multilayer perceptron, |
17:03 | or, if you, you know, have some recurrent architectures that satisfy |
17:09 | [unclear]. But anyway, you can put |
17:14 | whatever you want in the encoder, and the same with the decoder: you can put whatever you want. |
17:17 | That's why the presentation of the seq2seq model is just a simple |
17:22 | diagram saying encoder and decoder, because it can be whatever you |
17:26 | want. Okay. All right, let's move to the next one. Oh, by the |
17:35 | way, um, it's kind of hard for me to switch to the chat, |
17:40 | so let me see. So if you guys can unmute yourselves, I'd |
17:48 | appreciate that. Were there any other benefits of these models |
17:53 | that you can figure out, for example? Yeah, aside from what I was |
18:00 | saying, we have many examples of seq2seq, right? I was actually showing |
18:06 | them here. You know, we can use it for translation. |
18:12 | Or there's an example that I'm gonna show later, one of the successful applications, |
18:18 | in which, actually, the input is not even text, it's images; it's a |
18:23 | single image. And based on that, you're trying to generate a caption for |
18:30 | that image, like describing what's inside. And that's really cool, because |
18:36 | it's even a different modality, right? A different source of information, mapping images |
18:41 | to text. So the applications are endless; it's up to your imagination where you |
18:48 | want to apply it, right? So it's not only about aligning, or just |
18:56 | mapping something in the input to the output, but also, you know, whatever it is |
19:00 | you want to come up with. It's a super general framework and powerful |
19:06 | at the same time. So hopefully that answers the question. But I would |
19:13 | say that it does. Also, the model can be used to do interpretation. I'm |
19:19 | thinking, right, it seems, especially for translation, that z is the meaning |
19:28 | that's behind whatever the words are. Right, exactly. Exactly. And that's why people |
19:34 | call it latent space, because this z is like the semantic representation you have |
19:40 | in your mind, like the concept that you then project onto a language, |
19:46 | because, at the end, it doesn't matter how many languages you |
19:50 | know, you think of it in the same way, right? Like the same |
19:55 | thing you want to express: you have it in your mind, and then you put it |
19:58 | into a language, right? So it's doing that. And that's why people |
20:03 | call it latent space. Um, all right, so I'll keep going. Ah, |
20:13 | okay. This is like a summary of that part, right? |
20:17 | Now, if we look at these models more closely, we have an example here. |
20:22 | I think we're gonna walk through the model so that you guys have |
20:26 | a better idea of how this works. So, as I told you, because the encoder can |
20:30 | be whatever model architecture you want, and the decoder too, it |
20:36 | can be whatever you guys want. And in this case, since we're talking about |
20:41 | text, what is the natural... the natural thing to use is an RNN, |
20:45 | right? So we're gonna assume that this is an RNN, right, |
20:49 | although you'll see in future lectures that RNNs are not the default |
20:56 | right now, at least not the one used at this point. But that's |
21:01 | another lecture. Anyway, let's assume that. So, okay, the colors are |
21:08 | still reflecting what we had in the previous slide: encoder and |
21:12 | decoder. So keep that in mind. So basically, you have the encoder on the |
21:16 | left, right, which is this entire set of blocks. So, as I said |
21:23 | before, we're trying to model the probability of this vector c, given |
21:27 | the input x1, x2, up to xn, right? Okay. |
21:32 | So, essentially, what we're doing is processing it with an RNN. We'll say we |
21:36 | use an RNN, right? We process the sequence and generate the hidden state, |
21:41 | the hidden vector c, right? So essentially, that's how an RNN works: it |
21:47 | has a hidden state that is passed through the steps, right? You |
21:52 | should remember this from the previous sessions and from language modeling, |
21:58 | right? So I think you did some homework on that, on |
22:03 | homework two, right? [unclear]. |
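A sketch of the RNN encoder just described, assuming a vanilla (Elman) RNN with tanh and random, untrained weights; the sizes and inputs are hypothetical. It returns the last hidden state as c. Note that c has the same size whether the input has 3 steps or 300.

```python
import math
import random

random.seed(0)
EMB, HID = 3, 4  # hypothetical input-embedding and hidden-state sizes

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

W_xh = rand_matrix(HID, EMB)  # input-to-hidden weights (untrained, random)
W_hh = rand_matrix(HID, HID)  # hidden-to-hidden (recurrent) weights
b_h = [0.0] * HID

def rnn_encode(inputs):
    """Vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b); returns c = h_n."""
    h = [0.0] * HID
    for x in inputs:
        pre = [a + b + bias for a, b, bias in zip(matvec(W_xh, x), matvec(W_hh, h), b_h)]
        h = [math.tanh(p) for p in pre]
    return h

short_seq = [[random.uniform(-1, 1) for _ in range(EMB)] for _ in range(3)]
long_seq = [[random.uniform(-1, 1) for _ in range(EMB)] for _ in range(300)]
c_short, c_long = rnn_encode(short_seq), rnn_encode(long_seq)
print(len(c_short), len(c_long))  # 4 4
```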
22:07 | Okay, once you have compressed this information into c, |
22:14 | I mean, the input information, into c, now we start decoding, |
22:19 | which is like decompressing it, right? So now what we do |
22:24 | is to take these two things: the c and the start token. I want |
22:30 | to... okay, let's hold on and focus on the start token. So essentially, |
22:34 | the way the decoder knows when it has to start producing tokens is that it |
22:41 | gets a start token. And then, how does the model... how do we know when the |
22:45 | model is done producing these tokens? With the end token, as we'll see |
22:51 | later. So the first inputs to the decoder are the hidden state, because the information |
22:56 | has been compressed into c, and then we tell the decoder, |
23:01 | okay, start now, right? That is the start token. It is the |
23:05 | only token that we know we can provide to the decoder, because we don't |
23:10 | know the translation or anything about it, right? So the model says, |
23:15 | uh, says: I'm taking the c vector and the start token, and I |
23:20 | think it's gonna be this word. That's the first word we generate, and then the |
23:26 | model says, okay, now that I have one word, I use that as an input |
23:31 | token for the next time step. And then, out of this, |
23:36 | it's gonna produce the next output, well, out of this and the previous states, right? Remember |
23:41 | that we keep this, uh, recurrent state, right? And then we produce |
23:47 | the next word, and so on, until we reach the end token. |
23:52 | So essentially, again, we're modeling the probability of every output token, given |
24:00 | the hidden state that we passed, basically, as well as the hidden state that |
24:07 | we have been updating, right, because it's being maintained at each time step, |
24:12 | and then we just feed in the previous output. So essentially, based |
24:18 | on the previous generations, we keep generating, right? |
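The decoding loop described above, written as greedy decoding: start from the start token, repeatedly pick the most likely next token, feed it back in, and stop at the end token (with a length cap as a safety net). The toy `decoder_step` table is hypothetical, and unlike a real RNN decoder it ignores c and carries no hidden state; it only shows the control flow. Picking the single most likely token at each step is one choice; beam search is a common alternative.

```python
START, END = "<start>", "<end>"

# Hypothetical stand-in for a trained decoder: next-token probabilities
# given only the previous token.
TOY_NEXT = {
    START: {"la": 0.7, "el": 0.3},
    "la": {"casa": 0.9, END: 0.1},
    "casa": {"es": 0.8, END: 0.2},
    "es": {"azul": 0.6, "roja": 0.4},
    "azul": {END: 0.95, "y": 0.05},
}

def decoder_step(c, prev_token):
    """Return p(next | prev_token). A real RNN decoder would also use c and its hidden state."""
    return TOY_NEXT.get(prev_token, {END: 1.0})

def greedy_decode(c, max_len=20):
    """Feed <start>, then keep feeding back the most likely token until <end>."""
    output, prev = [], START
    for _ in range(max_len):              # cap in case <end> is never produced
        probs = decoder_step(c, prev)
        prev = max(probs, key=probs.get)  # greedy: take the single best token
        if prev == END:
            break
        output.append(prev)
    return output

print(greedy_decode(c=None))  # ['la', 'casa', 'es', 'azul']
```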
24:23 | All right. So far so good? Everybody got it? Yes? Perfect. Yeah. Thank |
24:31 | you. Yeah, so it stops at the end token. And how does it figure out |
24:46 | when to produce it? Yeah, that's it. Okay, so the thing is that whenever |
24:55 | you are training these models, you have gold data, right? Well, it's |
24:59 | not like that in all cases, but that's the general idea. Listen: |
25:03 | you have gold data. That means you have a sentence in one language, |
25:08 | for the case of machine translation, that is mapped to another sentence in the other |
25:12 | language, right? So when you're processing the data, what you |
25:16 | do is that you add these start and end tokens to your target sentence, because you |
25:24 | need them to teach the model that it has to produce an end token. |
25:28 | So the end token has to be produced by the model, right? |
25:32 | And the start token is what we prepend to, you know, the original output, |
25:38 | right, which is, like, just the trigger for the decoder model, essentially. |
25:44 | And then the decoder is supposed to produce an end token, but |
25:50 | you don't have to do anything special: the end token is provided just by |
25:55 | the training data creation, right? It's introduced every time. |
25:59 | So the decoder learns it from the data, because you have |
26:05 | all the target sentences, or output sentences, having this end token |
26:11 | at the end, right? Is that clear? Okay. |
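On the data side, the start and end tokens are added when the training pairs are built. A minimal sketch, assuming the spellings `<start>` and `<end>` and a made-up Spanish target sentence, and using one common layout: the decoder is fed the target shifted right by one position and is trained to predict the target followed by the end token.

```python
START, END = "<start>", "<end>"  # assumed spellings of the special tokens

def make_decoder_pair(target_tokens):
    """Build (decoder input, decoder target) for one gold target sentence."""
    decoder_input = [START] + target_tokens   # what the decoder is fed
    decoder_target = target_tokens + [END]    # what it must predict, ending in <end>
    return decoder_input, decoder_target

tgt = "la casa es azul".split()  # made-up gold translation of "the house is blue"
dec_in, dec_out = make_decoder_pair(tgt)
for fed, expected in zip(dec_in, dec_out):
    print(f"fed {fed!r:10} -> should predict {expected!r}")
```

Because every target ends in `<end>`, the model learns when to stop purely from the data.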
26:27 | [unclear] Can I clarify one more question? I had this doubt. So in |
26:35 | the slide where you had the start token, like, if you see that |
26:40 | slide, yeah. So here you use, like, the word [unclear], right? |
26:51 | So it takes the start token and the vector, like the latent space, and produces a word, |
26:59 | and that word is then fed back in. Is this how we give it to |
27:13 | the decoder, or do we give it the gold label? Yeah, that's a |
27:20 | good question, actually. I was gonna cover that in the next slide, |
27:24 | because, you know, there are issues with this architecture as laid out, |
27:31 | right? I'm gonna cover that, you know. But now that you're approaching that |
27:35 | point, let me just address it right away. Essentially, when you start |
27:40 | training a model, the first predictions will be bad, right? Because it is just |
27:47 | random, right? It's random initialization; everything is mostly random, right? And |
27:53 | after some time, it gets better. So the point here is that if |
28:03 | you start producing bad tokens at the very beginning of the sequence, it's very likely |
28:09 | that the rest of the sequence won't be good, right? Like, it won't correct the |
28:16 | rest of the translation, right? It just goes bad, and it's gonna go |
28:21 | on like that, right? So while training we give the labels, and while testing we just go with |
28:29 | the generated tokens, using the start token and the ones we select? Yeah, |
28:36 | yeah. So essentially, right now, right here, I'm just saying how |
28:41 | it would work in principle. This is passing its own predictions, which, in the |
28:47 | first time steps, is not ideal, because the first training steps are not |
28:53 | good to start with, right? So we should use the true labels to train. |
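The two ways of feeding the decoder during training can be compared with a toy, partly trained decoder step that gets the first word wrong (the lookup table is hypothetical). Feeding the gold previous token, commonly called teacher forcing, lets the later steps recover; feeding back the model's own predictions lets one early mistake derail the rest of the sequence.

```python
START = "<start>"

def toy_decoder_step(prev_token):
    """Hypothetical, partly trained decoder step: it gets the FIRST word wrong."""
    learned = {START: "el", "la": "casa", "casa": "es", "es": "azul"}
    return learned.get(prev_token, "???")  # unfamiliar input -> garbage

gold = ["la", "casa", "es", "azul"]

def run_decoder(teacher_forcing):
    preds, prev = [], START
    for t in range(len(gold)):
        pred = toy_decoder_step(prev)
        preds.append(pred)
        # Teacher forcing feeds the gold token; otherwise feed back our own guess.
        prev = gold[t] if teacher_forcing else pred
    return preds

print(run_decoder(teacher_forcing=True))   # ['el', 'casa', 'es', 'azul']
print(run_decoder(teacher_forcing=False))  # ['el', '???', '???', '???']
```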
28:59 | I just wanted to comment, you know, this is |
29:03 | exactly the same process as when using count-based n-gram models |
29:09 | for language generation. You're training over exactly what you have in |
29:15 | your training set. But when we use the language models to generate things, we |
29:19 | just start with, say, a start symbol. And you query |
29:24 | the model, and it gives you the first word. Then, based on that, you query |
29:29 | the model again, and it gives you the next one, and so on. So |
29:33 | it's the same thing with an RNN model: it's a model that looks at the |
29:37 | previous token, so it gives you the next one, and then you continue. |
29:40 | But it's exactly the same process. It's just that here people |
29:47 | follow that process. Yeah, that's really inspiring. I |
29:53 | didn't think about it that way. Thank you. Any |
30:02 | other questions? Okay. All right, I guess we've digested this. |
30:14 | So, any potential problems with this model, besides the one I was |
30:21 | mentioning? Well, I'll tell you one. I'll let you get some |
30:24 | of them. Okay? I just disclosed one, but let's make the rest |
30:29 | of them come from you guys. So let me just describe the first one: compressing |
30:36 | long sequences into c. Remember that we go through all of the steps of the sequence |
30:43 | one by one, and then the resulting hidden state c is the one that |
30:49 | is passed to the decoder. So even if the sequence is super, super long, |
30:54 | c is, of course, a fixed-size vector, right? |
30:59 | It's the same size all the time. So the longer the input sequence is, the |
31:03 | more information the c vector has to carry to the decoder, |
31:09 | right? So, you know, at some point, it won't be enough to |
31:14 | hold everything in this, uh, fixed-size vector, right? So how do you |
31:21 | guys think we could deal with that? Okay, let's hold that question for |
31:25 | now. But, you know, any other potential problems? I guess if you're using |
31:35 | an RNN, you know, it would have the problem that, um, so you |
31:45 | know, an LSTM tries to forget things it doesn't need and add things it needs, |
31:52 | so it's able to keep information over time, but dealing with contexts that |
32:01 | are far away is still hard. Something like that. You know, right, that's a |
32:05 | good point, because so far our RNN was just going in one direction, |
32:11 | right? So whatever was at the beginning, we might end up forgetting it |
32:18 | easily, and it ends up paying more attention to what is at the end, |
32:21 | because that's what, you know, an LSTM is supposed to do: it |
32:25 | starts forgetting some things. It keeps some long-term dependencies, but |
32:30 | it will be paying more attention to the short-term memory, of course. But |
32:38 | the thing is, um, since it first processes the entire sample before |
32:45 | going to the decoder, I mean, it first processes the sequence to get to the end, and then |
32:51 | passes it to the decoder, which means that, by nature, you can use a |
32:55 | bidirectional LSTM. Yes, yes. I hope in the assignment many people |
33:03 | were able to implement that. But yeah, that's another way to alleviate |
33:08 | this, because, you know, with one direction, we end up |
33:12 | diminishing the original information, but with the other direction, you can compensate for that. |
33:17 | And then the last hidden states from each direction can be concatenated into a |
33:21 | vector that, like, balances information from both sides. |
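A sketch of the bidirectional fix: run one RNN left to right and a second RNN, with its own weights, right to left, then concatenate their final hidden states. It assumes vanilla tanh RNNs with random weights and made-up sizes; an LSTM cell would slot in the same way.

```python
import math
import random

random.seed(1)
EMB, HID = 3, 4  # hypothetical input and hidden sizes

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def make_rnn():
    """Build a vanilla tanh RNN with its own (random, untrained) weights."""
    W_xh, W_hh = rand_matrix(HID, EMB), rand_matrix(HID, HID)
    def run(inputs):
        h = [0.0] * HID
        for x in inputs:
            h = [math.tanh(a + b) for a, b in zip(matvec(W_xh, x), matvec(W_hh, h))]
        return h  # final hidden state
    return run

forward_rnn, backward_rnn = make_rnn(), make_rnn()

def bi_encode(inputs):
    h_fwd = forward_rnn(inputs)         # reads x_1 ... x_n
    h_bwd = backward_rnn(inputs[::-1])  # reads x_n ... x_1
    return h_fwd + h_bwd                # concatenation: 2 * HID values

seq = [[random.uniform(-1, 1) for _ in range(EMB)] for _ in range(5)]
c = bi_encode(seq)
print(len(c))  # 8
```

The forward half has seen x_n most recently and the backward half has seen x_1 most recently, so both ends of the input are represented in c.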
33:28 | Yeah, that's a good point. All right. So, you know, in the model |
33:34 | we have been discussing, there's a second problem, right: the decoder has to do |
33:38 | the work of finding the relevant parts of the input using c, because everything has been compressed |
33:44 | into c, and then it just goes ahead with the rest. |
33:50 | And the third one, which I guess someone was mentioning, is |
33:56 | that it's hard to recover when the previously decoded tokens are wrong. |
34:00 | So, since we're producing them with untrained models, because they are |
34:06 | in the process of being trained, these are bad. And if we give them as |
34:11 | input at the next time steps, we're gonna get even worse translations. So we |
34:16 | handle this, you know, as discussed before, by including the gold output |
34:23 | labels for the decoder, right? Uh, yeah, so you guys came up |
34:29 | with some of these. But, you know, in essence, this is where |
34:34 | people introduced the attention mechanism, which is basically saying, okay, instead of compressing |
34:41 | everything... Okay, let me just move on. The main idea is that, instead of |
34:46 | compressing everything into c, what we're gonna do is have probabilities over |
34:52 | all the hidden outputs, and we're gonna say, okay, for this time step |
34:56 | of the decoder, the most important encoder outputs are these ones, which |
35:05 | won't necessarily be the same as for the next time step that you decode, |
35:09 | right? So let me put this into the diagram. I |
35:13 | think I have a few examples. So when, when the following happens: |
35:18 | you might need to pay attention to two different words. For example, in the sentence |
35:22 | we were translating, right, "class" is going to be associated with the word |
35:29 | [unclear], and they're not in the same time step, right? So let me |
35:32 | show you the translation from before. So this "class" is over here, |
35:40 | right? So, you know, we do need to pay attention to the right words, |
35:45 | with distributions along the input sequence, right? And that might not be in the |
35:49 | c vector, in this case, for "class" at that time step. Is that right? So |
35:54 | it might appear in c, I mean, enough information could be captured about |
35:59 | "class", but for other words it might be a different, you know, |
36:04 | case. Um, so then, to recap: we use probabilities to |
36:13 | weight all these vectors from the encoder at the decoder step, and then we |
36:19 | sum up all these weighted vectors and say, okay, this is my context, |
36:22 | my new c vector, right? And we try to exploit that information. |
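In symbols, one attention step at decoder time step t looks like the following. The lecture only says "a scoring function", so score(s_t, h_i) is left abstract; h_1, ..., h_n are the encoder hidden states and s_t is the decoder hidden state (the query).

$$
e_{t,i} = \operatorname{score}(s_t, h_i), \qquad
\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{j=1}^{n} \exp(e_{t,j})}, \qquad
c_t = \sum_{i=1}^{n} \alpha_{t,i}\, h_i
$$

Unlike the single c of the plain encoder-decoder model, c_t is recomputed at every decoder step, so each step can put its weight on different input positions.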
36:28 | These are the attention steps. Right now it might look a little bit obscure, but |
36:32 | let me just put in the steps, and then we're going to a diagram to |
36:36 | see everything that happens, right? So the first step is that |
36:42 | we get the outputs from the encoder, right? When I say all the outputs, |
36:45 | I mean all the hidden states, right? And then the hidden state |
36:51 | at the time step that we are decoding right now. So we |
36:57 | have two things: the hidden outputs from the encoder and the |
37:01 | hidden vector of the decoder. And then we define a scoring function, and we |
37:07 | score these two variables against each other, to see, you know, how much we |
37:12 | need each of these contexts. So we have the query vector, and basically the query vector |
37:17 | is the hidden vector from the decoder, trying to say: what should |
37:21 | get more attention in the set of contexts, right? Which are all the |
37:25 | outputs from the encoder. And for that, we need a scoring. And |
37:30 | then we convert these scores into probabilities, use them to weight these vectors, and sum them all together, to |
37:36 | come up with a single vector for the entire input, right, that has prioritized the |
37:42 | information associated with the query, which is the decoder hidden state at the |
37:47 | time step that we're processing. So after we sum up all these weighted vectors, |
37:54 | we combine this with the hidden state, and that is our output. |
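The same steps as a small runnable sketch. It assumes a plain dot product as the scoring function (one common choice among several) and, for the final combination step, simple concatenation of the context vector with the decoder state. The toy vectors are made up, with the query chosen to be closest to h2, so h2 should receive the largest weight.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, encoder_states):
    """One attention step over encoder states h_1..h_n for decoder query s_t."""
    scores = [dot(query, h) for h in encoder_states]  # score each h_i (dot product assumed)
    weights = softmax(scores)                         # turn scores into probabilities
    dim = len(encoder_states[0])
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states))
               for k in range(dim)]                   # weighted sum -> context c_t
    return weights, context

H = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy encoder states h_1, h_2, h_3
s_t = [0.2, 2.0]                           # toy decoder state (query), closest to h_2
weights, c_t = attend(s_t, H)
combined = c_t + s_t  # assumed combination: concatenate context and decoder state
print([round(w, 3) for w in weights])
```

Because c_t is recomputed from `weights` at every decoder step, a different query at the next step would shift the weight to other encoder states.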
37:58 | Okay. How is it so far? If it seems too confusing, it's because |
|

38:03 | we're going to go through it in the next slides with diagrams. But, you know, |
|

38:06 | if you have any questions, feel free to ask right now. I do. |
|

38:13 | When you say encoder outputs, do you mean the C vector, or do you mean the |
|

38:21 | hidden representations of every input word? Right? Yeah. Let me, let me |
|

38:27 | just move to the next slide, because I have a diagram for that. So |
|

38:31 | here is the encoder, right? Well, this is the encoder part of the |
|

38:37 | diagram. Okay, so as you can see here, you have the C vector, which |
|

38:41 | is the latent space that was fixed before. And that was the problem |
|

38:46 | with the original model, right? But now we're also taking all the |
|

38:51 | outputs of the encoder, right? So all of h1, h2, h3, |
|

38:56 | h4, because we're going to prioritize some of these hidden vectors based on our query |
|

39:05 | from the decoder at the time step we're on, right? Is that clear? Someone? |
|

39:12 | Yeah, sorry to interrupt the sharing, but I just wanted to... Yeah, |
|

39:18 | right? So, essentially, we're using the outputs from the encoder, not |
|

39:23 | this one anymore. We're going to get a latent vector out of the entire |
|

39:28 | set of outputs? Yes. Oh, that I didn't |
|

39:38 | get. Okay. Yeah. I mean, you know, in |
|

39:42 | some versions, yeah, people even try to concatenate the C |
|

39:48 | vector with all the query vectors and so on, right? People try many crazy |
|

39:53 | things, but in the regular formulation, yeah, it's totally dropped, and |
|

39:57 | we just use the hidden outputs. Essentially, the C vector is just a |
|

40:04 | weighted sum of the h vectors, right? So that's the first step. The |
|

40:11 | first step: we get the context vectors, which are all of h1, h2, |
|

40:15 | up to hN. Right. So we are not taking this C vector anymore. We are |
|

40:20 | coming up with a C latent vector out of these h vectors. All |
|

40:26 | right, so that's our context. Then our query. Let's take time step four, |
|

40:32 | which is given by the decoder, right? In this case, it's just q4. |
|

40:37 | Remember that this q comes from an RNN, right? So at |
|

40:41 | this point, the decoder has just generated the hidden state at this time step, giving |
|

40:47 | us a word, right? So this is our query vector for the |
|

40:54 | set of contexts, right? Which are these h vectors. So this is our |
|

40:58 | query. This comes from the decoder. And then, based on this query, we would |
|

41:02 | say, OK, h2 is the most important part. So we're going to prioritize that |
|

41:06 | with our scoring function. Okay, so we define our scoring function, and |
|

41:12 | here we're using an attention mechanism that is Luong attention, named after Luong, |
|

41:20 | which is also known as multiplicative attention. Right. And here |
|

41:25 | we use the concat variant, right? So for the |
|

41:29 | first one, let's say h1, we concatenate that. Oh, sorry, I |
|

41:37 | added a plus there, but it's concatenation. I should have |
|

41:40 | added a comma here. It's not addition, it's concatenation. And then |
|

41:46 | we concatenate these two vectors: the query that we have at the time step |
|

41:50 | of decoding that we're facing, concatenated with each of h1 to h4, |
|

41:58 | right? You can see it in the diagram here. And then we project |
|

42:03 | that into another space with W, which holds parameters of the model. Then |
|

42:10 | we squash that with tanh and project it to a single value, right? With the vector v, |
|

42:16 | which is also a parameter. And this vector allows us to have |
|

42:20 | a single scalar score, right? As I'm showing in the picture, |
|

42:27 | we concatenate these vectors with the query q4, for h1 through h4, and |
|

42:33 | then we generate these scores. Those are all scalars. |
|
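
As a rough sketch, the concat scoring function described above, score(q, h) = v · tanh(W [q; h]), could look like this in plain Python. The parameter values and the names `matvec` and `concat_score` are made up for illustration; in a real model `W` and `v` are learned.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def concat_score(q, h, W, v):
    """Concat scoring: v . tanh(W [q; h]) -> one scalar."""
    qh = q + h                                          # list concatenation, not element-wise addition
    hidden = [math.tanh(z) for z in matvec(W, qh)]      # project with W, squash with tanh
    return sum(vi * zi for vi, zi in zip(v, hidden))    # project to a single scalar with v

# Hypothetical toy parameters: q and h are 2-d, so [q; h] is 4-d;
# W maps 4 -> 3 dimensions and v maps 3 -> 1.
W = [[0.5, -0.2, 0.1, 0.0],
     [0.0, 0.3, -0.4, 0.2],
     [0.1, 0.1, 0.1, 0.1]]
v = [1.0, -1.0, 0.5]
q4 = [1.0, 0.5]
encoder_outputs = [[0.2, 0.1], [0.9, -0.3], [0.0, 1.0], [0.5, 0.5]]
scores = [concat_score(q4, h, W, v) for h in encoder_outputs]  # one scalar per h_i
```

Because tanh stays in (-1, 1), each score is bounded by the sum of the absolute entries of `v`, but the scores are still raw numbers, not probabilities.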
|
42:40 | And then we normalize these scores into probabilities. What do we use to convert those into |
|

42:44 | probabilities? Do you guys know the name of this piece? Right, right. |
|

42:53 | Yeah, softmax. So the idea is that so far we have multiplications between |
|

42:59 | vectors, right? So these can be any numbers, even very large numbers. And |
|

43:04 | those scores are raw scores. And we cannot weight our vectors with |
|

43:10 | raw scores because, you know, we wouldn't have any control over the weight |
|

43:14 | we are giving to the vectors. So what we do is convert |
|

43:18 | these scalars into a probability space, and we do that with softmax. Instead of raw scores, |
|

43:25 | we use those alphas, right? |
|
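
A minimal softmax sketch, assuming the raw scores come in as a plain list. Subtracting the maximum before exponentiating does not change the result but keeps `math.exp` from overflowing when the raw scores are large, which is exactly the situation described above.

```python
import math

def softmax(scores):
    """Map arbitrary raw scores to probabilities (non-negative, summing to 1)."""
    m = max(scores)                     # shifting by the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

raw_scores = [1000.0, 998.0, -5.0, 1000.0]   # hypothetical large raw scores
alphas = softmax(raw_scores)
# Without the shift, math.exp(1000.0) would raise OverflowError.
```

The ordering of the scores is preserved: equal raw scores get equal alphas, and a very low score gets a weight of essentially zero.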
|
43:31 | And the next step is: now that we have all the probabilities associated with each time step, we weight the h |
|

43:37 | vectors with these probabilities, right? And after weighting each of these vectors, |
|

43:42 | we do the sum over all the weighted vectors, and that's it. That is |
|

43:47 | a single vector, C. This is the context vector. That's |
|

43:53 | what people call it. They use the letter C, but this is essentially |
|

43:58 | our, uh, C latent vector, playing the same role as Z |
|

44:04 | before, right? But previously we were compressing all the information into it. |
|
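
The weighted sum that produces the context vector, c = Σ_i α_i h_i, fits in a few lines. This is a sketch with a hypothetical helper name (`context_vector`); each α_i is a scalar that scales every dimension of its h_i.

```python
def context_vector(alphas, hidden_states):
    """Weighted sum c = sum_i alpha_i * h_i over the encoder hidden states."""
    dim = len(hidden_states[0])
    c = [0.0] * dim
    for alpha, h in zip(alphas, hidden_states):
        for d in range(dim):
            c[d] += alpha * h[d]      # one scalar weight applied to every dimension of h
    return c

H = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(context_vector([0.5, 0.5], H))  # → [2.5, 3.5, 4.5]
print(context_vector([0.0, 1.0], H))  # → [4.0, 5.0, 6.0]
```

With equal weights the context is the average of the hidden states; with all the weight on one position it is just that hidden state.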
|
44:10 | So, as you can see, now the C vector can be |
|

44:15 | different based on the query vector, right? Because everything else will be the |
|

44:20 | same: multiplications, additions, activations, whatever. But what is different is |
|

44:26 | the query vector. So for query four, well, you know, |
|

44:31 | at time step four we have a different query vector than at time step three or |
|

44:36 | five, whatever, right? So the C that we use is going to be something |
|

44:41 | different. At query three, it's going to be based on that query. All |
|

44:46 | right, so far, is that clear, or am I just confusing everyone? I have a |
|

44:54 | question. Yeah, go ahead. So I'm trying to see, you know, |
|

44:59 | the query, right? Um, so q4 is the one that predicts the |
|

45:08 | fourth word? Um, yeah. Yes. Okay. Okay. |
|

45:17 | Yeah, so you use state three when you get the query |
|

45:26 | q4? You use state three? Okay. Yeah, I know, |
|

45:33 | I know. Is there a question? So essentially here, we're producing |
|

45:40 | the hidden vector q4. But remember, this is an RNN. |
|

45:46 | So it's essentially taking the hidden state, okay, the state that it |
|

45:53 | had at the previous time step, right? So computing q4 uses the |
|

45:59 | previous hidden state. Yeah. And I know that this |
|

46:04 | C vector, we said that we drop it, right? But what I didn't mention here |
|

46:08 | is that people initialize this with zeros. So that's something to keep |
|

46:12 | in mind. But as I said, people try so many combinations of things |
|

46:17 | that there is no, like, you know, "this is the one true version |
|

46:23 | of the seq2seq model". People just try different things. And |
|

46:27 | some people might just give the original C vector over here, so they might initialize with |
|

46:33 | that. They might just... you know, you guys can even try that in the |
|

46:37 | assignment with that setup. So this q is generated only with information within the |
|

46:45 | decoder, right? It's basically just an RNN. And then when it comes to |
|

46:50 | the attention part, we use all the outputs from the encoder. And that's |
|

46:55 | how we generate the entire thing, right? So we take the query, we concatenate it |
|

46:59 | with each time step of the encoder, then we, uh, pass that to the |
|

47:06 | scoring function. We generate these raw scores. We convert those into probabilities, |
|

47:12 | and then we weight the original vectors from the encoder with these probabilities and add them |
|

47:17 | all together. So this is essentially a weighted sum of the hidden outputs of the |
|

47:23 | encoder. And this is our C vector, right? And now the C vector |
|

47:28 | is what we concatenate with the query. Right. So |
|

47:32 | the query and the context vector are used when we're producing the final |
|

47:39 | output, the most likely word for this time step, right? |
|

47:46 | Quick question: W and v, they're weights that we learn at |
|

47:51 | training time, right? Yes, they are. Oh, both. Both of |
|

47:57 | them are parameters to be learned. Yes. And then this is what |
|

48:03 | we concatenate, and then we feed it to a feed-forward layer, |
|

48:08 | which is just a fully connected layer, and then we produce the scores for the |
|

48:15 | most likely word, which in this case, ideally, will be "clase". |
|
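
A sketch of that final prediction step, assuming a single fully connected layer over the concatenation [c; q] followed by softmax over a toy vocabulary. The vocabulary, weights, and the name `predict_word` are hypothetical. Luong et al. also describe an extra tanh layer on [c; q] before the output layer; that layer is omitted here to match the one-layer description above.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_word(context, query, W_out, b_out, vocab):
    """Concatenate [c; q], apply one fully connected layer, softmax over the vocabulary."""
    features = context + query                          # concatenation
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(W_out, b_out)]          # one score per vocabulary word
    probs = softmax(logits)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs

# Hypothetical toy setup: 2-d context, 2-d query, 3-word vocabulary.
vocab = ["la", "clase", "<eos>"]
W_out = [[0.1, 0.0, 0.2, 0.0],
         [1.0, 1.0, 0.5, 0.5],
         [-0.5, 0.0, 0.0, 0.1]]
b_out = [0.0, 0.0, 0.0]
word, probs = predict_word([1.0, 0.5], [0.5, 0.2], W_out, b_out, vocab)
```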
|
48:22 | All right. So, so far, is it good, or do you guys have any |
|

48:39 | questions? The values of each and every... Okay. So for every time |
|

48:55 | step, there'll be a vector of values, the probabilities, for each and every |
|

49:04 | word, something that weights the vectors. So also, for the entire sequence, |
|

49:13 | there will be many values, which you add up at the |
|

49:20 | end. That's why you're using the sum there, and so on. |
|

49:24 | Is it a summation? What is the convention? Okay, so |
|

49:34 | let me see if I... These h vectors are going to be of, you |
|

49:41 | know, 128 dimensions, right? And these alphas are scalars, right? Each of |
|

49:47 | them. Sorry, scalars. Yes. So this scalar is going to |
|

49:51 | be multiplied by all the values of the vector, all 128 dimensions. |
|

49:57 | So if alpha one is 0.25, then you multiply 0.25 by every dimension |
|

50:04 | of the vector h1, dimensions one through 128. And then alpha |
|

50:11 | two is going to be multiplied by the 128 dimensions of |
|

50:17 | h2, and so on, right? Yeah. So is this a |
|

50:22 | scalar, and not a vector? It is a scalar. I |
|

50:26 | mean, you can put all of them in a vector and then do efficient |
|

50:31 | multiplication using vectors. But just for the sake of simplicity and getting the |
|

50:38 | intuition, this is a scalar; think of it as just a weight on the vector. |
|

50:42 | It is the weight given to the vector, which is what we're doing |
|

50:46 | here, right? So don't think of it as a matrix operation for now. |
|

50:57 | Just get the intuition, right? Each alpha is a scalar. |
|
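
To make the "scalar per vector" view concrete, here is a sketch comparing the two ways of computing the same weighted sum: looping over scalars, and the vectorized form c = Hᵀα mentioned in the answer. The 128-dimensional states are filled with arbitrary made-up values.

```python
def weighted_sum_loop(alphas, H):
    """Scalar view: scale each h_i by its alpha_i, then add the scaled vectors."""
    dim = len(H[0])
    c = [0.0] * dim
    for alpha, h in zip(alphas, H):
        scaled = [alpha * x for x in h]           # one scalar times every dimension
        c = [ci + si for ci, si in zip(c, scaled)]
    return c

def weighted_sum_matrix(alphas, H):
    """Vectorized view: c = H^T alpha, with H stacked as an N x d matrix."""
    HT = list(zip(*H))                            # transpose to d x N
    return [sum(a * x for a, x in zip(alphas, col)) for col in HT]

# Toy example: 4 hidden states of 128 dimensions (values are arbitrary).
N, d = 4, 128
H = [[(i + 1) * 0.01 * (j % 7) for j in range(d)] for i in range(N)]
alphas = [0.25, 0.25, 0.4, 0.1]
c_loop = weighted_sum_loop(alphas, H)
c_matrix = weighted_sum_matrix(alphas, H)
```

Both produce the same 128-dimensional context vector; the matrix form is what libraries use to batch the computation.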
|
51:01 | I want to go back to the overall discussion. All right, any other questions? Next question. |
|

51:15 | Go ahead. Yeah, right. Yes. Yes. So the C, |
|

51:30 | here. So for the previous... You know, the previous |
|

51:40 | step. You know, you take state three for the, you know, for |
|

51:45 | the hidden layer. You know, how... So will this C be used |
|

51:49 | in the hidden output for the next step, right? Are you talking about the |
|

51:57 | decoder? Because here, we're going to be using all the time steps of the encoder. |
|

52:08 | Okay? Yeah. And we use all the time steps |
|

52:13 | because we want to get rid of the problem that we had before, that |
|

52:17 | the C vector was compressing all the information. We were forgetting some stuff, but |
|

52:22 | not here. The next... is this the next step? Yeah. |
|

52:26 | Yeah, that class. You know, the thing, this class. |
|

52:31 | Yeah. So will it be, you know, the hidden... you |
|

52:36 | know, the hidden layer, the previous layer's value, for the next... Um, |
|

52:46 | when it does the prediction? Yeah, it will be used. |
|

52:50 | The decoder does this: it takes the previous prediction as the next input, okay? |
|

52:58 | It passes the previous state to the next step. But here, you |
|

53:05 | know, we are using a context vector, C, that was |
|

53:08 | computed out of the encoder outputs, right? And this is what is used |
|

53:13 | to generate the final word. I'm just saying, this is the |
|

53:20 | output for this time step, right? And it might keep going, right? |
|

53:24 | It might generate more words until it reaches the end token. Looking at my |
|

53:30 | diagram here, right? There's our q4. |
|

53:36 | Yeah, so for the next time step, the state would be... |
|

53:42 | what? Yeah, it would be the state of the RNN. |
|

53:48 | If it's an LSTM, it will be q4 and the cell state |
|

53:52 | together. You know, the short and long memory, uh, together. Right. |
|

53:55 | So these are the two things that get passed in an LSTM: the long |
|

54:00 | and the short memory. Okay, so it will be like q4 for |
|

54:05 | the next step? Yeah, if it's a vanilla RNN, then it is |
|

54:11 | just going to be q4. But if it's an LSTM, it's going to |
|

54:15 | be both q4 and c4, which is the cell state. That's what |
|

54:20 | I like to call short memory and long |
|

54:25 | memory. Right, so these are the two things that are going to be passed. |
|

54:30 | Okay. And then the output of the current time step, the predicted |
|

54:41 | word, is going to be passed to the next step? Yeah, yeah, the output |
|

54:45 | at this time step is going to be passed as an input to the |
|

54:48 | next time step. Okay. Cool. Thank you. All right. Yeah, any |
|

54:52 | other questions? Because you guys are going to implement this. I have a question. I get |
|

54:59 | it, I think. Could you go to the previous slide? This slide. |
|

55:04 | So basically, for the encoding part, the only thing that changes is |
|

55:13 | that we're now keeping these hidden representations for each of the inputs, |
|

55:18 | and we're forgetting about the C vector, because the alphas and the context vector |
|

55:27 | only come in once we include the attention mechanism, because we need to |
|

55:33 | compute a new context vector for every time step where we're making predictions. Is |
|

55:38 | that correct? Yeah, exactly. So I should have added a superscript |
|

55:44 | to this C vector because, in fact, we are generating as many C vectors as |
|

55:51 | time steps the decoder runs, right? The decoder is going to run, let |
|

55:57 | me see, let's say 10 time steps, right? So we're going to be generating |
|

56:02 | 10 C vectors, one for every query that we get from the decoder at each time step. |
|
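
A sketch of the "one context vector per decoder step" point, c^(1), ..., c^(T). The decoder states here are hypothetical stand-ins (points on a unit circle) rather than the output of a real RNN, and a dot-product score is assumed for brevity. The encoder outputs stay fixed; only the query changes from step to step.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def contexts_for_decoder(queries, encoder_outputs):
    """Return one context vector c^(t) per decoder query q_t."""
    contexts = []
    dim = len(encoder_outputs[0])
    for q in queries:                                   # one iteration per decoder time step
        alphas = softmax([dot(q, h) for h in encoder_outputs])
        c_t = [sum(a * h[d] for a, h in zip(alphas, encoder_outputs))
               for d in range(dim)]
        contexts.append(c_t)
    return contexts

encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
# Hypothetical decoder states for 10 steps; a real decoder would compute these.
queries = [[math.cos(t), math.sin(t)] for t in range(10)]
contexts = contexts_for_decoder(queries, encoder_outputs)   # 10 context vectors
```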
|
56:10 | This is, I think, the part that is a little confusing for me: I |
|

56:14 | see the C vector here, and I know why it's there, because they're |
|

56:21 | training the encoder to generate hidden representations. And |
|

56:30 | then you throw away the C when you go to the decoder part, |
|

56:34 | right? So, yeah, I'm putting it there just to, you |
|

56:41 | know, represent it. Remember that it's coming from the encoder, but these multiplications, |
|

56:46 | the weightings, and all the attention mechanisms happen in the decoder, right, because |
|

56:52 | the query is what triggers this weighting. Yeah. And it's the same |
|

57:05 | thing again, if you've been seeing this over and over in deep learning: you're training your |
|

57:10 | model to do something, and then you throw that away and use it for something else. So |
|

57:14 | we trained the encoder to generate a C that we toss away. And then, since |
|

57:21 | the hidden representations... when you say you're tossing, when you say you're |
|

57:27 | throwing it away, uh, I think, like, is it for them to do |
|

57:37 | something? But they use them for something else, you know, where it's |
|

57:40 | useful. We just take the useful part. Okay, so now about scoring |
|

57:53 | functions. As you can see, this is a little bit arbitrary, like, |
|

57:57 | you know, why do we have to concatenate? Remember, I have |
|

58:00 | to fix this part. This is a typo; there should be a comma here. |
|

58:05 | So we concatenate these, and we multiply by this matrix. And then we |
|

58:08 | apply tanh to that, you know. And then we multiply |
|

58:15 | by this vector to get a single score, right? I |
|

58:19 | mean, this is super arbitrary, right? You could come up with |
|

58:23 | a different way to get a score out of that, and that's actually |
|

58:29 | what happened. I mean, people just decided: we're going to try this, right? Instead |
|

58:33 | of that, in other work, like Bahdanau's, they went for |
|

58:39 | something else, right? But as you can see, many other papers will be providing a |
|

58:43 | new version of the scoring function. But the essential, you know, the |
|

58:48 | core scoring functions are the first ones; those carry the main ideas. |
|

58:54 | And one of them began in this paper, you know, Neural |
|

59:01 | Machine Translation by Jointly Learning to Align and Translate. You know, there are |
|

59:06 | people here, like, you know, some of them you may have heard of. This one, |
|

59:12 | too. I don't know how to pronounce his name. This is a |
|

59:16 | very well-known person. So, yeah, that |
|

59:21 | attention is also known as additive attention, right. And as you can |
|

59:26 | see, it's a little bit different here. Here, we have two parameter matrices, |
|

59:32 | W_a and U_a, as well as the vector v_a. Well, |
|

59:38 | what are we taking here? We're taking the previous state of the |
|

59:43 | decoder, not the current one, the previous one. In the description |
|

59:48 | we discussed before, we were taking the current state of the decoder, right, |
|

59:53 | q4 at time step four of the decoder. In this case, for the scoring function that |
|

59:59 | Bahdanau proposed, they take the state at time step three of the |
|

60:05 | decoder, right? So instead of q4, it would be q3 in the illustration we |
|

60:09 | had before, and then this is the encoder hidden state. And then the scoring |
|

60:15 | function is pretty much the same, right? And then they use softmax to normalize |
|

60:21 | the raw scores, which in this case are denoted e, into alphas. |
|

60:27 | And then they use these alphas for the weighted sum, and remember that we do this |
|

60:33 | weighted sum, we do the whole attention, for every time step in the |
|

60:39 | decoder. That's why this is indexed by i. So for decoder |
|

60:44 | step one, we have a C vector one as well, and so |
|

60:49 | on, up to whatever length we're generating with the decoder. |
|
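
A sketch of the additive (Bahdanau-style) score, e_ij = v_aᵀ tanh(W_a s_{i-1} + U_a h_j), using the previous decoder state s_{i-1} as described above. It also checks a small algebraic fact: stacking W_a and U_a side by side gives a W for which the concat form v · tanh(W [s; h]) yields the same number, so the two scores differ mainly in which decoder state feeds them. All parameter values are made up.

```python
import math

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def additive_score(s_prev, h_j, W_a, U_a, v_a):
    """Bahdanau-style score: v_a . tanh(W_a s_{i-1} + U_a h_j)."""
    pre = [a + b for a, b in zip(matvec(W_a, s_prev), matvec(U_a, h_j))]
    return sum(v * math.tanh(z) for v, z in zip(v_a, pre))

def concat_score(q, h, W, v):
    """Concat score: v . tanh(W [q; h])."""
    return sum(vi * math.tanh(z) for vi, z in zip(v, matvec(W, q + h)))

# Hypothetical toy parameters.
W_a = [[0.2, -0.1], [0.4, 0.3]]
U_a = [[0.5, 0.0], [-0.2, 0.1]]
v_a = [1.0, -0.5]
s_prev = [0.3, -0.7]   # previous decoder state s_{i-1} (q3 in the earlier picture)
h_j = [1.0, 2.0]       # one encoder hidden state
W = [wa + ua for wa, ua in zip(W_a, U_a)]   # [W_a | U_a], a 2 x 4 matrix
e_additive = additive_score(s_prev, h_j, W_a, U_a, v_a)
e_concat = concat_score(s_prev, h_j, W, v_a)
```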
|
60:55 | Okay, so is it all clear so far? Pretty much the same thing we discussed before. But yeah, |
|

61:01 | feel free to ask any questions. Okay. So something cool they show |
|

61:09 | in the paper is the issue with sequence length. We mentioned that this is |
|

61:15 | a challenge in the vanilla sequence-to-sequence models, right? The longer the |
|

61:22 | sentence, the harder it is for the encoder to compress the information into the single C |
|

61:27 | vector, right? Because we have a fixed number of dimensions that we can |
|

61:34 | fit into the C vector. Um, and then when we are using |
|

61:38 | attention, we don't have that problem anymore, because we can attend to any |
|

61:44 | part of the sentence we care about, right? And this is determined by the |
|

61:50 | query vector from the decoder, which is trying to predict a word for |
|

61:55 | that time step. So as you can see, look at the graph. This is beautiful because |
|

62:00 | it keeps up with the sequence length. It doesn't matter how long the sentence is; even at 60 words, |
|

62:07 | it's still doing well, right? What else? The other thing is |
|

62:11 | the alignments that they provided, which |
|

62:19 | were pretty good because, as you can see, for the first words in this |
|

62:24 | translation between French and English, uh, pretty much every |
|

62:29 | word attends to the same position. But when it comes to this name, |
|

62:36 | European Economic Area, the way it comes in French is in the opposite |
|

62:43 | order. So, as you can see, it attends to different words, |
|

62:48 | right? It goes ahead and takes "area" to translate "zone", and then it |
|

62:56 | goes in the opposite direction, right? And then everything else |
|

63:02 | is pretty much the same. But this is pretty cool, right? Because they learn |
|

63:06 | how to align the words from different languages without being told explicitly that this should be |
|

63:12 | aligned to this one, right? I have a question. Yes. |
|

63:19 | I missed what the story on the left half is. What is the 30? Ah, |
|

63:28 | I don't remember these two, the ones... I need to check the paper. |
|

63:34 | But the point of the plot was the problem they had previously |
|

63:40 | with translation, right? They were not able to keep up with the sequence length. |
|

63:47 | All right, you get the point? As you can see, we have an end |
|

63:53 | token as well here, right? Yeah, as I said |
|

63:58 | before, we put it at the end. All right. And then, I |
|

64:04 | mean, that's Bahdanau attention, which is also known as additive |
|

64:09 | attention. But we have other score functions. We have Luong's multiplicative attention. |
|

64:14 | The third one, the concat version, is what I used. This is what I |
|

64:19 | should have shown in the slide. I will correct it. Uh, |
|

64:25 | I used the concat version, right? And there are others, like, you |
|

64:28 | know, a simple dot product between the vectors, right? Or, you know, general: projecting into another |
|

64:34 | space by multiplying the vectors by a matrix. And then there's the concat one. |
|
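
The three Luong-style scores mentioned here (dot, general, concat) side by side, as a minimal sketch with hypothetical parameter values. With the identity matrix as W_a, the general score reduces to the plain dot product, which shows that dot is the parameter-free special case.

```python
import math

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def score_dot(q, h):
    """dot: q . h (no parameters)."""
    return dot(q, h)

def score_general(q, h, W_a):
    """general: q . (W_a h)."""
    return dot(q, matvec(W_a, h))

def score_concat(q, h, W_a, v_a):
    """concat: v_a . tanh(W_a [q; h])."""
    return dot(v_a, [math.tanh(z) for z in matvec(W_a, q + h)])

q = [1.0, 2.0]
h = [3.0, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
W_concat = [[0.1, 0.2, 0.3, 0.4], [-0.3, 0.1, 0.0, 0.2]]   # hypothetical 2 x 4
v_concat = [1.0, 1.0]
```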
|
64:38 | So what is the... You know, why do people choose |
|

64:43 | to use dot instead of general, instead of concat? Do you guys have any |
|

64:49 | idea? Can you guys come up with some intuition about that? Why dot, |
|

64:55 | the first one, since they're very similar, right? Why would you choose |
|

65:00 | one instead of the next one? Yeah, anyone? |
|

65:09 | Um, I think what the dot product computes is the, um, the |
|

65:20 | similarity, like the similarity between the vectors. Exactly. It's the |
|

65:26 | similarity without being normalized, right? So basically, we have this dot |
|

65:33 | product, which is what you just said: the similarity between the vectors, but |
|

65:38 | without normalizing. So it is not a bounded metric like cosine similarity. |
|

65:43 | Uh, but remember that when we do that, essentially we can come |
|

65:48 | up with any values out of these vectors, right? If they're opposite, we |
|

65:52 | can get minus 100, assuming that the vectors have equal length. You can see |
|

65:58 | we can get minus 100. And then, if they are orthogonal, it's zero, and |
|

66:04 | 100 if they are, uh, similar. Right? Again, this is not normalized. |
|
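
A quick numeric check of the ±100 example, assuming two-dimensional vectors of length 10. The raw dot product ranges from -100 to 100, while cosine similarity divides by both lengths and stays in [-1, 1].

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Dot product divided by both lengths: always in [-1, 1]."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [10.0, 0.0]            # length 10
same = [10.0, 0.0]
orthogonal = [0.0, 10.0]
opposite = [-10.0, 0.0]
for name, b in [("same", same), ("orthogonal", orthogonal), ("opposite", opposite)]:
    print(name, dot(a, b), cosine(a, b))
# same 100.0 1.0
# orthogonal 0.0 0.0
# opposite -100.0 -1.0
```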
|
66:10 | normalizing, right? The point here that OK, That makes sense, |
|
|
66:15 | ? Why? Why don't we use scoring function and avoid having parameters out |
|
|
66:21 | that. Why were you at this for instance, You cannot learn it |
|
|
66:27 | ? You need to learn to be to address the difference. Circumstances Unit |
|
|
66:34 | two need to learn the w there's something something very interesting what you |
|
|
66:39 | because you said you need to adjust especially Syrian instances, right? |
|
|
66:46 | replaces the same wars. But that's that's bring heat. But why |
|
|
66:51 | those specific circumstances that you are not learning by generating these age victors? |
|
|
66:58 | to generate those, you have to something as well, right? Because |
|
|
67:02 | is output open Aryan in which also parameters. So what's what's behind |
|
|
67:09 | Maybe for season, every word they a single world though another land or |
|
|
67:17 | like that. Like for one we must be We must have a |
|
|
67:21 | worth three words or at the same . And for that we use the |
|
|
67:27 | of a Those three words are those ? You started just sitting on the |
|
|
67:33 | path in danger, sidetrack a little . But you know, the important |
|
|
67:40 | here is that if you use the product. As I said before, |
|
|
67:45 | are getting ah, similarity to Right? But what about if the |
|
|
67:50 | the betting spaces are from different Right? Remember that in much in |
|
|
67:57 | , you might have a different vocabulary for your declare opposed to the color |
|
|
68:04 | you have English words and then you French words, right? So then |
|
|
68:09 | of space is not the same. you are trying to get similarity with |
|
|
68:13 | probation. You're trying to get similarity on their depression and bury in |
|
|
68:18 | right? But as the 2nd 1 adapt, well adjusted those here |
|
|
68:23 | I like that phrase because it's gonna to say, Okay, we're marking |
|
|
68:29 | languages, right? And we want similarity between this, but in the |
|
|
68:34 | space. But we know that they learning the same street because they belong |
|
|
68:37 | different language. So that's why people use the 2nd 1 allows us to |
|
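
A sketch of how the general score, qᵀ W_a h, bridges two embedding spaces. The dimension mismatch used here (3-d encoder states, 2-d decoder states) is only the most visible case, chosen as an illustrative assumption; even with equal sizes the two spaces need not be aligned. The plain dot product cannot be computed at all, while W_a maps the encoder vector into the decoder's space first.

```python
def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def dot(a, b):
    if len(a) != len(b):
        raise ValueError("dot product needs vectors from the same space (same size)")
    return sum(x * y for x, y in zip(a, b))

def score_general(q, h, W_a):
    """q . (W_a h): W_a maps the encoder space into the decoder space."""
    return dot(q, matvec(W_a, h))

q = [0.5, -1.0]               # hypothetical 2-d decoder (e.g. French side) state
h = [1.0, 0.0, 2.0]           # hypothetical 3-d encoder (e.g. English side) state
W_a = [[1.0, 0.0, 0.5],       # 2 x 3: learned in practice, fixed here for illustration
       [0.0, 1.0, -0.5]]
print(score_general(q, h, W_a))  # → 2.0
```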
|
68:43 | And then the concatenation one, this is pretty much... |
|

68:50 | you know, it's not very different, right? It's very similar to the Bahdanau one. |
|

68:54 | So what are the crucial differences? Can someone tell me? You go. |
|

69:03 | ... Yeah, essentially that. So, |
|

69:09 | as I said at the very beginning, people just tried their ideas. And |
|

69:13 | if it works, they publish them. You see, it's a little bit arbitrary |
|

69:18 | to say, OK, this one is the definitive one. Okay, |
|

69:22 | I think the current thinking is that there's not, like, you know, uh, |
|

69:27 | a single question with a single answer for this, right? People just formulate experiments. And |
|

69:33 | if it works, they publish it, or whatever. So, yeah, you |
|

69:39 | can see these scoring functions in this paper, Effective Approaches to Attention-based Neural |
|

69:45 | Machine Translation. And you can see this is a year later, after Bahdanau |
|

69:48 | attention, which was in 2014, and it's, you know, very popular. |
|

69:56 | So you can rely on their approaches. All right, and |
|

70:03 | with these scoring functions, something very nice is that they were |
|

70:08 | able to improve where machine translation models were making mistakes, basically: named |
|

70:16 | entities, right? So names should be kept the same. They should be |
|

70:22 | the same in either language. So in this case, you have |
|

70:28 | the source. Here's the source. Here's the reference, like the gold labels, |
|

70:34 | the actual translations into German. And then this is their version, |
|

70:40 | right? And they got the right entity. But the other baselines, they |
|

70:47 | were not able to produce the right entity, |
|

70:52 | right? So that's cool, because it was able to think: okay, this |
|

70:56 | is something that shouldn't be translated. Something like that. All right. |
|

71:03 | And, you know, there are successful applications of these seq2seq models, in the sense of |
|

71:08 | encoder-decoders. As I said before, with images it's the same. We |
|

71:14 | extract, um, features out of images, like, you know, |
|

71:17 | regions and so on, with a convolutional neural network. |
|

71:22 | So these essentially will be the hidden representations, right? They extract the features |
|

71:27 | from the image. And then they pass these to an LSTM, and |
|

71:33 | the LSTM, with the attention mechanism, is able to say: okay, I'm going to |
|

71:37 | attend to these areas of the image to generate "a bird", right? And then "flying", |
|

71:44 | like, you know, it attends a little bit to the background, and |
|

71:48 | to the wings, and then "over". And then it's picking up a little bit |
|

71:53 | of the water in the background, and generating "a body of water". |
|
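
A sketch of how the same attention step applies to an image: the CNN feature map is treated as a grid of region vectors, flattened into a list that plays the role of the encoder outputs, and the attention weights can be reshaped back into a grid to see where the model "looks". The 2 x 2 grid, the 3-d features, the dot-product score, and the name `attend_over_regions` are all assumptions for illustration.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend_over_regions(feature_grid, query):
    """feature_grid: rows x cols grid of D-dim CNN feature vectors.

    The grid is flattened into one vector per region, which are then
    attended exactly like the words of a sentence.
    """
    regions = [vec for row in feature_grid for vec in row]   # flatten row by row
    alphas = softmax([dot(query, r) for r in regions])
    dim = len(regions[0])
    context = [sum(a * r[d] for a, r in zip(alphas, regions)) for d in range(dim)]
    cols = len(feature_grid[0])
    weight_map = [alphas[i:i + cols] for i in range(0, len(alphas), cols)]  # back to grid shape
    return context, weight_map

# Hypothetical 2 x 2 grid of 3-d region features.
grid = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
        [[0.0, 0.0, 1.0], [0.2, 0.2, 0.2]]]
query = [0.0, 3.0, 0.0]   # a decoder state that "looks for" the top-right region
context, weight_map = attend_over_regions(grid, query)
```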
|
72:00 | In my opinion, that's amazing, because you can combine... you can even combine modalities, |
|

72:07 | right, meaning from images you're generating text, right? So again, like |
|

72:12 | I said, it's like the semantics of your thoughts, what you're representing |
|

72:20 | between these two, the encoder and the decoder. The encoder is just capturing whatever |
|

72:26 | kind of features you're receiving as input, and then your brain has |
|

72:30 | this latent space, and then with the decoder, uh, with |
|

72:36 | the skills that you have, you generate some other output in another modality. |
|

72:40 | So you take something from vision and produce something in language. Right? So |
|

72:44 | that's very cool, in my opinion. I actually suggest reading this paper, |
|

72:48 | people. It's very popular; I guess it has gotten a lot of citations |
|

72:55 | since then. And then we have self-attention. By the way, |
|

73:00 | how are we doing on time? Right here, wait. I'm |
|

73:11 | ... Yeah, two more slides. This one and one more. That's it. It's |
|

73:16 | another form of the attention mechanism that will be covered later. So, |
|

73:23 | as I said before, there are many attention mechanisms, but one that has |
|

73:27 | become very trendy lately is self-attention, from the transformer architecture. |
|

73:34 | The paper is Attention Is All You Need, and there's a bunch of people behind |
|

73:39 | this work. Um, so something, you know, that we have problems with in |
|

73:45 | seq2seq models is that we're generating every time step sequentially. So, you |
|

73:50 | know, remember the encoder: it's passing the hidden state to the next time |
|

73:54 | step, and so on, right? So even though you can parallelize a lot |
|

74:00 | of, you know, computations based on matrix multiplications on GPUs, and |
|

74:05 | so on, you still cannot parallelize, you know, across time steps, right, |
|

74:10 | because the next time step depends on the output of the previous time step. |
|

74:15 | So that's quite a bottleneck. And with these transformer architectures, they enhance |
|

74:21 | parallelization. So they get faster and more effective training, because they are |
|

74:27 | able to process the data faster, and they can process more than they could before in |
|

74:31 | less time. Uh, and, you know, in this sense, they use what they call |
|

74:36 | self-attention. The idea is that they compute dot products between every word across |
|

74:43 | the sentence. So, across the sentence. So basically, you have, you |
|

74:48 | know, in our running example from the beginning, the |
|

74:53 | translation between English and Spanish, the word "class" would have a |
|

74:59 | probability distribution over the entire sentence, including itself. So at |
|

75:06 | the end, "class" will have a probability distribution over all the words. We |
|

75:11 | have that because not all the words are related equally; a certain word is |
|

75:19 | not related to all the other words equally, right? It changes because of linguistic properties, |
|

75:25 | whatever role the words have in the sentence. So this is the idea of self-attention |
|

75:30 | from the Attention Is All You Need paper, and, you know, a little bit |
|

75:35 | of the formulation: self-attention uses scaled dot-product attention. Essentially, we |
|

75:43 | have covered that we have a query and keys, and, you know, through the scoring |
|

75:48 | function, basically the query tells us which are the most important parts in the context, |
|

75:53 | right? We already learned this part in seq2seq with |
|

75:58 | attention. But what they do in this paper is to have |
|

76:03 | parameters, you know, matrices, for the queries, the |
|

76:09 | keys, and the values, right? And basically, since all |
|

76:16 | of them are going to be matrices computed from the input, every word will be, |
|

76:21 | you know, attending to every word in the sentence. As I said, it's a distribution |
|

76:24 | at the end. And essentially, we have this matrix with probability distributions |
|

76:29 | at the row level. You know, every word has a probability distribution, because it's saying: |
|

76:34 | this word is attending to these other words, this other word is attending to all |
|

76:39 | of them, and so on, right? Each row has a probability distribution. So |
|

76:44 | then you use these as your probability weights, and |
|

76:48 | then you weight the value matrix, which is what's happening here. |
|

76:53 | So here you have the softmax that we saw in the previous |
|

76:57 | slides, to normalize the probabilities. And then they weight the value vectors. |
|
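
A small sketch of scaled dot-product self-attention, softmax(Q Kᵀ / √d_k) V, written with plain lists so each step is visible. The embeddings X and the projection matrices W_q, W_k, W_v are arbitrary toy values (learned in a real transformer), and this single-head version leaves out masking, multiple heads, and everything else in the full architecture.

```python
import math

def matmul(A, B):
    """(n x k) times (k x m), with matrices as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def softmax_rows(S):
    out = []
    for row in S:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)
    d_k = len(K[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, transpose(K))]
    A = softmax_rows(scores)          # one probability distribution per word (row)
    return matmul(A, V), A

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 hypothetical word embeddings
W_q = [[1.0, 0.5], [0.0, 1.0]]
W_k = [[0.5, 0.0], [1.0, 1.0]]
W_v = [[1.0, 2.0], [0.0, 1.0]]
out, A = self_attention(X, W_q, W_k, W_v)
```

There is no loop over time steps here: every row of `A` is computed from the same matrix products, which is the source of the parallelism described above.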
|
77:04 | And yeah, I'll leave it there because, anyway, this topic will be covered |
|

77:09 | in depth. So, yeah, that's pretty much what I have |
|

77:14 | here. The references, if you want to read more about these papers: you |
|

77:19 | can take them from here. And that's it. That's all I have. |
|

77:26 | Thank you. That was a wonderful introduction to sequence-to-sequence models. Very |
|

77:33 | well explained. Um, anyone? Oh, I should... Okay, |
|

77:41 | let me stop sharing. Thank you. Yes, I appreciate it. Okay. Thank you |
|

78:06 | very much. All right. All right. Thank you. We'll see you Monday, |
|

78:15 | next Monday. He's going to do a hands-on session. |
|

78:24 | It's important that you're here and that you show up, because that's, uh, basically |
|

78:29 | what we would use for the assignment, right? He's going to prepare an assignment, |
|

78:33 | the last assignment. It would be kind of related. So make sure that you |
|

78:38 | come. You can read the papers, but make sure you're here on Monday for the |
|

78:42 | session where he's going to present. Thank you. Yes, |
|

79:03 | ... Just stop the recording; you can drop |
|

79:08 | from the meeting. It's just that I had a problem stopping the recording. |
|

79:14 | I mean, one day, like, it's good to start, but the stop... |
|

79:19 | like that, the recording. Okay, so it stopped on its own. |
|