00:00 Hello, everyone. Welcome to the lecture today. We're going to talk about logistic regression. This is probably one of the most classical machine learning algorithms. It is a very basic classification algorithm, and I would guess everyone who starts out or wants to learn machine learning learns it as one of their first algorithms. But before I do that, I just want to spend some time explaining the bias and variance problem I touched on last week but probably didn't do a good job explaining. The reason I spend time talking about this is that if you train a machine learning model on your own data set and find that your model is not performing as well as you expected, then most likely you are dealing with either a bias or a variance problem. It's very important to figure out which one you are dealing with, because that tells you what to do next in order to improve your machine learning model. Remember that last week we also talked about the different strategies, the different things you can do, in order to deal with bias and variance problems.

01:32 So let me use this simple example that you have already seen several times. On the vertical axis is the price and on the horizontal axis is the size of the house, so we're trying to predict the housing price based on the size of the house. If we assume our hypothesis is too simple, for example, if we assume there is a linear relationship between price and size, then our machine learning model will look something like this, basically a straight line. In this case, it's clear that we are underfitting the training data, because this straight line cannot capture all the information. On the other hand, if your machine learning model is too complicated, for example, if we assume a polynomial model of degree four, our model will look something like this. It fits the training data really well; you can tell that this curve passes through all our training data points. But because of this, the model probably will not generalize well to a new data set; in this case we have overfit the data. Finally, if we assume a model that is neither too simple, like the linear regression case, nor too complex, like the degree-four polynomial case, say we simply assume a polynomial model of degree two, then the model you end up with will look something like this. In this case, I hope you agree with me that this is probably the best model we can use to predict the housing price based on the size of the house.

03:52 In the machine learning literature, people also use the term high bias to describe the underfitting problem and high variance to describe the overfitting problem. In most cases, a high-bias problem is due to our oversimplified assumptions. For example, we assume a linear model when our training data were actually generated by a highly nonlinear process. This leads to underfitting the data, meaning that our machine learning model misses some of the important information, the important relationships, among the training data. On the other hand, the variance problem comes from the fact that our machine learning model is excessively sensitive to small variations in the training data, such as noise. This leads to overfitting the data, meaning that the model is too powerful and captures irrelevant or even spurious features in the training data. This will happen, for example, if we assume a highly complex model when our data are actually linear.

05:46 It turns out that between these two things, bias and variance, there is a well-known bias and variance trade-off. What does that mean? It means that if we increase the model's complexity, for example by increasing the number and the degree of the polynomial features, that will typically increase the variance, because our model is now more capable of capturing very small variations in the data. At the same time, the bias will be reduced. Conversely, if we reduce the model's complexity, for example by going from the degree-four polynomial back to the linear regression model, a polynomial of degree one, the variance obviously decreases because the model becomes linear and is incapable of capturing those small variations, but the bias increases. So you can see that there is a bias and variance trade-off: in most cases, if you reduce one, you will increase the other, and it is very difficult to reduce both bias and variance at the same time. This is very well known as the bias and variance trade-off. I hope that gave you a better understanding of the bias and variance problems.

07:28 Okay, now we're going to focus on today's topic: logistic regression. I'm going to talk about the basic idea and the concepts behind this algorithm, and I will also try to help you understand intuitively what logistic regression does. For those who are interested in learning more about how to develop a cost function and about optimization, there is additional material, but feel free to skip that part because it is not required; it is beyond the scope of this class. For your lab exercise or your homework, for example, you do not need to know how to develop the cost function. Even if you work in industry and work with logistic regression on a daily basis, if you are using an open source library like scikit-learn or TensorFlow, you still do not need to know the cost function for logistic regression. That part is purely for those who are interested in learning more about the cost function and optimization. At the end, I will also give a demo that shows how to implement logistic regression in scikit-learn.

08:54 Okay, so the first thing to know about logistic regression is that it is a classification method. Remember that last week we talked about the two basic categories of machine learning algorithms: regression and classification. Regression always predicts continuous numerical values, while classification always predicts discrete, categorical numbers. Here, the 1, 2, 3 simply mean class one, class two, class three, or category one, category two, category three. So the first thing I want to make clear is that logistic regression is a classification method, despite the fact that it is called logistic regression. I know this name is very, very confusing, but just keep in mind that this is a classification algorithm rather than a regression algorithm.

10:00 Here are just a few examples of applications of logistic regression. For emails, we want to classify emails into spam or non-spam, so basically a yes-or-no question. Another application of logistic regression is online transactions: we want to detect whether an online transaction is fraudulent or not. Again, this is a yes-or-no question: yes, it is a fraudulent transaction, or no, it is not. Another application is classifying whether a patient's tumor is malignant or benign. For self-driving cars, logistic regression is also useful, because it can classify data into pedestrian or not. The cat classification problem that I used last week to introduce supervised machine learning can also be solved using logistic regression; again, here we're asking whether it's a cat or not. In applications to geoscience problems, logistic regression has also found its use, for example in salt body detection, where you might want to predict whether the data contain salt or not, so your answer is either salt or no salt. And in this week's lab exercise, you're going to classify about 10,000 seismic traces into two classes, good and bad. So you can see that in all these examples logistic regression serves as a binary classifier: we only have two classes, yes or no, or category one and category two.

12:32 Logistic regression is a supervised learning algorithm, so here is just a recap of supervised learning. Recall that for supervised learning, our training data set consists of two parts: the input variables X and the labels y. You can think of the labels y as output variables or the true answers. What the learning algorithm is trying to do is to come up with a mapping function F that maps the input variables to the outputs. Once this mapping function is learned, the next time a new instance X comes in, we can just apply the learned F to this new data to make a prediction. For logistic regression, the output y will always be either zero or one. We can simply interpret zero as the negative class: it's not salt, it's not a fraudulent transaction, it's not a cat, it's not spam. And we can understand category one as the positive class: yes, it is salt, it is a cat, it is a fraudulent transaction.

14:27 This is the linear regression model that we have already seen several times; you also implemented it in the notebook exercise. Here I'm using the life satisfaction prediction as an example. We have one input variable, one feature, which is the GDP per capita. In other words, we're trying to predict the life satisfaction h(x) based on the single feature GDP x. Here theta_0 and theta_1 are the model parameters we're trying to learn from the training data. In this case, we only have one input variable, or one feature, so this is also called linear regression with one feature. I just put down here that you can understand an input variable as a feature, so if there is one input variable, that means we only have one feature.

15:34 We can also generalize this linear regression with one feature to linear regression with multiple features. The basic idea is very simple: instead of having only one input variable, here we have quite a few input variables, x_1, x_2, up to x_n, and each one of them represents one input feature. For example, x_1 might be the GDP, x_2 maybe health care, x_3 maybe education, x_4 the air quality, and so on and so forth. So the problem we have is to predict life satisfaction based on all these features x_1 through x_n, and here theta_0, theta_1 (sorry, I should have written down up to theta_n) are the model parameters we're trying to learn from training. Multiple input variables correspond to multiple features; that's what I put down here. So we can write it in a more general form on this page: h(x) is our prediction, and x_1 is feature one, x_2 is feature two, x_3 is feature three, up to x_n.

17:07 It turns out that we can use linear algebra, specifically matrix-vector multiplication, to simplify this model and summarize it in the more compact form here: the theta vector transposed times the feature vector x, where theta is defined as an (n+1)-by-one vector and x is also an (n+1)-by-one vector.
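To make this concrete, here is a minimal NumPy sketch of that vectorized form; the parameter and feature values below are made up purely for illustration.

```python
import numpy as np

# Hypothetical learned parameters: theta_0 (intercept) plus n = 3 feature weights.
theta = np.array([4.8, 0.02, 0.3, -0.1])   # (n+1,) vector

# One example with x_0 = 1 prepended, followed by its n = 3 feature values.
x = np.array([1.0, 37.0, 2.5, 0.8])        # (n+1,) vector

# Prediction: h(x) = theta^T x
h = theta @ x
print(h)
```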

17:46 So for logistic regression, remember that we said this is a binary classifier, so our prediction will be yes or no. In other words, the prediction will always be either zero or one. Making the prediction exactly zero or one turns out to be really difficult, so a similar thing we can do is to make sure that our predicted value falls within this range, the range from 0 to 1. But the problem with the linear regression model is that its output can be any value: it can go from minus infinity to infinity. For example, suppose you put the feature value on this axis; obviously the prediction from a learned linear regression model, this straight line, can be anywhere from minus infinity to infinity. So we have to do something different in order to make sure that the output always falls within our range.

19:18 That is where the logistic function comes into play. The logistic function is defined this way: 1 / (1 + e^(-x)). This is also called the sigmoid function. If we plot this function, this is what it looks like. The first thing to notice is that this is a smooth function. Also, you notice that as the input variable x goes to infinity, the output value approaches one; conversely, if the input variable keeps decreasing, the output value gets closer and closer to zero; and when the input variable is zero, the output value is 0.5. A nice thing about this logistic function is that it can map any real number, from minus infinity to infinity, to a real number within this range. So no matter how large or how small the input variable x is, g(x) will always be within this range between zero and one.
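As a quick sketch, the logistic (sigmoid) function and the limiting behavior just described can be written in a few lines of Python:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(-10.0))  # close to 0
print(sigmoid(0.0))    # exactly 0.5
print(sigmoid(10.0))   # close to 1
```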

21:11 So how do we go from linear regression to logistic regression? Well, this is the linear model we have used for linear regression, and I also mentioned that its output value can be anything from minus infinity to infinity. Remember, we also talked about how the logistic function can map any real number to a real number within this range. So to make sure that the output value always falls within this range, we simply apply the logistic function to this product, theta transpose x. That is what we get here: this is the logistic function, the same function that you saw a few slides ago. One property of this function is that the output will always be within this range, which is exactly what we want. Another good thing that comes with this property is that we can easily interpret the output h(x) as a probability. The reason is simply that probabilities fall within this range, so the interpretation in terms of probability is just a natural thing to do.

22:55 For example, that's what I wrote down here: we can interpret h(x) as the estimated probability that y = 1 given x. Here is a very simple example on email spam detection. My input feature vector is a two-by-one vector, and the feature I'm using is simply the number of key words. People have found that spam emails have something in common, namely key words: for example, if you see something like "free", "cash", or "amazing", these are words that are common in spam emails. So one way to detect spam emails is simply to detect the existence of these key words, or to count how many of them appear; there is a long list of such key words, and the more key words from the list you see in an email, the more likely it is spam. So in this case, suppose we train a logistic regression model based on this input feature, and assume that the output value from logistic regression is 0.82. We can simply interpret this output value as: there is an 82% chance that this email is spam.

24:45 Well, some people don't like probabilities; if you tell people there is an 82% chance their email is spam, that may not mean much to them. People may simply want to know whether the email is spam or not, just a simple yes-or-no answer. In that case, a simple thing we can do is the following: if the output value from logistic regression is larger than or equal to the cutoff point 0.5, we predict y = 1; if the value is less than 0.5, we simply predict y = 0. So that's pretty much the basic idea behind logistic regression.
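A minimal sketch of that thresholding step, with a made-up probability standing in for the classifier's output:

```python
def predict_label(probability, threshold=0.5):
    """Turn an estimated P(y = 1 | x) into a hard 0/1 prediction."""
    return 1 if probability >= threshold else 0

# Hypothetical output of the spam classifier from the example above.
p_spam = 0.82
print(predict_label(p_spam))  # 1, i.e. predicted spam
```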

25:45 Next, I want to spend some time explaining what this means and helping you develop an intuitive understanding of what logistic regression does. This is what we talked about just one second ago: this is how we do classification with logistic regression. If the output is larger than or equal to 0.5, we predict the positive class; if the output value is smaller than 0.5, we predict y = 0. And again, this is what the logistic function looks like.

26:39 So let's take a closer look at this logistic function. My question for you is to think about when this happens. Remember that h(x) is defined as g(theta transpose x), and the blue line shows what the logistic function g(z) looks like; here we can simply define z = theta transpose x, so h(x) = g(z). If you look at this plot closely, you have probably already found out that whenever theta transpose x is larger than or equal to zero, we have h(x) larger than or equal to 0.5. Similarly, if h(x) is smaller than 0.5, that means g(z) is smaller than 0.5, which means z is smaller than zero; and remember we defined z as theta transpose x, so that means theta transpose x is smaller than zero. So saying that whenever h(x) is smaller than 0.5 we predict y = 0 is equivalent to saying that if theta transpose x is smaller than zero, we predict y = 0. That is what I wrote down here: if theta transpose x is larger than or equal to zero, we predict y = 1; if theta transpose x is smaller than zero, we predict y = 0. What I wrote down here is exactly equivalent to this part.

29:53 Okay. Next, I want to explain what this means, and to do that I'm going to use a simple example. Here we have our training set. Again we use the red crosses as the positive class and the circles as the negative class; in other words, the red crosses belong to class one and the circles belong to class zero, and we want to classify this data using logistic regression. In this case, because we have two input features, our model looks like this: theta_0 + theta_1 x_1 + theta_2 x_2. You can also express this as theta transpose times x, where theta is the vector (theta_0, theta_1, theta_2) and x is (x_0, x_1, x_2), and x_0 will always be equal to one.

31:23 We haven't talked about how the algorithm learns the model parameters; don't worry about that yet. But assume that we have implemented logistic regression and we have learned the model parameters, and that they turn out to be theta_0 = -3, theta_1 = 1, theta_2 = 1. In other words, we predict y = 1 whenever -3 + x_1 + x_2 >= 0. We can move the minus three to the right-hand side, so that becomes x_1 + x_2 >= 3. Conversely, we predict y = 0 if -3 + x_1 + x_2 < 0; again, moving the minus three to the right-hand side, that becomes x_1 + x_2 < 3.

33:05 To explain what this means, let me write down this equation in a form that is probably much more familiar to you: x_1 + x_2 = 3. This is simple: if we plot it on this x_1-x_2 plane, it is a straight line passing through three on the x_1 axis and three on the x_2 axis. It turns out that the half space on the top right can be mathematically summarized as x_1 + x_2 >= 3, and if you look at what we wrote down before, that simply means that we will predict y = 1 if the data point is located in this half space, the top right half space. Similarly, the half space to the bottom left can be mathematically summarized as x_1 + x_2 < 3, which is what we have here, and we will predict y = 0 whenever the data point is located in that half space.

35:03 Notice that this straight line, x_1 + x_2 = 3, separates the positive class from the negative class. So we refer to this straight line as the decision boundary: it is the boundary between the positive class and the negative class. If you didn't quite follow me, please feel free to pause here and spend some time thinking about these equations. The important point I want you to understand from what I have done here is that this equation corresponds to a straight line, this inequality corresponds to the half space to the top right, and this inequality corresponds to the half space to the bottom left. That will help you understand what I did here.
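Here is a small sketch of that decision rule with the parameters from this example (theta_0 = -3, theta_1 = 1, theta_2 = 1); the two test points are made up to show one prediction on each side of the boundary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-3.0, 1.0, 1.0])          # theta_0, theta_1, theta_2

def predict(x1, x2):
    x = np.array([1.0, x1, x2])             # x_0 = 1
    z = theta @ x                           # theta^T x
    return 1 if sigmoid(z) >= 0.5 else 0    # same as: 1 if z >= 0 else 0

print(predict(2.5, 2.5))  # x1 + x2 = 5 >= 3 -> class 1 (top-right half space)
print(predict(1.0, 1.0))  # x1 + x2 = 2 < 3  -> class 0 (bottom-left half space)
```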

36:44 Now let's consider a more complicated example. Suppose again this is my training data, with the positive class highlighted with red crosses and the negative class with open circles, and again we have two features. This is the logistic regression model that you have seen on the previous slides. But you have probably already realized that logistic regression based on this model will not be able to find the boundary between the positive class and the negative class, because this model can only learn linear boundaries like the previous example. So the question now is: can we develop nonlinear decision boundaries using logistic regression?

37:59 Well, the answer is yes, and one way to do that is to add higher-degree polynomial features, like x_1 squared and x_2 squared. This should look familiar to you, because last time, when we talked about the remedies for underfitting, one of the things we can do when we are underfitting the data is to add more features, for example higher-degree polynomial features, which make the learning model more capable of capturing nonlinear behavior, nonlinear boundaries, in the data. So here we do a similar thing: we simply add more features, polynomial features of higher degree, in order to capture the complicated decision boundaries. In this case, again, we can rewrite the term inside the parentheses as theta transpose x, where theta is the vector (theta_0, theta_1, theta_2, theta_3, theta_4) and x contains x_0, x_1, x_2, x_1 squared, and x_2 squared.

40:06 With this hypothesis function, it basically means that we will predict y = 1 if... sorry, I forgot to mention one thing: suppose after learning we find that the model parameters are theta_0 = -1, theta_1 = 0, theta_2 = 0, theta_3 = 1, theta_4 = 1. That means we predict y = 1 if this quantity is larger than or equal to zero. To understand that, I will rewrite this in a slightly different form: x_1 squared plus x_2 squared is equal to or larger than one. You probably already recognize that if I plot this on the x_1-x_2 plane, it corresponds to the circle x_1 squared + x_2 squared = 1, and all the space outside this decision boundary can be mathematically summarized as x_1 squared + x_2 squared > 1. So what we have developed so far can be summarized as follows: we will predict y = 1 whenever the data point falls outside this circle, and in this case this circle is our decision boundary.

42:12 It turns out that we can keep adding polynomial features to learn more and more complicated decision boundaries. For example, if we go beyond this simple example and keep adding polynomial features, say x_1 squared times x_2, x_1 squared times x_2 squared, x_1 cubed times x_2, and so on, up to the fourth degree, then because of these higher-order polynomial features, logistic regression in this form is capable of learning more complicated boundaries, for example something like this.
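A minimal sketch of the circular-boundary case, using the parameters quoted on the slide (theta = (-1, 0, 0, 1, 1) for the features 1, x_1, x_2, x_1^2, x_2^2); the test points are invented for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])

def predict(x1, x2):
    # Feature vector with the added polynomial features x1^2 and x2^2.
    features = np.array([1.0, x1, x2, x1**2, x2**2])
    return 1 if sigmoid(theta @ features) >= 0.5 else 0

print(predict(1.5, 0.0))  # outside the unit circle -> class 1
print(predict(0.2, 0.3))  # inside the unit circle  -> class 0
```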

43:11 Okay, so the next thing I want to talk about is the cost function for logistic regression. This is an important thing because, remember, two or three weeks ago when we talked about machine learning, we talked about what learning means; in many, many cases, learning means that we want to minimize a cost function. So next I want to spend some time talking about the cost function for logistic regression. Here I summarize our training set as this list, where we have the input data and the labels: the first input and its label, the second input and its label, and so on. Suppose we have m examples, and each one of these inputs x^(i) is an (n+1)-by-one vector, because we have n features plus x_0, which is always equal to one. One note: the materials in the following slides, from slide 25 to slide 33, explain how to develop the cost function for logistic regression. Again, this is beyond the scope of this class, so please feel free to skip them; but if you want to learn more about cost functions as well as optimization, the following materials will be useful.

45:16 So, in order to develop a cost function for logistic regression, let's consider the following. This is the cost function that we have used for linear regression, right? Here h_theta(x^(i)) is the prediction for the i-th data point, and y^(i) is the label, the true answer, for the i-th data point. So this term measures the difference between our prediction and the label; as you see, we square these differences and sum them over all of our training examples. This is the cost function we have been using for linear regression. But it turns out that this cost function is not a good one for logistic regression, and the reason has something to do with convex and non-convex functions. So what I want to do next is to spend some time explaining this important concept, which matters a lot for optimization.

46:46 When it comes to optimization, you have two types of cost functions: convex and non-convex, and it turns out that these two types have very different behaviors. For example, a non-convex cost function might look something like this, with many ups and downs, so it has many, many local minima. Each one of these is a local minimum, and this one is probably the global minimum, because it is the smallest among all of the local minima.

47:40 You can see that with a non-convex function, because of the existence of all these local minima, the solution you find depends on where you start your optimization from. For example, remember that with gradient descent we always initialize our model parameters; if our initial model parameters are at this place, then you can imagine that by running gradient descent we will eventually end up somewhere here. So we will find a local minimum solution, but that is not the best solution we want; the best solution comes from this global minimum. Because of the existence of these many local minima, and because of the way gradient descent works, chances are that you will end up in a local minimum.

49:04 If our cost function is convex, then it will look something like this: a bowl-shaped cost function. The good thing about a convex cost function is that it has only one minimum, so any solution you end up with is the global minimum. It does not matter where you set your initial point, where you start gradient descent from: if you start from here, you end up at this global solution, and if you start from over here, you are also going to end up in the same global minimum, as long as your learning rate is not too large.

49:53 So for optimization, whenever possible, we would like to work with a convex cost function. The reason is that we don't want to get stuck in a local minimum: while a local minimum is a solution to our problem, it is not the best one; the best solution always comes from the global minimum.

50:29 With that knowledge in mind, now let me walk you through the way we develop a convex cost function for logistic regression. To make things simple, let's consider only a single training example x and its associated label y. The basic idea for developing the cost function is this: if our prediction h_theta(x) is very different from the true label, we want to penalize this wrong prediction heavily in our cost function. Conversely, if our prediction h_theta(x) is very close to the true label, we don't want to penalize the prediction; in other words, we want to penalize this good prediction as little as possible. You can simply understand the cost function as a way to impose different penalties on different predictions.

51:52 function that having this form looks So let me let me rephrase

52:05 So this is a basic idea for a convict, a cost function for

52:10 regression. It turns out the one of actually implementing this idea is to

52:17 a logistic, usually cost function that has this form here. So now

52:24 know this looks a little bit a bit complicated to you, So let

52:32 explain what this means. So let's just consider the first case when the

52:38 labels want, when the when the answer is one. If that's the

52:45 , if the twenties line the cost social ways on this, why could

52:56 is a menace log of each other's ? Well, we can plot up

53:05 function in this plane in this it would look something like so this

53:14 horizontal axis correspondent to 18 of So the manners log it,

53:22 Um, well, we can figure out by looking at what happens when

53:28 detective actually won. When you With one, this is zero.

53:35 when age state of X, he's to zero. This become positive

53:44 So it would look something like So this is Matt Manners. Lock

53:55 theatre Max. So what this means these at Wednesday's when this prediction is

54:15 . When this prediction is one. means that our prediction is the same

54:21 the transit. With Jessica. It's , you know what in this

54:27 because the prediction is very close to label, so we don't want to

54:32 this prediction. Therefore, we have zero here and Congress. The prediction

54:39 very different from the true answer in case meaning that's the prediction is close

54:43 zero. In this case, the is so different from our our Trans

54:49 with one. So we want to a very, very high penalty on

54:56 prediction. In this case, when one's predicting that one Z, the

55:05 actually goes to infinity so that that's I did. I'm here when the

55:10 , when this prediction is one the of zero and when the predictions close

55:22 zeros cost infinity, so it captures intuition that, um, ive the

55:28 is different from our label. We to penalize this'll any other than have

55:36 . So now that's a case for equal to one. So now let's

55:41 Now let's take a look at what happens to the cost when y = 0. When y = 0, we are looking at a cost function that has this form, minus log of (1 - h_theta(x)), and again we can plot it by considering the following. The horizontal axis is again h_theta(x). When h_theta(x) equals zero, this cost is zero, and when h_theta(x) equals one, the cost, minus log of (1 - h_theta(x)), goes to infinity; so it looks something like this. When our true label is zero and our prediction is zero, our prediction exactly matches the label and we don't want to impose any cost or any penalty on it, so the penalty is zero. And when the prediction is one while the true answer is actually zero, we want to penalize this prediction heavily. So that is essentially what this cost function does: it penalizes the prediction heavily when the prediction is different from the label, and it penalizes the prediction much less heavily if the prediction is similar to the label.

57:43 So this turns out to be the cost function that we have developed for logistic regression. It turns out we can also rewrite this cost function in a more compact form that looks like this; this equation is exactly the same as the one above. Remember that this is only the cost function for one single training example; for many training examples, we simply sum them up. So this part is the same as this part, but for multiple training examples we need to sum the cost over all the training examples. If you want to be more rigorous, we should also put the superscript i above x. So this is the final form of the cost function for logistic regression. A good thing about it is that this cost function is convex, meaning that there is only one minimum: whatever solution you end up with is the global minimum solution, the best solution.
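A minimal NumPy sketch of that cost, J(theta) = -(1/m) * sum over i of [ y^(i) log h_theta(x^(i)) + (1 - y^(i)) log(1 - h_theta(x^(i))) ]; the tiny data set and the epsilon guard against log(0) are my own additions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Logistic regression (cross-entropy) cost over m training examples.
    X has one row per example, with x_0 = 1 in the first column."""
    m = len(y)
    h = sigmoid(X @ theta)        # predictions for all m examples
    eps = 1e-12                   # small constant to avoid log(0)
    return -(1.0 / m) * np.sum(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

# Tiny made-up training set: columns are (x_0 = 1, x_1); labels are 0 or 1.
X = np.array([[1.0, 0.5], [1.0, 2.0], [1.0, 3.5]])
y = np.array([0, 1, 1])
print(cost(np.zeros(2), X, y))    # at theta = 0 every h = 0.5, so the cost is log(2)
```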

59:31 Okay, so this is the cost function as you just saw, and remember that learning is all about minimizing this cost function: by minimizing this cost function we can obtain the optimal model parameters. Notice that this cost function is differentiable, because every component in it is a differentiable function, so it is very straightforward to calculate its gradient. This is the gradient of the cost function with respect to the model parameters theta, and the gradient is defined as an (n+1)-by-one vector. Because we can compute the gradient easily, we can apply batch gradient descent, or stochastic gradient descent, or the mini-batch gradient descent I talked about, to train our logistic regression model.

60:57 That training is an important part of machine learning. Once the learning is completed, we will have obtained the learned model parameters theta_0 through theta_n. Next time, when new data x comes in, we can simply make a prediction on this new x by calculating h_theta(x), where the thetas are the model parameters we have learned during the training phase.
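Here is a small sketch of batch gradient descent for this cost, using the standard gradient (1/m) X^T (h - y); the learning rate, iteration count, and toy data are made-up choices, not values from the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, learning_rate=0.1, n_iters=5000):
    """Batch gradient descent for logistic regression.
    X has one row per example, with x_0 = 1 in the first column."""
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(n_iters):
        h = sigmoid(X @ theta)              # current predictions
        gradient = X.T @ (h - y) / m        # gradient of the cost w.r.t. theta
        theta -= learning_rate * gradient   # gradient descent update
    return theta

# Tiny made-up training set with one feature (plus the constant x_0 = 1).
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 3.0], [1.0, 3.5]])
y = np.array([0, 0, 1, 1])
theta = train(X, y)
print(sigmoid(X @ theta))   # predicted probabilities for the training examples
```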

61:36 Okay, next: the implementation of logistic regression using scikit-learn. To do that, I'm going to open a demonstration in a Jupyter notebook, so I'm going to my Azure notebook and walking you through a very simple example of logistic regression. If you go to my notebook account and click on this week's lab exercise, there is already a Jupyter notebook on logistic regression. In this notebook, I just use the example data set called the iris data set to illustrate how you can implement logistic regression using scikit-learn. The iris data set is a very famous public data set in machine learning; it contains the sepal and petal length and width values of 150 iris flowers from three different species: setosa, versicolor, and virginica. This is a picture of these three different iris species. In this case, our task is to train a logistic regression model, a binary classifier, to classify flowers into virginica or non-virginica based on two features: petal length and petal width.

63:41 The first thing you want to do is to import the NumPy package. Let me restart and clear all output so that it is cleaner and you can see what each line of the code does. You want to import NumPy as np, and you also want to import the iris data set. Here is what you can do: import the datasets module from scikit-learn, and then iris = datasets.load_iris(). That is how you load the iris data set, so let's go ahead and run it.

64:31 Okay, now the iris data set is in our workspace. If you want to take a look at everything that's in it, you can simply type iris and run it. This is how the data set looks; it's a little bit messy, but if you look at all the information here you can recognize that this iris data set is a dictionary. Remember that the dictionary is one of the Python data types, and a dictionary always consists of a number of key-value pairs; that is what I wrote down here in this cell. The iris data set consists of five key-value pairs, and if you want to find out which keys are included, you can run this cell. We have the keys data, target, target_names, feature_names, and the description key DESCR, which is just a few sentences describing the data set. The data key contains the data, an array with one row per instance and one column per feature; in this case we have four features, the petal and sepal length and width, so we have four columns. The target key contains the labels, feature_names contains the names of the features, and target_names contains the names of the target classes.
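A minimal sketch of what that loading and inspection code likely looks like (the exact cells in the notebook may differ):

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

print(iris.keys())             # data, target, target_names, feature_names, DESCR, ...
print(iris["feature_names"])   # sepal length/width and petal length/width (cm)
print(iris["target_names"])    # ['setosa' 'versicolor' 'virginica']
print(iris["data"].shape)      # (150, 4): 150 flowers, 4 features
```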

66:10 Now let's take a closer look at what each key corresponds to. If you want to find out what the target values are, you just type iris with the target key in square brackets and run it; that will give you the values corresponding to the target key. In this case, you notice that the values corresponding to this key are simply zeros, ones, and twos. Again, these are just discrete numerical values; they simply mean class zero, class one, class two. If you want to know what each class, each target value, actually corresponds to, you can run this code with target_names: it tells you that class zero corresponds to setosa, class one is versicolor, and class two is virginica.

67:31 If we want to find out the feature names, these are the names of the features in the iris data set: we have four features, the sepal length and width and the petal length and width. Here is the description of the data set, which you can read if you want to learn more about it. The actual training data, the input data, looks like this: you can see this matrix has 150 rows and four columns. We have 150 rows because we have 150 flowers, and we have four columns because we have four features.

68:26 The next thing I want to do is to prepare the training data. In this case, we want to use the petal length and petal width to make predictions, and they correspond to the third and fourth columns of the data array. That is what you see here: I just assign the third and fourth columns from this data array to the NumPy array X. And this is our target, the thing we are trying to predict. Remember that the labels in the iris data set are simply zeros, ones, and twos; essentially, what this line of code does is to convert the resulting Trues and Falses into zeros and ones, and that will serve as our labels y.
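A sketch of that preparation step, assuming the conventional column indexing and the virginica class label 2 (check the notebook for the exact code):

```python
from sklearn import datasets

iris = datasets.load_iris()

# Petal length and petal width are the third and fourth columns of the data array.
X = iris["data"][:, 2:4]

# Binary labels: True/False for "is virginica" (class 2), converted to 1/0.
y = (iris["target"] == 2).astype(int)

print(X.shape)   # (150, 2)
print(y[:5], y[-5:])
```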

69:44 If you want to use scikit-learn to train the logistic regression model, again we need to import the module so that we can use it in our workspace. The way to do that is simply to write this code: from sklearn.linear_model import LogisticRegression, because logistic regression belongs to the category of linear models in scikit-learn. With the data prepared and LogisticRegression imported, the training part is very, very simple if you use scikit-learn. This LogisticRegression is what we just imported from scikit-learn; don't worry about the parentheses, inside which there are a few hyperparameters that a user can specify for the logistic regression. For now, just leave them at their defaults. In this line of code I create my logistic regression classifier and give it a name. Now, with that, I'm ready to do the training part, and it is very, very easy: the training is done by taking the name of my logistic regression classifier, calling .fit(), and passing in the input variables and the output labels. You run it, and that's it; that's all you need to do in order to train a logistic regression model. The output here just shows you the model's hyperparameter settings; most of you don't need to worry about it, because the default values are good enough for our purposes.

72:02 primitive relatives are humanity's is good for purposes If you want to find

72:08 the learns more the premises from within thing is it there? Oh,

72:13 is essentially the intercept. You just the You just write down the name

72:22 these this regression classifier thought intercept on skull Run it ! And so that

72:29 the moral parameter Save the state a learned from largest regression. And

72:36 if you want to find other all other equivalents, for example, it

72:39 to three in this. In this , we only have two features,

72:43 we won't have to. The one two the way to the finals these

72:49 is to using the coat here. think this is the name of my

72:56 Regan classifier Thought co you feed on , Oh, yeah, on the

73:03 . What you found, too. you want a summary statistics for your

73:09 regression, for example, if you to find out the overall accuracy of

73:12 predictions from largest regression, you can call this method called Scott. So

73:20 this case, the prediction accuracy's 92 6% which is not bad, since

73:28 haven't says where were mostly using the default more the printer matavz against.
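Continuing with the log_reg classifier fitted in the sketch above, inspecting the learned parameters and the training accuracy would look roughly like this:

```python
print(log_reg.intercept_)   # learned theta_0 (the intercept)
print(log_reg.coef_)        # learned theta_1, theta_2 (one weight per feature)
print(log_reg.score(X, y))  # fraction of training examples classified correctly
```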

73:37 So that's it, that's the training part. And if you want to predict, you can call the methods associated with logistic regression: predict, or predict_proba, which gives you a real-valued number in the range from 0 to 1, where the argument is the new data that you want to make a prediction on.
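Again continuing with the fitted log_reg from above; the new flower's petal measurements here are made up for illustration:

```python
import numpy as np

x_new = np.array([[5.0, 1.8]])       # hypothetical petal length and width in cm

print(log_reg.predict(x_new))        # hard 0/1 prediction
print(log_reg.predict_proba(x_new))  # [P(not virginica), P(virginica)] for each row
```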

74:21 This last part of the code is for the visualization. Okay, here you go. These points, all the crosses and dots, are our training data; this dashed black line is the decision boundary, and these straight lines in different colors are the contour lines for the predictions. Okay, so that's it for implementing logistic regression in scikit-learn: very easy, very simple. For the training part, this is all you need to do: the name of your classifier, dot fit, and all the training, all the mathematics, all the optimization, is taken care of by this simple piece of code. Okay, so that's all for today. Thank you for your attention. If you have any questions, you can send me an email or you can ask them in the discussion forum.
