CS229 - Machine Learning - Lecture 7 - Support Vector Machines

joel | Uncategorized | Tuesday, October 28th, 2008

Functional margin

If we use the classification scheme h(T,x) = sigmoid(T’x) > 0.5, then we are really just checking to see if T’x > 0. The greater T’x is away from 0, the more “confidence” we have in the prediction.

  • Functional margin:
  • Geometric margin: distance between the separating hyperplane and the nearest point in the dataset.

Notation for Geometric margin:

  • class labels y in {-1, 1}
  • h(w, b, x) must output {-1 or 1} so choose g(x) = 1{x>0}
  • w is in R^n
  • x is in R^n
  • b is a real number
  • h(w,b,x) = g(w’x + b)
  • “Functional margin for ith example” gamhat_i = yi (w’xi + b)
    This is the same as the distance of the prediction from the separating line
  • Margin for the training set is the worst case margin of all examples

Note that the ‘w’ term here can scale arbitrarily. So we also need a constraint that |w|=1. Or we could choose the constraint that gamhat=1.

Geometric Margin

Geometric margin gamma_i is just the distance between the points x_i and the decision boundary measured in the space of x.

  • gam_i = (w’x+b) / ||w||

This doesnt change as w and b are scaled.

So we can write a linear programming problem to optimize the geometric margin gamma

  • if |w|=1 then gamhat = gamma
  • by scaling w,b we can set gamhat without changing gamma

So here’s a non convex optimization problem:

  • input constants: xi, yi
  • variables: w, b
  • maximize gam
  • constraint all i: yi(w’xi+b) > gam
  • constraint: ||w||=1 {non-convex}

Optimal Margin Classifier

But if we want to make this into a convex optimization problem:

  • input constants: xi, yi
  • variables: w, b
  • maximize gamhat/||w|| {same as geometric margin}
  • constraint all i: yi(w’xi+b) > gamhat
  • set gamhat = 1

Which is the same as minimizing ||w||, which makes this problem quadratic.

Lagrange Multipliers

Here’s a quick recap on lagrange multipliers.

Let’s say we want to minimize f(w) subject to some constraints:

  • min f(w)
  • st gi(w) <= 0 {i in 1..k} # NEGATIVE!
  • st hi(w) == 0 {i in 1..l}

The “generalized Lagrangian” is

  • L(w, alp, beta) = f(w) + Sum{i:1..k} alph_i gi(w) + Sum{i:1..l} beta_i hi(w)

where alpha_i and beta_i are the Lagrange multipliers. We seek solutions that meet these constraints:

  • dL/dw = 0 # solution is a local max wrt
  • dL/dalpha = 0 {true if and only if gi(w) = 0 }
  • dL/dbeta = 0 {true if and only if hi(w) = 0 }

Lagrange Duality - Primal optimization problem

Now we pose the primal optimization problem. Find the value of w which minimizes the maximum value of L(w,alpha,beta) when we pick the worst possible value of the lagrange multipliers.

Theta_p* = Min_{w} {{ Max{alpha>=0,beta} L(w,alpha,beta) }} = min_w Theta_p(w,alpha,beta)

Note that this is a solution because if the constraints are violated, we can choose

  • If gi(w) > 0 => Theta_p(w) = Infinity
  • If hi(w) != 0 => Theta_p(w) = Infinity

so the only way to prevent L(w,alpha,beta) from blowing up is if the constraints are satisfied.

Lagrange Duality - Dual optimization problem

We can also search for the maximal value of the lagrange multipliers over the mins.

Theta_d* = Max_{alpha, beta} {{ Min{w} L(w,alpha,beta) }}

Note that

  • Theta_d* <= Theta_p*

The dual problem is actually a little easier to solve.

The Sensation of Hunger

joel | life | Tuesday, October 21st, 2008

I had a conversation with someone the other day about severely obese people and the perception, right or wrong, that fat people are lazy or undisciplined.

I guess I think that it’s not a cut an dried case of willpower.  But even if it were consider this: is it easier for a person who is 100 pounds overweight to loose five pounds, or for a person who is five pounds overweight to loose five pounds?   After all, if you’re 100 pounds overweight there is a strong possibility that something is wrong with your biochemistry.  If you are only five pounds overweight, you are probably a normal person.  But those of us who are five pounds overweight might think that those last five pounds are the hardest to loose.

Consider this.  A can of soda has about 150 calories in it.  If you burn 2500 calories per day, this will get you by for a little more than an hour.  Say you’re on a diet but you get hungry at a certain point, and the hunger impulse tells you that a can of soda would be the bees knees.  If you say no to that impulse you’ve saved 150 calories.  Do that 23 more times, and you’ll have saved the equivalent of a pound of fat.  To loose five pounds, you’d have to say no to about 100 cans of soda.

Okay, some of you may be thinking, that’s easy because I dont drink that stuff anyway.  Well I unfortunately rely on the stuff to get through the workday, an unfortunate habit that started in all-night coding sessions in grad school when the only food in the lab was a vending machine.  The stuff is great for keeping mental focus, but there are side effects.

Today I asked myself to fast a bit in the afternoon and asked what the sensation of hunger feels like, and I realized something important.  The impulse to eat while working isn’t really hunger at all - at least not the hunger associated with an empty stomach.  I actually noticed that my cravings had at their source a general feeling of irritation and fatigue.   It was a sort of diffuse itchy tingling in my skin and throughout my body, which might have been resolved with a short nap instead of a snack.

Is it possible that the sugar and caffeine actually act as an analgesic?

I reasoned that if I stayed with this feeling of hunger long enough that I might become desensitized to it, but that experiment was interrupted by something else.  Maybe tomorrow I will try fasting some more and see what deeper insight I can get into hunger.

 

CS229 - Lecture 1

joel | Uncategorized | Tuesday, October 14th, 2008

Taking Andrew Ng’s machine learning class at Stanford this quarter.  Here are some memorable points from the first lecture.

Definition of machine learning: Program learns from experience E on some performance measure P if P improves with experience E.

“SUPERVISED LEARNING

Given set of example and correct answers, build a predictor that estimates new datasets

Examples:

  • regression (house prices)
  • classification/labeling (spam)

“UNSUPERVISED LEARNING”
given a dataset - find structure (k means/clustering)
examples image segmentation, cocktail party problem (ICA)

“REINFORCEMENT LEARNING”
use reward/punishment methodology to perform some task - example robots learning to walk

Turning points

joel | Uncategorized | Sunday, October 12th, 2008

The market has been through some major shifts in the last couple of months, and it makes sense to look at historical prices, if only to get an idea what the time constants underlying mob behavior really are.  If you zoom out over the last 16 years or so and look at the share price for the SP500, you get a chart that looks like this:

Turning Points in the SP500

 The red lines are simple linear trends on the log data, so they represent constant exponential growth and decay with time.  It is pretty amazing that the major trends in the data can be explained by just four changes in the long-term growth and decay rate.  Let’s examine each of these in turn and consider what factors might be driving the stock market’s multi-year cycles.

Turning Point A: January 1995

The first major turning point is in January of 1995, the second term of the Clinton presidency.   There are several factors which could contribute:

  • During this time Netscape and Dell were ramping up penetration of PCs and the internet in American households. 
  • May 2003 - capital gains cut to 15%
  • The federal funds rate was also rising from a long period of easy money at 3% to 5% at the end of the year.  1995 was also marked by a spike in treasury yields.
  • And Newt Gingrich and the Republicans had reclaimed the house with the so-called “Contract with America”. 
  • Formation of the WTO.

Turning Point B: 2000

In early 2000 the SP500 turns south to the tune of 15%/year.

  • The start of the Bush Presidency
  • The beginning of dearer oil (at slightly higher rates of $24/barrel)
  • Higher interest rates (federal funds rate of 6.5%)

Interesting although the 9/11 attacks may have contributed to the continued slide, the downward trend in the economy was already present in mid 2000.

Turning Point C: late 2002

  • Gulf war II begins
  • Federal funds rate lowered to 1.25%, partially in response to 9/11 and downturn

Turning Point D: late 2007

  • Financial crisis
  • Mortgage meltdown

It guess the major open question is how do these events correlate to the flows of money which amount to trillions of dollars and how to we detect that the tide is receding or advancing…

 

 

Counting Rhythm - Soulfege

joel | the band | Sunday, October 12th, 2008

I often find that I get lost when counting rhythms.   When I was in band, they taught us to count this way:

one-e-and-a two-e-and-a three-e-and-a four-e-and-a

The problem comes when you skip a lot of downbeats, it’s hard to remember which beat you’re on.  I’m going to try counting with this system for a while:

one-e-and-a

two-ti-tan-tuh

three-de-dan-duh

four-ve-van-vuh

I used the version of the closest consonant that is easy to say fast.  

The Value of Trust

joel | Uncategorized | Tuesday, October 7th, 2008

The recent financial crisis is all about trust.  Trust in banks, trust in business, trust in government and trust in each other.

Its obvious that abuse of trust led to many of these problems.  If, for example, morgage brokers hadn’t created so called “stated income loans” many fewer mortgages would be in default.  Borrowers on the other hand should have heard alarm bells going off when participating in this scam, but I guess people are just dumb.

Now the fallout from this is that many loans made throughout the financial system are turning bad and the money markets are frozen because banks dont even trust each other.  In times like this, people buy “safe” government bonds in the belief that the government can always print more money if it needs to.

Which really isnt as bad as it sounds.  If the government is willing to act as a lender of last resort, it is also capable of being the bank of last resort for depositors.  Since rates on short term govt bonds are close to zero this kind of makes sense.

Another problem is that if the government really becomes a player in the banking business, what is to prevent it from stomping out private banks?  After all private banks have to follow government regulations, but who regulates the treasury?

What happens, for example if the Treasury takes its deposits and makes bad loans?  This is basically the way Fannie Mae and Freddie Mac operated and wound up with negative balance sheets.  What happens if the loans to the Treasury are not repaid?