CS229 - Lecture 13 & 14 - K-means and the EM Algorithm

joel | Uncategorized | Sunday, November 9th, 2008

Confusion Matrix: true/false positives/negatives

Precision: truepositive / all positive

Recall: truepositive / (truepos +falseneg)

We can plot a Precision/recall curve to express the tradeoff between decision thresholds.  Area under the curve can be used to generate a single number of validation.

 

Unsupervised Learning - Find “Structure” in Data

K- means

  • Start with k centroids in random starting positions
  • Label points with closest point
  • Move centroids to mean of their respective areas

 K-means converges, but needs multi-start to find the good clustering

 

Density Estimation

Given a data set x, model P(x) which models the probability density of the new features and use it to detect anomalys.

Mixture of gaussians model. 

Imagine that the xi’s are determined from a hidden multinomial selector variable zi. 

P(xi,zi) = P(xi|zi)P(zi)

Where P(xi|zi=j) ~ Norm(mu, sigma)

The big difference here between EM and Gaussian Discriminant Analysis is we dont know the labels y and z.

We can then compute the maximum value of this:

P(xi) = Sum{zi} P(x,zi)

Max (mu,sigma,P(z)) Prod Prob(xi)

 Procedure -

  • “E” Step
  • guess some value of the zis
  • compute the parameters wi
  • “M” Step
    Find max liklihood of parameters Sigma, Mu_j

Jensens inequality

  • f(x) is a convex function f”(x) >= 0
  • let x be a random variable
  • f(E[x]) <= E[f(x)]
  • and if f(x) > 0, then inequality holds with equality iff X is constant

EM Algorithm

EM algorithm constructs a lower bound for l(Theta) for some Theta and then the M-step moves theta to the maximum for that lower bound.

 To optimize P(x,z; Theta)

  • P(Theta)
  • = Sum_i:m {{ log P(xi, Theta) }}
  • = Sum_i:m {{ Sum_j log P(xi, zi=j; Theta) }}
  • = Sum_i:m {{ Sum_j log Q(zi) P(xi, zi=j; Theta)/Q(zi) }}
    where Q(zi) is a distribution over the possible zi
  • =Sum_i:m {{ log Expect[ P(xi, zi=j; Theta)/Q(zi)] }}
  • >= Sum_i:m {{  Expect[P(xi, zi=j; Theta)/Q(zi)] }}
  • >= Sum_i:m {{ Sum_j Q(zi)  log [ P(xi, zi=j; Theta)/Q(zi)] }}

During the E-step we can now choose Q such that l(Theta) == l(Theta, Q) which by Jensens inequality if

  • P(xi,zi;Theta)/Q(zi) = constant
  • Q(zi) = alpha P(xi,zi,Theta)

So for a guessed value of Theta, choose Qi such that

  • Q(zi)
  • = P(xi,zi;Theta) / Sum{zi} P(xi,zi;Theta)
  • = P(zi|xi; Theta)

To Summarize:

  • Start with some theta
  • Estep: choose Qi = P(zi|xi; Theta)
  • Mstep: choose new Theta
    = argmax{Theta}  Sum_i Sum_zi { Q(zi) log P(xi,zi,Theta)/Q(zi) }

EM can be viewed as coordinate ascent in Q and Theta.

Pretty cool. 

 

 

 

 

Leave a comment

»

RSS feed for comments on this post. TrackBack URI