changeset 1:90d2a292663c

Do k-means
author Jordi Gutiérrez Hermoso <jordigh@octave.org>
date Mon, 05 Dec 2011 01:22:39 -0500
parents ded78d0b4987
children be1f915bd52a
files computeCentroids.m findClosestCentroids.m kMeansInitCentroids.m
diffstat 3 files changed, 33 insertions(+), 87 deletions(-)
--- a/computeCentroids.m
+++ b/computeCentroids.m
@@ -1,40 +1,17 @@
 function centroids = computeCentroids(X, idx, K)
-%COMPUTECENTROIDS returs the new centroids by computing the means of the 
-%data points assigned to each centroid.
-%   centroids = COMPUTECENTROIDS(X, idx, K) returns the new centroids by 
-%   computing the means of the data points assigned to each centroid. It is
-%   given a dataset X where each row is a single data point, a vector
-%   idx of centroid assignments (i.e. each entry in range [1..K]) for each
-%   example, and K, the number of centroids. You should return a matrix
-%   centroids, where each row of centroids is the mean of the data points
-%   assigned to it.
-%
-
-% Useful variables
-[m n] = size(X);
-
-% You need to return the following variables correctly.
-centroids = zeros(K, n);
-
+  ##COMPUTECENTROIDS returns the new centroids by computing the means of the
+  ##data points assigned to each centroid.
+  ##   centroids = COMPUTECENTROIDS(X, idx, K) returns the new centroids by 
+  ##   computing the means of the data points assigned to each centroid. It is
+  ##   given a dataset X where each row is a single data point, a vector
+  ##   idx of centroid assignments (i.e. each entry in range [1..K]) for each
+  ##   example, and K, the number of centroids. You should return a matrix
+  ##   centroids, where each row of centroids is the mean of the data points
+  ##   assigned to it.
+  ##
 
-% ====================== YOUR CODE HERE ======================
-% Instructions: Go over every centroid and compute mean of all points that
-%               belong to it. Concretely, the row vector centroids(i, :)
-%               should contain the mean of the data points assigned to
-%               centroid i.
-%
-% Note: You can use a for-loop over the centroids to compute this.
-%
-
-
+  centroids = cell2mat(cellfun(@(i) mean (X(idx == i, :), 1), 
+                               num2cell([1:K]'), "uniformoutput", false));
 
-
-
-
-
+endfunction
 
-% =============================================================
-
-
-end
-
--- a/findClosestCentroids.m
+++ b/findClosestCentroids.m
@@ -1,33 +1,16 @@
 function idx = findClosestCentroids(X, centroids)
-%FINDCLOSESTCENTROIDS computes the centroid memberships for every example
-%   idx = FINDCLOSESTCENTROIDS (X, centroids) returns the closest centroids
-%   in idx for a dataset X where each row is a single example. idx = m x 1 
-%   vector of centroid assignments (i.e. each entry in range [1..K])
-%
-
-% Set K
-K = size(centroids, 1);
-
-% You need to return the following variables correctly.
-idx = zeros(size(X,1), 1);
+  ##FINDCLOSESTCENTROIDS computes the centroid memberships for every example
+  ##   idx = FINDCLOSESTCENTROIDS (X, centroids) returns the closest centroids
+  ##   in idx for a dataset X where each row is a single example. idx = m x 1 
+  ##   vector of centroid assignments (i.e. each entry in range [1..K])
+  ##
 
-% ====================== YOUR CODE HERE ======================
-% Instructions: Go over every example, find its closest centroid, and store
-%               the index inside idx at the appropriate location.
-%               Concretely, idx(i) should contain the index of the centroid
-%               closest to example i. Hence, it should be a value in the 
-%               range 1..K
-%
-% Note: You can use a for-loop over the examples to compute this.
-%
+  ## Set K
+  K = rows (centroids);
+  
+  ## Using broadcasting (auto BSX) as available in Octave 3.5.0+
+  d = sum ((permute (X, [1,3,2]) - permute (centroids, [3,1,2])).^2, 3);
+  [~, idx] = min (d, [], 2);
 
-
-
-
-
+endfunction
 
-
-% =============================================================
-
-end
-
--- a/kMeansInitCentroids.m
+++ b/kMeansInitCentroids.m
@@ -1,26 +1,12 @@
 function centroids = kMeansInitCentroids(X, K)
-%KMEANSINITCENTROIDS This function initializes K centroids that are to be 
-%used in K-Means on the dataset X
-%   centroids = KMEANSINITCENTROIDS(X, K) returns K initial centroids to be
-%   used with the K-Means on the dataset X
-%
-
-% You should return this values correctly
-centroids = zeros(K, size(X, 2));
-
-% ====================== YOUR CODE HERE ======================
-% Instructions: You should set centroids to randomly chosen examples from
-%               the dataset X
-%
-
-
-
-
-
-
-
-
-% =============================================================
+##KMEANSINITCENTROIDS This function initializes K centroids that are to be 
+##used in K-Means on the dataset X
+##   centroids = KMEANSINITCENTROIDS(X, K) returns K initial centroids to be
+##   used with the K-Means on the dataset X
+##
+  
+  ## Using second argument to randperm implemented in dev version
+  centroids = X(randperm (rows (X), K), :);
 
 end
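
The broadcasting expression in findClosestCentroids.m is compact enough that it may help to see it checked against a plain double loop. The following is only a sketch under stated assumptions: the toy matrix X, the two centroids, and the variable names idx_loop and new_centroids are made up for illustration and are not part of the changeset. It assumes an Octave version with automatic broadcasting.

```octave
## Hypothetical toy data (not from the changeset): 5 points in 2-D, K = 2.
X = [0 0; 0 1; 1 0; 9 9; 10 10];
centroids = [0 0; 10 10];
K = rows (centroids);

## Vectorised squared distances, as in findClosestCentroids.m:
## X is reshaped to m x 1 x n, centroids to 1 x K x n, and the
## difference broadcasts to m x K x n before summing over dimension 3.
d = sum ((permute (X, [1,3,2]) - permute (centroids, [3,1,2])).^2, 3);
[~, idx] = min (d, [], 2);

## Reference implementation: explicit loops over examples and centroids.
idx_loop = zeros (rows (X), 1);
for i = 1:rows (X)
  best = Inf;
  for k = 1:K
    dist = sum ((X(i,:) - centroids(k,:)).^2);
    if (dist < best)
      best = dist;
      idx_loop(i) = k;
    endif
  endfor
endfor

## Both approaches should agree on every assignment.
assert (idx, idx_loop);

## One centroid update in the style of computeCentroids.m. The explicit
## dimension argument to mean keeps a single-point cluster as a row vector.
new_centroids = cell2mat (cellfun (@(i) mean (X(idx == i, :), 1), ...
                                   num2cell ((1:K)'), ...
                                   "uniformoutput", false));
```

The loop version makes the tie-breaking rule visible: both it and min keep the first centroid that attains the smallest distance.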