Developer API

The entries below document every function and class in this package, grouped by submodule. The source code is available.

kmeans.base_funcs module

kmeans.base_funcs._assign_clusters(
data: ndarray,
centroids: ndarray,
) dict[int, ndarray]

Assigns each data element to a cluster

Parameters:
  • data – The data to be labeled.

  • centroids – The centroids used as the criteria for assigning each data element to a cluster.

Returns:

The Clusters

Return type:

dict[int, np.ndarray]
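As a rough illustration of what an assignment step of this kind usually does, the sketch below groups each row by its nearest centroid. The name assign_clusters_sketch and the use of Euclidean distance are assumptions made for illustration; the package's own implementation may differ.

```python
import numpy as np


def assign_clusters_sketch(data: np.ndarray, centroids: np.ndarray) -> dict:
    """Hypothetical sketch: group rows of `data` by nearest centroid (Euclidean)."""
    # Pairwise distances between points and centroids, shape (n_points, n_centroids)
    dists = np.linalg.norm(data[:, np.newaxis, :] - centroids[np.newaxis, :, :], axis=2)
    # Index of the closest centroid for every point
    labels = dists.argmin(axis=1)
    # One entry per centroid index; a cluster may end up empty
    return {i: data[labels == i] for i in range(len(centroids))}


points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
clusters = assign_clusters_sketch(points, centers)
```

Here clusters maps each centroid index to the rows assigned to it, matching the documented return type.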

kmeans.base_funcs._generate_means(data: ndarray, k: int, ndim: int) ndarray

Randomly selects initial means from the data with uniform probability

Parameters:
  • data – The data from which the means are selected

  • k – How many means to select

  • ndim – Dimensionality of means

Returns:

Initial Cluster Centroids

Return type:

np.ndarray

Raises:

ValueError – If a unique set of means cannot be found.
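One way to obtain unique initial means is sketched below. It is an assumption-laden illustration, not the package's code: the name generate_means_sketch and the rng argument are hypothetical, and it samples uniformly over distinct points (so duplicated rows carry no extra weight).

```python
import numpy as np


def generate_means_sketch(data, k, ndim, rng):
    """Hypothetical sketch: draw k distinct rows of `data` as initial means."""
    # Keep only the first `ndim` components and drop duplicate points
    candidates = np.unique(data[:, :ndim], axis=0)
    if len(candidates) < k:
        raise ValueError(f"need {k} unique means, found {len(candidates)} distinct points")
    # Uniform sampling without replacement guarantees the means are unique
    idx = rng.choice(len(candidates), size=k, replace=False)
    return candidates[idx]


data = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [2.0, 2.0]])
means = generate_means_sketch(data, 3, 2, np.random.default_rng(0))
```

With only three distinct points in data, asking for four means raises ValueError in this sketch.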

kmeans.base_funcs._new_centroids(clusters: dict[int, ndarray], ndim: int) ndarray

Returns a new set of centroids

Parameters:
  • clusters – The currently grouped data

  • ndim – Dimension of data we are clustering

Returns:

New Centroids

Return type:

np.ndarray
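The update step in k-means is conventionally the per-cluster mean. The sketch below shows that convention under stated assumptions: new_centroids_sketch is a hypothetical name, cluster keys are assumed to be 0..k-1, and an empty cluster is left at the origin, which may not match how the package handles that case.

```python
import numpy as np


def new_centroids_sketch(clusters, ndim):
    """Hypothetical sketch: centroid of each cluster is the mean of its members."""
    centroids = np.zeros((len(clusters), ndim))
    for label, members in clusters.items():
        if len(members) > 0:
            # Average only the first `ndim` components of each member
            centroids[label] = members[:, :ndim].mean(axis=0)
    return centroids


clusters = {
    0: np.array([[0.0, 0.0], [2.0, 2.0]]),
    1: np.array([[4.0, 6.0]]),
}
centroids = new_centroids_sketch(clusters, ndim=2)
```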

kmeans.base_funcs._validate(
data: ndarray | list[ndarray] | tuple[ndarray],
k: int,
*,
initial_means: ndarray | None = None,
ndim: int | None = None,
tolerance: float = 0.5,
max_iterations: int = 100,
) tuple[ndarray, ndarray, int]

Perform validation checks on cluster arguments

Parameters:
  • data – The input data

  • k – Number of clusters desired

  • initial_means – The initial cluster centroids

  • ndim – Dimension limit for clustering. If left as None, the length of a data element is used (all data dimensions are clustered).

  • tolerance – Max tolerable distance a centroid can move before requiring another round of clustering

  • max_iterations – Max number of iterations before terminating function execution.

Returns:

Validated Data, Initial Centroids, ndim

Return type:

np.ndarray, np.ndarray, int

Raises:
  • ValueError – If an input argument has an invalid value.

  • TypeError – If an input argument is of the wrong type.

kmeans.clustering module

exception kmeans.clustering.MaxIterationError

Bases: Exception

An exception raised when the maximum number of iterations is exceeded.

kmeans.clustering.cluster(
data: ndarray | list[ndarray] | tuple[ndarray],
k: int,
*,
initial_means: ndarray | list[ndarray] | tuple[ndarray] | None = None,
ndim: int | None = None,
tolerance: float = 4.440892098500626e-15,
max_iterations: int = 250,
) tuple[dict[int, ndarray], ndarray]

Perform k-means clustering

The input data should be formatted as row vectors, one point per row. Given a flat NumPy array data = np.array([0, 1, 2, 3, 4]), reshape it with either of the following:

data = data.reshape(data.shape[-1], -1)
# or
data = data[..., np.newaxis]

Either form makes each point its own row:

[[0], [1], [2], [3], [4]]
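A quick check with NumPy alone confirms that the two reshaping forms produce the same column of row vectors. The variable names are chosen for illustration.

```python
import numpy as np

data = np.array([0, 1, 2, 3, 4])

# Both forms turn the flat array into 5 rows of 1 element each
as_rows_reshape = data.reshape(data.shape[-1], -1)
as_rows_newaxis = data[..., np.newaxis]

print(as_rows_reshape.shape)  # (5, 1)
print(np.array_equal(as_rows_reshape, as_rows_newaxis))  # True
```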

Higher-dimensional data (e.g. a multi-channel image) should be flattened so that the last axis is kept as the point dimension. For an image with shape (480, 640, 3), which becomes an array of shape (307200, 3), run:

data = data.reshape(-1, data.shape[-1])

Parameters:
  • data – The input data. Expects data homogeneity (all elements are the same dimension)

  • k – Number of clusters desired.

  • initial_means – The initial cluster centroids. Means are randomly selected from data with uniform probability by default.

  • ndim – Dimension limit for clustering. If left as None, the length of a data element is used (all data dimensions are clustered).

  • tolerance – Controls the completion criteria. Lower values lead to more iterations. Defaults to 20*eps for np.float64.

  • max_iterations – Max number of iterations before terminating function execution.

Returns:

Clustered Data, Cluster Centroids

Return type:

dict[int, np.ndarray], np.ndarray

Raises:

kmeans.MaxIterationError – Raised if the clustering does not converge before reaching the max_iterations count.
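To make the roles of tolerance and max_iterations concrete, here is a minimal k-means loop written with NumPy only. It is a sketch, not the package's implementation: cluster_sketch and MaxIterationsReached are hypothetical stand-ins, initial means are required rather than sampled, and convergence is assumed to mean that no centroid moved farther than tolerance (Euclidean distance) in the last round.

```python
import numpy as np


class MaxIterationsReached(Exception):
    """Hypothetical stand-in for kmeans.MaxIterationError."""


def cluster_sketch(data, k, *, initial_means,
                   tolerance=20 * np.finfo(np.float64).eps,
                   max_iterations=250):
    centroids = np.asarray(initial_means, dtype=float)
    for _ in range(max_iterations):
        # Assignment step: label each point with its nearest centroid
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: an empty cluster keeps its previous centroid
        updated = np.array([
            data[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        # Largest distance any centroid moved in this round
        shift = np.linalg.norm(updated - centroids, axis=1).max()
        centroids = updated
        if shift <= tolerance:
            return {i: data[labels == i] for i in range(k)}, centroids
    raise MaxIterationsReached(f"no convergence within {max_iterations} iterations")


data = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
clusters, centroids = cluster_sketch(data, 2, initial_means=[[0, 0], [5, 5]])
```

In this sketch a lower tolerance demands smaller centroid movement before stopping, so it can only increase the number of rounds; setting max_iterations too low raises the exception instead of returning a partial result.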

kmeans.animate module

kmeans.animate._draw(
clusters: dict[int, ndarray],
centroids: ndarray,
ax_obj: Axes,
ndim: int,
*,
legend_loc: str = 'best',
) None

Draws the clusters onto the figure

Parameters:
  • clusters – The segmented data

  • centroids – The centers of the clusters

  • ax_obj – The axes object (from the figure)

  • ndim – The number of dimensions

  • legend_loc – Where to place the legend. Defaults to ‘best’

Returns:

None

kmeans.animate.view_clustering(
data: ndarray | list[ndarray] | tuple[ndarray],
k: int,
*,
initial_means: ndarray | list[ndarray] | tuple[ndarray] | None = None,
ndim: int | None = None,
tolerance: float = 4.440892098500626e-15,
max_iterations: int = 250,
) tuple[dict[int, ndarray], ndarray, Figure]

Perform and display k-means clustering

This is the same as kmeans.cluster(), just with plotting side-effects.

Parameters:
  • data – The input data. Expects data homogeneity (all elements are the same dimension)

  • k – Number of clusters desired.

  • initial_means – The initial cluster centroids. Means are randomly selected from data with uniform probability by default.

  • ndim – Dimension limit for clustering. If left as None, the length of a data element is used (all data dimensions are clustered).

  • tolerance – Controls the completion criteria. Lower values lead to more iterations. Defaults to 20*eps for np.float64.

  • max_iterations – Max number of iterations before terminating function execution.

Returns:

Clustered Data, Cluster Centroids, Matplotlib Figure

Return type:

dict[int, np.ndarray], np.ndarray, matplotlib.figure.Figure

Raises:
  • ValueError – If the calculated or provided ndim is neither 2 nor 3.

  • kmeans.MaxIterationError – Raised if the clustering does not converge before reaching the max_iterations count.

kmeans.segmentation module

kmeans.segmentation._append_coords(img: ndarray) ndarray

Append each pixel’s coordinates to its own values.

Parameters:

img – The image.

Returns:

Array with appended indices.

Return type:

np.ndarray
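One plausible way to append coordinates is sketched below. The name append_coords_sketch, the (row, column) ordering, the float output type, and the lack of any coordinate scaling are all assumptions; the package may order, scale, or type the result differently.

```python
import numpy as np


def append_coords_sketch(img: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: extend each pixel with its (row, column) index."""
    # Index grids matching the image's height and width
    rows, cols = np.indices(img.shape[:2])
    coords = np.stack((rows, cols), axis=-1)
    # Cast the image to float so large coordinates cannot overflow uint8
    return np.concatenate((img.astype(float), coords), axis=-1)


img = np.zeros((2, 3, 3), dtype=np.uint8)
with_coords = append_coords_sketch(img)
```

A (2, 3, 3) image becomes a (2, 3, 5) array here, with the last two entries of each pixel holding its row and column.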

kmeans.segmentation.segment_img(
img: ndarray,
groups: int,
random_colors: bool = False,
) ndarray

Segment the input RGB image by color groups.

Parameters:
  • img – The image to be segmented. Assumes RGB

  • groups – How many groups the image is segmented into. Higher numbers give more detail.

  • random_colors – Provide each group with a randomized RGB color instead of the average color.

Returns:

Segmented Image

Return type:

np.ndarray
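The average-color behavior can be illustrated with the recoloring step alone. The sketch assumes a per-pixel group label array is already available (for example from a k-means run); recolor_by_group_sketch and its labels argument are hypothetical and not part of the package.

```python
import numpy as np


def recolor_by_group_sketch(img, labels, groups):
    """Hypothetical sketch: paint every pixel with its group's average color."""
    # Flatten to one pixel per row, keeping the channel axis
    pixels = img.reshape(-1, img.shape[-1]).astype(float)
    out = np.zeros_like(pixels)
    for g in range(groups):
        mask = labels == g
        if mask.any():
            # Replace all members of the group with their mean color
            out[mask] = pixels[mask].mean(axis=0)
    return out.reshape(img.shape).astype(img.dtype)


img = np.array([[[0, 0, 0], [10, 10, 10], [200, 200, 200], [220, 220, 220]]],
               dtype=np.uint8)
labels = np.array([0, 0, 1, 1])  # assumed output of a clustering step
segmented = recolor_by_group_sketch(img, labels, groups=2)
```

The result has the same shape and dtype as the input, with two dark pixels averaged to one color and two bright pixels averaged to another.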