Developer API
The entries below cover all of the functions and classes developed for this package, grouped by submodule. The source code is available.
kmeans.base_funcs module
- kmeans.base_funcs._assign_clusters(data: ndarray, centroids: ndarray) -> dict[int, ndarray]
Assigns each data element to a cluster
- Parameters:
data – The data to be labeled.
centroids – The centroids used as the criteria for deciding which cluster each data element belongs to.
- Returns:
The Clusters
- Return type:
dict[int, np.ndarray]
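The assignment step can be illustrated with a minimal sketch. It assumes nearest-centroid assignment by Euclidean distance, which is the usual k-means rule but is not confirmed by this page; `assign_clusters_sketch` is a hypothetical name, not the package's function.

```python
import numpy as np


def assign_clusters_sketch(data: np.ndarray, centroids: np.ndarray) -> dict:
    """Group each row of `data` under the index of its nearest centroid."""
    # Pairwise Euclidean distances, shape (n_points, n_centroids).
    distances = np.linalg.norm(
        data[:, np.newaxis, :] - centroids[np.newaxis, :, :], axis=2
    )
    # Index of the closest centroid for every point.
    labels = distances.argmin(axis=1)
    # One entry per centroid; a centroid that attracts no points gets an empty array.
    return {i: data[labels == i] for i in range(len(centroids))}


points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [9.0, 10.0]])
centers = np.array([[0.0, 0.5], [9.5, 10.0]])
clusters = assign_clusters_sketch(points, centers)
```

Here the first two points fall under key 0 and the last two under key 1, matching the documented `dict[int, np.ndarray]` return shape.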
- kmeans.base_funcs._generate_means(data: ndarray, k: int, ndim: int) -> ndarray
Randomly selects initial means with uniform distribution
- Parameters:
data – The data from which the means are selected
k – How many means to select
ndim – Dimensionality of means
- Returns:
Initial Cluster Centroids
- Return type:
np.ndarray
- Raises:
ValueError – If a unique set of means cannot be found.
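One way to pick unique initial means is sketched below. This is an assumption about the approach, not the package's code: it deduplicates the data first and then samples without replacement, so it draws uniformly over unique points rather than over rows. The name `generate_means_sketch` and the `seed` parameter are hypothetical.

```python
from typing import Optional

import numpy as np


def generate_means_sketch(
    data: np.ndarray, k: int, ndim: int, seed: Optional[int] = None
) -> np.ndarray:
    """Pick k distinct rows of `data` (first `ndim` columns) at random."""
    rng = np.random.default_rng(seed)
    # Deduplicate first so the chosen means are guaranteed to be distinct.
    candidates = np.unique(data[:, :ndim], axis=0)
    if len(candidates) < k:
        raise ValueError(
            f"need {k} unique means, but the data has only {len(candidates)} unique points"
        )
    # Sampling indices without replacement gives each unique point the same chance.
    chosen = rng.choice(len(candidates), size=k, replace=False)
    return candidates[chosen]
```

Failing early with `ValueError` when there are fewer unique points than `k` mirrors the documented error condition.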
- kmeans.base_funcs._new_centroids(clusters: dict[int, ndarray], ndim: int) -> ndarray
Returns a new set of centroids
- Parameters:
clusters – the current grouped data
ndim – Dimension of data we are clustering
- Returns:
New Centroids
- Return type:
np.ndarray
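The update step can be sketched as taking the mean of each cluster. This assumes the standard k-means update and non-empty clusters; `new_centroids_sketch` is a hypothetical name used only for illustration.

```python
import numpy as np


def new_centroids_sketch(clusters: dict, ndim: int) -> np.ndarray:
    """Return one centroid per cluster: the mean of its points over the first `ndim` columns."""
    centroids = np.zeros((len(clusters), ndim))
    for row, key in enumerate(sorted(clusters)):
        # Assumes every cluster is non-empty; the mean of an empty array is NaN.
        centroids[row] = clusters[key][:, :ndim].mean(axis=0)
    return centroids


clusters = {
    0: np.array([[0.0, 0.0], [0.0, 2.0]]),
    1: np.array([[10.0, 10.0], [8.0, 10.0]]),
}
centroids = new_centroids_sketch(clusters, ndim=2)
```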
- kmeans.base_funcs._validate(data: ndarray | list[ndarray] | tuple[ndarray], k: int, *, initial_means: ndarray | None = None, ndim: int | None = None, tolerance: float = 0.5, max_iterations: int = 100) -> tuple[ndarray, ndarray, int]
Perform validation checks on cluster arguments
- Parameters:
data – The input data
k – Number of clusters desired
initial_means – The initial cluster centroids
ndim – Dimension limit for clustering. If default, the length of a given data element is used (all data dimensions clustered).
tolerance – Max tolerable distance a centroid can move before requiring another round of clustering
max_iterations – Max number of iterations before terminating function execution.
- Returns:
Validated Data, Initial Centroids, ndim
- Return type:
np.ndarray, np.ndarray, int
- Raises:
ValueError – if an input argument is incorrect in value
TypeError – if an input argument is of the wrong type.
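The split between `TypeError` and `ValueError` can be sketched as follows. The specific checks are illustrative guesses, not the package's actual rules, and the sketch omits `initial_means` handling, so it returns only the data and `ndim`. `validate_sketch` is a hypothetical name.

```python
import numpy as np


def validate_sketch(data, k, *, ndim=None, tolerance=0.5, max_iterations=100):
    """Illustrative checks only; the real function may test different conditions."""
    # Wrong type -> TypeError (bool is excluded even though it subclasses int).
    if not isinstance(k, int) or isinstance(k, bool):
        raise TypeError("k must be an int")
    # Right type but unusable value -> ValueError.
    if k < 1:
        raise ValueError("k must be at least 1")
    if tolerance <= 0 or max_iterations < 1:
        raise ValueError("tolerance must be positive and max_iterations at least 1")
    # Lists and tuples of equal-length arrays become one 2-D array of row vectors.
    arr = np.asarray(data, dtype=float)
    if arr.ndim != 2:
        raise ValueError("data must be a 2-D collection of row vectors")
    if ndim is None:
        ndim = arr.shape[1]  # default: cluster on every data dimension
    elif not 1 <= ndim <= arr.shape[1]:
        raise ValueError("ndim must be between 1 and the data dimension")
    return arr, ndim
```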
kmeans.clustering module
- exception kmeans.clustering.MaxIterationError
Bases: Exception
An exception to be raised when the maximum iteration tolerance is exceeded.
- kmeans.clustering.cluster(data: ndarray | list[ndarray] | tuple[ndarray], k: int, *, initial_means: ndarray | list[ndarray] | tuple[ndarray] | None = None, ndim: int | None = None, tolerance: float = 4.440892098500626e-15, max_iterations: int = 250) -> tuple[dict[int, ndarray], ndarray]
Perform k-means clustering
The input data should be formatted as row vectors. Given a flat numpy array data = np.array([0, 1, 2, 3, 4]), do the following:
data = data.reshape(data.shape[-1], -1) # or data = data[..., np.newaxis]
This makes each point a row entry:
[[0], [1], [2], [3], [4]]
Data of higher dimensions (e.g. a multi-channel image) should be flattened so that the size of the last (deepest) axis is preserved. So, for an image with shape (480, 640, 3), run:
data = data.reshape(-1, data.shape[-1])
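The two reshapes described above can be checked with a short, self-contained sketch. The zero-filled array is only a stand-in for a real image.

```python
import numpy as np

# A flat array of five scalar samples becomes five 1-D row vectors.
flat = np.array([0, 1, 2, 3, 4])
rows = flat.reshape(flat.shape[-1], -1)
also_rows = flat[..., np.newaxis]  # equivalent for a flat array

# A stand-in for a 480x640 RGB image becomes one row per pixel.
image = np.zeros((480, 640, 3), dtype=np.uint8)
pixels = image.reshape(-1, image.shape[-1])

print(rows.shape, pixels.shape)  # (5, 1) (307200, 3)
```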
- Parameters:
data – The input data. Expects data homogeneity (all elements are the same dimension)
k – Number of clusters desired.
initial_means – The initial cluster centroids. Means are randomly selected from data with uniform probability by default.
ndim – Dimension limit for clustering. If default, the length of a given data element is used (all data dimensions clustered).
tolerance – Controls the completion criteria. Lower values -> more iterations. Defaults to 20*eps for np.float64.
max_iterations – Max number of iterations before terminating function execution.
- Returns:
Clustered Data, Cluster Centroids
- Return type:
dict[int, np.ndarray], np.ndarray
- Raises:
kmeans.MaxIterationError – Raised if the clustering doesn’t converge before reaching the max_iterations count.
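How `tolerance` and `max_iterations` interact can be sketched as a bounded loop. The stopping rule used here (stop once no centroid moves farther than `tolerance`) is an assumption based on the parameter descriptions. `iterate_until_stable`, `update`, and `MaxIterationErrorSketch` are hypothetical names, with the exception class standing in for `kmeans.MaxIterationError` so the sketch runs on its own.

```python
import numpy as np

# Default tolerance from the signature: 20 * eps for np.float64.
DEFAULT_TOLERANCE = 20 * np.finfo(np.float64).eps


class MaxIterationErrorSketch(Exception):
    """Stand-in for kmeans.MaxIterationError."""


def iterate_until_stable(
    centroids, update, *, tolerance=DEFAULT_TOLERANCE, max_iterations=250
):
    """Apply `update` until no centroid moves farther than `tolerance`."""
    for _ in range(max_iterations):
        updated = update(centroids)
        # Largest Euclidean distance moved by any single centroid.
        shift = np.linalg.norm(updated - centroids, axis=1).max()
        centroids = updated
        if shift <= tolerance:
            return centroids
    raise MaxIterationErrorSketch(f"no convergence after {max_iterations} iterations")


# Toy update rule: move each centroid halfway toward a fixed target.
target = np.array([[1.0, 1.0]])
result = iterate_until_stable(np.zeros((1, 2)), lambda c: (c + target) / 2)
```

A caller that would rather get a rough answer than an exception can raise `tolerance` or `max_iterations`, or catch the exception and retry with different initial means.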
kmeans.animate module
- kmeans.animate._draw(clusters: dict[int, ndarray], centroids: ndarray, ax_obj: Axes, ndim: int, *, legend_loc: str = 'best') -> None
Draws the clusters onto the figure
- Parameters:
clusters – The segmented data
centroids – The centers of the clusters
ax_obj – The axes object (from the figure)
ndim – The number of dimensions
legend_loc – Where to place the legend. Defaults to ‘best’
- Returns:
None
- kmeans.animate.view_clustering(data: ndarray | list[ndarray] | tuple[ndarray], k: int, *, initial_means: ndarray | list[ndarray] | tuple[ndarray] | None = None, ndim: int | None = None, tolerance: float = 4.440892098500626e-15, max_iterations: int = 250) -> tuple[dict[int, ndarray], ndarray, Figure]
Perform and display k-means clustering
This is the same as kmeans.cluster(), just with plotting side effects.
- Parameters:
data – The input data. Expects data homogeneity (all elements are the same dimension)
k – Number of clusters desired.
initial_means – The initial cluster centroids. Means are randomly selected from data with uniform probability by default.
ndim – Dimension limit for clustering. If default, the length of a given data element is used (all data dimensions clustered).
tolerance – Controls the completion criteria. Lower values -> more iterations. Defaults to 20*eps for np.float64.
max_iterations – Max number of iterations before terminating function execution.
- Returns:
Clustered Data, Cluster Centroids, Matplotlib Figure
- Return type:
dict[int, np.ndarray], np.ndarray, matplotlib.figure.Figure
- Raises:
ValueError – if the calculated ndim or the provided ndim is neither 2 nor 3.
kmeans.MaxIterationError – Raised if the clustering doesn’t converge before reaching the max_iterations count.
kmeans.segmentation module
- kmeans.segmentation._append_coords(img: ndarray) -> ndarray
Append each pixel’s coordinate to itself.
- Parameters:
img – The image.
- Returns:
Array with appended indices.
- Return type:
np.ndarray
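Appending coordinates to pixels might look like the sketch below. It assumes an (H, W, C) image, (row, column) ordering, and a float result; none of these details are confirmed by this page. `append_coords_sketch` is a hypothetical name.

```python
import numpy as np


def append_coords_sketch(img: np.ndarray) -> np.ndarray:
    """Return an (H, W, C + 2) float array: each pixel's channels followed by its (row, col)."""
    height, width = img.shape[:2]
    # np.indices yields two (H, W) grids holding the row and column index of each pixel.
    rows, cols = np.indices((height, width))
    coords = np.stack((rows, cols), axis=-1)
    # Convert to float so coordinates above 255 do not overflow a uint8 image dtype.
    return np.concatenate((img.astype(float), coords.astype(float)), axis=-1)


img = np.arange(2 * 3 * 3).reshape(2, 3, 3)
with_coords = append_coords_sketch(img)
```

Clustering on the extended vectors lets spatial position influence the grouping as well as color.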
- kmeans.segmentation.segment_img(img: ndarray, groups: int, random_colors: bool = False) -> ndarray
Segment the input RGB image by color groups.
- Parameters:
img – The image to be segmented. Assumes RGB
groups – How many groups the image is segmented into. Higher numbers -> more detail
random_colors – Provide each group with a randomized RGB color instead of the average color.
- Returns:
Segmented Image
- Return type:
np.ndarray
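The recoloring that `random_colors` toggles can be sketched as below. The sketch assumes each pixel already has a group label from a clustering step and covers only the painting stage. `recolor_sketch`, `labels`, and `seed` are hypothetical names, not part of the package.

```python
import numpy as np


def recolor_sketch(img, labels, random_colors=False, seed=None):
    """Paint every pixel with its group's average color, or a random color per group."""
    rng = np.random.default_rng(seed)
    out = np.empty_like(img)
    for group in np.unique(labels):
        mask = labels == group  # (H, W) boolean mask of this group's pixels
        if random_colors:
            color = rng.integers(0, 256, size=3)
        else:
            color = img[mask].mean(axis=0)  # average RGB over the group
        out[mask] = color  # cast back to the image dtype on assignment
    return out


img = np.array(
    [[[0, 0, 0], [10, 20, 30]], [[200, 200, 200], [100, 100, 100]]], dtype=np.uint8
)
labels = np.array([[0, 0], [1, 1]])
segmented = recolor_sketch(img, labels)
```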