Clustering

The last step of unit sorting is clustering the data. NEV2lkit uses KlustaKwik, a program for unsupervised classification of multidimensional continuous data. Among other things, KlustaKwik fits a mixture of Gaussians with unconstrained covariance matrices, automatically chooses the number of mixture components, and runs fast on large data sets. It is based on the classification expectation maximization (CEM) algorithm of Celeux and Govaert [CEM].
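To illustrate the idea behind classification EM, the following is a minimal sketch in Python with NumPy. It is not KlustaKwik's code: the function name cem_cluster, its arguments, and the small regularization term are assumptions made for this example, and the sketch leaves out the penalty-driven splitting and deletion of clusters that KlustaKwik uses to choose the number of components. Each iteration fits one full-covariance Gaussian per cluster from the current hard assignment, then reassigns every point to the cluster under which it is most likely.

```python
import numpy as np

def cem_cluster(points, n_clusters, n_iter=50, seed=0):
    """Hard-assignment (classification) EM with full covariance matrices.

    Illustrative only; the number of clusters is fixed up front here.
    """
    rng = np.random.default_rng(seed)
    n_points, n_dims = points.shape
    # Random initial assignment of every point to one of the clusters.
    labels = rng.integers(0, n_clusters, size=n_points)

    for _ in range(n_iter):
        log_scores = np.full((n_points, n_clusters), -np.inf)
        for k in range(n_clusters):
            members = points[labels == k]
            if len(members) <= n_dims:
                # Too few points to estimate a full covariance matrix;
                # the cluster keeps a score of -inf and receives no points.
                continue
            # M-step: weight, mean and unconstrained covariance of cluster k.
            weight = len(members) / n_points
            mean = members.mean(axis=0)
            cov = np.cov(members, rowvar=False) + 1e-6 * np.eye(n_dims)
            # E-step: log of (weight * Gaussian density) for every point.
            diff = points - mean
            _, logdet = np.linalg.slogdet(cov)
            maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
            log_scores[:, k] = np.log(weight) - 0.5 * (
                maha + logdet + n_dims * np.log(2.0 * np.pi)
            )
        # C-step: assign each point to its most likely cluster.
        new_labels = log_scores.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the procedure has converged
        labels = new_labels
    return labels
```

Calling cem_cluster(features, 3) on an array of shape (n_spikes, n_features) returns one integer label per row. Like any procedure started from a random assignment, the result depends on the seed and may be a local optimum.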

The clustering parameters that can be set in the NEV2lkit GUI are:

[MINCLUSTERS] The random initial assignment will have no fewer than n clusters. The final number may differ, since clusters can be split or deleted during the course of the algorithm. The default value is 1.

[MAXCLUSTERS] The maximum number of clusters n. Cluster splitting can produce no more than n clusters. The default value is 5.

[PENALTYMIX] The penalty for a larger number of clusters can be the Bayesian information criterion (BIC), the Akaike information criterion (AIC), or a mixture of the two. This widget sets the amount of BIC to use as the penalty for more clusters. The default of 0 uses all AIC. Use 10 to use all BIC (this generally produces fewer clusters).
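The effect of mixing the two penalties can be sketched as follows. This is an illustration using the textbook AIC and BIC penalty terms, not the exact expression in the KlustaKwik source. The function names and the bic_fraction argument (0 for all AIC, 1 for all BIC) are assumptions made for this example, and how the GUI value is scaled to such a fraction is not specified here.

```python
import math

def n_free_params(n_clusters, n_dims):
    """Free parameters of a Gaussian mixture with unconstrained covariances."""
    # Per cluster: a mean vector plus a symmetric covariance matrix.
    per_cluster = n_dims + n_dims * (n_dims + 1) // 2
    # Mixture weights sum to one, so only n_clusters - 1 of them are free.
    return n_clusters * per_cluster + (n_clusters - 1)

def mixed_penalty(n_clusters, n_dims, n_points, bic_fraction):
    """Blend the AIC and BIC penalty terms (hypothetical helper).

    bic_fraction = 0.0 gives the pure AIC penalty, 1.0 the pure BIC penalty.
    """
    if not 0.0 <= bic_fraction <= 1.0:
        raise ValueError("bic_fraction must lie between 0 and 1")
    k = n_free_params(n_clusters, n_dims)
    aic_penalty = 2.0 * k                  # AIC: 2 per free parameter
    bic_penalty = k * math.log(n_points)   # BIC: ln(n_points) per free parameter
    return (1.0 - bic_fraction) * aic_penalty + bic_fraction * bic_penalty
```

Because ln(n_points) exceeds 2 for any data set with more than 7 points, the BIC term charges more per additional cluster than the AIC term, which is why moving toward all BIC tends to give fewer clusters.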

All other parameters known from the standalone version of KlustaKwik can be edited in the NEV2lkit source code.