Unsupervised Decomposition

Classes
class PLSAOptions
    Option object for the pLSA algorithm.

Functions
template<... >
void pLSA (...)
    Decompose a matrix according to the pLSA algorithm.
template<class T , class C1 , class C2 , class C3 >
void principalComponents (MultiArrayView< 2, T, C1 > const &features, MultiArrayView< 2, T, C2 > fz, MultiArrayView< 2, T, C3 > zv)
    Decompose a matrix according to the PCA algorithm.
Unsupervised matrix decomposition methods.
void vigra::principalComponents (MultiArrayView< 2, T, C1 > const & features, MultiArrayView< 2, T, C2 > fz, MultiArrayView< 2, T, C3 > zv)
Decompose a matrix according to the PCA algorithm.
This function implements the PCA algorithm (principal component analysis). The input matrix features has shape (numFeatures * numSamples) and is decomposed into the matrices fz of shape (numFeatures * numComponents) and zv of shape (numComponents * numSamples) such that
\[ \mathrm{features} \approx \mathrm{fz} \cdot \mathrm{zv} \]
(this formula requires that the features have been centered around the mean by linalg::prepareRows(features, features, ZeroMean)).
The shape parameter numComponents determines the complexity of the decomposition model and therefore the approximation quality (if numComponents == numFeatures, the representation becomes exact). Intuitively, fz is a projection matrix from the reduced space into the original space, and zv is the reduced representation of the data, using just numComponents features.
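The roles of fz and zv can be illustrated with a minimal, self-contained sketch that does not use VIGRA at all. It assumes a plain std::vector matrix type and computes only a single component (numComponents == 1) by power iteration, which is a swapped-in textbook technique and not necessarily how principalComponents() works internally. The names Matrix, centerRows, multiply and firstPrincipalComponent are hypothetical helpers introduced for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense matrix, stored row-major as m[row][col].
using Matrix = std::vector<std::vector<double>>;

// Subtract each row's mean, so every feature has zero mean across the samples.
void centerRows(Matrix& m) {
    for (auto& row : m) {
        assert(!row.empty());
        double mean = 0.0;
        for (double x : row) mean += x;
        mean /= static_cast<double>(row.size());
        for (double& x : row) x -= mean;
    }
}

// Plain matrix product of a (r x k) and b (k x c).
Matrix multiply(const Matrix& a, const Matrix& b) {
    assert(!a.empty() && !b.empty() && a[0].size() == b.size());
    Matrix out(a.size(), std::vector<double>(b[0].size(), 0.0));
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t k = 0; k < b.size(); ++k)
            for (std::size_t j = 0; j < b[0].size(); ++j)
                out[i][j] += a[i][k] * b[k][j];
    return out;
}

// Rank-1 decomposition of centered features by power iteration on
// features * features^T. Fills fz (numFeatures x 1) and zv (1 x numSamples).
// Assumes the uniform start vector is not orthogonal to the dominant direction.
void firstPrincipalComponent(const Matrix& features, Matrix& fz, Matrix& zv,
                             int iterations = 100) {
    assert(!features.empty() && !features[0].empty());
    const std::size_t nf = features.size(), ns = features[0].size();
    std::vector<double> v(nf, 1.0 / std::sqrt(static_cast<double>(nf)));
    for (int it = 0; it < iterations; ++it) {
        // w = features * (features^T * v)
        std::vector<double> t(ns, 0.0), w(nf, 0.0);
        for (std::size_t i = 0; i < nf; ++i)
            for (std::size_t j = 0; j < ns; ++j) t[j] += features[i][j] * v[i];
        for (std::size_t i = 0; i < nf; ++i)
            for (std::size_t j = 0; j < ns; ++j) w[i] += features[i][j] * t[j];
        double norm = 0.0;
        for (double x : w) norm += x * x;
        norm = std::sqrt(norm);
        if (norm == 0.0) break;  // degenerate input: keep the current direction
        for (std::size_t i = 0; i < nf; ++i) v[i] = w[i] / norm;
    }
    // fz maps the 1-D reduced space back to feature space;
    // zv holds the coordinate of every sample along that direction.
    fz.assign(nf, std::vector<double>(1, 0.0));
    zv.assign(1, std::vector<double>(ns, 0.0));
    for (std::size_t i = 0; i < nf; ++i) {
        fz[i][0] = v[i];
        for (std::size_t j = 0; j < ns; ++j) zv[0][j] += v[i] * features[i][j];
    }
}
```

Calling centerRows, then firstPrincipalComponent, then multiply(fz, zv) on a matrix whose centered columns all lie along one direction reproduces the centered matrix up to rounding error. For general data the product is only a rank-1 approximation.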
Declaration:
#include <vigra/unsupervised_decomposition.hxx>
Usage:
void vigra::pLSA (...)
Decompose a matrix according to the pLSA algorithm.
This function implements the pLSA algorithm (probabilistic latent semantic analysis) proposed in
T. Hofmann: "Probabilistic Latent Semantic Analysis", in: UAI'99, Proc. 15th Conf. on Uncertainty in Artificial Intelligence, pp. 289-296, Morgan Kaufmann, 1999
The input matrix features has shape (numFeatures * numSamples) and non-negative entries, and is decomposed into the matrices fz of shape (numFeatures * numComponents) and zv of shape (numComponents * numSamples) such that
\[ \mathrm{features} \approx \mathrm{fz} \cdot \mathrm{zv} \]
(this formula applies when pLSA is called with PLSAOptions.normalizedComponentWeights(false). Otherwise, you must normalize the features by calling linalg::prepareColumns(features, features, UnitSum) to make the formula hold).
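As a rough illustration of the column normalization mentioned above, the following self-contained sketch scales every column of a non-negative matrix so that its entries sum to 1. It assumes that this is what the UnitSum goal amounts to when applied to columns. The Matrix alias and the function normalizeColumnsToUnitSum are hypothetical and not part of VIGRA.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense matrix, stored row-major as m[row][col].
using Matrix = std::vector<std::vector<double>>;

// Scale each column (one sample / document) so that its entries sum to 1.
// Columns that are entirely zero are left unchanged.
void normalizeColumnsToUnitSum(Matrix& m) {
    assert(!m.empty());
    const std::size_t rows = m.size(), cols = m[0].size();
    for (std::size_t j = 0; j < cols; ++j) {
        double sum = 0.0;
        for (std::size_t i = 0; i < rows; ++i) {
            assert(m[i][j] >= 0.0);  // pLSA expects non-negative entries
            sum += m[i][j];
        }
        if (sum == 0.0) continue;
        for (std::size_t i = 0; i < rows; ++i) m[i][j] /= sum;
    }
}
```

After this step every non-empty column can be read as a distribution of relative feature frequencies within one sample.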
The shape parameter numComponents determines the complexity of the decomposition model and therefore the approximation quality. Intuitively, the features are a set of words, and the samples a set of documents. The entries of the features matrix denote the relative frequency of the words in each document. The components represent a (presumably small) set of topics. The matrix fz encodes the relative frequency of words in the different topics, and the matrix zv encodes to what extent each topic explains the content of each document.
The option object determines the iteration termination conditions and the output normalization. In addition, you may pass a random number generator to pLSA() which is used to create the initial solution.
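To make the words / documents / topics picture concrete, here is a minimal sketch of a textbook EM iteration for pLSA on plain std::vector matrices. It is not VIGRA's implementation: it assumes the columns of features already sum to 1, runs a caller-chosen number of sweeps instead of using a termination condition, and expects the caller to supply a strictly positive initial fz and zv (where pLSA() would draw a random initial solution). The names Matrix, normalizeColumns and plsaStep are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense matrix, stored row-major as m[row][col].
using Matrix = std::vector<std::vector<double>>;

// Scale each column to sum to 1; all-zero columns are left unchanged.
void normalizeColumns(Matrix& m) {
    for (std::size_t j = 0; j < m[0].size(); ++j) {
        double sum = 0.0;
        for (std::size_t i = 0; i < m.size(); ++i) sum += m[i][j];
        if (sum == 0.0) continue;
        for (std::size_t i = 0; i < m.size(); ++i) m[i][j] /= sum;
    }
}

// One EM sweep for features ~ fz * zv.
// features: words x documents, fz: words x topics, zv: topics x documents.
void plsaStep(const Matrix& features, Matrix& fz, Matrix& zv) {
    assert(!features.empty() && !zv.empty() && fz.size() == features.size());
    const std::size_t nw = features.size();     // words (numFeatures)
    const std::size_t nd = features[0].size();  // documents (numSamples)
    const std::size_t nz = zv.size();           // topics (numComponents)
    Matrix fzNew(nw, std::vector<double>(nz, 0.0));
    Matrix zvNew(nz, std::vector<double>(nd, 0.0));
    for (std::size_t w = 0; w < nw; ++w) {
        for (std::size_t d = 0; d < nd; ++d) {
            if (features[w][d] == 0.0) continue;  // nothing to distribute
            double approx = 0.0;
            for (std::size_t z = 0; z < nz; ++z) approx += fz[w][z] * zv[z][d];
            if (approx == 0.0) continue;  // model gives this entry zero mass
            for (std::size_t z = 0; z < nz; ++z) {
                // Observed frequency times the posterior weight of topic z.
                const double share =
                    features[w][d] * fz[w][z] * zv[z][d] / approx;
                fzNew[w][z] += share;
                zvNew[z][d] += share;
            }
        }
    }
    normalizeColumns(fzNew);  // each topic becomes a distribution over words
    normalizeColumns(zvNew);  // each document becomes a distribution over topics
    fz = fzNew;
    zv = zvNew;
}
```

With two documents that use disjoint vocabularies and numComponents == 2, repeated calls to plsaStep from an asymmetric positive start drive each topic toward the vocabulary of one document, and fz * zv approaches features.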
Declarations:
#include <vigra/unsupervised_decomposition.hxx>
Usage:
© Ullrich Köthe (ullrich.koethe@iwr.uni-heidelberg.de)
html generated using doxygen and Python