
Unsupervised Decomposition

Classes

class  PLSAOptions
 Option object for the pLSA algorithm. More...
 

Functions

template<... >
void pLSA (...)
 Decompose a matrix according to the pLSA algorithm. More...
 
template<class T , class C1 , class C2 , class C3 >
void principalComponents (MultiArrayView< 2, T, C1 > const &features, MultiArrayView< 2, T, C2 > fz, MultiArrayView< 2, T, C3 > zv)
 Decompose a matrix according to the PCA algorithm. More...
 

Detailed Description

Unsupervised matrix decomposition methods.

Function Documentation

template<class T , class C1 , class C2 , class C3 >
void vigra::principalComponents ( MultiArrayView< 2, T, C1 > const &  features,
                                  MultiArrayView< 2, T, C2 >  fz,
                                  MultiArrayView< 2, T, C3 >  zv 
                                )

Decompose a matrix according to the PCA algorithm.

This function implements the PCA algorithm (principal component analysis).

  • features must be a matrix with shape (numFeatures * numSamples), which is decomposed into the matrices
  • fz with shape (numFeatures * numComponents) and
  • zv with shape (numComponents * numSamples)

such that

\[ \mathrm{features} \approx \mathrm{fz} * \mathrm{zv} \]

(this formula requires that the features have been centered around the mean by linalg::prepareRows (features, features, ZeroMean)).
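The centering step can be illustrated without VIGRA. The following plain C++ sketch (the helper name zeroMeanRows is made up here, it is not a VIGRA function) subtracts each row's mean from a row-major matrix, which is the effect linalg::prepareRows (features, features, ZeroMean) has on the feature matrix:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for linalg::prepareRows(a, a, ZeroMean):
// subtract the mean of each row of a (rows x cols) row-major matrix,
// so that every feature (row) is centered around zero.
void zeroMeanRows(std::vector<double>& a, std::size_t rows, std::size_t cols)
{
    for (std::size_t i = 0; i < rows; ++i)
    {
        double mean = 0.0;
        for (std::size_t j = 0; j < cols; ++j)
            mean += a[i*cols + j];
        mean /= cols;
        for (std::size_t j = 0; j < cols; ++j)
            a[i*cols + j] -= mean;
    }
}
```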

The shape parameter numComponents determines the complexity of the decomposition model and therefore the approximation quality (if numComponents == numFeatures, the representation becomes exact). Intuitively, fz is a projection matrix from the reduced space into the original space, and zv is the reduced representation of the data, using just numComponents features.
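The approximation features ≈ fz * zv can be checked with ordinary matrix arithmetic. A minimal sketch in plain C++ (independent of VIGRA; matmul and meanSquaredError are illustrative helpers introduced here, mirroring the squaredNorm(data - model) / numSamples expression from the usage example):

```cpp
#include <cstddef>
#include <vector>

// Multiply an (r x k) matrix by a (k x c) matrix, both row-major,
// yielding the (r x c) product -- the shape of fz*zv matches features.
std::vector<double> matmul(const std::vector<double>& a,
                           const std::vector<double>& b,
                           std::size_t r, std::size_t k, std::size_t c)
{
    std::vector<double> out(r*c, 0.0);
    for (std::size_t i = 0; i < r; ++i)
        for (std::size_t j = 0; j < c; ++j)
            for (std::size_t l = 0; l < k; ++l)
                out[i*c + j] += a[i*k + l] * b[l*c + j];
    return out;
}

// Squared reconstruction error averaged over the samples.
double meanSquaredError(const std::vector<double>& features,
                        const std::vector<double>& model,
                        std::size_t numSamples)
{
    double s = 0.0;
    for (std::size_t i = 0; i < features.size(); ++i)
    {
        double d = features[i] - model[i];
        s += d * d;
    }
    return s / numSamples;
}
```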

Declaration:

#include <vigra/unsupervised_decomposition.hxx>

namespace vigra {
    template <class U, class C1, class C2, class C3>
    void
    principalComponents(MultiArrayView<2, U, C1> const & features,
                        MultiArrayView<2, U, C2> fz,
                        MultiArrayView<2, U, C3> zv);
}

Usage:

Matrix<double> data(numFeatures, numSamples);
... // fill the input matrix

int numComponents = 3;
Matrix<double> fz(numFeatures, numComponents),
               zv(numComponents, numSamples);

// center the data
prepareRows(data, data, ZeroMean);

// compute the reduced representation
principalComponents(data, fz, zv);

Matrix<double> model = fz*zv;
double meanSquaredError = squaredNorm(data - model) / numSamples;

template<... >
void vigra::pLSA ( ... )

Decompose a matrix according to the pLSA algorithm.

This function implements the pLSA algorithm (probabilistic latent semantic analysis) proposed in

T. Hofmann: "Probabilistic Latent Semantic Analysis", in: UAI'99, Proc. 15th Conf. on Uncertainty in Artificial Intelligence, pp. 289-296, Morgan Kaufmann, 1999

  • features must be a matrix with shape (numFeatures * numSamples) and non-negative entries, which is decomposed into the matrices
  • fz with shape (numFeatures * numComponents) and
  • zv with shape (numComponents * numSamples)

such that

\[ \mathrm{features} \approx \mathrm{fz} * \mathrm{zv} \]

(this formula applies when pLSA is called with PLSAOptions().normalizedComponentWeights(false). Otherwise, you must first normalize the features by calling linalg::prepareColumns (features, features, UnitSum) to make the formula hold).

The shape parameter numComponents determines the complexity of the decomposition model and therefore the approximation quality. Intuitively, the features are a set of words, and the samples a set of documents. The entries of the features matrix denote the relative frequency of each word in each document. The components represent a (presumably small) set of topics. The matrix fz encodes the relative frequency of the words within each topic, and the matrix zv encodes to what extent each topic explains the content of each document.
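The EM iteration behind pLSA can be sketched in plain C++ (an illustrative re-implementation, not VIGRA's code; the names Plsa and plsaSketch are made up here). The E-step computes the topic posterior q(z|w,d) proportional to fz(w,z)*zv(z,d); the M-step re-estimates fz and zv from the posterior-weighted counts and normalizes their columns to unit sum:

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

struct Plsa { std::vector<double> fz, zv; };  // fz: F x K, zv: K x S, row-major

// Minimal pLSA EM sketch. `features` holds non-negative counts (F x S,
// row-major). After each M-step the columns of fz (word distributions per
// topic) and of zv (topic distributions per document) sum to one.
Plsa plsaSketch(const std::vector<double>& features,
                std::size_t F, std::size_t S, std::size_t K,
                int iterations = 50, unsigned seed = 0)
{
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uni(0.1, 1.0);
    Plsa m{std::vector<double>(F*K), std::vector<double>(K*S)};
    for (double& v : m.fz) v = uni(rng);   // random positive initial solution
    for (double& v : m.zv) v = uni(rng);

    auto normalizeColumns = [](std::vector<double>& a,
                               std::size_t rows, std::size_t cols) {
        for (std::size_t j = 0; j < cols; ++j) {
            double s = 0.0;
            for (std::size_t i = 0; i < rows; ++i) s += a[i*cols + j];
            for (std::size_t i = 0; i < rows; ++i) a[i*cols + j] /= s;
        }
    };
    normalizeColumns(m.fz, F, K);
    normalizeColumns(m.zv, K, S);

    std::vector<double> newFz(F*K), newZv(K*S), q(K);
    for (int it = 0; it < iterations; ++it) {
        std::fill(newFz.begin(), newFz.end(), 0.0);
        std::fill(newZv.begin(), newZv.end(), 0.0);
        for (std::size_t w = 0; w < F; ++w)
            for (std::size_t d = 0; d < S; ++d) {
                double n = features[w*S + d];
                if (n == 0.0) continue;
                double sum = 0.0;
                for (std::size_t z = 0; z < K; ++z) {
                    q[z] = m.fz[w*K + z] * m.zv[z*S + d];   // E-step
                    sum += q[z];
                }
                for (std::size_t z = 0; z < K; ++z) {
                    double r = n * q[z] / sum;              // posterior weight
                    newFz[w*K + z] += r;                    // M-step sums
                    newZv[z*S + d] += r;
                }
            }
        m.fz.swap(newFz);
        m.zv.swap(newZv);
        normalizeColumns(m.fz, F, K);
        normalizeColumns(m.zv, K, S);
    }
    return m;
}
```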

The option object determines the iteration termination conditions and the output normalization. In addition, you may pass a random number generator to pLSA() which is used to create the initial solution.

Declarations:

#include <vigra/unsupervised_decomposition.hxx>

namespace vigra {
    template <class U, class C1, class C2, class C3, class Random>
    void
    pLSA(MultiArrayView<2, U, C1> const & features,
         MultiArrayView<2, U, C2> & fz,
         MultiArrayView<2, U, C3> & zv,
         Random const & random,
         PLSAOptions const & options = PLSAOptions());

    template <class U, class C1, class C2, class C3>
    void
    pLSA(MultiArrayView<2, U, C1> const & features,
         MultiArrayView<2, U, C2> & fz,
         MultiArrayView<2, U, C3> & zv,
         PLSAOptions const & options = PLSAOptions());
}

Usage:

Matrix<double> words(numWords, numDocuments);
... // fill the input matrix

int numTopics = 3;
Matrix<double> fz(numWords, numTopics),
               zv(numTopics, numDocuments);

pLSA(words, fz, zv, PLSAOptions().normalizedComponentWeights(false));

Matrix<double> model = fz*zv;
double meanSquaredError = squaredNorm(words - model) / numDocuments;

© Ullrich Köthe (ullrich.koethe@iwr.uni-heidelberg.de)
Heidelberg Collaboratory for Image Processing, University of Heidelberg, Germany

vigra 1.11.1 (Fri May 19 2017)