
Unsupervised Decomposition

Classes

class  PLSAOptions
 Option object for the pLSA algorithm. More...
 

Functions

template<... >
void pLSA (...)
 Decompose a matrix according to the pLSA algorithm. More...
 
template<class T , class C1 , class C2 , class C3 >
void principalComponents (MultiArrayView< 2, T, C1 > const &features, MultiArrayView< 2, T, C2 > fz, MultiArrayView< 2, T, C3 > zv)
 Decompose a matrix according to the PCA algorithm. More...
 

Detailed Description

Unsupervised matrix decomposition methods.

Function Documentation

template<class T , class C1 , class C2 , class C3 >
void vigra::principalComponents ( MultiArrayView< 2, T, C1 > const &  features,
                                  MultiArrayView< 2, T, C2 >  fz,
                                  MultiArrayView< 2, T, C3 >  zv 
                                )

Decompose a matrix according to the PCA algorithm.

This function implements the PCA algorithm (principal component analysis).

  • features must be a matrix with shape (numFeatures * numSamples), which is decomposed into the matrices
  • fz with shape (numFeatures * numComponents) and
  • zv with shape (numComponents * numSamples)

such that

\[ \mathrm{features} \approx \mathrm{fz} * \mathrm{zv} \]

(this formula requires that the features have been centered around the mean by linalg::prepareRows (features, features, ZeroMean)).
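The centering step can be illustrated without VIGRA. The following plain C++ sketch (the helper name zeroMeanRows is made up here, it is not a VIGRA function) subtracts each row's mean from a row-major matrix, which is the effect linalg::prepareRows (features, features, ZeroMean) has on the feature matrix:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for linalg::prepareRows(a, a, ZeroMean):
// subtract the mean of each row of a (rows x cols) row-major matrix,
// so that every feature (row) is centered around zero.
void zeroMeanRows(std::vector<double>& a, std::size_t rows, std::size_t cols)
{
    for (std::size_t i = 0; i < rows; ++i)
    {
        double mean = 0.0;
        for (std::size_t j = 0; j < cols; ++j)
            mean += a[i*cols + j];
        mean /= cols;
        for (std::size_t j = 0; j < cols; ++j)
            a[i*cols + j] -= mean;
    }
}
```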

The shape parameter numComponents determines the complexity of the decomposition model and therefore the approximation quality (if numComponents == numFeatures, the representation becomes exact). Intuitively, fz is a projection matrix from the reduced space into the original space, and zv is the reduced representation of the data, using just numComponents features.
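The approximation features ≈ fz * zv can be checked with ordinary matrix arithmetic. A minimal sketch in plain C++ (independent of VIGRA; matmul and meanSquaredError are illustrative helpers introduced here, mirroring the squaredNorm(data - model) / numSamples expression from the usage example):

```cpp
#include <cstddef>
#include <vector>

// Multiply an (r x k) matrix by a (k x c) matrix, both row-major,
// yielding the (r x c) product -- the shape of fz*zv matches features.
std::vector<double> matmul(const std::vector<double>& a,
                           const std::vector<double>& b,
                           std::size_t r, std::size_t k, std::size_t c)
{
    std::vector<double> out(r*c, 0.0);
    for (std::size_t i = 0; i < r; ++i)
        for (std::size_t j = 0; j < c; ++j)
            for (std::size_t l = 0; l < k; ++l)
                out[i*c + j] += a[i*k + l] * b[l*c + j];
    return out;
}

// Squared reconstruction error averaged over the samples.
double meanSquaredError(const std::vector<double>& features,
                        const std::vector<double>& model,
                        std::size_t numSamples)
{
    double s = 0.0;
    for (std::size_t i = 0; i < features.size(); ++i)
    {
        double d = features[i] - model[i];
        s += d * d;
    }
    return s / numSamples;
}
```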

Declaration:

#include <vigra/unsupervised_decomposition.hxx>

namespace vigra {
    template <class U, class C1, class C2, class C3>
    void
    principalComponents(MultiArrayView<2, U, C1> const & features,
                        MultiArrayView<2, U, C2> fz,
                        MultiArrayView<2, U, C3> zv);
}

Usage:

Matrix<double> data(numFeatures, numSamples);
... // fill the input matrix

int numComponents = 3;
Matrix<double> fz(numFeatures, numComponents),
               zv(numComponents, numSamples);

// center the data
prepareRows(data, data, ZeroMean);

// compute the reduced representation
principalComponents(data, fz, zv);

Matrix<double> model = fz*zv;
double meanSquaredError = squaredNorm(data - model) / numSamples;

template<... >
void vigra::pLSA ( ... )

Decompose a matrix according to the pLSA algorithm.

This function implements the pLSA algorithm (probabilistic latent semantic analysis) proposed in

T. Hofmann: "Probabilistic Latent Semantic Analysis", in: UAI'99, Proc. 15th Conf. on Uncertainty in Artificial Intelligence, pp. 289-296, Morgan Kaufmann, 1999

  • features must be a matrix with shape (numFeatures * numSamples) and non-negative entries, which is decomposed into the matrices
  • fz with shape (numFeatures * numComponents) and
  • zv with shape (numComponents * numSamples)

such that

\[ \mathrm{features} \approx \mathrm{fz} * \mathrm{zv} \]

(this formula applies when pLSA is called with PLSAOptions().normalizedComponentWeights(false). Otherwise, you must first normalize the features by calling linalg::prepareColumns (features, features, UnitSum) to make the formula hold).

The shape parameter numComponents determines the complexity of the decomposition model and therefore the approximation quality. Intuitively, the features are a set of words, and the samples a set of documents. The entries of the features matrix denote the relative frequency of each word in each document. The components represent a (presumably small) set of topics. The matrix fz encodes the relative frequency of the words within each topic, and the matrix zv encodes to what extent each topic explains the content of each document.
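The EM iteration behind pLSA can be sketched in plain C++ (an illustrative re-implementation, not VIGRA's code; the names Plsa and plsaSketch are made up here). The E-step computes the topic posterior q(z|w,d) proportional to fz(w,z)*zv(z,d); the M-step re-estimates fz and zv from the posterior-weighted counts and normalizes their columns to unit sum:

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

struct Plsa { std::vector<double> fz, zv; };  // fz: F x K, zv: K x S, row-major

// Minimal pLSA EM sketch. `features` holds non-negative counts (F x S,
// row-major). After each M-step the columns of fz (word distributions per
// topic) and of zv (topic distributions per document) sum to one.
Plsa plsaSketch(const std::vector<double>& features,
                std::size_t F, std::size_t S, std::size_t K,
                int iterations = 50, unsigned seed = 0)
{
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uni(0.1, 1.0);
    Plsa m{std::vector<double>(F*K), std::vector<double>(K*S)};
    for (double& v : m.fz) v = uni(rng);   // random positive initial solution
    for (double& v : m.zv) v = uni(rng);

    auto normalizeColumns = [](std::vector<double>& a,
                               std::size_t rows, std::size_t cols) {
        for (std::size_t j = 0; j < cols; ++j) {
            double s = 0.0;
            for (std::size_t i = 0; i < rows; ++i) s += a[i*cols + j];
            for (std::size_t i = 0; i < rows; ++i) a[i*cols + j] /= s;
        }
    };
    normalizeColumns(m.fz, F, K);
    normalizeColumns(m.zv, K, S);

    std::vector<double> newFz(F*K), newZv(K*S), q(K);
    for (int it = 0; it < iterations; ++it) {
        std::fill(newFz.begin(), newFz.end(), 0.0);
        std::fill(newZv.begin(), newZv.end(), 0.0);
        for (std::size_t w = 0; w < F; ++w)
            for (std::size_t d = 0; d < S; ++d) {
                double n = features[w*S + d];
                if (n == 0.0) continue;
                double sum = 0.0;
                for (std::size_t z = 0; z < K; ++z) {
                    q[z] = m.fz[w*K + z] * m.zv[z*S + d];   // E-step
                    sum += q[z];
                }
                for (std::size_t z = 0; z < K; ++z) {
                    double r = n * q[z] / sum;              // posterior weight
                    newFz[w*K + z] += r;                    // M-step sums
                    newZv[z*S + d] += r;
                }
            }
        m.fz.swap(newFz);
        m.zv.swap(newZv);
        normalizeColumns(m.fz, F, K);
        normalizeColumns(m.zv, K, S);
    }
    return m;
}
```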

The option object determines the iteration termination conditions and the output normalization. In addition, you may pass a random number generator to pLSA() which is used to create the initial solution.

Declarations:

#include <vigra/unsupervised_decomposition.hxx>

namespace vigra {
    template <class U, class C1, class C2, class C3, class Random>
    void
    pLSA(MultiArrayView<2, U, C1> const & features,
         MultiArrayView<2, U, C2> & fz,
         MultiArrayView<2, U, C3> & zv,
         Random const & random,
         PLSAOptions const & options = PLSAOptions());

    template <class U, class C1, class C2, class C3>
    void
    pLSA(MultiArrayView<2, U, C1> const & features,
         MultiArrayView<2, U, C2> & fz,
         MultiArrayView<2, U, C3> & zv,
         PLSAOptions const & options = PLSAOptions());
}

Usage:

Matrix<double> words(numWords, numDocuments);
... // fill the input matrix

int numTopics = 3;
Matrix<double> fz(numWords, numTopics),
               zv(numTopics, numDocuments);

pLSA(words, fz, zv, PLSAOptions().normalizedComponentWeights(false));

Matrix<double> model = fz*zv;
double meanSquaredError = squaredNorm(words - model) / numDocuments;

© Ullrich Köthe (ullrich.koethe@iwr.uni-heidelberg.de)
Heidelberg Collaboratory for Image Processing, University of Heidelberg, Germany

vigra 1.11.1 (Fri May 19 2017)