CMAPP: Clustered matrix approximations

Method description

Clustered matrix approximation is a fast and memory efficient framework for dimensionality reduction of matrices. The method is suited for large and sparse matrices that have some kind of cluster structure. This is the case for many matrices arizing from graphs or networks, bipartite graphs, and large scale information retrieval problems. The method involves three to four steps:

a clustering or co-clustering step so that the rows and columns of a given matrix are partitioned into a number of groups;
reordering the rows and columns according to cluster belonging and extracting sufficiently dense blocks;
computing low rank approximations of these dense blocks; and
combining the block-wise approximations into an approximation of the entire matrix.

Detailed descriptions of these methods, motivation, and numerous experimental results showing benefits of the clustered matrix approximation are presented in Savas and Dhillon 2011, and Savas and Dhillon 2015.

MATLAB code for clustered matrix approximations

Test examples with MATLAB Publish

We illustrate various versions of the clustered matrix approximation and their MATLAB implementations with a number of published documents. A complete list of available MATLAB-functions is given in this pdf-file.

Dependencies, additional software packages, and related algorithms

The clustered matrix approximation package uses additional software packages. These are:

GRACLUS: Efficient graph clustering software.
MatlabBGL: MATLAB package for working with graphs.
PROPACK: Software package for computing the singular value and spectral decomposition of large and sparse matrices.
We also empley probabilistic algorithms in the clustered matrix approximation framework. Consider Halko et al. 2011 for details.

Download

Download CMAPP: Clustered matrix approximations
README file for CMAPP: clustered matrix approximation
The code is released under the GNU Public License (GPL).

Citation

When using the clustered matrix approximation code, please acknowledge its use with citations:

Clustered matrix appsoximations
B. Savas and I. S. Dhillon.
Submitted to SIAM Journal on Matrix Analysis and Applications (SIMAX), 2015.
Clustered low rank approximation of graphs in information science applications
B. Savas and I. S. Dhillon.
In Proceedings of the SIAM International Conference on Data Mining (SDM). pp. 164-175, 2011.
Clustered embedding of massive social networks
Song, H. H. and Savas, B. and Cho, T. W. and Dave, V. and Lu, Z. and Dhillon, I. S. and Zhang, Y. and Qiu, L.
In Proceedings of the 12th joint international conference on Measurement and Modeling of Computer Systems (ACM SIGMETRICS/PERFORMANCE). pp. 331--342, 2012.

Questions? Comments?

Contact Berkant Savas.