TUBITAK 3501

Novel similarity criteria and sampling/quantization methods for approximate spectral clustering of large data sets

Spectral clustering has become increasingly popular in recent years owing to its ability to extract different types of clusters from a data set, its ease of implementation, and its success in various application areas. It is essentially a manifold learning technique that relies on the eigendecomposition of a similarity matrix representing the pairwise similarities between samples. Although the neighborhood and similarity criteria are critical to a successful partitioning, the criteria used to construct this matrix are often limited to standard neighborhood approaches (such as k-, ε- or σ-neighborhoods) and to similarities derived from pairwise distances using a Gaussian function. To make spectral clustering more effective, this project therefore aims to develop novel similarity criteria that harness distance and density information and exploit spatial and topological relations. The resulting criteria and methods will support fast and effective clustering of large data sets, particularly remote-sensing images, an important class of large data sets with many application areas. By introducing novel and advanced machine learning approaches for data mining, the proposed project will in the long run pave the way for automatic and effective control methods, primarily for monitoring agricultural, forest, and urban areas with remote sensing, which is becoming increasingly important for the sustainable development and management of environmental resources.
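
For orientation, the conventional baseline described above (a Gaussian similarity computed from pairwise distances, followed by an eigendecomposition) can be sketched as follows. This is only a minimal illustration of the standard approach, not the criteria proposed in this project. The function names, the value of sigma, and the synthetic two-blob data are assumptions made for the example, and the two-way split by the sign of the second eigenvector is a simplification used here in place of the k-means step that is commonly applied to the spectral embedding.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Dense similarity matrix W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)          # guard against tiny negative round-off
    W = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)          # no self-similarity
    return W

def spectral_bipartition(X, sigma=1.0):
    """Split the samples into two groups using the second eigenvector of the
    symmetric normalized Laplacian (hypothetical helper for illustration)."""
    W = gaussian_similarity(X, sigma)
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # L_sym = I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigvecs = np.linalg.eigh(L_sym)
    second = eigvecs[:, 1]
    return (second > 0).astype(int)

# Synthetic data (assumed for the example): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
A = rng.normal(loc=0.0, scale=0.3, size=(20, 2))
B = rng.normal(loc=3.0, scale=0.3, size=(20, 2))
X = np.vstack([A, B])

labels = spectral_bipartition(X, sigma=1.0)
print(labels)
```

The similarity matrix here is dense, so memory and eigendecomposition cost grow quickly with the number of samples. This is one reason why large data sets such as remote-sensing images are typically handled with approximate variants that work on a reduced set of representatives.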