Search Engine Technology: Clustering Papers
Respect my authority! HITS without hyperlinks, utilizing cluster-based language models
Authored by Lillian Lee and Oren Kurland of Cornell University (2006)
Full PDF: http://www.cs.cornell.edu/home/llee/papers/lmhubsauth.pdf
Abstract:
We present an approach to improving the precision of an initial
document ranking wherein we utilize cluster information within
a graph-based framework. The main idea is to perform re-ranking
based on centrality within bipartite graphs of documents (on one
side) and clusters (on the other side), on the premise that these
are mutually reinforcing entities. Links between entities are
created via consideration of language models induced from them.
We find that our cluster-document graphs give rise to much better
retrieval performance than previously proposed document-only graphs
do. For example, authority-based re-ranking of documents via a
HITS-style cluster-based approach outperforms a previously-proposed
PageRank-inspired algorithm applied to solely-document graphs.
Moreover, we also show that computing authority scores for clusters
constitutes an effective method for identifying clusters containing
a large percentage of relevant documents.
Related terminology: HITS, clusters, re-ranking
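The abstract's core idea is HITS-style mutual reinforcement between clusters and documents on a bipartite graph whose edges are derived from language models. The sketch below illustrates it under stated assumptions and is not the authors' implementation. The documents are assumed to be the top of an initial ranking and the clusters are given. The edge weight between a cluster and a document is assumed to be the per-token geometric mean of the document's likelihood under a Jelinek-Mercer-smoothed unigram model of the cluster. The helper names `affinity` and `hits` and the smoothing weight `lam` are hypothetical.

```python
import math
from collections import Counter

def affinity(cluster_tokens, doc_tokens, coll_counts, coll_total, lam=0.5):
    """Assumed edge weight: geometric-mean per-token likelihood of the document
    under a Jelinek-Mercer smoothed unigram language model of the cluster."""
    counts = Counter(cluster_tokens)
    total = len(cluster_tokens)
    log_sum = 0.0
    for w in doc_tokens:
        # Mix the cluster's maximum-likelihood estimate with the collection model.
        p = lam * counts[w] / total + (1 - lam) * coll_counts[w] / coll_total
        log_sum += math.log(p)
    return math.exp(log_sum / len(doc_tokens))

def hits(weights, iterations=50):
    """HITS power iteration on a weighted bipartite graph.
    Rows of `weights` are hubs (here: clusters), columns are authorities (documents)."""
    n_rows, n_cols = len(weights), len(weights[0])
    hubs = [1.0] * n_rows
    auths = [1.0] * n_cols
    for _ in range(iterations):
        # Authority of a document: weighted sum of the hub scores pointing at it.
        auths = [sum(hubs[r] * weights[r][c] for r in range(n_rows)) for c in range(n_cols)]
        norm = math.sqrt(sum(a * a for a in auths)) or 1.0
        auths = [a / norm for a in auths]
        # Hub score of a cluster: weighted sum of the authorities it points to.
        hubs = [sum(weights[r][c] * auths[c] for c in range(n_cols)) for r in range(n_rows)]
        norm = math.sqrt(sum(h * h for h in hubs)) or 1.0
        hubs = [h / norm for h in hubs]
    return hubs, auths

# Toy data: documents assumed to be the top results of an initial retrieval.
docs = [
    "jaguar speed cat wild".split(),
    "jaguar car engine speed".split(),
    "wild cat habitat jungle".split(),
    "car engine repair".split(),
]
clusters = [[0, 2], [1, 3]]  # hypothetical clusters, given as document indices

collection = [w for d in docs for w in d]
coll_counts, coll_total = Counter(collection), len(collection)
W = [
    [affinity([w for i in cl for w in docs[i]], d, coll_counts, coll_total) for d in docs]
    for cl in clusters
]
hubs, auths = hits(W)
# Re-rank the initial list by document authority score.
reranked = sorted(range(len(docs)), key=lambda i: auths[i], reverse=True)
print(reranked)
```

Here clusters act as hubs and documents as authorities. Running `hits` on the transposed weight matrix (`[list(col) for col in zip(*W)]`) reverses the roles, so the second returned list then holds authority scores for clusters. The abstract reports that cluster authority scores are useful for finding clusters rich in relevant documents.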
An Impossibility Theorem for Clustering
Paper by Jon Kleinberg, Professor of Computer Science at Cornell
University. (PDF document)
Survey of Clustering Data Mining Techniques
A comprehensive 56-page survey of clustering techniques, authored by
Pavel Berkhin. (PDF document)
Learning to Cluster Web Search Results
From MSN Search: an algorithm that groups search results into relevant
categories using shared words that signal the different meanings, or
contexts, of a query.
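The idea of grouping results by shared words can be illustrated with a much simpler heuristic than the paper's learned approach. The sketch below is a hypothetical keyword-grouping baseline, not the MSN algorithm. It treats any non-query word shared by at least `min_support` result snippets as a candidate label, then greedily assigns results to the most widely shared labels. The function name `group_results`, the stopword list, and all parameters are assumptions.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}

def group_results(snippets, query, min_support=2, max_groups=5):
    """Hypothetical baseline: group result snippets under shared non-query words."""
    query_terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    term_docs = defaultdict(set)
    for i, text in enumerate(snippets):
        for w in set(re.findall(r"[a-z0-9]+", text.lower())):
            if w not in STOPWORDS and w not in query_terms:
                term_docs[w].add(i)
    # Candidate labels: words shared by at least `min_support` results,
    # most widely shared first, ties broken alphabetically for determinism.
    candidates = sorted(
        (item for item in term_docs.items() if len(item[1]) >= min_support),
        key=lambda item: (-len(item[1]), item[0]),
    )
    groups, assigned = {}, set()
    for term, ids in candidates:
        if len(groups) >= max_groups:
            break
        members = sorted(ids - assigned)  # each result joins at most one group
        if len(members) >= min_support:
            groups[term] = members
            assigned.update(members)
    leftovers = [i for i in range(len(snippets)) if i not in assigned]
    if leftovers:
        groups["(unlabeled)"] = leftovers
    return groups

results = [
    "Jaguar cat habitat in the rainforest",
    "Jaguar car dealer prices",
    "Big cat facts: the jaguar",
    "Used Jaguar car listings",
    "Jaguar OS release notes",
]
print(group_results(results, "jaguar"))
# {'car': [1, 3], 'cat': [0, 2], '(unlabeled)': [4]}
```

Excluding the query term matters. An ambiguous query word such as "jaguar" appears in every snippet and would otherwise swallow all results into one group, while the remaining shared words ("car", "cat") separate the different senses.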