This section of the site is for information about search engines
and Information Retrieval, and for providing white papers and resources
on search engine technology. It is not about search engine optimization.
The section lists academic research papers and patents
issued and applied for, as well as articles and links to informative
sites and articles about search engines, Information Retrieval and
Includes academic research papers related to Google
and PageRank, and U.S. Patents granted.
Papers related to PageRank, Google's patented technology.
Patents granted by the U.S. Patent Office with Google, Inc. as
the assignee, as well as pending patent applications that have
Based Indexing & Spam Detection
A group of several related patent applications, with additional
Documents related to Microsoft's Live Search Engine, including
several patents and white papers from Microsoft Research.
Papers related to Yahoo Search, including patents.
Technologies, criteria and methods for grouping of web pages by
Topics related to definitions and detection of duplicate or near-duplicate
content on web pages, including papers authored by or relevant to
Google.
Document Frequency / TF-IDF
Analysis and Topic Distillation
Includes topics related to links and link analysis, Topic
Distillation, Hubs and Authorities, and the Hilltop algorithm.
Linguistics and Semantic Analysis
Includes topics related to Latent Semantic Indexing, lexical analysis
and taxonomies. I've also put the LSI
tutorial online, under a Creative Commons license.
Search Related Papers and Patents
Assorted topics that don't necessarily fit elsewhere, or for which
there aren't enough related topics to warrant a separate topic
area. It's also a temporary resting place for links to papers
that will eventually be moved directly into applicable sections.
Topics related to detecting and combating search engine spam,
including link-related spam techniques. Includes those related
Site Navigation and Site Maps
Papers that examine the directory and file structures of websites,
hosts and domains, including physical and logical domains.