Search Engine Patents and White Papers

This section of the site covers search engines and Information Retrieval, and provides white papers and resources on search engine technology. It is not about search engine optimization.

Listings include academic research papers, patents issued and applied for, and links to informative sites and articles about search engines, Information Retrieval and search technology.

Google Search Engine
Includes academic research papers related to Google and PageRank, and U.S. patents granted.

PageRank
Papers related to PageRank, Google's patented technology.

Google Patents
Patents granted by the U.S. Patent and Trademark Office with Google, Inc. as the assignee, as well as pending patent applications that have been published.

Phrase Based Indexing & Spam Detection
A group of several related patent applications, with additional related resources.

MSN Live Search
Documents related to Microsoft's Live Search engine, including several patents and white papers from Microsoft Research.

Yahoo Search
Papers related to Yahoo Search, including patents.

Clustering
Technologies, criteria and methods for grouping web pages by similarity.

Duplicate Content Detection
Topics related to defining and detecting duplicate or near-duplicate content on web pages, including papers authored by or relevant to Google.

Inverse Document Frequency / TF-IDF

Keyword Co-Occurrence

Link Analysis and Topic Distillation
Includes topics related to links and link analysis, Topic Distillation, Hubs and Authorities, and the Hilltop Algorithm.

LSI, Linguistics and Semantic Analysis
Includes topics related to Latent Semantic Indexing, lexical analysis and taxonomies. I've also put the LSI tutorial online, under a Creative Commons license.

Miscellaneous Search Related Papers and Patents
Assorted topics that don't necessarily fit elsewhere, or for which there aren't enough related topics to warrant a separate topic area. It's also a temporary resting place for links to papers that will eventually be moved directly into applicable sections.

Search Engine Spam
Topics related to detecting and combating search engine spam, including link-related spam techniques. Includes topics related to Google.

Web Site Navigation and Site Maps
Papers that examine the directory and file structures of websites, hosts and domains, including physical and logical domains.

Word Sense Disambiguation