Google Search Engine Technology:
White Papers, Patents, and Resources

White papers about or related to the Google search engine, or authored by its key people, together with Google patents granted by or applied for with the U.S. Patent Office. Also included are research papers and publications from other sources that discuss or make significant mention of Google technologies, particularly those from Stanford University in Palo Alto, CA. Recently added: a group of patents on phrase-based indexing.

Newly added for 2008:

A separate page for published patents and applications:

Google Patents

Historical Data Patent


Google Search Engine Architecture

The Anatomy of a Large-Scale Hypertextual Web Search Engine
Exhaustive and revealing description of the inner workings and architecture of Google by the creators, Sergey Brin and Lawrence Page.


PageRank

The PageRank Citation Ranking: Bringing Order to the Web

Efficient Computation of PageRank

Topic-Sensitive PageRank
Authored by Taher Haveliwala in 2002 and available at the Stanford document server in PDF and text formats.

Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search
Authored by Taher Haveliwala in 2003 and hosted at the Stanford document server in both text and PDF file formats.

An Analytical Comparison of Approaches to Personalizing PageRank
Stanford paper, hosted on the Stanford Document Server.

BlockRank

BlockRank: Exploiting the Block Structure of the Web for Computing PageRank
Stanford University paper authored by S. Kamvar, T. Haveliwala, C. Manning and G. Golub (Stanford is the assignee of the PageRank patent). It examines an alternative computation of PageRank that exploits the web's block structure, both by domain and by subsection.
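
As a rough illustration of the block-structure idea, the following is a minimal Python sketch and not the paper's exact method. It assumes a tiny link graph stored as a dictionary and a hypothetical block_of mapping from page to block (for example, its domain). The function names pagerank and blockrank_start are made up for this example, and the block-level graph is left unweighted as a simplification. Local ranks are computed inside each block, combined with block-level ranks, and the result is used as the starting vector for an ordinary PageRank power iteration.

```python
def pagerank(links, damping=0.85, start=None, tol=1e-10, max_iter=200):
    """Power iteration on {node: [outlinked nodes]}; returns (ranks, iterations)."""
    nodes = sorted(links)
    n = len(nodes)
    rank = dict(start) if start else {u: 1.0 / n for u in nodes}
    for it in range(1, max_iter + 1):
        # Rank held by dangling nodes (no outlinks) is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not links[u])
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in links[u]:
                new[v] += damping * rank[u] / len(links[u])
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank, it


def blockrank_start(links, block_of, damping=0.85):
    """Starting vector: local (within-block) rank times the rank of the block."""
    blocks = {}
    for u in links:
        blocks.setdefault(block_of[u], []).append(u)
    # Step 1: local PageRank inside each block, ignoring cross-block links.
    local = {}
    for b, members in blocks.items():
        sub = {u: [v for v in links[u] if block_of[v] == b] for u in members}
        local.update(pagerank(sub, damping)[0])
    # Step 2: PageRank of the block graph, one node per block.
    # Assumption: edges are unweighted here, which is a simplification.
    block_links = {
        b: sorted({block_of[v] for u in members for v in links[u]
                   if block_of[v] != b})
        for b, members in blocks.items()
    }
    block_rank = pagerank(block_links, damping)[0]
    # Step 3: approximate global rank for each page.
    return {u: local[u] * block_rank[block_of[u]] for u in links}
```

Calling pagerank(links, start=blockrank_start(links, block_of)) runs the global iteration from the block-based approximation rather than from a uniform vector. Both starting points converge to the same ranks; the choice of start only affects how many iterations are needed.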


Crawling & Indexing Related

Efficient Crawling Through URL Ordering
Co-authored by J. Cho, Hector Garcia-Molina and Lawrence Page in 1998.

Crawling a Country: Better Strategies than Breadth-First for Web Page Ordering
Included because reference is made to PageRank as a metric.


Google Labs
Self-described as Google's technology playground, where new search technologies and innovations are explored and developed.


Search Engine Spam Detection

Includes papers related to Google and published on the Stanford University Document Server.

Search Engine Spam


Duplicate Content Detection

Includes papers related to Google or authored by Google or its staff, as well as U.S. patents issued to Google.

Duplicate Content


Google Patents and Applications


Other

Publications authored or co-authored by Taher H. Haveliwala

"My research focuses on efficiently utilizing search context for large-scale web search. The following are research papers I have authored or coauthored."

Publications authored by Krishna Bharat

Web Search and Content Analysis

More from William Pugh on this patent:
Detecting duplicate and near-duplicate files