Google Search Engine Technology:
White Papers, Patents, and Resources
White papers about the Google search engine or authored by key
Google people, along with Google patents granted by, or applied for
with, the U.S. Patent Office.
Also includes research papers and publications from other sources
that discuss or make significant mention of Google technologies,
particularly those from Stanford University in Palo Alto, CA. Recently
added, a group of patents on phrase
Newly added for 2008:
A separate page for published patents and applications:
Google Search Engine Architecture
The Anatomy of a Large-Scale Hypertextual Web Search Engine
Exhaustive and revealing description of Google's inner workings and
architecture by its creators, Sergey Brin and Lawrence Page.
The PageRank Citation Ranking: Bringing Order to the Web
Computation of PageRank
Authored by Taher Haveliwala in 2002 and available at the Stanford
document server in PDF and text formats.
Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search
Authored by Taher Haveliwala in 2003 and hosted at the Stanford
document server in both text and PDF file formats.
An Analytical Comparison of Approaches to Personalizing PageRank
Stanford paper, hosted on the Stanford Document Server.
Exploiting the Block Structure of the Web for Computing PageRank
Paper from Stanford University, the assignee of the PageRank patent,
authored by S. Kamvar, T. Haveliwala, C. Manning and G. Golub. It examines
an alternative computation of PageRank that exploits the web's block
structure, both by domain and by subsection.
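For orientation, the standard power-iteration computation of PageRank, which the papers in this section refine or speed up, can be sketched as follows. This is a minimal sketch and not the method of any paper listed here: the function name `pagerank`, the toy link graph, the damping factor of 0.85, and the even redistribution of rank from pages without out-links are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration over a dict mapping each page to the pages it links to.

    All parameter defaults are illustrative assumptions, not values taken
    from the papers listed on this page.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(max_iter):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes its damped rank on in equal shares to its targets.
        for p in pages:
            targets = links.get(p, [])
            for q in targets:
                new_rank[q] += damping * rank[p] / len(targets)

        # Stop once the total change between iterations is negligible.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical four-page web used only to exercise the function.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

In this toy graph, page `c` receives links from every other page and ends up with the highest score, while `d`, which nothing links to, keeps only the baseline share.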
Crawling & Indexing Related
Efficient Crawling Through URL Ordering
Co-authored by J. Cho, Hector Garcia-Molina and Lawrence Page in
Crawling a Country: Better Strategies than Breadth-First for Web Page Ordering
Included because it refers to PageRank as a metric.
Self-described as Google's technology playground, where new search
technologies and innovations are explored and developed.
Search Engine Spam Detection
Includes papers related to Google and published on the Stanford
University Document Server.
Duplicate Content Detection
Includes papers related to Google or authored by Google or its staff,
including U.S. patents issued to Google.
authored or co-authored by Taher H. Haveliwala
"My research focuses on efficiently utilizing search
context for large-scale web search. The following are research
papers I have authored or coauthored."
authored by Krishna Bharat
Web Search and Content Analysis
More from William Pugh on this patent:
duplicate and near-duplicate files