Google Search Engine Technology:
White Papers, Patents, and Resources
White papers about the Google search engine or authored by key
Google people, and Google patents granted by or applied for with
the U.S. Patent Office.
Also includes research papers and publications from other sources
that discuss or make significant mention of Google technologies,
particularly those at Stanford University in Palo Alto, CA. Recently
added: a group of patents on phrase-based indexing.
Newly added for 2008:
A separate page for published patents and applications:
Google Patents
Historical Data Patent
Google Search Engine Architecture
The Anatomy of a Large-Scale Hypertextual Web Search Engine
Exhaustive and revealing description of the inner workings and architecture
of Google by the creators, Sergey Brin and Lawrence Page.
PageRank
The PageRank Citation Ranking: Bringing Order to the Web
Efficient Computation of PageRank
Topic-Sensitive PageRank
Authored by Taher Haveliwala in 2002 and available at the Stanford
document server in PDF and text formats.
Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search
Authored by Taher Haveliwala in 2003 and hosted at the Stanford
document server in both text and PDF file formats.
An Analytical Comparison of Approaches to Personalizing PageRank
Stanford paper, hosted on the Stanford Document Server.
BlockRank
BlockRank: Exploiting the Block Structure of the Web for Computing PageRank
Paper from Stanford University (the assignee of the PageRank patent),
authored by S. Kamvar, T. Haveliwala, C. Manning and G. Golub. It examines
an alternative computation of PageRank that exploits the web's block
structure, both by domain and by subsections within a domain.
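For context, the baseline that block-based approaches aim to speed up is the standard power iteration for PageRank. The following is a minimal sketch of that baseline only, not of BlockRank itself. The function name `pagerank`, the toy link graph, the convergence tolerance, and the choice to spread the rank of pages without out-links evenly over all pages are assumptions made for illustration; the damping factor of 0.85 is the commonly used value.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Baseline PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Assumption: rank held by pages with no out-links is
        # redistributed evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }
        # Each page passes an equal share of its rank to every out-link.
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new_rank[q] += damping * rank[p] / len(outs)

        # Stop once the total change between iterations is negligible.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical four-page graph, used only to exercise the function.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
toy_ranks = pagerank(toy_links)
for page, score in sorted(toy_ranks.items()):
    print(page, round(score, 4))
```

Every iteration touches every link in the graph, which is why approaches that first solve smaller per-domain blocks and then combine them are of interest at web scale.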
Crawling & Indexing Related
Efficient Crawling Through URL Ordering
Co-authored by J. Cho, Hector Garcia-Molina and Lawrence Page in 1998.
Crawling a Country: Better Strategies than Breadth-First for Web Page Ordering
Included because reference is made to PageRank as a metric.
Google Labs
Self-described as Google's technology playground, where new search
technologies and innovations are explored and developed.
Search Engine Spam Detection
Includes papers related to Google and published on the Stanford
University Document Server.
Search Engine Spam
Duplicate Content Detection
Includes papers related to or authored by Google or its staff, including
U.S. patents issued to Google.
Duplicate Content
Other
Publications authored or co-authored by Taher H. Haveliwala
"My research focuses on efficiently utilizing search
context for large-scale web search. The following are research
papers I have authored or coauthored."
Publications authored by Krishna Bharat
Web Search and Content Analysis
More from William Pugh on this patent:
Detecting duplicate and near-duplicate files