Google Patents, Applications and Publications

Patents granted to Google, Inc. as assignee, along with pending patent applications that have been published by the U.S. Patent Office.

Patent Applications

Systems and methods for analyzing boilerplate

U.S. Patent Application: 20080040316
Date Published: February 14, 2008
Date Filed: March 31, 2004
Inventor: Lawrence; Stephen R.
Assignee: Google Inc.

Abstract

Systems and methods for analyzing boilerplate are described. In one described system, an indexer identifies a common element in a plurality of related articles. The indexer then classifies the common element as boilerplate. For example, the indexer may identify a copyright notice appearing in a plurality of related articles. The copyright notice in these articles is considered boilerplate.
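The idea in this abstract can be sketched in a few lines of Python. This is only an illustration, not the method claimed in the application: the function name find_boilerplate, the line-level comparison, and the min_fraction threshold are all assumptions made for the example.

```python
from collections import Counter


def find_boilerplate(articles, min_fraction=0.8):
    """Return lines that appear in at least min_fraction of the articles.

    Hypothetical helper: treats each non-empty line as a candidate element.
    """
    counts = Counter()
    for text in articles:
        # Count each distinct line once per article.
        lines = {line.strip() for line in text.splitlines() if line.strip()}
        counts.update(lines)
    threshold = min_fraction * len(articles)
    return {line for line, n in counts.items() if n >= threshold}


articles = [
    "Story one\nCopyright 2004 Example News",
    "Story two\nCopyright 2004 Example News",
    "Story three\nCopyright 2004 Example News",
]
boilerplate = find_boilerplate(articles)
```

Here the copyright line is shared by all three articles and is flagged, while the story lines, which each appear once, are not.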

Information retrieval based on historical data
(Gone MIA at the patent office, copy on this site. Loads slowly now, will be breaking it up into separate pages.)

Document Scoring Based on Document Inception Date

U.S. Patent Application: 20070094254
Date published: April 26, 2007
Application No.: 10/676,651
Date Filed: November 20, 2006
Assignee: Google Inc.

Inventors: Cutts; Matt; (Los Altos, CA) ; Dean; Jeffrey; (Palo Alto, CA) ; Haahr; Paul; (San Francisco, CA) ; Henzinger; Monika; (Corseaux, CH) ; Lawrence; Steve; (Mountain View, CA) ; Pfleger; Karl; (Mountain View, CA) ; Tong; Simon; (Mountain View, CA)

Abstract

A system may determine a document inception date associated with a document, generate a score for the document based, at least in part, on the document inception date, and rank the document with regard to at least one other document based, at least in part, on the score.

Systems and methods for determining document freshness

United States Patent Application: 20050144193
Date published: June 30, 2005
Date filed: June 30, 2004
Assignee: Google, Inc.

Inventor: Henzinger, Monika

Abstract

A system determines a freshness of a first document. The system determines whether a freshness attribute is associated with the first document. The system identifies, based on the determination, a set of second documents that each contain a link to the first document. The system assigns a freshness score to the first document based on a freshness attribute associated with each document of the set of second documents or the freshness attribute associated with the first document.
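A rough Python sketch of the fallback described in this abstract follows. It is not taken from the application: the freshness attribute is assumed to be a last-modified day number, and taking the newest linking document as the stand-in is an arbitrary choice for illustration.

```python
def freshness_score(doc_id, last_modified, inlinks):
    """Return a freshness value for doc_id, or None if none can be derived.

    last_modified: doc id -> day number, or None when the attribute is missing.
    inlinks: doc id -> list of ids of documents that link to it.
    Both structures are hypothetical.
    """
    own = last_modified.get(doc_id)
    if own is not None:
        return own
    linking = [
        last_modified[src]
        for src in inlinks.get(doc_id, [])
        if last_modified.get(src) is not None
    ]
    # Assumption: the newest linking document stands in for the target.
    return max(linking) if linking else None


last_modified = {"page": None, "blog": 120, "news": 300, "old": 10}
inlinks = {"page": ["blog", "news"]}
page_score = freshness_score("page", last_modified, inlinks)
```

In this example "page" has no attribute of its own, so its score comes from the documents linking to it.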

Document Scoring Based on Query Analysis

U.S. Patent Application: 20070088692
Date published: April 19, 2007
Date filed: November 22, 2006
Serial No.: 562617
Assignee: Google, Inc.

Inventors: Dean; Jeffrey; (Palo Alto, CA) ; Haahr; Paul; (San Francisco, CA) ; Henzinger; Monika; (Corseaux, CH) ; Lawrence; Steve; (Mountain View, CA) ; Pfleger; Karl; (Mountain View, CA) ; Sercinoglu; Olcan; (Mountain View, CA) ; Tong; Simon; (Mountain View, CA)

Abstract

A system may determine an extent to which a document is selected when the document is included in a set of search results, generate a score for the document based, at least in part, on the extent to which the document is selected when the document is included in a set of search results, and rank the document with regard to at least one other document based, at least in part, on the score.
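As a minimal sketch of the kind of signal described here, the Python below ranks documents by how often they are selected when shown. The ratio of clicks to impressions, the function names, and the sample numbers are assumptions for illustration; the application does not specify this formula.

```python
def selection_rate(impressions, clicks):
    """Fraction of the times a document was shown that it was selected."""
    return clicks / impressions if impressions else 0.0


def rank_by_selection(stats):
    """stats: doc id -> (impressions, clicks). Returns ids, best first."""
    return sorted(stats, key=lambda d: selection_rate(*stats[d]), reverse=True)


# Hypothetical counts, invented for the example.
stats = {"a": (100, 5), "b": (50, 10), "c": (0, 0)}
ranking = rank_by_selection(stats)
```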

Document Scoring Based on Traffic Associated with a Document

U.S. Patent Application: 20070088693
Date published: April 19, 2007
Date filed: November 30, 2006
Serial No.: 565026
Assignee: Google, Inc.

Inventor: Lawrence; Steve; (Mountain View, CA)

Abstract

A system determines an extent to which advertisements are presented or updated within a document, a quality of an advertiser associated with an advertisement provided within the document, whether an advertisement in the document relates to an advertising document that has more than a threshold amount of traffic, and/or an extent to which an advertisement provided within the document generates user traffic to an advertising document related to the advertisement. The system generates a score for the document based, at least in part, on the extent to which advertisements are presented or updated within the document, the quality of the advertiser associated with the advertisement provided within the document, whether the advertisement relates to an advertising document that has more than the threshold amount of traffic, and/or the extent to which the advertisement generates user traffic to the advertising document. The system ranks the document with regard to at least one other document based, at least in part, on the score.

Presentation of search results based on document structure

U.S. Patent Application: 20060074907
Date published: April 6, 2006
Date filed: September 27, 2004
Serial No.: 949708

Abstract

A system identifies a document relating to a search term, where the document includes a set of structural elements. The system determines a distribution of occurrences of the search term in the document, identifies one of the structural elements based on the distribution of occurrences of the search term in the document, and presents information associated with the identified structural element.
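One simple reading of this abstract is to pick the section of a document where the search term occurs most often. The sketch below does just that; the (heading, body) representation of structural elements and the pick_section helper are assumptions for the example, and it assumes a non-empty list of sections.

```python
def pick_section(sections, term):
    """sections: non-empty list of (heading, body) pairs in document order.

    Returns the heading whose body contains the term most often,
    or None if the term does not occur at all.
    """
    term = term.lower()
    counts = [(body.lower().split().count(term), heading)
              for heading, body in sections]
    best_count, best_heading = max(counts, key=lambda c: c[0])
    return best_heading if best_count > 0 else None


sections = [
    ("Intro", "search engines index the web"),
    ("Snippets", "a snippet shows a snippet of text"),
    ("Contact", "email us"),
]
chosen = pick_section(sections, "snippet")
```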

Document Scoring Based on Link-Based Criteria

U.S. Patent Application: 20070094255
Published: April 26, 2007
Filed: November 30, 2006
Assignee: Google, Inc.

Inventors: Acharya; Anurag; (Campbell, CA) ; Cutts; Matt; (Los Altos, CA) ; Dean; Jeffrey; (Palo Alto, CA) ; Haahr; Paul; (San Francisco, CA) ; Henzinger; Monika; (Corseaux, CH) ; Lawrence; Steve; (Mountain View, CA) ; Pfleger; Karl; (Mountain View, CA) ; Tong; Simon; (Mountain View, CA)

Abstract:

A system may determine time-varying behavior of links pointing to a document, generate a score for the document based, at least in part, on the time-varying behavior of the links pointing to the document, and rank the document with regard to at least one other document based, at least in part, on the score.
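To make "time-varying behavior of links" concrete, the sketch below counts newly seen links per period and compares the latest period with the one before it. The 30-day period, the first-seen day numbers, and the simple difference used as a trend are all assumptions; the application does not prescribe this measure.

```python
from collections import Counter


def links_per_period(first_seen_days, period=30):
    """Count links by the period in which each was first seen."""
    return Counter(day // period for day in first_seen_days)


def link_trend(first_seen_days, period=30):
    """New links in the latest period minus new links in the previous one."""
    counts = links_per_period(first_seen_days, period)
    if not counts:
        return 0
    latest = max(counts)
    return counts[latest] - counts[latest - 1]


# Hypothetical first-seen days for links pointing at one document.
days = [1, 5, 40, 45, 50, 55]
trend = link_trend(days)
```

A positive trend means the document is gaining links faster than before, and a negative one means link growth is slowing.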

Multi-stage query processing system and method for use with tokenspace repository

U.S. Patent Application: 20060036593
Published: February 16, 2006
Filed: August 13, 2004

Inventors: Dean; Jeffrey Adgate; (Palo Alto, CA) ; Haahr; Paul G.; (San Francisco, CA) ; Sercinoglu; Olcan; (Mountain View, CA) ; Singhal; Amitabh K.; (Palo Alto, CA)

Abstract:

A multi-stage query processing system and method enables multi-stage query scoring, including "snippet" generation, through incremental document reconstruction facilitated by a multi-tiered mapping scheme. At one or more stages of a multi-stage query processing system a set of relevancy scores are used to select a subset of documents for presentation as an ordered list to a user. The set of relevancy scores can be derived in part from one or more sets of relevancy scores determined in prior stages of the multi-stage query processing system. In some embodiments, the multi-stage query processing system is capable of executing one or more passes on a user query, and using information from each pass to expand the user query for use in a subsequent pass to improve the relevancy of documents in the ordered list.
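A two-stage version of this idea might look like the Python below: a cheap first pass selects candidates, and a second pass rescoring them reuses the first-stage score. The scoring functions are invented for the example and do not reflect the tokenspace repository or the mapping scheme in the application.

```python
def stage_one(query_terms, docs, keep):
    """Cheap pass: score by how many distinct query terms a document has."""
    scored = []
    for doc_id, text in docs.items():
        words = set(text.split())
        score = sum(t in words for t in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored[:keep]


def stage_two(query_terms, docs, candidates):
    """Costlier pass: add term density to the stage-one score."""
    results = []
    for first_score, doc_id in candidates:
        words = docs[doc_id].split()
        density = sum(words.count(t) for t in query_terms) / len(words)
        results.append((first_score + density, doc_id))
    results.sort(key=lambda r: (-r[0], r[1]))
    return [doc_id for _, doc_id in results]


docs = {
    "d1": "google patent search",
    "d2": "patent patent law office",
    "d3": "cooking recipes",
}
query = ["patent", "search"]
ordered = stage_two(query, docs, stage_one(query, docs, keep=2))
```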

Variable length snippet generation

U.S. Patent Application: 20050278314
Date Published: December 15, 2005
Filing Date: June 9, 2004

Inventors: Buchheit, Paul; (Mountain View, CA)

Abstract:

A method and system are disclosed that provide a variable length snippet when returning snippets in response to a search request. Under conditions where the search query matches a document with a high degree of certainty, a shorter snippet is provided than when the document does not match the search query with a high level of certainty. A variable snippet length is also based on an estimate of how likely a user will recognize the document. For example, shorter snippets are provided if a user has recently viewed a document, but longer snippets are provided if a user has not recently viewed the document.
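The rule in this abstract reduces to a small decision function. In the sketch below, the 0.9 confidence cutoff and the 80 and 240 character lengths are made-up values, and snippet_length and make_snippet are hypothetical names.

```python
def snippet_length(match_confidence, recently_viewed, short=80, long=240):
    """Choose a snippet length in characters.

    A confident match, or a document the user has recently viewed,
    gets the shorter snippet. Thresholds are illustrative only.
    """
    if match_confidence >= 0.9 or recently_viewed:
        return short
    return long


def make_snippet(text, match_confidence, recently_viewed):
    """Truncate text to the chosen snippet length."""
    return text[:snippet_length(match_confidence, recently_viewed)]


text = "x" * 500
confident = make_snippet(text, 0.95, False)
uncertain = make_snippet(text, 0.40, False)
```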

Google Patents

System and method for selectively searching partitions of a database

U.S. Patent: 7,254,580
Date granted: August 7, 2007
Application No.: 10/676,651
Date Filed: September 30, 2003
Assignee: Google Inc.

Inventors: Gharachorloo; Kourosh (Menlo Park, CA), Chang; Fay Wen (Mountain View, CA), Wallach; Deborah Anne (Emerald Hills, CA), Ghemawat; Sanjay (Mountain View, CA), Dean; Jeffrey (Menlo Park, CA) (Mountain View, CA)

Abstract:

When a search query is received, a plurality of partition indexes are searched using the set of search terms in the search query. Each partition index corresponds to a partition of a document index. The search of each respective partition index identifies a subset of a plurality of document index sub-partitions corresponding to the respective partition index. Next, the search query is executed by only those document index sub-partitions identified by the subsets, thereby identifying documents that satisfy the search query. By using the partition index to reduce the number of document index sub-partitions searched while executing a search query, the execution of the search query is made more efficient.
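The filtering step can be sketched as follows. This is a toy model under stated assumptions: the partition index is taken to be a map from each term to the sub-partitions containing it, and only sub-partitions that contain every query term are searched. The patent's actual index layout is not reproduced here.

```python
def build_partition_index(sub_partitions):
    """Map each term to the ids of the sub-partitions that contain it."""
    index = {}
    for sub_id, docs in sub_partitions.items():
        for text in docs.values():
            for term in text.split():
                index.setdefault(term, set()).add(sub_id)
    return index


def search(query_terms, sub_partitions, partition_index):
    """Return (matching doc ids, sub-partitions actually searched)."""
    candidate_sets = [partition_index.get(t, set()) for t in query_terms]
    candidates = set.intersection(*candidate_sets) if candidate_sets else set()
    hits = []
    for sub_id in sorted(candidates):
        for doc_id, text in sub_partitions[sub_id].items():
            words = set(text.split())
            if all(t in words for t in query_terms):
                hits.append(doc_id)
    return sorted(hits), sorted(candidates)


sub_partitions = {
    "p0": {"d1": "web search ranking", "d2": "database index"},
    "p1": {"d3": "cooking recipes"},
    "p2": {"d4": "search index partition"},
}
partition_index = build_partition_index(sub_partitions)
hits, searched = search(["search", "index"], sub_partitions, partition_index)
```

Sub-partition p1 is never searched. Sub-partition p0 is searched but yields no hit, because its two terms sit in different documents; the partition index only narrows the search, it does not answer the query.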

Link based clustering of hyperlinked documents

U.S. Patent: 7,213,198
Date granted: May 1, 2007
Assignee: Google, Inc.
Filed: August 10, 2000

Inventor: Georges R. Harik

Abstract:

Techniques for grouping hyperlinked documents are provided. Links near or in the neighborhood of the hyperlinked documents are analyzed in order to group the hyperlinked documents by topic. For example, links that are search results can be grouped by identifying other hyperlinked documents that have multiple forward links to the search results. The search results can then be grouped according to the forward links of the other hyperlinked documents.
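The example in this abstract, grouping search results by other documents that link to several of them, can be sketched like this. The data structures and the minimum of two shared links are assumptions for the illustration.

```python
def group_by_common_linkers(results, forward_links, min_links=2):
    """Group search results that share a linking document.

    results: list of result doc ids.
    forward_links: doc id -> list of doc ids it links to.
    Returns (linking doc, sorted results it links to) pairs.
    """
    result_set = set(results)
    groups = []
    for linker, targets in forward_links.items():
        linked = result_set & set(targets)
        if len(linked) >= min_links:
            groups.append((linker, sorted(linked)))
    return sorted(groups)


results = ["a", "b", "c", "d"]
forward_links = {
    "hub1": ["a", "b", "x"],
    "hub2": ["c"],
    "hub3": ["c", "d"],
}
groups = group_by_common_linkers(results, forward_links)
```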

Ranking search results by reranking the results based on local inter-connectivity

Granted to Krishna Bharat with Google.com as assignee. Often referred to as the LocalRank patent.

Patent number: 6,526,440
Filing date: Jan 30, 2001
Issue date: Feb 25, 2003
Inventor: Krishna Bharat
Assignee: Google, Inc.

Abstract:

A search engine for searching a corpus improves the relevancy of the results by refining a standard relevancy score based on the interconnectivity of the initially returned set of documents. The search engine obtains an initial set of relevant documents by matching a user's search terms to an index of a corpus. A re-ranking component in the search engine then refines the initially returned document rankings so that documents that are frequently cited in the initial set of relevant documents are preferred over documents that are less frequently cited within the initial set.
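A stripped-down version of this re-ranking can be written in Python. The sketch counts how many documents in the initial set cite each result and boosts the original score accordingly; the multiplication by (1 + citations) is an assumption chosen for simplicity, not the formula from the patent.

```python
def rerank(initial, links):
    """Re-rank results by citations from within the initial result set.

    initial: list of (doc id, relevancy score) pairs.
    links: doc id -> set of doc ids it cites.
    """
    ids = {doc_id for doc_id, _ in initial}
    local = {doc_id: 0 for doc_id in ids}
    for src in ids:
        for dst in links.get(src, ()):
            # Only citations between members of the initial set count.
            if dst in ids and dst != src:
                local[dst] += 1
    return sorted(initial, key=lambda p: (-(p[1] * (1 + local[p[0]])), p[0]))


initial = [("a", 1.0), ("b", 0.9), ("c", 0.8)]
links = {"a": {"c"}, "b": {"c", "z"}}
reranked = rerank(initial, links)
```

Document "c" starts last but is cited by both other results, so it moves to the top.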

Google Patents Related to Duplicate Content

Detecting query-specific duplicate documents

U.S. Patent 6,615,209
Date granted: September 2, 2003
Application date: October 6, 2000

Inventors: Gomes; Benedict (Berkeley, CA), Smith; Benjamin Thomas (Mountain View, CA)

Originally applied for on Oct. 6, 2000 and granted to Google on Sept. 2, 2003 by the U.S. Patent Office, this patent uses query-relevant information for similarity comparisons, in some cases relying on snippets extracted from the documents rather than the entire documents themselves.
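The snippet-based comparison can be illustrated with a short Python sketch. It extracts the words around each query term and treats two documents as duplicates for that query when those extracts are identical. The window size, exact-match test, and function names are assumptions, and query terms are expected in lowercase.

```python
def query_snippet(text, query_terms, window=2):
    """Return the words within `window` positions of any query term."""
    words = text.lower().split()
    keep = set()
    for i, word in enumerate(words):
        if word in query_terms:
            keep.update(range(max(0, i - window), min(len(words), i + window + 1)))
    return " ".join(words[i] for i in sorted(keep))


def are_query_duplicates(a, b, query_terms):
    """Compare only the query-relevant parts of two documents."""
    return query_snippet(a, query_terms) == query_snippet(b, query_terms)


doc_a = "Site A header. The patent covers snippet comparison methods. Footer A"
doc_b = ("Different site B banner here. "
         "The patent covers snippet comparison methods. Other footer")
same_for_snippet = are_query_duplicates(doc_a, doc_b, {"snippet"})
same_for_site = are_query_duplicates(doc_a, doc_b, {"site"})
```

The two pages differ as whole documents, yet they count as duplicates for the query "snippet" and not for the query "site".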

Detecting duplicate and near-duplicate files
Authored by Wm. Pugh and Monika Henzinger

More from William Pugh on this work:
Detecting duplicate and near-duplicate files