Phrase Based Indexing, Phrase Based Information Retrieval
The latest Google news is a series of patent applications, filed
with the U.S. Patent Office not too long ago, that form a related
"cluster" on phrase based indexing and retrieval.
They're listed here (with the exception of one, which will be added),
along with an important one that was added after the original set
and addresses search indexing by use of a primary and secondary
index. It also gives a somewhat clearer, simpler overview of
phrase-based concepts.
I'll also be adding references to a couple of other related, earlier
technologies that can help give a clearer picture of the older
foundational principles that apply, along with some personal notes.
Other suggested reading is the page with references to papers on
keyword co-occurrence, a term used repeatedly in these patent
applications.
Patent Applications
The latest application is listed first; the others follow in
logical sequence.
Multiple index based information retrieval system
United States Patent Application 20060106792
Published: May 18, 2006
Filed: January 25, 2005
Inventor: Anna Lynn Patterson
An information retrieval system uses phrases to index, retrieve,
organize and describe documents. Phrases are identified that predict
the presence of other phrases in documents. Documents are then
indexed according to their included phrases. The document index
is partitioned into multiple indexes, including a primary index
and a secondary index. The primary index stores phrase posting
lists with relevance rank ordered documents. The secondary index
stores excess documents from the posting lists in document order.
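
To make the primary/secondary split a little more concrete, here's a
minimal sketch in Python. It's only my reading of the abstract, not the
implementation described in the application: the function name, the
relevance scores and the `limit` cutoff are all assumptions made up for
illustration.

```python
from typing import List, Tuple

Posting = Tuple[int, float]  # (document ID, relevance score), both hypothetical


def partition_posting_list(
    postings: List[Posting], limit: int
) -> Tuple[List[int], List[int]]:
    """Split one phrase's posting list into primary and secondary parts.

    Assumption: the primary index keeps at most `limit` documents in
    relevance-rank order; everything else overflows to the secondary
    index, which is kept in document-ID order.
    """
    # Highest relevance first.
    ranked = sorted(postings, key=lambda posting: posting[1], reverse=True)
    primary = [doc_id for doc_id, _ in ranked[:limit]]
    # Excess documents are stored by document ID, not by rank.
    secondary = sorted(doc_id for doc_id, _ in ranked[limit:])
    return primary, secondary


postings = [(7, 0.2), (3, 0.9), (12, 0.5), (1, 0.1), (9, 0.7)]
primary, secondary = partition_posting_list(postings, limit=3)
print(primary)    # [3, 9, 12]
print(secondary)  # [1, 7]
```

With five postings and a limit of three, the three best-scoring
documents land in the primary list in rank order, and the remaining two
go to the secondary list sorted by document ID.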
Phrase identification in an information retrieval system
United States Patent Application 20060018551
Filed: July 26, 2004
Inventor: Anna Lynn Patterson
An information retrieval system uses phrases to index, retrieve,
organize and describe documents. Phrases are identified that predict
the presence of other phrases in documents. Documents are then
indexed according to their included phrases. Related phrases and
phrase extensions are also identified. Phrases in a query are
identified and used to retrieve and rank documents. Phrases are
also used to cluster documents in the search results, create document
descriptions, and eliminate duplicate documents from the search
results, and from the index.
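
As a rough illustration of one phrase "predicting" another, the sketch
below compares how often two phrases actually appear together with how
often they would if they were unrelated. This is a simple co-occurrence
ratio I've picked for illustration, and the function name and sample
phrases are hypothetical; see the applications themselves for the
measure they actually use.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def cooccurrence_ratio(docs: List[Set[str]]) -> Dict[Tuple[str, str], float]:
    """Ratio of actual to expected co-occurrence for each phrase pair.

    Assumption: each document is already reduced to its set of phrases.
    A ratio well above 1.0 suggests one phrase predicts the other.
    """
    total = len(docs)
    single = Counter()  # documents containing each phrase
    pair = Counter()    # documents containing both phrases of a pair
    for phrases in docs:
        single.update(phrases)
        pair.update(combinations(sorted(phrases), 2))

    ratios = {}
    for (a, b), together in pair.items():
        # Expected number of shared documents if a and b were independent.
        expected = single[a] * single[b] / total
        ratios[(a, b)] = together / expected
    return ratios


docs = [
    {"australian shepherd", "blue merle"},
    {"australian shepherd", "blue merle"},
    {"tax form"},
    {"australian shepherd", "tax form"},
]
for pair_key, ratio in sorted(cooccurrence_ratio(docs).items()):
    print(pair_key, round(ratio, 2))
```

Pairs that show up together more often than chance would suggest get a
ratio above 1.0, which is the sort of signal a "related phrase" list
could be built from.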
Phrase Based Indexing in an Information Retrieval System
United States Patent Application 20060020607
Filed: July 26, 2004
Inventor: Anna Lynn Patterson
An information retrieval system uses phrases to index, retrieve,
organize and describe documents. Phrases are identified that predict
the presence of other phrases in documents. Documents are then
indexed according to their included phrases. Related phrases and
phrase extensions are also identified. Phrases in a query are
identified and used to retrieve and rank documents. Phrases are
also used to cluster documents in the search results, create document
descriptions, and eliminate duplicate documents from the search
results, and from the index.
Phrase-based searching in an information retrieval system
United States Patent Application 20060031195
Filed: July 26, 2004
Inventor: Anna Lynn Patterson
An information retrieval system uses phrases to index, retrieve,
organize and describe documents. Phrases are identified that predict
the presence of other phrases in documents. Documents are then indexed
according to their included phrases. Related phrases and phrase
extensions are also identified. Phrases in a query are identified
and used to retrieve and rank documents. Phrases are also used to
cluster documents in the search results, create document descriptions,
and eliminate duplicate documents from the search results, and from
the index.
Phrase-based generation of document descriptions
United States Patent Application 20060020571
Filed: July 26, 2004
Inventor: Anna Lynn Patterson
An information retrieval system uses phrases to index, retrieve,
organize and describe documents. Phrases are identified that predict
the presence of other phrases in documents. Documents are then
indexed according to their included phrases. Related phrases and
phrase extensions are also identified. Phrases in a query are
identified and used to retrieve and rank documents. Phrases are
also used to cluster documents in the search results, create document
descriptions, and eliminate duplicate documents from the search
results, and from the index.
Detecting spam documents in a phrase based information retrieval system
United States Patent Application 20060294155
Published: December 28, 2006
Filed: June 28, 2006
Inventor: Anna Lynn Patterson
An information retrieval system uses phrases to index, retrieve,
organize and describe documents. Phrases are identified that predict
the presence of other phrases in documents. Documents are then
indexed according to their included phrases. A spam document is
identified based on the number of related phrases included in
a document.
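
The spam idea boils down to counting related phrases in a document.
Here's a minimal sketch, assuming a plain substring match and an
arbitrary threshold. Both are my assumptions, as are the function
names; the abstract doesn't say how the count is judged.

```python
from typing import Iterable


def count_related_phrases(text: str, related_phrases: Iterable[str]) -> int:
    """Count how many distinct related phrases appear in the text."""
    lowered = text.lower()
    return sum(1 for phrase in related_phrases if phrase.lower() in lowered)


def looks_like_spam(text: str, related_phrases: Iterable[str], threshold: int) -> bool:
    """Flag a document whose related-phrase count exceeds the threshold.

    Assumption: `threshold` is a hand-picked cutoff for illustration only.
    """
    return count_related_phrases(text, related_phrases) > threshold


related = ["dog food", "dog training", "dog breeds", "puppy care"]
normal_page = "Tips on dog training for new owners."
stuffed_page = "dog food dog training dog breeds puppy care cheap"

print(looks_like_spam(normal_page, related, threshold=2))   # False
print(looks_like_spam(stuffed_page, related, threshold=2))  # True
```

A page that mentions one related phrase passes, while a page stuffed
with all four trips the cutoff.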
Efficient Phrase Based Document Indexing for Document Clustering
Examines clustering by phrases rather than by individual words.
Other Resources
Phrase Based Information Retrieval and Spam Detection
Bill Slawski's article at SEO by the Sea
Using WordNet in a Knowledge-Based Approach to Information Retrieval (PDF)
Abstract and references: Citeseer
More to come. :-)