Search Engine Spam Definitions and Detection

Papers on defining, identifying and combating search engine spam, including work applicable to the Google search engine and, in particular, papers published on the Stanford University document server.

Search Engine and Link Spam Detection

Web Spam Taxonomy
Stanford paper authored by Z. Gyongyi and Hector Garcia-Molina

"Web spamming refers to actions intended to mislead search engines and give some pages higher ranking than they deserve. Recently, the amount of web spam has increased dramatically, leading to a degradation of search results. This paper presents a comprehensive taxonomy of current spamming techniques, which we believe can help in developing appropriate countermeasures."

Link Spam Alliances
Stanford University Technical Paper published May 30, 2005 and authored by Zoltan Gyongyi and Hector Garcia-Molina

"Link spam is used to increase the ranking of certain target web pages by misleading the connectivity-based ranking algorithms in search engines. In this paper we study how web pages can be interconnected in a spam farm in order to optimize rankings. We also study alliances, that is, interconnections of spam farms. Our results identify the optimal structures and quantify the potential gains. In particular, we show that alliances can be synergistic and improve the rankings of all participants. We believe that the insights we gain will be useful in identifying and combating link spam."
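To make the spam-farm idea concrete, the following Python sketch computes PageRank by power iteration on a hypothetical toy web. It then compares a target page's score before and after attaching a farm of boosting pages. Several choices here are assumptions made for illustration: the helper names (pagerank, build_web), the page names, the damping factor of 0.85, and the farm layout, in which every boosting page links only to the target and the target links only back into the farm. The sketch does not reproduce the paper's analysis of which farm and alliance structures are optimal.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict mapping page -> list of outlinks.

    Pages with no outlinks (dangling pages) spread their score uniformly,
    so the scores always sum to 1.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives its share of the random-jump probability.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                share = damping * rank[p] / n
                for q in pages:
                    new[q] += share
        rank = new
    return rank


def build_web(num_boosters):
    """Hypothetical toy web: an honest cluster plus target 't' and its farm."""
    web = {
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a", "t"],  # one honest link happens to reach the target
        "t": [],
    }
    # Assumed farm layout for illustration, not the paper's derived optimum.
    boosters = [f"s{i}" for i in range(num_boosters)]
    for s in boosters:
        web[s] = ["t"]  # each boosting page points only at the target
    if boosters:
        web["t"] = list(boosters)  # the target links only back into its farm
    return web


plain = pagerank(build_web(0))["t"]
farmed = pagerank(build_web(20))["t"]
print(f"target share without farm: {plain:.3f}, with 20 boosters: {farmed:.3f}")
```

Every page's score is a share of a total that sums to 1. The target's larger share once the farm is attached therefore shows how boosting pages funnel ranking toward a single page.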

Combating Web Spam with TrustRank
Stanford paper authored by Z. Gyongyi, J. Pedersen and Hector Garcia-Molina

"Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine's results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam."
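The core TrustRank idea can be sketched as a biased PageRank. A small seed set of pages judged good by a human receives all of the random-jump probability, and trust then flows outward along links. Pages that cannot be reached from the good seeds therefore receive no trust, and pages far from them receive little. The Python below is a minimal sketch under stated assumptions. The function name trustrank, the toy graph, the seed choice, the damping factor of 0.85 and the fixed 20 iterations are all hypothetical. Seed selection is taken as given rather than implemented.

```python
def trustrank(links, good_seeds, damping=0.85, iterations=20):
    """Biased PageRank: random jumps land only on human-approved seed pages.

    links maps each page to the list of pages it links to. Trust reaching a
    page with no outlinks is dropped rather than redistributed (an assumption
    of this sketch).
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    page_set = set(pages)
    seeds = [p for p in good_seeds if p in page_set]
    if not seeds:
        raise ValueError("need at least one good seed page present in the graph")
    # Static distribution: uniform over the good seeds, zero everywhere else.
    jump = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(jump)
    for _ in range(iterations):
        new = {p: (1.0 - damping) * jump[p] for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                # Each page splits its current trust evenly among its outlinks.
                new[q] += damping * trust[p] / len(outs)
        trust = new
    return trust


# Hypothetical toy web: g* pages are reputable, x* pages form a spam ring.
web = {
    "g1": ["g2", "g3"],
    "g2": ["g3"],
    "g3": ["g1"],
    "x1": ["x2"],
    "x2": ["x3"],
    "x3": ["x1", "g1"],  # spam links out to a good page to look legitimate
}
scores = trustrank(web, good_seeds=["g1"])  # assume a human approved only g1
for page in sorted(scores):
    print(page, round(scores[page], 3))
```

In this toy graph the spam pages link to a good page, but no good page links back to them. Their trust therefore stays at exactly zero, while every good page ends up with a positive score.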