Spider Trap

A spider trap (or crawler trap) is a set of web pages that, intentionally or unintentionally, causes a web crawler or search bot to make an infinite number of requests, or causes a poorly constructed crawler to crash. Web crawlers are also called web spiders, from which the name is derived. Spider traps may be created deliberately to "catch" spambots or other crawlers that waste a website's bandwidth. They may also arise unintentionally, for example from calendars built on dynamic pages whose links continually point to the next day or year.

Common techniques used are:

  • creation of indefinitely deep directory structures like http://foo.com/bar/foo/bar/foo/bar/foo/bar/.....
  • dynamic pages like calendars that produce an infinite number of pages for a web crawler to follow.
  • pages filled with a large number of characters, crashing the lexical analyzer parsing the page.
  • pages with session IDs based on required cookies.
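
The calendar case in the list above can be illustrated with a small sketch. The `/calendar/` URL scheme and the function names are hypothetical, chosen only for this example. The point is that every page links to a URL the crawler has not yet seen, so only the crawler's own page budget stops the walk.

```python
from datetime import date, timedelta


def next_day_link(page_date: date) -> str:
    """Link that a hypothetical calendar page for page_date would emit."""
    return "/calendar/" + (page_date + timedelta(days=1)).isoformat()


def crawl_calendar(start: date, max_pages: int) -> list[str]:
    """Follow 'next day' links, stopping only at the crawler's page budget."""
    followed = []
    current = start
    for _ in range(max_pages):
        followed.append(next_day_link(current))
        current += timedelta(days=1)  # the next page always yields a new link
    return followed


if __name__ == "__main__":
    for link in crawl_calendar(date(2024, 1, 30), 3):
        print(link)
```

Without the `max_pages` budget, a crawler following these links would keep finding new URLs for as long as the site keeps generating them.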

There is no algorithm to detect all spider traps. Some classes of traps can be detected automatically, but new, unrecognized traps arise quickly.
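
As a rough illustration of automatic detection for one class of trap, a crawler might apply URL heuristics such as the following. This is a minimal sketch, not a complete defense: the function name and the three thresholds are assumptions picked for the example, and it only catches traps that show up in the shape of the URL (very long URLs, very deep paths, repeated path segments).

```python
from collections import Counter
from urllib.parse import urlsplit

# Illustrative thresholds (assumptions, not standard values).
MAX_URL_LENGTH = 2000
MAX_DEPTH = 10
MAX_SEGMENT_REPEATS = 3


def looks_like_trap(url: str) -> bool:
    """Heuristically flag URLs shaped like a deep or repeating-path trap."""
    if len(url) > MAX_URL_LENGTH:
        return True
    # Split the path into its non-empty segments.
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) > MAX_DEPTH:
        return True
    # The same segment recurring many times suggests a cyclic directory structure.
    counts = Counter(segments)
    return any(n > MAX_SEGMENT_REPEATS for n in counts.values())


if __name__ == "__main__":
    print(looks_like_trap("http://foo.com/bar/foo/bar/foo/bar/foo/bar/"))
    print(looks_like_trap("http://foo.com/docs/guide/intro"))
```

Heuristics like these can reject legitimate URLs and miss traps that vary their paths, which is why crawlers typically combine them with per-site page or depth budgets.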

