Crawling The Deep Web
A vast number of Web pages lie in the deep or invisible Web. These pages are typically accessible only by submitting queries to a database, and regular crawlers cannot find them if no links point to them. Google's Sitemaps protocol and mod_oai are intended to allow discovery of these deep-Web resources.
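To illustrate how the Sitemaps protocol exposes pages that no link points to, here is a minimal sketch that reads the <loc> entries from a sitemap document using Python's standard xml.etree.ElementTree module. The sample sitemap and its example.com URLs are hypothetical. A real crawler would first fetch the sitemap file (sites often advertise it with a Sitemap line in robots.txt) and then parse it like this.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Sitemaps protocol (sitemaps.org schema 0.9).
LOC_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL in a sitemap (<urlset>) or sitemap index (<sitemapindex>)."""
    root = ET.fromstring(xml_text)
    # Both document types wrap each entry's address in a <loc> element,
    # so iterating over all <loc> elements covers either case.
    return [el.text.strip() for el in root.iter(LOC_TAG) if el.text]


# Hypothetical sitemap listing database-backed pages that no hyperlink reaches.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/catalog?item=1</loc></url>
  <url><loc>https://example.com/catalog?item=2</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in urls_from_sitemap(SAMPLE):
        print(url)
```

Each returned URL can then be queued for crawling like any link discovered on a page.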
Deep Web crawling also multiplies the number of Web links to be crawled. Some crawlers follow only some of the URLs, such as those that appear in <a href="URL"> form. In some cases, such as with Googlebot, crawling is performed on all text contained in the hypertext content, tags, or text.
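The difference between the two strategies can be sketched with Python's standard library: one function collects only href values from <a> tags, the other scans the whole raw markup, attributes and plain text included, for anything that looks like an absolute URL. This is an illustration under simplifying assumptions, not Googlebot's actual method. In particular, the URL regular expression is a deliberately rough, hypothetical pattern rather than a full RFC 3986 matcher.

```python
import re
from html.parser import HTMLParser


class AnchorHrefParser(HTMLParser):
    """Collects href values from <a> tags only (the narrower strategy)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def anchor_urls(html: str) -> list[str]:
    parser = AnchorHrefParser()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Rough absolute-URL pattern (an assumption for illustration only).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def all_text_urls(html: str) -> list[str]:
    """Scans the raw markup, including tag attributes and plain text, for URLs."""
    return URL_PATTERN.findall(html)


PAGE = """<p>See <a href="https://example.com/a">A</a>.</p>
<div data-src="https://example.com/b"></div>
<p>Mirror: https://example.com/c</p>"""

if __name__ == "__main__":
    print(anchor_urls(PAGE))    # only the <a href> link
    print(all_text_urls(PAGE))  # links found anywhere in the markup
```

On the sample page, the anchor-only strategy finds one URL, while the all-text scan also picks up the URL in a data attribute and the one written as plain text. This shows why scanning everything increases the number of links a crawler must handle.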
Strategic approaches may be taken to target deep-Web content. With a technique called screen scraping, specialized software can be customized to query a given Web form automatically and repeatedly, with the intention of aggregating the resulting data. Such software can span multiple Web forms across multiple Websites. Data extracted from the results of one Web form submission can be applied as input to another Web form, thus establishing continuity across the Deep Web in a way not possible with traditional Web crawlers.
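The form-chaining idea can be sketched as follows, assuming both forms are submitted with GET so that their fields map onto a query string. The endpoints FORM_A and FORM_B, the field names q and record, and the id=<digits> markup in the first form's results are all hypothetical. The fetch function is passed in as a parameter, so the example can run offline with a stub. A live scraper might instead use something like urllib.request.urlopen(url).read().decode().

```python
import re
from typing import Callable
from urllib.parse import urlencode

# Hypothetical form endpoints, for illustration only.
FORM_A = "https://example.com/search"   # returns a result list containing record IDs
FORM_B = "https://example.org/details"  # looks up one record by its ID

Fetch = Callable[[str], str]


def query_form(fetch: Fetch, action: str, fields: dict[str, str]) -> str:
    """Submit a GET form by encoding its fields into the query string."""
    return fetch(f"{action}?{urlencode(fields)}")


def chain_forms(fetch: Fetch, keywords: list[str]) -> dict[str, str]:
    """Query form A repeatedly, then feed each extracted ID into form B."""
    results: dict[str, str] = {}
    for keyword in keywords:
        page_a = query_form(fetch, FORM_A, {"q": keyword})
        # Assumed result markup: record IDs appear as "id=<digits>".
        for record_id in re.findall(r"id=(\d+)", page_a):
            if record_id not in results:  # skip IDs already resolved
                results[record_id] = query_form(fetch, FORM_B, {"record": record_id})
    return results


def demo_fetch(url: str) -> str:
    """Offline stand-in for an HTTP client, returning canned pages."""
    if url.startswith(FORM_A):
        return "id=1 id=2" if "q=deep" in url else ""
    return "details for " + url.split("record=")[1]


if __name__ == "__main__":
    print(chain_forms(demo_fetch, ["deep", "web"]))
```

Keeping fetch separate from the chaining logic lets the same loop run against a stub in tests and against real sites in production. It also leaves rate limiting or caching to the fetch layer.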