The Web is fast becoming a titanic, complex entity. By 2015, an estimated one zettabyte of content will be added to the Web every year. Navigating this sea of information is an ever-growing challenge, particularly when much of that content is not easily accessed by traditional search engines.
The Surface Web - When most of us think of the Web, we think of the 'Surface Web', also known as the visible Web: the webpages we access directly, via links, or through common search engines like Google. However, the Surface Web makes up just 4 percent of all the content on the Internet.
The Deep Web - The 'Deep Web' or 'Invisible Web' is far larger than the Surface Web, roughly 24 times its size, and represents a staggering 96 percent of information on the Web. This content includes:
- Dynamic or scripted content
- Unlinked content - pages that no other pages link to, which can prevent web crawlers from discovering them.
- Private or password-protected websites
- Contextual content - webpages whose content varies with the access context (e.g., ranges of client IP addresses or the visitor's previous navigation sequence).
- Limited access content - sites that technically restrict access to their pages (e.g., with the Robots Exclusion Standard or CAPTCHAs).
- Non-HTML/text content - textual content encoded in multimedia (image or video) files or in file formats that search engines do not handle.
This content can be mined and leveraged only with sophisticated search technologies, such as Goldfire's world-class semantic search.
If you aren't searching the Deep Web, check out what you might be missing:
Published by IHS