How to Penetrate and Search the Invisible Web

Yona F. Maro

Nov 2, 2006
Author: Courtland L. Bovee


This article teaches you how to reach the portion of the Internet that is not readily accessible with the best-known search engines. The author lists additional search engines that can help you penetrate this "invisible web."

For most of our online activities, a general search engine like Google does the trick. But when researching obscure or complicated topics, being able to effectively penetrate the invisible web will take you places no automated web crawler ever could.

What is the invisible web?

The invisible web (also known as the "deep" or "hidden" web) is the term for the vast amount of information stored online that is inaccessible to regular search engines. Most of it is stored in databases and alternative file formats, as opposed to basic, public HTML websites. It is, by some estimates, 500 times bigger than the "visible" web. In other words, when you search using MSN or Yahoo, you're drawing from only about 1/500th of the content out there. This leaves a lot to explore!
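To make the arithmetic behind that estimate concrete, here is a minimal sketch. It assumes only the 500-to-1 ratio quoted above; the function name `visible_share` is hypothetical and used purely for illustration. If the invisible web is 500 times the size of the visible web, the visible part is 1 of 501 equal portions, which is where the "about 1/500th" figure comes from.

```python
from fractions import Fraction

# The estimate quoted in the article: the invisible web is
# 500 times the size of the visible web.
RATIO = 500


def visible_share(ratio):
    """Fraction of all content that is visible, given invisible = ratio * visible."""
    # total = visible + invisible = 1 part + ratio parts
    return Fraction(1, 1 + ratio)


share = visible_share(RATIO)
print(share)                    # exact fraction of the total
print(f"{float(share):.2%}")    # the same value as a percentage
```

The result depends entirely on the estimate fed into it; a different ratio gives a different share.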

Why is it invisible?

There are several reasons why certain content stays hidden. Pages that no other pages link to (that is, pages with no "backlinks" or "inlinks") may never be discovered by web crawling programs, which find new pages by following links. Sites that require registration or typing in a query also bar surface searches from entering (web "spiders" just aren't that smart). Content that is scripted or non-text generally won't come up either, though Google does index PDF and DOC files and offers them as HTML views.
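The link-following behavior described above can be illustrated with a toy model. This is a simplified sketch, not how any particular search engine works: the page names and the `LINKS` table are invented for the example, and a real crawler fetches pages over the network rather than reading a dictionary. It shows why a page with no inlinks, or a page that exists only after someone types a query, never reaches the crawler.

```python
from collections import deque

# Hypothetical toy site: each page maps to the pages it links to.
LINKS = {
    "home": ["about", "catalog-search"],
    "about": ["home"],
    "catalog-search": [],        # a query form; it links to no result pages
    "orphan-report": ["home"],   # links out, but no page links to it
    "result?q=owls": [],         # generated only when a user submits a query
}


def crawl(start, links):
    """Breadth-first crawl that follows hyperlinks only, as a simple spider would."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


visible = crawl("home", LINKS)
invisible = set(LINKS) - visible

print("visible:  ", sorted(visible))
print("invisible:", sorted(invisible))
```

In this model the crawler reaches the search form itself but not the result page behind it, and it never learns that the orphan page exists.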

How do you access it?

The easiest way to breach the invisible barrier is to start with search engines built for this purpose. Many of them provide links to databases that are likely to be useful for your topic; others are put together by academic institutions, which generally provide high-quality results. Below is a list of tried-and-true tools that can help you get below the surface (some may require a paid subscription).

Turbo10 (www.turbo10.com) connects you to niche-specific search engines, and enables you to access databases from various levels of government, businesses, and universities.

Direct Search (www.freepint.com/gary/direct.htm) is an excellent compilation of specialized search tools by Gary Price, an information retrieval expert.

The Invisible Web Directory (www.invisible-web.net), also from Gary Price and search expert Chris Sherman, is a directory of searchable databases, categorized by topic.

InfoMine (http://infomine.ucr.edu) offers a mind-boggling number of links and databases, and is maintained by the University of California, Riverside.

Another academic site is the SJSU Academic Gateway (www.sjlibrary.org/gateways/academic) from San Jose State University; it'll get you into the SJSU library and San Jose public libraries.

The Virtual Library (http://vlib.org) has annotated subject links, which can help cut down superfluous searching.

Complete Planet (http://aip.completeplanet.com) claims access to "over 70,000 searchable databases and specialty search engines."

Another comprehensive guide is WebData (www.webdata.com), which offers browse and search options.

The Librarians' Internet Index (http://lii.org) has a collection of over 20,000 quality sites and offers a weekly newsletter with updates and relevant links.

The Educator's Reference Desk (www.eduref.org/) provides access to 3,000-plus educational resources, which you can browse by category.

The Education Resources Information Center (www.eric.ed.gov) provides free access to more than 1.2 million bibliographic records of journal articles and other education-related materials and, if available, includes links to full text.

OAIster (www.oaister.org) provides one-stop "shopping" for users interested in useful, academically oriented digital resources. It gathers as many digital resources as it can find in an effort to build a comprehensive digital union catalog.

FindArticles (www.findarticles.com) now searches over 10 million articles from academic, industry, and general interest publications.

MagPortal (www.magportal.com) is another great tool for searching magazine articles online.

Remember that these are just a few of the many, many search engines that specialize in "hidden web" searches, not to mention the hundreds of thousands of databases you can search directly once you know where to find them. You may find that once you go "deep," there's no going back!