Search Techniques and Tools - Part 1
Searching for information on the Internet can be done in many ways. In the next few paragraphs I review the major options for using Internet-based searchable databases, and then give more specific information on the main examples of each type.

The most common way is to use search engines such as AltaVista, Northern Light, Fast Search, Google, Excite, HotBot, InfoSeek and Lycos. Because the search engines produce different answers to a given query, it is a good idea to use more than one. Fortunately this can now be done with a new type of search engine called a meta search engine (e.g. MetaCrawler, Inference Find!, and DogPile), which sends your query to several search engine databases at the same time and then displays a consolidated list of the replies from all of them.
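Each meta search engine has its own way of combining the replies, and I don't know the details of any of them. The following is a minimal sketch that assumes each engine has already returned a ranked list of URLs. The merge step scores every URL by adding up 1/rank over the engines that found it, so pages that several engines agree on rise to the top. The engine names and URLs are made up, and no real engine is queried.

```python
from collections import defaultdict


def merge_results(result_lists):
    """Merge ranked URL lists from several search engines into one list.

    result_lists maps a (hypothetical) engine name to its URLs, best first.
    Each URL scores 1/rank for every engine that returned it; duplicates are
    folded together, and the merged list is sorted by total score.
    """
    scores = defaultdict(float)
    sources = defaultdict(list)
    for engine, urls in result_lists.items():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / rank
            sources[url].append(engine)
    # Highest score first; ties broken alphabetically so the order is stable.
    ordered = sorted(scores, key=lambda url: (-scores[url], url))
    return [(url, sources[url]) for url in ordered]


replies = {
    "EngineA": ["http://a.example/", "http://b.example/", "http://c.example/"],
    "EngineB": ["http://b.example/", "http://d.example/"],
}
for url, engines in merge_results(replies):
    print(url, engines)
```

Here http://b.example/ comes first because both engines found it, even though only one ranked it top.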

You can also look up your queries in directories of web sites, such as Yahoo, Look Smart, and the Argus Clearinghouse, which are more like traditional libraries.

Specialist search engines let you look for particular kinds of information, such as newsgroup postings (DejaNews), email addresses (Four11), and current news (NewsTracker). Some web pages gather together links to many of these specialist engines (C-Net Search.Com). Increasingly, these specialist searches are also offered as options on the main search engines' home pages, as each engine tries to become your one-stop source of information. These services are normally accessed via web pages, but over the last year a number of the major search engines have let you download a front-end program, so that you can compose your search on your own computer and then send it to the search engine's database on the web.

At the moment the biggest problem is that the way the search engines locate, index, and update web pages, and then search the database built from them, is still a compromise between the achievable and the desirable. This leads to out-of-date links, poor matching between queries and hits, and about half of all web sites not being indexed by the search engines at all. Because the web is so large, the robots that compile each search engine's index often check only the top three or four levels of a web site, leaving much to be found by other means. Search engines are also including fewer "free web sites", such as those hosted by Geocities. The size of the task also means that sites are revisited infrequently, so pages may have changed or disappeared by the time you access them. The interval between visits can range from weeks to months.
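The following sketch shows why deep pages get missed. It is a breadth-first robot that stops after a fixed number of levels. This is an illustration only. It assumes the site's links are already known, and the `site_links` map and page names are hypothetical. Real robots fetch pages over the network, and each engine sets its own limits.

```python
from collections import deque


def crawl(site_links, home, max_levels=3):
    """Return the pages a depth-limited robot would index.

    site_links is a hypothetical in-memory map of page -> pages it links to,
    standing in for fetching real pages. The home page is level 1; pages more
    than max_levels levels down are never reached.
    """
    indexed = {home}
    queue = deque([(home, 1)])
    while queue:
        page, level = queue.popleft()
        if level >= max_levels:
            continue  # indexed, but the robot does not follow its links
        for link in site_links.get(page, []):
            if link not in indexed:
                indexed.add(link)
                queue.append((link, level + 1))
    return indexed


site = {
    "/": ["/products", "/about"],
    "/products": ["/products/widgets"],
    "/products/widgets": ["/products/widgets/manual"],
}
print(sorted(crawl(site, "/")))
```

With the default of three levels, the manual page four levels down is never indexed, however useful it might be.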

The major search engines and related topics are covered in great detail by other people, particularly on Danny Sullivan's Search Engine Watch site. More information on each of the different ways of searching for information on the Internet is given below.

Major Search Engines

The major ones I use are AltaVista, Excite, HotBot, InfoSeek, Lycos, and occasionally OpenText. You can find good reviews of them, and tips on how to search them, in many places (Scout Tool kit, Danny Sullivan, Tracy Marks, ...), but to get the most out of them there are a few major points to understand. They will always give different answers to the same question, because each has its own way of finding and indexing web pages, so it is worth trying more than one to find out which you like best. Many web sites group the search forms of several search engines onto one page, but if you want to do more than submit basic queries you are better off using the individual search engines, where you have access to more options and new developments. There is a lot of competition, so this month's "best search engine" may have been overtaken by next month.

To get the best out of the search engines it is very important to understand how to talk to them. Each of them has a help page with lots of tips, and some even have an advanced search interface. The main point to understand is that, in addition to the normal search on the text of a web page, you can restrict a search to the title, the URL, the URLs linked to the page, or specific files (.gif etc.), which can make your request much more precise. Finally, you need to learn to use the logical operators AND, OR, NEAR, and NOT (Boolean searching), which, combined with brackets and phrase searching, will help you sieve out the specific information you want from that huge mass of irrelevant links. Full details on these techniques are given in the help pages of each search engine. Unfortunately, whilst they all speak the same language, they use different dialects <g>.
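Every engine has its own dialect, so the sketch below doesn't copy any one engine's syntax. Instead it shows what the operators mean. It assumes a query has already been parsed into nested tuples, with the nesting playing the part of the brackets. The tuple format, the FIELD operator for title-only searches, and the NEAR distance are my own illustrative choices, not any engine's rules.

```python
import re


def words(text):
    """Lower-case text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(query, doc, field="text"):
    """Return True if doc (a dict of field name -> text) satisfies query.

    query is a nested tuple, for example
    ("AND", ("TERM", "gopher"), ("NOT", ("TERM", "garden"))).
    """
    op = query[0]
    toks = words(doc.get(field, ""))
    if op == "TERM":
        return query[1].lower() in toks
    if op == "PHRASE":  # the words must appear consecutively, in order
        target = words(query[1])
        n = len(target)
        return any(toks[i:i + n] == target for i in range(len(toks) - n + 1))
    if op == "NEAR":  # both words within `distance` words of each other
        _, first, second, distance = query
        pos1 = [i for i, t in enumerate(toks) if t == first.lower()]
        pos2 = [i for i, t in enumerate(toks) if t == second.lower()]
        return any(abs(i - j) <= distance for i in pos1 for j in pos2)
    if op == "FIELD":  # e.g. ("FIELD", "title", subquery)
        return matches(query[2], doc, field=query[1])
    if op == "AND":
        return all(matches(q, doc, field) for q in query[1:])
    if op == "OR":
        return any(matches(q, doc, field) for q in query[1:])
    if op == "NOT":
        return not matches(query[1], doc, field)
    raise ValueError(f"unknown operator {op!r}")


pages = {
    "http://a.example/": {"title": "The Internet Gopher",
                          "text": "Search engines and meta search engines compared"},
    "http://b.example/": {"title": "Garden pests",
                          "text": "How to keep a gopher out of the vegetable garden"},
}
# (gopher OR search) AND NOT garden
query = ("AND", ("OR", ("TERM", "gopher"), ("TERM", "search")),
         ("NOT", ("TERM", "garden")))
print([url for url, doc in pages.items() if matches(query, doc)])
```

The garden page matches "gopher" but is then sieved out by NOT garden, which is exactly the kind of filtering the operators are for.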

A new feature being added to most of the search engines lets them "improve" your query by suggesting new words based on the content of the pages you have found so far. The approaches range from the More Like This option used by Infoseek to the additional search terms suggested by AltaVista.
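I don't know how Infoseek or AltaVista actually pick their suggestions. A naive way to get the flavour of the idea is to count the words in the pages found so far and offer the commonest ones that are not already in the query. This sketch works under that assumption only, and the stop-word list is an arbitrary choice.

```python
import re
from collections import Counter

# Arbitrary, illustrative list of words too common to be worth suggesting.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}


def suggest_terms(query, found_pages, how_many=3):
    """Suggest extra query words: the most frequent non-trivial words in the
    pages found so far that are not already in the query."""
    query_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    counts = Counter()
    for page in found_pages:
        for word in re.findall(r"[a-z0-9]+", page.lower()):
            if word not in STOP_WORDS and word not in query_words and len(word) > 2:
                counts[word] += 1
    return [word for word, _ in counts.most_common(how_many)]


found = [
    "Gopher protocol clients and gopher servers",
    "Gopher servers list menus",
    "Veronica searches gopher menus on servers",
]
print(suggest_terms("gopher", found, how_many=2))
```

Adding the suggested words to the original query (for example "gopher servers") narrows the next search towards the pages you have already found useful.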

A novel approach has been taken by a new search engine, Northern Light. It promises a large database of frequently updated sites, on a par with HotBot and Excite, but rather than only returning the top hits, it also sorts all of the hits into subject-related folders, which can help to focus the search. It is another attempt to bring some order to the chaos.
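I don't know how Northern Light builds its folders, so the following is only a guess at the general idea. It is a minimal sketch that files each hit under every subject whose keywords appear in the page. The folder names and keyword sets are hypothetical, and a real service would need a far richer way of deciding subjects.

```python
import re


def sort_into_folders(hits, folder_keywords):
    """Group search hits into subject folders.

    hits is a list of (url, page text) pairs. folder_keywords is a
    hypothetical map of folder name -> set of lower-case keywords. A hit goes
    into every folder whose keywords appear in its text, or into "Other" if
    none do. Empty folders are dropped.
    """
    folders = {name: [] for name in folder_keywords}
    folders["Other"] = []
    for url, text in hits:
        page_words = set(re.findall(r"[a-z0-9]+", text.lower()))
        placed = False
        for name, keywords in folder_keywords.items():
            if page_words & keywords:
                folders[name].append(url)
                placed = True
        if not placed:
            folders["Other"].append(url)
    return {name: urls for name, urls in folders.items() if urls}


hits = [
    ("http://a.example/", "Jaguar cars for sale"),
    ("http://b.example/", "The jaguar is a big cat of the rainforest"),
    ("http://c.example/", "Atari Jaguar games"),
]
topics = {"Cars": {"cars", "engine"}, "Animals": {"cat", "rainforest"}}
print(sort_into_folders(hits, topics))
```

A search for "jaguar" returns all three pages, but the folders immediately separate the cars from the cats.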

Directories and Electronic Libraries

Use these as you would use a library or a bookshop. Think of a subject and then see which web pages they have decided to "stock". Because the pages are added manually, the compilers have the option of checking their quality, so whilst you will get fewer results they should be of higher value. Some of the main search engines also offer a directory service. There are many types of directory (see InterNIC); here is a selection. Again, try more than one, as their coverage is often variable.

Yahoo is the most famous, and now exists in many forms for different age groups and geographic areas. Because it takes anything from anyone with little critical control, its quality can vary widely.

Magellan has a large number of more professional and academic sites, and has been taken over by Excite.

The World Wide Web Virtual Library is a large collection of web pages, each devoted to links to quality resources on a particular subject and maintained by an authorized expert.

Trade Wave Galaxy is another professionally edited collection of web pages with a good directory tree structure.

Argus Clearinghouse has subject guides selected by experts.

Look Smart is the directory owned by Reader's Digest.

The Mining Company is a new attempt to build a directory of specialist topic pages, each maintained by a specialist who provides information about updates to the site.

You can also find groups of web pages with similar interests by visiting some of the rings of web sites that are being established (Webring, TheRail). Each site on a ring has a similar theme and lets you move to the next site on either side of it. You don't know where you are going, but it can be fun.
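A ring is essentially a circular list, and the following sketch shows the navigation. It assumes the ring's member list is held in one place. The `WebRing` class and the site URLs are hypothetical, and they do not reflect how Webring or TheRail actually run their rings.

```python
class WebRing:
    """A ring of web sites: each site links to the sites on either side of
    it, and the last site links back round to the first."""

    def __init__(self, sites):
        self.sites = list(sites)

    def next_site(self, site):
        i = self.sites.index(site)
        return self.sites[(i + 1) % len(self.sites)]

    def previous_site(self, site):
        i = self.sites.index(site)
        # Python's % keeps the result non-negative, so site 0 wraps to the end.
        return self.sites[(i - 1) % len(self.sites)]


ring = WebRing(["http://a.example/", "http://b.example/", "http://c.example/"])
print(ring.next_site("http://c.example/"))      # wraps round to a
print(ring.previous_site("http://a.example/"))  # wraps round to c
```

Because the list wraps round, you can keep clicking "next" indefinitely and you will eventually come back to where you started.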

Enthusiasts' home pages are often the best source of information in specialist areas. Once they are established and achieve critical mass, they attract announcements of new web sites and can be comprehensive and up to date. Bookmark them and return frequently.

Meta-Search Engines, Specialist Search Engines, and Searching Software

I cover searching with meta-search engines, specialist search engines, and search software running on your own computer on separate pages.

This page and all its contents, © 1998, all rights reserved.

Roger Trobridge, The Internet Gopher,