Using a browser, such as Netscape Navigator or
Microsoft Internet Explorer, can be fun -- but the size of the
Internet makes it impossible to find information by searching each and
every computer on the global network.
Think of the Internet as a disorganized library. Sure,
each web site has a unique address, or URL. Information on the
Internet, however, isn’t organized by topics.
Search Engines automate the process of filtering
through the World-Wide-Web. It would not be possible to run a
“real-time” search that would examine all files on all the computers
on the Internet that meet some requested criterion.
Some Common Search Engines
Search Engines, such as Lycos and Alta Vista, actually
perform their searches BEFORE anyone has actually asked for the
information. These programs sift through the Internet, gather lists of
available information that is contained in web pages, sort the lists,
and store the results on the computer that runs that particular search
engine.
Programs that constantly crawl through the Internet
looking for information are called “spiders.” The best search engines
are constantly updating their listings to maintain their indexes and
keep information current.
When Internet users enter keywords into a search
engine and submit the search to be executed, search sites generate a
list of web pages whose content matches the search criteria.
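The idea of searching before anyone asks can be illustrated with a small sketch. The page addresses and text below are made up for illustration, and real search engines are far more elaborate. The sketch assumes the spider has already gathered the pages, builds a word list (an "index") once, and then answers keyword queries from the stored index alone.

```python
from collections import defaultdict

# Hypothetical pages a spider might have gathered ahead of time.
PAGES = {
    "http://example.com/geology": "granite is a hard rock",
    "http://example.com/bands": "rock music bands of the sixties",
    "http://example.com/opera": "classical music and opera singers",
}

def build_index(pages):
    """Map each lowercase word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def lookup(index, keyword):
    """Answer a query from the stored index, without revisiting any page."""
    return sorted(index.get(keyword.lower(), set()))

INDEX = build_index(PAGES)
print(lookup(INDEX, "music"))
```

Because the work of reading pages happens in build_index, a lookup only consults the stored lists, which is why results can come back quickly.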
Getting started with a search engine is easy: just
enter the information you are seeking in the “keyword” text box and
let the program find appropriate matches. The main advantage of these
“string searches” is that they are easy.
The main disadvantage with string searches, however,
is that the search engine cannot understand the meaning of the words
or strings. Instead, the computer merely looks for a letter-by-letter,
literal match for each word used. Consider the following sentence:
This sentence does not contain any information about biology,
money, or foods like butter and milk, and certainly is not about
automobile pictures, airline fares, lawyer jokes, opera singers, or
library books.
While the above is a proper sentence, it would confuse string-matching
programs because of the context of each word in the sentence. Entering
this sentence into a string search would result in a list of diverse
topics such as money, automobiles, jokes, opera, or law.
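A rough sketch of why this happens, using a shortened version of the sentence above and a hypothetical keyword list: a literal matcher only checks whether each keyword appears as a word. It has no way to notice that the sentence says it is not about those topics.

```python
import re

# Shortened version of the example sentence.
SENTENCE = (
    "This sentence does not contain any information about biology, "
    "money, or opera singers."
)

# Hypothetical keywords that different users might search for.
KEYWORDS = ["biology", "money", "opera", "geology"]

def literal_matches(text, keywords):
    """Return the keywords that appear as whole words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [k for k in keywords if k in words]

print(literal_matches(SENTENCE, KEYWORDS))
```

The sentence is reported as a match for biology, money, and opera, even though it is about none of them.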
Rocking the Internet
Suppose a person was interested in finding web pages
about “rock music.” With a simple string search, this will not be
effective because the computer will locate any pages with the term
“rock” or “music.” String searches will yield pages about other forms
of music (e.g., classical) and pages about geology.
Because a computer has no intelligence, search engines
do not recognize common phrases. String searches result in each word
being treated as an independent search term.
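As an illustration only, with made-up page names and text, the following sketch treats each word of the query as an independent search term, which is the behavior described above.

```python
# Hypothetical pages, keyed by a short name.
PAGES = {
    "bands": "the history of rock music",
    "geology": "how igneous rock forms",
    "classical": "classical music for beginners",
    "cooking": "baking bread at home",
}

def any_word_search(pages, query):
    """Return pages that contain at least one word of the query."""
    terms = set(query.lower().split())
    return sorted(
        name for name, text in pages.items()
        if terms & set(text.lower().split())
    )

print(any_word_search(PAGES, "rock music"))
```

The geology and classical music pages come back alongside the page that is actually about rock music.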
Advanced Search Techniques
Because string searches cannot understand the context
of the terms used in the keyword(s), they can give confusing results.
Suppose you needed some information on John F. Kennedy.
If you just did a string search for JFK, you would get
results that included buildings, monuments, and organizations with his
name in them.
Boolean logic allows users to get more controlled
search results by introducing three possibilities:
Find this term AND that term
Find this term OR that term
Find this term NOT that term.
Instead of just searching for JFK, you might try JFK
AND Massachusetts, or JFK AND Senator AND President. Each time the
word “AND” appears, the number of responses that the search engine
will generate will be limited. (Note: some search engines by default
assume that you want the word “AND” when running multiple word
searches.)
To broaden a search, use the Boolean operator OR. For
example, using JFK OR Onassis will find “hits” for either of Jackie
Kennedy-Onassis’ famous husbands.
To exclude some types of information from a search, use the Boolean
operator NOT. For example, Jackie AND Kennedy NOT Onassis will result
in hits about JFK, his wife Jackie, and exclude information about her
remarriage to Aristotle Onassis.
The real value of Boolean logic comes from using these
statements together. When more than one operator is used, the computer
interprets them from left to right, unless they are grouped in
parentheses.
For example, to find information about Jackie Kennedy Onassis before
she remarried, try using (Jackie OR Jacqueline) AND (Bouvier OR
Kennedy) NOT Onassis.
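One way to picture how Boolean operators combine is with sets of pages. In the sketch below, the page numbers and the index contents are invented for illustration. OR is modeled as a set union, AND as an intersection, and NOT as a difference. Actual search engines may evaluate queries differently.

```python
# Hypothetical index: each term maps to the set of page IDs containing it.
INDEX = {
    "jackie": {1, 2, 3},
    "jacqueline": {4, 5},
    "bouvier": {4},
    "kennedy": {1, 2, 5},
    "onassis": {2, 5, 6},
}

def pages(term):
    """Look up the pages for a term, ignoring capitalization."""
    return INDEX.get(term.lower(), set())

# (Jackie OR Jacqueline) AND (Bouvier OR Kennedy) NOT Onassis
first_names = pages("Jackie") | pages("Jacqueline")   # OR  -> union
last_names = pages("Bouvier") | pages("Kennedy")      # OR  -> union
both = first_names & last_names                       # AND -> intersection
result = both - pages("Onassis")                      # NOT -> difference

print(sorted(result))
```

Each AND shrinks the set of candidate pages, each OR enlarges it, and NOT removes pages from it.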
The concepts of Boolean logic are the same for all
search engines. Different programs, however, may have their own
versions (or “Syntax”) for these commands. Be sure to check the Help
section of a search engine to be sure how to express Boolean
operators.
Sometimes, the best keyword is a phrase instead of
individual words. To indicate to a search engine that you want a
literal match for a set of words as opposed to any web site that
contains these words, place the phrase in quotes.
For example, a search for “John F. Kennedy” will only
yield results for web pages that contain these exact words in
that exact order. Even this search might not be
specific enough. In this case, it would probably be even better to
search for “President John Fitzgerald Kennedy.”
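A phrase search can be sketched as a check for the query words appearing next to each other, in order. The function names and sample text below are hypothetical, and punctuation and capitalization are ignored for simplicity.

```python
import re

def tokens(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z]+", text.lower())

def contains_phrase(text, phrase):
    """True if the words of the phrase appear consecutively in the text."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return n > 0 and any(
        words[i:i + n] == target for i in range(len(words) - n + 1)
    )

print(contains_phrase("President John F. Kennedy spoke in Berlin.", "John F. Kennedy"))
print(contains_phrase("John Smith met F. Scott and Kennedy.", "John F. Kennedy"))
```

The second text contains all three words but not in sequence, so a phrase search would skip it while a plain keyword search would not.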
The Good News
Fortunately, most search tools today have HELP
features and "advanced search" dialog boxes to help guide users
through the process of successfully locating information. Be
sure to take a look at the search options and help sections of the
Websites you use - a little bit of reading or review can save a lot of
time! The world's most popular search engine, Google, has an
easy-to-use "Advanced Search" link. Virtually all the other search
engines have this too.
If you are used to just using simple keyword searches,
perhaps these tips seem a bit intimidating. Like anything, with
practice, they become easy.
Scrolling through screen after screen of web sites that do not contain
the information you desire is a drag and a waste of valuable time.
Learning to conduct more precise searches with various search engines
will give web surfers greater freedom and more time to explore the
vast resources of the Internet.
The World-Wide-Web contains a vast array of
information resources and services. Used properly, the www is a
student’s best friend. Up-to-date information on the web can be found
quickly.
A wide array of formats, such as audio, video, and graphics, is
available. Just point-and-click and you can easily move to related
websites.
Getting the most out of on-line research does have its
challenges. It is easy to become overloaded by the Internet’s vastness
and many sources are not checked for quality and accuracy.
While using search engines is easy, unless one
carefully limits these searches, they return results that may
overwhelm users or provide too much of the wrong type of information.
The www is also becoming more and more of an
advertising vehicle. These advertisements can be annoying or
distracting.
As a student, getting the most out of the web is
important. After all, time spent endlessly surfing is time not spent
actually producing an assignment.
Research and the Open Web
Most of us are familiar with the parts of the Internet
that anyone can freely access.
While many students immediately turn to the www to
start the research process, a more thoughtful approach might make
actually completing a project easier. Here are some suggestions to
help you get the most out of your research:
Fact/Statistics/Technical Data. Locating
accurate, specific pieces of information on the web can be a
challenge. Often, it is easier and quicker to locate accurate,
verifiable, useful data at a library or with a source book. Try looking
at Research-It!
Reference Lists. Instead of just surfing the net, try
locating sources with your library’s online catalog. Identifying
useful journal articles is easily done with topic indexes that are
frequently posted on libraries’ home pages.
Background Research. Properly used, search engines
work well for locating a wide variety of background information on
almost any topic. Just be sure to review each site for content
quality.
Book Citations. Library online catalogs, Amazon (www.amazon.com),
and Barnes and Noble (www.bn.com) provide information to properly and
completely cite sources.
Current Events. Many news sources post news for free. Check
the New York Times (www.nytimes.com) or Google News (tab at
www.google.com or http://news.google.com/).
General Biographical Information. Some traditional,
verifiable, and widely accepted print biographical resources, such as
Current Biography, are available for subscribers online. Check to see
if your local library has access to them.
Graphics. Many pictures, graphs, and tables are
available online. Search engines can help locate them. Just beware,
they may not copy or print as clearly as you would like.
Demographic Data. Many states and communities post
this type of information, though always check for accuracy. A
librarian can help you get the best and most accurate information.
Company Information. The best sources are subscription
services such as Lexis-Nexis and Hoovers.com. They are not free; check
your local library.
Maps. Many good sites are available on the web. Verify
that maps you use are up-to-date.
Many people are surprised to learn that Internet
search engines do not provide comprehensive coverage of the web. Parts
of the web are inaccessible using traditional web search tools. This
is referred to as the “invisible” or “deep” web.
No search engine knows every page on the web, let alone the invisible
web. The web grows and changes far too quickly for anyone to find
every new page.
Pages that no other page links to cannot be found, because the software that
searches the Internet, called “crawlers,” has no links to follow to reach them.
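A simplified sketch of this limitation, using an invented set of pages and links: the crawler starts from one known page and visits only what it can reach by following links. A page that nothing links to is never discovered, even though it exists.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "home": ["news", "about"],
    "news": ["archive"],
    "about": [],
    "archive": ["home"],
    "orphan": ["home"],  # links out, but no page links to it
}

def crawl(links, start):
    """Return every page reachable from start by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

found = crawl(LINKS, "home")
print(sorted(set(LINKS) - found))  # pages the crawler never reaches
```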
Search engines do not have to list pages that are on
the web. For business and editorial reasons, they may refuse to
include some sites in their searches. Some webmasters actually request
that search engines not index their pages.
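One common way webmasters make such a request, not named in this article and offered here only as an assumption about what is meant, is a robots.txt file listing the paths crawlers are asked to skip. The sketch below uses Python's standard urllib.robotparser module with a made-up robots.txt and a made-up crawler name to show how a well-behaved spider might check the rules before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content asking all crawlers to skip /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleSpider" is an invented crawler name.
blocked = parser.can_fetch("ExampleSpider", "http://example.com/private/report.html")
allowed = parser.can_fetch("ExampleSpider", "http://example.com/index.html")
print(blocked, allowed)
```

Pages excluded this way remain on the web and can be visited directly, but they will not turn up in the results of search engines that honor the request.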
Also, most search engines are designed to index text and are not able to
process non-text information. Pages that consist primarily of audio,
images, or video are typically “invisible.”
Typically, search engines are not able to handle some types of formats,
including PDF (Google excepted), Flash, Shockwave, executables, and
compressed files. Because these files are not made up of HTML text, they
are difficult to index.
Much useful information on the web is stored in databases. These present
problems for search engines because each database has its own data
structure and search tools.
Even if a search engine can find the gateway page to a
database, the tools that each database uses to retrieve data prevent
search engines from accessing the database. Web-accessible databases make
up the largest part of the invisible web.
Why Use the Invisible Web?
Since most of us can access many
different types of information on the web using general-purpose search
engines, why bother with the invisible web?
Invisible web resources are usually more specialized and content oriented.
Researchers are looking to satisfy information needs in a timely
fashion. Speed and accuracy often represent a conflict.
Because of their need to appeal to a wide audience,
general-purpose search engines might not represent an adequate
compromise between conciseness of information and timeliness.
Invisible resources, however, are specifically set up for specialized
needs.
Advantages of invisible web sources include:
Specialized content focus means results are more
comprehensive.
Specialized search interfaces mean more control over
search input and output.
More relevant results.
Institutions or organizations that post specialized
“invisible” web sites often can legitimately claim to be
authorities on their areas of expertise.
Because of their specialization, invisible web
resources might be the only source of some types of on-line
information.
When to Use the Invisible Web?
So now that you understand the difference between the visible and
invisible web, how does one decide which will result in the best
information?
In general, let the following serve as guides:
When you are familiar with a subject. If you already know a
great deal about a subject, you likely already know sources of
information on the invisible web. Familiarity results in better
knowledge of where to look and what keywords to use.
When you are familiar with specific search tools. Most invisible
web resources offer more sophisticated interfaces than
general-purpose search engines. Restricting searches with advanced
search functions makes it easier to locate specific information.
When you need precise answers. General-purpose search engines
can return hundreds or even thousands of results for each inquiry.
Most invisible web resources locate specific and precise
information more concisely.
When you need exhaustive, authoritative answers. The web is too
large and complex for general-purpose search engines to provide
in-depth, authoritative results. Some key sites will be overlooked.
The invisible web is less likely to do so.
When current information is desired. Because
general-purpose search engines are not “real-time” searches,
invisible web resources are often more up-to-date.
Like the entire Internet, the invisible web is changing all the
time. A number of on-line newsletters cover these resources. Here are
some good ones:
Academic Info. Online subject directory of over 25,000
hand-picked educational resources for high school and college
students as well as a directory of online degree programs and
admissions test preparation resources (SAT, GRE, LSAT, MCAT, GMAT,
USMLE, TOEFL).
Free Pint. This e-mail newsletter helps web users to find reliable sites and
search the web more efficiently. Concisely written for people who
do not have time to endlessly “point-and-click,” each issue includes a
“Tips and Techniques” section where pros share their best
research strategies and favorite web sites.
Infomine. A virtual library of Internet resources
relevant to faculty, students, and research staff at the university
level. It contains useful Internet resources such as databases,
electronic journals, electronic books, bulletin boards, mailing
lists, online library card catalogs, articles, directories of
researchers, and many other types of information.
Internet Resources Newsletter. This site specializes
in Internet awareness for academics, students, engineers,
scientists, and social scientists.
Librarians’ Index to the Internet (LII). A searchable directory of
web resources that is kept by Carole Leita and more than 70
volunteer reference librarians. It is organized into useful
categories and is extensively linked to cross-references.
ResearchBuzz. This site specializes in Internet
research. It is updated daily and contains an extensive listing of
quality reference tools. Creator Tara Calishain, author of many
Internet research books, offers in-depth analysis and reviews of
various Internet resources.
The Scout Report. The Scout Report is a widely recognized
“seal of approval” for quality websites. It is updated weekly and
summarizes useful web resources.
NoodleTools. An overview of designs, process, and outcomes for information literacy.
Power Reporting (Columbia Journalism Review). Thousands of free, on-line search tools, including links to tutorials
and tips. While targeted at reporters, it will be useful to any
researcher.