Do Search Engines Suppress Controversy?
A June 2003 U.S. FCC ruling on media ownership consolidation raised the
question: Does the Internet provide
effective alternatives to TV, radio, and news channels? Are websites
cost-effective outlets for expression of diverse opinions and
fair dissemination of
information? Some
researchers (see references)
suggest the
Web is much less than a distinct alternative to traditional
media, and even presents similar barriers and limits. Search engines
-- the central force for finding
information on the Web -- have inherent biases, as does any
technology.
Perhaps we can test one distinctive aspect
of this question: do
search engines suppress controversy?

Suppose you query a search
engine on a topic of new interest or to meet a current need.
Perhaps the herbal remedy "St John's Wort" intrigues you, or you're
planning a trip to the Central American country of "Belize", maybe
considering
getting another degree via "Distance Learning", helping your
child
write a report on "female astronauts", or reviewing the life and times
of "Albert Einstein". What if each topic came back in the top search
results without any
apparent controversy: no disagreements about effectiveness of the
herbal remedy, nothing but enticing beach hotels and ecotourism in
Belize, a roster of good-looking sites promising
learning at your own pace, a space history emphasizing first Russian
and American women astronauts, or a plethora of quotation-filled
biographies that briefly allude to Einstein's pre-WWII life in
Europe.
Each topic appears as if the search engine had a distinctly "sunny
personality", telling you nothing suspicious or unpleasant about any of
these five topics. Would you have received a fair representation of the
web content on each topic? You'd certainly find well-written, informative
pages and web sites to surf from. But would the search engines have
suppressed underlying controversies, facts and disputes that might alter
your medical, educational, or travel plans? Might you miss some
alternative views of the space program or the life of Einstein that could
be instructive for you or your children? Or could you be lulled into
thinking a topic rather boring because you miss the richer indicators of
worldly activity, such as multiple viewpoints, changing historical
perspectives, and reflections of social norms? If underlying
controversies exist, how would you find them?
Where's the Controversy? Experiments on 5 popular topics
We investigated
specific
controversies for each of the
above five topics then measured how much each controversy appeared in
each of the general queries ("Belize", "distance
learning", "Albert Einstein", "St. John's Wort", "female astronauts").
The results were mixed: two controversies showed and three were virtually
missing when search results were drawn from three popular engines (Google,
Teoma, and AllTheWeb) and two meta-searchers (Profusion and Copernic,
querying and collating from different engines). Most enlightening were the
factors that suppressed or revealed the controversies.
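The measurement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used in the experiments (which relied on the Windows URL analyzer twURL and hand rating): the engine names and URLs are hypothetical, the helper functions are invented for this sketch, and the rating labels "Deep", "Revealing", and "Other" are the ones defined in the methodology note at the end of this article.

```python
from collections import Counter
from urllib.parse import urlsplit


def normalize(url):
    """Reduce trivially different spellings of a URL to one key.

    Assumption: scheme, host case, and a trailing slash do not matter.
    """
    parts = urlsplit(url.strip())
    key = parts.netloc.lower() + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(results_by_engine):
    """Return the set of distinct URLs found across all engines."""
    merged = set()
    for urls in results_by_engine.values():
        merged.update(normalize(u) for u in urls)
    return merged


def controversy_share(ratings):
    """Fraction of rated URLs that expose the controversy at all."""
    counts = Counter(ratings.values())
    total = sum(counts.values())
    exposed = counts["Deep"] + counts["Revealing"]
    return exposed / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical result lists for the simple query "Belize".
    results = {
        "engine_a": ["http://example.org/belize/", "http://example.com/hotels"],
        "engine_b": ["http://EXAMPLE.org/belize", "https://example.net/border-dispute"],
    }
    merged = merge_results(results)
    # Hand-assigned ratings, one per distinct URL (also hypothetical).
    ratings = {
        "example.org/belize": "Revealing",
        "example.com/hotels": "Other",
        "example.net/border-dispute": "Deep",
    }
    print(len(merged), "distinct URLs")
    print(round(controversy_share(ratings), 2), "expose the controversy")
```

The rating step itself stays manual here, as it was in the experiments; the code only merges result lists and summarizes ratings a person has already assigned.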
1. "Belize" looks to search engines as mostly hotels and ecotourism,
reflecting the primary industry of the country - ruins, reefs, and
jungles. Online also are country fact sheets, historical chronologies,
local newspapers, and several Belizean government web sites. The
underlying controversy we sought was the Belize-Guatemala border dispute
-- claims from the early 19th century that Guatemala owned Belize -- now
being mediated by international
organizations. The dispute is rich in the history of colonialism
(Belize gained independence in 1981), a major saga for a
small developing country, and an occasional tragedy for settlers.
Ask
for "Belize Guatemala border dispute" and you'll get many descriptive
and passionate historical
and political pages. But the top search results will mention the
dispute only as small items in country fact sheets.
2. "Distance
Learning" search results are a web of suppliers
and trade
associations with an occasional page of links over to the analytic
literature researching quality of learning.
In fact, around 1998, technology historian David Noble coined the
term "digital diploma mills" to raise red flags about
commercialization of the academic enterprise by outside interests
(and university administrations), subverting the control of
faculty over their own content and their relationships with students.
Thought-provoking articles reproduced online stimulated wide debate in
socio-technical newsletters and fora. But this controversy will be
found on only
one library-based page of links, leaving the reader to perhaps assume
academia not only accepts distance learning as
equal to traditional learning but has fully embraced its commercial
potential. Could someone searching
for "distance learning" be misled into a commercial maze of choices
rather than cautioned about their options?
3. Surely,
everything must be known about "Albert Einstein" and his
early 20th century life, as
presented in
a multitude of biographies on science and reference sites. Well, not
quite: emerging evidence suggests there's more to his first marriage to
Serbian science student Mileva Maric. Her characterization in his
biography is "they met in physics classes in Zurich, became
lovers and colleagues, gave up an illegitimate daughter, married
against
family wishes, settled down (with two more sons) for Albert to pursue
his career, and Maric just didn't make it through her graduate
qualifiers. After
a while, Albert strayed from his unhappy wife and married again,
emigrating to the U.S.". However, letters and
investigations by scholars of early women scientists suggest Maric
contributed rather more than has been acknowledged to Einstein's work
leading up to his Nobel prize (the prize money went to her as a divorce
settlement). Will a search on "Albert Einstein" yield this "first wives
club" tale of woe? No, but a search on "Einstein Maric" or "Mileva Maric"
will.
We've now seen 3 controversies where
considerable web material showed
up in queries specific to the controversy ("Belize Guatemala border
dispute", "David Noble" and "digital diploma mills", and "Mileva Maric
Einstein") but not in the simple query
on the broader topic ("Belize", "distance learning", "Albert
Einstein"). What's suppressing these
controversies? The
main reason is organizational clout: the usual services and clearing
houses of tourism, ecology, and archeology in Belize; large schools
with
distance learning services and associations promoting distance
learning; and reference and science sites for Einstein biographies and
quotations. The sites containing controversy have more analytic
pages (data, detail, less glossy), are more from individuals and
smaller organizations, and are less inter-linked and coordinated. There
are no largish
central web sites for Belize-Guatemala disputes, David Noble, or Mileva
Maric.
4. However,
search for "female
astronauts" and up will come pages about the "Mercury
13". This group of women aviators passed preliminary physical
and psychological tests but was rejected by NASA in 1961. Why? The
women
weren't test pilots (restricted to male military), astronauts
at the time were uncomfortable with changes in the "way things are",
and
loss of a female astronaut might jeopardize the whole space
program. However, the tale of these brave and skilled pilots, their lives
already risked as World War II WASPs, their careers disrupted for this
secret training, and their aspirations for contributing to the US space
program, has been told in two recent books, "Mercury 13" by M. Ackmann and
"Promised the Moon" by S. Nolen. The 20th and 40th anniversaries of
American and Russian women astronaut firsts and the 1998 geriatric flight
by original Mercury astronaut John Glenn combined with take-up of the
cause by many women's organizations, current astronauts (Commander Eileen
Collins), and the still adventurous surviving pilots. The
controversial subtopic appeared in roughly 15% of search results,
enlivening a rather stale list of "first women" exploits and a saddened
manned space program. Although organizational clout (NASA) dominates
the primary query "female astronauts", this controversy is well
orchestrated with web sites, petitions, news columns, and press
releases.

5. "St. John's Wort" exhibits recent
controversy over effectiveness of the
herb for its claimed mood enhancement and depression reduction. With
rising public interest in alternative medicine, the National
Institutes of
Health and several drug companies, in 2000, began clinical
trials. Government-issued advisories on certain side effects (primarily
for HIV patients) and questions about comparative effectiveness were
rapidly
promulgated by many syndicated health columns and newsletters. A
simple search on "St. John's Wort" draws out many herbal descriptions
and myriad storefront shops, as well as these cautionary newsletters. A
shopper
just looking for a good buy might well bypass the advisories, some of
which are long and analytic. A reader with an open mind would
likely find the advisories useful, although the trials are hardly yet
conclusive.
To summarize, suppressing factors
include organizational clout, search engine preference for glossy over
analytic material, and considerable
duplication
of top results. Some controversial topics suffer from poor coordination
and interlinking and from the low status of their web sites. Factors that
reveal
controversies
include: explicit promotion, timeliness, media interest, and social
relevance.
You mean, the Web isn't all that different from traditional media?
Well, duh, these results are pretty
much to be
expected and in many ways
comparable to traditional media. Websites exist to promote their
organizations, "money talks", and search engines return what society is
currently interested in. What's different on the Web? First,
links are the "currency of the Web", amplifying inter-organizational
relationships, and suppressing smaller sites that don't inherit links
from high status sites or that link poorly among themselves. Links and
hypertext transcend citations and catalogs in libraries and
periodicals. Second, search engine ranking reinforces the fact that people
are less interested in controversial subtopics: border disputes, academic
defensiveness, early 20th century science careers of women, rejected women
aviator adventures, and long-running clinical trials. But, on the Web,
these controversial topics are only one query away, if the searcher knows
the right keywords.
Third, search engines are businesses, not a service to society.
Search
engines seek to provide relevant and widely useful popular
pages to searchers who also click ads and paid placements,
leaving the more detailed and perhaps less pleasant content to
searchers who are willing to work harder. Of course, a search engine
needs a positive outlook; it's another form of sales, not a librarian,
professional expert, or social activist.
The dilemma is that anybody can find the controversies -- if they know the
right query terms. But if the top results don't reveal the controversy,
it's quite easy to be lulled into the "sunny side" of the topic and miss
the more cautionary or interesting "darker side".
Further
distinctions exist in the growing literature on search engines
and
biases of technology (see references below).
Observed "power" laws
describe a Web where traffic and links for
many topics
are dominated by a few sites; most sites are neither linked to nor
often
visited.
With search engines basing query results on links, smaller and newer sites
are harder for crawlers to find, then rank lower, and consequently receive
even less attention, except from more narrowly interested parties who find
them by specific queries or blogs.
Furthermore, even the most potent search engines index relatively small
parts of the
Web, maybe 25% or less, and different engines index different parts,
all
missing the "Invisible Web" of dynamic pages and submerged databases.
Thus, small, new, or alternative sites face obstacles in first becoming
retrievable, with no guarantees of high visibility via search engines.
Research on the effects of these inherent
biases is difficult due to
the scale of the Web, unpredictability of proprietary search engine
strategies, and
mixed expectations of Web quality. Technology, politics, economics,
even physics, all help characterize behavior of the Web, helping to
explain our
experimental results.
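The "rich get richer" dynamic behind such power laws can be illustrated with a toy simulation. The sketch below uses preferential attachment, a standard growth model from the network literature and an assumption of this illustration, not something measured in our experiments: each new site bestows a few links on existing sites, choosing targets with probability proportional to the links they already hold. All names and parameter values are hypothetical.

```python
import random


def simulate_web(n_sites=2000, links_per_site=3, seed=1):
    """Grow a toy web and return the inbound-link count of every site."""
    rng = random.Random(seed)
    inlinks = [0]  # site 0 exists before any links are made
    pool = [0]     # each site appears once, plus once per inbound link
    for new_site in range(1, n_sites):
        for _ in range(links_per_site):
            # Chance of being chosen is proportional to (inlinks + 1).
            target = rng.choice(pool)
            inlinks[target] += 1
            pool.append(target)
        # The newcomer becomes linkable only after placing its own links.
        inlinks.append(0)
        pool.append(new_site)
    return inlinks


def top_share(inlinks, fraction=0.1):
    """Share of all inbound links held by the most-linked sites."""
    ranked = sorted(inlinks, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


if __name__ == "__main__":
    counts = simulate_web()
    print("share of links held by the top 10% of sites:",
          round(top_share(counts), 2))
    print("sites with no inbound links at all:", counts.count(0))
```

In a model like this, early and already popular sites accumulate most links while late arrivals often receive none, which is the pattern the cited studies report for the real Web.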
Can Web behavior be changed? Must controversies remain submerged?
Whether these five
topics represent
the wider Web or indicate trends is
hard to judge without further experimentation. However, from these
results, we can hypothesize a more "Objective Web" which exhibits
greater diversity and fairness. We've seen that Web behavior is
governed by
three interlinked communities: web page authors who promote topics and
bestow links, search engines that crawl and rank via links, and
searchers who reward authors and engines with their attention and
clicks. Search engines certainly cannot be expected to
become informed, context-aware librarians striving for collection
development or professional topic
experts offering penetrating and balanced advice. However, they could
distinguish better and provide more support
for the Analytic Web (details, analysis, longer content) versus the
Organizational Web (real world institutions, showing their Web
presence). For example, Teoma.com
offers an "enthusiasts" section listing heavy linkers and focused topic
sites that often tap into more diverse and selected resources. Perhaps
the "Semantic Web" mission to bring structure and meaning to web
content will offer an alternative to current searching.
In general, search engines reflect the links
and pages
of website authors. Such authors might adopt a more objective and
extensive linking policy, comparable to scientific paper citation, more
thoroughly addressing (and linking to) pages with agreeing, opposing,
and neutral objective viewpoints.
A more
aggressive linking strategy among analytic and
controversy-expressing sites could exert influence on search engines as
well as benefit readers.
Certainly, our five topics
suggest additional advice for
building good searches. Since engines don't overlap much or cover all
the web, it's usually advisable to search multiple engines for a more
comprehensive search. Indeed, these experiments showed that search
engines overlapped around 30% and were roughly comparable (within 10%
of each other) at exposing controversy pages. No, Google wasn't that
much better. More engines + more
queries = more chance of
exposing controversies. The "sunny disposition" of search engines
also recommends "looking
for trouble" by digging deeper into search results, seeking the most
informed link pages, being skeptical of commercialism and advertising,
deliberately going beyond the Organizational into the Analytic Web.
Just
asking "Topic AND Controversy" or some appropriate synonym (dispute,
opposition, objection, etc.) may reveal submerged controversial
content, but a good controversy search requires knowing the precise
keywords. Clustering search engines such as Vivisimo and
(the former) Northern Light also may highlight terms associated with
controversies.
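Two pieces of this advice are easy to sketch in code: pairing a topic with controversy synonyms, and checking how much two engines' results overlap. The following is a minimal illustration under stated assumptions. The function names are hypothetical, the AND syntax varies by engine, and overlap is computed here as the Jaccard index (shared URLs divided by all distinct URLs), which is one reasonable definition and not necessarily the one behind the 30% figure above.

```python
# Synonyms taken from the advice above; extend as the topic requires.
CONTROVERSY_TERMS = ("controversy", "dispute", "opposition", "objection")


def expand_query(topic, terms=CONTROVERSY_TERMS):
    """Pair a quoted topic with each controversy-related term."""
    return [f'"{topic}" AND {term}' for term in terms]


def overlap(urls_a, urls_b):
    """Jaccard overlap between two engines' result lists."""
    a, b = set(urls_a), set(urls_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    for query in expand_query("distance learning"):
        print(query)
    # Hypothetical result lists from two engines.
    engine_a = ["example.org/a", "example.org/b", "example.org/c"]
    engine_b = ["example.org/b", "example.org/c", "example.org/d"]
    print(overlap(engine_a, engine_b))
```

Generic synonyms only go so far: as noted above, a good controversy search still depends on knowing the precise keywords, such as a person's name or a coined phrase.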
So, do search engines suppress controversy?
Search engines are extraordinarily powerful technology that we are
absolutely dependent upon for using the Web. These databases,
algorithms, and user interfaces aren't conspiring to suppress
controversy; it's just the way they work as good business sense in
an information world barely a decade old. Searchers need to learn
the engines' biases and how to
counteract them with better, more diverse, harder-probing queries. Page
authors need to link more carefully, extensively, and objectively
since each page adds to the web of their topic and its influence on
search engines relative to competing topics. Every
link counts.
References
Politics of Search Engines
- The Federal Communications Commission (FCC) Media Ownership Policy
Reexamination at http://www.fcc.gov/ownership/Welcome.html and comments by
Matthew Hindman and Kenneth Neil Cukier, "More News, Less Diversity", New
York Times, June 2, 2003, also at
http://www.princeton.edu/~mhindman/NYTimesOpEd2.htm
- Lucas Introna and Helen Nissenbaum, "Defining the Web: The Politics of
Search Engines", IEEE Computer, January 2000, 54-62; also "Shaping the Web:
Why the Politics of Search Engines Matters" in The Information Society,
16(3):1-17, 2000.
http://www.nyu.edu/projects/nissenbaum/papers/searchengines.pdf
- Jill Walker, "Links and Power: The Political Economy of Linking on the
Web", Proceedings of Hypertext 2002. Baltimore: ACM Press, 2002. 78-79.
http://cmc.uib.no/jill/txt/linksandpower.html
- Jakob Nielsen, "Alert Box: Diversity is Power for Specialized Sites",
June 16, 2003. http://www.useit.com/alertbox/20030616.html
Empirical studies of the Web
- Matthew Hindman, Kostas Tsioutsiouliklis, and Judy Johnson,
"'Googlearchy': How a Few Heavily Linked Sites Dominate Politics on the
Web", Annual Meeting of the Midwest Political Science Association, Chicago
IL, April 4, 2003,
http://www.princeton.edu/~mhindman/googlearchy--hindman.pdf
- Lee Giles and Steve Lawrence, "Accessibility and Distribution of
Information on the Web", Nature, Vol. 400, pp. 107-109, 1999,
http://wwwmetrics.com
- Bernardo Huberman, "The Laws of the Web: Patterns in the Ecology of
Information", MIT Press, 2001
- Albert-Laszlo Barabasi, "Linked: the New Science of Networks", Perseus
Publishing, 2002. http://www.nd.edu/~networks/linked/
- David Pennock, Gary Flake, Steve Lawrence, Eric Glover, C. Lee Giles,
"Winners don't take all: Characterizing the competition for links on the
web", Proceedings of the National Academy of Sciences, 99(8): 5207-5211,
April 2002. http://www.modelingtheweb.com/
- "The Semantic Web", http://www.w3.org/2001/sw/

Bias Studies
- Chris Sherman, "Are Search Engines Biased?", Search Day Newsletter,
11 March 2002
http://www.searchenginewatch.com/searchday/article.php/2159431
- Abbe Mowshowitz and Akira Kawaguchi, "Bias on the Web", Communications
of the ACM, September 2002, pp. 56-60. Also CCNY Intelligent Searcher at
http://wikiwiki.engr.ccny.cuny.edu/IntelSearch/
- Leslie Marable, "False Oracles: Consumer Reaction to Learning the Truth
about How Search Engines Work, An Ethnographic Study", June 2003
http://www.consumerwebwatch.org/news/searchengines
- WebMasterWorld forums at http://www.webmasterworld.com/
- Chris Sherman and Gary Price, "The Invisible Web: Finding Hidden
Internet Resources Search Engines Can't See", Cyberage Books, 2002,
http://www.invisible-web.net
- Genie Tyburski, "Evaluating The Quality Of Information On The Internet",
http://www.virtualchase.com/quality
- Leah Graham and P. Takis Metaxas, "Of course it's true; I saw it on the
Internet", Communications of the ACM, May 2003, 70-75, also
http://www.wellesley.edu/CS/pmetaxas/CriticalThinking.pdf
- Stephen Adams, "Information Quality, Liability, and Corrections", Online
Magazine, September/October 2003, pp. 16-22.
http://www.infotoday.com/online/sep03/adams.shtml
Experimental methodology and Results
An expanded version of this paper appears in First Monday, Jan. 5, 2004.
Lists of URLs used in the experiments are available online. Also online is
an experimental query expander, a Controversy Discovery Engine.
We collected URLs from 5 search engine sources: Google, Teoma, AllTheWeb
(FAST), and two multi-searchers, the web-based Profusion (Altavista,
About, AOL, Lycos, Raging Search, Wisenut, Metacrawler, MSN, Adobe PDF,
Looksmart, Netscape, Teoma, AllTheWeb) and the desktop Copernic (Ah-ha,
Altavista, AOL, Euroseek, AlltheWeb, Findwhat, Hotbot, Infospace,
Looksmart, Lycos, Mamma, MSN, Netscape, OpenDirectory, Teoma, Wisenut,
Yahoo). For appropriate simple queries about the broad topic and about the
controversy, we collected the top 50 results from single engines and the
top 100 from multi-searchers. Searches were performed during August 2003.
All URLs were merged together using a Windows URL Analyzer, twURL. Each
topic contained 500-900 distinct URLs, which were browsed using twURL
views (link counts, domains, keywords), tossing out off-topic and dead
URLs. Each was rated as "Deep" (right on target for the controversy, so
that a searcher wouldn't miss the story), "Revealing" (with links to or
passing mentions of the controversy, but a searcher might well miss the
controversial subtopic or its importance), or "Other" (informative,
relevant URLs but not about the controversy). Summary data on the overlap
of the simple and controversy queries show that (1) for the controversy
queries (ABOVE), there are many deep and revealing URLs, i.e. controversy
pages exist for the right query (average 300 URLs per controversy), but
(2) the controversial pages were suppressed or submerged in the simple
topic (BELOW):