During this project, we have begun to grasp some of the empirical characteristics of the web information space, in the spirit of "information foraging" but without the characterizing formulae. We have used the tools described in Appendix C and demonstrated at ONR on September 17, 1997. We have experimented with collecting and portraying topic collections of web pages using three complementary views: domain, links, and concepts. We have defined a canonical set of tasks in a typical professional web use setting and compiled the data for evaluating user response. And we have used web patterns to begin to define some of the ways that common problems occurring during web use may be addressed by "browsing in context". User-driven tools have helped us perform our empirical investigations and have codified our model.
Understanding how to use the WWW productively is a big challenge today for individual information professionals and information-intensive organizations alike. In a sense, for heavy-duty information users, web browser technology has gone the wrong way. Histories, bookmarks, and other aids have become fossilized within dominant browsers that address multimedia, advertising delivery, and operating systems rather than the original objective for the web, which was more like a large technical library. A marketplace that favors browser vendors who control the rendering of HTML has made it difficult for tools supporting alternative uses of the web to mature. The scale of the web and the ability of enough people to "just about cope" with that scale, together with the genuine opportunities to access both traditional and "gray" (products, resumes, newsletters, etc.) materials, have both revolutionized modern information and trapped it. More research is needed to define viable alternative and complementary tools and techniques to maintain the quality uses of the web for government, scientific, and educational communities.
The following opportunities could be pursued.
It surprises many information professionals to learn how much (or how little) of the web is indexed, how recently, and what recall and precision mean to a commercial search company. Closer to our goals, very little is known about the size of technical topics or the rate of change of this class of materials, let alone the quality of the material and its authors. The work of a few librarian researchers (an informal session at Online World 97) and the Xerox-Stanford experiments could be complemented by a supported effort to collect, validate, and track web information at the topic level.
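For readers less familiar with the two measures, the following is a minimal sketch of how precision and recall are conventionally computed for a single query. The function name and the sample URL labels are illustrative assumptions and are not part of our tools.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: URLs a search service returned (hypothetical input).
    relevant:  URLs a human judge marked as relevant (hypothetical input).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: fraction of what was returned that is relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: fraction of what is relevant that was returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    print(precision_recall(["a", "b", "c", "d"], ["a", "b", "e"]))
```

Recall can only be measured against a known set of relevant pages, which is one reason a validated topic-level collection would be useful.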
We believe that the most benefit can now be gained by tapping into the professionals who are systematizing what reference librarians do for their clients (build good lists of resources), e.g. packaging information on important research programs for program managers, proposal submitters, technology transfer agents, and investors. The best improvements in the long run will come from information science researchers working with domain specialists and end users.
One challenge might be to take a topic of significant interest to an important community and capture ALL there is on the web, organize that to suit defined users, monitor and evolve the collection, and use it as a testbed for research on analysis tools, user interfaces, information retrieval, and cognitive models.
Is this doable? We have assessed the quality and quantity of WWW information on the subject of "software safety", with the target community being that associated with a forthcoming standard from Underwriters Laboratories. Based on our FMEA experiments, we estimate some 20,000 URLs covering hundreds of research groups, products, standards organizations, and the job market. Would it be worth it to the Navy and the safety industry, for example, to have such a compilation of materials, further organized to match the professional criteria of engineers, researchers, and members of the public interested in computer risk? Perhaps, depending upon the costs of compiling, qualifying, and maintaining the collection. We suggest a combination of a domain-specific testbed (to attract users and provide coherence) with objectives similar to those of the NIST-directed TREC projects (to compare techniques, drive tool improvement, train analysts, and gauge progress).
This is the kind of alternative vision enabled by an understanding of the value and costs of WWW material together with prototype tools, techniques, and skills to head toward that vision.
Clustering and our multiple views provide new on-the-fly information packages that can be used to drive browsers, rather than the other way around. HTML renderers (the heart of the modern browser) should be components of tools that provide users with ways of managing URL collections and, when needed, viewing them.
Our brief foray into the research on hypertext, navigation metaphors, information retrieval, and empirical studies convinces us there are strong alternatives for a next generation of browser technology. While the market forces may prevail for the vast number of current browser users, another type of browser user community could evolve. This one would base its choices on productivity (work accomplished, tradeoffs), quality (qualifying results for specific uses and users, defining criteria for qualification), learning (getting more information, more flexibly), and analysis (identifying trends, finding gaps, getting the big picture).
Work toward the specific goal of fully utilizing the very large information space of the web can yield new techniques for processing other large information spaces, e.g. major policy issues or international operations. We claim our three views have significant generality: where did it come from? what is it linked to? what concepts does it address? The major view still lacking is time (is it present, past, or future?).
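As an illustration only, the three views can be sketched as three groupings of one URL collection. The sample pages, link lists, and concept keywords below are hypothetical, and the code is a minimal sketch under those assumptions rather than a description of the tools in Appendix C.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical collection: URL -> (outgoing links, concept keywords).
PAGES = {
    "http://www.example.edu/safety/intro.html": (
        ["http://www.example.org/standards/index.html"],
        ["software safety", "hazard analysis"],
    ),
    "http://www.example.org/standards/index.html": (
        [],
        ["standards", "software safety"],
    ),
    "http://www.example.edu/safety/tools.html": (
        [
            "http://www.example.edu/safety/intro.html",
            "http://www.example.org/standards/index.html",
        ],
        ["tools"],
    ),
}


def domain_view(pages):
    """Where did it come from? Group URLs by host name."""
    view = defaultdict(list)
    for url in pages:
        view[urlparse(url).netloc].append(url)
    return dict(view)


def link_view(pages):
    """What is it linked to? List, per page, the collection pages linking to it."""
    view = {url: [] for url in pages}
    for source, (links, _concepts) in pages.items():
        for target in links:
            if target in view:  # ignore links that leave the collection
                view[target].append(source)
    return view


def concept_view(pages):
    """What concepts does it address? Group URLs by concept keyword."""
    view = defaultdict(list)
    for url, (_links, concepts) in pages.items():
        for concept in concepts:
            view[concept].append(url)
    return dict(view)


if __name__ == "__main__":
    for name, view in [("domain", domain_view(PAGES)),
                       ("links", link_view(PAGES)),
                       ("concepts", concept_view(PAGES))]:
        print(name, view)
```

A time view would presumably add a fourth grouping keyed on a date attached to each page; no such field is assumed here.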
Behavioral studies of how web users work are just appearing, but the results seem to be directed toward identifying all the problems with ad hoc, fossilized mechanisms such as bookmarks and histories in the current dominant browsers. Defining and performing studies based on new approaches like our "browsing in context" might lead to genuinely new technologies, or to training in new ways of using existing tools. Again, the problem is that information-intensive web users are trapped by browsers that don't serve their needs, yet weigh heavily on their desktops. A good set of principles about how users address certain tasks, e.g. "find the authority in field X", could further the empirical understanding of both web information spaces and user behavior.
Specifically, we see several additions to our pragmatically driven "browsing in context" approach: