"Toward a Context-Driven Model of Web Navigation

Summary of Results

This research addresses problems of current WWW users by experimenting with alternative techniques for accessing and analyzing web information. The objective is to increase the productivity of web users and to improve their ability to find, qualify, and propagate high quality materials. The key idea of "Browsing in Context" is: user-managed alternative views of collections of web materials provide higher-level insights on trends and patterns within the collections and improved direct interaction with abridged and full materials.

"Browsing in Context" (BiC) offers a different approach to accessing and using WWW information content. Current browsers, commercial web utilities, and desktop computing systems do not adequately support several needs of WWW Information Professionals (librarians, journalists, market and policy analysts, research program trackers, etc.). Specifically lacking are information management and analysis tools for Topic Search Management, i.e. collecting and organizing large collections of URLs on specific subjects.

  • Our specific results are as follows:
    1. Refined definitions of 3 views of URL collections that support the "browsing in context" model, where the user establishes known regions of the URL collection according to various perspectives (links, domains, and concepts) to browse and evaluate the materials. Appendix A contains a position paper for the CHI 97 Workshop on "Augmented Conceptual Analysis of the Web".
    2. Empirical characterization of web materials on a variety of subjects. This helps establish a baseline of requirements for tools and expectations for users and relates our approach to information foraging.
    3. A scenario, cast as a survey, that we believe captures the essence of requirements for web information professionals. Appendix B contains the survey on the subject of "access control and security policies for digital library data" and instructions for using HTML summary reports (available upon request).
    4. A survey of research literature that informs the BiC model and offers additional approaches to the same general problem of improving web user productivity.
    5. A preliminary casting of the above results in the form of "design patterns", a recent approach to organizing common computing and business procedures at an abstract level, stemming from the object-oriented modelling techniques.
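    As an illustration of the link perspective named in result 1, the following Python sketch extracts the hyperlinks from each page in a collection and keeps only those that point to other members of the same collection. It is a minimal sketch under stated assumptions, not a description of how LodeStar works: the names LinkExtractor and link_view, the in-memory pages dictionary, and the example URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute targets of <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def link_view(pages):
    """Map each collection URL to the other collection URLs it links to.

    `pages` is an assumed in-memory mapping of URL -> HTML text.
    """
    members = set(pages)
    view = {}
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        # Keep only links that stay inside the collection, excluding self-links.
        view[url] = sorted((set(parser.links) & members) - {url})
    return view


# Hypothetical two-page collection.
pages = {
    "http://www.example.org/index.html":
        '<a href="policy.html">Policy</a> <a href="http://other.example.com/">Elsewhere</a>',
    "http://www.example.org/policy.html":
        '<a href="index.html">Home</a>',
}
links = link_view(pages)
```

    Restricting the view to links between collection members is one possible design choice; it shows how the collected materials refer to one another, while links that leave the collection could be reported separately as candidates for further collection.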

    The underlying technology used for the experimentation is LodeStar, described in Appendix C with an accompanying screenshot demonstration on the website. Appendix D contains an example of "information foraging" output from the LodeStar toolset. The reference list consists of LodeStar-produced index files of URLs by Internet Domain and URLs by Report File.

    Further work could include: cost-benefit and formal architectural models for alternative modes of browsing in context; establishing a baseline and a community collection mechanism for a specific topic area; and augmented analyses by document comparison and other information retrieval techniques.

    Where Current Tools Fail the Web Information Professional

    The WWW presents two sets of problems. The first is sheer size and growth, covering both the quantity of materials and their frequent appearance and disappearance. The second is the lack of metadata and organization that would permit classification by type (technical, commercial, promotional, personal, etc.).

    Today's technology -- browsers, search engines, bookmark managers, and word processors -- fails at the point where web users need:

    Search engines (Lycos, Altavista, etc.) emerged to cope with size issues and have begun to tackle the classification issue, using both human classifiers and evaluators (Magellan, Yahoo) and algorithmic clustering (Altavista, NLSearch). The problems of using search engines are well documented (e.g. in columns and tutorials). Crafted URL lists provided by subject experts or aficionados offer an alternative by winnowing down larger topics.

    Browsers that began as information viewers have evolved little in that direction, remaining weak in their history, bookmark management, and organizational mechanisms. They deliver HTML files to the desktop and render the HTML, leaving the organization of materials up to the web user. A marketplace of web utilities has emerged to assist the web user in using search engines and in storing and processing materials. Metasearchers such as EchoSearch and WebFerret query multiple search engines, merge results, and provide selectable lists and automatic downloading.
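    The merging step that a metasearcher performs can be sketched in Python as follows. This is an assumed, simplified scheme and not the documented behavior of EchoSearch or WebFerret: the function name merge_results, the ordering rule (URLs returned by more engines first, then by best rank), and the engine names and URLs are all hypothetical. No network access is involved; the per-engine result lists are given as input.

```python
def merge_results(results_by_engine):
    """Merge per-engine ranked URL lists into one de-duplicated list.

    Assumed ordering: URLs returned by more engines come first; ties are
    broken by the best (lowest) rank in any engine, then alphabetically.
    """
    merged = {}  # url -> {"engines": set of engine names, "best_rank": int}
    for engine, urls in results_by_engine.items():
        for rank, url in enumerate(urls):
            entry = merged.setdefault(url, {"engines": set(), "best_rank": rank})
            entry["engines"].add(engine)
            entry["best_rank"] = min(entry["best_rank"], rank)
    return sorted(
        merged,
        key=lambda u: (-len(merged[u]["engines"]), merged[u]["best_rank"], u),
    )


# Hypothetical result lists from two engines.
results = {
    "engine_a": ["http://a.example.org/x", "http://b.example.org/y"],
    "engine_b": ["http://b.example.org/y", "http://c.example.org/z"],
}
merged_urls = merge_results(results)
```

    In this example the URL returned by both engines is listed first, followed by the single-engine URLs in order of their rank.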

    Thus it is possible to amass, in minutes, a considerable number of web pages on a selected topic using automated downloaders that draw on search engines, website crawlers, and selected collections. This generates new problems in filtering, organizing, and propagating collection results. Given several hundred URLs on a particular subject, what techniques and tools help the WWW Information Professional (IP) understand and process those URLs toward their goals of the moment?

    Currently, there are few analytic tools available for lists of URLs and only rudimentary editing tools. Altavista and Northern Light both provide elementary clustering of URLs by type of site, discipline (e.g. biology), person, and place. GUI mechanisms such as the Windows 95 ListView offer multiple columns so that URLs may be listed, sorted by date, URL, title, etc., and deleted.
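    The column-oriented listing described above amounts to sorting and pruning a table of URL records. A minimal Python sketch follows; the UrlRecord fields, the sort_by helper, and the sample data are assumptions for illustration and do not correspond to any particular tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class UrlRecord:
    """One row of a hypothetical URL list: address, page title, retrieval date."""
    url: str
    title: str
    retrieved: date


def sort_by(records, column, descending=False):
    """Return the records sorted on one column, ignoring case for text columns."""
    def key(record):
        value = getattr(record, column)
        return value.lower() if isinstance(value, str) else value
    return sorted(records, key=key, reverse=descending)


# Hypothetical sample rows.
records = [
    UrlRecord("http://www.example.org/b.html", "Security Policies", date(1997, 3, 2)),
    UrlRecord("http://www.example.org/a.html", "access control", date(1997, 1, 15)),
    UrlRecord("http://library.example.edu/", "Digital Library Index", date(1997, 2, 20)),
]

by_title = sort_by(records, "title")
newest_first = sort_by(records, "retrieved", descending=True)

# Deleting rows is a filter over the same list.
to_delete = {"http://library.example.edu/"}
remaining = [r for r in records if r.url not in to_delete]
```

    Sorting and deletion of this kind organize a list but reveal little about trends or patterns in the collection, which is the gap the text goes on to address.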

    To address this gap, the experiments performed under this contract use the LodeStar tools described in Appendix C as their underlying toolset. These tools offer the net access, HTML rendering, bookkeeping, visualization, and rudimentary analysis needed to support browsing in context from three perspectives: Internet Domain, Links, and Concepts. However, these tools have not yet been user-tested.
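    To make the Internet Domain perspective concrete, the following Python sketch groups a URL collection by host name. It is an illustration under stated assumptions, not LodeStar's method: the function name group_by_domain and the sample URLs are hypothetical, and grouping on the full host name (rather than, say, the top-level domain) is one choice among several.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_domain(urls):
    """Map each host name to the list of collection URLs served from it."""
    groups = defaultdict(list)
    for url in urls:
        # urlsplit(...).hostname is lower-cased, or None if the URL has no host.
        host = urlsplit(url).hostname or "(no host)"
        groups[host].append(url)
    return dict(groups)


# Hypothetical collection on a single topic.
collection = [
    "http://www.example.org/policy/access.html",
    "http://WWW.EXAMPLE.ORG/policy/security.html",
    "http://library.example.edu/digital/index.html",
]
domain_view = group_by_domain(collection)

# List hosts with the most collected URLs first.
for host, members in sorted(domain_view.items(), key=lambda kv: -len(kv[1])):
    print(f"{host}: {len(members)} URL(s)")
```

    A grouping of this kind gives the user a known region of the collection to browse, for example all materials from one institution's site.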

    Compounding the bookkeeping and other information management problems associated with the scale of the Web is the lack of metadata to assist classification and organized access to the Web. We do not directly address this issue but expect to see slow improvement through the work of Digital Library Metadata groups (see summaries in the Digital Library Newsletter). A significant realization that appears clearly in our empirical studies is that the Web has some aspects of a traditional technical library but performs especially effectively at delivering the "gray literature" of product descriptions, organizational newsletters, and other traditionally unmanageable materials. Any web classification improvements would provide more contexts for our model, e.g. product descriptions and reviews.

    Our claim, supported by the tool prototypes and partially validated by these experiments, is that the browser component (in the narrow sense of the HTML renderer) should be subordinate to a user-directed information management system, in contrast to the current market game in which browsers dominate the desktop and become operating systems. This inverted mode of thinking gives rise to new strategies for browsing, such as those described here.