We use the formulation of the "design pattern" or "pattern language" R&D community, now also being applied to Internet situations (CHI 97 workshop), to identify our assumptions and rationale for the "browsing in context" model. The pattern approach is quite liberal, permitting various characterizations of what is essentially problem-solution-consequences. We adopt this approach for several reasons:
Setting: Web users access a wide variety of materials from multiple sites across millions of URLs, using search engines, crafted guides, and personal/professional knowledge. For many technical topics, thousands of URLs are candidates of interest for typical web tasks.
Problem 1a: An unordered list of URLs is difficult to browse.
Problem 1b: It is difficult to understand the patterns and trends of a subject from an unordered list of URLs.
Solution: Find a meaningful ordering or classification scheme over the URL space and use it to sort the URLs into classes (URLs with the same assignment) or into an order. Classes and ordering proximity should indicate some commonality or distinctive difference among grouped or ordered URLs.
Consequences 1a: Browsing the list in the given order gives the browser (person) some sense that adjacent URLs are likely either (a) to have something in common or (b) to be distinguished from each other in some known way.
Consequences 1b: Orderings provide ways of seeing extremes, distributions, centers, and gaps. Comparing orderings provides ways of distinguishing collections and identifying time and other trends.
Implementation 1.1: Order URLs by the number of links to each within a given pool of links. Extract the pool of links from a set of web pages.
Example 1.1.a. The pool may be generated from pages produced by web search engines. Then the number of links to a URL indicates, literally, whether multiple search engines have indexed the page under exactly the same URL.
Forces 1.1.a.1 Webmasters, web page authors, and independent vendors often prime the search engines by explicitly submitting their URLs and by defining HTML Meta information or prefaces with frequent or chosen terms in order to place their web pages highest in a given search engine's relevance scheme. Promotional efforts may influence a URL's position in the ordering.
Forces 1.1.a.2 Search engines may index or revisit only part of a site and miss pages of relevance. Global web processes may influence a URL's frequency of linking in the ordering.
Forces 1.1.a.3 URLs that have existed longer are likelier to have references within the search engines or crafted lists.
Implication 1.1. A context is established for each URL within the ordering, depending upon interpretation as popularity, promotional effectiveness, prominence of organization, longevity of web presence, etc. Specific interpretations may be: a "1-link" URL may indicate a very specific, potentially useful, piece of information; a high-link URL may be a clearinghouse or a highly influential document.
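Implementation 1.1 can be sketched in a few lines. The following is a minimal illustration, not the tool used by the authors: it assumes the pool is a list of HTML page texts (for example, saved search-engine result pages), and it counts a URL once per page that links to it, so that the count reflects how many pages in the pool reference the URL. The names LinkCollector and rank_by_link_count are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def rank_by_link_count(pages):
    """Return (url, count) pairs, most-linked first, ties in alphabetical order.

    Assumption: a URL is counted once per page, however often that page
    repeats the link.
    """
    counts = Counter()
    for html in pages:
        collector = LinkCollector()
        collector.feed(html)
        counts.update(set(collector.links))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

URLs are compared as plain text here, so two spellings of the same address are counted separately.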
Implementation 1.2: Order URLs by Internet names.
Example 1.2.a Order by domain (e.g. .edu, .uk), site (e.g. stanford.edu, ca.ac.uk), and server (...), ordering alphabetically within each level.
Forces: InterNIC domain name assignment follows a rationale based on the type of organization requesting a name, where the name usually symbolizes some aspect of the organization or is the organization's name itself. Additional domain names may in the future be more specific about the type of business or purpose of the web pages.
Implication 1.2. Browsing or evaluating URLs by domain context can utilize the domain naming rationale, e.g. .edu as an academic institution or association, .com as a company or major network, .net as an Internet provider, ...
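One way to realize the ordering of Implementation 1.2 is to sort on the host name read from right to left, so that the top-level domain is compared first, then the site, then the server. This is a sketch under that assumption; domain_sort_key and order_by_internet_name are hypothetical names, and the path is used only as a final tie-breaker.

```python
from urllib.parse import urlsplit


def domain_sort_key(url):
    """Key that compares top-level domain first, then site, then server."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    labels = host.split(".")
    labels.reverse()  # "www.stanford.edu" -> ["edu", "stanford", "www"]
    return (labels, parts.path)


def order_by_internet_name(urls):
    """Return the URLs ordered alphabetically within each naming level."""
    return sorted(urls, key=domain_sort_key)
```

With this key, all .edu URLs are adjacent, and within them all stanford.edu URLs are adjacent, which is what lets a reader apply the domain naming rationale while scanning the list.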
Implementation 1.3: Extract phrases that, in a natural language sense of context, correspond to objects, persons, places, events, or concepts.
Order all phrases alphabetically and separate apparent person names from other phrases. For each phrase show each page containing it and a few surrounding words.
Forces: Concept analysis is a difficult text analysis process, requiring some grammatical and dictionary mechanisms. Approximate techniques may assist or may be confusing depending on their precision and completeness.
Implication 1.3: Phrase listings may help the browser/evaluator to understand the scope and nature of the domain. Phrase usage may indicate whether a page is within the topic or not, e.g. "no-fault" is not a part of the "fault tolerance" subject. Phrase profiles may help characterize a document during a scanning phase. Identical phrase lists may indicate identical pages. Modifications in phrase lists on different versions of a page may indicate significant changes.
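The phrase listing of Implementation 1.3 can be approximated crudely, in the spirit of the "approximate techniques" mentioned under Forces. The sketch below assumes that a run of two or more capitalized words is a candidate phrase, which is a deliberately naive stand-in for real concept analysis; it does not attempt to separate person names from other phrases. The name phrase_index and the width parameter are hypothetical.

```python
import re
from collections import defaultdict

# Naive assumption: two or more consecutive capitalized words form a phrase.
PHRASE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def phrase_index(pages, width=20):
    """Map each candidate phrase to a list of (page_url, context) pairs.

    `pages` maps a URL to the plain text of that page. `width` is the number
    of characters of surrounding text kept on each side of the phrase.
    The result is ordered alphabetically by phrase.
    """
    index = defaultdict(list)
    for url, text in pages.items():
        for match in PHRASE.finditer(text):
            phrase = " ".join(match.group().split())
            start = max(0, match.start() - width)
            end = min(len(text), match.end() + width)
            context = " ".join(text[start:end].split())
            index[phrase].append((url, context))
    return dict(sorted(index.items()))
```

A heuristic this simple will both miss lowercase concepts and pick up sentence-initial words, which illustrates why imprecise extraction may confuse as much as it assists.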
Definition: A bookmark file is a collection of URLs selected for frequent access directly from the user's browser.
Context: WWW users frequently visit the same URLs over and over and need an online list for rapid lookup and direct links from the list to the WWW.
Forces/Constraints: Heavy WWW users need to manage large collections.
Problems: (from Tauscher et al)
Setting: Web users build up long lists of bookmarks or collect files containing long lists of URLs. They often want to determine whether a URL is referenced more than once (it may be more important than others, or it may link separate fields).
Problem: Given two (or more) lists of URLs, identify the ones on both lists.
Solution 1: Go down the first list and check off each URL that is on the second list. Delete the ones not on both lists or recopy the common ones to a separate list.
Solution 2: Order the lists and merge them attaching a count where one URL merges into another. Extract from this list the ones with counts greater than one.
Consequences: Neither of these solutions is practical for manual use where the lists have more than a dozen entries.
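Solution 2 is easy to automate even though it is impractical by hand. The following sketch sorts each list, merges the sorted lists, and attaches a count wherever equal URLs meet in the merge. It assumes URLs are compared as plain text and that a URL repeated within a single list counts only once for that list; merge_with_counts and on_multiple_lists are hypothetical names.

```python
import heapq
from itertools import groupby


def merge_with_counts(*url_lists):
    """Sort each list, merge them, and count the lists containing each URL."""
    sorted_lists = [sorted(set(urls)) for urls in url_lists]
    merged = heapq.merge(*sorted_lists)
    return [(url, sum(1 for _ in group)) for url, group in groupby(merged)]


def on_multiple_lists(*url_lists):
    """Extract the URLs whose merge count is greater than one."""
    return [url for url, count in merge_with_counts(*url_lists) if count > 1]
```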
Implementation 1: Classify URLs into buckets numbered 1, 2, ... according to how many times each has been seen. For a given URL, find its current bucket, say n, and move it to bucket n+1; if it is not yet in any bucket, add it to bucket 1.
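The bucket scheme of Implementation 1 might look like the following sketch. It assumes the input is one combined stream of URLs from all lists and that URLs are compared as plain text; bucket_by_occurrence is a hypothetical name.

```python
def bucket_by_occurrence(urls):
    """Place each URL in the bucket numbered by how often it has been seen."""
    bucket_of = {}  # url -> number of the bucket it currently sits in
    buckets = {}    # bucket number -> set of URLs in that bucket
    for url in urls:
        n = bucket_of.get(url, 0)  # 0 means "not in any bucket yet"
        if n:
            buckets[n].discard(url)
        bucket_of[url] = n + 1
        buckets.setdefault(n + 1, set()).add(url)
    # Drop buckets that were emptied and return members in sorted order.
    return {n: sorted(members) for n, members in sorted(buckets.items()) if members}
```

Every URL in bucket 2 or higher is multiply referenced, so the buckets answer the original problem directly.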
Implementation 2: Classify URLs by URL paths, attaching an identity for the list to the URL leaves.
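Implementation 2 can be read as building a tree keyed by host and path segments, with each leaf recording which lists contain that URL. The sketch below follows that reading, which is an interpretation rather than a description of the authors' tool; the node layout and the names new_node, build_path_tree, and shared_urls are hypothetical.

```python
from urllib.parse import urlsplit


def new_node():
    """A tree node: child segments plus the names of lists ending here."""
    return {"children": {}, "lists": set()}


def build_path_tree(named_lists):
    """Build a tree over host and path segments.

    `named_lists` maps a list name to the URLs on that list.
    """
    root = new_node()
    for list_name, urls in named_lists.items():
        for url in urls:
            parts = urlsplit(url)
            segments = [parts.netloc.lower()]
            segments += [s for s in parts.path.split("/") if s]
            node = root
            for segment in segments:
                node = node["children"].setdefault(segment, new_node())
            node["lists"].add(list_name)
    return root


def shared_urls(node, prefix=""):
    """Yield (path, list_names) for URLs recorded on more than one list."""
    for segment in sorted(node["children"]):
        child = node["children"][segment]
        path = f"{prefix}/{segment}" if prefix else segment
        if len(child["lists"]) > 1:
            yield path, sorted(child["lists"])
        yield from shared_urls(child, path)
```

A side benefit of the tree form is that URLs from the same site or directory sit under a common node, so near-duplicates are at least visible next to each other.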
Discussion: When are two URLs the same? Textual identity is one measure, but URLs may be written with or without port numbers, may omit default files for directories, and may include anchors within a page. Furthermore, web servers often redirect from one server to another, so a page may move and there is no canonical address for it. Also, many international mirrors exist: is it significant to count both the page in Australia and the page in the U.S.?
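Some of the textual variations listed in the discussion can be removed before comparison. The sketch below lowercases the scheme and host, drops a default port, drops an anchor, and drops a trailing default file. The default file names (index.html, index.htm) are an assumption, since each server chooses its own, and normalize_url is a hypothetical name. Redirects and mirrors cannot be detected from the text of a URL and are not addressed.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}
DEFAULT_FILES = {"index.html", "index.htm"}  # assumption; servers may differ


def normalize_url(url):
    """Return a comparison form of `url` with common textual variants removed."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is not the default for the scheme.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last in DEFAULT_FILES:
        path = head + "/"
    # The empty last component drops any anchor within the page.
    return urlunsplit((scheme, host, path, parts.query, ""))
```

Applying this function to every URL before counting or merging makes textual identity a somewhat better measure of sameness, at the risk of conflating pages on servers that use a different default file.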
This is only a start at defining some of our experience using the "browsing in context" methodology. We find the patterns approach conducive to clearer statements of the purpose, process, and pitfalls of the methodology, but we are not sure of its value for others learning about it.