KWIC, KWOC and A-Z site indexes

The redesign team I work with has been experiencing the joy that only a librarian can derive from trying out different index displays. A hallway discussion led me into a senior programmer/manager's office for a fun history of printed KWIC indexes of Bell Labs serials, produced by programs that lived on Fortran punch cards back in their day. Only an information geek like me would find these conversations as fascinating as I do (insert snorting infogeek laughter here).

These past weeks we've been trying to improve how our site's A-Z index functions. Our publishing system (a database and programs that handle different publishing functions) has a field for synonyms or label variations, but we haven't filled in that field for every database record. As an example of how this works, we have a content record for "The American Heritage dictionary". It is listed in the A-Z index under A. To get it listed under D for dictionary as well, we could add the synonym "dictionary".
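To make the synonym mechanism concrete, here is a minimal sketch in Python. It is not our publishing system's code: the record layout (`label`, `synonyms`), the function names, and the rule that a leading stop word such as "The" is skipped when choosing the sort letter are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed stop list; a leading stop word is skipped when picking the sort letter.
STOP_WORDS = {"a", "an", "or", "the"}


def sort_letter(entry):
    """Return the letter an entry files under, ignoring leading stop words."""
    words = entry.split()
    while len(words) > 1 and words[0].lower() in STOP_WORDS:
        words = words[1:]
    return words[0][0].upper()


def build_index(records):
    """Map each sort letter to (entry text, record label) pairs.

    Every record is listed once under its label and once per synonym.
    """
    index = defaultdict(list)
    for record in records:
        for entry in [record["label"]] + record.get("synonyms", []):
            index[sort_letter(entry)].append((entry, record["label"]))
    for pairs in index.values():
        pairs.sort(key=lambda pair: pair[0].lower())
    return dict(index)


# Hypothetical record mirroring the example in the text.
records = [
    {"label": "The American Heritage dictionary", "synonyms": ["dictionary"]},
]
index = build_index(records)
for letter in sorted(index):
    print(letter, index[letter])
```

The record shows up under A through its label and under D only because someone typed in the synonym, which is the manual work discussed in the next section.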

KWOC

The synonym field is satisfactory for most purposes, but requires manual work. To take a programmatic approach we can simply rotate the terms in the label so that each term gets a turn in the front position -- functioning as the sort letter. So for instance, a problem label like "Lucent Archives" can be found under:

Lucent Archives
Archives, Lucent

Simple. This shifting of terms in the label should be familiar to people who've used printed indexes to do research. It shows each Keyword Out of Context (KWOC). This is nice, because in an electronic index, every term with the exception of stop words (a, or, an, the) can be used in the index to make the item findable. A similar method is used in the Modern Language Association index which uses something called faceted citation order syntax.
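The rotation described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the program we use: the function name `rotate_label` is made up, the stop list is just the four words mentioned above, and the "Keyword, remaining words" format for labels longer than two words is my own guess at a reasonable convention.

```python
# Stop words taken from the text; a real list would likely be longer.
STOP_WORDS = {"a", "an", "or", "the"}


def rotate_label(label):
    """Return one index entry per non-stop word, with that word moved to the front."""
    words = label.split()
    entries = []
    for i, word in enumerate(words):
        if word.lower() in STOP_WORDS:
            continue  # stop words never get a turn as the sort word
        if i == 0:
            entries.append(label)  # already in front, keep the label as written
        else:
            rest = " ".join(words[:i] + words[i + 1:])
            entries.append(f"{word}, {rest}")
    return entries


for entry in rotate_label("Lucent Archives"):
    print(entry)
```

Run on "Lucent Archives", this prints the same two entries shown above, with no synonym typed in by hand.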

KWIC

Another method exists in Keyword In Context (KWIC) displays. You have undoubtedly seen displays of search engine results that look like this:

KMNetwork: World's most reputed Knowledge Management resource and ...
The Knowledge Management Network includes world's most renowned content and community portals on Knowledge Management including the WWW Virtual Library on ...

This is a KWIC display. Keywords from your search are shown exactly as they appeared within the context of the sentences they are found in. This is common for search results displays. It's a useful method to help you determine if a document is relevant.

The programmer who helped me come up with a display for our A-Z site index is now suggesting that we try a KWIC display for the listing. This way, in our A-Z index, a page labeled "Lucent Archives" will appear under both the L sort and the A sort without requiring us to create the term variation "Archives, Lucent". I've found his demo below extremely helpful for illustrating how this works. Once people get used to seeing the listings this way, I think they are likely to find this display very useful. It increases the number of ways that labels will be listed. So, using our "Lucent Archives" example, here is how a KWIC display shows you that listing under the A sort:

(sample KWIC display of "A" listing with "Lucent Archives" emphasized)

                                               NOTICES archive
                                                       Archived Networking Quarterlies
    Competitive intelligence newsgroups and email list archives
                                                Lucent archives
                    LA Times: Los Angeles Metropolitan Area Stories
                 New York Times: NY/NJ/CT Metropolitan Area Stories

I think this does a nice job and, based on a few demos of KWIC and KWOC displays, I am advocating the conversion of our A-Z index to the KWIC version you see above. Compare the KWIC display with a rotated term (KWOC) display, in which "Lucent archives" would be listed as "archives, Lucent".

I think it takes more cognitive work to process the labels with rotated terms. The KWIC display may not be visually as easy to read because of the jagged left and right sides, but the coloring of the sort word gives your eyes a line to follow downward and you see the label in its correct order.
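For the curious, here is a rough Python sketch of how a KWIC listing for one letter could be generated. It is not the programmer's demo: the function names are invented, the stop list is only the four words mentioned earlier, and plain text cannot color the sort word, so the sketch relies on column alignment alone.

```python
# Stop words taken from the text; a real list would likely be longer.
STOP_WORDS = {"a", "an", "or", "the"}


def kwic_entries(labels, letter):
    """Return (left context, keyword, right context) for each word starting with letter."""
    entries = []
    for label in labels:
        words = label.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            if word[0].upper() == letter.upper():
                left = " ".join(words[:i])
                right = " ".join(words[i + 1:])
                entries.append((left, word, right))
    # Sort on the keyword only, so each label keeps its original word order.
    entries.sort(key=lambda entry: entry[1].lower())
    return entries


def format_kwic(entries):
    """Right-align the left context so every keyword starts in the same column."""
    width = max((len(left) for left, _, _ in entries), default=0)
    return [
        f"{left:>{width}}  {keyword} {right}".rstrip()
        for left, keyword, right in entries
    ]


labels = ["Lucent Archives", "NOTICES archive", "Archived Networking Quarterlies"]
for line in format_kwic(kwic_entries(labels, "A")):
    print(line)
```

The sort word lines up in one column, which is what gives the eye a line to follow, while the left and right edges stay jagged.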

For another illustration of these methods, see this library school project at the University of British Columbia, which shows KWIC and rotated KWOC displays for a hypothetical cheese thesaurus. This is making me wonder why wine and cheese are such popular subjects for information organization.