Cluehunting: A Proposal Regarding The Intelligent Use of Available Data In The User Interface
Where Cluehunting Comes From
(Editor’s Note: I have received an impressive and incredible amount
of email regarding cluehunting, and I thank everybody who mailed me.
Much of the text here needs to be rewritten to accommodate the lucid
and honestly surprising quantity and quality of research people put
into advancing this proposal. Some stuff regarding the true history
of cluehunting does need to be modified. Bear with me.)
Cluehunting is an advanced Expansion Agent, defined as a system that
allows the computer to search possible “expansions” throughout given
contexts given a “clue” by the user. Clues are defined as segments of
data (type irrelevant) that the computer would be able to utilize to
predict the final contents of the user’s intention. Expansions are
the presumed intentions of the user. Finally, contexts are the
“search space” that is being scanned–the file system context, the
launcher context, or even a thesaurus/spell check context are all
valid options.
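
The three terms can be pinned down with a small sketch. The names below (`Context`, `WordListContext`, `expand`) are illustrative assumptions, not part of the proposal; the only thing taken from the text is that a context is a search space that turns a clue into candidate expansions.

```python
from __future__ import annotations

from typing import Protocol


class Context(Protocol):
    """A search space that can be scanned for expansions of a clue."""

    name: str

    def expand(self, clue: str) -> list[str]:
        """Return candidate expansions for the clue, best guess first."""
        ...


class WordListContext:
    """Hypothetical context backed by a fixed list of entries."""

    def __init__(self, name: str, entries: list[str]) -> None:
        self.name = name
        self.entries = entries

    def expand(self, clue: str) -> list[str]:
        # Assumed default behaviour: treat the clue as a prefix.
        return [e for e in self.entries if e.startswith(clue)]


launcher = WordListContext("launcher", ["netscape", "nethack", "xterm"])
print(launcher.expand("net"))  # ['netscape', 'nethack']
```

A file system, launcher, or thesaurus context would differ only in where its entries come from.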
It would be completely unfair to describe cluehunting as a totally
original concept–it stands, if you will, on the shoulders of giants.
Tab-completion is the oldie, and as far as I know originated with the
Unix shell tcsh, though it’s also a hidden option in the NT command
shell. This technology is quite file-system specific: Enter as much
as you know about a path, starting with the root, and tab complete
will expand what you type to fit. For example–enter /usr/home/eff
and hit tab, and you will be given the first entry in /usr/home/ that
begins with “eff”. Some limited regular expressions are allowed–for
example, if I’m in the directory /usr/home/effugas and type unzip
*.zip, I will be able to tab through each zip file in my home
directory. Very slick.
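
As a rough sketch of the behavior just described, assuming a plain list of names stands in for a directory listing (the function names `tab_candidates` and `press_tab` are made up for illustration, and real shells differ in the details):

```python
from __future__ import annotations

import fnmatch


def tab_candidates(typed: str, entries: list[str]) -> list[str]:
    """Entries a shell could offer for what has been typed so far."""
    if any(ch in typed for ch in "*?["):
        # Wildcard typed: every entry matching the pattern is a candidate.
        return sorted(e for e in entries if fnmatch.fnmatchcase(e, typed))
    # Plain text: complete on prefix.
    return sorted(e for e in entries if e.startswith(typed))


def press_tab(typed: str, entries: list[str], presses: int = 1) -> str:
    """Entry selected after `presses` presses of Tab, or the typed text if nothing fits."""
    found = tab_candidates(typed, entries)
    if not found:
        return typed
    return found[(presses - 1) % len(found)]


home = ["bin", "effugas", "guest"]
print(press_tab("eff", home))        # effugas

zips = ["a.zip", "b.zip", "notes.txt"]
print(press_tab("*.zip", zips, 2))   # b.zip
```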
Tab Completion is nice, but it has its flaws. First of all, tab has
become the de facto standard for “advance to next field” in GUIs, and
there’s no way I want to get rid of one of the best keyboard
timesavers in existence. Secondly, it searches files and only files.
There are other search contexts that should be hit. Finally,
tab-complete provides no way to expand into anything but a single
entry–what if the user didn’t want just one of the group, but wanted
to expand into all entries that fit the given form? In other words,
what if, instead of just one zip, all of *.zip were inserted? That
would be logical in a number of situations.
Tab Complete’s newborn sibling, Autocomplete, was a web browser
innovation that began at the much maligned UI shop known as Microsoft
and was later adopted by Netscape for its Communicator browser. (To
be as fair as possible, the emacs editor includes substantial
autocomplete facilities. I am referring here to the fact that this
was the first implementation of autocomplete for ordinary users, and
as far as I know was the first implementation among the thousands of
Windows apps over the last few years.) As Microsoft integrated
Internet Explorer and Windows Explorer, both the Run Dialog and the
Web Open Dialog possess Autocomplete functionality. (Actually,
Microsoft Word will also Autocomplete anything you type that is
related to a few known categories, i.e. date, author name, etc., but
I’ll deal with this later.) So what does this bring to the table?
Well, we see the beginnings of clue contexts showing up here, since at
first glance it appears that the run menu will autocomplete files and
the web browser will autocomplete web sites. But these are both
searches of the same clue context–the history context, in which
things that have been typed before are called back to be expanded back
into reality. And how does Autocomplete expand entries? In the
middle of typing, inverted text will appear containing the contents of
what the computer is guessing the user is trying to get at. This text
will only appear to the next valid level–http:// will expand to
http://www.best.com, but it will not expand to
http://www.best.com/~effugas nor
http://www.best.com/~effugas/Personal/SILC/silc.html. There’s no way
to really scroll through possible entries in this history-based
autocomplete–the first thing that matches will be matched to its
first level, and that’s all you get. (Ed Note: Holding shift and arrow
down lets you scroll through possible autocompletes on Netscape.)
Worse, sometimes a delay in typing is required to simply trigger an
autocomplete. Still, this functionality is total joy, even with all
of its warts.
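
The “next valid level” behavior can be sketched as follows. This is my reading of what the browsers appear to do, not their actual code; `next_level` is a hypothetical name, and a “level” is assumed to end at the next slash after the scheme.

```python
from __future__ import annotations

from typing import Optional


def next_level(typed: str, history: list[str]) -> Optional[str]:
    """Expand `typed` to the next path level of the first matching history entry."""
    for entry in history:
        if entry.startswith(typed) and entry != typed:
            # Skip past "://" so that the scheme alone never counts as a level.
            scheme_end = entry.find("://")
            after_scheme = scheme_end + 3 if scheme_end != -1 else 0
            start = max(len(typed) + 1, after_scheme)
            cut = entry.find("/", start)
            return entry if cut == -1 else entry[:cut]
    return None


history = ["http://www.best.com/~effugas/Personal/SILC/silc.html"]
print(next_level("http://", history))              # http://www.best.com
print(next_level("http://www.best.com", history))  # http://www.best.com/~effugas
```

Only the first matching entry is ever consulted, which is exactly the limitation complained about above.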
What’s New In Cluehunting
Cluehunting specifies the following advancements beyond present-day
expansion technology:
- Universal Expansions
- Inputstream Aware Expansion Styles
- Application-Dependent Clue Contexts
- Clue Context Overrides
- Pluggable Context Servers
- Regular Expressions
- Batch Expansions
- Cluelists
- General Accessibility
Definitions help, of course:
- Universal Expansions: Expansion should be available in all
interface components. The primary limitation of present expansion
methods is that they can’t really be available everywhere. Cluehunting
is designed to allow every interface construct to read the
intentions of the user. It is the purpose of the next eight points
to make sure that this works, and works well.
- Inputstream Aware Expansion Styles: Segmented streams of input
data ought to implement commanded expansion, while unified
inputstreams may take advantage of automatic expansion. A little
background is going to be necessary to understand this. First,
you can’t outclass something you can’t recognize the class of.
That being said, let’s talk about Microsoft’s UI department. Take
Microsoft Word 95/97. Red and green spelling and grammar warning
underlines are excellent interface components. They’re
unobtrusive enough to ignore in the heat of thought, yet available
enough to make it difficult to miss misspelled or inappropriate
words. I miss them any time I type in anything else. They
enhance the feedback loop of the inputstream. The inputstream is
defined as the flow of commands from the user to the computer as
well as any information fed back along the same channels as the
input–for example, a clock in the lower right hand corner is not
part of the inputstream, but the characters that pop up in
response to the corresponding keys being pressed on the
keyboard are. What does not work in Word, however, is
Autocomplete. When I type Dan, I’m not always talking about
myself, and when I type August, I’m not always talking about the
present date. I don’t want to have to interrupt my stream of
thought to correct Word–my concepts are segmented into words from
sentences, paragraphs, and full documents. This contrasts sharply
with the very appropriate and useful usage of autocomplete for web
sites, which have addresses that are single-phrase and thus
unified. Therefore, while Word and any other segmented
inputstream receiver ought to require a key to be pressed before
the phrase is expanded (though a graphical hint like a different
cursor would help), Netscape should attempt to expand
automatically. NOTE: Research is required to make sure this
inconsistency does not overly confuse users. It is very possible
that automatically triggering an expansion in unified instances
but delaying expansions in segmented cases is utterly confusing to
users. In this case, I’d lean towards a completely delayed
expansion interface.
- Application-Dependent Clue Contexts: Applications should search
multiple clue contexts appropriate to the active application
context. Strange words coming from someone who worships
consistency in user interfaces, but I really think this is
necessary. Applications generate context, and all clues should
not expand from some single chosen source. For example: Suppose
I enter the word “liffe” into a word processor. The ideal word
processor would notify the user immediately and non-intrusively
that the word was misspelled. Obviously, the appropriate clue
context for a misspelled word is to search through alternative
correct spellings. Multiple presses of the Continue Cluehunt
keybinding would search through multiple alternative spellings,
until the user chose either to press the Cancel Cluehunt
keybinding (probably Escape) to revert to the misspelled form or to
press the Cluehunt Successful keybinding (probably Enter). The
user could, of course, reselect the correctly spelled word, and
this time search through the default context for a correctly
spelled word: the thesaurus. So, life would be replaced with
various synonyms–or, the thesaurus dialog could come up to
provide a multidimensional search between life-as-vocation,
life-as-socialness, or life-as-complete-lack-thereof. All that
cluehunting specifies is a precondition and a
postcondition–dialogs do not violate this. It would be
preferable if these weren’t modal dialogs, however–it is rarely
appropriate for the user to be locked out of his or her document.
- Pluggable Context Indexes: Clue contexts, either attached to an
application or independent, should register themselves with a
central index. This index of clue contexts would be categorized
either by type or by owner application, would have MRU (most
recently used) lists, and would be reconfigurable by the user.
- Clue Context Overrides: The user should be able to specify a
specific clue context to expand from, in either a proactive manner
or a reactive manner. Despite the fact that applications often
have contexts that make sense, there are times when the user has
another context in mind. For example, the user should be able to
access the Thesaurus context while saving a file, or the
filesystem context while documenting an application, or the web
history context while creating a web page of links. This would be
implemented with a Set Clue Context keybinding which would modify
the present word’s clue context–a reactive override. If the user
had not yet typed a word, the next word would be the recipient of
the entered context–this would be a proactive override. Contexts
would be registered upon install as per the plug-in clue context
interface, and manipulable via replaceable dialogs. Most
probably, some degree of categorization would be appropriate, as
well as expansion on the clue context type itself. (In other
words, a box would be given, and you’d type in Th and Thesaurus
might come up). Of course, common clue contexts should be
automatically recognized. A user typing in a path in any
application, for example, should usually first trigger the file
system history context, and then the literal file system search
context. Similar results should await a user typing http://.
However, there is an advantage to being able to select a context.
By selecting the Execute Command context, the user could load any
app directly from within any other app and have the stdout reply
be pasted at the cursor. Much like ircii’s /exec command, this
would allow the contents of, say, an ls to be directly pasted at
the cursor. Quite nice.
- Regular Expressions: Regular Expressions should be available for
usage in clue expansions. Many users are familiar with using * to
signify a wildcard. While the default expansion would, in
general, presume a * at the end of the provided clue and expand
from there, there is no reason this is necessary. A user
searching for dictionary words that end with “sort” should be able
to expand *sort into resort, consort, and plain old sort. The
only problem: how to differentiate between a clue containing a
regex for search purposes (execute context for ls -l *.gz) and a
clue that wants its regex expanded before search (command history
context for ls -l *.gz). It’s quite probable that most contexts
will only fit one or the other, but I’m unsure. Email me if you
think that a specific “begin regex” keybinding would be necessary.
- Batch Expansions: All entries that fit the provided clue should
be available for simultaneous expansion. Through an “expand all”
keybinding, all entries that fit the given clue in the current context
should be pasted at the cursor. This facilitates things such as
“gunzip *.gz” being expanded into a list of all files to be
gunzipped, allowing the user to make sure the shell was expanding
the list correctly, among other uses.
- Cluelists: All entries that fit the provided clue should be
listable in a multiselectable sortable dialog. In some ways, a
basic version of this is part of Microsoft Word 97: Right click
on a misspelled word and note the four or five alternate correct
spellings right there in front of you. Most GUI web browsers also
allow you to search the typed-in history by clicking on the down
arrow at the far right of the entry bar. Cluelists extend this
behavior by allowing the user a listmode or detailsmode (more
windowspeak, so shoot me) interface to select between multiple
options for expansion. Suppose the user wants to gunzip a couple
of his or her .gz files. Simply typing gunzip *.gz inside of a
cluehunt-enabled xterm and pressing the “cluelist” keybinding
would generate a window containing a list of all files ending in
“.gz”. Then, the user would control-click or shift-click the
specific gzipped files desired to be expanded, press OK, and hit
enter to cause those files to be gunzipped.
- General Accessibility: All capabilities of cluehunting must be
accessible by mouse as well as by keyboard. It is critical that
Cluehunting be part of a self-documenting interface, defined as an
interface that bolsters the user’s understanding and mapping of
available options. One major way to make an interface
self-documenting is to provide multiple paths to the same
destination that reference each other. Right-clicking on a batch
of text should either bring up a single menu item containing
“cluehunt” or a list of all the cluehunting options directly in
the root right-click–research will be necessary to see which is
preferable. Now, of course, each entry in the right-click menu
would contain the keyboard shortcut right-justified, and the
corresponding shortcut would be listed in the keybox (dev-note:
Will be explained in upcoming proposal). Pretty slick.
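
Several of the points above (a central index that contexts register with, user overrides, wildcard clues, and batch expansion) fit together in a few lines. This is only a sketch under my own assumptions: `ContextIndex`, `pick_context`, and the use of shell-style wildcards via Python’s `fnmatch` are stand-ins, and each context is reduced to a plain list of entries.

```python
import fnmatch


class ContextIndex:
    """Hypothetical central index that clue contexts register with."""

    def __init__(self):
        self._contexts = {}  # name -> entries (stand-in for a real search space)
        self._mru = []       # most recently used context names, newest first

    def register(self, name, entries):
        self._contexts[name] = list(entries)

    def expand(self, clue, context):
        """Return every entry in `context` that fits the clue."""
        # A clue without a wildcard is treated as if it ended in "*".
        pattern = clue if any(c in clue for c in "*?") else clue + "*"
        self._touch(context)
        return [e for e in self._contexts[context]
                if fnmatch.fnmatchcase(e, pattern)]

    def expand_all(self, clue, context):
        """Batch expansion: all matches joined into one string to paste at the cursor."""
        return " ".join(self.expand(clue, context))

    def mru(self):
        return list(self._mru)

    def _touch(self, name):
        if name in self._mru:
            self._mru.remove(name)
        self._mru.insert(0, name)


def pick_context(app_default, override=None):
    """A user override, when present, wins over the application's default context."""
    return override or app_default


index = ContextIndex()
index.register("dictionary", ["consort", "resort", "sort", "sorted"])
index.register("filesystem", ["a.gz", "b.gz", "notes.txt"])

print(index.expand("*sort", pick_context("dictionary")))
# ['consort', 'resort', 'sort']
print(index.expand_all("*.gz", pick_context("dictionary", override="filesystem")))
# a.gz b.gz
```

A cluelist dialog would show the list returned by `expand` and paste only the entries the user selects.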
Default Cluehunting Keybindings
(Editor’s Note: I have some association with the GNOME project, which
hopefully will end up creating a world class User Interface for Linux
and other Unix systems. Nothing official, anymore.)
Well, I’ll be blunt: We’re still working on a default keyspace for
GNOME compliant apps. However, the following is a preliminary set of
keybindings for cluehunting:
+ Cluehunt Forwards: Alt-Shift-Right Arrow
+ Cluehunt Backwards: Alt-Shift-Left Arrow
+ Accept Cluehunt: Anything that moves the cursor. Enter has its functionality modified to not clear the contents of the expansion.
+ Reject Cluehunt: Esc
+ Expand All: Alt-Shift-Enter
+ Scroll Through Cluelist: Alt-Shift-Up and Down.
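
To show how these bindings might drive a single cluehunt, here is a minimal sketch. The `CluehuntSession` class and the key names in `KEYBINDINGS` are invented for illustration, and Enter stands in for “anything that moves the cursor” to keep it short.

```python
KEYBINDINGS = {
    "Alt-Shift-Right": "forward",
    "Alt-Shift-Left": "backward",
    "Esc": "reject",
    "Enter": "accept",
}


class CluehuntSession:
    """Tracks one in-progress expansion of a clue."""

    def __init__(self, clue, candidates):
        self.clue = clue
        self.candidates = list(candidates)
        self.position = -1  # -1 means the original clue is still showing
        self.done = False

    @property
    def text(self):
        return self.clue if self.position < 0 else self.candidates[self.position]

    def handle(self, key):
        """Apply one keypress and return the text now showing at the cursor."""
        action = KEYBINDINGS.get(key)
        if self.done or action is None or not self.candidates:
            return self.text
        if action == "forward":
            self.position = (self.position + 1) % len(self.candidates)
        elif action == "backward":
            start = self.position if self.position >= 0 else 0
            self.position = (start - 1) % len(self.candidates)
        elif action == "reject":
            self.position = -1  # revert to what the user typed
            self.done = True
        elif action == "accept":
            self.done = True    # keep whatever is currently showing
        return self.text


session = CluehuntSession("liffe", ["life", "lift", "lifer"])
print(session.handle("Alt-Shift-Right"))  # life
print(session.handle("Alt-Shift-Right"))  # lift
print(session.handle("Alt-Shift-Left"))   # life
print(session.handle("Esc"))              # liffe
```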
The Future Of Cluehunting
Cluehunting is a developed proposal, but it’s still in development.
Research will be needed to check for areas of confusion and
functionality. Still to be determined:
- How to notify the user that the existing text is
expandable via a cluehunt? Different cursors, different
text colors, a note in the title bar…?
- How to implement cluehunting? One possible way is to
simply have a directory structure that corresponds to
individual clue contexts and contains standard
stdin/stdout apps that take in the appropriate segment
and spit out a return value. Implementation isn’t that
much of an issue, though–possibility is more relevant
than methodology.
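
For what it’s worth, the directory-of-programs idea could look something like the sketch below. Everything here is an assumption made for illustration: the `expand` file name, the one-expansion-per-line output convention, and the inline stand-in script used in place of a real context program.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def context_program(root: Path, context: str) -> Path:
    """Map a context name to its program under a (hypothetical) contexts directory."""
    return root / context / "expand"


def run_context(argv: list[str], clue: str) -> list[str]:
    """Feed the clue to a context program on stdin; each stdout line is one expansion."""
    result = subprocess.run(argv, input=clue, capture_output=True,
                            text=True, check=True)
    return [line for line in result.stdout.splitlines() if line]


# Stand-in context program: a one-line Python script instead of a real executable.
DEMO = ("import sys; clue = sys.stdin.read().strip(); "
        "print(clue + 'ugas'); print(clue + 'ect')")
print(run_context([sys.executable, "-c", DEMO], "eff"))  # ['effugas', 'effect']
```

In a real layout, the argument list would be the path returned by `context_program` rather than an inline script.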