Imagery | Dan Kaminsky's Blog

UI Theory: Speech Vs. Vision

December 16, 2000 Dan Kaminsky

LAW:
We talk faster than we write,
and we read faster than we listen.

COROLLARY:
We shout out known commands faster than we can click them,
but we visually absorb available commands faster than we can hear them.

CAVEAT:
We can write with more detail than we can speak.
We can listen with more comprehension than we can read.

EDUCATIONAL EXAMPLE:
Go tell some random person to do something on a computer.
Don’t do it yourself.
Don’t let them know in advance what you want them to do.
Notice how fast you realize they’re screwing up.
Notice how slow it is for you to correct them.
Notice how slow it is for them to absorb your corrections.
Notice how fast you sit down and do it yourself.

CONCLUSION:
The ideal interface will likely have us speaking at computers,
with them shooting responses back as fast as the eye can absorb.

Categories: Imagery

Analogous Key Arrays: A Proposal Regarding Relating Keyboard Hotkeys With Screen Position

January 20, 1999 Dan Kaminsky

Analogous Key Arrays, or AKA´s, are command arrays with keyboard
hotkeys that correspond to the physical location of the onscreen
commands. For example, given a 3×3 array(the standard size for an
AKA) of commands on screen, a 3×3 array on the keyboard should be
chosen. Consider the following illustration, created from the DoxSTAR
modification to the popular game Starcraft by Blizzard Entertainment.

3×3 Command Box from Starcraft with corresponding hotkeys overlaid
for this illustration.

It should be noted quickly that AKA´s do not document the effects of a
given command, nor do they require any graphical modification to
command icons. (The graphical modifications here are for illustration
purposes only.) Position alone specifies a command array as an AKA.
This, of course, means the user is dependent upon some other means of
determining that an AKA is being used–tooltips are the ideal means
here, and were chosen for this function in DoxSTAR.
AKA´s are quite powerful, but have a number of caveats to their
usage. They have, I think, popped up in small numbers throughout the
history of GUI´s. This document is an attempt to codify their
properties. I shall attempt to explain their advantages and
disadvantages with candor; this page shall be updated if additional
observations come with time. Peer review is a good thing.
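
As a rough illustration of the positional idea, here is a minimal Python sketch. It assumes the QWE/ASD/ZXC block of a QWERTY keyboard as the 3×3 key array; the function names and the command names are hypothetical and are not taken from DoxSTAR.

```python
# Hypothetical sketch: map a 3x3 on-screen command grid onto the
# QWE/ASD/ZXC block of a QWERTY keyboard by position alone.
KEY_BLOCK = [
    ["Q", "W", "E"],
    ["A", "S", "D"],
    ["Z", "X", "C"],
]

def hotkey_for(row, col):
    """Return the key at the same (row, col) as the on-screen command."""
    return KEY_BLOCK[row][col]

def bind(commands):
    """Build a key -> command table from a 3x3 grid; None marks an empty slot."""
    table = {}
    for r, row in enumerate(commands):
        for c, name in enumerate(row):
            if name is not None:
                table[hotkey_for(r, c)] = name
    return table

# Command names are made up for illustration.
grid = [
    ["Move", "Stop", "Attack"],
    ["Patrol", "Hold", None],
    [None, None, "Cancel"],
]
bindings = bind(grid)
print(bindings["E"])  # the upper-right command
```

Nothing in the table refers to a command's name. Swapping in a different grid changes what each key does, but not where the user looks to find the key.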

Theories supporting the AKA

Much of the science that supports Pie Menus(see also this Mactech
article) applies to Analogous Key Arrays. Pie Menus, unlike linear
menus, place commands in a circle around a central point, thus making
all commands equidistant from each other. There´s some decent research
showing this is a significantly quicker method of making commands
accessible. The link applies in that since most multidimensional
command arrays form pie-menu like structures anyway, a positional
keyboard equivalent of those structures should gain the positive
effects noted in much of the research for Pie Menus.

In fact, AKA´s should hold a substantial speed advantage over Pie
Menus due to the intrinsically faster nature of key entry versus
coordinate based mouse clicks. Provided a modicum of familiarity with
the keyboard´s layout, it is faster to move one´s fingers to the
appropriate key and press it than to move the mouse to a specific
onscreen location and click. This does not, however, mean it is
faster to execute a command via keyboard than via mouse–that´s just
not true. Mouse access is much easier in general, because “Open File”
is significantly more descriptive than “Ctrl+O”. In other words,
mouse-driven actions use the screen to self-document themselves,
whereas keyboard driven actions usually require shortcuts to be
memorized.

AKA´s are therefore designed to address this state of affairs. The
first thing they do is maximize the intrinsic advantage of the
keyboard. With all keys in the AKA being in close proximity, travel
time to execute a given command is significantly reduced–Fitts´ Law
at work(side note: This link extends interestingly on Fitts´ Law, as
well as explains it). The tight proximity also aids non-touch
typists–AKA´s take the form of either positionally relative
arrays(key is two left from other command) or, especially in the 3×3
case, positionally cardinal arrays(key is in the upper right hand
corner). Even the touch typer benefits from the tight array, since
the hand position doesn´t have to shift to contact the heavily used
keys in the AKA. The intrinsic speediness of the keyboard is extended
to its maximum capacity.
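
To put rough numbers on the travel-time argument, the sketch below uses the common Shannon formulation of Fitts' Law, a + b * log2(D/W + 1). The constants a and b and the key dimensions are placeholders chosen for illustration, not measurements.

```python
import math

def fitts_time(distance, width, a=0.1, b=0.15):
    """Shannon form of Fitts' Law: a + b * log2(distance / width + 1).
    a and b are placeholder constants (seconds), not measured values."""
    return a + b * math.log2(distance / width + 1)

# Illustrative numbers only: a neighbouring key versus a key ten key
# widths away, both 19 mm wide.
near = fitts_time(distance=19, width=19)
far = fitts_time(distance=190, width=19)
print(round(near, 3), round(far, 3))
```

Whatever the real constants turn out to be, the logarithmic term grows with distance, which is why keeping every key of the array within a key or two of the others pays off.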

The second action of the Analogous Key Array is to make keyboard
hotkeys easily memorable. Loose Prime Character hotkey naming(the
system in which the hotkey usually corresponds to the first letter of
the command) makes the assumption that all commands need to be
accessible at all times, and thus each command needs a separate
hotkey. In many situations(not all!!!) this is not the case. Often,
an application is in a specific mode with a limited number of
commands. Consider the case of Starcraft. All commands for all units
reduce to 3×3 arrays–and these arrays show their available options at
any time the user might wish to execute commands. Only the slightest
glance at the onscreen array is necessary to deduce the appropriate
hotkey for any given command. This is a significant improvement over
either guesswork or tooltip-checking that the LPC method enforces.
Additionally, there is no extra labor required to memorize commands
for a new unit–once any unit is learned, every unit is learned. In
DoxSTAR-modified Starcraft, the mouse is allowed to roam freely around
the battlefield doing what it does best–pixel level selection–while
the keyboard is designated to trigger commands.
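
The contrast with first-letter hotkeys can be sketched in a few lines of Python. The two units and their command names below are invented for illustration, and the slot order (left to right, then down, over QWEASDZXC) is an assumption.

```python
from collections import defaultdict

def lpc_collisions(commands):
    """Group commands by first letter; any group larger than one needs a
    fallback key under a first-letter (LPC) scheme."""
    groups = defaultdict(list)
    for name in commands:
        groups[name[0].upper()].append(name)
    return {k: v for k, v in groups.items() if len(v) > 1}

def positional_keys(commands, keys="QWEASDZXC"):
    """Assign keys by slot order, left to right then down."""
    return dict(zip(keys, commands))

# Two made-up units with different command sets.
unit_a = ["Move", "Stop", "Attack", "Patrol", "Hold"]
unit_b = ["Move", "Stop", "Siege", "Scan", "Mine"]

print(lpc_collisions(unit_b))
print(positional_keys(unit_a)["E"], positional_keys(unit_b)["E"])
```

Under the first-letter scheme the second unit has two clashes to resolve and memorize. Under the positional scheme the third slot is always E, whatever command happens to sit there.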

In conclusion, the Analogous Key Array applies instinctual properties
such as relative position and cardinal direction to keyboard hotkey
assignment by relating onscreen location with keyboard location.
Fitts´ Law and ergonomic efficiencies mean this system is
significantly faster than competing assignment methodologies, and the
consistent and efficient method of observing screen position to derive
command hotkey significantly reduces short term memory load.

Caveats and observations regarding the usage of the AKA

This is an attempt at a semi-scientific paper, and while the concept
of creating a new(?) command methodology is appealing to my ego, the
eventual destruction by the peer-reviewing public would rip said ego
to micron-thin shreds. Therefore, the following are caveats and
observations regarding the usage of Analogous Key Arrays.

AKA´s are biased towards 3×3 arrays with 8 or 9 options.
3×3 arrays are superior, because they tie directly in with the
cardinal directions(up, down, down-right) we understand
instinctually. With the center point optional, this means eight
or nine options are available. This leads to a bit of weirdness
for the user if only a few of these boxes are filled. This
shouldn´t be too much of a problem though– DoxSTAR-enabled
Starcraft has plenty of incomplete arrays.
With regards to what alternate array dimensions can be used, here
are a few observations:
It is better to be wider than to be taller.
Consider a building–it´s easier to access additional rooms on the
same floor than to take the stairs to additional floors. This
applies to keys–it´s easier to hit keys that are on the same row
rather than to constantly change rows. Just look at your
hands–how many rows of fingers do you have?
Work your way left to right, then down.
The farther down a key is, the more likely a user is to curl his
or her finger. Finger curling is more work than finger shifting.
It is arguable that when localizing your product for areas in
which the language is read in a different order than
Left-Right-Down, you should change the order. This is because of
the visual system´s seek patterns.
Avoid AKA´s with a width of 4.
Single width AKA´s are suboptimal but may be necessary for real
estate purposes. Double width AKA´s fall into the left/right
classification. Triple width AKA´s form left/middle/right.
Quintuple width AKA´s form Likert style Strongly
Left/Left/Middle/Right/Strongly Right scales. But Quadruple width
AKAs lack a true middle, making them harder to visually convert
from on-screen image to a keyboard equivalent.
If exceeding a width of five, you may need to mark your icons, or
split the AKA.

Suppose your interface consists of a 10×1 AKA that pulls down
from each number(1 -> 0) a 3×3 QWE Analogous Key Array. The 3×3
can remain unmarked(leave it to the tooltips to document their
behavior), but it´s going to be quite difficult for a user to
immediately pick which of, say, ten entries a random one near the
middle is. Splitting the 10×1 into two 5×1´s is an alternative,
for instance.
Page, don´t scroll.
Scrolls went out of fashion a few thousand years ago for a reason.
AKA´s that scroll lose consistency–at one moment, Up(Q) might be
a given command, at the next, Middle(S) could be. Scrolling does
prevent Page Orphans(a situation that occurs when just one or two
objects overflow onto the next page), but an occasional orphan is far
better than losing that consistency.
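
A small Python sketch of paging, under the assumption of nine slots per page bound to QWEASDZXC; the command names are placeholders.

```python
def pages(commands, per_page=9):
    """Split a command list into fixed pages of per_page slots."""
    return [commands[i:i + per_page] for i in range(0, len(commands), per_page)]

def key_for(commands, name, keys="QWEASDZXC"):
    """Return (page number, key) for a command. The key depends only on the
    slot within its page, so it never shifts while the page is shown."""
    index = commands.index(name)
    return index // len(keys), keys[index % len(keys)]

# Eleven made-up commands: the last two form a "page orphan".
cmds = [f"cmd{n}" for n in range(11)]
print(len(pages(cmds)), key_for(cmds, "cmd4"), key_for(cmds, "cmd10"))
```

The second page is mostly empty, which is the orphan cost, but every command keeps one fixed key for as long as the list itself does not change.
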
AKA´s encourage modality in interface design

Since multiple usage contexts share the same hotkeys, these contexts
must be caused not to collide somehow. It is difficult to predict
what commands users may wish to execute in the same context, and
requiring constant context switches gets tiring. A couple notes on
this:
Most multicontext apps require modality anyway
Even a swiss army knife requires the desired tool to be pulled
out.
Many of the situations in which the AKA will be used will already
have established modality anyway. More research is definitely
required to learn just how to switch context appropriately.
AKA´s improve the quality of modal interfaces
In Starcraft, build units have a command that opens up another
command array of units to build. As soon as the build units
command is triggered, a new, automatically hotkey-documented
command array replaces it. No new learning is necessary.
AKA´s use up much more screen real estate than LPC´s
One advantage of the Loose Prime Character system is that the user
only needs to remember the name of the command being attempted. In
order for the user to remember the position, the user must actually be
able to see the position. This means that AKA´s don´t fit well with
linear menus(File, Edit, etc.), and also require onscreen real estate
to establish the relative/directional positions. A few observations:
The Rule of Sides severely affects AKA´s.
Kaminsky´s Rule of Side Pollution basically says that anything
placed along a side automatically pollutes the rest of that side
for objects larger in the dimension being polluted. (I shall
withdraw this rule if somebody else has already stated it in
similar language.) Objects at a corner pollute one of the two
sides they contact. It may be unclear what this means. Suppose
in a 1024×768 screen, a 100×100 clock exists in the lower right
hand corner. Now suppose we have our “main window”, a web
browser, that we´re trying to make as large as possible. We have
three options:
1. Make the Netscape window 1024×668
2. Make the Netscape window 924×768
3. Make the Netscape window 1024×768, but accept that the lower
  100×100 corner of the window will remain occluded.
  This is what I refer to as Side Pollution, and is a significant
  problem in many GUI´s. The reason this bites AKA´s so severely is
  that, because AKA´s operate by associating position, not name,
  there is an additional demand for screen space since position of a
  command must be established. It´s still worth it, since most
  keyboard shortcuts will never, ever be used, but it´s still not
  that simple. For one thing, the 3×3 array that AKA´s work best
  with pollutes more space than a pulldown menu or a 10×1 taskbar.
  More research is needed, but most applications should find that
  two 5×1 taskbars split across the main number area provides the
  best use of screen real estate. The reason the 10×1 or the two
  5×1 taskbars work whereas various 3×3 arrays don´t is because a
  corollary of the Rule of Sides shows that Side Pollution isn´t
  cumulative; two 100 pixel polluters side-by-side along the bottom
  of the screen will still only pollute 100 pixels on the bottom.
  This is of course the justification behind the Win95 Taskbar–as
  long as all of the small boxes are organized along the bottom,
  their collective value outweighs the amount of space they take up.
  Of course, games have much more leeway since they have much more
  real estate to play around with.
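
The arithmetic behind the clock example, and the non-cumulative corollary, can be checked with a short Python sketch. The function name and its edge-thickness model are assumptions made for illustration; it only handles polluters along the bottom and right edges.

```python
def usable_area(screen_w, screen_h, bottom=(), right=()):
    """Largest unobstructed rectangle when polluters sit along the bottom
    and right edges. Each polluter is given as its thickness in pixels.
    Pollution along one side is not cumulative: only the thickest counts."""
    lost_h = max(bottom, default=0)
    lost_w = max(right, default=0)
    return (screen_w - lost_w, screen_h - lost_h)

# The 100x100 clock in the lower right corner of a 1024x768 screen
# pollutes one of the two sides it touches.
option_1 = usable_area(1024, 768, bottom=[100])   # give up the bottom strip
option_2 = usable_area(1024, 768, right=[100])    # give up the right strip
# A second 100-pixel polluter on the bottom costs nothing extra.
with_two = usable_area(1024, 768, bottom=[100, 100])
print(option_1, option_2, with_two)
```

The first two results reproduce options 1 and 2 above, and the third shows why a row of small boxes along one edge is cheap once the first box is already there.
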
AKA´s can open up other, non identical AKA´s
This has been alluded to in earlier sections, but just to make it
explicit–just as clicking on a certain item can create a
“pulldown/popup” window, so can pressing the associated key bring
up a pulldown/popup AKA, replete with totally different keyboard
shortcuts. Just make sure your documentation method(tooltips in
DoxSTAR) explains what´s going on.
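
One way to model an AKA that opens another AKA is a small tree walked one key press at a time. This is a sketch only: the tree layout, the command names, and the press function are all hypothetical.

```python
KEYS = "QWEASDZXC"

# A made-up command tree: a string is a plain command, a list is a
# sub-array that replaces the current one when its key is pressed.
ROOT = ["Move", "Stop", "Attack",
        ["Barracks", "Depot", "Bunker"],  # the slot under A opens this array
        None, None, None, None, "Cancel"]

def press(array, key):
    """Return (command, next_array) for a key press in the current array."""
    index = KEYS.index(key)
    slot = array[index] if index < len(array) else None
    if isinstance(slot, list):
        return None, slot          # open the sub-array, run nothing yet
    return slot, array             # run the command (or nothing), stay put

cmd, current = press(ROOT, "A")    # opens the build array
cmd2, current = press(current, "W")
print(cmd, cmd2)
```

The same nine keys are reused at every level, so the only thing the user has to track is which array is currently on screen.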

Keyspace is a serious issue, especially for applications
Back when I was part of the GNOME project, I made a call for a
definition of the GNOME keyspace. The keyspace is, essentially, the
breakdown of which keys do what functions in which contexts. As the
mouse has to deal with screen real estate, the keyboard has to cope
with a limited keyspace. Issues arise quite easily–while Starcraft
can be modified quite easily to have C trigger Cancel, users of a word
processing application are going to be quite peeved to find that C no
longer types the letter. In the application keyspace, normal keys
just type out their counterparts on screen–it´s when Control and
Control-Alt is added that keystrokes become commands, at least in the
application keyspace. Worse, there are certain keystrokes in the app
keyspace that seriously conflict with the QWE AKA–Control-C in
Windows must always equal Copy. Windows has also appropriated the
entire Alt keyspace to menu shortcuts–arguable for reducing
confusion, but painful when a new class of hotkeys shows up. A couple
observations on how to deal with this:

Applications should use the Control-# and the Control #Pad
Keyspace
There are some issues with this–namely, if additional contexts
are called via the two 5×1 arrays that Control 1-5 and Control 6-0
provide, it´s going to be harder for the user to trigger the #789
set than to trigger the QWE set. Also, using the #Pad keyspace
could lead to some interesting non-uniform(but self-documenting!)
shortcut sets. It´s much easier, by the way, for most
applications to justify using the number pad since users are more
used to shifting between the mouse and the keyboard while typing.
While it is true that the whole idea of using keyboard shortcuts
is to avoid having to move one´s hand to the mouse, it should be
noted that the amount of mental processing, let alone time
necessary to execute a command via a mouse far outweighs the
amount of time necessary to hit a “far away” key and return. Apps
where speed is critical or in which this many keyboard commands
could not exist can rely on the Control-# keyspace anyway.
Mouse-Intensive Programs should never use the #Pad as their
primary AKA. Rather, they should use the QWE Analogous Key Array.
Most applications will not fall into this category. Games are a
different story–pretty much any game where a mouse is used
moderately heavily should not be programmed with a #pad AKA.
Here´s why–if the program is designed to have the user leave his
right hand on the mouse, his left hand is the one at the controls
of the AKA. The #Pad, however, is at the right side of the
keyboard. To use this setup, either the user moves his or her
keyboard a foot to the left, allowing both hands to be placed in
the center right in front of the monitor, or the user shifts his
left hand all the way to the right side of the keyboard, which
isn´t exactly the most ergonomic position. It is much better to
have the right hand stay on the right side and the left hand stay
on the left–plus, Control, Alt, and at least half of the 1-0
number row are within easy reach of the left hand in the QWE
layout.
The Operating/Windowing System should use Control-Alt for
launching other software
Windows allows shortcuts to have arbitrary keyboard hotkeys.
(Windows is most probably the most keyboard enabled GUI in
existence–this coming from an ardent Linux fan) Unfortunately
they don´t always work perfectly or universally. This is a
shame–for applications that are constantly run, keyboard
shortcuts are a wonderful way to boot. Perhaps the items in the
Windows Quick Launch bar should automatically be bound to
Control-Alt-#? The only issue with the Control-Alt keyspace is
that Photoshop uses it to reasonably good effect, but then only
with mouse commands.
Customization can be good.
For applications or games with large or expanding command options,
an AKA that can be easily customized by the user should be quite
useful to users who are otherwise overburdened.

Multiple AKA´s pose issues
If there is only one AKA, and it´s on the right of the screen but the
left of the keyboard, it´s somewhat bothersome but not that big of a
deal. If there are two AKA´s in simultaneous use, and they´re
sideswapped in the same manner, the hotkeys are unusable–the user
will repetitively choose the wrong hotkey. Due to the increased
horizontal range of the keyboard, horizontal conflicts are more
common, but the point holds in the vertical dimension as well–if
hotkeys are misaligned vertically, the user may have difficulties.
However, it´s probably not too troublesome if one AKA is at the top
left of the screen and the other is at the lower right, yet both are
on the same horizontal plane.

Watch out for custom keyboards!
AKA´s operate on position–what if the user´s key position isn´t the
same as the developer´s? Uh oh is right. A good portion of the
population has Microsoft Natural Keyboards or a knockoff thereof.
Using a custom AKA that spans the split in the middle of these boards
is not only foolhardy(since even normal boards lack a hard, vertical
line for the hand to align itself against) but also completely
unusable for these users. Furthermore, if you expect any large number
of users to use your layout, ship with a Dvorak configuration–there
aren´t many Dvorak users out there, but they´re not worth
ignoring–they´re arguably using a better system, and they´re quite
capable of arguing it publicly 🙂 Finally, if at all possible(and
this holds true even if you aren´t writing a program that uses AKAs),
let your keyboard configuration be configurable. Please.
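
A configurable layout can be as simple as a table of which character sits at each physical position. The Python sketch below covers only the upper-left 3×3 block for QWERTY and US Dvorak; the table and function names are hypothetical.

```python
# The character found at each physical position of the upper-left 3x3
# block, per layout. Profiles here are an illustration, not a full table.
LAYOUTS = {
    "qwerty": ["qwe", "asd", "zxc"],
    "dvorak": ["',.", "aoe", ";qj"],
}

def char_for(layout, row, col):
    """Character the user must type to hit grid slot (row, col)."""
    return LAYOUTS[layout][row][col]

def slot_for(layout, char):
    """Inverse lookup: which grid slot does a typed character select?"""
    for r, keys in enumerate(LAYOUTS[layout]):
        if char in keys:
            return r, keys.index(char)
    return None

print(char_for("qwerty", 0, 2), char_for("dvorak", 0, 2))
```

Because the program looks up slots rather than letters, adding a layout means adding one row to the table, and a user-editable table gives full configurability for free.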

Don´t ignore left handed people.
Left handers have been obscenely ignored by joystick manufacturers for
the past few years–it´s probably not a nice thing to embed this
subtle dis in your code. Most left handers will work fine with the
right-handed layout, but a number of them won´t, so you may want to
ship either with a method to reconfigure the AKA hotkeys or just with
a swapper to move QWE to #789.
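
A swapper of this kind is little more than a position-for-position key translation. The sketch below assumes the usual 789/456/123 number pad arrangement; the binding names are placeholders.

```python
LEFT_BLOCK = ["QWE", "ASD", "ZXC"]
NUMPAD_BLOCK = ["789", "456", "123"]

def swap_table(src=LEFT_BLOCK, dst=NUMPAD_BLOCK):
    """Map each key in src to the key at the same position in dst."""
    return {s: d for s_row, d_row in zip(src, dst) for s, d in zip(s_row, d_row)}

def rebind(bindings, table):
    """Move every binding to its swapped key, leaving other keys alone."""
    return {table.get(key, key): cmd for key, cmd in bindings.items()}

right_handed = {"Q": "Move", "S": "Hold", "C": "Cancel"}  # made-up commands
left_handed = rebind(right_handed, swap_table())
print(left_handed)
```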

Miscellaneous Notes

I think I basically covered most of the issues you will have to
address when choosing if and how to implement Analogous Key Arrays.
If you feel there is anything I missed, please email me.

Categories: Imagery

Cluehunting: A Proposal Regarding The Intelligent Use of Available Data In The User Interface

August 1, 1998 Dan Kaminsky

Where Cluehunting Comes From

(Editor´s Note: I have received an impressive and incredible amount
of email regarding cluehunting, and I thank everybody who mailed me.
Much of the text here needs to be rewritten to accommodate the lucid
and honestly surprising quantity and quality of research people put
into advancing this proposal. Some stuff regarding the true history
of cluehunting does need to be modified. Bear with me.)
Cluehunting is an advanced Expansion Agent, defined as a system that
allows the computer to search possible “expansions” throughout given
contexts given a “clue” by the user. Clues are defined as segments of
data(type irrelevant) that the computer would be able to utilize to
predict the final contents of the user’s intention. Expansions are
the presumed intentions of the user. Finally, contexts are the
“search space” that is being scanned–the file system context, the
launcher context, or even a thesaurus/spell check context are all
valid options.
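
The three terms can be pinned down with a minimal Python sketch. It assumes a context is just a function from a clue to a list of candidate expansions, and that clues are plain prefixes; the context names and their contents are invented.

```python
def make_prefix_context(entries):
    """Build a context: a function from a clue to its candidate expansions.
    This one treats the clue as a plain prefix."""
    def search(clue):
        return [e for e in entries if e.startswith(clue)]
    return search

def expand(clue, context):
    """Return the first expansion of the clue in the context, or the clue
    itself when nothing matches."""
    candidates = context(clue)
    return candidates[0] if candidates else clue

# Hypothetical contexts with made-up contents.
launcher = make_prefix_context(["gimp", "gnumeric", "netscape"])
thesaurus = make_prefix_context(["life", "lifetime", "liveliness"])

print(expand("gn", launcher), expand("li", thesaurus), expand("zz", launcher))
```

The same clue can expand differently depending on which context is searched, which is the property the rest of the proposal builds on.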

It would be completely unfair to describe cluehunting as a totally
original concept–it stands, if you will, on the shoulders of giants.
Tab-completion is the oldie, and as far as I know originated with the
Unix shell tcsh, though it’s also a hidden option in the NT command
shell. This technology is quite file-system specific: Enter as much
as you know about a path, starting with the root, and tab complete
will expand what you type to fit. For example–enter /usr/home/eff
and hit tab, and you will be given the first entry in /usr/home/ that
begins with “eff”. Some limited regular expressions are allowed–for
example, if I’m in the directory /usr/home/effugas and type unzip
*.zip, I will be able to tab through each zip file in my home
directory. Very slick.
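
The behavior described here can be approximated in Python with the standard fnmatch module. This is a sketch of the idea rather than how any shell implements it, and the directory listing is made up.

```python
import fnmatch
import itertools

def tab_candidates(typed, names):
    """Names matching what was typed; a bare prefix is treated as prefix*."""
    pattern = typed if "*" in typed else typed + "*"
    return sorted(n for n in names if fnmatch.fnmatchcase(n, pattern))

# Made-up directory listing.
home = ["notes.txt", "photos.zip", "archive.zip", "effugas.plan"]

print(tab_candidates("eff", home))
presses = itertools.cycle(tab_candidates("*.zip", home))
print(next(presses), next(presses), next(presses))
```

Each call to next stands in for one press of the tab key, wrapping around once the matches run out.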

Tab Completion is nice, but it has its flaws. First of all, tab has
become the de facto standard for “advance to next field” in GUIs, and
there’s no way I want to get rid of one of the best keyboard
timesavers in existence. Secondly, it searches files and only files.
There are other search contexts that should be hit. Finally,
tab-complete provides no way to expand into anything but a single
entry–what if the user didn’t want just one of the group, but wanted
to expand into all entries that fit the given form? In other words,
what if, instead of just one zip, all of *.zip were inserted? That
would be logical in a number of situations.

Tab Complete’s newborn sibling, Autocomplete, was a web browser
innovation that began at the much maligned UI shop known as Microsoft
and was later adopted by Netscape for its Communicator browser. (To
be as fair as possible, the emacs editor includes substantial
autocomplete facilities. I am referring here to the fact that this
was the first implementation of autocomplete for ordinary users, and
as far as I know was the first implementation among the thousands of
Windows apps over the last few years.) As Microsoft integrated
Internet Explorer and Windows Explorer, both the Run Dialog and the
Web Open Dialog possess Autocomplete functionality. (Actually,
Microsoft Word will also Autocomplete anything you type that is
related to a few known categories, i.e. date, author name, etc., but
I’ll deal with this later.) So what does this bring to the table?
Well, we see the beginnings of clue contexts showing up here, since at
first glance it appears that the run menu will autocomplete files and
the web browser will autocomplete web sites. But these are both
searches of the same clue context–the history context, in which
things that have been typed before are called back to be expanded back
into reality. And how does Autocomplete expand entries? In the
middle of typing, inverted text will appear containing the contents of
what the computer is guessing the user is trying to get at. This text
will only appear to the next valid level–http:// will expand to
http://www.best.com, but it will not expand to
http://www.best.com/~effugas nor
http://www.best.com/~effugas/Personal/SILC/silc.html. There’s no way
to really scroll through possible entries in this history-based
autocomplete–the first thing that matches will be matched to its
first level, and that’s all you get.(Ed Note: Holding shift and arrow
down lets you scroll through possible autocompletes on Netscape.)
Worse, sometimes a delay in typing is required to simply trigger an
autocomplete. Still, this functionality is total joy, even with all
of its warts.
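
The "next valid level" behavior can be imitated with a few lines of Python. This is a guess at the observable behavior only, not the browser's actual implementation, and the function name is hypothetical.

```python
def autocomplete(typed, history):
    """Expand typed text to the next path level of the first history match."""
    for entry in history:
        if entry.startswith(typed):
            # Never stop inside the "://" of the scheme.
            scheme_end = entry.find("://") + 3 if "://" in entry else 0
            start = max(len(typed) + 1, scheme_end)
            cut = entry.find("/", start)
            return entry if cut == -1 else entry[:cut]
    return typed

history = ["http://www.best.com/~effugas/Personal/SILC/silc.html"]
print(autocomplete("http://", history))
print(autocomplete("http://www.best.com/", history))
```

Only the first matching history entry is ever consulted, which mirrors the limitation noted above: there is no way to move on to a second candidate.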

What´s New In Cluehunting

Cluehunting specifies the following advancements beyond present-day
expansion technology:

Universal Expansions
Inputstream Aware Expansion Styles
Application-Dependent Clue Contexts
Clue Context Overrides
Pluggable Context Indexes
Regular Expressions
Batch Expansions
Cluelists
General Accessibility

Definitions help, of course:

Universal Expansions: Expansion should be available in all
interface components. The primary limitation of present expansion
methods is that they can’t really be available everywhere. Cluehunting
is designed to allow every interface construct to read the
intentions of the user. It is the purpose of the next eight points
to make sure that this works, and works well.
Inputstream Aware Expansion Styles: Segmented streams of input
data ought to implement commanded expansion, while unified
inputstreams may take advantage of automatic expansion. A little
background is going to be necessary to understand this. First,
you can’t outclass something you can’t recognize the class of.
That being said, let’s talk about Microsoft’s UI department. Take
Microsoft Word 95/97. Red and green spelling and grammar warning
underlines are excellent interface components. They’re
unobtrusive enough to ignore in the heat of thought, yet available
enough to make it difficult to miss misspelled or inappropriate
words. I miss them any time I type in anything else. They
enhance the feedback loop of the inputstream. The inputstream is
defined as the flow of commands from the user to the computer as
well as any information fed back along the same channels as the
input–for example, a clock in the lower right hand corner is not
part of the inputstream, but the characters that pop up in
response to the corresponding character being pressed on the
keyboard are. What does not work in Word, however, is
Autocomplete. When I type Dan, I’m not always talking about
myself, and when I type August, I’m not always talking about the
present date. I don’t want to have to interrupt my stream of
thought to correct Word–my concepts are segmented into words from
sentences, paragraphs, and full documents. This contrasts sharply
with the very appropriate and useful usage of autocomplete for web
sites, which have addresses that are single-phrase and thus
unified. Therefore, while Word, and any other segmented
inputstream receivers ought to require a key to be pressed before
the phrase is expanded(though a graphical hint like a different
cursor would help), Netscape should attempt to expand
automatically. NOTE: Research is required to make sure this
inconsistency does not overly confuse users. It is very possible
that automatically triggering an expansion in unified instances
but delaying expansions in segmented cases is utterly confusing to
users. In this case, I’d lean towards a completely delayed
expansion interface.
Application Dependent Clue Contexts: Applications should search
multiple clue contexts appropriate to the active application
context. Strange words coming from someone who worships
consistency in user interfaces, but I really think this is
necessary. Applications generate context, and all clues should
not expand from some single chosen source. For example: Suppose
I enter the word “liffe” into a word processor. The ideal word
processor would notify the user immediately and non-intrusively
that the word was misspelled. Obviously, the appropriate clue
context for a misspelled word is to search through alternative
correct spellings. Multiple presses of the Continue Cluehunt
keybinding would search through multiple alternative spellings,
until the user chose to press either the Cancel Cluehunt
keybinding(probably Escape) to revert to the misspelled form or to
press the Cluehunt Successful keybinding(probably Enter). The
user could, of course, reselect the correctly spelled word, and
this time search through the default context for a correctly
spelled word: the thesaurus. So, life would be replaced with
various synonyms–or, the thesaurus dialog could come up to
provide a multidimensional search between life-as-vocation,
life-as-socialness, or life-as-complete-lack-thereof. All that
cluehunting specifies is a precondition and a
postcondition–dialogs do not violate this. It would be
preferable if these weren’t modal dialogs, however–it is rarely
appropriate for the user to be locked out of his or her document.
Pluggable Context Indexes: Clue contexts, either attached to an
application or independent, should register themselves with a
central index. This index of clue contexts would be categorized
either by type or by owner application, would have MRU(most
recently used) lists, and would be reconfigurable by the user.
Clue Context Overrides: The user should be able to specify a
specific clue context to expand from, in either a proactive manner
or a reactive manner. Despite the fact that applications often
have contexts that make sense, there are times when the user has
another context in mind. For example, the user should be able to
access the Thesaurus context while saving a file, or the
filesystem context while documenting an application, or the web
history context while creating a web page of links. This would be
implemented with a Set Clue Context keybinding which would modify
the present word’s clue context–a reactive override. If the user
had not yet typed a word, the next word would be the recipient of
the entered context–this would be a proactive override. Contexts
would be registered upon install as per the plug-in clue context
interface, and manipulable via replaceable dialogs. Most
probably, some degree of categorization would be appropriate, as
well as expansion on the clue context type itself. (In other
words, a box would be given, and you’d type in Th and Thesaurus
might come up). Of course, common clue contexts should be
automatically recognized. A user typing in a path in any
application, for example, should usually first trigger the file
system history context, and then the literal file system search
context. Similar results should await a user typing http://.
However, there is an advantage to being able to select a context.
By selecting the Execute Command context, the user could load any
app directly from within any other app and have the stdout reply
be pasted at the cursor. Much like ircii’s /exec command, this
would allow the contents of, say, an ls to be directly pasted at
the cursor. Quite nice.
Regular Expressions: Regular Expressions should be available for
usage in clue expansions. Many users are familiar with using * to
signify a wildcard. While the default expansion would, in
general, presume a * at the end of the provided clue and expand
from there, there is no reason this is necessary. A user
searching for dictionary words that end with “sort” should be able
to expand *sort into resort, consort, and plain old sort. The
only problem–how to differentiate between a clue containing a
regex for search purposes(execute context for ls -l *.gz) versus a
clue that wants its regex expanded before search(command history
context for ls -l *.gz). It’s quite probable that most contexts
will only fit one or the other, but I’m unsure. Email me if you
think that a specific “begin regex” keybinding would be necessary.
Batch Expansions: All entries that fit the provided clue should
be available for simultaneous expansion. Through an “expand all”
keybinding, the contents of all clues that fit the given context
should be pasted at the cursor. This facilitates things such as
“gunzip *.gz” being expanded into a list of all files to be
gunzipped, allowing the user to make sure the shell was expanding
the list correctly, among other uses.
Cluelists: All entries that fit the provided clue should be
listable in a multiselectable sortable dialog. In some ways, a
basic version of this is part of Microsoft Word 97: Right click
on a misspelled word and note the four or five alternate correct
spellings right there in front of you. Most GUI web browsers also
allow you to search the typed-in history by clicking on the down
arrow at the far right of the entry bar. Cluelists extend this
behavior by allowing the user a listmode or detailsmode(more
windowspeak, so shoot me) interface to select between multiple
options for expansion. Suppose the user wants to gunzip a couple
of his or her .gz files. Simply typing gunzip *.gz inside of a
cluehunt-enabled xterm and pressing the “cluelist” keybinding
would generate a window containing a list of all files ending in
“.gz”. Then, the user would control-click or shift-click the
specific gzipped files to be expanded, press OK, and hit
enter to cause those files to be gunzipped.
General Accessibility: All capabilities of cluehunting must be
accessible by mouse as well as by keyboard. It is critical that
Cluehunting be part of a self-documenting interface, defined as an
interface that bolsters the user’s understanding and mapping of
available options. One major way to make an interface
self-documenting is to provide multiple paths to the same
destination that reference each other. Right-clicking on a batch
of text should bring up either a single menu item containing
“cluehunt” or a list of all the cluehunting options directly in
the root right-click menu; research will be necessary to see which is
preferable. Now, of course, each entry in the right-click menu
would contain the keyboard shortcut right-justified, and the
corresponding shortcut would be listed in the keybox (dev-note:
Will be explained in upcoming proposal). Pretty slick.
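
To make the wildcard and batch expansion behavior above a bit more
concrete, here is a minimal sketch in Python. Nothing here is part of
the proposal itself: the function names expand_clue and expand_all are
made up for illustration, and Python's fnmatch module stands in for
whatever matching engine a real implementation would use (note that
fnmatch also treats ? and [...] as special, which the proposal does
not discuss).

```python
import fnmatch


def expand_clue(clue, entries):
    """Return every entry matching the clue, in sorted order.

    Hypothetical helper: a clue with no '*' gets an implicit trailing
    '*' (the default expansion), while a clue that already contains
    '*' is used exactly as typed.
    """
    pattern = clue if "*" in clue else clue + "*"
    return sorted(e for e in entries if fnmatch.fnmatchcase(e, pattern))


def expand_all(clue, entries, separator=" "):
    """Batch expansion: join all matches into one string to paste at the cursor."""
    return separator.join(expand_clue(clue, entries))


if __name__ == "__main__":
    words = ["sort", "resort", "consort", "sorted", "table"]
    print(expand_clue("*sort", words))   # wildcard at the front
    print(expand_clue("sor", words))     # implicit trailing wildcard
    print(expand_all("*.gz", ["a.gz", "b.txt", "c.gz"]))
```

A cluelist would presumably show the result of expand_clue in a dialog
and paste only the entries the user selects, while “expand all” pastes
the whole joined string.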

Default Cluehunting Keybindings

(Editor’s Note: I have some association with the GNOME project, which
hopefully will end up creating a world class User Interface for Linux
and other Unix systems. Nothing official, anymore.)
Well, I’ll be blunt: We’re still working on a default keyspace for
GNOME-compliant apps. However, the following is a preliminary set of
keybindings for cluehunting:

          + Cluehunt Forwards:  Alt-Shift-Right Arrow
          + Cluehunt Backwards: Alt-Shift-Left Arrow
          + Accept Cluehunt:  Anything that moves the cursor.  Enter has
            its functionality modified to not clear the contents of the
            expansion.
          + Reject Cluehunt:  Esc
          + Expand All:  Alt-Shift-Enter
          + Scroll Through Cluelist:  Alt-Shift-Up and Down.

The Future Of Cluehunting

Cluehunting is a developed proposal, but it’s still in development.
Research will be needed to check for areas of confusion and
functionality. Still to be determined:

          + How to notify the user that the existing text is
            expandable via a cluehunt? Different cursors, different
            text colors, a note in the title bar…?
          + How to implement cluehunting? One possible way is to
            simply have a directory structure that corresponds to
            individual clue contexts and contains standard
            stdin/stdout apps that take in the appropriate segment
            and spit out a return value. Implementation isn’t that
            much of an issue, though; possibility is more relevant
            than methodology.
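
As a rough illustration of the directory-structure idea, the sketch
below treats every file in a directory as the handler for one clue
context, writes the clue to the handler's stdin, and reads one
candidate expansion per line from its stdout. The layout, the
one-line-per-expansion convention, and the names list_contexts and
cluehunt are all assumptions made for this example, not a settled
design.

```python
import subprocess
from pathlib import Path


def list_contexts(root):
    """Each file directly under root is treated as one clue context handler."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_file())


def cluehunt(root, context, clue):
    """Run the handler for a context and return its suggested expansions.

    Assumed protocol: the clue goes in on stdin, and every non-empty
    line the handler prints on stdout is one candidate expansion.
    """
    handler = Path(root) / context
    if not handler.is_file():
        raise FileNotFoundError(f"no handler for context {context!r}")
    result = subprocess.run(
        [str(handler)],
        input=clue + "\n",
        capture_output=True,
        text=True,
        check=True,  # a failing handler raises CalledProcessError
    )
    return [line for line in result.stdout.splitlines() if line]
```

Under this layout, adding a new context means dropping one more
executable into the directory, and handlers can be written in any
language that can read stdin and write stdout.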

Categories: Imagery
