Ehret, Gray, & Kirschenbaum, 2000

Ehret, B. D., Gray, W. D., & Kirschenbaum, S. S. (2000). Scaled worlds and computational cognitive models as tools to increase the usefulness of cognitive task analysis.

Task analysis is not an end in itself but a tool to serve a further purpose. To be useful, the tool must be usable. We argue that building computational cognitive models increases the usability of task analysis. To be useful, the tool must also extract the features of the task that are key to the purposes of the analysis; i.e., the analysis should seek to identify these key features rather than enumerating all features of the task environment. Achieving this end may entail building a series of manageably complex systems, or scaled worlds, within which to study the task of interest. We discuss the use of computational cognitive modeling and scaled worlds for task analysis in the context of Project Nemo, an effort to inform the design of a command workstation for the next generation of nuclear attack submarines.


Altmann & Gray, 1998

Altmann, E. M., & Gray, W. D. (1998). Pervasive episodic memory: Evidence from a control-of-attention paradigm. In M. A. Gernsbacher & S. J. Derry (Eds.), Twentieth Annual Conference of the Cognitive Science Society (pp. 42-47). Hillsdale, NJ: Erlbaum.

Events appear to be represented distinctly in memory in large numbers at a fine grain, even in tasks in which memory retention is not a primary performance measure. In Experiment 1, participants classified character strings in sequences governed by randomly-alternating instructions. Response times were fastest near the start of a sequence, slowed gradually throughout the sequence, then sped up again near the start of the next sequence. This speedup and gradual slowdown were modeled in the ACT-R architecture as a combination of priming and interference effects in episodic memory. The model correctly predicts the absence of these effects in Experiment 2, in which the instruction must be inferred from the trial stimulus and hence is not a source of priming. These findings suggest (a) that episodic encoding is a pervasive side effect of cognitive performance; (b) that elements of episodic memory interact through priming and interference, effects traditionally associated with semantic memory; and (c) that brief interruptions of task performance have more complex effects than previously documented.


Gray & Salzman, 1998

Gray, W. D., & Salzman, M. C. (1998). Damaged merchandise? A review of experiments that compare usability evaluation methods. Human-Computer Interaction, 13(3), 203-261.

An interest in the design of interfaces has been a core topic for researchers and practitioners in the field of human-computer interaction (HCI); an interest in the design of experiments has not. To the extent that reliable and valid guidance for the former depends upon the results of the latter, it is necessary that researchers and practitioners understand how small features of an experimental design can cast large shadows over the results and conclusions that can be drawn. In this review we examine the design of five experiments that compared usability evaluation methods (UEMs). Each has had an important influence on HCI thought and practice. Unfortunately, our examination shows that small problems in the way these experiments were designed and conducted call into serious question what we thought we knew regarding the efficacy of various UEMs. If the influence of these experiments were trivial, then such small problems could be safely ignored. Unfortunately, the outcomes of these experiments have been used to justify advice to practitioners regarding their choice of UEMs. Making such choices based upon misleading or erroneous claims can be detrimental--compromising the quality and integrity of the evaluation, incurring unnecessary costs, or undermining the practitioner's credibility within the design team. The experimental method is a potent vehicle that can help inform the choice of a UEM as well as help to address other HCI issues. However, to obtain the desired outcomes, close attention must be paid to experimental design.


Gray & Boehm-Davis, 1997

Gray, W. D., & Boehm-Davis, D. A. (1997, Sept). Cognitive analysis of dynamic performance: Cognitive process analysis and modeling. Workshop presented at the Human Factors and Ergonomics Society, Albuquerque, NM.

Many cognitive task analysis techniques provide static descriptions of the declarative knowledge possessed by domain experts. However, skilled performance is dynamic, not static. This workshop will describe a family of cognitive process techniques which allow the analyst to go beyond static descriptions of declarative knowledge to develop analytic models that capture the interactions between human cognition, the design of objects, and task performance. Specifically, the workshop will combine lecture with a hands-on approach to provide an overview of the GOMS family of cognitive process analysis techniques and the ways in which GOMS can be used to represent activities that occur in parallel. During this workshop, participants will be provided hands-on experience at developing some types of GOMS models and in using other types of models to quickly see and evaluate design alternatives. Participants are expected to be GOMS novices but to be experienced at (or, at least, exposed to) more traditional approaches to cognitive task analysis.


Gray, Young, & Kirschenbaum, 1997

Gray, W. D., Young, R. M., & Kirschenbaum, S. S. (1997). Introduction to this Special Issue on Cognitive Architectures and Human-Computer Interaction. Human-Computer Interaction, 12(4), 301-309.

This special issue has been assembled by editors and contributors who believe that cognitive architectures provide the most important new contribution to a theoretical basis for HCI (human-computer interaction) since the publication of The Psychology of Human-Computer Interaction (Card, Moran & Newell, 1983). In this introduction we provide a brief overview of what cognitive architectures are and why we find them exciting. We then introduce the four architectures represented by papers in this special issue.


Gray, Kirschenbaum, & Ehret, 1997

Gray, W. D., Kirschenbaum, S. S., & Ehret, B. D. (1997). The précis of Project Nemo, phase 1: Subgoaling and subschemas for submariners. In Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society. Hillsdale, NJ: Erlbaum.

Project Nemo examines the cognitive processes and representational structures used by submarine Commanders while attempting to locate an enemy submarine hiding in deep water. This report provides a précis of the first phase of this effort. Protocol data, collected from commanders with 20 years of submarine experience, have been transcribed and analyzed. The data suggest a shallow goal structure with a basic level of subgoals that are used by all Commanders throughout the task. Relatively few operators are required for each subgoal. The results are congruent with a schema theory interpretation in which the process of schema instantiation provides the control of cognition.


Kirschenbaum, Gray, Ehret, & Miller, 1996

Kirschenbaum, S. S., Gray, W. D., Ehret, B. D., & Miller, S. L. (1996). When using the tool interferes with doing the task. In M. J. Tauber (Ed.), Conference companion of the ACM CHI'96 Conference on Human Factors in Computing Systems (pp. 203-204). New York: ACM Press.

How much time the user spends working on a task versus fiddling with the tool is an important aspect of usability. The concept of the ratio and distribution of tool-only operations to total operations is proposed to capture this aspect.
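
As a rough illustration of the proposed measure, the sketch below computes the ratio of tool-only operations to total operations, plus a per-window ratio as one simple way to look at its distribution over a session. The log format, the "task"/"tool" labels, and the function names are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def tool_only_ratio(operations):
    """Return the fraction of operations that serve only the tool.

    `operations` is a hypothetical log in which each entry is labeled
    either "task" or "tool" (an assumed coding scheme).
    """
    if not operations:
        raise ValueError("operation log is empty")
    counts = Counter(operations)
    return counts["tool"] / len(operations)


def windowed_ratios(operations, window):
    """Return one ratio per consecutive block of `window` operations."""
    return [
        tool_only_ratio(operations[i:i + window])
        for i in range(0, len(operations), window)
    ]


# Hypothetical eight-operation session.
log = ["task", "tool", "tool", "task", "task", "task", "tool", "task"]
print(tool_only_ratio(log))     # 3 of the 8 operations are tool-only
print(windowed_ratios(log, 4))  # ratio for each block of four operations
```

A single overall ratio can hide bursts of tool fiddling, which is why the windowed view is included alongside it.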


Gray, 1995

Gray, W. D. (1995). VCR-as-paradigm: A study and taxonomy of errors in an interactive task. In K. Nordby, P. Helmersen, D. J. Gilmore, & S. A. Arnesen (Eds.), Human-Computer Interaction--Interact'95, (pp. 265-270). New York: Chapman & Hall.

The error-prone task of programming a VCR is representative of a growing number of end-user programmable devices. To help understand this task, a display-based model of VCR programming was developed and implemented as a computational cognitive model. The model accurately predicts the vast majority of correct and error recovery keypresses collected from 9 subjects during 56 successfully programmed shows. The model supports two taxonomies for classifying errors: one at the keypress level, the other at the goal and subgoal level. The taxonomies have descriptive utility and are being used to understand the non-random nature, discovery, and recovery of errors in programmable devices.


Gray, John, Stuart, Lawrence, & Atwood, 1995

Gray, W. D., John, B. E., Stuart, R., Lawrence, D., & Atwood, M. E. (1995). GOMS meets the phone company: Analytic modeling applied to real-world problems. In R. M. Baecker, J. Grudin, W. A. S. Buxton, & S. Greenberg (Eds.), Readings in human-computer interaction: Toward the year 2000, (Second ed., pp. 634-639). San Francisco: Morgan Kaufmann Publishers, Inc.

GOMS analyses were used to interpret some perplexing data from a field evaluation of two telephone operator workstations. The new workstation is ergonomically superior to the old and is preferred by all who have used it. Despite these advantages telephone operators who use the new workstation are not faster than those who use the old but are, in fact, significantly slower. This bewildering result makes sense when seen with the aid of GOMS. With GOMS we can see that very few of the eliminated key-strokes or ergonomic advantages affect tasks that determine the operator's work time. Indeed, GOMS shows that some presumed procedural improvements have the contrary effect of increasing the time an operator spends handling a phone call. We concluded that if GOMS had been done early on, then the task, not the workstation, would have been redesigned.


Gray, John, & Atwood, 1993

Gray, W. D., John, B. E., & Atwood, M. E. (1993). Project Ernestine: Validating a GOMS analysis for predicting and explaining real-world performance. Human-Computer Interaction, 8(3), 237-309.

Project Ernestine served a pragmatic as well as a scientific goal: to compare the worktimes of telephone company toll and assistance operators on two different workstations, and to validate a GOMS analysis for predicting and explaining real-world performance. Contrary to expectations, GOMS predicted, and the data confirmed, that performance with the proposed workstation was slower than with the current one. Pragmatically, this increase in performance time translates into a cost of almost $2 million a year to NYNEX. Scientifically, the GOMS models predicted performance with exceptional accuracy.

The empirical data provided us with three interesting results: proof that the new workstation was slower than the old, evidence that this difference was not constant but varied with call category, and (in a trial that spanned four months and collected data on 72,450 phone calls) proof that performance on the new workstation stabilized after the first month. The GOMS models predicted the first two results and explained all three.

In this paper, we discuss the process and results of model building as well as the design and outcome of the field trial. We assess the accuracy of GOMS predictions and use the mechanisms of the models to explain the empirical results. Lastly, we demonstrate how the GOMS models can be used to guide the design of a new workstation and evaluate design decisions before they are implemented.


Gray & Atwood, 1992

Gray, W. D., & Atwood, M. E. (1992). Transfer, Adaptation, & Use of Intelligent Tutoring Technology: The Case of Grace. In M. Farr & J. Psotka (Eds.), Intelligent instruction by computer: Theory and practice, (pp. 179-203). New York: Taylor and Francis.

The Grace Tutor, an intelligent tutoring system (ITS) for teaching COBOL, is part of the ACT* (Anderson, 1983; 1987a) family of tutors. The Grace Tutor and the student interact in a mixed-initiative dialogue. The tutor's side of the dialogue is controlled by four components: a cognitive model (or simulation) of the ideal student, an "overlay" model of what the student does and does not know (knowledge tracing), a curriculum specification, and an interface component. For the Grace Tutor the ACT* tutor technology was transferred from the university research laboratory to an independent, corporate development laboratory. In this chapter we discuss our first year of work on the Grace Tutor as a case study in how an ITS architecture, developed at a university as a research project, was transferred, adapted, and used by a corporation. A second theme that we interweave with the first is that of ITS as CHI, or ITS as a good domain in which to study and explore issues in computer-human interaction.


Gray, John, & Atwood, 1992

Gray, W. D., John, B. E., & Atwood, M. E. (1992). The précis of Project Ernestine or an overview of a validation of GOMS. In P. Bauersfeld, J. Bennett, & G. Lynch (Eds.), CHI'92 Conference on Human Factors in Computing Systems, (pp. 307-312). New York: ACM Press.

Project Ernestine served a pragmatic as well as a scientific goal: to compare the worktimes of telephone company toll and assistance operators on two different workstations, and to test the validity of GOMS models for predicting and explaining real-world performance. Contrary to expectations, GOMS predicted, and the data confirmed, that performance with the proposed workstation was slower than with the current one. Pragmatically, this increase in performance time translates into a cost of $2.4 million a year to NYNEX. Scientifically, the GOMS models predicted performance with exceptional accuracy.


Gray, Corbett, & VanLehn, 1988

Gray, W. D., Corbett, A. T., & VanLehn, K. (1988). Planning and Implementation Errors in Algorithm Design. In Proceedings of the Eleventh Annual Conference of the Cognitive Science Society. Montreal, Canada.

Fifty-nine programmers were asked to code a Lisp function that performs a depth-first search of a hierarchy. Twenty-three of these programmers performed a paper-and-pencil simulation of the algorithm prior to writing the code. An analysis of the 59 solutions revealed errors in both planning and implementation that occur at difficult points in the algorithm design process. The programmers who simulated the function produced fewer major planning errors than those who did not. We conclude that it is possible to discern categories of errors that reflect not just implementation failures, but failures in predictable steps in the algorithm design process.
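
For readers unfamiliar with the kind of task involved, here is a minimal sketch of a recursive depth-first search over a hierarchy. It is written in Python rather than Lisp, and the tree representation, the function name, and the sample data are assumptions made for illustration; it is not the function the participants were asked to write.

```python
def depth_first_search(node, target):
    """Return the path from `node` down to `target`, or None if absent.

    Each node is a (name, children) pair, an assumed representation.
    """
    name, children = node
    if name == target:
        return [name]
    # Explore each subtree fully before moving on to the next sibling.
    for child in children:
        path = depth_first_search(child, target)
        if path is not None:
            return [name] + path
    return None


# Hypothetical three-level hierarchy.
hierarchy = ("animal", [
    ("mammal", [("dog", []), ("cat", [])]),
    ("bird", [("owl", [])]),
])

print(depth_first_search(hierarchy, "owl"))   # path through "bird"
print(depth_first_search(hierarchy, "fish"))  # no such node in the hierarchy
```

The two places where such a function tends to need care are the base case (target found) and the handling of a subtree that fails, which must return control to the next sibling rather than end the search.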


Gray & Anderson, 1987

Gray, W. D., & Anderson, J. R. (1987). Change-Episodes in Coding: When and How Do Programmers Change Their Code? In G. M. Olson, S. Sheppard, & E. Soloway (Eds.), Empirical Studies of Programmers: Second Workshop, (pp. 185-197). Norwood, NJ: Ablex.

Any change in a programmer's code or intentions while coding constitutes a change-episode. Change-episodes include error detection and correction (including false positives) as well as stylistic and tactical changes. In this study we examine change-episodes to determine what they can add to the study of the cognition of programming. We argue that change-episodes occur most often for constructs that allow the most variability (with variability defined by the language, the task, and the programmer's history). We predict and find that those constructs that are involved in the most change-episodes are those for which much planning is needed during coding. Similarly, we discuss two ways in which a goal can be changed in a change-episode. One involves relatively minor editing of a goal's subgoals, suggesting that much planning is local to the current goal. The other entails major transformations in the goal's structure. Finally, we find that change-episodes are initiated in one of three very distinct circumstances: as an interrupt to coding, a tag-along to another change-episode, or a byproduct of symbolic execution. Our findings support the distinction between inherent and planning subgoals (2,3) and the distinction between progressive and evaluative problem-solving activities (6).