Janssen, C., Gray, W. D., & Schoelles, M. J. (2008). How a modeler’s conception of rewards influences a model’s behavior: Investigating ACT-R 6’s utility learning mechanism. Paper presented at the 13th Annual ACT-R Workshop.
How a modeler’s conception of rewards influences a model’s behavior: Investigating ACT-R 6’s utility learning mechanism
Temporal difference learning has recently been introduced as the new utility learning mechanism in ACT-R 6 (e.g., Fu & Anderson, 2004). Common practices for using it have yet to emerge. In this study we take a first step toward such practices by investigating two critical aspects of utility learning: the location and size of rewards. As a case study we use the Blocks World task (Gray et al., 2006). In this task subjects have to copy a pattern of eight blocks, depicted in a target window, by moving blocks from a resource window to a workspace window. Information in each of the windows is covered by a gray rectangle and only becomes available when subjects move the mouse cursor into the window area. In addition, the information in the target window only becomes available after a lockout time of 0, 400, or 3200 milliseconds (manipulated between subjects). As the lockout time increases, subjects tend to study and place more blocks per visit to the target window. Previous attempts to model the task in ACT-R 5 did not fit the human data well. Analysis indicated that this might be because ACT-R 5’s expected value equation can only handle binary feedback (Gray, Schoelles, & Sims, 2005). As ACT-R 6’s utility learning mechanism is not limited to binary feedback, its use seems more promising.
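For readers unfamiliar with the mechanism the abstract refers to: ACT-R 6 updates a production's utility with a temporal-difference style rule, U(n) = U(n-1) + alpha * [R(n) - U(n-1)]. The sketch below illustrates why graded (non-binary) rewards matter for this rule; the learning rate and reward values are purely illustrative assumptions, not figures from the paper.

```python
# Minimal sketch of a TD-style utility update as used in ACT-R 6:
#   U(n) = U(n-1) + alpha * (R(n) - U(n-1))
# Parameter values below are hypothetical, chosen only for illustration.

def update_utility(utility, reward, alpha=0.2):
    """One utility-learning step; alpha is the learning-rate parameter."""
    return utility + alpha * (reward - utility)

# Unlike ACT-R 5's binary success/failure feedback, the reward R(n) here
# can be any real number, e.g. graded by the time cost of a strategy.
utility = 0.0
for reward in [10.0, 8.5, 9.2, 7.8]:  # hypothetical graded rewards
    utility = update_utility(utility, reward)
```

With repeated rewards, the utility converges toward the (recency-weighted) average reward, so strategies that earn larger or earlier rewards come to dominate conflict resolution.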
Please note that the copyright of this article is owned by the author.