Inferring intent in eye-based interfaces: Tracing eye movements with process models

D. D. Salvucci. Inferring intent in eye-based interfaces: Tracing eye movements with process models. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 254–261, Pittsburgh, Pennsylvania, USA, May 15-20 1999. Association for Computing Machinery. [pdf]

———–

While current eye-based interfaces offer enormous potential for efficient human-computer interaction, they also manifest the difficulty of inferring intent from user eye movements.  This paper describes how fixation tracing facilitates the interpretation of eye movements and improves the flexibility and usability of eye-based interfaces.  Fixation tracing uses hidden Markov models to map user actions to the sequential predictions of a cognitive process model.  In a study of eye typing, results show that fixation tracing generates significantly more accurate interpretations than simpler methods and allows for more flexibility in designing usable interfaces.  Implications for future research in eye-based interfaces and multimodal interfaces are discussed.

The main argument of this paper is that fixation tracing facilitates the mapping of eye movements to the user intentions that produced them. Tracing is the process of inferring intent by mapping observed actions to the sequential predictions of a process model. Fixation tracing interprets protocols by means of hidden Markov models, probabilistic models that have been used extensively in handwriting recognition.
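To make the idea concrete, here is a minimal sketch of the kind of decoding fixation tracing relies on: an HMM whose hidden states are the targets a process model predicts the user intends to fixate, decoded with the Viterbi algorithm. The eye-typing targets and all probabilities below are invented for illustration and are not taken from the paper.

```python
# Sketch of HMM-based fixation tracing (illustrative, not Salvucci's actual code).
# Hidden states are the intended targets predicted by the process model;
# observations are the regions where fixations actually land. Viterbi decoding
# recovers the most likely intended sequence despite noisy or off-target fixations.
import numpy as np

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observed fixation regions."""
    V = np.zeros((len(obs), len(states)))                 # log-probabilities
    back = np.zeros((len(obs), len(states)), dtype=int)   # backpointers
    V[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, len(obs)):
        for s in range(len(states)):
            scores = V[t - 1] + np.log(trans_p[:, s]) + np.log(emit_p[s, obs[t]])
            back[t, s] = np.argmax(scores)
            V[t, s] = np.max(scores)
    path = [int(np.argmax(V[-1]))]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, int(back[t, path[0]]))
    return [states[i] for i in path]

# Hypothetical eye-typing example with three letter keys as intended targets.
states = ["key_A", "key_B", "key_C"]
start_p = np.array([0.34, 0.33, 0.33])
trans_p = np.array([[0.6, 0.2, 0.2],     # the process model's predicted
                    [0.2, 0.6, 0.2],     # target-to-target transitions
                    [0.2, 0.2, 0.6]])
emit_p = np.array([[0.8, 0.1, 0.1],      # fixations usually land on the
                   [0.1, 0.8, 0.1],      # intended key, sometimes nearby
                   [0.1, 0.1, 0.8]])
fixated_regions = [0, 0, 1, 2, 2]        # indices of regions actually fixated
print(viterbi(fixated_regions, states, start_p, trans_p, emit_p))
```

The decoded state sequence is the interpretation: the sequence of intended targets that best explains the observed, noisy fixation protocol.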

Fixation tracing, according to the author, can interpret eye-movement protocols as accurately as human experts and can help in the creation, evaluation and refinement of cognitive models.

The author concludes by saying that greater potential arises in the integration of eye movements with other input modalities, as for instance an interface in which eye movements provide pointer or cursor positioning while speech allows typing or directed commands.


Eye gaze patterns in conversations: There is more to conversational agents than meets the eyes

R. Vertegaal, R. Slagter, G. van der Veer, and A. Nijholt. Eye gaze patterns in conversations: There is more to conversational agents than meets the eyes. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 301–308, Seattle, WA, USA, March 31-April 4 2001. Association for Computing Machinery. [pdf]

————

In multi-agent, multi-user environments, users as well as agents should have a means of establishing who is talking to whom. In this paper, we present an experiment aimed at evaluating whether gaze directional cues of users could be used for this purpose. Using an eye tracker, we measured subject gaze at the faces of conversational partners during four-person conversations. Results indicate that when someone is listening or speaking to individuals, there is indeed a high probability that the person looked at is the person listened (p=88%) or spoken to (p=77%). We conclude that gaze is an excellent predictor of conversational attention in multiparty conversations. As such, it may form a reliable source of input for conversational systems that need to establish whom the user is speaking or listening to. We implemented our findings in FRED, a multi-agent conversational system that uses eye input to gauge which agent the user is listening or speaking to.

This paper contains interesting references to eye-tracking studies showing that gaze is used as a communication mechanism to: (1) give/obtain visual feedback; (2) communicate conversational attention; (3) regulate arousal.

Two other interesting ideas that I found in the paper:

1- When preparing their utterances, speakers need to look away to avoid being distracted by visual input (such as prolonged eye contact with a listener) [Argyle and Cook, 1976].

2- If a pair of subjects was asked to plan a European holiday and there was a map of Europe in between them, the amount of gaze dropped from 77 percent to 6.4 percent [Argyle and Graham, 1977].


When life begins

I really enjoyed this discussion as I think we really need to talk about these issues. I personally consider myself a pro-life person, although I do not really like labels. Labels are made by politicians; they always represent extreme cases and opinions and help make statistics …

I subscribe to what many commenters said about what should be considered the central issue, that is, when life begins! Depending on this, we might be talking about an object (a group of cells) or a human being.

I personally think that life begins when the egg is fertilized and from that point on we can consider that group of cells as a human being. Then everything else follows, like: who has the right to decide on that life?

I also agree with other opinions expressed in the comments, like that of the ‘unwanted child’, for which I think there are viable solutions (like giving the child to a couple that cannot have children).

Also, I think that things are not always black and white as depicted in the original post. Although I am pro-life, I also think that sex is a good thing and should be explored freely, with “the courtesy” of not bringing someone else’s life into play. I am in favor of contraceptive methods, like condoms or the pill, that prevent the union of egg and sperm.

As we do not want others to play with our own lives, we should not play with others’ lives. Conception should be a responsible choice. People should not be forced to be parents if they do not want to, but in the same way, a fetus should not be killed because somebody said: “Oops!”.

Then there is the whole corollary of extreme cases, like “the mother was risking her life to give birth to a sick child”, which I think are just rhetorical cases used by politicians to justify their point. If we really want to look at statistics, then we should consider the fact that the majority of abortions are of healthy fetuses that are simply not wanted.

My two cents.


CHI conference report, day 2, 3

On Tuesday, I attended a session on mobile interaction techniques. The first paper was presented by Will Seager and was titled “Comparing Physical, Automatic, and Manual Map Rotation for Pedestrian Navigation”. The authors were trying to answer the question: what is the best way to rotate digital maps on mobile devices? They compared three conditions: one in which participants were given a paper-based map, a second in which they were given a mobile device but asked to physically rotate it to adapt to the different orientations, and a third using automatic rotation of the map. Counter-intuitively, people had a hard time recognizing the map with automatic rotation. A good compromise seemed to be a combination of manual and automatic techniques.

Enrico presented a talk on his master’s thesis on liminal devices: “Intimate Interfaces in Action: Assessing the Usability and Subtlety of EMG-Based Motionless Gestures”. It describes a new way of communicating with mobile devices in which minimal information is exchanged. His device used EMG (electromyography) to sense the electrical signals that are sent to the muscles to activate movements; these signals can then be used to act on a device such as a mobile phone.

On Wednesday, I attended a session on Tags, Tagging and Notetaking. One of the most interesting papers was presented by Morgan G. Ames: “Why We Tag: Motivations for Annotation in Mobile and Online Media”. The author conducted a user study of ZoneTag, a mobile application that makes it easy to take pictures with the phone, tag them, and post them on Flickr. They interviewed a number of users of the application and built a taxonomy of the reasons why people add tags to pictures. I had the impression that the same taxonomy also applies to other tagging situations, like geolocalized messaging. The short answer was that tagging is a social activity and the reasons for doing it are multifaceted. Another interesting question to ask would be: “why do people not tag?”.

Aaron Bauer presented a paper on “Selection-Based Note-Taking Applications”. They conducted a controlled experiment to answer the question: “Why/How should I take notes?”. They found that we should design better interventions and that user tests should include attitudes. Finally, they concluded that we should reduce wordy notes, as these do not help recall.

Finally, I attended two short talks. The first was presented by Kaj Makela and was titled “Mobile Interaction with Visual and RFID Tags – A Field Study on User Perception”; it was a nice report on how RFID and visual tags are used in an everyday context. The second, on how people use tag clouds, was presented by A. W. Rivadeneira and was titled “Getting Our Head in the Clouds: Toward Evaluation Studies of Tagclouds”. The authors found that the location of words and their size had an effect on memory recall; they called this the “quadrant” effect.

In the afternoon I attended the session on designing for specific cultures. The first paper was presented by K. Boehner and it was titled: “How HCI interprets the probes“. The authors performed an analysis of the use of cultural probes and allied methods in HCI design practices. The paper provides an alternative account of the relationship between data gathering and knowledge production in HCI.

The session on learning and education was also very inspiring. I attended a short talk on the use of improvisation for design: “Improvisation Principles and Techniques for Design“, presented by E. Gerber. The authors explored the application of the principles and techniques of improvisation to the practice of design, demonstrating potential successful outcomes at the individual and group level in design. Mistakes should be celebrated.

One of the most interesting papers of the day was presented by Pamela J. Ludford on “Capturing, Sharing, and Using Local Place Information”. The authors presented the benefits of a shared local place information application, reporting on two user studies. The starting assumption was that we find places by following social/popular places. They describe the use of an application called PlaceMail, with which people could set location-based reminders for their personal use. They then asked their participants for consent to share these annotations with the general public. The data collected during this trial was used to show social trails in an application called ShareScape, and with this second experiment they detail privacy preferences in this domain. This work was a follow-up to a paper presented at CHI2006.

Finally, Carl DiSalvo presented a paper titled “MapMover: A Case Study of Design-Oriented Research into Collective Expression and Constructed Publics”. The paper reports a failure in building a location-based service called MapMover that supported the authoring of audio recordings attached to specific points on a map. For such services to work a public is required, and this public should be understood as constructed from constraints.


Visual information as a conversational resource in collaborative physical tasks

R. E. Kraut, S. R. Fussell, and J. Siegel. Visual information as a conversational resource in collaborative physical tasks. Human-Computer Interaction, 18:13–49, 2003. [pdf]

—————–

Visual information plays two interrelated roles in collaborative work: first, it helps people maintain up-to-date mental models or situational awareness (Endsley, 1995) of the state of the task and others’ activities; second, it helps people communicate about the task by aiding conversational grounding (Clark, 1981). The authors’ assumption is that the usefulness of a video system for remote collaborative work depends on the extent to which the video configuration makes the same visual cues available to collaborators that they use when performing the task co-located.

Communication media limit the visual information that can be shared, with resulting effects on the collaboration process and performance. To test the hypotheses raised by this intuition, the authors conducted two experiments using a bike-repair task in which an expert guided a novice repairing a bike under various communication settings. The authors varied the presence of a visual channel and used face-to-face interaction as a control condition.

Physical tasks can be performed most efficiently when a helper is physically co-present. Having a remote helper leads to better performance than working alone, but it is not as effective as having a helper working by one’s side. The visual information was valuable for keeping the helper aware of the changing state of the task.

Communication was more efficient in the side-by-side condition, where the helper spent more time telling the worker what to do. In the mediated condition, not only were the dialogues longer, but their focus shifted: more speaking turns were devoted to acknowledging the partner’s messages.

One of the limits of this study is that they used a head-mounted video camera to show the helper the worker’s focus of attention. This might not convey the right information, as the worker’s camera view might still contain too much information to be used effectively.

The authors conclude with four implications for design that I report below:

(1) Provide people with a wide field of view, including both task objects and the wider environment, so that they can more easily maintain task awareness and ground conversations;

(2) Clarify what is part of the shared visual space. All parties to the task should have a clear understanding of what one another can see; that is, the contents of the shared visual space should be part of participants’ mutual knowledge or common ground;

(3) Provide mechanisms to allow people to track one another’s focus of attention. When people can see where each person is looking, it is easier to establish common ground;

(4) Provide support for gesture within the shared visual space. Talking about things is most efficient when people can use a combination of deictic expressions and gestures to refer to task objects.


Using eye movements to determine referents in a spoken dialogue system

E. Campana, J. Baldridge, J. Dowding, B. A. Hockey, R. W. Remington, and L. S. Stone. Using eye movements to determine referents in a spoken dialogue system. In Electronic Proceedings of the Workshop on Perceptive User Interfaces (PUI’01), Orlando, FL, USA, November 15-16 2001. [pdf]

——–

Most computational spoken dialogue systems take a literary approach to reference resolution.  With this type of approach, entities that are mentioned by a human interactor are unified with elements in the world state based on the same principles that guide the process during text interpretation.  In human-to-human interaction, however, referring is a much more collaborative process.  Participants often under-specify their referents, relying on their discourse partners for feedback if more information is needed to uniquely identify a particular referent. By monitoring eye movements during this interaction, it is possible to improve the performance of a spoken dialogue system on referring expressions that are underspecified according to the literary model. This paper describes a system currently under development that employs such a strategy.
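As an illustration of the general strategy (not the actual system described in the paper), an underspecified referring expression can be disambiguated by checking which candidate object the user fixated just before speaking. The Fixation class, the time window, and the object names below are hypothetical.

```python
# Illustrative sketch: resolve an ambiguous referring expression by picking the
# candidate referent the user fixated longest in a window before the utterance.
from dataclasses import dataclass

@dataclass
class Fixation:
    target: str       # id of the on-screen object fixated
    onset_ms: int     # fixation start time
    duration_ms: int

def resolve_referent(candidates, fixations, utterance_onset_ms, window_ms=1500):
    """Pick the candidate fixated longest in the window before speech onset."""
    gaze_time = {c: 0 for c in candidates}
    for f in fixations:
        in_window = utterance_onset_ms - window_ms <= f.onset_ms < utterance_onset_ms
        if in_window and f.target in gaze_time:
            gaze_time[f.target] += f.duration_ms
    best = max(gaze_time, key=gaze_time.get)
    return best if gaze_time[best] > 0 else None  # None: fall back to the literary model

# Usage: "click the button" is ambiguous between two buttons on screen.
fixations = [Fixation("button_ok", 900, 300), Fixation("button_cancel", 1300, 150)]
print(resolve_referent(["button_ok", "button_cancel"], fixations, 1600))
```

In this toy example the ambiguous phrase resolves to button_ok, the candidate fixated longest in the window preceding the utterance.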


A rating scheme for assessing collaboration quality of computer-supported collaboration processes

A. Meier, H. Spada, and N. Rummel. A rating scheme for assessing collaboration quality of computer-supported collaboration processes. International Journal of Computer Supported Collaborative Learning, 2(1):63–86, March 2007. [pdf]

———–

This paper defines a nine-dimension rating scheme that can be used to assess the quality of computer-supported collaboration processes:

1. Sustaining mutual understanding. Speakers make their contributions understandable for their collaboration partner, e.g., by avoiding or explaining technical terms from their domain of expertise or by paraphrasing longer passages of text from their materials, rather than reading them aloud to their partner.

2. Dialogue management. A smooth “flow” of communication is maintained in which little time is lost due to overlaps in speech or confusion about whose turn it is to talk. Turn-taking is often facilitated by means of questions (“What do you think?”) or explicit handovers (“You go!”). Speakers avoid redundant phrases and fillers (“um… um,” “or….or”) at the end of a turn, thus signalling they are done and the partner may now speak.

3. Information pooling. Partners try to gather as many solution-relevant pieces of information as possible. New information is introduced in an elaborated way, for example by relating it to facts that have already been established, or by pointing out its relevance for the solution. The aspects that are important from the perspective of their own domain are taken into account and they take on the task of clarifying any information needs that relate to their domain of expertise.

4. Reaching consensus. Decisions for alternatives on the way to a final solution (i.e., parts of the diagnosis) stand at the end of a critical discussion in which partners have collected and evaluated arguments for and against the available options. If partners initially prefer different options, they exchange arguments until a consensus is reached that can be grounded in facts.

5. Task division. The task is divided into subtasks. Partners proceed with their task systematically, taking on one step toward the solution after the other with a clear goal or question guiding each work phase. Individual as well as joint phases of work are established, either in a plan that is set up at the beginning, or in short-term arrangements that partners agree upon as they go. Partners define and take on individual subtasks that match their expertise and their resources. The work is divided equally so none of the collaborators has to waste time waiting for his or her partner to finish a subtask.

6. Time management. Partners monitor the remaining time throughout their cooperation and make sure to finish the current subtask or topic with enough time to complete the remaining subtasks. They check, for example, whether the current topic of discussion is important enough to spend more time on, and remind one another of the time remaining for the current subtask or the overall collaboration.

7. Technical coordination. Partners master the basic technical skills that allow them to use the technical tools to their advantage (for example, they know how to switch between applications, or how to “copy and paste”). Collaborators further arrange who may write into the shared editor at which time. At least one partner makes use of his or her individual text editor, thus allowing for phases of parallel writing.

8. Reciprocal interaction. Partners treat each other with respect and encourage one another to contribute their opinions and perspectives. Critical remarks are constructive and factual, never personal; i.e., they are formulated as contributions toward the problem solution. Partners interact as equals, and decisions  are made cooperatively.

9. Individual task orientation. Each participant actively engages in finding a good solution to the problem, thus bringing his or her knowledge and skills to bear. He or she focuses attention on the task and on task relevant information, avoids distractions, and strives to mobilize his or her own as well as the partner ’s skills and resources.


Analyzing and Predicting Focus of Attention In Remote Collaborative Tasks

J. Ou, L. M. Oh, S. R. Fussell, T. Blum, and J. Yang. Analyzing and predicting focus of attention in remote collaborative tasks. In Proceedings of the 7th International Conference on Multimodal Interfaces (ICMI ’05), pages 116–123, Trento, Italy, October 4-6 2005. ACM Press. [pdf]

————–

The core argument of this work is that face-to-face collaboration leads to better performance than remote collaboration. One of the reasons that might explain this difference, according to the authors, is that there are fewer visual cues that help the peers situate their conversation. As bandwidth is limited, a solution might reside in automatically controlling the participants’ point of view using gaze information on both sides.

Previous work reported in the paper shows that people look at intended objects before making conversational references to these objects. However, a major problem of gaze-based interfaces is the difficulty in interpreting eye-movement patterns due to unconscious eye movements such as saccades and to gaze tracking failures.

To assess this idea they divided the workspace into three areas: the pieces bay, in which pieces are stored; the workspace, in which the worker has to construct the puzzle; and the target solution, which showed how the puzzle should be constructed. The authors performed a statistical analysis of the eye movements in the collaborative phase and, based on this, constructed a Markov model that successfully predicted over half of the gaze sequences across conditions.
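A minimal sketch of such a first-order Markov model over the three areas is shown below; the gaze-area sequence is invented for illustration, and the model is fitted by simple transition counts rather than the authors’ actual procedure.

```python
# Sketch of a first-order Markov model over gaze areas (illustrative only).
from collections import defaultdict

AREAS = ["pieces_bay", "workspace", "target_solution"]

def fit_transitions(sequences):
    """Estimate P(next area | current area) from observed gaze-area sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {cur: {nxt: n / sum(nxts.values()) for nxt, n in nxts.items()}
            for cur, nxts in counts.items()}

def predict_next(model, current_area):
    """Return the most probable next area of attention."""
    return max(model[current_area], key=model[current_area].get)

# Hypothetical gaze-area sequence from one trial.
trial = ["pieces_bay", "workspace", "target_solution", "workspace",
         "pieces_bay", "workspace", "workspace", "target_solution"]
model = fit_transitions([trial])
print(predict_next(model, "workspace"))
```

Given the current area of attention, the fitted model predicts the area the collaborator is most likely to look at next, which is the kind of prediction that could drive the automatic view control discussed above.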


CHI conference report, day 1

CHI2007 was a great experience: the conference gathered 2622 participants from 42 countries. It was my first CHI!

I started on Sunday with a course on HCI History: Trajectories into the Future, given by Jonathan Grudin. The presenter showed a graph relating the size of computers to the technological changes that caused their miniaturization, and on the same scale he presented the changing interest in computer-human factors. One of the innovations that drove the shift of focus in the HCI community in the 70s was the attention to non-discretionary use of computers. One interesting thing to notice was that at CHI’83 (the first CHI conference), psychologists were never cited. The current player in HCI is mobile applications. The next player in HCI will be marketing: the goal of a website, for a usability expert, is to bring the user to his goal quickly; for the marketing designer, it is to expose the user to the widest range of possibilities. [here are my running notes taken during the course]

On Monday I attended a talk on “An Exploratory Study of Input Configuration and Group Process in a Negotiation Task Using a Large Display” (J. P. Birnholtz et al.), where the authors were trying to answer this question: how many pointing devices should be provided to a group? The authors conducted a controlled experiment showing that the number of input devices can affect the outcomes. The participants had to collaboratively edit the front page of a journal. Each article contained keywords of a particular kind, and each participant was responsible for one kind of keyword: the higher the number of keywords, the more points the participant received. The authors registered a higher quality of discussion in the single-mouse condition, and more self-interested actions in the multi-mouse condition. A single mouse forced shared focus, at the cost of reduced productivity. [url]

On Thursday, U. S. Pawar (Microsoft India) also presented research on the use of “Multiple Mice for Retention Tasks in Disadvantaged Schools”. Results suggest that shared use is as effective as one student per PC.

On Monday there was a session on capturing life experiences that was tremendously interesting: the use of prosthetic memory. The first talk I attended was presented by V. Kalnikaite (University of Sheffield, UK): “Software or Wetware? Discovering When and Why People Use Prosthetic Memory”. The authors developed an application called ChittyChatty with which people could write simple annotations and record some audio along with them; the text was used as an index to recall the content of the audio message. Their conclusion was that prosthetic memory designers should focus on device efficiency over accuracy. In the same session there was another presentation on SenseCam, a prosthetic memory device developed at Microsoft Research Cambridge: “Do Life-Logging Technologies Support Memory for the Past? An Experimental Study Using SenseCam”. In this presentation the author showed how the position of the camera on the user’s chest compared to the user’s own point of view.

One of the last sessions that I attended on the first day was the one on Mobile Applications. Matt Jones presented a note on “Questions not Answers: A Novel Mobile Search Technique”. The authors presented a novel perspective on the mobile search problem using low-cost, incidental information. The note argues that this technique can be used to provide users with insights into the locations they encounter; to the authors, presenting previous queries in situ can give “a sense of place”. In the same session Stephen Brewster presented a note on the use of “Tactile Feedback for Mobile Interaction”. The main argument was that on-screen keyboards do not give tactile feedback on key presses. The authors equipped a mobile phone with vibration actuators and defined the concept of Tactons, the tactile equivalent of phonemes. They reported on a study conducted in the Glasgow subway (a noisy environment). Results showed that participants were paying more attention in the train situation than in the laboratory setting.

Finally, Mary Flanagan presented “A Game Design Methodology to Incorporate Activist Themes”. She presented a rigorous, systematic means of taking human values into consideration in design at many levels. She introduced the framework they developed, Values at Play, a game called RAPUNSEL, and six circles. One of the main issues presented is that there are many social stereotypes that are difficult to overcome in design, such as sexualized characters. More here.


Let’s go to the whiteboard: How and why software developers use drawings

I am back from CHI2007, where I presented the work I did this summer at Microsoft Research, in Redmond (WA):

M. Cherubini, G. Venolia, R. DeLine, and A. J. Ko. Let’s go to the whiteboard: How and why software developers use drawings. In Proceedings of the SIGCHI conference on Human factors in computing systems (CHI2007), pages 557–566, San Jose, CA, USA, April 28 - May 3, 2007. ACM Press. [pdf]

————-

Software developers are rooted in the written form of their code, yet they often draw diagrams representing their code. Unfortunately, we still know little about how and why they create these diagrams, and so there is little research to inform the design of visual tools to support developers’ work. This paper presents findings from semi-structured interviews that have been validated with a structured survey. Results show that most of the diagrams had a transient nature because of the high cost of changing whiteboard sketches to electronic renderings. Diagrams that documented design decisions were often externalized in these temporary drawings and then subsequently lost. Current visualization tools and the software development practices that we observed do not solve these issues, but these results suggest several directions for future research.


The slides of the presentation can be found here.

The PDF of the paper is available on the ACM digital library or here.
