HCSNet Workshop on Effective Interactive Interfaces
Important issues and potential points of interaction with other disciplines
In particular, the Braccetto project, which is the first and foundational project of the initiative, undertakes research into how the effective application of ICT in the area of Mixed Presence Groupware can assist teams of co-workers to collaborate more effectively across a distance. Braccetto is undertaking research into the principles underlying effective, intense distributed collaboration and is implementing the results as new capabilities for supporting teams involved in creative activities such as collaborative design, planning, analysis, and decision-making. These new approaches will be evaluated against various underlying teamwork mechanics, focusing specifically on increasing the effectiveness of distributed teams through enhanced awareness.
Work on designs for new musical interfaces (Paine and Stevenson, 2005; Stevenson, 2005; Hewitt and Stevenson, 2003) has identified specific design criteria centring on evaluating models of musical communication in performance and understanding existing modes of physical performance interaction. The next phase of this research involves developing methods of evaluation to measure the effectiveness of these performance interfaces. The research proposes measures of accuracy, precision and ease of use based on control data extracted from the interfaces in performance conditions. A new measure of effectiveness based on a comparison of control data and analysed audio and video will be evaluated.
This presentation hopes to elicit ideas and discussion on evaluating physical interfaces based on analysis of cross-modal data sets.
(References omitted)
Multimodal interfaces expand the communication channel between the system and the user, allowing users to express themselves more naturally and to interact with complex information with more freedom of expression. One of the frequently cited advantages of multimodal interfaces over unimodal ones is their ability to support effortful, complex tasks. These strategies often result in changes to the way multimodal constructions are planned and executed.
Cognitive load refers to the amount of mental effort imposed by a particular task and has been associated with the limited capacity of working memory. I will start with an overview of the state of the art in cognitive load measurement. Recent research has shown that users' multimodal constructions exhibit significant changes as they self-manage their cognitive load when faced with tasks of increasing complexity. Our research focuses on extending the accepted benefits of multimodal interaction by using it to detect fluctuations in cognitive load. The primary advantage of this approach is that cognitive load can be determined implicitly by monitoring variations of specific multimodal features during day-to-day tasks. Such unobtrusive measures may help determine a user's cognitive load in real time and adapt information content selection and presentation (multimodal output generation) accordingly, in order to ensure optimal user performance.
In this talk, I will describe an experiment designed to identify the relationships between combined speech and manual gesture input structures and users' cognitive load. The two input modalities are very familiar to users and psychologically closely interrelated, both in terms of planning and execution. Assessing a user's cognitive load implicitly through their multimodal behaviour requires identifying a number of indices that reliably reflect fluctuations. Our hypothesis is that variations in redundant and complementary multimodal constructions can reflect cognitive load changes experienced by the user.
The feasibility of using rates of redundant constructions or even complementary constructions in multimodal input as an index of cognitive load is supported by the results of our study. I will illustrate multimodal patterns that may be monitored to detect cognitive load variations based on symptomatic behavioural features. I will conclude with a discussion on the enormous impact such methods may bring to the design of human computer interaction systems, but highlight the current limitations of the pattern acquisition methodology. Directions for future work will also be addressed.
This paper presents the Thummer Mapping Project (ThuMP), an industry partnership project between ThumMotion P/L and The University of Western Sydney (UWS). A crucial step in the development of new musical interfaces is the design of controllable and sometimes externally observable relationships between the performer's physical gestures and the parameters that dictate the generation of the instrument's sound, a process called control mapping.
The ThuMP project is developing a new electronic musical interface/instrument based on a re-evaluation of the performer's relationship with the performance interface. It went back to examine musical interfaces that are broadly agreed to be successful and have persisted for a long time: acoustic instruments, namely string and wind instruments and, because of the nature of the Thummer interface, the piano accordion and concertina. The ThuMP project posits that approaching the challenge of musical interface design from the musician's perspective might enable a detailed understanding of the subtle mechanisms of feedback and control that support virtuosic technique.
Some important issues:
I was involved in the development of a voice-operated home environment for a quadriplegic that was in continuous use in Bega from 1990 to 1995. A 286 PC with 250-word recognition was used to control the bed, television, phone and lights, and to do word processing. When this custom-built system broke down, the user could not find a replacement and went back to having to call a person every time he had to change the telly or move the bedhead. Now we are about ten doublings in technology sophistication later, and the voice-operated environment has not become any closer to the practical reality we believed we were on the verge of back then. I am hoping to find out why it hasn't happened, and if it ever will, or why it never will, with the help of the others at this workshop.
From a technology point of view, we find ourselves at a point in time where hardware prices have dropped to such an extent that we find computer technology more and more embedded in everyday items such as mobile phones, cars, digital cameras etc. With such a widespread use of technology the issue arises of how humans can interact most effectively with this technology. In the past, it was a case of the human having to adapt to the particular input and output formats of the computer system. Nowadays, we are at a transition point towards more human-like interfaces, such as automatic speech recognition, gesture recognition and facial expression recognition. In my opinion, this does not mean that traditional means of HCI (keyboards, computer mice, screens) will become obsolete, but that we will see more differentiated and application-specific interfaces that will thus be more effective. The interaction between disciplines such as computer science, psychology, linguistics and speech science, cognitive science and artificial intelligence will play an important role in developing such effective interfaces. Similar to interfaces becoming more application-specific, I also foresee that the collaboration and interaction between disciplines will be more specific towards particular interfaces.
David Powers, Darius Pfitzner, Kenneth Treharne, Martin Luerssen
User interfaces are about transferring information. Even simple interfaces are multimodal and involve two-way communication and multiple dimensions. Miller's well known Magical Number Seven survey (1956) of cognitive limitations has spawned considerable exploration of the information capacity of individual dimensions as well as combinations of dimensions and/or modalities. However, user interface design has tended to neglect the cognitive capacity of the user and certain multimodal and 3D interfaces have been shown to reduce user performance rather than increase it.
The Flinders Artificial Intelligence Laboratory is focusing on this area from a number of perspectives: Human Factors study of utility and combination of static and dynamic visualization attributes for information retrieval; Human Factors study of choice and ranking of keywords for document description and search; EEG analysis of cognitive load during skill acquisition; Thinking Head study of role of emotional and expressional extensions of language in Human Computer Interaction.
David Powers, Trent Lewis and Richard Leibbrandt
Our research on multimodal communication employs both acoustic and non-acoustic cues, using training to different speakers and noise sources, separation and identification of relevant and irrelevant sources, tuning of recognition systems to identified source factors, conditioning to different lighting, signal and noise conditions, use of syntactic patterns and fusion of multimodal information.
There is a synergy between our work with EEG in which muscular artefact is considered undesirable but forms a major part of the signal, and AVSR in which EMG signal correlates with visemic features. Similar techniques can be used to separate and identify multiple sources by combining ICA with source identification. We regard prosody, emotional content, source identity and characteristics as information that should be analysed in tandem with speech recognition/synthesis, each conditioned by the others. New fusion techniques are being applied across all kinds of information whether acoustic, visual, EMG, phonetic, prosodic, syntactic, semantic, emotive or identificatory.
Our solution is the development of a toolkit, LOSUM, which consists of a number of tools to support the user modelling process. It incorporates light-weight ontologies to fulfill a number of roles: aiding in metadata creation, providing structure for large user model visualisation, and as a means to reason across granularities in the user model. In conjunction with this, LOSUM also features a novel visualisation tool, SIV, which performs a dual role of ontology and user model visualisation, supporting the process of ontology creation, metadata annotation, and user model visualisation.
Currently we are conducting a number of user studies to evaluate the prototype. Our experience so far has highlighted a number of issues regarding "effectiveness", including (i) the differences between task-focused usability evaluations and qualitative user satisfaction evaluations, including whether there is any meaning to correlations between task-completion rates and user satisfaction, (ii) the degree to which it is possible to evaluate a mobile application in a laboratory setting, (iii) the difficulty of evaluating specific components of a complex system compared to evaluating a whole system end-to-end, especially where we rely on black-box speech recognition software, and (iv) how to enable users to find the "right" language to interact with the system. Our approach has been to enable fairly "natural", though constrained, language (open vocabulary, extensive use of pronouns, but confined to e-mail and calendar tasks) with the aim that users will accommodate their speech input over time to the type of language accepted by the system; we also expect users will require some initial training. The work thus involves conversational agents, HCI, intelligent user interfaces and user modelling.
Gregor McEwan and Saul Greenberg
Various studies of white-collar work sites report that a large portion of people's time is spent in unplanned, casual interactions with co-workers [6][9]. These interactions are stimulated by physical proximity: people acquire informal awareness of each other, such as knowledge about presence, activity, and availability, which leads to opportunities for people to engage in light-weight casual interactions. Casual interactions are unplanned, brief, frequent, and usually engage small groups of people familiar with one another. While seemingly mundane, these casual interactions prove important as they make the transition to tightly-coupled collaboration easier. However, the same studies also found that these interactions are severely affected by physical separation. This means that distributed communities of co-workers miss out on these interaction opportunities. In response, there are a myriad of tools providing mechanisms for displaying informal awareness information that leads to casual interactions, e.g. Instant Messengers [7], chat rooms / MUDs [3], and video-based media spaces [1].
Our work includes development of an awareness and interaction
tool. Our design perspective was to ground development in social
science theory. In particular, we were motivated by the Locales
Framework [4], a comprehensive theoretical group interaction
framework, as well as the Focus/Nimbus model of awareness [8]. We
derived and combined principles from these theories and applied them
to the design of Community Bar (CB), a groupware tool that supplies ad
hoc groups with rich awareness information leading to casual
interaction. CB also leverages and extends two previously introduced
design ideas. First, media items [5] are used as groupware building
blocks to offer rich multimedia awareness and interaction
capabilities. Second, these items are embedded within the sidebar
metaphor [2], where people see awareness information at the screen's
periphery, and can selectively drill down to more information and
interaction. This work made two contributions: firstly, the derivation
of theory-based design principles for informal awareness and casual
interaction systems, and secondly, the design and implementation of a
system, CB, that follows these principles.
(References omitted)
Individuals differ in the ways they react emotionally to events and situations, which is partly due to personality, and partly to the socialization of emotion. An effective affective interface is one which maintains rapport with a user by responding appropriately for that particular user. User modelling in this manner is especially important in situations of user stress, where the set of possible responses contains candidates that could either mollify the user or further frustrate them, but which one does what differs from user to user. For these reasons, I believe interaction with fields such as Personality and Social Psychology will likely be beneficial.
- improvisation with the toys;
- composition for "traditional" instrument augmented by the toys;
and/or
- exploration of characteristics of the musician's
sound/technique.
A technique we have been using to facilitate user engagement is the use of physical models (i.e. mass-spring-damper systems) to mediate between the live music produced by the musician-user and the audio-visual response of the software. This seems to result in a more intuitively understandable interaction paradigm that musicians respond well to.
This research is in progress, but our intention is to use evaluation techniques from HCI to inform an iterative approach to interaction design. We would be very interested to discuss our work with those researching the psychology of music, as we feel that research into the perceived relationships between sound and physical forces may be very relevant to our work.
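As a rough illustration of how such a physical model can sit between the music and the response, the following sketch drives a single mass-spring-damper with an input force (for instance, the amplitude envelope of the live sound) and reads its displacement as the value that would steer the audio-visual output. The class name, parameter values and the `respond` helper are assumptions made for this example only; they are not taken from the software described above.

```python
from dataclasses import dataclass


@dataclass
class MassSpringDamper:
    """One-dimensional mass-spring-damper (all values are illustrative)."""
    mass: float = 1.0
    stiffness: float = 40.0
    damping: float = 4.0
    position: float = 0.0
    velocity: float = 0.0

    def step(self, force: float, dt: float) -> float:
        # Newton's second law: m * a = F - k * x - c * v
        accel = (force
                 - self.stiffness * self.position
                 - self.damping * self.velocity) / self.mass
        # Semi-implicit Euler: update velocity first, then position.
        self.velocity += accel * dt
        self.position += self.velocity * dt
        return self.position


def respond(amplitudes, dt=0.01, **params):
    """Feed a sequence of input amplitudes through the model and
    return the displacement after each time step."""
    model = MassSpringDamper(**params)
    return [model.step(a, dt) for a in amplitudes]


# A sustained input settles towards force / stiffness after some overshoot,
# so the response lags and "bounces" rather than tracking the input directly.
sustained = respond([1.0] * 2000)
```

The lag, overshoot and settling are what make the mapping feel physical; changing `stiffness` and `damping` changes how springy or sluggish the response is.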
Anja Wessels and Cara Stitzlein
Since any (computer-mediated) interaction is inherently social, the consideration of human factors is a prerequisite for properly representing the nature of interaction. Therefore, we would like to present a number of psychologically-derived quality criteria useful in the evaluation of "interaction effectiveness and efficiency". These criteria, such as feedback, cognitive load, and motivation, are born of theory and observable in human-human interactions. For instance, feedback (as information) acts as a stimulus, improving attentional focus on relevant content. Within the interaction itself, it moderates motivational behaviour, realizable in outcomes of task performance. Feedback as an evaluative component implies that the computer-mediation (communication and coordination) provides sufficient channels for the sourcing, transmitting, and receiving of information necessary for successful interaction.
Apart from the above-mentioned quality criteria, these psychological insights should be integrated in early developmental phases of new or modified ICT. A model, like the reference model used in Braccetto, enables a fit between several streams such as capability (i.e. technological work practice and content), work context (i.e. activities and artefacts), and teamwork mechanics (i.e. cognitive, motivational and conative processes with respect to the perceiving and acting bodies of the participants in the interaction). Using a model like this allows consideration of additional criteria from organizational and technical disciplines, providing a common language absent in some ICT agendas. In this way, evaluative frameworks include the human factors perspective.
Over the last couple of years we developed an intelligent text editor for writing texts in a machine-oriented controlled natural language. The writing process is guided by text- and menu-based predictive interface techniques. This has the advantage that the user does not need to remember the rules of the language. Using this approach we can guarantee that the resulting text has the same formal properties as the underlying knowledge representation language. I am especially interested in getting feedback on our interface approach to writing text in controlled natural language and in hearing about other effective interface strategies for knowledge acquisition, in particular in the context of the Semantic Web. Our ultimate goal is to find an interface strategy for controlled natural languages which supports lay persons in an unobtrusive way. In summary: what can HCI research offer here?
In psychology, the pointing gesture belongs to the class of "deictic" gestures, which serve two main functions: to indicate a direction or to pinpoint a certain object. In collaboration with psychologists, computer scientists can determine the most natural way for humans to point at objects from a distance. They can also determine the kinds of gestures that are more natural to use, and that allow us to interact with computers (HCI) in a way similar to interacting with humans (HHI).
Dialogue interfaces would allow a user to tell the application what to do using natural language. In the case of ambiguous requests, applications would have the opportunity to ask clarification questions. Accomplishing this requires some degree of semantic representation to match a function with the many ways in which it can be described. This may eventually encompass a wide range of user feedback combined with reinforcement learning where systems will learn what feature the user is referring to with various utterances.
Tabletop interaction: social interfaces, metadata markup and pervasive
file system interaction (Trent Apted, Anthony Collins, Judy Kay, Glen
Whitaker)
Learner modelling for reflection: largescale modelling of learning in
the context of an HCI course based on blended elearning; learning to
think like a programmer with design and analysis supported with
reflective activities (Judy Kay, Lichao Li, Andrew Lum)
Keep-in-touch: intergenerational communication with appliances that
support natural asynchronous voice messages plus images with highly
flexible interaction options from magic mirrors, magic wands,
touch.... (Patrick Burns, Veasna Hoy, Judy Kay, Bob Kummerfeld, Geoff
Langdale, Glen Tregoning)
MyPlace: personalised delivery of information about places, people,
sensors and things driven by active user modelling (Mark Assad, David
Carmichael, Judy Kay, Bob Kummerfeld, William Niu)
Never mind the quality, just feel the width: exploiting vast amounts
of learning data to support teaching and learning, making use of
visualisations, data mining of activity, interactions and temporal
patterns (Judy Kay, Irena Koprinska, Andrew Lum, Nicolas Maisonneuve,
Peter Reimann, Adam Ullman, Kalina Yacef, Osmar Zaiane)
- The process of decision making relies on having a significant
amount of data that needs to be analysed in order to make an informed
decision.
- Decision making can be viewed as being equivalent to path planning as
both involve going from one state to another.
- Therefore, it may be possible to apply path planning techniques to
support decision making (depending on the data). However, such data can
be high-dimensional, making it difficult to apply traditional
path planning techniques.
- The Self-Organizing Map algorithm (in this case I use the Geodesic
Self-Organizing Map) may be used to perform dimensionality reduction,
to map n-dimensional data to 2D or 3D, which would then allow the
application of path planning techniques, such as distance
transformations, which have typically been employed in the area of
robotics.
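The last step above can be sketched as follows, assuming the dimensionality reduction has already produced a 2D grid of states. The sketch does not implement a Self-Organizing Map; the grid, the `distance_transform` and `follow_descent` names, and the convention that 1 marks an unavailable state are all assumptions for illustration. It shows the distance transform as used in robotics: distances are propagated outward from the goal, and a path is then read off by repeatedly stepping to the neighbour with the smallest distance.

```python
from collections import deque

# Four-connected neighbourhood on the 2D map.
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def distance_transform(grid, goal):
    """Propagate step counts outward from the goal over free cells (0).
    Blocked cells (1) and unreachable cells keep the value None."""
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in STEPS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and dist[nr][nc] is None):
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def follow_descent(dist, start):
    """Walk from start to the goal by always moving to the neighbour
    with the smallest distance value. Returns None if start is unreachable."""
    if dist[start[0]][start[1]] is None:
        return None
    rows, cols = len(dist), len(dist[0])
    path = [start]
    r, c = start
    while dist[r][c] > 0:
        neighbours = [
            (r + dr, c + dc) for dr, dc in STEPS
            if 0 <= r + dr < rows and 0 <= c + dc < cols
            and dist[r + dr][c + dc] is not None
        ]
        r, c = min(neighbours, key=lambda cell: dist[cell[0]][cell[1]])
        path.append((r, c))
    return path


# Toy map: 0 = available state, 1 = unavailable state.
grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
dist = distance_transform(grid, goal=(2, 0))
path = follow_descent(dist, start=(0, 0))
```

Because the distance field is computed once from the goal, a path from any other starting state can be read off without replanning.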
Matthew McGill and Claude Sammut
After developing several multi-modal user interfaces for applications
such as a museum tour guide, portable conversational assistant and
intelligent entertainment centre, we created a scripting language that
would allow the construction of modular scripts that could be reused
in different applications. This feature was originally envisaged as
allowing the development of script libraries that would enable new
multimodal applications to be scripted more easily by allowing common
interactions to be imported from a library. As a result the language
FrameScript was created.
FrameScript is a multi-paradigm language for scripting multimodal interfaces. Included in the language are rule-based processing, frame representations and simple functional evaluation. FrameScript was developed for building interaction managers in multimodal applications. It does this by extending the language used in ProBot for scripting conversational agents, allowing it to respond to any event that a system can detect or generate and that can be represented as a frame. This allows scripts to be written that respond not just to spoken inputs but also to clicks from a mouse or touch screen, or even recognized gestures. It also allows scripts to be written that give a system a level of proactivity, as the scripts can initiate interactions with users in response to system events, such as the arrival of a new email or the detection of a change in environment.
By accepting input in the form of frames FrameScript can provide interaction management in a multimodal system whose messages between components can be represented as frames. This allows it to be used with both EMMA and Mica multimodal architectures.
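The general idea of treating every input as a frame can be illustrated with a small sketch. This is not FrameScript syntax and does not reproduce its implementation; it is a Python approximation in which a frame is a dictionary of slots, a rule is a slot pattern paired with an action, and the `InteractionManager` class, slot names and replies are all invented for the example.

```python
def matches(pattern, frame):
    """A frame matches when every slot named in the pattern has the same value."""
    return all(frame.get(slot) == value for slot, value in pattern.items())


class InteractionManager:
    """Dispatches incoming frames to the first rule whose pattern matches."""

    def __init__(self):
        self.rules = []

    def rule(self, pattern, action):
        self.rules.append((pattern, action))

    def handle(self, frame):
        for pattern, action in self.rules:
            if matches(pattern, frame):
                return action(frame)
        return None  # no rule applies to this frame


manager = InteractionManager()
# Spoken input, touch input and a system event all arrive as frames.
manager.rule({"type": "speech", "intent": "greet"},
             lambda f: "Hello!")
manager.rule({"type": "touch", "target": "exhibit"},
             lambda f: f"Here is some information about {f['id']}.")
manager.rule({"type": "system", "event": "new_email"},
             lambda f: f"You have new mail from {f['sender']}.")

reply = manager.handle({"type": "system", "event": "new_email", "sender": "Ann"})
```

Because the manager only sees frames, the same rule mechanism covers speech, pointing and proactive responses to system events, which is the property the text above attributes to a frame-based design.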
Cecile L Paris and Nathalie Colineau
In our work, we look at providing users with the information they need, in a manner that is understandable and useful to them. A major aspect of our approach is to tailor the information to users, in particular to a user's task and context. We see tailoring as contributing to the effectiveness of an interface. With the increasing amount of data and information available, finding what one is looking for is not an easy task. The idea of being able to provide users with information tailored to their needs (or preferences) is thus an attractive proposition. This has led to a substantial body of work in User Modelling, Adaptive and Recommender Systems, Intelligent User Interfaces (IUI) and Intelligent Multi-Media Presentation Systems. It is assumed that providing targeted information will result in more effective communication.
While there have been a number of evaluations to test whether the customisation of information makes a difference to the user in terms of whether they prefer it to non-tailored information, or, in cases where the information provided was meant to influence behaviour, whether it led to more behavioural change than non-tailored information, the impact of tailoring on the efficiency of information seeking has not been tested. This specific question is the one we addressed in our work. We wanted to learn about the effectiveness of the tailoring on users' information task performance, and, in particular, whether or not having tailored information helps them find the information they need. More generally, we look at identifying what in a user model (or in the context) can affect the system to make a more effective, useful and natural interface, with information and information spaces in our case.
This issue arises not only in the context of textual interfaces but also in multimodal and spoken interfaces, whenever a user interfaces with information. A number of elements can play a role, including tasks, the environment, emotions, etc. If they are to be exploited for a more effective interaction, issues of modelling and acquiring them also arise. A number of disciplines can contribute to this problem (as illustrated by the fact that the User Modelling community includes researchers from, at least, HCI, Language Technology, Psychology, Communication Science, Artificial Intelligence, Cognitive Science, Computer Science, Linguistics and Speech Science, Sociology, and other related fields).
Collaboration over a distance has been a major research driver in telecommunication and high-bandwidth application development. If collaboration across remote locations is applied to teams of people rather than groups (a team is a purposeful group of people; in a team, people have specific roles, tasks, and objectives to achieve), the thrust on the mediating technology is generally higher (Daft et al., 1986). In the business project Virtual Media Office within the CSIRO Networking Technologies Lab, we investigate remote collaboration for teams of intensely interacting creative animation artists in the Digital Media Production industry. Based on field observations of how the digital animation artists interact with each other in the current co-located setting in Sydney's inner-city headquarters (Schremmer et al., 2006a), we have determined a first development project to create a new, user-status-aware control interface to CSIRO's current implementation of the Virtual Tearoom high-quality videoconferencing system for use within the confined place of an artist's personal workspace. A video link within a personal workplace requires a higher degree of ambience of the communications technology; see (Schremmer, 2006b) for a more detailed discussion.
During the workshop, we'd like to discuss the concept of ambience in technology-mediated communication, tools to improve the perception of ambience (e.g., user-status information, design of user control interfaces, auditive feedback, visual "blurring" of a video connection), and methods to evaluate the effectiveness of ambient perception.
The Virtual Media Office project is aligned with the "HxI Initiative" (abstract submitted by Rudi Vernik, Belinda Kellar, Julien Epps, Claudia Schremmer) and "Evaluation Criteria from a Human Factor Psychological Perspective" (abstract submitted by Anja Wessels and Cara Stitzlein).
Two acoustic information feedback interfaces have been developed for training musicians. An acoustic information feedback interface is useful for musicians as it can supply measurements of the musician's sound which can be used for building technical and aural skills with an instrument or voice. These skills, theoretically at least, should be able to be developed at a faster rate than that achievable with verbal feedback from instrumental teachers, as the feedback loop between production and result is drastically shortened with the use of realtime feedback. A feedback interface may also help alleviate some of the effects inherent in one-on-one teaching, such as access, bandwidth and interpersonal issues, resulting in a hopefully complementary teaching experience. One of these interfaces was developed in the visual domain (Ferguson, Vande Moere and Cabrera, 2005), whilst the other was developed as an auditory interface (Ferguson, 2006). Comparisons between these two domains will be discussed, as will the usefulness of each domain for this purpose. Guidelines for future prototypes and the evolution of a more generic approach will also be addressed.