HT06, tagging paper, taxonomy, Flickr, academic article, to read

C. Marlow, M. Naaman, D. Boyd, and M. Davis. HT06, tagging paper, taxonomy, Flickr, academic article, to read. In HYPERTEXT ’06: Proceedings of the seventeenth conference on Hypertext and hypermedia, pages 31–40, New York, NY, USA, 2006. ACM. [PDF]

———

This seminal paper presents a taxonomy of tagging systems based on incentives and contribution models, intended to inform potential evaluative frameworks. First, the authors explain why tagging systems are used and describe the different strands of research in this area, namely social bookmarking, social tagging systems, and folksonomies. They explain how these systems are affected by the vocabulary problem, where different users use different terms to describe the same thing.

The authors model a prototypical tagging system as having three components: a set of resources, the tags associated with these resources, and the users who generated the tags. Some systems also have links among the resources and/or among the users.
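
A minimal sketch of this tripartite model (all names hypothetical, not from the paper), which also anticipates the bag-model and set-model aggregation styles discussed below:

```python
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    contacts: list = field(default_factory=list)    # optional links among users

@dataclass
class Resource:
    uri: str
    related: list = field(default_factory=list)     # optional links among resources

@dataclass
class TaggingSystem:
    assignments: list = field(default_factory=list)  # (user, resource, tag) triples

    def tag(self, user, resource, tag):
        self.assignments.append((user, resource, tag))

    def tags_for(self, resource):
        # bag-model view: repeated tags from different users are kept
        return [t for _, r, t in self.assignments if r is resource]

    def tag_set_for(self, resource):
        # set-model view: each distinct tag appears only once
        return set(self.tags_for(resource))
```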

The taxonomy they define contains seven categories:

- tagging rights: the system’s restrictions on who may tag which resources;
- tagging support: the mechanism of tag entry, which can be blind (the user cannot view the tags assigned by others to the same resource), viewable, or suggestive;
- aggregation: whether the system keeps repeated tags assigned to a resource by different users (the bag-model) or allows each distinct tag only once (the set-model);
- the type of object being tagged;
- the source of the material: resources can be supplied by the participants, by the system, or be publicly available;
- resource connectivity: resources can be linked, grouped, or kept separate;
- user connectivity: users can likewise be linked, grouped, or kept separate.

Concerning user incentives, tagging is performed for different reasons: future retrieval; contribution and sharing; attracting attention; play or competition; self-presentation; and, finally, expressing opinions.

Finally, the paper presents an interesting analysis of a Flickr dataset. The authors studied the growth of distinct tags for 10 randomly selected users over the course of their photo uploads and found a number of different behaviors. In some cases new tags are added consistently as photos are uploaded; in others, only a few tags are used at the beginning and more are added later on. For many users, distinct tag growth declines over time. Finally, they found the vocabulary inside a community of users to overlap more than the vocabulary of a random sample of unconnected users.

[Figure: growth of distinct tags over photo uploads for the sampled users (Marlow_distinct-tags.jpg)]

Interview viz: visualization-assisted photo elicitation

N. A. Van House. Interview viz: visualization-assisted photo elicitation. In CHI ’06: CHI ’06 extended abstracts on Human factors in computing systems, pages 1463–1468, New York, NY, USA, 2006. ACM. [PDF]

———-

This paper describes a visualization-assisted photo elicitation technique. The author’s main argument is that photo elicitation interviews can be helped by visualizations, because people tend to forget the context in which a picture was taken.

The author’s group developed MMM2, a multimedia system that helps organize a collection of pictures and produces different visualizations, organized by time or by sharing recipient or source.

Using the visualizations, interviewees could better recall the contextual information from the time a picture was taken. The evidence also sometimes refreshed or contradicted the participants’ memories. Finally, patterns of use of which the participants were unaware became apparent, revealing important events or gaps in activity.

[Figure: MMM2 visualizations used in photo elicitation (vanhouse_viz-photo-elicitation.jpg)]

Why we tag: motivations for annotation in mobile and online media

M. Ames and M. Naaman. Why we tag: motivations for annotation in mobile and online media. In CHI ’07: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 971–980, New York, NY, USA, 2007. ACM. [PDF]

——-

This paper presents a thorough study of people using a mobile application for tagging pictures. The authors conducted photo elicitation interviews with Flickr users who were active in uploading pictures from their mobile phones using ZoneTag, an application for mobile picture tagging and photo sharing.

The authors identified eight intertwined motivations for tagging, described along two dimensions: function (either organization or communication) and sociality (directed at oneself or at others).

Interestingly, the authors found that while adding context is the traditional motivation for annotating printed photographs, this was not the primary concern of the interviewed participants: “while it is likely that tags will provide this function as a currently-unanticipated benefit in the future, adding context to facilitate remembering details about photographs was not the primary motivation for tagging in the present.”

As the pictures taken with ZoneTag were automatically uploaded to Flickr, a community portal for photo sharing, other interesting behaviors emerged. For instance, some participants reported coordinating tags with others to form a distributed photo “pool”.

The authors conclude the article by suggesting a number of implications for design. In particular, they suggest that annotation functionality should be pervasive and multi-functional, covering the different aspects of their taxonomy. They also suggest building flexibility into the annotation mechanism, leaving tagging as an option rather than making it mandatory. Finally, they suggest combining mobile and desktop components, to leverage both fast entry at capture time and the processing power of the desktop application.


Semi-automatic image annotation

L. Wenyin, S. Dumais, Y. Sun, H. Zhang, M. Czerwinski, and B. Field. Semi-automatic image annotation. In INTERACT 2001: 8th IFIP TC.13 Conference on Human-Computer Interaction, pages 326–333, 2001. [PDF]

——

This article describes a technique for incorporating users’ feedback as annotations in an image retrieval system. The authors’ basic argument is that manual annotation of images can be tedious for the user; even the direct annotation techniques proposed by Shneiderman do not solve the issue. On the other hand, automatic annotation of images is not there yet.

Therefore they propose a middle-ground approach to the problem, asking users of an information retrieval engine to rate the results returned by the system. When results are marked positively, the system incorporates the query terms as descriptors of the selected images.
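
A rough sketch of this feedback loop (names and structure hypothetical, not the paper’s actual implementation):

```python
def record_feedback(query_terms, positive_ids, annotations):
    """Fold the query terms into each positively-rated image's annotations,
    so future keyword queries can match those images directly."""
    for image_id in positive_ids:
        annotations.setdefault(image_id, set()).update(query_terms)

# e.g. a query for "beach sunset" where the user marks image 42 as relevant
annotations = {}
record_feedback({"beach", "sunset"}, positive_ids=[42], annotations=annotations)
assert annotations[42] == {"beach", "sunset"}
```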

To evaluate this architecture, the authors used two metrics: retrieval accuracy and annotation coverage. Annotation coverage is the percentage of annotated images in the database; retrieval accuracy is how often the labeled items are correct. In their experiment, retrieval accuracy is the same as annotation coverage (positive examples are automatically annotated with the query keywords).
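
A minimal sketch of the coverage metric under this setup (names hypothetical):

```python
def annotation_coverage(annotations, total_images):
    """Percentage of images in the database with at least one annotation;
    under the paper's setup this equals retrieval accuracy, since positively
    rated results are annotated with the (correct) query keywords."""
    annotated = sum(1 for terms in annotations.values() if terms)
    return 100.0 * annotated / total_images
```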

The authors found that the annotation strategy is more efficient when there are some initial manual annotations. Additionally, they performed a usability test of the system (called MiAlbum) and found that getting people to discover and use relevance feedback was difficult. They argue that, to improve the discoverability of feedback, the participants’ understanding of the matching process needs to improve.


Does Organisation by Similarity Assist Image Browsing?

K. Rodden, W. Basalaj, D. Sinclair, and K. Wood. Does organisation by similarity assist image browsing? In Proceedings of CHI 2001, Seattle, WA, USA, March 31-April 4 2001. Association for Computing Machinery. [PDF] [link to author’s site]

——-

The title of this paper nicely summarizes the authors’ research question: they were interested in understanding whether organizing pictures by similarity might be beneficial to a picture retrieval task. Defining the similarity of two pictures is itself an interesting problem, and many researchers have tried to provide solutions based on image features or on the multi-modal information that might be associated with them. The research reported in this paper was concerned with understanding whether one organization might be more effective than another in helping the retrieval process.

The authors used two kinds of organization: 1) similarity of visual features; 2) similarity of text annotations. For the retrieval experiments they used information retrieval’s vector model, with binary term weighting and the cosine coefficient measure. They also used a simulated work-task situation, in which they asked graphic designers to look for sets of pictures to be used to complement articles for a magazine.
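
A minimal sketch of this retrieval model, assuming queries and annotations are treated as term sets (not the authors’ actual code):

```python
import math

def cosine_binary(query_terms, annotation_terms):
    """Cosine coefficient with binary term weighting: each term is either
    present (1) or absent (0), so the dot product is the size of the overlap."""
    q, a = set(query_terms), set(annotation_terms)
    if not q or not a:
        return 0.0
    return len(q & a) / math.sqrt(len(q) * len(a))

# e.g. ranking images against the query {"dog", "beach"}:
# cosine_binary({"dog", "beach"}, {"dog", "beach", "sunset"}) ≈ 0.82
```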

They conducted two experiments. In the first, they tried to understand whether text-based organization was more useful than visual-based organization or a combination of the two; the majority of participants favoured the textual arrangement of pictures. In the second experiment, they compared a similarity arrangement to a random arrangement of pictures more quantitatively. They took the time required to complete the task as the main dependent variable and analyzed the results with a linear regression model. Participants were slower with the visual arrangement than with the random arrangement. In the analysis, the authors suggested that while the visual arrangement made it easy to find the target pictures, placing similar pictures together sometimes caused them to appear to merge, and therefore to be more difficult to parse.

[Figure: similarity-based and random image arrangements (Rodden_CHI_image_matrices.png)]

How do people manage their digital photographs?

K. Rodden and K. R. Wood. How do people manage their digital photographs? In Proceedings of CHI 2003, Fort Lauderdale, Florida, USA, April 5-10 2003. [PDF] [PDF2]

———–

This paper describes longitudinal research on how people manage their collections of digital photographs. The authors asked 13 subjects to use a prototype application, named Shoebox, over 6 months to catalogue their pictures. The software’s main feature was enabling text and voice tagging of the pictures. They found that after 6 months these features were largely unused, because participants relied effectively on their memory and on the temporal sequence of the pictures to retrieve them.

The paper also contains an interesting argument: text-based queries can still be reasonably effective even if the spoken material is inaccurately transcribed (Brown et al., 1996).

The paper contains an interesting comparison between digital photos and printed photos. Notably, the study was conducted in 2003, when digital pictures were still relatively new. The paper reports qualitative findings on how people used digital collections. Interestingly, participants adopted Shoebox’s archival feature, which organized pictures into Rolls by their timestamps. However, participants did not increase the number of annotations they made on their individual pictures; they felt this feature was uninteresting because it did not help them increase their retrieval efficiency.

The availability of text-based indexing and retrieval did not give the participants extra motivation to invest the effort of annotating their pictures.

Speech-based annotation and retrieval of digital photographs

T. J. Hazen, B. Sherry, and M. Adler. Speech-based annotation and retrieval of digital photographs. In Proceedings of INTERSPEECH 2007, the 8th Annual Conference of the International Speech Communication Association, pages 2165–2168, Antwerp, Belgium, August 27-31 2007. [PDF]

———

This paper presents an application supporting picture retrieval on mobile phones using voice annotations. The authors’ basic assumption is that speech is more efficient than text for operating a mobile device, and in general more efficient for conveying complex properties.

The core of the proposed application is a mixed grammar recognition approach, which allows the speech recognition system to construct a single finite-state network combining context-free grammars.
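
As a heavily simplified, hypothetical illustration of what combining grammars into a single network means (not the paper’s actual construction), consider a toy carrier-phrase grammar with a keyword slot, naively flattened into the set of utterances it accepts:

```python
from itertools import product

# toy mini-grammars: carrier phrases and a keyword slot (all hypothetical)
CARRIERS = ["this is a photo of", "a picture of", ""]
KEYWORDS = ["the beach", "my dog", "a sunset"]

# naive "network": enumerate every utterance the combined grammar accepts;
# a real recognizer would share this structure as a finite-state network
accepted = {f"{c} {k}".strip() for c, k in product(CARRIERS, KEYWORDS)}

print("this is a photo of my dog" in accepted)  # True
print("show me the beach" in accepted)          # False
```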

The paper presents an evaluation of the application combining a field deployment and a lab study, in which participants were asked to retrieve a set of pictures captured by themselves or by other participants. Retrieval was measured as the number of successful attempts to retrieve a picture with the first query and within 5 queries.

Results indicated that users’ knowledge of the subject matter of the photographs did not play a role in the retrieval process.

Expressive richness: a comparison of speech and text as media for revision

B. L. Chalfonte, R. S. Fish, and R. E. Kraut. Expressive richness: a comparison of speech and text as media for revision. In CHI ’91: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 21–26, New York, NY, USA, 1991. ACM. [PDF]

——–

This paper presents an experimental comparison of two modalities for annotating a shared document. The authors designed this experiment under the assumption, suggested by previous research, that richer, more informal, and more interactive media should be better suited to handling collaborative tasks.

The authors designed an experiment comparing subjects reviewing a paper using text annotations with subjects using speech annotations. They found that participants were more likely to make local annotations in the written modality (spelling and grammar changes) and more likely to make global annotations in the speech modality (structure, missing a fundamental point, project status, etc.).

They defined a nice metric, the index of self-correction: the number of times an annotation was corrected before the final submission. They found that speech annotations were more likely to be corrected than written annotations. They also found that speech carried nonverbal (paralinguistic) cues that made communication richer.
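
A minimal sketch of how such an index could be computed from per-annotation edit counts (a hypothetical structure, not the paper’s procedure):

```python
def self_correction_index(version_counts):
    """Total number of corrections: each annotation contributes its
    number of recorded versions minus the final submitted one."""
    return sum(max(n - 1, 0) for n in version_counts)

# e.g. three annotations with 1, 3, and 2 versions before submission
print(self_correction_index([1, 3, 2]))  # 3 corrections in total
```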

From the analysis it appeared that speech was superior because it was more expressive and because it placed fewer cognitive demands on the communicator. However, this study did not focus on revealing the advantages or trade-offs of annotations in different modalities.

A comparison of speech versus typed input

A. G. Hauptmann and A. I. Rudnicky. A comparison of speech versus typed input. In Proceedings of the Third DARPA Speech and Natural Language Workshop, pages 219–224, San Mateo, 1990. Morgan Kaufmann. [pdf]

——-

This paper describes a controlled experiment in which the authors compared two input modalities: typing and speech. They designed a task where the subjects had to input a number of numeric strings to the computer, using either their voice, the keyboard, or a combination of the two. They used a number of custom metrics, such as the transaction error rate (the number of transactions that were absolutely necessary divided by the total number of transactions) and the aggregate cycle time (the total time a subject needed to enter a number correctly).

The utterance accuracy results showed that speech required many more interactions to complete the task than typing. According to the defined metrics, typing yielded better results than speech. The authors discussed several reasons why this was the case.

The authors showed how speech compares with typing for digit-string entry tasks. However, they cautioned that real-world tasks, requiring more keystrokes per syllable, would demonstrate the effectiveness of speech much better.

The authors concluded that, depending on the task, speech can have tremendous advantages for casual users. The more a task requires visual monitoring of input, the more preferable speech becomes as an input medium. For skilled typists, this relation might be reversed.