Spatial navigation of an information space

May 3, 2006, by Mauro Cherubini

Yesterday I had one of the most controversial meetings with my PhD advisor. The thing is that we have slightly different ideas of what constitutes the core of the thesis and of how to answer the research questions. So it happens, from time to time, that we have to step back and rethink the whole process together. So we did.

One of his points is that the current VIR experiment cannot be used in full to support some of the central claims of my thesis work. It is a bit of a side experiment. My point was that most of the analysis methodologies that we are putting in place for the VIR will be reused in the rest of the thesis.

For instance, we needed to define whether the users' navigation of the information space can be considered “spatial”. In other words: does the user adopt a spatial strategy to explore the points rather than just clicking randomly? We decided to answer this question affirmatively only if the distance between the user's clicks on the map is a function of the pertinence of the articles selected.

Over the last couple of days we discussed at length how to measure whether the user explores the cluster where s/he finds relevant articles. For instance: does the user try to explore the documents closest to the one selected during the experiment?

To define what is “close” during the exploration we tested different definitions, with different results. At first, for instance, we decided to use the average distance between consecutive clicks in the interaction sequence. This follows the hypothesis that the user will shorten the jumps among the documents whenever s/he finds a relevant article. However, this simple calculation may be biased by the fact that some relevant documents are isolated in the lower part of the map.
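As a rough illustration of this first measure, the average jump length of a click sequence could be computed as below. This is a minimal sketch, not the script used in the experiment: the function names and the click coordinates are hypothetical, and clicks are assumed to be (x, y) positions on the map.

```python
import math

def jump_distances(clicks):
    """Euclidean distance between each pair of consecutive clicks."""
    return [math.hypot(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(clicks, clicks[1:])]

def mean_jump(clicks):
    """Average jump length of a click sequence (0.0 with fewer than two clicks)."""
    jumps = jump_distances(clicks)
    return sum(jumps) / len(jumps) if jumps else 0.0

# Hypothetical click sequence in map coordinates (not experimental data).
clicks = [(0, 0), (3, 4), (3, 10), (11, 4)]
print(jump_distances(clicks))
print(mean_jump(clicks))
```

Comparing this average for the jumps that follow a relevant article with the average for the other jumps would be one way to test the hypothesis, subject to the bias from isolated documents noted above.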

Another possibility was to use the notion of density of the closest points, or to use the Minimum Spanning Tree of the documents. This, however, resulted in a very restrictive definition of “close jumps”. Then we thought about using other graph strategies to say which documents are close. One of these was the use of β-skeletons. Another was the use of k-nearest agglomerative clustering to define which documents are nearby. We thought of analyzing these matrices with the Exploratory Sequential Data Analysis technique.

Finally, Pierre came up with the idea of using a granular definition of proximity instead of the binary selection I was trying to implement. This follows the idea that two documents can be separated by a great geographical distance but by a small number of other documents, which makes the jump comparable to one over a smaller geographical distance with other documents nearby.
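One possible reading of this granular proximity is to count how many other documents lie closer to the starting document than the target does. The sketch below follows that reading only; the exact definition we will adopt is not fixed here, and the function names and document coordinates are hypothetical.

```python
import math

def distance(p, q):
    """Euclidean distance between two map positions."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def intervening_documents(docs, source, target):
    """Count the documents strictly closer to `source` than `target` is.

    Assumption: 'separated by a small number of other documents' is
    interpreted as the rank of `target` among the neighbours of `source`.
    """
    reach = distance(docs[source], docs[target])
    return sum(1 for name, pos in docs.items()
               if name not in (source, target)
               and distance(docs[source], pos) < reach)

# Hypothetical layout: "a" and "b" are isolated, "c" to "f" form a dense group.
docs = {"a": (0, 0), "b": (10, 0), "c": (20, 0),
        "d": (21, 0), "e": (22, 0), "f": (23, 0)}

print(intervening_documents(docs, "a", "b"))  # long jump in a sparse area
print(intervening_documents(docs, "c", "f"))  # short jump in a dense area
```

In this layout the jump from "a" to "b" covers 10 map units but skips no document, while the jump from "c" to "f" covers only 3 units and skips two, so the first jump counts as the "closer" one under this measure.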

Tags: beta-skeletons, clustering, data mining, information metric, information retrieval, map algorithms, maps, spatial clustering, text data mining

Sequence matrices of the visual information retrieval experiment

April 28, 2006, by Mauro Cherubini

I completed the script that extracts the sequences from the usage patterns in the visual information retrieval experiment. The first matrix is the cumulative representation of the LSI method across the different tasks; the second matrix is the same for CNG. To read the cells, the following key is needed: the rows are the events at time T, while the columns are the events at time T+1. In both rows and columns the first element is case A, then B, then C and D. These states can be summarised as follows:

A: The user reads a document and then selects a document in the relative neighborhood of the first document;

B: The user reads a document and then moves away from the cluster of the first document;

C: The user selects and stays in the cluster;

D: The user selects and moves away from the cluster.

LSI (rows: state at time T; columns: state at time T+1; order A, B, C, D):

[19, 132,  0,  0]
[83, 588, 10, 90]
[ 4,  49,  0,  0]
[14, 104,  0,  2]

CNG (same layout):

[ 5,  88,  0,  1]
[48, 537, 17, 98]
[ 2,  49,  0,  0]
[ 6, 109,  1,  2]

The next step is to verify whether these frequencies differ from the expected ones. At first sight it seems that the most frequent move is the ‘read + long jump’. Now we need to check whether this difference is significant.

P.S. To verify the moves within the cluster we used the sequence determined by the Minimum Spanning Tree, which is extremely selective in defining the “in-cluster” movements.
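One common way to obtain expected frequencies for such a transition matrix is to assume that the state at time T+1 is independent of the state at time T, and to compare observed and expected counts with a chi-square statistic. The sketch below applies this to the LSI matrix reported above. Choosing the independence model is an assumption made for illustration; it is not necessarily the test we will end up using.

```python
# LSI transition counts from the post (rows: time T, columns: time T+1).
LSI = [
    [19, 132, 0, 0],
    [83, 588, 10, 90],
    [4, 49, 0, 0],
    [14, 104, 0, 2],
]

def expected_counts(observed):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Pearson chi-square statistic over all cells with a non-zero expectation."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row)
               if e > 0)

expected = expected_counts(LSI)
statistic = chi_square(LSI)
print([[round(e, 1) for e in row] for row in expected])
print(round(statistic, 2))
```

For a 4 x 4 table the statistic would be compared against a chi-square distribution with 9 degrees of freedom. Several cells have very small expected counts (the column for state C sums to only 10), so the chi-square approximation should be treated with caution here.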

More sense from audit trails: exploratory sequential data analysis

April 27, 2006, by Mauro Cherubini

T. Judd and G. E. Kennedy. More sense from audit trails: exploratory sequential data analysis. In R. Atkinson, C. McBeath, D. Jonas-Dwyer, and R. Phillips, editors, Beyond the comfort zone: Proceedings of the 21st ASCILITE Conference, pages 476–484, Perth, Australia, December 5-8, 2004.

——————

This paper introduces the use of exploratory sequential data analysis (ESDA) to detect, quantify and correlate patterns within audit trail data. We describe four sequence analysis techniques and use them to analyse data from 34 students’ attempts at an interactive drag and drop task. Using a model sequence of events based on the task’s underlying educational design as reference, we employed these techniques to: (i) calculate an ‘average’ sequence of events based on individual user sequences, (ii) characterise individual sequences in terms of their similarity to the design model, (iii) identify common partial sequences within individual sequences, and (iv) characterise transitions between two disparate actions within the task. We then used the results of these analyses to explore why most students failed to complete all components of the task.

We suggest that it was not because the task was too long or that it lacked challenge but that students intentionally and selectively ignored certain non-key steps in the task. It is our contention that ESDA techniques, in conjunction with judiciously collected audit trail data, represent a powerful and compelling tool for educational designers and researchers.

Tags: data mining, information metric, information retrieval, user experience

Found some effects in the information retrieval experiment

April 26, 2006, by Mauro Cherubini

Today I spent some more time exploring the Visual Information Retrieval experiment. Apparently, the users scored better using CNG for the second task (see the picture below). This task was much more complex than the first one because of the smaller number of correct results in the dataset (5 out of 1000). Surprisingly, I also found another effect, on the time required to select the relevant results: it seems that the users spent less time selecting the relevant results in the first task when using CNG.

Discussing these results with my advisor, we agreed that they are interesting but not sufficient to explain the effects or to decide which method gives better results.

Some other explorations are needed: I have to use the Wilcoxon test for repeated measures to link the results of the first task with those of the second. Secondly, it will be interesting to compute the average distance between consecutive read items for each algorithm. This might be a nice parameter to represent the ability of the algorithm to group together relevant or irrelevant results.
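For reference, the paired (repeated-measures) form of the Wilcoxon test is the signed-rank test, whose statistic can be computed by hand as sketched below. This is only an illustration under stated assumptions: the per-user scores are made up, the function name is hypothetical, and a p-value would still need a table or a statistics package.

```python
def signed_rank_sums(first, second):
    """Return (W+, W-) for paired samples, using average ranks for ties.

    Pairs with a zero difference are dropped, as in the classic procedure.
    """
    diffs = [b - a for a, b in zip(first, second) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Extend j over the run of equal absolute differences (a tie group).
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Hypothetical per-user scores on task 1 and task 2 (not experimental data).
task1 = [1, 2, 3, 4]
task2 = [2, 4, 3, 1]
print(signed_rank_sums(task1, task2))
```

The two sums always add up to n(n+1)/2, where n is the number of non-zero differences, which gives a quick sanity check on the ranking.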

Google Maps now covers all of Europe: Yipeeh!

April 25, 2006, by Mauro Cherubini

The technological choice I made some time ago finally starts to make full sense. It is now possible to use STAMPS everywhere in Europe with a decent level of detail. The picture below shows Neuchâtel, in Switzerland. So, please join the first trial: shoutspace [at] gmail.com !

Tags: new technology

Further analysis of the Visual Information Retrieval experiment

April 25, 2006, by Mauro Cherubini

OK, so far the analysis of the map interaction shows that there is no huge difference in the way the users interact with the map. LSI seems to spread the results all over the available space. On the contrary, CNG returns rankings which are much wider than those of LSI, with the resulting effect of pushing the points towards the top part of the quadrant.

The interesting fact which emerges is that CNG offers the relevant points much closer to the query origin, in the central lower part of the map. The subsequent analysis of the items selected shows exactly this point: while LSI mixes the most relevant items in the jungle of the other results, CNG keeps them much closer to the query origin in a space that is less populated by other results (see the map below).

So, where to go next? Two ideas so far: (a) map the final selected points to see whether they are more dispersed with CNG or LSI; (b) trace the interaction trails to assess whether there are common patterns of exploration of the points in the map.

Apart from this extra analysis of the map, I am more concerned with the numerical results of the experiment. Our goal is still to check whether there is a difference between CNG and LSI in the user experience. Does one of the two methods outperform the other with respect to: (1) relevant results selected; (2) time needed to select the relevant results; (3) user perception of appropriateness; (4) number of queries composed; (5) average time spent on each query; …

Tags: information metric, information retrieval, information visualization, interaction design, map algorithms, maps, data mining

heatmap of the visual information retrieval experiment – part 3

April 24, 2006, by Mauro Cherubini

Finally I corrected a couple of bugs in the script that I was using for the visualization. Then I spent a considerable amount of time trying to overlay the resulting heatmap on the original results map I had obtained before. PIL is a great resource in this sense. I found a couple of hacks that were pointing in that direction [1], [2]. Unfortunately, they did not work for my case.

In the end I decided to try PIL's built-in function called blend. It worked well for my situation.

So what I am doing in the script is to:

1. divide the map into a certain number of cells;

2. count how many items pile up in each cell;

3. export the obtained matrix in csv (comma separated values);

4. open this file with R via RPy in the script;

5. process this file so as to obtain the desired heatmap;

6. save the heatmap in an external file;

7. process the heatmap with PIL so that it matches the other images;

8. blend the map over the other existing images.
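Steps 1 to 3 above can be sketched in a few lines of plain Python. This is not the original script: the function names, the map size and the item coordinates are hypothetical, and the items are assumed to lie inside the map, with the origin in one corner. The R and PIL steps are left out.

```python
import csv
import io

def grid_counts(points, width, height, n_cols, n_rows):
    """Count how many points fall into each cell of an n_rows x n_cols grid."""
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        # Points on the far edge are clamped into the last cell.
        col = min(int(x / width * n_cols), n_cols - 1)
        row = min(int(y / height * n_rows), n_rows - 1)
        counts[row][col] += 1
    return counts

def to_csv(matrix):
    """Serialise the count matrix as comma separated values."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(matrix)
    return buffer.getvalue()

# Hypothetical item positions on a 100 x 100 map (not experimental data).
points = [(10, 10), (20, 15), (90, 10), (50, 95), (100, 100)]
counts = grid_counts(points, width=100, height=100, n_cols=2, n_rows=2)
print(to_csv(counts))
```

The resulting CSV text is what a heatmap routine in R would then read as its input matrix.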

The part of the code that does the “interesting” work is reported below. The result is shown in the pictures below.

Tags: clustering, map algorithms, maps

Continue reading →

Supporting Language Learning Communities Using Mobile Blogs

April 24, 2006, by Mauro Cherubini

S. A. Petersen, G. Chabert, and M. Divitini. Supporting language learning communities using mobile blogs. In Proceedings of IADIS Mobile Learning’06, Dublin, Ireland, 14-16 July 2006.

———–

Communities are important for language learners to learn and practice the language. A classroom provides a sense of community for a group of students learning a language. We aim to extend the learning arena outside of the classroom and maintain the sense of community while the students are mobile. We propose the use of a mobile community blog to encourage collaboration among the students and to bridge the disconnection that is caused when some of the students travel abroad to improve their language. We discuss considerations that have been taken into account in adapting standard blog functionalities to support a community of mobile language learners.

Tags: mobile learning, new technology, virtual online communities

heatmap of visual information retrieval experiment

April 21, 2006, by Mauro Cherubini

After a couple of hacks to make my Python installation work with R, I finally managed to export a heatmap of the cereals task. The map is upside down, but it shows in essence that the most populated part of the map is the upper-central one.

The plan is to do the same for the number of items selected in relation to the total in each spot. This might show a different usage trend. Maybe the parts of the map that have been “explored” less were the most important in terms of the number of correct items retrieved in that spot. We will see…

Useful links:

[1] A nice tutorial on how to draw a heatmap with Python and R;

[2] Draw a Heat Map, the R manual page.

Tags: clustering, information visualization, map algorithms, maps

STAMPS in the press

April 20, 2006, by Mauro Cherubini

The newspaper “La Tribune de Genève” published today an article on STAMPS and the field trial we are organising. The article covers the theme of location-based services and why it is so interesting to know where we are. The journalist, Emmanuel Grandjean, reports a few of the things I said during the interview. One of the main claims I raised is that communication is more and more fluid and that LBSs add a new dimension to this flexibility.

Another is that communication tends to be maximally efficient. Knowing where our partner is located already spares us many inferences about what can be communicated, and how, given that our partner is in that location and we are in ours.

Tags: field tools, location awareness, Location Based Services, mobile learning, Symbian

Mauro Cherubini

Professor at the University of Lausanne, Switzerland