Yesterday I had one of the most contentious meetings so far with my PhD advisor. The thing is that we have slightly different ideas of what constitutes the core of the thesis and of how to answer the research questions. So it happens, from time to time, that we have to step back and rethink the whole process together. So we did.
One of his points is that the current VIR experiment cannot be used in full to support some of the central claims of my thesis work. It is a bit of a side experiment. My point was that most of the analysis methodologies we are putting in place for the VIR will be reused in the rest of the thesis.
For instance, we found ourselves needing to define whether the users' navigation of the information space can be considered “spatial”. In other words: does the user adopt a spatial strategy to explore the points, rather than just clicking randomly? We decided to answer this question affirmatively only if the distance between the user's clicks on the map is a function of the pertinence of the articles selected.
Over the last couple of days we discussed at length how to measure whether the user explores the cluster where s/he finds relevant articles. For instance: does the user try to explore the documents closest to the one selected during the experiment?
To define what “close” means during the exploration, we tested different definitions, with different results. At first, for instance, we decided to use the average distance between consecutive clicks in the interaction sequence. This follows the hypothesis that the user will shorten the jumps between documents whenever s/he finds a relevant article. However, this simple calculation may be biased by the fact that some relevant documents are isolated in the lower part of the map.
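To make the average-distance idea concrete, here is a minimal sketch of how it could be computed. It is not the analysis code used in the experiment: the function name, the (x, y) click coordinates and the list of relevance flags are all assumptions made for illustration.

```python
import math

def mean_jump_by_relevance(clicks, relevant):
    """Mean jump length after relevant vs. non-relevant clicks.

    clicks   -- hypothetical list of (x, y) map positions, in click order
    relevant -- hypothetical list of booleans, one per click
    Returns (mean_after_relevant, mean_after_non_relevant); a value is
    None when there is no jump of that kind.
    """
    after_relevant, after_other = [], []
    for i in range(len(clicks) - 1):
        # Euclidean length of the jump from click i to click i + 1
        jump = math.dist(clicks[i], clicks[i + 1])
        if relevant[i]:
            after_relevant.append(jump)
        else:
            after_other.append(jump)

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(after_relevant), mean(after_other)

# Toy sequence: two short jumps after relevant clicks, one long jump after a miss.
clicks = [(0, 0), (0, 1), (0, 2), (10, 2)]
relevant = [True, True, False, False]
short_mean, long_mean = mean_jump_by_relevance(clicks, relevant)
```

Under the hypothesis above, the first value should be smaller than the second. The bias mentioned earlier still applies: a relevant but isolated document forces a long jump whatever the user's strategy.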
Another possibility was to use the notion of density of the closest points, or the Minimum Spanning Tree of the documents. The latter, however, resulted in a very restrictive notion of “close jumps”. Then we thought about using other graph strategies to decide which documents are close. One of these was the use of β-skeletons. Another was the use of k-nearest agglomerative clustering to define which documents are nearby. We thought of analyzing these matrices with the Exploratory Sequential Data Analysis technique.
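The Minimum Spanning Tree option can be sketched as follows. This is an illustrative sketch only, assuming documents are plain (x, y) points and that a jump counts as “close” when the two documents share a tree edge; the function names are hypothetical and the tree is built with a simple version of Prim's algorithm.

```python
import math

def mst_edges(points):
    """Edges of the Euclidean minimum spanning tree (Prim's algorithm).

    points -- hypothetical list of (x, y) document positions
    Returns a set of (i, j) index pairs with i < j.
    """
    if not points:
        return set()
    # best[j] = (distance to the tree, tree vertex realising that distance)
    best = {j: (math.dist(points[0], points[j]), 0)
            for j in range(1, len(points))}
    edges = set()
    while best:
        # Attach the document that is currently cheapest to connect
        j = min(best, key=lambda k: best[k][0])
        _, i = best.pop(j)
        edges.add((min(i, j), max(i, j)))
        # The new tree vertex may offer a shorter link to the others
        for k in best:
            d = math.dist(points[j], points[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges

def is_close_jump(edges, a, b):
    """A jump is 'close' only if the two documents share an MST edge."""
    return (min(a, b), max(a, b)) in edges

docs = [(0, 0), (1, 0), (2, 0), (10, 0)]
tree = mst_edges(docs)
```

A tree over n documents has only n - 1 edges, which shows why this definition is so restrictive: in the toy data the jump from document 0 to document 2 is short, yet it does not count as close because the tree connects them only through document 1.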
Finally, Pierre came up with the idea of using a granular definition of proximity instead of the binary selection I was trying to implement. This follows the idea that two documents can be separated by a great geographical distance but by only a small number of other documents, which makes the jump comparable to a geographically shorter jump made in an area where other documents are nearby.
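One possible reading of this granular proximity is a rank: the number of other documents that lie closer to the starting document than the destination does. The sketch below is an assumption of mine for illustration, not necessarily the measure Pierre had in mind, and the function name and coordinates are hypothetical.

```python
import math

def neighbour_rank(points, src, dst):
    """Number of other documents strictly closer to src than dst is.

    points   -- hypothetical list of (x, y) document positions
    src, dst -- indices of the documents at the start and end of the jump
    A rank of 0 means dst is the nearest neighbour of src.
    """
    jump = math.dist(points[src], points[dst])
    return sum(
        1
        for k, p in enumerate(points)
        if k not in (src, dst) and math.dist(points[src], p) < jump
    )

# A dense group on the left and two isolated documents on the right.
docs = [(0, 0), (0.5, 0), (1, 0), (1.5, 0), (2, 0), (50, 0), (90, 0)]
dense_rank = neighbour_rank(docs, 0, 4)   # short jump across the dense group
sparse_rank = neighbour_rank(docs, 5, 6)  # long jump between isolated documents
```

In this toy layout the jump of 40 units between the isolated documents skips no other document, while the jump of 2 units inside the dense group skips three, so the long jump is the "closer" one under this measure. Note that the rank is not symmetric: it can differ depending on which document the jump starts from.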
Tags: beta-skeletons, clustering, data mining, information metric, information retrieval, map algorithms, maps, spatial clustering, text data mining