As I proceed with my thesis work, I have been developing an information retrieval system for virtual post-its attached to a shared map. One of the ideas I am currently exploring is a procedure that builds a keyword-by-keyword matrix from the keywords extracted from the messages left in the system.
One idea is to preserve the order of the keywords as extracted by our part-of-speech tagger, so that they form a kind of n-gram. We plan to use these sequences to convey more or less “energy” in the retrieval process.
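To make the idea a bit more concrete, here is a minimal sketch of what such an order-preserving keyword-by-keyword matrix could look like. It is only an illustration under a few assumptions: the function name `ordered_keyword_matrix`, the window parameter `n`, and the sample messages are all made up for this example, and the keyword lists are assumed to arrive already extracted and in the order the tagger produced them. How the counts would then translate into “energy” is left open.

```python
def ordered_keyword_matrix(messages, n=2):
    """Count how often keyword a occurs before keyword b within a window of n keywords.

    `messages` is a list of keyword lists, each in extraction order.
    Returns the sorted vocabulary and a square matrix of counts, where
    matrix[i][j] is the number of times vocab[i] preceded vocab[j].
    """
    vocab = sorted({kw for msg in messages for kw in msg})
    index = {kw: i for i, kw in enumerate(vocab)}
    size = len(vocab)
    matrix = [[0] * size for _ in range(size)]
    for msg in messages:
        for pos, first in enumerate(msg):
            # Look ahead at most n - 1 keywords, so n=2 counts adjacent pairs only.
            for second in msg[pos + 1 : pos + n]:
                matrix[index[first]][index[second]] += 1
    return vocab, matrix


# Hypothetical keyword lists, as a tagger might return them for three post-its.
messages = [
    ["coffee", "library", "morning"],
    ["library", "coffee"],
    ["coffee", "library"],
]
vocab, matrix = ordered_keyword_matrix(messages)
for keyword, row in zip(vocab, matrix):
    print(keyword, row)
```

The matrix is deliberately not symmetric: the difference between `matrix[i][j]` and `matrix[j][i]` is exactly the order information in question. Adding the matrix to its transpose would give plain, order-free co-occurrence counts, which makes it easy to compare retrieval with and without word order.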
Pierre and I discussed this idea a bit, and he raised the concern that it is difficult to defend the claim that word order is predictive of semantic structure and, in turn, of meaning.
If somebody in the audience has a pointer on this matter, please leave a comment.
(and yes, thanks in advance 🙂 )
Tags: clustering, context, Latent Semantic Analysis, text data mining