Anjewierden, A., Kollöffel, B., and Hulshof, C. (2007). Towards educational data mining: Using data mining methods for automated chat analysis to understand and support inquiry learning processes. In Proceedings of International Workshop on Applying Data Mining in e-Learning (ADML 2007) as part of the 2nd European Conference on Technology Enhanced Learning (EC-TEL 2007), Crete, Greece. [pdf]
——-
The premise of this paper is that learners cannot be expected to oversee the whole of their communication, and that chat communication tends to be less structured than face-to-face communication (Stromso et al., 2007). The authors therefore aim to build a real-time feedback system that can regulate collaborative interactions.
This workshop paper presents a nice approach that uses a part-of-speech tagger and a Bayesian classifier to categorize chat messages into four functional categories: regulatory, domain-specific, social, and technical. Human coders first assigned each message to a category; the authors then used this labeled corpus to train the Bayesian classifier and report high accuracy.
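To make the classification step concrete, here is a minimal sketch of a multinomial naive Bayes classifier with Laplace smoothing, written in plain Python. It is not the authors' implementation. The training messages are invented for illustration, the names `TRAINING`, `tokenize`, and `NaiveBayes` are hypothetical, and the part-of-speech tagging step from the paper is left out, so the features here are simply lowercase words.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-coded messages (assumption: invented for this sketch,
# not taken from the paper's corpus). Labels mirror the four categories.
TRAINING = [
    ("what should we do next", "regulatory"),
    ("let us start with the first assignment", "regulatory"),
    ("ok we agree then move on", "regulatory"),
    ("the velocity increases when the force increases", "domain"),
    ("acceleration depends on mass and force", "domain"),
    ("friction lowers the velocity", "domain"),
    ("hello how are you", "social"),
    ("haha nice job thanks", "social"),
    ("see you tomorrow bye", "social"),
    ("the simulation window does not open", "technical"),
    ("my screen froze click the button again", "technical"),
    ("where is the save button", "technical"),
]


def tokenize(message):
    """Split a chat message into lowercase word tokens (no POS tagging)."""
    return message.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # messages per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocab = set()

    def fit(self, examples):
        for text, label in examples:
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior for every category."""
        total_messages = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total_messages)  # log prior
            denom = sum(self.word_counts[label].values()) + self.alpha * vocab_size
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    numer = self.word_counts[label][word] + self.alpha
                    score += math.log(numer / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


classifier = NaiveBayes().fit(TRAINING)
print(classifier.predict("the force changes the velocity"))
```

With a real hand-coded corpus, the same structure applies: the manually assigned labels serve as training data, and each incoming chat message is assigned the category with the highest posterior score. Adding part-of-speech tags as extra features, as the paper describes, would only change what `tokenize` returns.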