Algorithmic stemmers remain highly useful in information retrieval (IR), even though dictionary-based stemmers promise better performance. Nevertheless, there are few algorithmic descriptions of stemmers, and even where they exist they are liable to misinterpretation. Snowball is a language defined by Porter in which stemmers can be exactly defined, and from which fast stemmer programs in ANSI C or Java can be generated. A range of stemmers is presented in parallel algorithmic and Snowball form, including the original Porter stemmer for English.
PyStemmer provides stemming functionality in Python for English, German, Norwegian, Italian, Dutch, Portuguese, French, and Swedish. PyStemmer is based on the Snowball stemmers.
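A minimal usage sketch follows, assuming the third-party PyStemmer package is installed and exposes its usual `Stemmer` module. The helper name `stem_words` is hypothetical and only for illustration. The import is guarded so the snippet still loads when the package is absent.

```python
# Sketch only: requires the third-party PyStemmer package
# (commonly installed with `pip install PyStemmer`).
try:
    import Stemmer  # module provided by PyStemmer
except ImportError:
    Stemmer = None  # allow the file to load without the package


def stem_words(words, language="english"):
    """Return the stems of `words` for the given language.

    Hypothetical helper for illustration; it is not part of PyStemmer.
    """
    if Stemmer is None:
        raise RuntimeError("PyStemmer is not installed")
    stemmer = Stemmer.Stemmer(language)  # one stemmer object per language
    return stemmer.stemWords(words)      # stems a list of words in one call


if Stemmer is not None:
    # List the algorithm names the installed version supports.
    print(Stemmer.algorithms())
    print(stem_words(["cycling", "cyclist", "stemmers"]))
```

Creating the stemmer once and reusing it for many words is usually preferable to building a new one per word. The set of languages actually available depends on the installed version.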
Tags: information retrieval, natural language processing, text data mining