Combining Local and Global Word Embeddings for Microblog Stemming

Abstract

Stemming is a vital step employed to improve retrieval performance through efficient unification of morphological variants of a word. We propose an unsupervised, context-specific stemming algorithm for microblogs, based on both local and global word embeddings, which is capable of handling the informal, noisy vocabulary of microblogs. Experiments on two standard microblog data collections (TREC 2016 and FIRE 2016) show that, the proposed stemmer enables significantly better retrieval performance than several state-of-the-art stemming algorithms, for the same queries.

Publication
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
comments powered by Disqus