Improving Vietnamese Dependency Parsing Using Distributed Word Representations
Authors: Cam Vu-Manh, Anh Tuan Luong, Phuong Le-Hong
Abstract: Dependency parsing has become an important line of research in natural language processing in recent years, owing to its usefulness in a wide variety of real-world applications. This paper presents an improvement to Vietnamese dependency parsing using distributed word representations. Our parser achieves an unlabelled attachment score (UAS) of 76.29% and a labelled attachment score (LAS) of 69.25%, making it the most accurate dependency parser for Vietnamese compared with others trained and tested on the same dependency treebank. The distributed word representations are produced by two recent unsupervised learning models, the Skip-gram model and the GloVe model. We also show that, when used in dependency parsing, distributed representations produced by the GloVe model perform better than those produced by the Skip-gram model. Our dependency parsing system, including software, corpus and distributed word representations, is released as an open-source project, freely available for research purposes.
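The two reported metrics follow standard definitions. The unlabelled attachment score (UAS) is the percentage of tokens assigned the correct head, and the labelled attachment score (LAS) is the percentage assigned both the correct head and the correct dependency label. The sketch below is illustrative only. It assumes each token is encoded as a hypothetical `(head_index, label)` pair, with head `0` denoting the root, and that every token counts. Evaluation conventions vary, for example whether punctuation is excluded, and the abstract does not say which convention the paper uses.

```python
from typing import List, Tuple

# Hypothetical encoding: one (head_index, relation_label) pair per token; head 0 = root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over all tokens (no punctuation filtering)."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have equal length")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    # UAS: only the head must match.
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head and the label must match.
    both_ok = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * head_ok / n, 100.0 * both_ok / n


# Toy four-token example with made-up labels.
gold = [(2, "nsubj"), (0, "root"), (2, "dobj"), (3, "nmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "nmod")]
uas, las = attachment_scores(gold, pred)
print(uas, las)  # 75.0 50.0
```

Because LAS requires a superset of what UAS requires, LAS can never exceed UAS on the same data. This is consistent with the paper's 69.25% LAS being below its 76.29% UAS.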
Published: 03 December 2015