MDParser Web Page
MDParser stands for multilingual dependency parser. It is a data-driven system that can parse text in any language for which training data is available. The parser can create both unlabeled and labeled dependency structures.
The number of possible relation types depends on the granularity of the training data.
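
To make the difference between unlabeled and labeled structures concrete, here is a minimal sketch of how a dependency arc could be represented. The class names Arc and DependencyExample and the labels ROOT and SUBJ are illustrative assumptions, not part of MDParser; the actual label set depends on the training data.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; these types are NOT part of MDParser.
final class Arc {
    final int head;       // index of the head token (0 = artificial root)
    final int dependent;  // index of the dependent token (1-based)
    final String label;   // relation type, or null for an unlabeled arc

    Arc(int head, int dependent, String label) {
        this.head = head;
        this.dependent = dependent;
        this.label = label;
    }

    // Drop the relation type, keeping only the head-dependent attachment.
    Arc unlabeled() {
        return new Arc(head, dependent, null);
    }

    @Override
    public String toString() {
        return label == null
            ? head + " -> " + dependent
            : head + " -" + label + "-> " + dependent;
    }
}

final class DependencyExample {
    // "John sleeps": token 1 = John, token 2 = sleeps.
    // The labels are made up for this example.
    static List<Arc> labeled() {
        return Arrays.asList(new Arc(0, 2, "ROOT"), new Arc(2, 1, "SUBJ"));
    }

    public static void main(String[] args) {
        for (Arc arc : labeled()) {
            System.out.println(arc + "   |   " + arc.unlabeled());
        }
    }
}
```

A labeled structure carries one relation type per arc; the unlabeled structure is the same set of attachments with the types removed.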

The system's models are based on various features extracted from the words of a sentence, including word forms and part-of-speech tags. Therefore, in order to process previously unannotated text, MDParser additionally includes some preprocessing components:
• a sentence splitter, since the parser constructs a dependency structure for individual sentences

• a tokenizer, in order to recognise the elements between which the dependency relations will be built

• a part-of-speech tagger, in order to determine the part-of-speech tags, which are among the most important factors in constructing the dependency structure.
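
The first two preprocessing steps can be sketched as follows. This is a naive illustration using only the Java standard library, not MDParser's actual components: the class name PreprocessingSketch, the token pattern, and the use of java.text.BreakIterator are assumptions made for the example. Part-of-speech tagging is left out because it requires a trained model.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; MDParser ships its own preprocessing components.
final class PreprocessingSketch {
    // A token is either a word (letters/digits, optionally joined by ' or -)
    // or a single non-space punctuation character.
    private static final Pattern TOKEN = Pattern.compile(
        "[\\p{L}\\p{N}]+(?:['-][\\p{L}\\p{N}]+)*|[^\\s\\p{L}\\p{N}]");

    // Step 1: split running text into sentences.
    static List<String> splitSentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    // Step 2: split one sentence into tokens.
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "MDParser is fast. It parses German and English.";
        for (String sentence : splitSentences(text, Locale.ENGLISH)) {
            System.out.println(tokenize(sentence));
        }
    }
}
```

Each token produced this way would then receive a part-of-speech tag before the sentence is handed to the dependency parser.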

MDParser is an especially fast system and is therefore particularly suitable for processing very large amounts of data. It can thus be used as part of larger (Big Data) applications in which dependency structures are desired.

MDParser has already been tested for several languages, including German and English. It is currently able to achieve quite competitive results, considering that it is based on a fast linear classification approach and a deterministic parsing strategy.

MDParser can be run
• as a jar file from the command line (java -Xmx1g -jar mdp.jar props.xml)
• from your own Java project by adding the jar to the build path and calling the methods of the de.dfki.lt.mdparser.test.MDParser class (e.g. parseSentence(String text, String language, String inputFormat) or parseText(String text, String language, String inputFormat))
text = text string to be parsed; language = {english, german}; inputFormat = {text, conll} (text = the string still has to be preprocessed (sentence splitting, tokenisation, POS tagging); conll = preprocessing is already done, only dependency parsing is necessary)
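
A call to parseSentence from your own code might look like the sketch below. Only the class name, the method name and the three String parameters come from the description above. The return type, whether the method is static, and whether a public no-argument constructor exists are not stated here, so the sketch uses reflection and treats all three as assumptions. The wrapper class MDParserCallSketch is hypothetical. With mdp.jar on the build path you would normally call the method directly instead.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical wrapper; only PARSER_CLASS and the parseSentence signature
// are taken from the MDParser description.
final class MDParserCallSketch {
    static final String PARSER_CLASS = "de.dfki.lt.mdparser.test.MDParser";

    // True if mdp.jar (and thus the parser class) is on the classpath.
    static boolean isAvailable() {
        try {
            Class.forName(PARSER_CLASS);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Invokes parseSentence(text, language, inputFormat) reflectively.
    // Assumption: the method is public; if it is not static, the class
    // is assumed to have a public no-argument constructor.
    static Object parseSentence(String text, String language, String inputFormat)
            throws ReflectiveOperationException {
        Class<?> cls = Class.forName(PARSER_CLASS);
        Method method = cls.getMethod("parseSentence", String.class, String.class, String.class);
        Object target = Modifier.isStatic(method.getModifiers())
            ? null
            : cls.getDeclaredConstructor().newInstance();
        return method.invoke(target, text, language, inputFormat);
    }

    public static void main(String[] args) throws Exception {
        if (!isAvailable()) {
            System.out.println("mdp.jar is not on the classpath");
            return;
        }
        // "text" asks MDParser to run its own preprocessing first.
        System.out.println(parseSentence("The parser is fast.", "english", "text"));
    }
}
```

Passing "conll" as the third argument instead would tell the parser that the input is already split, tokenised and tagged.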

News
17th December 2014: training, testing and evaluation are now supported via the dedicated command mdpTrainTest; see README.txt for more details!

Development
The parser has been developed at the LT-lab of DFKI by Alexander Volokh and Günter Neumann.

Links:
Latest version of the system (December 17, 2014)[zip]
Dissertation[link]
Latest technical report[link]


Latest updates by G. Neumann, 17th December 2014