Papers by Svetoslav Marinov
Language and Technology Conference, 2012
Structural Similarities in MT: A Bulgarian-Polish case
This paper shows that although it seems relatively easy to translate between closely related languages, not every framework manages to capture important details in the argument structure. By combining methods tested for translation between Swedish and Norwegian and assuming a compact theory of argument structure, I think that we can achieve better results in an MT system that deals with Slavic languages (exemplified here by Bulgarian and Polish).
Automatic Extraction of Subcategorization Frames for Bulgarian
Knowledge of a verb’s valency, or subcategorization, is essential for many NLP tasks. The present paper describes an attempt to learn this kind of information from a corpus of parsed Bulgarian sentences. Our program acquired subcategorization information for 38 verbs and achieved 87.7% precision and 68.3% recall. We did not use predefined sets of frames but automatically induced them from a treebank.
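The abstract reports precision and recall but no combined score. Assuming the usual balanced F-measure (the harmonic mean of the two, which the paper itself does not state), the reported figures combine as follows. This is derived arithmetic, not a result claimed by the authors; `f1_score` is a hypothetical helper name.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract: 87.7% precision, 68.3% recall.
print(f"F1 = {f1_score(0.877, 0.683):.3f}")  # prints F1 = 0.768
```

The harmonic mean sits closer to the lower of the two values, so the relatively low recall dominates the combined score.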
In this paper we propose a new Coreference Resolution system for Swedish, based on supervised machine learning methods trained on the SUC-core dataset. Our method improves on state-of-the-art results for the data, achieving an average F1-score of 50.9 using the standard CoNLL 2012 metrics.
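The "average F1-score" refers, as far as the standard CoNLL 2012 scorer is concerned, to the unweighted mean of three coreference F1 measures: MUC, B³ and entity-based CEAF. Assuming the paper follows that convention, the reported 50.9 corresponds to:

```latex
\text{CoNLL}\;F_1 = \frac{F_1^{\text{MUC}} + F_1^{B^3} + F_1^{\text{CEAF}_e}}{3}
```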
Three versions of the Covington algorithm for non-projective dependency parsing have been tested on ten languages from the Multilingual track of the CoNLL-X Shared Task. The results were achieved using only information about heads and daughters as features to guide the parser, which obeys strict incrementality.
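For readers unfamiliar with Covington's algorithm, the sketch below shows its basic exhaustive left-to-right strategy: each new word is compared with every preceding word, most recent first, and a link may be added in either direction as long as every word keeps at most one head and no cycle is formed. It does not reproduce the paper's three variants or its head/daughter features; the `decide` callback stands in for the learned guide, and the gold-standard oracle and example tree are hypothetical illustrations.

```python
from typing import Callable, List, Optional

# decide(heads, i, j) returns "right" (i heads j), "left" (j heads i) or None.
Decision = Callable[[List[Optional[int]], int, int], Optional[str]]


def is_ancestor(heads: List[Optional[int]], anc: int, node: int) -> bool:
    """True if anc is node itself or dominates node via head links."""
    cur: Optional[int] = node
    while cur is not None:
        if cur == anc:
            return True
        cur = heads[cur]
    return False


def covington_parse(n: int, decide: Decision) -> List[Optional[int]]:
    heads: List[Optional[int]] = [None] * n
    for j in range(n):
        # Compare word j with each earlier word, nearest first.
        for i in range(j - 1, -1, -1):
            action = decide(heads, i, j)
            # Single-head constraint plus cycle check before adding i -> j.
            if action == "right" and heads[j] is None and not is_ancestor(heads, j, i):
                heads[j] = i
            # Same checks before adding j -> i.
            elif action == "left" and heads[i] is None and not is_ancestor(heads, i, j):
                heads[i] = j
    return heads


# Hypothetical gold tree with crossing arcs 0->2 and 1->3 (non-projective).
gold = [None, 0, 0, 1]


def oracle(heads: List[Optional[int]], i: int, j: int) -> Optional[str]:
    if gold[j] == i:
        return "right"
    if gold[i] == j:
        return "left"
    return None


print(covington_parse(len(gold), oracle))  # [None, 0, 0, 1]
```

Because any pair of words may be linked, the crossing arc 1->3 is recovered directly, which is what makes the algorithm suitable for non-projective parsing.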
A data-driven parser for Bulgarian
The BulTreeBank: Parsing and conversion
Current Issues in Linguistic Theory, 2009

Intent-aware temporal query modeling for keyword suggestion
Proceedings of the 5th Ph.D. workshop on Information and knowledge, 2012
This paper presents a data-driven approach for capturing temporal variations in user search behaviour by modeling dynamic query relationships using query-log data. The dependence between different queries (in terms of the query words and latent user intent) is represented using hypergraphs, which allow us to explore more complex relationships than graph-based approaches. This time-varying dependence is modeled using the framework of probabilistic graphical models. The inferred interactions are used for query keyword suggestion, a key task in web information retrieval. Preliminary experiments using query logs collected from the internal search engine of a large health care organization yield promising results. In particular, our model is able to capture temporal variations in query relationships that reflect known trends in disease occurrence. Further, hypergraph-based modeling captures relationships significantly better than graph-based approaches.
Text Dependent and Text Independent Speaker Verification System: Technology and Application
Overview article, 2003
Proc. of the 4th Workshop on Treebanks and Linguistic Theories (TLT), 2005
One of the main motivations for building treebanks is that they facilitate the development of syntactic parsers, by providing realistic data for evaluation as well as inductive learning. In this paper we present what we believe to be the first robust data-driven parser for Bulgarian, trained and evaluated on data from BulTreeBank (Simov et al., 2002). The parser uses dependency-based representations and employs a deterministic algorithm to construct dependency structures in a single pass over the input string, guided by a memory-based ...
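The abstract does not name the transition system, so the sketch below assumes an arc-eager system in the style of Nivre's algorithm, one common way to build a dependency tree deterministically in a single left-to-right pass. The memory-based classifier is replaced by a static oracle that reads a gold tree; `arc_eager_parse`, `static_oracle` and the example tree are hypothetical. Token 0 is an artificial root.

```python
from typing import Callable, List, Optional

Heads = List[Optional[int]]


def static_oracle(gold: Heads, stack: List[int], buffer: List[int], heads: Heads) -> str:
    """Choose the transition that stays consistent with a projective gold tree."""
    s, b = stack[-1], buffer[0]
    if gold[s] == b:
        return "LEFT-ARC"
    if gold[b] == s:
        return "RIGHT-ARC"
    # Pop s once it has its head and b still relates to something deeper in the stack.
    if heads[s] is not None and any(gold[b] == k or gold[k] == b for k in stack[:-1]):
        return "REDUCE"
    return "SHIFT"


def arc_eager_parse(n: int, guide: Callable[[List[int], List[int], Heads], str]) -> Heads:
    heads: Heads = [None] * n
    stack, buffer = [0], list(range(1, n))
    while buffer:  # one pass over the input: every word is shifted exactly once
        s, b = stack[-1], buffer[0]
        action = guide(stack, buffer, heads)
        if action == "LEFT-ARC" and s != 0 and heads[s] is None:
            heads[s] = b  # b becomes head of s; s is finished
            stack.pop()
        elif action == "RIGHT-ARC":
            heads[b] = s  # s becomes head of b; b may still take dependents
            stack.append(buffer.pop(0))
        elif action == "REDUCE" and heads[s] is not None:
            stack.pop()
        else:
            stack.append(buffer.pop(0))  # SHIFT, also the fallback for invalid actions
    return heads


gold = [None, 0, 1, 0]  # hypothetical projective tree; requires two REDUCEs
print(arc_eager_parse(len(gold), lambda st, bf, hd: static_oracle(gold, st, bf, hd)))
# [None, 0, 1, 0]
```

In a trained parser the oracle is only used to generate training instances; at parse time a classifier predicts the next transition from features of the current stack and buffer.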
Proceedings of the Tenth Conference on Computational Natural Language Learning - CoNLL-X '06, 2006
We use SVM classifiers to predict the next action of a deterministic parser that builds labeled projective dependency graphs in an incremental fashion. Non-projective dependencies are captured indirectly by projectivizing the training data for the classifiers and applying an inverse transformation to the output of the parser. We present evaluation results and an error analysis focusing on Swedish and Turkish.
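Projectivizing the training data can be illustrated with a lifting operation: a non-projective arc is repeatedly reattached to the grandparent of its dependent until no arc crosses another. The sketch below is a simplified assumption, not the paper's exact procedure: it lifts the leftmost offending arc first and does not encode the lifts in arc labels, so the inverse transformation cannot be applied to its output. The function names are hypothetical; token 0 is an artificial root.

```python
from typing import List, Optional, Tuple

Heads = List[Optional[int]]


def dominates(heads: Heads, anc: int, node: int) -> bool:
    """True if anc is node itself or an ancestor of node."""
    cur: Optional[int] = node
    while cur is not None:
        if cur == anc:
            return True
        cur = heads[cur]
    return False


def is_projective_arc(heads: Heads, dep: int) -> bool:
    """An arc h -> dep is projective if h dominates every word between them."""
    h = heads[dep]
    assert h is not None
    lo, hi = min(h, dep), max(h, dep)
    return all(dominates(heads, h, k) for k in range(lo + 1, hi))


def projectivize(heads: Heads) -> Tuple[Heads, List[int]]:
    heads = list(heads)
    lifted: List[int] = []
    changed = True
    while changed:
        changed = False
        for dep, h in enumerate(heads):
            if h is not None and heads[h] is not None and not is_projective_arc(heads, dep):
                heads[dep] = heads[h]  # lift: attach dep to its grandparent
                lifted.append(dep)
                changed = True
                break
    return heads, lifted


# Hypothetical tree where arc 1->3 crosses arc 0->2.
print(projectivize([None, 0, 0, 1]))  # ([None, 0, 0, 0], [3])
```

Recording which dependents were lifted (here the list `lifted`) is the kind of information an inverse transformation needs in order to restore the original non-projective arcs after parsing.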
Natural Language Engineering, 2005
Parsing unrestricted text is useful for many language technology applications but requires parsing methods that are both robust and efficient. MaltParser is a language-independent system for data-driven dependency parsing that can be used to induce a parser for a new language from a treebank sample in a simple yet flexible manner. Experimental evaluation confirms that MaltParser can achieve robust, efficient and accurate parsing for a wide range of languages without language-specific enhancements and with rather limited amounts of training data.