Papers by Franck Thollard
Symbolic answers to an eye-tracking problem
ABSTRACT We provide in this article experiments made on the eye-tracking challenge proposed by the PASCAL European network. We concentrate here on symbolic approaches, mainly based on finite-state machines. Our experimental study raises many open questions, which are discussed in the conclusion.
Intégration de la structure dans un modèle probabiliste de documents
F Egc, 2008
Intégration de la structure dans un modèle probabiliste de document
In databases or in the World Wide Web, many documents are in a structured format (e.g. XML). We propose in this article to extend the classical IR probabilistic model in order to take into account the structure through the weighting of tags. Our approach includes a learning step in which the weight of each tag is computed. This weight estimates the probability that the tag distinguishes the terms which are the most relevant. Our model has been evaluated on a large collection during INEX IR evaluation campaigns.
Integrating Structure in the Probabilistic Model for Information Retrieval
2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2008
Most of the available information, either in textual databases or on the Internet, is strongly structured. This is for example the case for scientific articles or for documents available on the Internet when they are written in markup languages (e.g. HTML or XML). For all these ...
Position Models and Language Modeling
Lecture Notes in Computer Science, 2008
In statistical language modelling, the classic model is the n-gram. This model, however, cannot capture long-term dependencies, i.e. dependencies longer than n. An alternative to this model is the probabilistic automaton. Unfortunately, preliminary experiments suggest that this model is not yet competitive in language modelling, partly because it tries to model dependencies that are too long-term. We propose here to improve the use of this model by restricting the dependency to a more reasonable value. Experiments show a 45% reduction in perplexity on the Wall Street Journal language modeling task.
Use of grammatical inference in natural speech recognition
We provide in this article experiments made on the eye-tracking challenge proposed by the PASCAL European network. We concentrate here on symbolic approaches, mainly based on finite-state machines. Our experimental study raises many open questions, which are discussed in the conclusion.
Probabilistic document model integrating XML structure

ABM25t, une extension de BM25 pour la recherche d'information ciblée
Document numérique, 2010
ABSTRACT This article addresses the integration of XML tags into the term weighting function for focused XML Information Retrieval (IR). Our model takes into account a certain type of structural information: tags that represent the logical structure of documents (title, section, paragraph, etc.), as well as tags related to formatting (bold, italic, centered, etc.). We account for the influence of tags in the form of a weight, by estimating the probability that a tag highlights relevant terms. These weights are then integrated into the term weighting function. Experiments on a large collection, within the INEX 2008 XML IR competition, showed an improvement in result quality for focused IR.
UJM at INEX 2008: Pre-impacting of Tags Weights
Lecture Notes in Computer Science, 2009
This paper addresses the impact of structure on the term weighting function in the context of focused Information Retrieval (IR). Our model considers a certain kind of structural information: tags that represent logical structure (title, section, paragraph, etc.) and tags ...
UJM at INEX 2007: Document Model Integrating XML Tags
Lecture Notes in Computer Science, 2008
Different approaches have been used to represent textual documents, based on the boolean model, the vector space model or probabilistic models. In text mining as in information retrieval (IR), these models have shown good results for modeling textual documents. They ...
Improving probabilistic grammatical inference core algorithms with post-processing techniques
MACHINE LEARNING-INTERNATIONAL …, 2001
... In particular, for PPTA(I+) the probability estimates are ... Figure 1 shows PPTA(I+) built from I+ = {aac, aac, abd, aac, aac, abd, abd, a, ab}. [Figure 1: a PPTA] We now present the second tool used by the ...
Artificial data and language theory
Fifth International Colloquium on Grammatical Inference (ICGI-2000), Second Learning Language and Logic Workshop (LLL-2000), Fourth Conference on Computational Natural Language Learning (CoNLL-2000), Lisbon, Portugal, 11-14 September 2000
Proceedings of the Seventeenth International Conference on Machine Learning, Jun 29, 2000
Identification in the Limit with Probability One of Stochastic Deterministic Finite Automata
Lecture Notes in Computer Science, 2000
The current formal proof that stochastic deterministic finite automata can be identified in the limit with probability one makes use of a simplified state-merging algorithm. We prove in this paper that the Alergia algorithm, and its extensions, which may use some blue fringe type of ordering, can also identify distributions generated by stochastic deterministic finite automata. We also give a ...
The importance of smoothing in learning deterministic stochastic finite automata
Inférence grammaticale probabiliste utilisant la divergence de Kullback-Leibler et un principe de minimalité