Papers by Jordan Boyd Graber
A lot of data are relational; they express relationships between pairs of entities (people, places, genes, businesses). Despite the inherently relational nature of data of interest, they are often not expressed relationally but rather as free text. To uncover this latent network structure from text, we posit a topic model that discovers common ways that relationships between entities are expressed. Using these automatically identified relationships, we are able to reconstruct social networks from free text.
We develop latent Dirichlet allocation with WORDNET (LDAWN), an unsupervised probabilistic topic model that includes word sense as a hidden variable. We develop a probabilistic posterior inference algorithm for simultaneously disambiguating a corpus and learning the domains in which to consider each word. Using the WORDNET hierarchy, we embed the construction of in the topic model and show that automatically learned domains improve WSD accuracy compared to alternative contexts.
We extend McCarthy et al.'s predominant sense method to create an unsupervised method of word sense disambiguation that uses topics automatically derived with latent Dirichlet allocation. Using topic-specific synset similarity measures, we create predictions for each word in each document using only word frequency information. It is hoped that this procedure can improve upon the method for larger numbers of topics by providing more relevant training corpora for the individual topics. This method is evaluated on SemEval-2007 Task 1 and Task 17.

In this paper, we describe the design and preliminary evaluation of a hybrid desktop-handheld system developed to support individuals with aphasia, a disorder which impairs the ability to speak, read, write, or understand language. The system allows its users to develop speech communication through images and sound on a desktop computer and download this speech to a mobile device that can then support communication outside the home. Using a desktop computer for input addresses some of this population's difficulties interacting with handheld devices, while the mobile device addresses stigma and portability issues. A modified participatory design approach was used in which proxies, that is, speech-language pathologists who work with aphasic individuals, assumed the role normally filled by users. This was done because of the difficulties in communicating with the target population and the high variability in aphasic disorders. In addition, the paper presents a case study of the proxy-use participatory design process that illustrates how different interview techniques resulted in different user feedback.
We present an overview of the issues and questions we confronted while designing a desktop-PDA system for people with aphasia through the use of proxies.
WORDNET, a ubiquitous tool for natural language processing, suffers from sparsity of connections between its component concepts (synsets). Through the use of human annotators, a subset of the connections between 1000 hand-chosen synsets was assigned a value of "evocation" representing how much the first concept brings to mind the second. These data, along with existing similarity measures, constitute the basis of a method for predicting evocation between previously unrated pairs.
The central character of modern bioinformatics is the analysis of data to find patterns that represent relevant biological information. The success, however, of this field may one day lead to serious problems as too much data inundates the methods and resources traditionally used. A world where every individual has his or her genome mapped and stored in a central medical database might lead to more information than even future computers following the trajectory of Moore's law could handle.