Twitter has attracted many users to share and distribute the most recent information, resulting in large volumes of data produced every day. However, a variety of applications in Natural Language Processing (NLP) and Information Retrieval (IR) suffer severely from the noisy and short nature of tweets. Here, we propose a framework for tweet segmentation in a batch mode, called HybridSeg. By dividing tweets into meaningful segments, the semantic or background information is well preserved and easily retrieved by the downstream applications. HybridSeg finds the best segmentation of a tweet by maximizing the sum of the stickiness scores of its candidate segments. The stickiness score considers the probability of a segment being a phrase in English (i.e., global context) and within the batch of tweets (i.e., local context). For the latter, we propose and evaluate two models to derive local context by considering the linguistic structures and term-dependency in a batch of tweets, respectively. Experiments on two tweet data sets illustrate that tweet segmentation quality is significantly improved by learning both global and local contexts compared with using global context only. Through analysis and assessment, we show that local linguistic structures are more reliable for understanding local context compared with term-dependency.
Twitter has attracted millions of users to share and disseminate most up-to-date information, resulting in large volumes of data produced every day. However, many applications in Information Retrieval (IR) and Natural Language Processing (NLP) suffer severely from the noisy and short nature of tweets. In this paper, we propose a novel framework for tweet segmentation in a batch mode, called HybridSeg. By splitting tweets into meaningful segments, the semantic or context information is well preserved and easily extracted by the downstream applications. HybridSeg finds the optimal segmentation of a tweet by maximizing the sum of the stickiness scores of its candidate segments. The stickiness score considers the probability of a segment being a phrase in English (i.e., global context) and the probability of a segment being a phrase within the batch of tweets (i.e., local context). For the latter, we propose and evaluate two models to derive local context by considering the linguistic features and term-dependency in a batch of tweets, respectively. HybridSeg is also designed to iteratively learn from confident segments as pseudo feedback. Experiments on two tweet data sets show that tweet segmentation quality is significantly improved by learning both global and local contexts compared with using global context alone. Through analysis and comparison, we show that local linguistic features are more reliable for learning local context compared with term-dependency. As an application, we show that high accuracy is achieved in named entity recognition by applying segment-based part-of-speech (POS) tagging.
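The objective described above, finding the split of a tweet that maximizes the sum of per-segment stickiness scores, can be sketched as a standard dynamic program. The code below is our illustration, not the authors' implementation: the `stickiness` function is a stand-in (the paper combines global and local context), and the phrase dictionary and `max_len` cap are hypothetical.

```python
# Illustrative sketch: segment a tweet by maximizing total stickiness
# with dynamic programming. stickiness() is a placeholder scorer.

def segment_tweet(words, stickiness, max_len=4):
    """Return (best_score, segments) maximizing total stickiness."""
    n = len(words)
    best = [float("-inf")] * (n + 1)  # best[i]: best score for words[:i]
    best[0] = 0.0
    back = [0] * (n + 1)              # back[i]: start index of last segment
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            score = best[j] + stickiness(words[j:i])
            if score > best[i]:
                best[i], back[i] = score, j
    # Recover the segmentation by walking the backpointers.
    segs, i = [], n
    while i > 0:
        segs.append(words[back[i]:i])
        i = back[i]
    return best[n], segs[::-1]

# Toy stickiness: reward known phrases, small default score otherwise.
PHRASES = {("new", "york"), ("machine", "learning")}
score_fn = lambda seg: 2.0 if tuple(seg) in PHRASES else 0.5

total, segs = segment_tweet("i love new york".split(), score_fn)
# "new york" is kept as one segment because splitting it scores lower.
```

With the toy scorer, the optimal split is `[["i"], ["love"], ["new", "york"]]`; any real scorer would replace `score_fn` with the global/local stickiness the paper defines.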
Twitter has become one of the most important communication channels thanks to its ability to provide the most up-to-date and newsworthy information. Given the wide use of Twitter as a source of information, finding an interesting tweet for a user among a flood of tweets is challenging. With a huge number of tweets sent per day by hundreds of millions of users, information overload is inevitable. To extract information from large volumes of tweets, Named Entity Recognition (NER) methods designed for formal texts are often applied. However, many applications in Information Retrieval (IR) and Natural Language Processing (NLP) suffer severely from the noisy and short nature of tweets. In this paper, we propose a novel framework for tweet segmentation in a batch mode, called HybridSeg. By splitting tweets into meaningful segments, the semantic or context information is well preserved and easily extracted by the downstream applications. HybridSeg finds the optimal segmentation of a tweet by maximizing the sum of the stickiness scores of its candidate segments. The stickiness score considers the probability of a segment being a phrase in English (i.e., global context) and the probability of a segment being a phrase within the batch of tweets (i.e., local context). For the latter, we propose and evaluate two models to derive local context by considering the linguistic features and term-dependency in a batch of tweets, respectively. HybridSeg is also designed to iteratively learn from confident segments as pseudo feedback. As an application, we show that high accuracy is achieved in named entity recognition by applying segment-based part-of-speech (POS) tagging.
Twitter has attracted numerous users to share and distribute current information, resulting in huge amounts of data produced per day. A number of private and public organizations have been reported to create and monitor targeted Twitter streams to gather and understand users' opinions about those organizations. However, the complexity and hybrid nature of tweets are always challenging for Information Retrieval and Natural Language Processing. A targeted Twitter stream is normally constructed by filtering tweets against certain criteria with the help of the proposed framework. By splitting a tweet into a number of parts, the targeted tweet is then analyzed to understand users' opinions about the organizations. There is a pressing need to filter and categorize such tweets early; they are then preserved and used by downstream applications. The proposed architecture shows that, by dividing a tweet into a number of parts, standard phrases are separated and filtered, so the topic of the tweet can be captured well in its subsequent processing. Experiments with our proposed system on large-scale real tweets demonstrate the efficiency and effectiveness of our framework.
2016
Nowadays, Twitter provides a way to collect and understand users' opinions about many private or public organizations. These organizations are reported to create and monitor targeted Twitter streams to understand users' views about them. Usually a user-defined selection criterion is used to filter and construct a targeted Twitter stream. Applications such as early crisis detection and response with such a target stream require a good Named Entity Recognition (NER) system for Twitter, one able to automatically discover emerging named entities that are potentially linked to the crisis. However, many applications suffer severely from the short and noisy nature of tweets. We present a framework called HybridSeg, which easily extracts and well preserves the linguistic meaning or context information by first splitting tweets into meaningful segments. The optimal segmentation of a tweet is found after the sum of stickiness sc...
In social network services like Twitter, users are overwhelmed with huge amounts of social data, most of which is short, unstructured and highly noisy. Identifying accurate information in this flood of data is a hard task. Classifying tweets into an organized form helps users easily access the required information. Our first contribution relates to filtering parts of speech and preprocessing this kind of highly noisy and short data. Our second contribution concerns named entity recognition (NER) in tweets; adapting existing language tools to the noisy, non-standard language of tweets is necessary. Our third contribution involves segmentation of hashtags and semantic enrichment using a combination of relations from WordNet, which helps the performance of our classification system, including disambiguation of named entities, abbreviations and acronyms. Graph theory is used to cluster the words extracted from WordNet and tweets, based on the idea of connected components. We test our automatic classification system with four categories: politics, economy, sports and the medical field. We evaluate and compare several automatic classification systems using part or all of the items described in our contributions and find that filtering by part of speech and named entity recognition dramatically increases classification precision to 77.3%. Moreover, a classification system incorporating segmentation of hashtags and semantic enrichment with two relations from WordNet, synonymy and hyperonymy, increases classification precision up to 83.4%.
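The clustering idea mentioned above, grouping words by connected components of a relation graph, can be sketched in a few lines. This is our illustration rather than the authors' code: words are nodes, and an edge links two words related by a WordNet relation (synonymy, hyperonymy) or co-occurrence; the sample edges are invented for the example.

```python
# Illustrative sketch: cluster words as connected components of a
# word-relation graph built from WordNet pairs and tweet co-occurrence.
from collections import defaultdict

def connected_components(edges):
    """Group words into clusters; each cluster is one connected component."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for node in graph:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:  # iterative depth-first traversal
            w = stack.pop()
            if w in comp:
                continue
            comp.add(w)
            stack.extend(graph[w] - comp)
        seen |= comp
        components.append(comp)
    return components

# Toy relation pairs, e.g. drawn from WordNet synonym/hypernym links.
edges = [("economy", "market"), ("market", "trade"), ("goal", "match")]
clusters = connected_components(edges)
```

Here `clusters` contains two groups, `{economy, market, trade}` and `{goal, match}`; each component would then serve as one semantic cluster for the classifier.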
International Conference on Computational Linguistics, 2016
Twitter named entity recognition is the process of identifying proper names and classifying them into predefined labels/categories. This paper introduces a Twitter named entity system using a supervised machine learning approach, namely Conditional Random Fields. A large set of different features was developed and the system was trained using these. The Twitter named entity task can be divided into two parts: i) named entity extraction from tweets and ii) Twitter name classification into ten different types. For Twitter named entity recognition on unseen test data, our system obtained the second highest F1 score in the shared task: 63.22%. The system performance on the classification task was worse, with an F1 measure of 40.06% on unseen test data, the fourth best of the ten systems participating in the shared task.
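Feature engineering is the core of a CRF-based tagger like the one described above. The sketch below shows the kind of hand-crafted token features typically fed to a CRF for Twitter NER; the feature names and the example tweet are illustrative, not the paper's exact feature set.

```python
# Illustrative sketch: per-token features of the kind commonly passed to
# a Conditional Random Fields tagger for Twitter NER.

def token_features(tokens, i):
    """Build a feature dict for the token at position i."""
    w = tokens[i]
    feats = {
        "word.lower": w.lower(),
        "word.istitle": w.istitle(),      # capitalization cue
        "word.isupper": w.isupper(),
        "word.is_hashtag": w.startswith("#"),
        "word.is_mention": w.startswith("@"),
        "prefix3": w[:3],                 # cheap morphology
        "suffix3": w[-3:],
    }
    # Context features from the neighbouring tokens.
    feats["prev.lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next.lower"] = tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>"
    return feats

tweet = "Going to New York with @alice #travel".split()
f = token_features(tweet, 2)   # features for "New"
```

A CRF then learns weights over such feature dicts jointly with label transitions, which is what lets it tag "New York" as one entity span rather than two unrelated tokens.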
2015
Twitter has allowed millions of users to share and spread most up-to-date information, resulting in large volumes of data generated every day. Due to the extremely useful business information obtained from these tweets, it is necessary to understand tweet language for downstream applications, such as Named Entity Recognition (NER). Real-time applications like traffic detection and early crisis detection and response with a target Twitter stream require a good NER system, which automatically finds emerging named entities that are potentially linked to the crisis or traffic, but tweets are infamous for their error-prone and short nature. This leads to the failure of many conventional NER techniques, which heavily depend on local linguistic features, such as capitalization and the POS tags of previous words. Recently, segment-based tweet representation has shown effectiveness in NER. The goal of this survey is to provide a comprehensive review of NER systems over Twitter data and different NE...
Twitter has attracted a large number of users to share and spread the most recent information, producing substantial volumes of data every day. However, numerous applications in Information Retrieval (IR) and Natural Language Processing (NLP) suffer from the noisy and short nature of tweets. In this project, we propose a novel system for tweet segmentation in a batch mode, called HybridSeg. By splitting tweets into meaningful segments, the semantic or context information is well preserved and easily extracted by the downstream applications. HybridSeg finds the optimal segmentation of a tweet by maximizing the sum of the stickiness scores of its candidate segments. The stickiness score considers the likelihood of a segment being a phrase in English (i.e., global context) and the likelihood of a segment being a phrase within the batch of tweets (i.e., local context). For the latter, we propose and evaluate two models to derive local context by considering the linguistic features and term-dependency in a batch of tweets, respectively. HybridSeg is also designed to iteratively learn from confident segments as pseudo feedback. Experiments on two tweet data sets demonstrate that tweet segmentation quality is substantially improved by learning both global and local contexts compared with using global context alone. Through analysis and comparison, we show that local linguistic features are more reliable for learning local context compared with term-dependency. As an application, we show that high accuracy is achieved in named entity recognition by applying segment-based part-of-speech (POS) tagging.
Twitter is an online social network used by millions of people. It provides a way to collect and understand users' opinions about many private and public organizations. Twitter has become one of the most important communication channels through its ability to provide the most up-to-date information to users. In this paper we present a way to find the correlation of two words using association rules. There must be an application to establish the mutual relationship between two words, sentences or segments. In the first step we collect tweets, an editable group of tweets hand-selected by a Twitter user. These collected tweets are pre-processed, with stop words removed, and then tweet segmentation is applied. We mine generalized association rules from messages posted by Twitter users. The analysis of Twitter posts focuses on two different but related features: their textual content and their submission context. Due to the invaluable business value of timely information in these tweets, it is imperative to understand tweet language for a large body of downstream applications, such as named entity recognition.
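The word-correlation mining described above can be illustrated with the two classic association-rule metrics, support and confidence, computed over word pairs in a batch of tweets. This is a minimal sketch under our own assumptions (lowercased whitespace tokenization, a made-up support threshold), not the paper's system.

```python
# Illustrative sketch: mine pairwise association rules A -> B from tweets,
# keeping pairs whose support (fraction of tweets containing both words)
# clears a threshold, and reporting confidence for each direction.
from collections import Counter
from itertools import combinations

def pair_rules(tweets, min_support=0.5):
    n = len(tweets)
    word_count, pair_count = Counter(), Counter()
    for t in tweets:
        words = set(t.lower().split())   # one occurrence per tweet
        word_count.update(words)
        pair_count.update(combinations(sorted(words), 2))
    rules = {}
    for (a, b), c in pair_count.items():
        support = c / n
        if support >= min_support:
            rules[(a, b)] = (support, c / word_count[a])  # conf of a -> b
            rules[(b, a)] = (support, c / word_count[b])  # conf of b -> a
    return rules

tweets = ["traffic jam downtown", "heavy traffic jam", "downtown concert"]
rules = pair_rules(tweets)
# Only ("jam", "traffic") survives the threshold: it appears in 2 of 3
# tweets, with confidence 1.0 in both directions.
```

Higher-confidence pairs like this are exactly the multi-word segments ("traffic jam") that segmentation should keep together.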
Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.