2016, International Conference on Computational Linguistics
Twitter named entity recognition is the process of identifying proper names in tweets and classifying them into predefined categories. The paper introduces a Twitter named entity system that uses a supervised machine learning approach, namely Conditional Random Fields. A large set of different features was developed, and the system was trained on them. The Twitter named entity task can be divided into two parts: i) named entity extraction from tweets and ii) classification of the extracted names into ten different types. For Twitter named entity recognition on unseen test data, our system obtained the second highest F1 score in the shared task: 63.22%. The system performance on the classification task was worse, with an F1 measure of 40.06% on unseen test data, which was the fourth best of the ten systems participating in the shared task.
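The abstract does not list the features that were developed, so the following is only a minimal sketch of the kind of per-token feature dictionary that is commonly fed to a CRF sequence tagger. Every feature name here (`is_hashtag`, `prefix3`, and so on), the helper functions and the example tweet are hypothetical illustrations, not the system's actual feature set.

```python
def token_features(tokens, i):
    """Build an illustrative feature dict for tokens[i].

    All feature names are hypothetical; the paper's real feature set
    is not given in the abstract.
    """
    word = tokens[i]
    feats = {
        "lower": word.lower(),
        "is_capitalized": word[:1].isupper(),
        "is_all_caps": word.isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "is_hashtag": word.startswith("#"),
        "is_mention": word.startswith("@"),
        "prefix3": word[:3],
        "suffix3": word[-3:],
    }
    # Neighbouring words give the tagger some local context.
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


def sentence_features(tokens):
    """Return one feature dict per token, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Hypothetical example tweet, tokenised on whitespace for simplicity.
tweet = "@bob seeing Justin Bieber in London #excited".split()
features = sentence_features(tweet)
```

A list of such dictionaries per tweet, paired with one label per token, is the usual input shape for CRF toolkits; which toolkit the authors used is not stated in the abstract.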
2015
Twitter allows millions of users to share and spread up-to-date information, which results in a large volume of data generated every day. Because tweets carry extremely useful business information, downstream applications such as Named Entity Recognition (NER) need to understand the language of tweets. Real-time applications such as traffic detection systems and early crisis detection and response over a targeted Twitter stream require a good NER system that automatically finds emerging named entities potentially linked to the crisis or the traffic event, but tweets are infamous for their error-prone and short nature. This leads to the failure of many conventional NER techniques, which depend heavily on local linguistic features such as capitalization and the POS tags of previous words. Recently, segment-based tweet representation has shown its effectiveness in NER. The goal of this survey is to provide a comprehensive review of NER system over twitter data and different NE...
Natural Language Processing (NLP), in its pure sense, is a platform for transforming natural language text into useful information. Named Entity Recognition (NER) is a key NLP task for classifying named entities in natural language. Although there are several algorithms for named entity classification, identifying named entities in Twitter data is a demanding task. People share large amounts of information on Twitter every day. This information is unstructured and often contains important details about organizations, politics, disasters, promotional advertisements, etc. In this paper, we provide an NER system that can effectively classify named entities in Twitter data for Indian languages such as English, Hindi and Tamil. POS, chunk, suffix and prefix information has been used to train a Conditional Random Fields (CRF) based NER model. CRF is a popular model for labeling and classification in text mining. Performance analysis was done using n-fold validation and F-measure. A maximum precision of 93.82 for English, 92.28 for Hindi and 86.94 for Tamil Twitter data was achieved through n-fold validation. The precision reported by the ESM-IL shared task is 50.48 for English, 81.49 for Hindi and 70.42 for Tamil. The proposed algorithm has a higher classification accuracy, and this is achieved through n-fold validation.
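Since the abstract reports precision and F-measure, a short sketch of how these are typically computed for NER may help. It assumes exact-match scoring over `(start, end, type)` entity spans, which is a common convention but is not confirmed by the abstract; the function name and the toy spans are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level precision, recall and F1 under exact span-and-type match."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: entities as (start_token, end_token, type); values are made up.
gold = {(0, 2, "PERSON"), (4, 5, "LOCATION"), (7, 8, "ORG")}
pred = {(0, 2, "PERSON"), (4, 5, "ORG")}  # second span has the wrong type

p, r, f = precision_recall_f1(gold, pred)
```

In this toy case one of two predictions is correct (precision 0.5) but only one of three gold entities is found (recall of about 0.33), so precision on its own can look better than the F-measure.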
2014
Entries in microblogging sites are very short. For example, a 'tweet' (a post or status update on the popular microblogging site Twitter) can contain at most 140 characters. To comply with this restriction, users frequently use abbreviations to express their thoughts, thus producing sentences that are often poorly structured or ungrammatical. As a result, it becomes a challenge to come up with methods for automatically identifying named entities (names of persons, organizations, locations etc.). In this study, we use a four-step approach to automatic named entity recognition from microposts. First, we do some preprocessing of the micropost (e.g. replace abbreviations with actual words). Then we use an off-the-shelf part-of-speech tagger to tag the nouns. Next, we use the Google Search API to retrieve sentences containing the tagged nouns. Finally, we run a standard Named Entity Recognizer (NER) on the retrieved sentences. The tagged nouns are returned along with the ...
Twitter has drawn large numbers of users to share and distribute current information, resulting in a huge amount of data produced per day. A number of private and public organizations have been reported to create and control targeted Twitter streams to gather and understand users' opinions about the organizations. However, the complexity and hybrid nature of tweets are always challenging for information retrieval and natural language processing. A targeted Twitter stream is normally constructed by filtering and rending tweets against certain criteria with the help of the proposed framework. The targeted tweet is split into a number of parts and then analyzed to understand users' opinions about the organizations. There is a need to rend and categorize such tweets early; they are then preserved in two formats and used for downstream applications. The proposed architecture shows that, by dividing the tweet into a number of parts, the standard phrases are separated and filtered so that the topic of the tweet can be better captured in subsequent processing. Our proposed system, applied to large-scale real tweets, demonstrates the efficiency and effectiveness of our framework.
Twitter has become one of the most important communication channels thanks to its ability to provide the most up-to-date and newsworthy information. Given the wide use of Twitter as a source of information, finding an interesting tweet among a large number of tweets is challenging for a user. With a huge number of tweets sent per day by hundreds of millions of users, information overload is inevitable. For extracting information in large volume of tweets, Named Entity Recognition (NER), methods on formal texts. However, many applications in Information Retrieval (IR) and Natural Language Processing (NLP) suffer severely from the noisy and short nature of tweets. In this paper, we propose a novel framework for tweet segmentation in a batch mode, called HybridSeg. By splitting tweets into meaningful segments, the semantic or context information is well preserved and easily extracted by the downstream applications. HybridSeg finds the optimal segmentation of a tweet by maximizing the sum of the stickiness scores of its candidate segments. The stickiness score considers the probability of a segment being a phrase in English (i.e., global context) and the probability of a segment being a phrase within the batch of tweets (i.e., local context). For the latter, we propose and evaluate two models to derive local context by considering the linguistic features and term-dependency in a batch of tweets, respectively. HybridSeg is also designed to iteratively learn from confident segments as pseudo feedback. As an application, we show that high accuracy is achieved in named entity recognition by applying segment-based part-of-speech (POS) tagging.
Twitter has attracted millions of users to share and disseminate most up-to-date information, resulting in large volumes of data produced every day. However, many applications in Information Retrieval (IR) and Natural Language Processing (NLP) suffer severely from the noisy and short nature of tweets. In this paper, we propose a novel framework for tweet segmentation in a batch mode, called HybridSeg. By splitting tweets into meaningful segments, the semantic or context information is well preserved and easily extracted by the downstream applications. HybridSeg finds the optimal segmentation of a tweet by maximizing the sum of the stickiness scores of its candidate segments. The stickiness score considers the probability of a segment being a phrase in English (i.e., global context) and the probability of a segment being a phrase within the batch of tweets (i.e., local context). For the latter, we propose and evaluate two models to derive local context by considering the linguistic features and term-dependency in a batch of tweets, respectively. HybridSeg is also designed to iteratively learn from confident segments as pseudo feedback. Experiments on two tweet data sets show that tweet segmentation quality is significantly improved by learning both global and local contexts compared with using global context alone. Through analysis and comparison, we show that local linguistic features are more reliable for learning local context compared with term-dependency. As an application, we show that high accuracy is achieved in named entity recognition by applying segment-based part-of-speech (POS) tagging.
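The idea of choosing the segmentation that maximizes the sum of stickiness scores can be illustrated with a small dynamic program. This is a sketch under stated assumptions, not HybridSeg itself: the stickiness function below is a hypothetical lookup table standing in for the global and local context models, and the names `best_segmentation`, `toy_stickiness`, `TOY_SCORES` and the `max_len` limit are all invented for illustration.

```python
def best_segmentation(tokens, stickiness, max_len=3):
    """Split tokens into segments maximizing the sum of stickiness scores.

    best[end] holds the best total score for tokens[:end];
    back[end] holds the start index of the last segment in that solution.
    """
    n = len(tokens)
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        # Try every segment of up to max_len tokens that finishes at `end`.
        for start in range(max(0, end - max_len), end):
            score = best[start] + stickiness(tuple(tokens[start:end]))
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers to recover the segments in order.
    segments = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(tokens[start:end])
        end = start
    segments.reverse()
    return segments, best[n]


# Hypothetical scores; a real system would derive these from corpus statistics.
TOY_SCORES = {
    ("new", "york"): 2.0,
    ("new", "york", "city"): 3.5,
    ("i",): 0.5,
    ("love",): 0.5,
}


def toy_stickiness(segment):
    """Unknown single words score slightly positive, unknown phrases negative."""
    return TOY_SCORES.get(segment, 0.1 if len(segment) == 1 else -1.0)


segments, total = best_segmentation("i love new york city".split(), toy_stickiness)
```

With these toy scores the program keeps "new york city" together as one segment, because its score (3.5) beats any split of those three words.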
Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.
International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022
In recent years, a lot of research has been carried out in the field of Named Entity Recognition of social media data. Because it is now easy for anyone to convey opinions and information without any kind of authentication, rumours have multiplied, as social media is now one of the most commonly used media in the world. The fields in which Named Entity Recognition is used to analyse data from social media sites include rumour detection, controversy detection, sentiment analysis, the medical field (such as visualisation of the spread of COVID-19 or various kinds of dietary concerns), topic detection and event detection. The purpose here is to present a summary of the present state of social media research and the impact created by information extracted from it. In this paper, different methods proposed for social media data extraction using Named Entity Recognition have been studied in detail and compared. Most of these papers use the most common metrics for evaluating performance, which include precision, recall and accuracy. The proposed models have been tested on datasets extracted from social networking sites such as Twitter and Facebook, and their performance has been compared with that of models proposed in several similar approaches.
In this paper, we describe our approach for the Named Entity rEcognition and Linking (NEEL) Challenge at #Microposts2016. The task is to automatically recognize entities and their types from English microposts, and link them to corresponding DBpedia 2015 entries. If the resources do not exist, we use NIL identifiers instead. The task is unique as Twitter data is informal in nature, with non-conformational spellings, random contractions and various other noises. For this task, we developed our system using a hybrid model. We have used various existing named entity recognition (NER) systems and combined them with our classifier to improve the results.