{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T05:09:36Z","timestamp":1772773776513,"version":"3.50.1"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2018,7,21]],"date-time":"2018-07-21T00:00:00Z","timestamp":1532131200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100006041","name":"Innovate UK","doi-asserted-by":"crossref","award":["103652"],"award-info":[{"award-number":["103652"]}],"id":[{"id":"10.13039\/501100006041","id-type":"DOI","asserted-by":"crossref"}]},{"name":"the National Key Research and Development Program of China","award":["2016YFC1306704"],"award-info":[{"award-number":["2016YFC1306704"]}]},{"name":"the Jiangsu Natural Science Funds","award":["BK20161430"],"award-info":[{"award-number":["BK20161430"]}]},{"DOI":"10.13039\/501100001809","name":"the National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61772132, 61528302, 61573104"],"award-info":[{"award-number":["61772132, 61528302, 61573104"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2018,12,31]]},"abstract":"<jats:p>Weakly supervised part-of-speech (POS) tagging is to learn to predict the POS tag for a given word in context by making use of partial annotated data instead of the fully tagged corpora. 
Weakly supervised POS tagging would benefit various natural language processing applications in languages where tagged corpora are mostly unavailable.<\/jats:p>\n          <jats:p>\n            In this article, we propose a novel framework for weakly supervised POS tagging based on a dictionary of words with their possible POS tags. In the constrained error-correcting output codes (ECOC)-based approach, a unique\n            <jats:italic>L<\/jats:italic>\n            -bit vector is assigned to each POS tag. The set of bit vectors forms a coding matrix with entries in {1, -1}. Each column of the coding matrix specifies a dichotomy over the tag space to learn a binary classifier. For each binary classifier, its training data is generated in the following way: each word, paired with its set of possible POS tags, is treated as a positive training example only if the whole set of its possible tags falls into the positive dichotomy specified by the column coding, and similarly for negative training examples. Given a word in context, its POS tag is predicted by concatenating the predictive outputs of the\n            <jats:italic>L<\/jats:italic>\n            binary classifiers and choosing the tag whose codeword is closest according to some distance measure. By incorporating the ECOC strategy, the set of all possible tags for each word is treated as a whole, without the need for disambiguation. Moreover, instead of the manual feature engineering employed in most previous POS tagging approaches, features for training and testing in the proposed framework are automatically generated using neural language modeling. 
The proposed framework has been evaluated on three corpora for English, Italian, and Malagasy POS tagging, achieving accuracies of 93.21%, 90.9%, and 84.5% individually, which shows a significant improvement compared to the state-of-the-art approaches.\n          <\/jats:p>","DOI":"10.1145\/3214707","type":"journal-article","created":{"date-parts":[[2018,7,23]],"date-time":"2018-07-23T13:02:15Z","timestamp":1532350935000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":26,"title":["Weakly Supervised POS Tagging without Disambiguation"],"prefix":"10.1145","volume":"17","author":[{"given":"Deyu","family":"Zhou","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing, China"}]},{"given":"Zhikai","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing, China"}]},{"given":"Min-Ling","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing, China"}]},{"given":"Yulan","family":"He","sequence":"additional","affiliation":[{"name":"School of Engineering and Applied Science, Aston University, UK"}]}],"member":"320","published-online":{"date-parts":[[2018,7,21]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1298--1307","author":"Abend Omri","year":"2010","unstructured":"Omri Abend , Roi Reichart , and Ari Rappoport . 2010 . Improved unsupervised POS induction through prototype discovery . 
In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1298--1307 . Omri Abend, Roi Reichart, and Ari Rappoport. 2010. Improved unsupervised POS induction through prototype discovery. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1298--1307."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the 17th International Conference on Machine Learning (ICML\u201900)","author":"Allwein Erin L.","year":"2000","unstructured":"Erin L. Allwein , Robert E. Schapire , and Yoram Singer . 2000 . Reducing multiclass to binary: A unifying approach for margin classifiers . In Proceedings of the 17th International Conference on Machine Learning (ICML\u201900) . Morgan Kaufmann Publishers Inc., San Francisco, CA, 9--16. http:\/\/dl.acm.org\/citation.cfm?id&equals;645529.658120. Erin L. Allwein, Robert E. Schapire, and Yoram Singer. 2000. Reducing multiclass to binary: A unifying approach for margin classifiers. In Proceedings of the 17th International Conference on Machine Learning (ICML\u201900). Morgan Kaufmann Publishers Inc., San Francisco, CA, 9--16. http:\/\/dl.acm.org\/citation.cfm?id&equals;645529.658120."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220355.1220435"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/1857999.1858082"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop","author":"Biemann Chris","unstructured":"Chris Biemann . 2006. Unsupervised part-of-speech tagging employing efficient graph clustering . In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop . 
Association for Computational Linguistics , 7--12. Chris Biemann. 2006. Unsupervised part-of-speech tagging employing efficient graph clustering. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop. Association for Computational Linguistics, 7--12."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.3115\/1075527.1075553"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 3rd Workshop on Very Large Corpora","volume":"30","author":"Brill Eric","year":"1995","unstructured":"Eric Brill . 1995 . Unsupervised learning of disambiguation rules for part of speech tagging . In Proceedings of the 3rd Workshop on Very Large Corpora , Vol. 30 . Somerset, New Jersey : Association for Computational Linguistics, 1--13. Eric Brill. 1995. Unsupervised learning of disambiguation rules for part of speech tagging. In Proceedings of the 3rd Workshop on Very Large Corpora, Vol. 30. Somerset, New Jersey: Association for Computational Linguistics, 1--13."},{"key":"e_1_2_1_8_1","first-page":"4","article-title":"Class-based N-gram models of natural language","volume":"18","author":"Brown Peter F.","year":"1992","unstructured":"Peter F. Brown , Peter V. deSouza , Robert L. Mercer , Vincent J. Della Pietra , and Jenifer C. Lai . 1992 . Class-based N-gram models of natural language . Computational Linguistics 18 , 4 (Dec. 1992), 467--479. http:\/\/dl.acm.org\/citation.cfm?id&equals;176313.176316. Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. 1992. Class-based N-gram models of natural language. Computational Linguistics 18, 4 (Dec. 1992), 467--479. 
http:\/\/dl.acm.org\/citation.cfm?id&equals;176313.176316.","journal-title":"Computational Linguistics"},{"key":"e_1_2_1_9_1","volume-title":"International Conference on Language Resources and Evaluation, Lrec 2010","author":"Cer Daniel M.","year":"2010","unstructured":"Daniel M. Cer , Marie Catherine De Marneffe , Daniel Jurafsky , and Christopher D. Manning . 2010. Parsing to stanford dependencies: Trade-offs between speed and accuracy . In International Conference on Language Resources and Evaluation, Lrec 2010 , 17-23 May 2010 , Valletta, Malta. Daniel M. Cer, Marie Catherine De Marneffe, Daniel Jurafsky, and Christopher D. Manning. 2010. Parsing to stanford dependencies: Trade-offs between speed and accuracy. In International Conference on Language Resources and Evaluation, Lrec 2010, 17-23 May 2010, Valletta, Malta."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1961189.1961199"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201910)","author":"Christodoulopoulos Christos","year":"2010","unstructured":"Christos Christodoulopoulos , Sharon Goldwater , and Mark Steedman . 2010 . Two decades of unsupervised POS induction: How far have we come? In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201910) . Association for Computational Linguistics, Stroudsburg, PA, 575--584. http:\/\/dl.acm.org\/citation.cfm?id&equals; 1870658.1870714. Christos Christodoulopoulos, Sharon Goldwater, and Mark Steedman. 2010. Two decades of unsupervised POS induction: How far have we come? In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201910). Association for Computational Linguistics, Stroudsburg, PA, 575--584. 
http:\/\/dl.acm.org\/citation.cfm?id&equals;1870658.1870714."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.3115\/1067807.1067817"},{"key":"e_1_2_1_13_1","article-title":"Natural language processing (almost) from scratch","author":"Collobert Ronan","year":"2011","unstructured":"Ronan Collobert , Jason Weston , L\u00e9on Bottou , Michael Karlen , Koray Kavukcuoglu , and Pavel Kuksa . 2011 . Natural language processing (almost) from scratch . Journal of Machine Learning Research 12 ( Nov. 2011), 2493--2537. Ronan Collobert, Jason Weston, L\u00e9on Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research 12 (Nov. 2011), 2493--2537.","journal-title":"Journal of Machine Learning Research 12"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/1622826.1622834"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/2390948.2391036"},{"key":"e_1_2_1_16_1","unstructured":"Dan Garrette and Jason Baldridge. 2013. Learning a part-of-speech tagger from two hours of annotation. In HLT-NAACL. Citeseer 138--147.  Dan Garrette and Jason Baldridge. 2013. Learning a part-of-speech tagger from two hours of annotation. In HLT-NAACL. Citeseer 138--147."},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 746--754","author":"Goldberg Yoav","year":"2008","unstructured":"Yoav Goldberg , Meni Adler , and Michael Elhadad . 2008 . EM can find pretty good HMM POS-taggers (when given a good start) . In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 746--754 . Yoav Goldberg, Meni Adler, and Michael Elhadad. 2008. EM can find pretty good HMM POS-taggers (when given a good start). 
In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 746--754."},{"key":"e_1_2_1_18_1","volume-title":"ACL 2007, Proceedings of the Meeting of the Association for Computational Linguistics, June 23--30","author":"Goldwater Sharon","year":"2007","unstructured":"Sharon Goldwater and Tom Griffiths . 2007 . A fully Bayesian approach to unsupervised part-of-speech tagging . In ACL 2007, Proceedings of the Meeting of the Association for Computational Linguistics, June 23--30 , 2007, Prague, Czech Republic. 744--751. Sharon Goldwater and Tom Griffiths. 2007. A fully Bayesian approach to unsupervised part-of-speech tagging. In ACL 2007, Proceedings of the Meeting of the Association for Computational Linguistics, June 23--30, 2007, Prague, Czech Republic. 744--751."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220835.1220876"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL\u201907)","author":"Johnson Mark","year":"2007","unstructured":"Mark Johnson . 2007 . Why doesn\u2019t EM find good HMM POS-taggers? In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL\u201907) , June 28 --30 , 2007, Prague, Czech Republic. 296--305. Mark Johnson. 2007. Why doesn\u2019t EM find good HMM POS-taggers? In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL\u201907), June 28--30, 2007, Prague, Czech Republic. 296--305."},{"key":"e_1_2_1_21_1","first-page":"2","article-title":"Building a large annotated corpus of english: The penn treebank","volume":"19","author":"Marcus Mitchell P.","year":"1993","unstructured":"Mitchell P. 
Marcus , Mary Ann Marcinkiewicz , and Beatrice Santorini . 1993 . Building a large annotated corpus of english: The penn treebank . Computational Linguistics 19 , 2 (June 1993), 313--330. Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of english: The penn treebank. Computational Linguistics 19, 2 (June 1993), 313--330.","journal-title":"Computational Linguistics"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/972525.972526"},{"key":"e_1_2_1_23_1","volume-title":"Tagging english text with a probabilistic model. Computational linguistics 20, 2","author":"Merialdo Bernard","year":"1994","unstructured":"Bernard Merialdo . 1994b. Tagging english text with a probabilistic model. Computational linguistics 20, 2 ( 1994 ), 155--171. Bernard Merialdo. 1994b. Tagging english text with a probabilistic model. Computational linguistics 20, 2 (1994), 155--171."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/1734953.1734961"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2007.04.008"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP","author":"Ravi Sujith","unstructured":"Sujith Ravi and Kevin Knight . 2009. Minimized models for unsupervised part-of-speech tagging . In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP . Association for Computational Linguistics , 504--512. Sujith Ravi and Kevin Knight. 2009. Minimized models for unsupervised part-of-speech tagging. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. 
Association for Computational Linguistics, 504--512."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00169"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, 940--948","author":"Ravi Sujith","year":"2010","unstructured":"Sujith Ravi , Ashish Vaswani , Kevin Knight , and David Chiang . 2010 . Fast, greedy model minimization for unsupervised tagging . In Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, 940--948 . Sujith Ravi, Ashish Vaswani, Kevin Knight, and David Chiang. 2010. Fast, greedy model minimization for unsupervised tagging. In Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, 940--948."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-2044"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.3115\/1219840.1219884"},{"key":"e_1_2_1_31_1","unstructured":"Kristina Toutanova Mark Johnson etal 2007. A Bayesian LDA-based model for semi-supervised part-of-speech tagging. Advances in Neural Information Processing Systems. 1521--1528.   Kristina Toutanova Mark Johnson et al. 2007. A Bayesian LDA-based model for semi-supervised part-of-speech tagging. Advances in Neural Information Processing Systems. 1521--1528."},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, 1391--1398","author":"Yatbaz Mehmet Ali","year":"2010","unstructured":"Mehmet Ali Yatbaz and Deniz Yuret . 2010 . Unsupervised part of speech tagging using unambiguous substitutes from a statistical language model . In Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, 1391--1398 . 
Mehmet Ali Yatbaz and Deniz Yuret. 2010. Unsupervised part of speech tagging using unambiguous substitutes from a statistical language model. In Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, 1391--1398."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611973440.5"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/E14-1062"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/1699571.1699602"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI\u201915)","author":"Zhou Deyu","year":"2015","unstructured":"Deyu Zhou , Liangyu Chen , and Yulan He . 2015 . An unsupervised framework of exploring events on twitter: Filtering, extraction and categorization . In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI\u201915) . 2468--2474. Deyu Zhou, Liangyu Chen, and Yulan He. 2015. An unsupervised framework of exploring events on twitter: Filtering, extraction and categorization. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI\u201915). 
2468--2474."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btu061"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1201\/b12207"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3214707","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3214707","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:39:27Z","timestamp":1750210767000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3214707"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,7,21]]},"references-count":38,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2018,12,31]]}},"alternative-id":["10.1145\/3214707"],"URL":"https:\/\/doi.org\/10.1145\/3214707","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,7,21]]},"assertion":[{"value":"2017-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-07-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}