History of natural language processing
The history of natural language processing describes the advances in natural language
processing. There is some overlap with the history of machine translation, the history of speech
recognition, and the history of artificial intelligence.
Early history
The history of machine translation dates back to the seventeenth century, when philosophers such
as Leibniz and Descartes put forward proposals for codes which would relate words between
languages. All of these proposals remained theoretical, and none resulted in the development of an
actual machine.
The first patents for "translating machines" were applied for in the mid-1930s. One proposal, by Georges Artsrouni, was simply an automatic bilingual dictionary using paper tape. The other proposal, by Peter Troyanskii, a Russian, was more detailed. Troyanskii's proposal included both the bilingual dictionary and a method for dealing with grammatical roles between languages, based on Esperanto.
Logical period
In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence" which
proposed what is now called the Turing test as a criterion of intelligence. This criterion depends on
the ability of a computer program to impersonate a human in a real-time written conversation with
a human judge, sufficiently well that the judge is unable to distinguish reliably — on the basis of the
conversational content alone — between the program and a real human.
In 1957, Noam Chomsky's Syntactic Structures revolutionized linguistics with "universal grammar", a rule-based system of syntactic structures.[1]
The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. The authors claimed that within three to five years, machine translation would be a solved problem.[2] However, real progress was much slower, and after the ALPAC report in 1966, which found that ten years of research had failed to fulfill expectations, funding for machine translation was dramatically reduced. Little further research in machine translation was conducted until the late 1980s, when the first statistical machine translation systems were developed.
Some notably successful NLP systems developed in the 1960s were SHRDLU, a natural language system working in restricted "blocks worlds" with restricted vocabularies, and ELIZA, Joseph Weizenbaum's simulation of a Rogerian psychotherapist.
In 1969 Roger Schank introduced the conceptual dependency theory for natural language
understanding.[3] This model, partially influenced by the work of Sydney Lamb, was extensively
used by Schank's students at Yale University, such as Robert Wilensky, Wendy Lehnert, and Janet
Kolodner.
In 1970, William A. Woods introduced the augmented transition network (ATN) to represent natural language input.[4] Instead of phrase structure rules, ATNs used an equivalent set of finite-state automata that were called recursively. ATNs and their more general format called "generalized ATNs" continued to be used for a number of years.
During the 1970s, many programmers began to write 'conceptual ontologies', which structured real-world information into computer-understandable data. Examples are MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert, 1981). During this time, many chatterbots were written, including PARRY, Racter, and Jabberwacky.
In recent years, advancements in deep learning and large language models have significantly enhanced the capabilities of natural language processing, leading to widespread applications in areas such as healthcare, customer service, and content generation.[5]
Statistical period
Up to the 1980s, most NLP systems were based on complex sets of hand-written rules. Starting in
the late 1980s, however, there was a revolution in NLP with the introduction of machine learning
algorithms for language processing. This was due both to the steady increase in computational
power resulting from Moore's law and the gradual lessening of the dominance of Chomskyan
theories of linguistics (e.g. transformational grammar), whose theoretical underpinnings
discouraged the sort of corpus linguistics that underlies the machine-learning approach to
language processing.[6] Some of the earliest-used machine learning algorithms, such as decision
trees, produced systems of hard if-then rules similar to existing hand-written rules. Increasingly,
however, research has focused on statistical models, which make soft, probabilistic decisions based
on attaching real-valued weights to the features making up the input data. The cache language
models upon which many speech recognition systems now rely are examples of such statistical
models. Such models are generally more robust when given unfamiliar input, especially input that
contains errors (as is very common for real-world data), and produce more reliable results when
integrated into a larger system comprising multiple subtasks.
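As an illustration of such a soft, probabilistic decision, the sketch below interpolates a "cache" of recently seen words with a static unigram model. The vocabulary, counts, and interpolation weight are invented for illustration; real cache language models used in speech recognizers are considerably more elaborate.

```python
from collections import Counter

def cache_lm_prob(word, history, base_probs, lam=0.3):
    """P(word) = lam * P_cache(word | recent history) + (1 - lam) * P_base(word).
    `base_probs` is a static unigram model; the cache component boosts words
    that have already appeared in the current document or conversation."""
    cache = Counter(history)
    p_cache = cache[word] / len(history) if history else 0.0
    return lam * p_cache + (1 - lam) * base_probs.get(word, 0.0)

# Toy example: "bank" becomes more probable once it has been seen recently.
base = {"bank": 0.01, "the": 0.05, "river": 0.02}
history = "the bank raised the rate and the bank closed".split()
print(cache_lm_prob("bank", history, base))   # boosted above the base 0.01
print(cache_lm_prob("river", history, base))  # no cache boost: only the base term remains
```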
Datasets
The emergence of statistical approaches was aided both by the increase in computing power and the
availability of large datasets. At that time, large multilingual corpora were starting to emerge.
Notably, some were produced by the Parliament of Canada and the European Union as a result of
laws calling for the translation of all governmental proceedings into all official languages of the
corresponding systems of government.
Many of the notable early successes occurred in the field of machine translation. In 1993, the IBM
alignment models were used for statistical machine translation.[7] Compared to previous machine
translation systems, which were symbolic systems manually coded by computational linguists,
these systems were statistical, which allowed them to automatically learn from large textual
corpora. These systems do not work well, however, in situations where only small corpora are available, so data-efficient methods continue to be an area of research and development.
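A worked toy version of the idea makes it concrete. The sketch below runs the expectation-maximization loop of IBM Model 1, the simplest of the alignment models, on a pair of invented sentences; the data, variable names, and number of iterations are assumptions for illustration, not the setup of the cited paper.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate word-translation probabilities t(f | e) with EM (IBM Model 1).
    `pairs` is a list of (foreign_sentence, english_sentence) token lists; a
    NULL token on the English side lets foreign words align to nothing."""
    f_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)                # expected counts c(f, e)
        total = defaultdict(float)                # expected counts c(e)
        for fs, es in pairs:                      # E-step: collect expected counts
            es = ["NULL"] + es
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        for (f, e), c in count.items():           # M-step: re-estimate t(f | e)
            t[(f, e)] = c / total[e]
    return t

# Invented two-sentence "parallel corpus" (not real training data).
pairs = [(["das", "haus"], ["the", "house"]),
         (["das", "buch"], ["the", "book"])]
t = train_ibm_model1(pairs)
print(round(t[("das", "the")], 2))  # learns that "das" mostly translates "the"
```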
In 2001, a one-billion-word text corpus scraped from the Internet, referred to as "very very large" at the time, was used for word disambiguation.[8]
To take advantage of large, unlabelled datasets, algorithms were developed for unsupervised and
self-supervised learning. Generally, such learning is much more difficult than supervised learning, and
typically produces less accurate results for a given amount of input data. However, there is an
enormous amount of non-annotated data available (including, among other things, the entire
content of the World Wide Web), which can often make up for the inferior results.
Neural period
In 1990, the Elman network, using a recurrent neural network, encoded each word in a training set as a vector, called a word embedding, and the whole vocabulary as a vector database, allowing it to perform such tasks as sequence prediction that are beyond the power of a simple multilayer perceptron. A shortcoming of the static embeddings was that they did not differentiate between multiple meanings of homonyms.[9]
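The shape of that computation can be sketched compactly: each word is looked up as an embedding vector and fed through an Elman-style recurrence, whose final hidden state is used to predict the next word. The dimensions and random weights below are placeholders and the network is untrained, so its outputs are meaningless; this illustrates only the recurrence, not Elman's actual experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions, assumed for illustration: 10-word vocabulary,
# 8-dimensional embeddings, 16 hidden units.
vocab_size, embed_dim, hidden_dim = 10, 8, 16

E   = rng.normal(scale=0.1, size=(vocab_size, embed_dim))   # static word embeddings
W_x = rng.normal(scale=0.1, size=(hidden_dim, embed_dim))   # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # recurrent weights
W_y = rng.normal(scale=0.1, size=(vocab_size, hidden_dim))  # hidden-to-output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_next(word_ids):
    """Run an Elman-style recurrence over a word-id sequence and return a
    probability distribution over the next word."""
    h = np.zeros(hidden_dim)
    for i in word_ids:
        x = E[i]                        # look up the word's embedding
        h = np.tanh(W_x @ x + W_h @ h)  # h_t = tanh(W_x x_t + W_h h_{t-1})
    return softmax(W_y @ h)

print(predict_next([1, 4, 2]).round(3))
```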
Software
Timeline of natural language processing models:
Software | Year | Creator | Description
Georgetown experiment | 1954 | Georgetown University and IBM | Involved fully automatic translation of more than sixty Russian sentences into English.
STUDENT | 1964 | Daniel Bobrow | Could solve high school algebra word problems.[10]
ELIZA | 1964 | Joseph Weizenbaum | A simulation of a Rogerian psychotherapist, rephrasing her response with a few grammar rules.[11]
SHRDLU | 1970 | Terry Winograd | A natural language system working in restricted "blocks worlds" with restricted vocabularies; worked extremely well.
PARRY | 1972 | Kenneth Colby | A chatterbot.
KL-ONE | 1974 | Sondheimer et al. | A knowledge representation system in the tradition of semantic networks and frames; it is a frame language.
MARGIE | 1975 | Roger Schank |
TaleSpin (software) | 1976 | Meehan |
QUALM | 1977 | Lehnert |
LIFER/LADDER | 1978 | Hendrix | A natural language interface to a database of information about US Navy ships.
SAM (software) | 1978 | Cullingford |
PAM (software) | 1978 | Robert Wilensky |
Politics (software) | 1979 | Carbonell |
Plot Units (software) | 1981 | Lehnert |
Jabberwacky | 1982 | Rollo Carpenter | Chatterbot with stated aim to "simulate natural human chat in an interesting, entertaining and humorous manner".
MUMBLE (software) | 1982 | McDonald |
Racter | 1983 | William Chamberlain and Thomas Etter | Chatterbot that generated English language prose at random.
MOPTRANS | 1984 | Lytinen | [12]
KODIAK (software) | 1986 | Wilensky |
Absity (software) | 1987 | Hirst |
Dr. Sbaitso | 1991 | Creative Labs |
Watson (artificial intelligence software) | 2006 | IBM | A question answering system that won the Jeopardy! contest, defeating the best human players in February 2011.
Siri | 2011 | Apple | A virtual assistant developed by Apple.
Cortana | 2014 | Microsoft | A virtual assistant developed by Microsoft.
Amazon Alexa | 2014 | Amazon | A virtual assistant developed by Amazon.
Google Assistant | 2016 | Google | A virtual assistant developed by Google.
References
1. "SEM1A5 - Part 1 - A brief history of NLP" (http://www.cs.bham.ac.uk/~pjh/sem1a5/pt1/pt1_hist
ory.html). Retrieved 2010-06-25.
2. Hutchins, J. (2005)
3. Roger Schank, 1969, "A conceptual dependency parser for natural language", Proceedings of the 1969 Conference on Computational Linguistics, Sånga-Säby, Sweden, pages 1-3.
4. Woods, William A (1970). "Transition Network Grammars for Natural Language Analysis".
Communications of the ACM 13 (10): 591–606 [1] (http://www.eric.ed.gov/ERICWebPortal/cust
om/portlets/recordDetails/detailmini.jsp?_nfpb=true&_&ERICExtSearch_SearchValue_0=ED03
7733&ERICExtSearch_SearchType_0=no&accno=ED037733)
5. Gruetzemacher, Ross (2022-04-19). "The Power of Natural Language Processing" (https://hb
r.org/2022/04/the-power-of-natural-language-processing). Harvard Business Review.
ISSN 0017-8012 (https://search.worldcat.org/issn/0017-8012). Retrieved 2024-12-07.
6. Chomskyan linguistics encourages the investigation of "corner cases" that stress the limits of
its theoretical models (comparable to pathological phenomena in mathematics), typically
created using thought experiments, rather than the systematic investigation of typical
phenomena that occur in real-world data, as is the case in corpus linguistics. The creation and
use of such corpora of real-world data is a fundamental part of machine-learning algorithms for
NLP. In addition, theoretical underpinnings of Chomskyan linguistics such as the so-called
"poverty of the stimulus" argument entail that general learning algorithms, as are typically used
in machine learning, cannot be successful in language processing. As a result, the Chomskyan
paradigm discouraged the application of such models to language processing.
7. Brown, Peter F. (1993). "The mathematics of statistical machine translation: Parameter
estimation". Computational Linguistics (19): 263–311.
8. Banko, Michele; Brill, Eric (2001). "Scaling to very very large corpora for natural language
disambiguation" (https://doi.org/10.3115%2F1073012.1073017). Proceedings of the 39th
Annual Meeting on Association for Computational Linguistics - ACL '01. Morristown, NJ, USA:
Association for Computational Linguistics: 26–33. doi:10.3115/1073012.1073017 (https://doi.or
g/10.3115%2F1073012.1073017). S2CID 6645623 (https://api.semanticscholar.org/CorpusID:6
645623).
9. Elman, Jeffrey L. (March 1990). "Finding Structure in Time" (http://doi.wiley.com/10.1207/s1551
6709cog1402_1). Cognitive Science. 14 (2): 179–211. doi:10.1207/s15516709cog1402_1 (http
s://doi.org/10.1207%2Fs15516709cog1402_1). S2CID 2763403 (https://api.semanticscholar.or
g/CorpusID:2763403).
10. McCorduck 2004, p. 286, Crevier 1993, pp. 76−79, Russell & Norvig 2003, p. 19
11. McCorduck 2004, pp. 291–296, Crevier 1993, pp. 134−139
12. Janet L. Kolodner, Christopher K. Riesbeck; Experience, Memory, and Reasoning; Psychology
Press; 2014 reprint
Bibliography
Crevier, Daniel (1993). AI: The Tumultuous Search for Artificial Intelligence. New York, NY:
BasicBooks. ISBN 0-465-02997-3.
McCorduck, Pamela (2004), Machines Who Think (2nd ed.), Natick, MA: A. K. Peters, Ltd.,
ISBN 978-1-56881-205-2, OCLC 52197627 (https://search.worldcat.org/oclc/52197627).
Russell, Stuart J.; Norvig, Peter (2003), Artificial Intelligence: A Modern Approach (http://aima.c
s.berkeley.edu/) (2nd ed.), Upper Saddle River, New Jersey: Prentice Hall,
ISBN 0-13-790395-2.