Precursor-Induced Conditional Random Fields
https://doi.org/10.1186/s12911-019-0865-1
Abstract
Background: This paper presents a conditional random fields (CRF) method that enables the capture of specific
high-order label transition factors to improve clinical named entity recognition performance. Consecutive clinical
entities in a sentence are usually separated from each other, and the textual descriptions in clinical narrative
documents frequently indicate causal or posterior relationships that can be used to facilitate clinical named
entity recognition. However, the CRF generally used for named entity recognition is a first-order
model that limits label transition dependencies to adjoining labels under the Markov assumption.
Methods: Based on the first-order structure, our proposed model utilizes non-entity tokens between separated
entities as an information transmission medium by applying a label induction method. The model is referred to as
precursor-induced CRF because its non-entity state memorizes precursor entity information, and the model’s structure
allows the precursor entity information to propagate forward through the label sequence.
Results: We compared the proposed model with both first- and second-order CRFs in terms of their F1-scores, using
two clinical named entity recognition corpora (the i2b2 2012 challenge and the Seoul National University Hospital
electronic health record). The proposed model demonstrated better entity recognition performance than both the
first- and second-order CRFs and was also more efficient than the higher-order model.
Conclusion: The proposed precursor-induced CRF, which uses non-entity labels to carry label transition
information, improves the entity recognition F1 score by exploiting long-distance transition factors without
exponentially increasing the computational time. In contrast, a conventional second-order CRF model that uses
longer-distance transition factors showed even worse results than the first-order model and required the longest
computation time. Thus, the proposed model could offer a considerable performance improvement over current
clinical named entity recognition methods based on CRF models.
Keywords: Clinical named entity recognition, Conditional random fields, High-order dependency, Clinical natural
language processing, Induction method
* Correspondence: [email protected]
1 Interdisciplinary Program for Bioengineering, Graduate School, Seoul National University, 103 Daehak-ro, Jongno-gu, Seoul 03080, South Korea
2 Department of Biomedical Engineering, Seoul National University College of Medicine, 103 Daehak-ro, Jongno-gu, Seoul 03080, South Korea
Full list of author information is available at the end of the article
an induction method that allows information to propagate from one state to the next between two entities through the non-entity sequence within a single instance.

Although this paper concentrates on the CRF model itself rather than on medical NER in general, we briefly introduce recent studies in medical NER. Deep-learning-based methods for clinical concept identification are actively studied, especially methods based on recurrent neural network structures [16, 23–28]. In the long short-term memory and CRF architecture, the CRF is still used for labeling a sequence because the CRF model can jointly use neighboring tags in its output decision [15]. To automate medical NER, one study [29] proposed incorporating active learning. Once named entities are extracted, the identified terms can be used to derive information beyond the textual data, such as temporal information extraction [3, 30], drug–disease relationship recognition from large-scale medical literature [31], and identification of risk factors related to a particular disease [32]. To support researchers requiring NER modules, off-the-shelf medical NER programs such as CLAMP [33] and MetaMap Lite [34] have recently been published.

The remainder of this paper is organized as follows. The Methods section details the proposed CRF model and the model evaluation method. The Results section presents the evaluation results, and the Discussion section considers several observations related to the use of the proposed model in clinical NER. The Conclusion section summarizes the study's main findings.

Methods
Conditional random fields
In the conventional CRF model applied to NER, a textual instance (i.e., a sentence) can be represented as a pair (x, y), where x is an observed feature sequence including one or more words (tokens) and y is the feature sequence's corresponding label sequence. Because the text is a linear sequence of tokens, the CRF for NER takes the form of a linear chain. The length of x is the number of tokens, and the sequence y has the same length as x. The label is hidden, and the hidden state value set consists of the target entity labels and a single non-entity label for non-entity tokens. The CRF model then represents the conditional distribution P(y|x) as an equation of feature functions as follows:

$$p(y \mid x) = \frac{1}{Z(x)} \prod_{t=1}^{T} \exp\Big\{ \sum_{k=1}^{K} \theta_k f_k(y_t, y_{t-1}, x_t) \Big\}, \quad (1)$$

where $f_k$ is a kth arbitrary feature function having the corresponding weight $\theta_k$, K is the number of feature functions, t is the time step, T is the number of tokens in an instance x, and Z(x) is a partition function summing the numerator over all possible y sequences [35]. The learning objective is to find the weight set that maximizes the conditional distribution. The function $f_k$ is a binary indicator function that has a value of 1 only if the function matches the target condition, and is otherwise 0. Dependencies between random variables are presented in the form of the feature functions $f_k$ in the CRF; the feature functions are either transition factors or observation factors. The transition factors in the CRF model take the form $f_k^{ij}(y, y', x) = 1_{\{y = i\}} 1_{\{y' = j\}}$, where i and j are label symbols having a transition relationship according to this function. The observation factors take the form of Eq. (2), where i and o are symbols having an explicit relationship according to this function:

$$f_k^{io}(y, y', x) = 1_{\{y = i\}} \cdot 1_{\{x = o\}}. \quad (2)$$

Based on this definition of the feature function, the CRF model explicitly represents not only observation information but also label transition information for sequence labeling. For instance, presume the label symbol set {A, B, O}; assign A or B to NEs, assign the label symbol O to non-entity tokens, and presume a label sequence of length 4, [A, B, O, B], where the first occurrence of entity B follows entity A and a single non-entity token exists between the two entity Bs. The first-order CRF models only the label transitions between adjoining state labels, that is, the label transition data {(A, B), (B, O), (O, B)}, in which the transition between labels A and B is explicitly expressed. Presume another label sequence [A, O, …, O, B], where entity A precedes entity B by some distance and an arbitrary number of consecutive non-entity tokens lie between the two NEs. The first-order CRF model learns only the label transitions {(A, O), (O, O), (O, B)} from the data, in which the dependency (A, B) is not explicitly captured by the model, and the fact that entity A precedes entity B is not learned during training. Because the CRF model treats single observation tokens as single time steps in a sequence, the gap between two separate entities widens with the number of intermediary non-entities, as shown in Fig. 2.

In Fig. 2, each circle denotes a random variable for labels, and each edge denotes a dependency between connected random variables. In this structure, labels have dependencies only between neighbors. Thus, a dependency between the label symbols 'Symptom' and 'Drug' that could help predict the word 'ASA' appears to be ignored. In the case of 'ASA,' we suspected that the preceding label information could provide additional information for predicting a particular label for the word, if that information could be delivered forward.
Fig. 2 Example of entities separated by non-entity words in the CRF model (S: symptom; D: drug; O: non-entity)
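To make the two factor families concrete, the following minimal Java sketch (ours, not the authors' MALLET-based implementation; class and method names are illustrative) shows how the indicator feature functions of Eqs. (1) and (2) fire:

```java
// Minimal sketch of the CRF indicator feature functions in Eqs. (1)-(2).
// Transition factor: f_k^{ij}(y, y') = 1{y = i} * 1{y' = j}
// Observation factor: f_k^{io}(y, x) = 1{y = i} * 1{x = o}
public class CrfIndicatorFeatures {

    static double transitionFactor(String i, String j, String y, String yPrev) {
        return (y.equals(i) && yPrev.equals(j)) ? 1.0 : 0.0;
    }

    static double observationFactor(String i, String o, String y, String x) {
        return (y.equals(i) && x.equals(o)) ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        // For the example sequence [A, B, O, B], a first-order model only sees
        // adjacent pairs such as (A, B), (B, O), (O, B), never a long-range (A, ..., B).
        System.out.println(transitionFactor("B", "A", "B", "A")); // 1.0: transition (A -> B)
        System.out.println(transitionFactor("B", "A", "B", "O")); // 0.0: previous label is O
        System.out.println(observationFactor("D", "ASA", "D", "ASA")); // 1.0: label D with token ASA
    }
}
```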
Precursor-induced conditional random fields
To improve the CRF model for NER applications, this study introduces a precursor-induced CRF (pi-CRF) model to capture specific long-distance transition dependencies between two NEs separated by multiple non-entities. The pi-CRF model:

- Uses non-entity labels to propagate transition information between separated NEs;
- Retains the first-order model structure, keeping the model's computational complexity lower than that of a second- or higher-order CRF;
- Focuses on label subsequences with the [entity, outside+, entity] pattern, as shown in Fig. 3 (a), where the outside+ notation denotes one or more successive non-entity label symbols;
- Adds a memory element to the hidden state variables representing states labeled as non-entities, such that the initial outside label in a non-entity subsequence propagates its explicit first-order dependency on its adjacent entity to the next outside label, which in turn propagates the information to the next outside label, as shown in Fig. 3 (b);
- Uses an induction process to transmit the information from the first entity through a sequence of multiple outside labels to the second entity state, even though the model uses only first-order dependencies (Fig. 3 (b)); and
- Modifies the observation feature functions of the CRF in order to share observation symbols among outside label symbols (Eq. 4).

Label induction
In the pi-CRF, a state with an outside label binds with an additional memory element and behaves as an information transmission medium, delivering information about the presence or absence of the preceding entity forward. This requires expanding the hidden state value set (label symbols). The entity label symbols are collected from the training data, and the expanded state value set is eventually derived by concatenating the entity label symbols with the outside label symbol. A concatenated outside label symbol thus indicates that the outside label follows a specific entity label. As a naming convention, we use label[O]+ to indicate that a sequence of O (outside) labels follows the concatenated label. In the example, the symbol A[O]+ is an outside label symbol indicating that an entity A precedes it, and O[O]+ is a fragmented outside label symbol indicating that no entity has occurred before this non-entity state. The CRF models distinguish the features for observation symbols from those for label symbols; thus, label symbols of any type do not conflict with the token symbols, and any label naming convention can be used.

The form of the pi-CRF is derived from Eq. (1), and the conditional probability distribution of the CRF model extension takes the form of feature functions as follows:

$$p(y, a \mid x) = \frac{1}{Z(x)} \prod_{t=1}^{T} \exp\Big\{ \sum_{k=1}^{K} \theta_k f_k(y_t, y_{t-1}, x_t, a_t, a_{t-1}) \Big\}, \quad (3)$$

where the variable a stores the induced label information, and the value of $a_t$ is activated by the values of $a_{t-1}$ and $y_t$. The conjoined variables a and y are eventually used to derive a newly induced label sequence: once $a_t$ is activated, $a_t$ transmutes the value of $y_t$ (see Code 1). Based on this model, the dependency of label transition is engaged only within adjacent tokens (i.e., $y_t$ and $y_{t-1}$), because the model is designed to keep the first-order structure. Thus, the information flows forward with the induced outside label by the first-order transition, and this structure makes the conveyed information flow forward regardless of the distance.

This induction process subsequently expands the original label symbol set inside the model, producing multiple newly induced outside label symbols instead of the single outside label symbol. For example, the process modifies an original label sequence [A, O, ⋯, O, B] to [A, A[O]+, ⋯, A[O]+, B] according to Code 1 (a sketch of this step is given below, after Fig. 4). This transformation helps the model learn long-distance transitions between successive NEs even in the first-order form: from the modified example sequence, the model can learn the label transition data {(A[O]+, B)}, where the entity B depends on the non-entity taking entity A as its precursor. This process also generates a trellis structure (Fig. 4 (c)) that is slightly more complex than the trellis generated by the conventional first-order CRF model (Fig. 4 (a)), but simpler than the trellis generated by a conventional second-order CRF model (Fig. 4 (b)). CRF models generally have as many hidden state options (represented by the nodes in Fig. 4) as there are variable values at each time step, and each combination of hidden states denotes a path forward. If N is the number of hidden states in the original first-order CRF model, the pi-CRF model introduces N additional new states; however, this increase in computational complexity is relatively moderate compared to the increase induced by second- or higher-order CRF models. In addition, if the IOB2 tagging scheme [36] is applied to the pi-CRF model, the increase in the number of newly induced hidden states is halved.
Fig. 4 Trellis graphs generated by different CRFs; each circle indicates a random hidden state variable at a time step, and lines indicate the transition paths among the labels. The small circles in (c) are the memory elements added to the hidden states for the non-entity label
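The paper's Code 1 is not reproduced in this excerpt, so the following Java sketch is our reading of the induction step described above: each outside label is rewritten to carry its most recent preceding entity label as a precursor (or O[O]+ when no entity has occurred yet).

```java
import java.util.ArrayList;
import java.util.List;

// A sketch of the label induction step (our reading of "Code 1"):
// [A, O, O, B] -> [A, A[O]+, A[O]+, B]; leading O's with no precursor -> O[O]+.
public class LabelInduction {

    static List<String> induce(List<String> labels) {
        List<String> induced = new ArrayList<>();
        String precursor = "O"; // no entity seen yet
        for (String y : labels) {
            if (y.equals("O")) {
                induced.add(precursor + "[O]+"); // outside label remembers its precursor
            } else {
                induced.add(y);   // entity labels are kept unchanged
                precursor = y;    // remember the latest entity
            }
        }
        return induced;
    }

    public static void main(String[] args) {
        System.out.println(induce(List.of("O", "A", "O", "O", "B")));
        // -> [O[O]+, A, A[O]+, A[O]+, B]; the transition (A[O]+, B) is now first-order
    }
}
```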
One of the main factors determining a CRF model's complexity is its graphical structure, which can be presented in the form of a tuple. The structure of the first-order CRF can be presented as $(y_{t-1}, y_t, x_t)$. Because the relationship between the ys corresponds to transitions, the number of transition pairs $(y_{t-1}, y_t)$ can be $N^2$; this means that at least $N^2$ calculations are required for each time step of a sequence at both training and testing time. In the same way, the graphical structure of the second-order CRF can be presented as $(y_{t-2}, y_{t-1}, y_t, x_t)$, and the transition triple $(y_{t-2}, y_{t-1}, y_t)$ requires at least $N^4$ (= $N^2 \times N^2$) calculations for each time step in training and testing the second-order model. According to the formulation of the pi-CRF (Eq. 3), the variable a does not act as a hidden variable but interacts with the variable y in order to expand the possible values of y. This design allows the pi-CRF to operate in the first-order structure and keeps the model's complexity feasible.

Observation symbol sharing
It is worth addressing one further attribute of the pi-CRF: the model uses modified observation feature functions. The observation feature function $f_k^{io}$ (Eq. 2) directly implies that a certain label i has a 'one-to-one' relationship with a certain observation symbol o. If a label symbol does not have a relationship with a particular observation symbol, that relationship is not trained. The label induction process produces multiple outside label symbols (i.e., 'label[O]+' symbols) instead of one single outside symbol (i.e., the 'O' symbol for the outside label). This induction process would prevent any single outside label symbol from having relationships with all the observation symbols related to non-entities; each outside label symbol ends up related to only a portion of the observation symbols. For the same training data, it is generally known that machine learning models with more hidden states are more likely to experience data sparseness problems because of their increased feature dimensions [37]. Likewise, during development we observed that the first-order CRF performs worse if the conventional model is trained with the induced label pattern.

To prevent this performance decrease, the multiple outside symbols are allowed to share observation symbols with each other in the pi-CRF model, according to the following observation feature function:

$$f_k^{io}(y, y', x) = 1_{\{x = o\}} \cdot \big( 1_{\{i \notin \text{outside} \,\wedge\, y = i\}} + 1_{\{i \in \text{outside} \,\wedge\, y \in \text{outside}\}} \big). \quad (4)$$

The second and third indicator terms on the right-hand side determine whether the y value is an outside label symbol. If i (the label symbol corresponding to the function $f_k$) is not an outside symbol, the equation tests whether the y value is equal to i. Conversely, if i is an outside symbol, the third indicator term has value 1 as long as the value of y is an outside symbol. Whereas the feature functions in the conventional CRF constrain a 'one-to-one' relationship between a label symbol and an observation symbol, the third indicator term allows a 'many-to-one' relationship between the whole set of outside label symbols and one observation symbol.

In the pi-CRF, the model uses Eq. (4) for its observation feature function instead of the Eq. (2) used in the conventional CRF. By way of illustration, presume a token, "doctor," occurred with three outside label symbols (O[O]+, A[O]+, and B[O]+) in the training set. According to the definition of the observation feature function constraining a one-to-one relationship, a first-order CRF has three distinct feature functions $f_a^{io}$(x = doctor, y = O[O]+), $f_b^{io}$(x = doctor, y = A[O]+), and $f_c^{io}$(x = doctor, y = B[O]+). Although the original CRF treats the three feature functions independently, the pi-CRF has one single feature function for the observation symbol and the outside label symbols, for instance, $f_k^{io}$(x = doctor, y = outside symbol). A minimal sketch of this sharing rule is given after the next subsection.

Model implementation
Both the original and the pi-CRF models were implemented in Java. The basic CRF structure and algorithms were implemented in MALLET [38]. The pi-CRF model was trained using the original linear-chain CRF algorithms without modification, because the graphical architecture of the pi-CRF model is fixed as a template for each time step in the same manner as in the original CRF model. To train the pi-CRF model, the L-BFGS optimization method [12] and l2-regularization [39] were used to exploit the conventional CRF model's most advantageous features [35]. Furthermore, the Viterbi algorithm was used for inference on unlabeled sequences. The executable files are available online.¹

¹ The executable jar files are available at https://github.com/jinsamdol
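As promised above, here is a minimal Java sketch of the observation symbol sharing in Eq. (4) (our own reading; the isOutside() helper and all names are our assumptions, not the paper's code):

```java
// A sketch of the shared observation feature of Eq. (4): non-outside labels keep
// the one-to-one test of Eq. (2), while every induced outside symbol shares one
// feature per observation symbol.
public class SharedObservationFeature {

    static boolean isOutside(String label) {
        return label.equals("O") || label.endsWith("[O]+");
    }

    // Feature indexed by (label symbol i, observation symbol o), applied to (y, x).
    static double feature(String i, String o, String y, String x) {
        if (!x.equals(o)) return 0.0;
        if (!isOutside(i)) return y.equals(i) ? 1.0 : 0.0; // one-to-one, as in Eq. (2)
        return isOutside(y) ? 1.0 : 0.0;                   // many-to-one sharing
    }

    public static void main(String[] args) {
        // "doctor" fires the same shared feature for any induced outside label:
        System.out.println(feature("O[O]+", "doctor", "A[O]+", "doctor")); // 1.0
        System.out.println(feature("O[O]+", "doctor", "B[O]+", "doctor")); // 1.0
        System.out.println(feature("O[O]+", "doctor", "B", "doctor"));     // 0.0
    }
}
```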
Parameter tuning
To train both models properly, the model parameters were regularized during the development phase. In both the original and the pi-CRF models, l2-regularization [39] was used to avoid overfitting, and the regularization takes the form in Eq. 5:

$$- \sum_{k=1}^{K} \frac{\theta_k^2}{2\sigma^2}, \quad (5)$$

where K is the number of feature functions, $\theta_k$ is the weight of the kth feature function $f_k$, and $\sigma$ is the hyper-parameter that adjusts the amount of penalty. The regularization term is added to the log-likelihood form of the CRF models and penalizes large weights.

During the model development process, the training data were split 8:2 into training and development sets, and the parameter $\sigma$ was chosen to provide the best F1-score on the development set. The parameter tuning was performed independently on each data set, and the third feature set was used during the tuning process.

Results
Dataset description
All the experiments were performed on NER sets in the clinical and general domains: English clinical texts (i2b2 2012 NLP shared task data [3]), rheumatism patients' discharge summaries obtained from Seoul National University Hospital (SNUH) [40], and the CoNLL-2003 NER shared task corpus [41]. The documents in the SNUH set were written in English and Korean. The discharge summaries were annotated using the IOB2 tagging scheme [36]. Although the original annotation in the i2b2 2012 data contains more semantic classes, this evaluation was conducted using the problem, test, and treatment entities. For the SNUH corpus, the entities of symptom, disease, clinical lab test, medication, and procedure/operation were used. Because we are interested in identifying events related to a patient's clinical course, we used the clinical semantic classes listed above in our evaluation. For the CoNLL-2003 data, the entities of location, person, organization, and miscellaneous were annotated from general-domain news articles. Tables 1 and 2 show the data and annotation statistics for each data set. The training and testing sets in the i2b2 2012 and the CoNLL-2003 NER sets were divided following the official distribution set by the data source administrators.

As we assumed that a significant portion of the NEs are separated within sentences, we measured the word distance between the entities in the data sets. The distance dependency was measured within each instance. Table 3 shows examples of the distances between entities in the i2b2 corpus, and Fig. 5 shows the distributions of distances between entities in the entire data set for each corpus. The median distance between entities was 3, and the mean values were within the range of 3 to 5, indicating that the NEs in the data sets tended to be separated by 3 to 5 non-entity tokens. The data also indicate that the number of entities within the first-order range is less than the number of entities within the second- or higher-order ranges. In addition, the ratios of the number of entities having a transition dependency to the total number of entities were 0.85, 0.73, and 0.78 for the i2b2 2012, SNUH, and CoNLL-2003 data sets, respectively. These values indicate that in most cases, entities tend to be interrelated within an instance, rather than being present as single entities.

Table 1 Data specification

Corpus      Domain    Set     Article   Sentence   Token     Entity
i2b2 2012   Clinical  Train   190       7,258      94,836    11,239
                      Test    120       5,547      78,564    9,623
SNUH        Clinical  Train   196       11,669     116,402   18,383
                      Test    193       11,042     107,666   17,125
CoNLL 2003  General   Train   946       14,987     203,621   23,499
                      Test    231       3,684      46,435    5,629

Table 2 Annotation statistics

a) i2b2 2012
Set     Problem   Test     Treatment
Train   4,962     2,558    3,719
Test    4,270     2,140    3,213

b) SNUH
Set     Symptom   Test     Disease   Medication   Procedure
Train   3,923     4,559    5,084     3,642        1,175
Test    3,737     3,917    4,828     3,496        1,147

c) CoNLL 2003
Set     Location   Person   Organization   Miscellaneous
Train   7,140      6,600    6,321          3,438
Test    1,656      1,617    1,662          694

Table 3 Example sentences of the entity distances (single: entity not having a precursor)

Type         Example sentence with entity annotation
single       The patient is a 28-year-old woman who is [HIV positive]problem for 2 years .
distance 0   With [intravenous hydration]treatment [the BUN]test and …
distance 1   … because of [pancytopenia]problem and [vomiting]problem on [DDI]treatment
distance 8   She was brought in for [an esophagogastroduodenoscopy]test on 9/26 but she basically was not sufficiently [sedated]treatment and readmitted at this time for [a GI work-up]test .
Fig. 5 Histograms of distances between named entities in each corpus. The number 'n' on the x-axis means that n non-entities exist between the two entities
Feature settings
Three types of feature settings were investigated in this evaluation, as summarized in Table 4. Setting #1 is the simplest available, and setting #2 is the configuration in which character-wise prefixes and suffixes can be exploited. Although these two settings use only simple features, they reduce the potential bias that the features could exert on the performance comparison. Setting #3 implemented features used in previous evaluations of NER methods for each data set [17, 40, 42]; some particular features that are easy to implement were selected for use here. "Token" and "n-gram" are typical features used in NER. The morphologic information used included character-wise affixes (e.g., the first two characters of a token) and capitalization patterns (e.g., all capitalized, or capitalization at the word beginning) [17]. Matching indicates whether a token matches a controlled vocabulary, e.g., whether the previous token is an obvious modifier of the current token, or whether a token matches a list consisting of the first entity tokens in the training data (frequency > 10), as done by Li et al. [43].

Performance evaluation
We used the three NER datasets to compare the proposed model structure with the first- and second-order linear-chain CRFs, as well as the semi-Markov CRF [19] and the high-order CRF [18], which are variants of the CRF leveraging higher-order label transition dependencies.

First, we compared the pi-CRF with the first-order models. Table 5 shows the F1 scores of the first-order CRF, the first-order CRF trained with the induced labels, and the pi-CRF for each test set. The F1 score is the harmonic mean of the precision and recall scores. We first tested the models on all instances in each data set, and then tested the models on only those instances having two or more entities. The table shows that the proposed model structure offers a demonstrable improvement over the first-order models. The pi-CRF showed higher F1 scores for all feature settings on both the i2b2 2012 and the SNUH data sets.

In addition, the first-order CRF with induced labels shows the worst performance of all the models. Even though the induced label patterns can easily be obtained in the first-order model, we can see that using label induction without the 'observation symbol sharing' in the conventional model actually harms its performance.

We also evaluated higher-order CRF models: the conventional second-order CRF, the semi-Markov CRF [19], and the high-order CRF [18, 20] as implemented by A. Allam and M. Krauthammer [44]. The semi-Markov CRF and the high-order CRF are CRF variants using higher-order transition dependencies. The two CRF variants were trained with stochastic gradient descent for 50 epochs. The results are reported in Table 6. As shown in the table, the pi-CRF performs somewhat better than the other models in several settings, and it shows performance similar to the variants with the complex feature set.

In addition, we observe that the performance of the higher-order models, including the pi-CRF, decreased on the general-domain set (CoNLL 2003) with the simple feature settings.
Table 4 Summary of the feature settings. (w denotes the window size; if the value is absent, only the feature of the current token is used. n denotes the n of the n-gram. 'len' denotes the length of the affixes. The matching features denote the result of controlled vocabulary matching)

Set         Token   Norm-token   n-gram   Character affix   Capitalization   POS/Chunk   Matching
#1-context  w=3     w=3          –        –                 –                –           –
#2-morph    w=3     w=3          –        len=2~3, w=3      –                –           –
#3-i2b2     w=5     w=5          n=2      len=2~7, w=5      w=1              w=3         –
#3-snuh     w=5     w=3          n=2      len=2~3, w=5      –                –           modifier/control
#3-conll    w=5     –            n=1      len=3~4, w=5      w=5              –           –
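As an illustration of setting #2, here is a sketch of the window and affix features (our own naming conventions; the exact MALLET feature templates are not shown in the paper):

```java
import java.util.ArrayList;
import java.util.List;

// A sketch of the setting #2 ("morph") features in Table 4, as we read it:
// tokens in a window of size 3 (current token +/- 1) plus 2- and 3-character
// prefixes and suffixes. Feature-string formats are our own convention.
public class MorphFeatures {

    static List<String> extract(List<String> tokens, int t) {
        List<String> feats = new ArrayList<>();
        for (int d = -1; d <= 1; d++) {                      // w = 3 token window
            int i = t + d;
            if (i >= 0 && i < tokens.size()) feats.add("tok[" + d + "]=" + tokens.get(i));
        }
        String w = tokens.get(t);
        for (int len = 2; len <= 3 && len <= w.length(); len++) {  // len = 2~3 affixes
            feats.add("prefix" + len + "=" + w.substring(0, len));
            feats.add("suffix" + len + "=" + w.substring(w.length() - len));
        }
        return feats;
    }

    public static void main(String[] args) {
        System.out.println(extract(List.of("denies", "chest", "pain"), 1));
        // [tok[-1]=denies, tok[0]=chest, tok[1]=pain, prefix2=ch, suffix2=st, prefix3=che, suffix3=est]
    }
}
```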
Table 5 F1 scores of the first-order models and the pi-CRF for each corpus. The first value ('whole') is the F1 score on the whole test set; the second value ('distanced') is the F1 score evaluated only on instances having a transition dependency between NEs. (In the original layout, bold marked the best performance and shading marked the pi-CRF rows)

                                           i2b2 2012         SNUH              CoNLL 2003
Feature  Model                             whole   distanced  whole   distanced  whole   distanced
Set 1    1st-order CRF                     67.22   68.24      74.75   73.20      60.68   62.19
         1st-order CRF w/ induced labels   66.60   67.69      74.09   72.85      23.38   15.24
         pi-CRF                            67.29   68.43      75.50   74.43      45.54   43.41
Set 2    1st-order CRF                     71.61   72.85      75.81   75.04      68.43   72.93
         1st-order CRF w/ induced labels   70.73   71.98      75.24   74.36      44.90   41.89
         pi-CRF                            71.99   73.35      76.04   75.29      69.61   72.31
Set 3    1st-order CRF                     72.55   73.97      76.18   75.06      82.57   83.13
         1st-order CRF w/ induced labels   71.25   72.75      75.37   74.18      80.81   81.55
         pi-CRF                            72.58   74.04      76.24   75.33      82.08   82.76
When we compare these results with the corresponding tests in Table 5, the pi-CRF performs worse than the conventional models on the CoNLL data; however, this performance decrease of the higher-order models under naïve feature settings might be expected.

Table 7 compares the proposed model's training and inference times, using feature setting #3, with those of the conventional models. The table shows the numbers of parameters and states, the elapsed training time, the training time per iteration, and the elapsed inference time. These values indicate that the pi-CRF design was slightly more complicated than the first-order CRF, but less complicated than the second-order CRF, while still exploiting the transition information between NEs separated by long and arbitrary distances.

Result analysis
We also examined the models' behavior on the test data sets. Table 8 shows the numbers of predicted entities and correct predictions on each held-out data set, using feature setting #1. For the clinical data sets, the models that used long-distance transition dependency (i.e., the second-order CRF and the pi-CRF) tended to predict more entities than the first-order model.
Table 6 F1 scores of the higher-order CRF models and the pi-CRF for each corpus. The first value ('whole') is the F1 score on the whole test set; the second value ('distanced') is the F1 score evaluated only on instances having a transition dependency between NEs. (In the original layout, bold marked the best performance and shading marked the pi-CRF rows)

                               i2b2 2012         SNUH              CoNLL 2003
Feature  Model                 whole   distanced  whole   distanced  whole   distanced
Set 1    2nd-order CRF         69.46   70.88      73.43   72.21      58.34   54.52
         semi-Markov CRF       67.87   68.91      73.44   71.61      37.31   34.13
         high-order CRF        68.38   69.52      73.50   71.69      36.97   33.87
         pi-CRF                67.29   68.43      75.50   74.43      45.54   43.41
Set 2    2nd-order CRF         70.99   72.31      74.31   73.27      73.21   72.26
         semi-Markov CRF       72.19   73.54      76.01   74.87      63.19   63.32
         high-order CRF        71.50   72.74      76.11   74.97      63.56   63.76
         pi-CRF                72.30   73.61      76.20   75.47      69.61   72.31
Set 3    2nd-order CRF         71.75   73.01      75.17   74.05      83.13   83.96
         semi-Markov CRF       69.30   70.73      76.70   75.79      82.47   83.29
         high-order CRF        69.26   70.64      76.73   75.91      82.18   82.80
         pi-CRF                72.58   74.04      76.28   75.45      82.08   82.76
Table 7 Efficiency test results. The numbers of parameters and states indicate the model's size; the elapsed training/inference times indicate the model's speed

Data   Model          Parameters  States  Training time (s)  Time per iteration (s)  Inference time (s)
i2b2   1st-order CRF  442,705     8       1,550              12.5                    1.7
       2nd-order CRF  581,604     64      6,819              55.4                    5.7
       pi-CRF         442,768     11      3,751              17.0                    2.1
SNUH   1st-order CRF  396,245     12      2,946              19.5                    1.9
       2nd-order CRF  495,772     144     27,388             139.7                   9.3
       pi-CRF         396,400     17      6,231              23.6                    2.1
CoNLL  1st-order CRF  313,672     10      4,031              19.1                    0.6
       2nd-order CRF  431,044     100     24,828             173.6                   2.6
       pi-CRF         313,776     14      13,512             29.4                    0.7
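A back-of-the-envelope check of the state counts in Table 7 (i2b2) against the N²/N⁴ argument above:

```java
// Per-time-step transition work grows with the square of the unrolled state
// count; using the i2b2 state counts from Table 7 (8, 11, 64):
public class TransitionWork {
    public static void main(String[] args) {
        int firstOrder = 8, piCrf = 11, secondOrder = 64; // Table 7, i2b2 rows
        System.out.println("1st-order: " + firstOrder * firstOrder);   // 64 pairs/step
        System.out.println("pi-CRF   : " + piCrf * piCrf);             // 121 pairs/step
        System.out.println("2nd-order: " + secondOrder * secondOrder); // 4096 = N^4
    }
}
```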
The pi-CRF model also correctly predicted more entities than the first-order CRF model, improving recall by + 0.7 and + 1.13 points on the i2b2 and SNUH sets, respectively. The final F1-score of the pi-CRF improved over the first-order model, and we attribute this improvement in F1 to the improvement in recall. However, the models that used long-distance transition dependency (the second-order CRF and the pi-CRF) showed the opposite behavior on the general data set, predicting noticeably fewer entities than the first-order model, although most of the higher-order models' predictions were correct. As a result, the precision of the pi-CRF improved by + 16.4 points on the CoNLL set, even though its recall was relatively low.

The models' prediction performance was additionally analyzed along the distances from the preceding entities. To analyze the models according to the distance between entities, we inevitably relied on recall; because evaluating the models with recall alone has its limitations, this result is presented as an auxiliary indicator. The initial recall scores were calculated only for the entities not having precursors, and then the recall scores were updated sequentially by adding entities along the distances from 0 up to the maximum distance in each data set. Figure 6 shows the analysis result.

The models' curves moved similarly along the distance between entities: according to this figure, the recall scores of the CRFs decrease as the distance increases. The CRF models seem to miss the following entity when two entities are consecutive. We could not observe a significant performance improvement of the pi-CRF compared to the other models. However, the pi-CRF shows better results when compared with the first-order CRF trained with induced labels, which uses a graphical structure similar to that of the pi-CRF; in particular, the performance of that first-order model decreased remarkably with distance. Using the induced labels is easy in the conventional model, but it does not guarantee a performance improvement without the observation symbol sharing. The models' recall scores rise sharply at the points where the distance is 1 in the i2b2 2012 and CoNLL sets.
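The cumulative recall curve of Fig. 6 can be sketched as follows (our reading of the procedure described above; the counts are dummy values, with the key −1 standing for entities without a precursor):

```java
import java.util.Map;
import java.util.TreeMap;

// A sketch of the cumulative recall computation behind Fig. 6 (our reading):
// start with entities having no precursor, then add entities gap by gap,
// updating the running recall after each addition.
public class CumulativeRecall {

    // counts.get(gap) = {gold entities at this gap, correctly predicted at this gap}
    static void curve(TreeMap<Integer, int[]> counts) {
        long gold = 0, correct = 0;
        for (Map.Entry<Integer, int[]> e : counts.entrySet()) { // -1 first (no precursor)
            gold += e.getValue()[0];
            correct += e.getValue()[1];
            System.out.printf("up to gap %d: recall = %.3f%n",
                              e.getKey(), (double) correct / gold);
        }
    }

    public static void main(String[] args) {
        TreeMap<Integer, int[]> counts = new TreeMap<>();
        counts.put(-1, new int[]{100, 80}); // dummy counts, illustration only
        counts.put(0, new int[]{10, 5});
        counts.put(1, new int[]{50, 38});
        curve(counts);
    }
}
```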
Table 8 The numbers of gold, predicted ('expected'), and correct entities for each model on each held-out set

                                  Whole instances              Distanced instances
Data              Model           gold    expected  correct    gold    expected  correct
i2b2 (clinical)   1st-order CRF   9,623   7,361     5,708      8,552   6,188     4,927
                  2nd-order CRF           7,785     6,046              6,547     5,245
                  pi-CRF                  7,542     5,775              6,397     5,012
SNUH (clinical)   1st-order CRF   17,125  15,326    12,128     12,520  10,813    8,540
                  2nd-order CRF           15,702    12,053             11,088    8,524
                  pi-CRF                  15,516    12,322             11,012    8,758
CoNLL (general)   1st-order CRF   5,629   3,785     2,856      4,331   2,693     2,184
                  2nd-order CRF           2,778     2,529              1,986     1,799
                  pi-CRF                  1,855     1,704              1,280     1,218
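As a worked check, the counts in Table 8 reproduce the whole-instance F1 scores in Table 5; for the first-order CRF on i2b2 with feature set #1:

```java
// Worked check: Table 8's i2b2 counts for the first-order CRF reproduce the
// 67.22 F1 reported in Table 5 (feature set #1, whole instances).
public class F1FromCounts {
    public static void main(String[] args) {
        double gold = 9623, predicted = 7361, correct = 5708; // Table 8, i2b2 row
        double precision = correct / predicted;               // ~0.7755
        double recall = correct / gold;                       // ~0.5932
        double f1 = 2 * precision * recall / (precision + recall);
        System.out.printf("P=%.4f R=%.4f F1=%.4f%n", precision, recall, f1); // F1=0.6722
    }
}
```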
Fig. 6 Recall along the distances between named entities in each corpus. The y-axis denotes the recall score; a numeric label n on the x-axis denotes the set of entities having n outside labels between the entity and its precursor. (feature set: #3)
There are only a small number of entities having a gap (order) value of 0 in both of those data collections: the numbers of entities with a gap value of zero are 50, 30, and 707 in the i2b2, CoNLL, and SNUH data, respectively.

Discussion
In this study, we investigated the performance of the pi-CRF model, a newly proposed variant of the CRF model designed particularly for extracting clinical NEs: the proposed model utilizes long-distance dependency relationships between NEs separated by multiple non-entities in the CRF. The model fragments the non-entity state into fine-grained non-entity states and treats them as an information transmission medium based on the first-order linear-chain CRF structure. The evaluation results showed that the proposed pi-CRF model is more effective at clinical NER. Although the pi-CRF model was slower than the first-order CRF, it was significantly faster than the second-order CRF model even while expressing higher-order transition dependencies between NEs.

Higher-order transitions are expressed as fixed-size label transitions in the conventional CRF model. Because NEs tend to be separated by arbitrary distances, the conventional higher-order CRF model using a fixed-size state transition dependency has limited ability to express the desired information. One study of a semi-Markov CRF [19] proposed that consecutive units with the same label can be presented as a group, although that model could not convey the information from the separated NEs. Based on this idea, we developed an induction method to present consecutive non-entity labels grouped by their precursor information. In addition, the mathematical formula (Eq. 3) used to express the proposed CRF model was derived from a CRF model that used virtual evidence [45], which incorporates prior knowledge of prototypes to make the model prefer consecutive labels for a subsequence that matches a predefined pattern.

In contrast, our model uses the formula to extend the hidden variables by joining the two variables y and a. The two hidden variables are conjoined in Eq. 3: the variables are multiplied and merged into a new hidden variable, instead of using two hidden variables in the mathematical form. Because the variable a has values only if the value of the corresponding y is the non-entity state, the multiplication implies that the newly derived hidden variable y' has multiplied non-entity hidden states, and the total number of hidden states is expanded compared to the conventional CRF.

According to the evaluation results, the design of the pi-CRF model improves the CRF model's expressive power. The transition information is implemented as feature functions, and thus the transition information ultimately affects the model as one of many features. Leveraging the high-order label transition information, the pi-CRF shows better performance than the other higher-order CRF models in many evaluation settings. An advantageous attribute of the proposed model could be that it preserves a relatively compact model complexity compared to other higher-order models.

Avoiding the data sparseness problem was another significant concern in the model design. We expected the data sparseness problem to occur because the induction algorithm divides a single non-entity state into multiple states, and thus the frequency of observation features
related to the outside label symbols was divided. In the model development phase, we observed that the model's performance was inferior without the feature sharing implemented by Eq. (4). For the clinical NER tasks, the results showed that the pi-CRF design increased the F1 score.

Availability of data and materials
The executable Java file is available at the GitHub repository https://github.com/jinsamdol/precursor-induced_CRF. However, all data were extracted from the medical records of patients who had been admitted to SNUH, so the clinical data cannot be shared with other research groups without permission.
13. McDonald R, Pereira F. Identifying gene and protein mentions in text using conditional random fields. BMC Bioinformatics. 2005;6(Suppl 1):S6. https://doi.org/10.1186/1471-2105-6-S1-S6.
14. Bethard S, Savova G, Chen W-T, Derczynski L, Pustejovsky J, Verhagen M. SemEval-2016 Task 12: Clinical TempEval. In: Proc 10th Int Conf Semant Eval (SemEval 2016); 2016. p. 1052–62. https://doi.org/10.18653/v1/S16-1165.
15. Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C. Neural architectures for named entity recognition. In: Proceedings of NAACL-HLT 2016; 2016. p. 260–70.
16. Liu Z, Yang M, Wang X, Chen Q, Tang B, Wang Z, et al. Entity recognition from clinical texts via recurrent neural network. BMC Med Inform Decis Mak. 2017;17(Suppl 2):53–60.
17. Ratinov L, Roth D. Design challenges and misconceptions in named entity recognition. In: Proceedings of the thirteenth conference on computational natural language learning; 2009. p. 147–55.
18. Ye N, Lee WS, Chieu HL, Wu D. Conditional random fields with high-order features for sequence labeling. In: Advances in neural information processing systems; 2009. p. 2196–204.
19. Sarawagi S, Cohen WW. Semi-Markov conditional random fields for information extraction. In: Advances in neural information processing systems; 2005. p. 1185–92.
20. Cuong NV, Ye N, Lee WS, Chieu HL. Conditional random field with high-order dependencies for sequence labeling and segmentation. J Mach Learn Res. 2014;15:981–1009.
21. Fersini E, Messina E, Felici G, Roth D. Soft-constrained inference for named entity recognition. Inf Process Manag. 2014;50:807–19. https://doi.org/10.1016/j.ipm.2014.04.005.
22. Li X, Wang Y-Y, Acero A. Extracting structured information from user queries with semi-supervised conditional random fields. In: Proc 32nd Int ACM SIGIR Conf Res Dev Inf Retr (SIGIR '09); 2009. p. 572. https://doi.org/10.1145/1571941.1572039.
23. Li L, Jin L, Jiang Z, Song D, Huang D. Biomedical named entity recognition based on extended recurrent neural networks. In: Proc 2015 IEEE Int Conf Bioinforma Biomed (BIBM 2015); 2015. p. 649–52.
24. Chalapathy R, Borzeshi EZ, Piccardi M. Bidirectional LSTM-CRF for clinical concept extraction. In: Proceedings of the clinical natural language processing workshop; 2016. p. 7–12. http://arxiv.org/abs/1611.08373.
25. Dernoncourt F, Lee JY, Uzuner O, Szolovits P. De-identification of patient notes with recurrent neural networks. J Am Med Informatics Assoc. 2017;24:596–606.
26. Jauregi Unanue I, Zare Borzeshi E, Piccardi M, et al. J Biomed Inform. 2017;76:102–9. https://doi.org/10.1016/j.jbi.2017.11.007.
27. Jagannatha A, Yu H. Bidirectional recurrent neural networks for medical event detection in electronic health records. In: NAACL-HLT; 2016. p. 473–82. http://arxiv.org/abs/1606.07953.
28. Sahu SK, Anand A. Recurrent neural network models for disease name recognition using domain invariant features. In: Proceedings of the 54th annual meeting of the Association for Computational Linguistics; 2016. p. 2216–25. http://arxiv.org/abs/1606.09371.
29. Kholghi M, Sitbon L, Zuccon G, Nguyen A. Active learning: a step towards automating medical concept extraction. J Am Med Informatics Assoc. 2016;23:289–96.
30. Hao T, Pan X, Gu Z, Qu Y, Weng H. A pattern learning-based method for temporal expression extraction and normalization from multi-lingual heterogeneous clinical texts. BMC Med Inform Decis Mak. 2018;18(Suppl 1):22.
31. Wang P, Hao T, Yan J, Jin L. Large-scale extraction of drug–disease pairs from the medical literature. J Assoc Inf Sci Technol. 2017;68:2649–61.
32. Stubbs A, Kotfila C, Xu H, Uzuner Ö. Identifying risk factors for heart disease over time: overview of 2014 i2b2/UTHealth shared task track 2. J Biomed Inform. 2015;58:S67–77.
33. Soysal E, Wang J, Jiang M, Wu Y, Pakhomov S, Liu H, et al. CLAMP - a toolkit for efficiently building customized clinical natural language processing pipelines. J Am Med Informatics Assoc. 2018;25:331–6.
34. Demner-Fushman D, Rogers WJ, Aronson AR. MetaMap Lite: an evaluation of a new Java implementation of MetaMap. J Am Med Informatics Assoc. 2017;24:841–4.
35. Sutton C, McCallum A. An introduction to conditional random fields. Found Trends Mach Learn. 2011;4:267–373.
36. Tjong Kim Sang EF. Representing text chunks; 1995. p. 173–9.
37. Freitag D, McCallum A. Information extraction with HMM structures learned by stochastic optimization. In: AAAI; 2000.
38. McCallum AK. MALLET: a machine learning for language toolkit. 2002. http://mallet.cs.umass.edu. Accessed 27 Mar 2013.
39. Ng AY. Feature selection, L1 vs. L2 regularization, and rotational invariance. In: ICML 2004; 2004.
40. Lee W, Kim K, Lee EY, Choi J. Conditional random fields for clinical named entity recognition: a comparative study using Korean clinical texts. Comput Biol Med. 2018;101:7–14.
41. Tjong Kim Sang EF, De Meulder F. Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of the seventh conference on natural language learning at HLT-NAACL 2003; 2003. p. 142–7.
42. Xu Y, Wang Y, Liu T, Tsujii J, Chang EI-C. An end-to-end system to identify temporal relation in discharge summaries: 2012 i2b2 challenge. J Am Med Inform Assoc. 2013;20:849–58. https://doi.org/10.1136/amiajnl-2012-001607.
43. Li L, Zhou R, Huang D. Two-phase biomedical named entity recognition using CRFs. Comput Biol Chem. 2009;33:334–8.
44. Allam A, Krauthammer M. PySeqLab: an open source Python package for sequence labeling and segmentation. https://pyseqlab.readthedocs.io.
45. Li X. On the use of virtual evidence in conditional random fields; 2009. p. 1289–97.