{"id":92042,"date":"2020-07-10T11:00:00","date_gmt":"2020-07-10T08:00:00","guid":{"rendered":"https:\/\/examples.javacodegeeks.com\/?p=92042"},"modified":"2020-07-06T16:27:57","modified_gmt":"2020-07-06T13:27:57","slug":"apache-solr-opennlp-tutorial-part-1","status":"publish","type":"post","link":"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/","title":{"rendered":"Apache Solr OpenNLP Tutorial &#8211; Part 1"},"content":{"rendered":"<p>This is an article about Apache Solr OpenNLP.<\/p>\n<h2 class=\"wp-block-heading\"><a name=\"introduction\"><\/a>1. Introduction<\/h2>\n<p><a aria-label=\"undefined (opens in a new tab)\" href=\"https:\/\/en.wikipedia.org\/wiki\/Natural_language_processing\" target=\"_blank\" rel=\"noreferrer noopener\">Natural Language Processing (NLP)<\/a> is a field that focuses on processing and analyzing human languages with computers. Using NLP in search helps search service providers better understand what their customers really mean, so that they can run search queries more efficiently and return results that better meet customers&#8217; needs.<\/p>\n<p>In this example, we are going to show you how the Apache Solr OpenNLP integration works and how the customer&#8217;s search experience can be improved by using OpenNLP.<\/p>\n<div class=\"toc\">\n<h3>Table Of Contents<\/h3>\n<dl>\n<dt><a href=\"#introduction\">1. Introduction<\/a><\/dt>\n<dt><a href=\"#technologies_used\">2. Technologies Used<\/a><\/dt>\n<dt><a href=\"#solr_opennlp_integration\">3. Solr OpenNLP Integration<\/a><\/dt>\n<dd>\n<dl>\n<dt><a href=\"#the_basics\">3.1. The Basics<\/a><\/dt>\n<\/dl>\n<dl>\n<dt><a href=\"#setting_up_the_integration\">3.2. Setting Up The Integration<\/a><\/dt>\n<\/dl>\n<dl>\n<dt><a href=\"#examples\">3.3. Examples<\/a><\/dt>\n<\/dl>\n<\/dd>\n<dt><a href=\"#download\">4. 
Download the Sample Data File<\/a><\/dt>\n<\/dl>\n<\/div>\n<h2 class=\"wp-block-heading\"><a name=\"technologies_used\"><\/a>2. Technologies Used<\/h2>\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"alignright size-large\"><img decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/examples.javacodegeeks.com\/wp-content\/uploads\/2015\/07\/apache-solr-logo.jpg\" alt=\"Apache Solr OpenNLP\" class=\"wp-image-25294\" srcset=\"https:\/\/examples.javacodegeeks.com\/wp-content\/uploads\/2015\/07\/apache-solr-logo.jpg 150w, https:\/\/examples.javacodegeeks.com\/wp-content\/uploads\/2015\/07\/apache-solr-logo-70x70.jpg 70w\" sizes=\"(max-width: 150px) 100vw, 150px\" \/><\/figure>\n<\/div>\n<p>The steps and commands described in this example are for <a aria-label=\"undefined (opens in a new tab)\" href=\"https:\/\/lucene.apache.org\/solr\/downloads.html#solr-852\" target=\"_blank\" rel=\"noreferrer noopener\">Apache Solr 8.5<\/a> on Windows 10. <a aria-label=\"undefined (opens in a new tab)\" href=\"http:\/\/opennlp.sourceforge.net\/models-1.5\/\" target=\"_blank\" rel=\"noreferrer noopener\">Pre-trained models for OpenNLP 1.5<\/a> are used in this example. To train your own models, please refer to Apache OpenNLP for details. The JDK version we use to run Solr in this example is <a aria-label=\"undefined (opens in a new tab)\" href=\"https:\/\/jdk.java.net\/java-se-ri\/13\" target=\"_blank\" rel=\"noreferrer noopener\">OpenJDK 13<\/a>.<\/p>\n<p>Before we start, please make sure your computer meets the <a aria-label=\"undefined (opens in a new tab)\" href=\"https:\/\/lucene.apache.org\/solr\/8_5_0\/SYSTEM_REQUIREMENTS.html\" target=\"_blank\" rel=\"noreferrer noopener\">system requirements<\/a>. 
Also, please download the binary release of <a aria-label=\"undefined (opens in a new tab)\" href=\"https:\/\/lucene.apache.org\/solr\/downloads.html#solr-852\" target=\"_blank\" rel=\"noreferrer noopener\">Apache Solr 8.5<\/a>.<\/p>\n<h2 class=\"wp-block-heading\"><a name=\"solr_opennlp_integration\"><\/a>3. Solr OpenNLP Integration<\/h2>\n<h3 class=\"wp-block-heading\"><a name=\"the_basics\"><\/a>3.1 The Basics<\/h3>\n<p>NLP processes and analyzes natural languages. To understand how it works with Solr, we need to know where analysis takes place. There are two places in which text analysis happens in Solr: index time and query time. Analyzers, which consist of tokenizers and filters, are used in both places. At index time, the token stream generated by analysis is added to an index and defines the terms for a field. At query time, the terms generated by analyzing the values being searched for are matched against those stored in the index.<\/p>\n<p>The Solr OpenNLP integration provides several analysis components: an OpenNLP tokenizer, an OpenNLP part-of-speech tagging filter, an OpenNLP phrase chunking filter, and an OpenNLP lemmatization filter. In addition to these analysis components, Solr also provides an update request processor to extract named entities using an OpenNLP NER model. Let&#8217;s see how to set up the OpenNLP integration in the next section.<\/p>\n<h3 class=\"wp-block-heading\"><a name=\"setting_up_the_integration\"><\/a>3.2 Setting Up The Integration<\/h3>\n<h4 class=\"wp-block-heading\">3.2.1 Putting jars on classpath<\/h4>\n<p>To use the OpenNLP components, we must add additional jars to Solr\u2019s classpath. There are a few options to make other plugins available to Solr as described in <a href=\"https:\/\/lucene.apache.org\/solr\/guide\/8_5\/solr-plugins.html#installing-plugins\" target=\"_blank\" aria-label=\"undefined (opens in a new tab)\" rel=\"noreferrer noopener\">Solr Plugins<\/a>. 
We use the standard approach, the <code>&lt;lib\/&gt;<\/code> directive in <code>solrconfig.xml<\/code>, as shown below:<\/p>\n<pre class=\"brush:xml\">  &lt;lib dir=\"${solr.install.dir:..\/..\/..\/..\/..\/}\/contrib\/analysis-extras\/lucene-libs\" regex=\".*\\.jar\" \/&gt;\n  &lt;lib dir=\"${solr.install.dir:..\/..\/..\/..\/..\/}\/contrib\/analysis-extras\/lib\" regex=\".*\\.jar\"\/&gt;\n  &lt;lib path=\"${solr.install.dir:..\/..\/..\/..\/..\/}\/dist\/solr-analysis-extras-8.5.2.jar\"\/&gt;<\/pre>\n<h4 class=\"wp-block-heading\">3.2.2 Adding required resources to configset<\/h4>\n<p>We need to go to the <a href=\"http:\/\/opennlp.sourceforge.net\/models-1.5\/\" target=\"_blank\" aria-label=\"undefined (opens in a new tab)\" rel=\"noreferrer noopener\">Apache OpenNLP<\/a> website to download the pre-trained models for OpenNLP 1.5. They are fully compatible with Apache OpenNLP 1.9.2.<\/p>\n<p>Also, we need to download and unzip <a aria-label=\"undefined (opens in a new tab)\" href=\"https:\/\/www.apache.org\/dyn\/closer.cgi\/opennlp\/opennlp-1.9.2\/apache-opennlp-1.9.2-bin.zip\" target=\"_blank\" rel=\"noreferrer noopener\">apache-opennlp-1.9.2-bin.zip<\/a>. Then go to <a aria-label=\"undefined (opens in a new tab)\" href=\"https:\/\/raw.githubusercontent.com\/richardwilly98\/elasticsearch-opennlp-auto-tagging\/master\/src\/main\/resources\/models\/en-lemmatizer.dict\" target=\"_blank\" rel=\"noreferrer noopener\">the URL<\/a> for the lemmatizer training file and save it as <code>en-lemmatizer.dict<\/code>. 
Next, let&#8217;s train the lemmatizer model by going to the apache-opennlp bin directory we just unzipped and executing the following command:<\/p>\n<pre class=\"brush:bash\">opennlp LemmatizerTrainerME -model en-lemmatizer.bin -lang en -data \/path\/to\/en-lemmatizer.dict -encoding UTF-8<\/pre>\n<p>The output will be:<\/p>\n<pre class=\"brush:bash\">D:\\java\\apache-opennlp-1.9.2\\bin\\opennlp  LemmatizerTrainerME -model en-lemmatizer.bin -lang en -data en-lemmatizer.dict -encoding UTF-8\nIndexing events with TwoPass using cutoff of 5\n\n        Computing event counts...  done. 301403 events\n        Indexing...  done.\nSorting and merging events... done. Reduced 301403 events to 297776.\nDone indexing in 12.63 s.\nIncorporating indexed data for training...\ndone.\n        Number of Event Tokens: 297776\n            Number of Outcomes: 431\n          Number of Predicates: 69122\n...done.\nComputing model parameters ...\nPerforming 100 iterations.\n  1:  ... loglikelihood=-1828343.1766817758     0.6328968192088333\n  2:  ... loglikelihood=-452189.7053988425      0.8768227257193857\n  3:  ... loglikelihood=-211064.45129182754     0.9506474719893299\n  4:  ... loglikelihood=-132189.41066218202     0.9667289310325379\n  5:  ... loglikelihood=-95473.57210099498      0.9743997239576249\n  6:  ... loglikelihood=-74894.1935626126       0.9794693483475613\n  7:  ... loglikelihood=-61926.78603360762      0.9831056757895575\n  8:  ... loglikelihood=-53069.688593599894     0.9856438058015348\n  9:  ... loglikelihood=-46655.871988011146     0.9877439839683082\n 10:  ... loglikelihood=-41801.50242291499      0.9893265826816587\n 11:  ... loglikelihood=-37998.3432302135       0.9905608106090517\n 12:  ... loglikelihood=-34935.28330041361      0.9915196597246876\n 13:  ... loglikelihood=-32412.054562775495     0.9923325248919221\n 14:  ... loglikelihood=-30294.265898838632     0.9930259486468284\n 15:  ... loglikelihood=-28488.56869622921      0.9936132022574427\n 16:  ... 
loglikelihood=-26928.219836178196     0.9941340995278747\n 17:  ... loglikelihood=-25564.30190282366      0.9945521444710238\n 18:  ... loglikelihood=-24360.17747454469      0.9948806083549268\n 19:  ... loglikelihood=-23287.876071165214     0.9951924831537842\n 20:  ... loglikelihood=-22325.67856216146      0.9954744975995594\n 21:  ... loglikelihood=-21456.463866609512     0.9956437062670246\n 22:  ... loglikelihood=-20666.55205863062      0.9958195505685079\n 23:  ... loglikelihood=-19944.878511734943     0.9959953948699913\n 24:  ... loglikelihood=-19282.394080308608     0.9961845104395112\n 25:  ... loglikelihood=-18671.622759799964     0.9963570369239855\n 26:  ... loglikelihood=-18106.330904658702     0.9965196099574324\n 27:  ... loglikelihood=-17581.276656339858     0.9966357335527516\n 28:  ... loglikelihood=-17092.017845561142     0.9967551749650799\n 29:  ... loglikelihood=-16634.763075140218     0.9968712985603992\n 30:  ... loglikelihood=-16206.255072812444     0.9969675152536637\n 31:  ... loglikelihood=-15803.678430914795     0.9970902744830011\n 32:  ... loglikelihood=-15424.585970349774     0.9971964446272931\n 33:  ... loglikelihood=-15066.839470007333     0.9972860256865392\n 34:  ... loglikelihood=-14728.561581223981     0.9973722889287764\n 35:  ... loglikelihood=-14408.0965283682       0.9974618699880227\n 36:  ... loglikelihood=-14103.977768763696     0.9975381797792324\n 37:  ... loglikelihood=-13814.901208117759     0.997581311400351\n 38:  ... loglikelihood=-13539.702883330643     0.9976509855575426\n 39:  ... loglikelihood=-13277.340262355141     0.9976941171786611\n 40:  ... loglikelihood=-13026.876491519615     0.997747202250807\n 41:  ... loglikelihood=-12787.467059226115     0.997770426969871\n 42:  ... loglikelihood=-12558.348451930819     0.9978069229569713\n 43:  ... loglikelihood=-12338.828461585104     0.9978401011270625\n 44:  ... loglikelihood=-12128.277868995287     0.9978799149311719\n 45:  ... 
loglikelihood=-11926.123279039519     0.9979164109182722\n 46:  ... loglikelihood=-11731.840924598388     0.9979263643692996\n 47:  ... loglikelihood=-11544.951288710525     0.9979595425393908\n 48:  ... loglikelihood=-11365.01442068802      0.9979993563435002\n 49:  ... loglikelihood=-11191.625843150192     0.9980557592326553\n 50:  ... loglikelihood=-11024.41296410639      0.9980955730367648\n 51:  ... loglikelihood=-10863.031922256287     0.9981320690238651\n 52:  ... loglikelihood=-10707.16480518142      0.998158611559938\n 53:  ... loglikelihood=-10556.517189551667     0.9981917897300292\n 54:  ... loglikelihood=-10410.81596029103      0.998211696632084\n 55:  ... loglikelihood=-10269.807372149957     0.9982249679001204\n 56:  ... loglikelihood=-10133.255322511463     0.998241556985166\n 57:  ... loglikelihood=-10000.939808806212     0.998268099521239\n 58:  ... loglikelihood=-9872.655547678738      0.9982913242403029\n 59:  ... loglikelihood=-9748.21073625716       0.9983311380444123\n 60:  ... loglikelihood=-9627.425938565784      0.9983609983974944\n 61:  ... loglikelihood=-9510.13308241278       0.9983941765675856\n 62:  ... loglikelihood=-9396.174554023093      0.9984140834696403\n 63:  ... loglikelihood=-9285.40237935212       0.9984240369206677\n 64:  ... loglikelihood=-9177.677482426574      0.9984306725546859\n 65:  ... loglikelihood=-9072.869012278017      0.9984638507247772\n 66:  ... loglikelihood=-8970.853731087096      0.9984738041758044\n 67:  ... loglikelihood=-8871.515457047639      0.9984804398098227\n 68:  ... loglikelihood=-8774.74455624773       0.9985036645288866\n 69:  ... loglikelihood=-8680.437478540607      0.9985136179799139\n 70:  ... loglikelihood=-8588.496332961782      0.9985268892479504\n 71:  ... loglikelihood=-8498.82849876398       0.9985401605159869\n 72:  ... loglikelihood=-8411.346268577978      0.9985467961500052\n 73:  ... loglikelihood=-8325.966520610862      0.9985633852350507\n 74:  ... 
loglikelihood=-8242.610417120377      0.9985799743200964\n 75:  ... loglikelihood=-8161.203126709595      0.9985832921371055\n 76:  ... loglikelihood=-8081.67356824808       0.9985932455881328\n 77:  ... loglikelihood=-8003.954174455548      0.9986197881242058\n 78:  ... loglikelihood=-7927.98067338463       0.9986264237582241\n 79:  ... loglikelihood=-7853.691886230994      0.9986463306602787\n 80:  ... loglikelihood=-7781.029540039709      0.9986463306602787\n 81:  ... loglikelihood=-7709.938094037545      0.9986496484772879\n 82:  ... loglikelihood=-7640.364578431137      0.9986695553793427\n 83:  ... loglikelihood=-7572.258444629405      0.9986927800984065\n 84:  ... loglikelihood=-7505.5714259522365     0.9986994157324247\n 85:  ... loglikelihood=-7440.257407963147      0.998706051366443\n 86:  ... loglikelihood=-7376.272307657644      0.9987093691834521\n 87:  ... loglikelihood=-7313.57396080075       0.9987259582684976\n 88:  ... loglikelihood=-7252.12201677264       0.9987458651705524\n 89:  ... loglikelihood=-7191.877840340969      0.9987525008045707\n 90:  ... loglikelihood=-7132.80441983102       0.9987657720726071\n 91:  ... loglikelihood=-7074.866281202995      0.9987823611576527\n 92:  ... loglikelihood=-7018.029407597901      0.9987989502426983\n 93:  ... loglikelihood=-6962.261163947286      0.9988022680597074\n 94:  ... loglikelihood=-6907.530226271331      0.9988055858767165\n 95:  ... loglikelihood=-6853.806515329603      0.9988221749617622\n 96:  ... loglikelihood=-6801.061134311805      0.9988221749617622\n 97:  ... loglikelihood=-6749.266310279299      0.9988321284127896\n 98:  ... loglikelihood=-6698.39533909719       0.998845399680826\n 99:  ... loglikelihood=-6648.422533612705      0.9988487174978351\n100:  ... loglikelihood=-6599.323174858488      0.9988586709488625\nWriting lemmatizer model ... 
done (1.541s)\n\nWrote lemmatizer model to\npath: D:\\en-lemmatizer.bin\n\nExecution time: 339.410 seconds<\/pre>\n<p>Our test data in this example is in English only, so we just need to download the English pre-trained models and train the English lemmatizer model as described above. Now all required resources are ready, and we just need to copy them to the <code>jcg_example_configs<\/code> configSet under the directory <code>${solr.install.dir}\\server\\solr\\configsets\\jcg_example_configs\\conf\\opennlp<\/code>. This is because Solr typically resolves resources from the configSet. As we are going to run Solr in standalone mode, the configSet is on the file system. If we ran Solr in SolrCloud mode, the configSet and resources would be stored in ZooKeeper and shared by the Solr instances in the cluster. The output below shows the models in the opennlp directory:<\/p>\n<pre class=\"brush:bash\">D:\\Java\\solr-8.5.2\\server\\solr\\configsets\\jcg_example_configs\\conf\\opennlp&gt;dir\n Volume in drive D is Data\n Volume Serial Number is 24EC-FE37\n\n Directory of D:\\Java\\solr-8.5.2\\server\\solr\\configsets\\jcg_example_configs\\conf\\opennlp\n\n06\/30\/2020  11:28 PM    &lt;DIR&gt;          .\n06\/30\/2020  11:28 PM    &lt;DIR&gt;          ..\n06\/28\/2020  08:25 PM         2,560,304 en-chunker.bin\n06\/30\/2020  11:24 PM         1,632,029 en-lemmatizer.bin\n06\/28\/2020  08:24 PM         5,030,307 en-ner-date.bin\n06\/28\/2020  08:25 PM         5,110,658 en-ner-location.bin\n06\/28\/2020  08:25 PM         4,806,234 en-ner-money.bin\n06\/28\/2020  08:25 PM         5,297,172 en-ner-organization.bin\n06\/28\/2020  08:25 PM         4,728,645 en-ner-percentage.bin\n06\/28\/2020  08:25 PM         5,207,953 en-ner-person.bin\n06\/28\/2020  08:25 PM         4,724,357 en-ner-time.bin\n06\/28\/2020  08:26 PM        
36,345,477 en-parser-chunking.bin\n06\/28\/2020  08:24 PM         5,696,197 en-pos-maxent.bin\n06\/28\/2020  08:24 PM         3,975,786 en-pos-perceptron.bin\n06\/28\/2020  08:24 PM            98,533 en-sent.bin\n06\/28\/2020  08:24 PM           439,890 en-token.bin\n06\/30\/2020  10:34 PM                35 stop.pos.txt\n              15 File(s)     85,653,577 bytes\n               2 Dir(s)  47,963,561,984 bytes free<\/pre>\n<h4 class=\"wp-block-heading\">3.2.3 Defining Schema<\/h4>\n<p>Before we define the schema, it would be good to have some basic understanding of <code>TextField<\/code>, <code>analyzer<\/code>, <code>tokenizer<\/code> and <code>filter<\/code> in Solr. <code>TextField<\/code> is the basic type for configurable text analysis. It allows the specification of custom text analyzers consisting of a tokenizer and a list of token filters. Different analyzers may be specified for indexing and querying. For more info on customizing your analyzer chain, please see <a href=\"http:\/\/lucene.apache.org\/solr\/guide\/understanding-analyzers-tokenizers-and-filters.html#understanding-analyzers-tokenizers-and-filters\" target=\"_blank\" aria-label=\"undefined (opens in a new tab)\" rel=\"noreferrer noopener\">Understanding Analyzers, Tokenizers, and Filters<\/a>.<\/p>\n<p>Now let&#8217;s see how to configure the OpenNLP analysis components.<\/p>\n<p>The OpenNLP Tokenizer takes two language-specific binary model files as required parameters: a sentence detector model and a tokenizer model. For example:<\/p>\n<pre class=\"brush:xml\">&lt;analyzer&gt;\n  &lt;tokenizer class=\"solr.OpenNLPTokenizerFactory\" sentenceModel=\"en-sent.bin\" tokenizerModel=\"en-token.bin\"\/&gt;\n&lt;\/analyzer&gt;<\/pre>\n<p>The OpenNLP Part-Of-Speech Filter takes one language-specific binary model file as the required parameter: a POS tagger model. 
Normally we don\u2019t want to include punctuation in the index, so the <code>TypeTokenFilter<\/code> is included in the examples below, with <code>stop.pos.txt<\/code> containing the following:<\/p>\n<p><code>stop.pos.txt<\/code><\/p>\n<pre class=\"brush:bash\">#\n$\n''\n``\n,\n-LRB-\n-RRB-\n:\n.<\/pre>\n<p>The OpenNLP Part-Of-Speech Filter example:<\/p>\n<pre class=\"brush:xml\">&lt;analyzer&gt;\n  &lt;tokenizer class=\"solr.OpenNLPTokenizerFactory\" sentenceModel=\"en-sent.bin\" tokenizerModel=\"en-token.bin\"\/&gt;\n  &lt;filter class=\"solr.OpenNLPPOSFilterFactory\" posTaggerModel=\"en-pos-maxent.bin\"\/&gt;\n  &lt;filter class=\"solr.TypeAsPayloadFilterFactory\"\/&gt;\n  &lt;filter class=\"solr.TypeTokenFilterFactory\" types=\"stop.pos.txt\"\/&gt;\n&lt;\/analyzer&gt;<\/pre>\n<p>The OpenNLP Phrase Chunking Filter takes one language-specific binary model file as the required parameter: a phrase chunker model. For example:<\/p>\n<pre class=\"brush:xml\">&lt;analyzer&gt;\n  &lt;tokenizer class=\"solr.OpenNLPTokenizerFactory\" sentenceModel=\"en-sent.bin\" tokenizerModel=\"en-token.bin\"\/&gt;\n  &lt;filter class=\"solr.OpenNLPChunkerFilterFactory\" chunkerModel=\"en-chunker.bin\"\/&gt;\n&lt;\/analyzer&gt;<\/pre>\n<p>The OpenNLP Lemmatizer Filter takes two optional parameters: a dictionary-based lemmatizer and a model-based lemmatizer. 
In this example, we perform model-based lemmatization only, preserving the original token and emitting the lemma as a synonym.<\/p>\n<pre class=\"brush:xml\">&lt;analyzer&gt;\n  &lt;tokenizer class=\"solr.OpenNLPTokenizerFactory\" sentenceModel=\"en-sent.bin\" tokenizerModel=\"en-token.bin\"\/&gt;\n  &lt;filter class=\"solr.KeywordRepeatFilterFactory\"\/&gt;\n  &lt;filter class=\"solr.OpenNLPLemmatizerFilterFactory\" lemmatizerModel=\"en-lemmatizer.bin\"\/&gt;\n  &lt;filter class=\"solr.RemoveDuplicatesTokenFilterFactory\"\/&gt;\n&lt;\/analyzer&gt;<\/pre>\n<p>Putting everything above together, the analyzer configuration would be:<\/p>\n<pre class=\"brush:xml\">&lt;analyzer&gt;\n  &lt;tokenizer class=\"solr.OpenNLPTokenizerFactory\" sentenceModel=\"en-sent.bin\" tokenizerModel=\"en-token.bin\"\/&gt;\n  &lt;filter class=\"solr.OpenNLPPOSFilterFactory\" posTaggerModel=\"en-pos-maxent.bin\"\/&gt;\n  &lt;filter class=\"solr.OpenNLPChunkerFilterFactory\" chunkerModel=\"en-chunker.bin\"\/&gt;\n  &lt;filter class=\"solr.KeywordRepeatFilterFactory\"\/&gt;\n  &lt;filter class=\"solr.OpenNLPLemmatizerFilterFactory\" lemmatizerModel=\"en-lemmatizer.bin\"\/&gt;\n  &lt;filter class=\"solr.RemoveDuplicatesTokenFilterFactory\"\/&gt;\n  &lt;filter class=\"solr.TypeAsPayloadFilterFactory\"\/&gt;\n  &lt;filter class=\"solr.TypeTokenFilterFactory\" types=\"stop.pos.txt\"\/&gt;\n&lt;\/analyzer&gt;<\/pre>\n<p>With any text editor, open the <code>managed-schema<\/code> file of the <code>jcg_example_configs<\/code> configSet under the directory <code>${solr.install.dir}\\server\\solr\\configsets\\jcg_example_configs\\conf<\/code>. 
Add field type <code>text_en_opennlp<\/code> using OpenNLP-based analysis components described above, then field <em>introduction<\/em> using <code>text_en_opennlp<\/code> field type as below:<\/p>\n<pre class=\"brush:xml\">&lt;!-- English TextField OpenNLP --&gt;\n&lt;fieldType name=\"text_en_opennlp\" class=\"solr.TextField\" positionIncrementGap=\"100\"&gt;\n  &lt;analyzer&gt;\n    &lt;tokenizer class=\"solr.OpenNLPTokenizerFactory\" sentenceModel=\"opennlp\/en-sent.bin\" tokenizerModel=\"opennlp\/en-token.bin\"\/&gt;\n    &lt;filter class=\"solr.OpenNLPPOSFilterFactory\" posTaggerModel=\"opennlp\/en-pos-maxent.bin\"\/&gt;\n    &lt;filter class=\"solr.OpenNLPChunkerFilterFactory\" chunkerModel=\"opennlp\/en-chunker.bin\"\/&gt;\n    &lt;filter class=\"solr.KeywordRepeatFilterFactory\"\/&gt;\n    &lt;filter class=\"solr.OpenNLPLemmatizerFilterFactory\" lemmatizerModel=\"opennlp\/en-lemmatizer.bin\"\/&gt;\n    &lt;filter class=\"solr.RemoveDuplicatesTokenFilterFactory\"\/&gt;\n    &lt;filter class=\"solr.TypeAsPayloadFilterFactory\"\/&gt;\n    &lt;filter class=\"solr.TypeTokenFilterFactory\" types=\"opennlp\/stop.pos.txt\"\/&gt;\n  &lt;\/analyzer&gt;\n&lt;\/fieldType&gt;\n&lt;field name=\"introduction\" type=\"text_en_opennlp\" indexed=\"true\" stored=\"true\"\/&gt;<\/pre>\n<p>If extracting named entities from text seems interesting and useful in your use cases, we can set up <a aria-label=\"undefined (opens in a new tab)\" href=\"https:\/\/lucene.apache.org\/solr\/guide\/8_5\/update-request-processors.html\" target=\"_blank\" rel=\"noreferrer noopener\">Update Request Processors<\/a> by using OpenNLP NER models. This step is optional and out of scope of this article. 
Feel free to check out the detailed usage of <code>solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory<\/code> in <a aria-label=\"undefined (opens in a new tab)\" href=\"https:\/\/lucene.apache.org\/solr\/8_5_0\/\/solr-analysis-extras\/org\/apache\/solr\/update\/processor\/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.html\" target=\"_blank\" rel=\"noreferrer noopener\">the Javadoc<\/a>. An example configuration to extract company names from the introduction field by using the OpenNLP NER model <code>en-ner-organization.bin<\/code> is listed below:<\/p>\n<p>Open <code>solrconfig.xml<\/code> and add the following snippet:<\/p>\n<pre class=\"brush:xml\">&lt;updateRequestProcessorChain name=\"single-extract\"&gt;\n  &lt;processor class=\"solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory\"&gt;\n    &lt;str name=\"modelFile\"&gt;opennlp\/en-ner-organization.bin&lt;\/str&gt;\n    &lt;str name=\"analyzerFieldType\"&gt;text_en_opennlp&lt;\/str&gt;\n    &lt;str name=\"source\"&gt;introduction&lt;\/str&gt;\n    &lt;str name=\"dest\"&gt;company&lt;\/str&gt;\n  &lt;\/processor&gt;\n  &lt;processor class=\"solr.LogUpdateProcessorFactory\" \/&gt;\n  &lt;processor class=\"solr.RunUpdateProcessorFactory\" \/&gt;\n&lt;\/updateRequestProcessorChain&gt;<\/pre>\n<p>Open <code>managed-schema<\/code> and add the following field:<\/p>\n<pre class=\"brush:xml\">&lt;field name=\"company\" type=\"text_general\" indexed=\"true\" stored=\"true\"\/&gt;<\/pre>\n<p>For your convenience, a <code>jcg_example_configs.zip<\/code> file containing all configurations and schema is attached to the article. 
You can simply download and extract it to the directory <code>${solr.install.dir}\\server\\solr\\configsets\\jcg_example_configs<\/code>.<\/p>\n<h4 class=\"wp-block-heading\">3.2.4 Starting Solr Instance<\/h4>\n<p>For simplicity, instead of setting up a SolrCloud on your local machine as demonstrated in <a aria-label=\"undefined (opens in a new tab)\" href=\"https:\/\/examples.javacodegeeks.com\/apache-solr-clustering-example\/\" target=\"_blank\" rel=\"noreferrer noopener\">Apache Solr Clustering Example<\/a>, we run a single Solr instance on our local machine with the command below:<\/p>\n<pre class=\"brush:bash\">bin\\solr.cmd start<\/pre>\n<p>The output would be:<\/p>\n<pre class=\"brush:bash\">D:\\Java\\solr-8.5.2&gt;bin\\solr.cmd start\nWaiting up to 30 to see Solr running on port 8983\nStarted Solr server on port 8983. Happy searching!<\/pre>\n<h4 class=\"wp-block-heading\">3.2.5 Creating A New Core<\/h4>\n<p>As we are running Solr in standalone mode, we need to create a new core named <code>jcg_example_core<\/code> with the <code>jcg_example_configs<\/code> configSet on the local machine. 
For example, we can do it via the CoreAdmin API:<\/p>\n<pre class=\"brush:bash\">curl -G http:\/\/localhost:8983\/solr\/admin\/cores --data-urlencode action=CREATE --data-urlencode name=jcg_example_core --data-urlencode configSet=jcg_example_configs<\/pre>\n<p>The output would be:<\/p>\n<pre class=\"brush:bash\">D:\\Java\\solr-8.5.2&gt;curl -G http:\/\/localhost:8983\/solr\/admin\/cores --data-urlencode action=CREATE --data-urlencode name=jcg_example_core --data-urlencode configSet=jcg_example_configs\n{\n  \"responseHeader\":{\n    \"status\":0,\n    \"QTime\":641},\n  \"core\":\"jcg_example_core\"}<\/pre>\n<p>If you would like to remove a core, you can do it via the CoreAdmin API as below:<\/p>\n<pre class=\"brush:bash\">curl -G http:\/\/localhost:8983\/solr\/admin\/cores --data-urlencode action=UNLOAD --data-urlencode core=jcg_example_core --data-urlencode deleteInstanceDir=true<\/pre>\n<p>The output would be:<\/p>\n<pre class=\"brush:bash\">D:\\Java\\solr-8.5.2&gt;curl -G http:\/\/localhost:8983\/solr\/admin\/cores --data-urlencode action=UNLOAD --data-urlencode core=jcg_example_core --data-urlencode deleteInstanceDir=true\n{\n  \"responseHeader\":{\n    \"status\":0,\n    \"QTime\":37}}<\/pre>\n<h3 class=\"wp-block-heading\"><a name=\"examples\"><\/a>3.3 Examples<\/h3>\n<p>Time to see some examples of how Solr OpenNLP works.<\/p>\n<h4 class=\"wp-block-heading\">3.3.1 Indexing Data<\/h4>\n<p>Download and extract the sample data file attached to this article and index the <code>articles-opennlp.csv<\/code> with the following command:<\/p>\n<pre class=\"brush:bash\">java -jar -Dc=jcg_example_core -Dauto post.jar articles-opennlp.csv<\/pre>\n<p>The output would be:<\/p>\n<pre class=\"brush:bash\">SimplePostTool version 5.0.0\nPosting files to [base] url http:\/\/localhost:8983\/solr\/jcg_example_core\/update...\nEntering auto mode. 
File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log\nPOSTing file articles-opennlp.csv (text\/csv) to [base]\n1 files indexed.\nCOMMITting Solr index changes to http:\/\/localhost:8983\/solr\/jcg_example_core\/update...\nTime spent: 0:00:00.670<\/pre>\n<p>The <code>post.jar<\/code> is included in the Solr distribution under <code>example\\exampledocs<\/code>. It is also included in the sample data file attached to this article.<\/p>\n<h4 class=\"wp-block-heading\">3.3.2 Semantic Search Examples<\/h4>\n<p>When searching with Solr, specifying the field to be searched may return more relevant results. But in real-world applications, customers normally have no idea which field to look at, or they are just given a simple text input box in which to enter the keywords they are looking for. For example, is it possible to find the author of the article &#8220;Java Array Example&#8221; without knowing which field to search? With the OpenNLP integration we&#8217;ve set up, we can do this easily by sending the sentence &#8220;author of java array example&#8221; to Solr as below:<\/p>\n<pre class=\"brush:bash\">curl -G http:\/\/localhost:8983\/solr\/jcg_example_core\/select --data-urlencode \"q=author of java array example\" --data-urlencode fl=title,author,introduction<\/pre>\n<p>The output would be:<\/p>\n<pre class=\"brush:bash\">{\n  \"responseHeader\":{\n    \"status\":0,\n    \"QTime\":2,\n    \"params\":{\n      \"q\":\"author of java array example\",\n      \"fl\":\"title,author,introduction\"}},\n  \"response\":{\"numFound\":1,\"start\":0,\"docs\":[\n      {\n        \"title\":[\"Java Array Example\"],\n        \"author\":[\"Kevin Yang\"],\n        \"introduction\":\" Kevin Yang wrote an article with title \\\"Java Array Example\\\" for Microsoft in Beijing China in June 2018\"}]\n  }}<\/pre>\n<p>How exciting! 
It seems we are talking to the search engine with natural human language. Let&#8217;s try another one by saying &#8220;articles written by James Cook in 2019&#8221; as below:<\/p>\n<pre class=\"brush:bash\">curl -G http:\/\/localhost:8983\/solr\/jcg_example_core\/select --data-urlencode \"q=articles written by James Cook in 2019\" --data-urlencode fl=title,author,introduction,score<\/pre>\n<p>The output would be:<\/p>\n<pre class=\"brush:bash\">{\n  \"responseHeader\":{\n    \"status\":0,\n    \"QTime\":5,\n    \"params\":{\n      \"q\":\"articles written by James Cook in 2019\",\n      \"fl\":\"title,author,introduction,score\"}},\n  \"response\":{\"numFound\":13,\"start\":0,\"maxScore\":3.8089,\"docs\":[\n      {\n        \"title\":[\"The Apache Solr Cookbook\"],\n        \"author\":[\"James Cook\"],\n        \"introduction\":\"This article was written by James Cook in Oxford UK in 2019\",\n        \"score\":3.8089},\n      {\n        \"title\":[\"The Solr Runbook\"],\n        \"author\":[\"James Cook\"],\n        \"introduction\":\"This article was written by James Cook in London UK in 2020\",\n        \"score\":2.5949912},\n      {\n        \"title\":[\"Java ArrayList 101\"],\n        \"author\":[\"Kevin Yang\"],\n        \"introduction\":\"This article was written by Kevin Yang in Sydney Australia in 2020\",\n        \"score\":0.1685594},\n      {\n        \"title\":[\"Java Remote Method Invocation Example\"],\n        \"author\":[\"Kevin Yang\"],\n        \"introduction\":\"This article was written by Kevin Yang in Beijing China in 2010\",\n        \"score\":0.1685594},\n      {\n        \"title\":[\"Thread\"],\n        \"author\":[\"Kevin Yang\"],\n        \"introduction\":\"This article was written by Kevin Yang in Sydney Australia in 2020\",\n        \"score\":0.1685594},\n      {\n        \"title\":[\"Java StringTokenizer Example\"],\n        \"author\":[\"Kevin Yang\"],\n        \"introduction\":\"This article was written by Kevin Yang in Sydney 
Australia in 2020\",\n        \"score\":0.1685594},\n      {\n        \"title\":[\"Java HashMap Example\"],\n        \"author\":[\"Evan Swing\"],\n        \"introduction\":\"This article was written by Evan Swing in Boston USA in 2018\",\n        \"score\":0.1685594},\n      {\n        \"title\":[\"Java HashSet Example\"],\n        \"author\":[\"Evan Swing\"],\n        \"introduction\":\"This article was written by Kevin Yang in Sydney Australia in 2020\",\n        \"score\":0.1685594},\n      {\n        \"title\":[\"Apache SolrCloud Example\"],\n        \"author\":[\"Kevin Yang\"],\n        \"introduction\":\"This article was written by Kevin Yang in Sydney Australia in 2020\",\n        \"score\":0.1685594},\n      {\n        \"title\":[\"The Solr REST API\"],\n        \"author\":[\"Steven Thomas\"],\n        \"introduction\":\"This article was written by Steven Thomas in Seattle USA in 2020\",\n        \"score\":0.1685594}]\n  }}<\/pre>\n<p>From the output above we can see that the article &#8220;The Apache Solr Cookbook&#8221; written by James Cook in 2019 is returned as the first result with the highest relevance score.<\/p>\n<h2 class=\"wp-block-heading\"><a name=\"download\"><\/a>4. Download the Sample Data File<\/h2>\n<div class=\"download\"><strong>Download<\/strong><br \/>\nYou can download the sample data file of this example here: <a href=\"https:\/\/examples.javacodegeeks.com\/wp-content\/uploads\/2020\/07\/apache-solr-opennlp-tutorial.zip\"><strong>Apache Solr OpenNLP Tutorial &#8211; Part 1<\/strong><\/a><\/div>\n","protected":false},"excerpt":{"rendered":"<p>This is an article about Apache Solr OpenNLP. 1. Introduction Natural Language Processing (NLP) is a field focusing on processing and analyzing human languages by using computers. 
Using NLP in a search will help search service providers to have a better understanding of what their customers really mean in their searches, thus to run search &hellip;<\/p>\n","protected":false},"author":223,"featured_media":25294,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[949],"tags":[946,1220,45492,1226],"class_list":["post-92042","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-apache-solr","tag-apache-solr","tag-example","tag-opennlp","tag-tutorial"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Apache Solr OpenNLP - Part 1 - Examples Java Code Geeks - 2026<\/title>\n<meta name=\"description\" content=\"This is an article about Apache Solr OpenNLP. 1. Introduction Natural Language Processing (NLP) is a field focusing on processing and analyzing human\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Apache Solr OpenNLP - Part 1 - Examples Java Code Geeks - 2026\" \/>\n<meta property=\"og:description\" content=\"This is an article about Apache Solr OpenNLP. 1. 
Introduction Natural Language Processing (NLP) is a field focusing on processing and analyzing human\" \/>\n<meta property=\"og:url\" content=\"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/\" \/>\n<meta property=\"og:site_name\" content=\"Examples Java Code Geeks\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/javacodegeeks\" \/>\n<meta property=\"article:published_time\" content=\"2020-07-10T08:00:00+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/examples.javacodegeeks.com\/wp-content\/uploads\/2015\/07\/apache-solr-logo.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"150\" \/>\n\t<meta property=\"og:image:height\" content=\"150\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Kevin Yang\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@javacodegeeks\" \/>\n<meta name=\"twitter:site\" content=\"@javacodegeeks\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Kevin Yang\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"13 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/\"},\"author\":{\"name\":\"Kevin Yang\",\"@id\":\"https:\/\/examples.javacodegeeks.com\/#\/schema\/person\/3f6ff013b8204dc7f5e6d2660fbc9f8f\"},\"headline\":\"Apache Solr OpenNLP Tutorial &#8211; Part 1\",\"datePublished\":\"2020-07-10T08:00:00+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/\"},\"wordCount\":1291,\"commentCount\":2,\"publisher\":{\"@id\":\"https:\/\/examples.javacodegeeks.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/examples.javacodegeeks.com\/wp-content\/uploads\/2015\/07\/apache-solr-logo.jpg\",\"keywords\":[\"Apache Solr\",\"example\",\"OpenNLP\",\"tutorial\"],\"articleSection\":[\"Apache Solr\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/\",\"url\":\"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/\",\"name\":\"Apache Solr OpenNLP - Part 1 - Examples Java Code Geeks - 
2026\",\"isPartOf\":{\"@id\":\"https:\/\/examples.javacodegeeks.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/examples.javacodegeeks.com\/wp-content\/uploads\/2015\/07\/apache-solr-logo.jpg\",\"datePublished\":\"2020-07-10T08:00:00+00:00\",\"description\":\"This is an article about Apache Solr OpenNLP. 1. Introduction Natural Language Processing (NLP) is a field focusing on processing and analyzing human\",\"breadcrumb\":{\"@id\":\"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/#primaryimage\",\"url\":\"https:\/\/examples.javacodegeeks.com\/wp-content\/uploads\/2015\/07\/apache-solr-logo.jpg\",\"contentUrl\":\"https:\/\/examples.javacodegeeks.com\/wp-content\/uploads\/2015\/07\/apache-solr-logo.jpg\",\"width\":150,\"height\":150},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/examples.javacodegeeks.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Java Development\",\"item\":\"https:\/\/examples.javacodegeeks.com\/category\/java-development\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Enterprise Java\",\"item\":\"https:\/\/examples.javacodegeeks.com\/category\/java-development\/enterprise-java\/\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"Apache 
Solr\",\"item\":\"https:\/\/examples.javacodegeeks.com\/category\/java-development\/enterprise-java\/apache-solr\/\"},{\"@type\":\"ListItem\",\"position\":5,\"name\":\"Apache Solr OpenNLP Tutorial &#8211; Part 1\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/examples.javacodegeeks.com\/#website\",\"url\":\"https:\/\/examples.javacodegeeks.com\/\",\"name\":\"Java Code Geeks\",\"description\":\"Java Examples and Code Snippets\",\"publisher\":{\"@id\":\"https:\/\/examples.javacodegeeks.com\/#organization\"},\"alternateName\":\"JCG\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/examples.javacodegeeks.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/examples.javacodegeeks.com\/#organization\",\"name\":\"Exelixis Media P.C.\",\"url\":\"https:\/\/examples.javacodegeeks.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/examples.javacodegeeks.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/examples.javacodegeeks.com\/wp-content\/uploads\/2022\/06\/exelixis-logo.png\",\"contentUrl\":\"https:\/\/examples.javacodegeeks.com\/wp-content\/uploads\/2022\/06\/exelixis-logo.png\",\"width\":864,\"height\":246,\"caption\":\"Exelixis Media P.C.\"},\"image\":{\"@id\":\"https:\/\/examples.javacodegeeks.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/javacodegeeks\",\"https:\/\/x.com\/javacodegeeks\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/examples.javacodegeeks.com\/#\/schema\/person\/3f6ff013b8204dc7f5e6d2660fbc9f8f\",\"name\":\"Kevin 
Yang\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/examples.javacodegeeks.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/2efb55f26af9d8752be93a78f2cdd9b2529df1f087c7b8901b68dbe11b7cf5ee?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/2efb55f26af9d8752be93a78f2cdd9b2529df1f087c7b8901b68dbe11b7cf5ee?s=96&d=mm&r=g\",\"caption\":\"Kevin Yang\"},\"description\":\"A software design and development professional with seventeen years\u2019 experience in the IT industry, especially with Java EE and .NET, I have worked for software companies, scientific research institutes and websites.\",\"sameAs\":[\"https:\/\/www.linkedin.com\/in\/kevinyang2050\/\"],\"url\":\"https:\/\/examples.javacodegeeks.com\/author\/kevin-yang\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Apache Solr OpenNLP - Part 1 - Examples Java Code Geeks - 2026","description":"This is an article about Apache Solr OpenNLP. 1. Introduction Natural Language Processing (NLP) is a field focusing on processing and analyzing human","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/","og_locale":"en_US","og_type":"article","og_title":"Apache Solr OpenNLP - Part 1 - Examples Java Code Geeks - 2026","og_description":"This is an article about Apache Solr OpenNLP. 1. 
Introduction Natural Language Processing (NLP) is a field focusing on processing and analyzing human","og_url":"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/","og_site_name":"Examples Java Code Geeks","article_publisher":"https:\/\/www.facebook.com\/javacodegeeks","article_published_time":"2020-07-10T08:00:00+00:00","og_image":[{"width":150,"height":150,"url":"https:\/\/examples.javacodegeeks.com\/wp-content\/uploads\/2015\/07\/apache-solr-logo.jpg","type":"image\/jpeg"}],"author":"Kevin Yang","twitter_card":"summary_large_image","twitter_creator":"@javacodegeeks","twitter_site":"@javacodegeeks","twitter_misc":{"Written by":"Kevin Yang","Est. reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/#article","isPartOf":{"@id":"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/"},"author":{"name":"Kevin Yang","@id":"https:\/\/examples.javacodegeeks.com\/#\/schema\/person\/3f6ff013b8204dc7f5e6d2660fbc9f8f"},"headline":"Apache Solr OpenNLP Tutorial &#8211; Part 1","datePublished":"2020-07-10T08:00:00+00:00","mainEntityOfPage":{"@id":"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/"},"wordCount":1291,"commentCount":2,"publisher":{"@id":"https:\/\/examples.javacodegeeks.com\/#organization"},"image":{"@id":"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/#primaryimage"},"thumbnailUrl":"https:\/\/examples.javacodegeeks.com\/wp-content\/uploads\/2015\/07\/apache-solr-logo.jpg","keywords":["Apache Solr","example","OpenNLP","tutorial"],"articleSection":["Apache 
Solr"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/","url":"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/","name":"Apache Solr OpenNLP - Part 1 - Examples Java Code Geeks - 2026","isPartOf":{"@id":"https:\/\/examples.javacodegeeks.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/#primaryimage"},"image":{"@id":"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/#primaryimage"},"thumbnailUrl":"https:\/\/examples.javacodegeeks.com\/wp-content\/uploads\/2015\/07\/apache-solr-logo.jpg","datePublished":"2020-07-10T08:00:00+00:00","description":"This is an article about Apache Solr OpenNLP. 1. Introduction Natural Language Processing (NLP) is a field focusing on processing and analyzing human","breadcrumb":{"@id":"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/#primaryimage","url":"https:\/\/examples.javacodegeeks.com\/wp-content\/uploads\/2015\/07\/apache-solr-logo.jpg","contentUrl":"https:\/\/examples.javacodegeeks.com\/wp-content\/uploads\/2015\/07\/apache-solr-logo.jpg","width":150,"height":150},{"@type":"BreadcrumbList","@id":"https:\/\/examples.javacodegeeks.com\/apache-solr-opennlp-tutorial\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/examples.javacodegeeks.com\/"},{"@type":"ListItem","position":2,"name":"Java 
Development","item":"https:\/\/examples.javacodegeeks.com\/category\/java-development\/"},{"@type":"ListItem","position":3,"name":"Enterprise Java","item":"https:\/\/examples.javacodegeeks.com\/category\/java-development\/enterprise-java\/"},{"@type":"ListItem","position":4,"name":"Apache Solr","item":"https:\/\/examples.javacodegeeks.com\/category\/java-development\/enterprise-java\/apache-solr\/"},{"@type":"ListItem","position":5,"name":"Apache Solr OpenNLP Tutorial &#8211; Part 1"}]},{"@type":"WebSite","@id":"https:\/\/examples.javacodegeeks.com\/#website","url":"https:\/\/examples.javacodegeeks.com\/","name":"Java Code Geeks","description":"Java Examples and Code Snippets","publisher":{"@id":"https:\/\/examples.javacodegeeks.com\/#organization"},"alternateName":"JCG","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/examples.javacodegeeks.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/examples.javacodegeeks.com\/#organization","name":"Exelixis Media P.C.","url":"https:\/\/examples.javacodegeeks.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/examples.javacodegeeks.com\/#\/schema\/logo\/image\/","url":"https:\/\/examples.javacodegeeks.com\/wp-content\/uploads\/2022\/06\/exelixis-logo.png","contentUrl":"https:\/\/examples.javacodegeeks.com\/wp-content\/uploads\/2022\/06\/exelixis-logo.png","width":864,"height":246,"caption":"Exelixis Media P.C."},"image":{"@id":"https:\/\/examples.javacodegeeks.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/javacodegeeks","https:\/\/x.com\/javacodegeeks"]},{"@type":"Person","@id":"https:\/\/examples.javacodegeeks.com\/#\/schema\/person\/3f6ff013b8204dc7f5e6d2660fbc9f8f","name":"Kevin 
Yang","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/examples.javacodegeeks.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/2efb55f26af9d8752be93a78f2cdd9b2529df1f087c7b8901b68dbe11b7cf5ee?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/2efb55f26af9d8752be93a78f2cdd9b2529df1f087c7b8901b68dbe11b7cf5ee?s=96&d=mm&r=g","caption":"Kevin Yang"},"description":"A software design and development professional with seventeen years\u2019 experience in the IT industry, especially with Java EE and .NET, I have worked for software companies, scientific research institutes and websites.","sameAs":["https:\/\/www.linkedin.com\/in\/kevinyang2050\/"],"url":"https:\/\/examples.javacodegeeks.com\/author\/kevin-yang\/"}]}},"_links":{"self":[{"href":"https:\/\/examples.javacodegeeks.com\/wp-json\/wp\/v2\/posts\/92042","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/examples.javacodegeeks.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/examples.javacodegeeks.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/examples.javacodegeeks.com\/wp-json\/wp\/v2\/users\/223"}],"replies":[{"embeddable":true,"href":"https:\/\/examples.javacodegeeks.com\/wp-json\/wp\/v2\/comments?post=92042"}],"version-history":[{"count":0,"href":"https:\/\/examples.javacodegeeks.com\/wp-json\/wp\/v2\/posts\/92042\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/examples.javacodegeeks.com\/wp-json\/wp\/v2\/media\/25294"}],"wp:attachment":[{"href":"https:\/\/examples.javacodegeeks.com\/wp-json\/wp\/v2\/media?parent=92042"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/examples.javacodegeeks.com\/wp-json\/wp\/v2\/categories?post=92042"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/examples.javacodegeeks.com\/wp-json\/wp\/v2\/tags?post=92042"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}