{"id":44619,"date":"2015-09-27T22:11:26","date_gmt":"2015-09-27T19:11:26","guid":{"rendered":"http:\/\/www.javacodegeeks.com\/?p=44619"},"modified":"2023-12-07T11:00:43","modified_gmt":"2023-12-07T09:00:43","slug":"lucene-analysis-process-guide","status":"publish","type":"post","link":"https:\/\/www.javacodegeeks.com\/2015\/09\/lucene-analysis-process-guide.html","title":{"rendered":"Lucene Analysis Process Guide"},"content":{"rendered":"<p><em>This article is part of our Academy Course titled <a href=\"http:\/\/www.javacodegeeks.com\/2015\/09\/apache-lucene-fundamentals\/\">Apache Lucene Fundamentals<\/a>.<\/em><\/p>\n<p>In this course, you will get an introduction to Lucene. You will see why a library like this is important and then learn how searching works in Lucene. Moreover, you will learn how to integrate Lucene Search into your own applications in order to provide robust searching capabilities. Check it out <a href=\"http:\/\/www.javacodegeeks.com\/2015\/09\/apache-lucene-fundamentals\/\">here<\/a>!<\/p>\n<div class=\"toc\">\n<h4>Table Of Contents<\/h4>\n<dl>\n<dt><a href=\"#introduction\">1. Introduction<\/a><\/dt>\n<dt><a href=\"#using_analyzers\">2. Using Analyzers<\/a><\/dt>\n<dd>\n<dl>\n<dt><a href=\"#working\">2.1. Working process of Lucene Analyzer<\/a><\/dt>\n<\/dl>\n<\/dd>\n<dt><a href=\"#types\">3. Types of Analyzers<\/a><\/dt>\n<\/dl>\n<\/div>\n<h2><a name=\"introduction\"><\/a>1. Introduction<\/h2>\n<p>Analysis, in Lucene, is the process of converting field text into its most fundamental indexed representation, terms. In general, the tokens are referred to as words (we are discussing this topic in reference to the English language only) to the analyzers. However, for special analyzers the token can be with more than one words, which includes spaces also. These terms are used to determine what documents match a query during searching. 
For example, if you indexed this sentence in a field the terms might start with for and example, and so on, as separate terms in sequence. An analyzer is an encapsulation of the analysis process. An analyzer tokenizes text by performing any number of operations on it, which could include extracting words, discarding punctuation, removing accents from characters, lowercasing (also called normalizing), removing common words, reducing words to a root form (stemming), or changing words into the basic form (lemmatization). This process is also called tokenization, and the chunks of text pulled from a stream of text are called tokens. Tokens, combined with their associated field name, are terms.<\/p>\n<h2><a name=\"using_analyzers\"><\/a>2. Using Analyzers<\/h2>\n<p>Lucene&#8217;s primary goal is to facilitate information retrieval. The emphasis on retrieval is important. You want to throw gobs of text at Lucene and have them be richly searchable by the individual words within that text. In order for Lucene to know what \u201cwords\u201d are, it analyzes the text during indexing, extracting it to terms. These terms are the primitive building blocks for searching.<\/p>\n<p>Choosing the right analyzer is a crucial development decision with Lucene, and one size definitely doesn\u2019t fit all. Language is one factor, because each has its own unique features. Another factor to consider is the domain of the text being analyzed; different industries have different terminology, acronyms, and abbreviations that may deserve attention. No single analyzer will suffice for all situations. 
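The chain of operations an analyzer performs (tokenizing, discarding punctuation, lowercasing, removing common words, stemming) can be sketched in plain Java. This is a toy illustration only \u2013 the class name ToyAnalysis, the tiny stop list, and the naive \u201c-ing\u201d stemmer are assumptions for the sketch, not Lucene\u2019s actual implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy sketch of the operations an analyzer may perform:
// tokenize, lowercase, drop stop words, stem. Not Lucene code.
public class ToyAnalysis {
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "and", "or");

    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // tokenize on anything that is not a letter (this also discards punctuation)
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) continue;
            String term = token.toLowerCase(Locale.ROOT);   // case normalization
            if (STOP_WORDS.contains(term)) continue;        // stop-word removal
            if (term.endsWith("ing")) {                     // naive "stemming"
                term = term.substring(0, term.length() - 3);
            }
            terms.add(term);
        }
        return terms;
    }

    public static void main(String[] args) {
        // "Searching" -> "search", "the" is dropped
        System.out.println(analyze("Searching the indexed documents"));
    }
}
```

Real analyzers differ mainly in which of these stages they apply and in how each stage is implemented.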
It\u2019s possible that none of the built-in analysis options are adequate for our needs, and we\u2019ll have to invest in creating a custom analysis solution; fortunately, Lucene\u2019s building blocks make this quite easy.<\/p>\n<p><b>In general, Lucene analyzers are designed around the following steps:<\/b><\/p>\n<p>Actual text \u2013&gt; basic token preparation \u2013&gt; lower-case filtering \u2013&gt; stop-word filtering (removal of less useful words, which make up 40-50% of the words in typical content) \u2013&gt; filtering by custom logic \u2013&gt; final tokens for the Lucene index, which will be referenced during searching.<\/p>\n<p>Different analyzers use different tokenizers, and on that basis the output token streams \u2013 sequences of chunks of text \u2013 will differ.<\/p>\n<p>Stemmers are used to get the root of a word in question. For example, for the words running, ran, run etc. the root word will be run. This feature is used in analyzers to widen the search scope over the content for the search API. If the root form is stored in the index then, instead of only the exact word, we have more than one candidate in the index to match against, and the probability of a phrase match is higher. So this concept, stemming, is often used in analyzer design.<\/p>\n<p>Stop words are the frequent, less useful words in written language. For English these words are \u201ca\u201d, \u201cthe\u201d, \u201cI\u201d etc.<\/p>\n<p>In different analyzers, the token streams are cleaned of stop words to make the index more useful for search results.<\/p>\n<h3><a name=\"working\"><\/a>2.1. Working process of Lucene Analyzer<\/h3>\n<p>The analysis process has three parts. To help illustrate the process, we are going to use the following raw text as an example. (Processing an HTML or PDF document to obtain the title, main text, and other fields to be analyzed is called parsing and is beyond the scope of analysis. 
Let\u2019s assume a parser has already extracted this text out of a larger document.)<\/p>\n<pre class=\"brush:bash\">&lt;h1&gt;Building a &lt;em&gt;top-notch&lt;\/em&gt; search engine&lt;\/h1&gt;<\/pre>\n<p>First, character filters pre-process the raw text to be analyzed. For example, the HTML Strip character filter removes HTML. We\u2019re now left with this text:<\/p>\n<pre class=\"brush:bash\">Building a top-notch search engine<\/pre>\n<p>Next, a tokenizer breaks up the pre-processed text into tokens. Tokens are usually words, but different tokenizers handle corner cases, such as \u201ctop-notch\u201d, differently. Some tokenizers, such as the Standard tokenizer, consider dashes to be word boundaries, so \u201ctop-notch\u201d would be two tokens (\u201ctop\u201d and \u201cnotch\u201d). Other tokenizers, such as the Whitespace tokenizer, only consider whitespace to be a word boundary, so \u201ctop-notch\u201d would be a single token. There are also some unusual tokenizers, like the NGram tokenizer, that generate tokens that are partial words.<\/p>\n<p>Assuming a dash is considered a word boundary, we now have:<\/p>\n<pre class=\"brush:bash\">[Building] [a] [top] [notch] [search] [engine]<\/pre>\n<p>Finally, token filters perform additional processing on tokens, such as removing suffixes (called stemming) and converting characters to lower case. The final sequence of tokens might end up looking like this:<\/p>\n<pre class=\"brush:bash\">[build] [a] [top] [notch] [search] [engine]<\/pre>\n<p>The combination of a tokenizer and zero or more filters makes up an analyzer. The Standard analyzer, which consists of a Standard tokenizer and the Standard, Lowercase, and Stop token filters, is used by default.<\/p>\n<p>Analyzers can do more complex manipulations to achieve better results. 
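The three parts above \u2013 character filter, tokenizer, token filters \u2013 can be imitated in a few lines of plain Java. This is a sketch, not Lucene\u2019s code: the regex-based HTML stripping and the suffix-chopping \u201cstemmer\u201d are deliberate simplifications:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java simulation of the three analysis stages:
// character filter -> tokenizer -> token filters.
public class ThreeStageDemo {
    public static List<String> analyze(String raw) {
        // 1. character filter: strip HTML tags
        String text = raw.replaceAll("<[^>]*>", "");
        // 2. tokenizer: treat any non-letter (including '-') as a word boundary
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        // 3. token filters: lowercase, then chop an "-ing" suffix (toy stemmer)
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String term = t.toLowerCase(Locale.ROOT);
            if (term.endsWith("ing")) term = term.substring(0, term.length() - 3);
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("<h1>Building a <em>top-notch</em> search engine</h1>"));
        // [build, a, top, notch, search, engine]
    }
}
```

The output matches the final token sequence shown above for the raw HTML example.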
For example, an analyzer might use a token filter to spell check words or introduce synonyms so that searching for \u201csaerch\u201d or \u201cfind\u201d will both return the document that contained \u201cSearching\u201d. There are also different implementations of similar features to choose from.<\/p>\n<p>In addition to choosing between the included analyzers, we can create our own custom analyzer by chaining together an existing tokenizer and zero or more filters. The Standard analyzer doesn\u2019t do stemming, so you might want to create a custom analyzer that includes a stemming token filter.<\/p>\n<p>With so many possibilities, we can test out different combinations and see what works best for our situation.<\/p>\n<h2><a name=\"types\"><\/a>3. Types of Analyzers<\/h2>\n<p>In Lucene, some of the different analyzers are:<\/p>\n<p><b>a. Whitespace analyzer<\/b><\/p>\n<p>The Whitespace analyzer processes text into tokens based on whitespace. All characters between whitespaces are indexed. Here, stop words are not used for analyzing and the letter cases are not changed. The Whitespace analyzer is built using a Whitespace Tokenizer (a tokenizer of type whitespace that divides text at whitespace).<\/p>\n<p><b>b. SimpleAnalyzer<\/b><\/p>\n<p>SimpleAnalyzer uses a letter tokenizer and lower-case filtering to extract tokens from the content and put them in the Lucene index. The Simple analyzer is built using a Lower Case Tokenizer. A lower case tokenizer performs the function of Letter Tokenizer and Lower Case Token Filter together. It divides text at non-letters and converts them to lower case. While it is functionally equivalent to the combination of Letter Tokenizer and Lower Case Token Filter, there is a performance advantage to doing the two tasks at once, hence this (redundant) implementation.<\/p>\n<p><b>c. 
StopAnalyzer<\/b><\/p>\n<p>StopAnalyzer removes common English words that are not very useful for indexing. This is accomplished by providing the analyzer with a list of stop words.<\/p>\n<p>The Stop analyzer is built using a lower case tokenizer with a Stop token filter (a token filter of type stop that removes stop words from token streams).<\/p>\n<p>The following are settings that can be set for a stop analyzer type:<\/p>\n<ul>\n<li>stopwords -&gt; A list of stopwords to initialize the stop filter with. Defaults to the English stop words.<\/li>\n<li>stopwords_path -&gt; A path (either relative to the config location, or absolute) to a stopwords file configuration.<\/li>\n<\/ul>\n<p><b>d. StandardAnalyzer<\/b><\/p>\n<p>StandardAnalyzer is a general-purpose analyzer. It converts tokens to lowercase, uses the standard stop words to analyze the text, and is also governed by other rules. StandardAnalyzer is built using the Standard Tokenizer with the Standard Token Filter (a token filter of type standard that normalizes tokens extracted with the Standard Tokenizer), Lower Case Token Filter and Stop Token Filter.<\/p>\n<p><b>e. ClassicAnalyzer<\/b><\/p>\n<p>An Analyzer that filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words.<\/p>\n<p><b>f. UAX29URLEmailAnalyzer<\/b><\/p>\n<p>An Analyzer that filters UAX29URLEmailTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.<\/p>\n<p><b>g. Keyword Analyzer<\/b><\/p>\n<p>An analyzer of type keyword that &#8220;tokenizes&#8221; an entire stream as a single token. This is useful for data like zip codes, ids and so on. Note, when using mapping definitions, it might make more sense to simply mark the field as not_analyzed.<\/p>\n<p><b>h. 
Snowball Analyzer<\/b><\/p>\n<p>An analyzer of type snowball that uses the standard tokenizer, with standard filter, lowercase filter, stop filter, and snowball filter.<\/p>\n<p>The Snowball Analyzer is a stemming analyzer from Lucene that is originally based on the snowball project from snowball.tartarus.org.<\/p>\n<p>Sample usage:<\/p>\n<pre class=\"brush:bash\">{\n\n    \"index\" : {\n\n        \"analysis\" : {\n\n            \"analyzer\" : {\n\n                \"my_analyzer\" : {\n\n                    \"type\" : \"snowball\",\n\n                    \"language\" : \"English\"\n\n                }\n\n            }\n\n        }\n\n    }\n\n}\n<\/pre>\n<p><b>i. Pattern Analyzer<\/b><\/p>\n<p>An analyzer of type pattern that can flexibly separate text into terms via a regular expression. Accepts the following settings:<\/p>\n<p>Example of pattern analyzer:<\/p>\n<p><b>whitespace tokenizer:<\/b><\/p>\n<pre class=\"brush:bash\">    curl -XPUT 'localhost:9200\/test' -d '\n\n    {\n\n        \"settings\":{\n\n            \"analysis\": {\n\n                \"analyzer\": {\n\n                    \"whitespace\":{\n\n                        \"type\": \"pattern\",\n\n                        \"pattern\":\"\\\\\\\\\\\\\\\\s+\"\n\n                    }\n\n                }\n\n            }\n\n        }\n\n    }<\/pre>\n<pre class=\"brush:bash\">curl 'localhost:9200\/test\/_analyze?pretty=1&amp;analyzer=whitespace' -d 'foo,bar baz'\n\n    # \"foo,bar\", \"baz\"\n<\/pre>\n<p><strong>non-word character tokenizer:<\/strong><\/p>\n<pre class=\"brush:bash\">    curl -XPUT 'localhost:9200\/test' -d '\n\n    {\n\n        \"settings\":{\n\n            \"analysis\": {\n\n                \"analyzer\": {\n\n                    \"nonword\":{\n\n                        \"type\": \"pattern\",\n\n                        \"pattern\":\"[^\\\\\\\\\\\\\\\\w]+\"\n\n                    }\n\n                }\n\n            }\n\n        }\n\n    }<\/pre>\n<pre class=\"brush:bash\">curl 
'localhost:9200\/test\/_analyze?pretty=1&amp;amp;analyzer=nonword' -d 'foo,bar baz'&lt;\/strong&gt;\n\n# \"foo,bar baz\" becomes \"foo\", \"bar\", \"baz\"<\/pre>\n<pre class=\"brush:bash\">curl 'localhost:9200\/test\/_analyze?pretty=1&amp;amp;analyzer=nonword' -d 'type_1-type_4'&lt;\/strong&gt;\n\n# \"type_1\",\"type_4\"<\/pre>\n<p><b>camelcase tokenizer:<\/b><\/p>\n<pre class=\"brush:bash\">    curl -XPUT 'localhost:9200\/test?pretty=1' -d '\n\n    {\n\n        \"settings\":{\n\n            \"analysis\": {\n\n                \"analyzer\": {\n\n                    \"camel\":{\n\n                        \"type\": \"pattern\",\n\n                        \"pattern\":\"([^\\\\\\\\\\\\\\\\p{L}\\\\\\\\\\\\\\\\d]+)|(?&lt;=\\\\\\\\\\\\\\\\D)(?=\\\\\\\\\\\\\\\\d)|(?&lt;=\\\\\\\\\\\\\\\\d)(?=\\\\\\\\\\\\\\\\D)|(?&lt;=[\\\\\\\\\\\\\\\\p{L}&amp;&amp;[^\\\\\\\\\\\\\\\\p{Lu}]])(?=\\\\\\\\\\\\\\\\p{Lu})|(?&lt;=\\\\\\\\\\\\\\\\p{Lu})(?=\\\\\\\\\\\\\\\\p{Lu}[\\\\\\\\\\\\\\\\p{L}&amp;&amp;[^\\\\\\\\\\\\\\\\p{Lu}]])\"\n\n                    }\n\n                }\n\n            }\n\n        }\n\n    }<\/pre>\n<pre class=\"brush:bash\">    curl 'localhost:9200\/test\/_analyze?pretty=1&amp;analyzer=camel' -d '\n\n        MooseX::FTPClass2_beta\n\n    '\n\n    # \"moose\",\"x\",\"ftp\",\"class\",\"2\",\"beta\"\n<\/pre>\n<p>The regex above is easier to understand as:<\/p>\n<div class=\"wp-caption aligncenter\">\n<table>\n<tbody>\n<tr>\n<td>([^\\\\p{L}\\\\d]+)<\/td>\n<td># swallow non letters and numbers,<\/td>\n<\/tr>\n<tr>\n<td>| (?&lt;=\\\\D)(?=\\\\d)<\/td>\n<td># or non-number followed by number,<\/td>\n<\/tr>\n<tr>\n<td>| (?&lt;=\\\\d)(?=\\\\D)<\/td>\n<td># or number followed by non-number,<\/td>\n<\/tr>\n<tr>\n<td>| (?&lt;=[ \\\\p{L} &amp;&amp; [^\\\\p{Lu}]])<\/td>\n<td># or lower case<\/td>\n<\/tr>\n<tr>\n<td>(?=\\\\p{Lu})<\/td>\n<td># followed by upper case,<\/td>\n<\/tr>\n<tr>\n<td>| (?&lt;=\\\\p{Lu})<\/td>\n<td># or upper case<\/td>\n<\/tr>\n<tr>\n<td>(?=\\\\p{Lu}<\/td>\n<td># 
followed by upper case<\/td>\n<\/tr>\n<tr>\n<td>[\\\\p{L}&amp;&amp;[^\\\\p{Lu}]]<\/td>\n<td># then lower case<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p class=\"wp-caption-text\">Table 1<\/p>\n<\/div>\n<p><b>j. Custom Analyzer<\/b><\/p>\n<p>An analyzer of type custom that allows to combine a Tokenizer with zero or more Token Filters, and zero or more Char Filters. The custom analyzer accepts a logical\/registered name of the tokenizer to use, and a list of logical\/registered names of token filters.<\/p>\n<p>Here is example of a custom analyzer:<\/p>\n<pre class=\"brush:bash\">index :\n    analysis :\n        analyzer :\n            myAnalyzer2 :\n                type : custom\n                tokenizer : myTokenizer1\n                filter : [myTokenFilter1, myTokenFilter2]\n                char_filter : [my_html]\n\n        tokenizer :\n            myTokenizer1 :\n                type : standard\n                max_token_length : 900\n\n        filter :\n            myTokenFilter1 :\n                type : stop\n                stopwords : [stop1, stop2, stop3, stop4]\n\n            myTokenFilter2 :\n                type : length\n                min : 0\n                max : 2000\n\n        char_filter :\n              my_html :\n                type : html_strip\n                escaped_tags : [xxx, yyy]\n                read_ahead : 1024\n<\/pre>\n<p>The language parameter can have the same values as the snowball filter and defaults to English. Note that not all the language analyzers have a default set of stopwords provided.<\/p>\n<p>The stopwords parameter can be used to provide stopwords for the languages that have no defaults, or to simply replace the default set with your custom list. Check Stop Analyzer for more details. 
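The camelcase pattern broken down in Table 1 can be checked with plain Java, since java.util.regex supports the same character-class intersection syntax ([\p{L}&&[^\p{Lu}]]) that the pattern analyzer uses; the class and method names below are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Reproduces the camelcase tokenization of the pattern analyzer
// with Java's own regex engine (illustrative sketch).
public class CamelPatternDemo {
    static final String CAMEL =
        "([^\\p{L}\\d]+)"                               // swallow non letters and numbers,
      + "|(?<=\\D)(?=\\d)"                              // or non-number followed by number,
      + "|(?<=\\d)(?=\\D)"                              // or number followed by non-number,
      + "|(?<=[\\p{L}&&[^\\p{Lu}]])(?=\\p{Lu})"         // or lower case followed by upper case,
      + "|(?<=\\p{Lu})(?=\\p{Lu}[\\p{L}&&[^\\p{Lu}]])"; // or upper case followed by upper then lower

    public static List<String> tokenize(String input) {
        List<String> terms = new ArrayList<>();
        for (String t : input.split(CAMEL)) {
            if (!t.isEmpty()) terms.add(t.toLowerCase(Locale.ROOT)); // lowercase filter
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("MooseX::FTPClass2_beta"));
        // [moose, x, ftp, class, 2, beta]
    }
}
```

This reproduces the tokens shown in the camelcase curl example above.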
A default set of stopwords for many of these languages is available from for instance here and here.<\/p>\n<p>A sample configuration (in YAML format) specifying Swedish with stopwords:<\/p>\n<pre class=\"brush:bash;wrap-lines:false\">index :\n    analysis :\n        analyzer :\n           my_analyzer:\n                type: snowball\n                language: Swedish\n                stopwords: \"och,det,att,i,en,jag,hon,som,han,p\u00e5,den,med,var,sig,f\u00f6r,s\u00e5,till,\u00e4r,men,ett,om,hade,de,av,icke,mig,du,henne,d\u00e5,sin,nu,har,inte,hans,honom,skulle,hennes,d\u00e4r,min,man,ej,vid,kunde,n\u00e5got,fr\u00e5n,ut,n\u00e4r,efter,upp,vi,dem,vara,vad,\u00f6ver,\u00e4n,dig,kan,sina,h\u00e4r,ha,mot,alla,under,n\u00e5gon,allt,mycket,sedan,ju,denna,sj\u00e4lv,detta,\u00e5t,utan,varit,hur,ingen,mitt,ni,bli,blev,oss,din,dessa,n\u00e5gra,deras,blir,mina,samma,vilken,er,s\u00e5dan,v\u00e5r,blivit,dess,inom,mellan,s\u00e5dant,varf\u00f6r,varje,vilka,ditt,vem,vilket,sitta,s\u00e5dana,vart,dina,vars,v\u00e5rt,v\u00e5ra,ert,era,vilkas\"\n<\/pre>\n<p>Here is an example of a custom string analyzer, which is built by extending Lucene&#8217;s abstract Analyzer class. The following listing shows the SampleStringAnalyzer, which implements the <code>tokenStream(String,Reader)<\/code> method. The SampleStringAnalyzer defines a set of stop words that can be discarded in the process of indexing, using a StopFilter provided by Lucene. The tokenStream method checks the field that is being indexed. If the field is a comment, it first tokenizes and lower-cases input using the LowerCaseTokenizer, eliminates stop words of English (a limited set of English stop words) using the StopFilter, and uses the PorterStemFilter to remove common morphological and inflectional endings. 
If the content to be indexed is not a comment, the analyzer tokenizes and lower-cases the input using LowerCaseTokenizer and eliminates the Java keywords using the StopFilter.<\/p>\n<pre class=\"brush:java\">import java.io.Reader;\nimport java.util.Set;\n\nimport org.apache.lucene.analysis.Analyzer;\nimport org.apache.lucene.analysis.LowerCaseTokenizer;\nimport org.apache.lucene.analysis.PorterStemFilter;\nimport org.apache.lucene.analysis.StopFilter;\nimport org.apache.lucene.analysis.TokenStream;\n\npublic class SampleStringAnalyzer extends Analyzer {\n\tprivate Set specialStopSet;\n\tprivate Set englishStopSet;\n\tprivate static final String[] SPECIALWORD_STOP_WORDS = {\n\t\t\"abstract\", \"implements\", \"extends\", \"null\", \"new\",\n\t\t\"switch\", \"case\", \"default\", \"synchronized\",\n\t\t\"do\", \"if\", \"else\", \"break\", \"continue\", \"this\",\n\t\t\"assert\", \"for\", \"transient\",\n\t\t\"final\", \"static\", \"catch\", \"try\",\n\t\t\"throws\", \"throw\", \"class\", \"finally\", \"return\",\n\t\t\"const\", \"native\", \"super\", \"while\", \"import\",\n\t\t\"package\", \"true\", \"false\" };\n\tprivate static final String[] ENGLISH_STOP_WORDS = {\n\t\t\"a\", \"an\", \"and\", \"are\", \"as\", \"at\", \"be\", \"but\",\n\t\t\"by\", \"for\", \"if\", \"in\", \"into\", \"is\", \"it\",\n\t\t\"no\", \"not\", \"of\", \"on\", \"or\", \"s\", \"such\",\n\t\t\"that\", \"the\", \"their\", \"then\", \"there\", \"these\",\n\t\t\"they\", \"this\", \"to\", \"was\", \"will\", \"with\" };\n\n\tpublic SampleStringAnalyzer() {\n\t\tsuper();\n\t\tspecialStopSet = StopFilter.makeStopSet(SPECIALWORD_STOP_WORDS);\n\t\tenglishStopSet = StopFilter.makeStopSet(ENGLISH_STOP_WORDS);\n\t}\n\n\tpublic TokenStream tokenStream(String fieldName, Reader reader) {\n\t\tif (fieldName.equals(\"comment\"))\n\t\t\treturn new PorterStemFilter(\n\t\t\t\tnew StopFilter(\n\t\t\t\t\tnew LowerCaseTokenizer(reader), englishStopSet));\n\t\telse\n\t\t\treturn new StopFilter(\n\t\t\t\tnew LowerCaseTokenizer(reader), specialStopSet);\n\t}\n}\n<\/pre>\n<p><b>What\u2019s inside an Analyzer?<\/b><\/p>\n<p>Analyzers need to return a TokenStream.<\/p>\n<p><figure id=\"attachment_5248\" 
aria-describedby=\"caption-attachment-5248\" style=\"width: 579px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2015\/09\/imagechp5.jpg\"><img decoding=\"async\" class=\" wp-image-5248\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2015\/09\/imagechp5.jpg\" alt=\"Figure 1\" width=\"579\" height=\"365\" \/><\/a><figcaption id=\"caption-attachment-5248\" class=\"wp-caption-text\">Figure 1<\/figcaption><\/figure><\/p>\n<p><b>Analyzing Text into Tokens:<\/b><\/p>\n<p>Search and indexing over text fields require processing text data into tokens. The package <code>oal.analysis<\/code> contains the base classes for tokenizing and indexing text. Processing may consist of a sequence of transformations , e.g., whitespace tokenization, case normalization, stop-listing, and stemming.<\/p>\n<p>The abstract class <code>oal.analysis.TokenStream<\/code> breaks the incoming text into a sequence of tokens that are retrieved using an iterator-like pattern. TokenStream has two subclasses: <code>oal.analysis.Tokenizer<\/code> and <code>oal.analysis.TokenFilter<\/code>. A Tokenizer takes a <code>java.io.Reader<\/code> as input whereas a TokenFilter takes another oal.analysis.TokenStream as input. This allows us to chain together tokenizers such that the initial tokenizer gets its input from a reader and the others operate on tokens from the preceding TokenStream in the chain.<\/p>\n<p>An <code>oal.analysis.Analyzer<\/code> supplies the indexing and searching processes with TokenStreams on a per-field basis. It maps field names to tokenizers and may also supply a default analyzer for unknown field names. Lucene includes many analysis modules that provide concrete implementations of different kinds of analyzers. As of Lucene 4, these modules are bundled into separate jar files. 
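The per-field mapping described above \u2013 an analyzer supplies token streams on a per-field basis and falls back to a default for unknown field names \u2013 can be sketched conceptually in plain Java. The field names and tokenizing functions below are hypothetical stand-ins for Lucene\u2019s real Tokenizer and TokenFilter chains:

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Conceptual sketch of per-field analysis: map field names to
// tokenizing pipelines, with a default for unknown fields.
public class PerFieldSketch {
    static final Function<String, List<String>> WHITESPACE =
        text -> List.of(text.split("\\s+"));                         // keep case/punctuation
    static final Function<String, List<String>> LOWERCASE_LETTERS =
        text -> List.of(text.toLowerCase(Locale.ROOT).split("[^a-z]+"));

    // hypothetical field name mapped to a whitespace-only pipeline
    static final Map<String, Function<String, List<String>>> PER_FIELD =
        Map.of("partNumber", WHITESPACE);

    public static List<String> tokenStream(String field, String text) {
        return PER_FIELD.getOrDefault(field, LOWERCASE_LETTERS).apply(text);
    }

    public static void main(String[] args) {
        System.out.println(tokenStream("partNumber", "AB-12 CD-34")); // [AB-12, CD-34]
        System.out.println(tokenStream("body", "AB-12 CD-34"));       // [ab, cd]
    }
}
```

Lucene packages the same idea behind Analyzer subclasses that return different TokenStream chains per field.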
There are several dozen language-specific analysis packages, from <code>oal.analysis.ar<\/code> for Arabic to <code>oal.analysis.tr<\/code> for Turkish. The package <code>oal.analysis.core<\/code> provides several general-purpose analyzers, tokenizers, and tokenizer factory classes.<\/p>\n<p>The abstract class <code>oal.analysis.Analyzer<\/code> contains methods used to extract terms from input text. Concrete subclasses of Analyzer must override the method <code>createComponents<\/code>, which returns an object of the nested class <code>TokenStreamComponents<\/code> that defines the tokenization process and provides access to the initial and final components of the processing pipeline. The initial component is a Tokenizer that handles the input source. The final component is an instance of TokenFilter and it is the TokenStream returned by the method <code>Analyzer.tokenStream(String,Reader)<\/code>. Here is an example of a custom Analyzer that tokenizes its inputs into individual words with all letters lowercase.<\/p>\n<pre class=\"brush:java\">Analyzer analyzer = new Analyzer() {\n    @Override\n    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {\n        Tokenizer source =\n            new StandardTokenizer(VERSION,reader);\n        TokenStream filter =\n            new LowerCaseFilter(VERSION,source);\n        return new TokenStreamComponents(source, filter);\n    }\n};\n<\/pre>\n<p>The constructors for the <code>oal.analysis.standard.StandardTokenizer<\/code> and <code>oal.analysis.core.LowerCaseFilter<\/code> objects require a Version argument. Further note that package <code>oal.analysis.standard<\/code> is distributed in the jarfile lucene-analyzers-common-4.x.y.jar, where x and y are the minor version and release number.<\/p>\n<p>Which core analyzer should we use?<\/p>\n<p>We\u2019ve now seen the substantial differences in how each of the four core Lucene analyzers works. 
Choosing the right one for our application may surprise us: most applications don\u2019t use any of the built-in analyzers, and instead opt to create their own analyzer chain. For those applications that do use a core analyzer, StandardAnalyzer is likely the most common choice. The remaining core analyzers are usually far too simplistic for most applications, except perhaps for specific use cases (for example, a field that contains a list of part numbers might use WhitespaceAnalyzer). But these analyzers are great for test cases and are indeed used heavily by Lucene\u2019s unit tests.<\/p>\n<p>Typically an application has specific needs, such as customizing the stop-words list, performing special tokenization for application-specific tokens like part numbers or for synonym expansion, preserving case for certain tokens, or choosing a specific stemming algorithm.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This article is part of our Academy Course titled Apache Lucene Fundamentals. In this course, you will get an introduction to Lucene. You will see why a library like this is important and then learn how searching works in Lucene. Moreover, you will learn how to integrate Lucene Search into your own applications in order &hellip;<\/p>\n","protected":false},"author":448,"featured_media":71,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8],"tags":[26],"class_list":["post-44619","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-enterprise-java","tag-apache-lucene"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Lucene Analysis Process Guide - Java Code Geeks<\/title>\n<meta name=\"description\" content=\"This article is part of our Academy Course titled Apache Lucene Fundamentals. In this course, you will get an introduction to Lucene. 
You will see why a\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.javacodegeeks.com\/2015\/09\/lucene-analysis-process-guide.html\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Lucene Analysis Process Guide - Java Code Geeks\" \/>\n<meta property=\"og:description\" content=\"This article is part of our Academy Course titled Apache Lucene Fundamentals. In this course, you will get an introduction to Lucene. You will see why a\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.javacodegeeks.com\/2015\/09\/lucene-analysis-process-guide.html\" \/>\n<meta property=\"og:site_name\" content=\"Java Code Geeks\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/javacodegeeks\" \/>\n<meta property=\"article:author\" content=\"http:\/\/www.facebook.com\/phlocblogger\" \/>\n<meta property=\"article:published_time\" content=\"2015-09-27T19:11:26+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-12-07T09:00:43+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/apache-lucene-logo.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"150\" \/>\n\t<meta property=\"og:image:height\" content=\"150\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Piyas De\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@https:\/\/twitter.com\/phloxblog\" \/>\n<meta name=\"twitter:site\" content=\"@javacodegeeks\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Piyas De\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2015\\\/09\\\/lucene-analysis-process-guide.html#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2015\\\/09\\\/lucene-analysis-process-guide.html\"},\"author\":{\"name\":\"Piyas De\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/person\\\/20f3c9ff4b90d43da03decd2ad2b4f37\"},\"headline\":\"Lucene Analysis Process Guide\",\"datePublished\":\"2015-09-27T19:11:26+00:00\",\"dateModified\":\"2023-12-07T09:00:43+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2015\\\/09\\\/lucene-analysis-process-guide.html\"},\"wordCount\":2350,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2015\\\/09\\\/lucene-analysis-process-guide.html#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2012\\\/10\\\/apache-lucene-logo.jpg\",\"keywords\":[\"Apache Lucene\"],\"articleSection\":[\"Enterprise Java\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.javacodegeeks.com\\\/2015\\\/09\\\/lucene-analysis-process-guide.html#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2015\\\/09\\\/lucene-analysis-process-guide.html\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2015\\\/09\\\/lucene-analysis-process-guide.html\",\"name\":\"Lucene Analysis Process Guide - Java Code 
Geeks\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2015\\\/09\\\/lucene-analysis-process-guide.html#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2015\\\/09\\\/lucene-analysis-process-guide.html#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2012\\\/10\\\/apache-lucene-logo.jpg\",\"datePublished\":\"2015-09-27T19:11:26+00:00\",\"dateModified\":\"2023-12-07T09:00:43+00:00\",\"description\":\"This article is part of our Academy Course titled Apache Lucene Fundamentals. In this course, you will get an introduction to Lucene. You will see why a\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2015\\\/09\\\/lucene-analysis-process-guide.html#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.javacodegeeks.com\\\/2015\\\/09\\\/lucene-analysis-process-guide.html\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2015\\\/09\\\/lucene-analysis-process-guide.html#primaryimage\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2012\\\/10\\\/apache-lucene-logo.jpg\",\"contentUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2012\\\/10\\\/apache-lucene-logo.jpg\",\"width\":150,\"height\":150},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2015\\\/09\\\/lucene-analysis-process-guide.html#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.javacodegeeks.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Java\",\"item\":\"https:\\\/\\\/www.javacodegeeks.com\\\/category\\\/java\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Enterprise 
Java\",\"item\":\"https:\\\/\\\/www.javacodegeeks.com\\\/category\\\/java\\\/enterprise-java\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"Lucene Analysis Process Guide\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#website\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/\",\"name\":\"Java Code Geeks\",\"description\":\"Java Developers Resource Center\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#organization\"},\"alternateName\":\"JCG\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.javacodegeeks.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#organization\",\"name\":\"Exelixis Media P.C.\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/exelixis-logo.png\",\"contentUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/exelixis-logo.png\",\"width\":864,\"height\":246,\"caption\":\"Exelixis Media P.C.\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/javacodegeeks\",\"https:\\\/\\\/x.com\\\/javacodegeeks\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/person\\\/20f3c9ff4b90d43da03decd2ad2b4f37\",\"name\":\"Piyas 
De\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/eadd6728b7b5be23f0d6585da1a953926e49c6f2369703d6cb4f1147d4dd2203?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/eadd6728b7b5be23f0d6585da1a953926e49c6f2369703d6cb4f1147d4dd2203?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/eadd6728b7b5be23f0d6585da1a953926e49c6f2369703d6cb4f1147d4dd2203?s=96&d=mm&r=g\",\"caption\":\"Piyas De\"},\"description\":\"Piyas is Sun Microsystems certified Enterprise Architect with 10+ years of professional IT experience in various areas such as Architecture Definition, Define Enterprise Application, Client-server\\\/e-business solutions.Currently he is engaged in providing solutions for digital asset management in media companies.He is also founder and main author of \\\"Technical Blogs(Blog about small technical Know hows)\\\" Hyperlink - http:\\\/\\\/www.phloxblog.in\",\"sameAs\":[\"http:\\\/\\\/www.phloxblog.in\",\"http:\\\/\\\/www.facebook.com\\\/phlocblogger\",\"http:\\\/\\\/in.linkedin.com\\\/in\\\/piyasde\",\"https:\\\/\\\/x.com\\\/https:\\\/\\\/twitter.com\\\/phloxblog\"],\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/author\\\/piyas-de\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Lucene Analysis Process Guide - Java Code Geeks","description":"This article is part of our Academy Course titled Apache Lucene Fundamentals. In this course, you will get an introduction to Lucene. 
You will see why a","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.javacodegeeks.com\/2015\/09\/lucene-analysis-process-guide.html","og_locale":"en_US","og_type":"article","og_title":"Lucene Analysis Process Guide - Java Code Geeks","og_description":"This article is part of our Academy Course titled Apache Lucene Fundamentals. In this course, you will get an introduction to Lucene. You will see why a","og_url":"https:\/\/www.javacodegeeks.com\/2015\/09\/lucene-analysis-process-guide.html","og_site_name":"Java Code Geeks","article_publisher":"https:\/\/www.facebook.com\/javacodegeeks","article_author":"http:\/\/www.facebook.com\/phlocblogger","article_published_time":"2015-09-27T19:11:26+00:00","article_modified_time":"2023-12-07T09:00:43+00:00","og_image":[{"width":150,"height":150,"url":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/apache-lucene-logo.jpg","type":"image\/jpeg"}],"author":"Piyas De","twitter_card":"summary_large_image","twitter_creator":"@https:\/\/twitter.com\/phloxblog","twitter_site":"@javacodegeeks","twitter_misc":{"Written by":"Piyas De","Est. 
reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.javacodegeeks.com\/2015\/09\/lucene-analysis-process-guide.html#article","isPartOf":{"@id":"https:\/\/www.javacodegeeks.com\/2015\/09\/lucene-analysis-process-guide.html"},"author":{"name":"Piyas De","@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/person\/20f3c9ff4b90d43da03decd2ad2b4f37"},"headline":"Lucene Analysis Process Guide","datePublished":"2015-09-27T19:11:26+00:00","dateModified":"2023-12-07T09:00:43+00:00","mainEntityOfPage":{"@id":"https:\/\/www.javacodegeeks.com\/2015\/09\/lucene-analysis-process-guide.html"},"wordCount":2350,"commentCount":0,"publisher":{"@id":"https:\/\/www.javacodegeeks.com\/#organization"},"image":{"@id":"https:\/\/www.javacodegeeks.com\/2015\/09\/lucene-analysis-process-guide.html#primaryimage"},"thumbnailUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/apache-lucene-logo.jpg","keywords":["Apache Lucene"],"articleSection":["Enterprise Java"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.javacodegeeks.com\/2015\/09\/lucene-analysis-process-guide.html#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.javacodegeeks.com\/2015\/09\/lucene-analysis-process-guide.html","url":"https:\/\/www.javacodegeeks.com\/2015\/09\/lucene-analysis-process-guide.html","name":"Lucene Analysis Process Guide - Java Code Geeks","isPartOf":{"@id":"https:\/\/www.javacodegeeks.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.javacodegeeks.com\/2015\/09\/lucene-analysis-process-guide.html#primaryimage"},"image":{"@id":"https:\/\/www.javacodegeeks.com\/2015\/09\/lucene-analysis-process-guide.html#primaryimage"},"thumbnailUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/apache-lucene-logo.jpg","datePublished":"2015-09-27T19:11:26+00:00","dateModified":"2023-12-07T09:00:43+00:00","description":"This article is part of our 
Academy Course titled Apache Lucene Fundamentals. In this course, you will get an introduction to Lucene. You will see why a","breadcrumb":{"@id":"https:\/\/www.javacodegeeks.com\/2015\/09\/lucene-analysis-process-guide.html#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.javacodegeeks.com\/2015\/09\/lucene-analysis-process-guide.html"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.javacodegeeks.com\/2015\/09\/lucene-analysis-process-guide.html#primaryimage","url":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/apache-lucene-logo.jpg","contentUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/apache-lucene-logo.jpg","width":150,"height":150},{"@type":"BreadcrumbList","@id":"https:\/\/www.javacodegeeks.com\/2015\/09\/lucene-analysis-process-guide.html#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.javacodegeeks.com\/"},{"@type":"ListItem","position":2,"name":"Java","item":"https:\/\/www.javacodegeeks.com\/category\/java"},{"@type":"ListItem","position":3,"name":"Enterprise Java","item":"https:\/\/www.javacodegeeks.com\/category\/java\/enterprise-java"},{"@type":"ListItem","position":4,"name":"Lucene Analysis Process Guide"}]},{"@type":"WebSite","@id":"https:\/\/www.javacodegeeks.com\/#website","url":"https:\/\/www.javacodegeeks.com\/","name":"Java Code Geeks","description":"Java Developers Resource Center","publisher":{"@id":"https:\/\/www.javacodegeeks.com\/#organization"},"alternateName":"JCG","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.javacodegeeks.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.javacodegeeks.com\/#organization","name":"Exelixis Media 
P.C.","url":"https:\/\/www.javacodegeeks.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2022\/06\/exelixis-logo.png","contentUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2022\/06\/exelixis-logo.png","width":864,"height":246,"caption":"Exelixis Media P.C."},"image":{"@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/javacodegeeks","https:\/\/x.com\/javacodegeeks"]},{"@type":"Person","@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/person\/20f3c9ff4b90d43da03decd2ad2b4f37","name":"Piyas De","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/eadd6728b7b5be23f0d6585da1a953926e49c6f2369703d6cb4f1147d4dd2203?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/eadd6728b7b5be23f0d6585da1a953926e49c6f2369703d6cb4f1147d4dd2203?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/eadd6728b7b5be23f0d6585da1a953926e49c6f2369703d6cb4f1147d4dd2203?s=96&d=mm&r=g","caption":"Piyas De"},"description":"Piyas is Sun Microsystems certified Enterprise Architect with 10+ years of professional IT experience in various areas such as Architecture Definition, Define Enterprise Application, Client-server\/e-business solutions.Currently he is engaged in providing solutions for digital asset management in media companies.He is also founder and main author of \"Technical Blogs(Blog about small technical Know hows)\" Hyperlink - 
http:\/\/www.phloxblog.in","sameAs":["http:\/\/www.phloxblog.in","http:\/\/www.facebook.com\/phlocblogger","http:\/\/in.linkedin.com\/in\/piyasde","https:\/\/x.com\/https:\/\/twitter.com\/phloxblog"],"url":"https:\/\/www.javacodegeeks.com\/author\/piyas-de"}]}},"_links":{"self":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/posts\/44619","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/users\/448"}],"replies":[{"embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/comments?post=44619"}],"version-history":[{"count":0,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/posts\/44619\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/media\/71"}],"wp:attachment":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/media?parent=44619"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/categories?post=44619"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/tags?post=44619"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}