{"id":1178,"date":"2012-06-01T16:00:00","date_gmt":"2012-06-01T16:00:00","guid":{"rendered":"http:\/\/www.javacodegeeks.com\/2012\/10\/solr-creating-a-spellchecker.html"},"modified":"2012-10-22T04:57:39","modified_gmt":"2012-10-22T04:57:39","slug":"solr-creating-spellchecker","status":"publish","type":"post","link":"https:\/\/www.javacodegeeks.com\/2012\/06\/solr-creating-spellchecker.html","title":{"rendered":"Solr: Creating a spellchecker"},"content":{"rendered":"<div dir=\"ltr\" style=\"text-align: left\">\n<div style=\"text-align: justify\">In a previous <a href=\"http:\/\/emmaespina.wordpress.com\/2011\/01\/18\/solr-spellchecker-internals-now-with-tests\/\">post<\/a> I talked about how the Solr Spellchecker works and showed you some test results of its performance. Now we are going to look at another approach to spellchecking. <\/div>\n<div style=\"text-align: justify\">This method, like many others, uses a two-step procedure: a fairly fast \u201ccandidate word\u201d selection, followed by a scoring of those candidates. We are going to select methods different from the ones Solr uses and test their performance. Our main objective will be the effectiveness of the correction and, secondarily, the speed of the results. We can tolerate slightly slower performance given that we are gaining correctness. <\/div>\n<div style=\"text-align: justify\">Our strategy will be to use a special Lucene index and query it with fuzzy queries to get a candidate list. Then we are going to rank the candidates with a Python script (which can easily be turned into a Solr spellchecker subclass if we get better results). <\/div>\n<p><strong>Candidate selection<\/strong>  <\/p>\n<div style=\"text-align: justify\">Fuzzy queries have historically been considered slow in relation to other query types but, as they were optimized in version 1.4, they are a good choice for the first part of our algorithm. 
So, the idea is very simple: we are going to construct a Lucene index where every document is a dictionary word. When we have to correct a misspelled word we run a simple fuzzy query on that word and get a list of results. The results will be words similar to the one we provided (i.e. within a small edit distance). I found that with approximately 70 candidates we can get excellent results. <\/div>\n<div style=\"text-align: justify\">With fuzzy queries we cover all the typos because, as I said in the previous post, most typos are within edit distance 1 of the correct word. But although this is the most common error people make while typing, there are other kinds of errors. <\/div>\n<div style=\"text-align: justify\">\n<\/div>\n<p>We can find three types of misspellings <a href=\"#kukich\">[Kukich]<\/a>: <\/p>\n<ol>\n<li>Typographic errors<\/li>\n<li>Cognitive errors<\/li>\n<li>Phonetic errors<\/li>\n<\/ol>\n<div style=\"text-align: justify\">Typographic errors are the typos: the person knows the correct spelling but makes a motor-coordination slip while typing. Cognitive errors are those caused by a lack of knowledge on the writer\u2019s part. Finally, phonetic errors are a special case of cognitive errors: words that sound correct but are orthographically incorrect. We already covered typographic errors with the fuzzy query, but we can also do something for the phonetic errors. Solr has a phonetic filter in its analysis package that, among other encoders, offers the double metaphone algorithm. In the same way that we perform a fuzzy query to find similar words, we can index the metaphone equivalent of the word and perform a fuzzy query on it. We must obtain the metaphone equivalent of the word manually (because the Lucene query parser doesn\u2019t analyze fuzzy queries) and construct a fuzzy query with that word. 
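Just to make the query construction concrete, here is a minimal sketch in Python. build_candidate_query is a hypothetical helper (not part of the script shown later in this post); it receives the (primary, secondary) code pair that a double-metaphone implementation would return, so the composition of the fuzzy clauses is easy to follow on its own:

```python
def build_candidate_query(word, metaphone_codes):
    """Assemble a Lucene fuzzy query combining the raw word and its
    double-metaphone codes.

    metaphone_codes is the (primary, secondary) pair a double-metaphone
    implementation returns; the secondary code is often None.
    """
    clauses = ["original_word:%s~" % word]           # fuzzy match on the word itself
    primary, secondary = metaphone_codes
    clauses.append("analyzed_word:%s~" % primary)    # fuzzy match on how it sounds
    if secondary is not None:                        # some words have a second code
        clauses.append("analyzed_word:%s~" % secondary)
    return " OR ".join(clauses)

print(build_candidate_query("houze", ("HS", None)))
# original_word:houze~ OR analyzed_word:HS~
```

The example metaphone codes passed in are illustrative; in practice they would come from the phonetic library, matching the analysis chain of the index.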
<\/div>\n<p>In a few words, for the candidate selection we construct an index with the following Solr schema:<\/p>\n<pre class=\"brush:xml\">&lt;fieldType name=\"spellcheck_text\" class=\"solr.TextField\" positionIncrementGap=\"100\" autoGeneratePhraseQueries=\"true\"&gt;\r\n      &lt;analyzer type=\"index\"&gt;\r\n        &lt;tokenizer class=\"solr.KeywordTokenizerFactory\"\/&gt;\r\n        &lt;filter class=\"solr.LowerCaseFilterFactory\"\/&gt;\r\n        &lt;filter class=\"solr.PhoneticFilterFactory\" encoder=\"DoubleMetaphone\" maxCodeLength=\"20\" inject=\"false\"\/&gt;\r\n     &lt;\/analyzer&gt;\r\n    &lt;\/fieldType&gt;\r\n\r\n   &lt;field name=\"original_word\" type=\"string\" indexed=\"true\" stored=\"true\" multiValued=\"false\"\/&gt;\r\n   &lt;field name=\"analyzed_word\" type=\"spellcheck_text\" indexed=\"true\" stored=\"true\" multiValued=\"false\"\/&gt;\r\n   &lt;field name=\"freq\" type=\"tfloat\" stored=\"true\" multiValued=\"false\"\/&gt;<\/pre>\n<div style=\"text-align: justify\">As you can see, the analyzed_word field contains the \u201csounds-like\u201d form of the word. The freq field will be used in the next phase of the algorithm; it is simply the frequency of the term in the language. How can we estimate the frequency of a word in a language? By counting the frequency of the word in a big text corpus. In this case the source of the terms is Wikipedia, and we are using the TermsComponent of Solr to count how many times each term appears in it. <\/div>\n<div style=\"text-align: justify\">But Wikipedia is written by ordinary people who make mistakes! How can we trust it as a \u201ccorrect dictionary\u201d? We make use of the \u201ccollective knowledge\u201d of the people who write Wikipedia. This dictionary of terms extracted from Wikipedia is huge: over 1,800,000 terms, and most of them aren\u2019t even words. It is likely, though, that words with a high frequency are correctly spelled in Wikipedia. 
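The dictionary-building step can be sketched in a few lines of Python. This is a hypothetical helper, not the actual SolrJ code used: it takes (term, frequency) pairs as they would come back from the TermsComponent and turns them into documents matching the schema above, applying the pruning discussed below (dropping frequency-1 terms and terms that begin with a number). Storing the log of the frequency here is an assumption, based on how the scoring script later reads the freq field:

```python
import math

def build_dictionary_docs(term_freqs):
    """Turn (term, frequency) pairs from Solr's TermsComponent into
    dictionary documents matching the spellcheck schema."""
    docs = []
    for term, freq in term_freqs:
        if freq <= 1:            # frequency-1 terms are very likely misspellings
            continue
        if term[:1].isdigit():   # drop terms that begin with a number
            continue
        docs.append({
            "original_word": term,
            "analyzed_word": term,      # analyzed into its metaphone code at index time
            "freq": math.log(freq),     # log-frequency, consumed by the ranking step
        })
    return docs

docs = build_dictionary_docs([("house", 54321), ("zzxqj", 1), ("2nd", 900)])
# keeps only the "house" document
```

Each returned dict maps one-to-one onto the fields of the schema, so posting them to Solr (with SolrJ, pysolr, or plain HTTP) is straightforward.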
This approach of building a dictionary from a big corpus of words and considering the most frequent ones correct isn\u2019t new. In <a href=\"#cucerzan\">[Cucerzan]<\/a> they use the same concept but build the dictionary from query logs. It appears that Google\u2019s \u201cDid you mean\u201d uses a similar <a href=\"http:\/\/www.youtube.com\/watch?v=syKY8CrHkck#t=22m03s\">concept<\/a>. <\/div>\n<div style=\"text-align: justify\">We can add small optimizations here. I found that we can remove some words and still get good results. For example, I removed words with frequency 1 and words that begin with numbers. We could continue removing words based on other criteria, but we\u2019ll leave it at that. <\/div>\n<div style=\"text-align: justify\">So the procedure for building the index is simple: we extract all the terms along with their frequencies from the Wikipedia index via the TermsComponent of Solr, and then create an index in Solr using SolrJ.<\/div>\n<p><strong>Candidate ranking<\/strong>  <\/p>\n<div style=\"text-align: justify\">Now for the ranking of the candidates. For the second phase of the algorithm we are going to make use of information theory, in particular the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Noisy_channel_model\">noisy channel model<\/a>. The noisy channel model, applied to this case, assumes that the human knows the correct spelling of a word but some noise in the channel introduces an error, and as a result we get another, misspelled word. 
We intuitively know that it is very unlikely to get \u2018sarasa\u2019 when trying to type \u2018house\u2019, so the noisy channel model adds some formality to determining how probable a particular error is.<\/div>\n<div style=\"text-align: justify\">For example, suppose we have misspelled \u2018houze\u2019 and we want to know the most likely word we meant to type. To accomplish that we have a big dictionary of possible words, but not all of them are equally probable. We want to obtain the word with the highest probability of having been the intended one. In mathematics this is called a conditional probability: given that we typed \u2018houze\u2019, how probable is each of the correct words to be the word we intended? The notation for conditional probability is P(\u2018house\u2019|\u2018houze\u2019), which stands for the probability of \u2018house\u2019 given \u2018houze\u2019. <\/div>\n<div style=\"text-align: justify\">This problem can be seen from two perspectives: on the one hand, we may think that the most common words are more probable; for example, \u2018house\u2019 is more probable than \u2018hose\u2019 because the former is a more common word. On the other hand, we also intuitively think that \u2018house\u2019 is more probable than \u2018photosynthesis\u2019 because of the big difference between the two words. 
Both of these aspects are formally captured by <a href=\"http:\/\/en.wikipedia.org\/wiki\/Bayes'_theorem\">Bayes\u2019 theorem<\/a>: <\/div>\n<div class=\"separator\" style=\"clear: both;text-align: center\"><a href=\"http:\/\/2.bp.blogspot.com\/-2qEl6Gj_OpI\/T8hxbtDINoI\/AAAAAAAAAVY\/BaQpxHDIJ6Y\/s1600\/latex.png\"><img decoding=\"async\" border=\"0\" height=\"31\" src=\"http:\/\/2.bp.blogspot.com\/-2qEl6Gj_OpI\/T8hxbtDINoI\/AAAAAAAAAVY\/BaQpxHDIJ6Y\/s320\/latex.png\" width=\"320\" \/><\/a><\/div>\n<div style=\"text-align: justify\">We have to maximize this probability, and to do that we have only one free parameter: the candidate word (\u2018house\u2019 in the case shown). <\/div>\n<div style=\"text-align: justify\">\n<\/div>\n<div style=\"text-align: justify\">For that reason the probability of the misspelled word is constant and we are not interested in it. The formula reduces to <\/div>\n<div class=\"separator\" style=\"clear: both;text-align: center\"><a href=\"http:\/\/2.bp.blogspot.com\/-j05gAT8-Y88\/T8hxnoL9faI\/AAAAAAAAAVg\/plpJximBncM\/s1600\/latex+(1).png\"><img decoding=\"async\" border=\"0\" height=\"15\" src=\"http:\/\/2.bp.blogspot.com\/-j05gAT8-Y88\/T8hxnoL9faI\/AAAAAAAAAVg\/plpJximBncM\/s400\/latex+(1).png\" width=\"400\" \/><\/a><\/div>\n<div style=\"text-align: justify\">To add more structure, scientists have given names to these two factors. The P(\u2018houze\u2019|\u2018house\u2019) factor is the error model (or channel model) and captures how probable it is that the channel introduces this particular misspelling when trying to write the second word. The second term, P(\u2018house\u2019), is called the language model and gives us an idea of how common a word is in the language. <\/div>\n<div style=\"text-align: justify\">\n<\/div>\n<div style=\"text-align: justify\">Up to this point I have only introduced the mathematical aspects of the model. Now we have to come up with a concrete model of these two probabilities. 
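Spelled out (writing m for the misspelled word and w for a candidate), the two formula images above correspond to:

```latex
\hat{w} \;=\; \operatorname*{arg\,max}_{w} P(w \mid m)
        \;=\; \operatorname*{arg\,max}_{w} \frac{P(m \mid w)\,P(w)}{P(m)}
        \;=\; \operatorname*{arg\,max}_{w} P(m \mid w)\,P(w)
```

The last step holds because P(m) does not depend on the candidate w, so it drops out of the maximization.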
For the language model we can use the frequency of the term in the text corpus. I have found empirically that it works much better to use the logarithm of the frequency rather than the frequency alone. Maybe this is because we want to dampen the weight of the very frequent terms more than that of the less frequent ones, and the logarithm does just that. <\/div>\n<div style=\"text-align: justify\">\n<\/div>\n<div style=\"text-align: justify\">There is more than one way to construct a channel model; many different ideas have been proposed. We are going to use a simple one based on the Damerau-Levenshtein distance. I also found that the fuzzy query of the first phase does a good job of finding the candidates: it returns the correct word in first place in more than half of the test cases on some datasets. So the channel model will be a combination of the Damerau-Levenshtein distance and the score that Lucene computed for the terms of the fuzzy query. <\/div>\n<p>The ranking formula will be: <\/p>\n<div class=\"separator\" style=\"clear: both;text-align: center\"><a href=\"http:\/\/3.bp.blogspot.com\/-XDa_amVPzP8\/T8hx0ZknCRI\/AAAAAAAAAVo\/uWhG30PkGZQ\/s1600\/latex+(2).png\"><img decoding=\"async\" border=\"0\" src=\"http:\/\/3.bp.blogspot.com\/-XDa_amVPzP8\/T8hx0ZknCRI\/AAAAAAAAAVo\/uWhG30PkGZQ\/s1600\/latex+(2).png\" \/><\/a><\/div>\n<p>I wrote a small Python script that does everything described above:<\/p>\n<pre class=\"brush:python\">from urllib import urlopen\r\nimport doubleMethaphone  # third-party double-metaphone module\r\nimport levenshtain       # third-party Damerau-Levenshtein module\r\nimport json\r\n\r\nserver = \"http:\/\/benchmarks:8983\/solr\/testSpellMeta\/\"\r\n\r\ndef spellWord(word, candidateNum = 70):\r\n    # fuzzy + sounds-like query\r\n    metaphone = doubleMethaphone.dm(word)\r\n    query = \"original_word:%s~ OR analyzed_word:%s~\" % (word, metaphone[0])\r\n\r\n    if metaphone[1] is not None:\r\n        query = query + \" OR analyzed_word:%s~\" % metaphone[1]\r\n\r\n    doc = urlopen(server + 
\"select?rows=%d&amp;wt=json&amp;fl=*,score&amp;omitHeader=true&amp;q=%s\" % (candidateNum, query)).read()\r\n    response = json.loads(doc)\r\n    suggestions = response['response']['docs']\r\n\r\n    if len(suggestions) &gt; 0:\r\n        # score the candidates, best (lowest) score first\r\n        scores = [(sug['original_word'], scoreWord(sug, word)) for sug in suggestions]\r\n        scores.sort(key=lambda candidate: candidate[1])\r\n        return scores\r\n    else:\r\n        return []\r\n\r\ndef scoreWord(suggestion, misspelled):\r\n    distance = float(levenshtain.dameraulevenshtein(suggestion['original_word'], misspelled))\r\n    if distance == 0:  # the misspelled word itself is in the dictionary\r\n        distance = 1000\r\n    fuzzy = suggestion['score']\r\n    logFreq = suggestion['freq']  # freq field holds the log of the term frequency\r\n\r\n    return distance\/(fuzzy*logFreq)<\/pre>\n<div style=\"text-align: justify\">I have to make some remarks about the previous listing. In lines 2 and 3 we use third-party libraries for the <a href=\"http:\/\/mwh.geek.nz\/2009\/04\/26\/python-damerau-levenshtein-distance\/\">Damerau-Levenshtein distance<\/a> and <a href=\"http:\/\/www.atomodo.com\/code\/double-metaphone\">double metaphone<\/a> algorithms. In line 8 we are collecting a list of 70 candidates. This particular number was found empirically: with more candidates the algorithm is slower, and with fewer it is less effective. We are also pushing the misspelled word itself to the bottom of the candidate list in lines 31\u201332. As we used Wikipedia as our source, it is common for the misspelled word to be found in the dictionary, so if the Damerau-Levenshtein distance is 0 (same word) we set its distance to 1000. <\/div>\n<p><strong>Tests<\/strong>  <\/p>\n<div style=\"text-align: justify\">I ran some tests with this algorithm. The first one uses the dataset that <a href=\"http:\/\/norvig.com\/spell-correct.html\">Peter Norvig<\/a> used in his article. I found the correct suggestion in the first position approximately 80% of the time! That\u2019s a really good result. 
Norvig, with the same dataset (but with a different algorithm and training set), got 67%. <\/div>\n<p>Now let\u2019s repeat some of the tests from the previous post to see the improvement. The following table shows the results. <\/p>\n<table border=\"1\">\n<tbody>\n<tr>\n<td><strong>Test set<\/strong><\/td>\n<td><strong>Solr accuracy<\/strong><\/td>\n<td><strong>New accuracy<\/strong><\/td>\n<td><strong>Solr time [seconds]<\/strong><\/td>\n<td><strong>New time [seconds]<\/strong><\/td>\n<td><strong>Improvement<\/strong><\/td>\n<td><strong>Time loss<\/strong><\/td>\n<\/tr>\n<tr>\n<td><em><span style=\"color: lime\">FAWTHROP1DAT.643<\/span><\/em><\/td>\n<td>45.61%<\/td>\n<td>81.91%<\/td>\n<td>31.50<\/td>\n<td>74.19<\/td>\n<td>79.58%<\/td>\n<td>135.55%<\/td>\n<\/tr>\n<tr>\n<td><em><span style=\"color: lime\">batch0.tab<\/span><\/em><\/td>\n<td>28.70%<\/td>\n<td>56.34%<\/td>\n<td>21.95<\/td>\n<td>47.05<\/td>\n<td>96.30%<\/td>\n<td>114.34%<\/td>\n<\/tr>\n<tr>\n<td><em><span style=\"color: lime\">SHEFFIELDDAT.643<\/span><\/em><\/td>\n<td>60.42%<\/td>\n<td>86.24%<\/td>\n<td>19.29<\/td>\n<td>35.12<\/td>\n<td>42.75%<\/td>\n<td>82.06%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>We can see very good improvements in the effectiveness of the correction, but it takes about twice the time.<\/p>\n<p><strong>Future work<\/strong>      <\/p>\n<div style=\"text-align: justify\">How can we improve this spellchecker? Studying the candidate lists, it turns out that the correct word is generally (95% of the time) contained in them. So all our efforts should be aimed at improving the scoring algorithm.       <\/div>\n<div style=\"text-align: justify\">\n<\/div>\n<div style=\"text-align: justify\">We have many ways of improving the channel model; several papers show that calculating more sophisticated distances, weighting the different letter transformations according to language statistics, can give us a better measure. 
For example, we know that typing \u2018houpe\u2019 is less probable than typing \u2018houze\u2019.       <\/div>\n<div style=\"text-align: justify\">\n<\/div>\n<div style=\"text-align: justify\">For the language model, great improvements can be obtained by adding more context to the word. For example, if we misspelled \u2018nouse\u2019 it is very difficult to tell whether the correct word is \u2018house\u2019 or \u2018mouse\u2019. But if we add more words, \u201cpaint my nouse\u201d, it is evident that the word we were looking for was \u2018house\u2019 (unless you have strange habits involving rodents). These are called n-grams (of words in this case, instead of letters). Google has released a big collection of n-grams, with their frequencies, available for download.       <\/div>\n<div style=\"text-align: justify\">\n<\/div>\n<div style=\"text-align: justify\">Last but not least, performance can be improved by porting the script to Java, since part of the algorithm was written in Python.       <\/div>\n<div style=\"text-align: justify\">\n<\/div>\n<div style=\"text-align: justify\">Bye!       <\/div>\n<div style=\"text-align: justify\">\n<\/div>\n<div style=\"text-align: justify\">As an update for those interested, Robert Muir <a href=\"http:\/\/search-lucene.com\/m\/n0xN61iTIry\/My+spellchecker+experiment&amp;subj=My+spellchecker+experiment\">told me<\/a> on the Solr user list that there is a new spellchecker, DirectSpellChecker, which was in the trunk at the time and should now be part of Solr 3.1. 
It uses a similar technique to the one I presented in this entry, without the performance losses.<\/div>\n<div style=\"text-align: justify\">\n<\/div>\n<div style=\"text-align: justify\"><strong>References<\/strong><\/div>\n<div style=\"text-align: justify\">\n<\/div>\n<div style=\"text-align: justify\"><a name=\"kukich\"><\/a><em>[Kukich] Karen Kukich \u2013 Techniques for automatically correcting words in text \u2013 ACM Computing Surveys \u2013 Volume 24, Issue 4, Dec. 1992<\/em>      <\/div>\n<div style=\"text-align: justify\">\n<\/div>\n<div style=\"text-align: justify\"><em><a name=\"cucerzan\"><\/a>[Cucerzan] S. Cucerzan and E. Brill \u2013 Spelling correction as an iterative process that exploits the collective knowledge of web users \u2013 July 2004<\/em><\/div>\n<div style=\"text-align: justify\">\n<\/div>\n<div style=\"text-align: justify\"><em><a href=\"http:\/\/norvig.com\/spell-correct.html\">Peter Norvig \u2013 How to Write a Spelling Corrector<\/a><\/em><\/div>\n<div style=\"text-align: justify\">\n<\/div>\n<p><strong><i>Reference: <\/i><\/strong><a href=\"http:\/\/emmaespina.wordpress.com\/2011\/01\/31\/20\/\">Creating a spellchecker with Solr<\/a> from our <a href=\"http:\/\/www.javacodegeeks.com\/p\/jcg.html\">JCG partner<\/a> Emmanuel Espina at the <a href=\"http:\/\/emmaespina.wordpress.com\/\">emmaespina<\/a> blog.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>In a previous post I talked about how the Solr Spellchecker works and then I showed you some test results of its performance. Now we are going to see another approach to spellchecking. This method, like many others, uses a two-step procedure. 
A rather fast \u201ccandidate word\u201d selection, and then a scoring of those &hellip;<\/p>\n","protected":false},"author":194,"featured_media":80,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8],"tags":[470],"class_list":["post-1178","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-enterprise-java","tag-apache-solr"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Solr: Creating a spellchecker - Java Code Geeks<\/title>\n<meta name=\"description\" content=\"In a previous post I talked about how the Solr Spellchecker works and then I showed you some test results of its performance. Now we are going to see\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.javacodegeeks.com\/2012\/06\/solr-creating-spellchecker.html\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Solr: Creating a spellchecker - Java Code Geeks\" \/>\n<meta property=\"og:description\" content=\"In a previous post I talked about how the Solr Spellchecker works and then I showed you some test results of its performance. 
Now we are going to see\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.javacodegeeks.com\/2012\/06\/solr-creating-spellchecker.html\" \/>\n<meta property=\"og:site_name\" content=\"Java Code Geeks\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/javacodegeeks\" \/>\n<meta property=\"article:published_time\" content=\"2012-06-01T16:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2012-10-22T04:57:39+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/apache-solr-logo.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"150\" \/>\n\t<meta property=\"og:image:height\" content=\"150\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Emmanuel Espina\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@javacodegeeks\" \/>\n<meta name=\"twitter:site\" content=\"@javacodegeeks\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Emmanuel Espina\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2012\\\/06\\\/solr-creating-spellchecker.html#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2012\\\/06\\\/solr-creating-spellchecker.html\"},\"author\":{\"name\":\"Emmanuel Espina\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/person\\\/ad873280e32f8e37df39e76f58b299c4\"},\"headline\":\"Solr: Creating a spellchecker\",\"datePublished\":\"2012-06-01T16:00:00+00:00\",\"dateModified\":\"2012-10-22T04:57:39+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2012\\\/06\\\/solr-creating-spellchecker.html\"},\"wordCount\":1933,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2012\\\/06\\\/solr-creating-spellchecker.html#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2012\\\/10\\\/apache-solr-logo.jpg\",\"keywords\":[\"Apache Solr\"],\"articleSection\":[\"Enterprise Java\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.javacodegeeks.com\\\/2012\\\/06\\\/solr-creating-spellchecker.html#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2012\\\/06\\\/solr-creating-spellchecker.html\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2012\\\/06\\\/solr-creating-spellchecker.html\",\"name\":\"Solr: Creating a spellchecker - Java Code 
Geeks\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2012\\\/06\\\/solr-creating-spellchecker.html#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2012\\\/06\\\/solr-creating-spellchecker.html#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2012\\\/10\\\/apache-solr-logo.jpg\",\"datePublished\":\"2012-06-01T16:00:00+00:00\",\"dateModified\":\"2012-10-22T04:57:39+00:00\",\"description\":\"In a previous post I talked about how the Solr Spellchecker works and then I showed you some test results of its performance. Now we are going to see\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2012\\\/06\\\/solr-creating-spellchecker.html#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.javacodegeeks.com\\\/2012\\\/06\\\/solr-creating-spellchecker.html\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2012\\\/06\\\/solr-creating-spellchecker.html#primaryimage\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2012\\\/10\\\/apache-solr-logo.jpg\",\"contentUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2012\\\/10\\\/apache-solr-logo.jpg\",\"width\":150,\"height\":150},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2012\\\/06\\\/solr-creating-spellchecker.html#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.javacodegeeks.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Java\",\"item\":\"https:\\\/\\\/www.javacodegeeks.com\\\/category\\\/java\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Enterprise 
Java\",\"item\":\"https:\\\/\\\/www.javacodegeeks.com\\\/category\\\/java\\\/enterprise-java\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"Solr: Creating a spellchecker\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#website\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/\",\"name\":\"Java Code Geeks\",\"description\":\"Java Developers Resource Center\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#organization\"},\"alternateName\":\"JCG\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.javacodegeeks.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#organization\",\"name\":\"Exelixis Media P.C.\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/exelixis-logo.png\",\"contentUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/exelixis-logo.png\",\"width\":864,\"height\":246,\"caption\":\"Exelixis Media P.C.\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/javacodegeeks\",\"https:\\\/\\\/x.com\\\/javacodegeeks\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/person\\\/ad873280e32f8e37df39e76f58b299c4\",\"name\":\"Emmanuel 
Espina\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/82f79d369915f9bc8b5b85c9ed35e47ef57e7f609e72ed2f98f3396efa15f43e?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/82f79d369915f9bc8b5b85c9ed35e47ef57e7f609e72ed2f98f3396efa15f43e?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/82f79d369915f9bc8b5b85c9ed35e47ef57e7f609e72ed2f98f3396efa15f43e?s=96&d=mm&r=g\",\"caption\":\"Emmanuel Espina\"},\"sameAs\":[\"http:\\\/\\\/emmaespina.wordpress.com\\\/\"],\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/author\\\/Emmanuel-Espina\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Solr: Creating a spellchecker - Java Code Geeks","description":"In a previous post I talked about how the Solr Spellchecker works and then I showed you some test results of its performance. Now we are going to see","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.javacodegeeks.com\/2012\/06\/solr-creating-spellchecker.html","og_locale":"en_US","og_type":"article","og_title":"Solr: Creating a spellchecker - Java Code Geeks","og_description":"In a previous post I talked about how the Solr Spellchecker works and then I showed you some test results of its performance. 
Solr: Creating a spellchecker, by Emmanuel Espina. Published on Java Code Geeks, June 1, 2012.