Skip to content

Use default index + user-lang index in autocomplete#1300

Merged
orangejulius merged 5 commits intomasterfrom
joxit/feat/autocomplete-multi-lang-index
Jun 26, 2020
Merged

Use default index + user-lang index in autocomplete#1300
orangejulius merged 5 commits intomasterfrom
joxit/feat/autocomplete-multi-lang-index

Conversation

@Joxit
Copy link
Copy Markdown
Member

@Joxit Joxit commented May 20, 2019

EDITED
The lang boos is the same as the default index.
The default lang is en.

I ran an acceptance test on our production server, there are no regressions 🎉

related: #1296
superseed: #1298
Fixes pelias/pelias#767

@Joxit
Copy link
Copy Markdown
Member Author

Joxit commented May 20, 2019

It seems that multi_match with boost * 2 breaks the focus point.
Even if I remove * 2, there are Hard Rock, Myanmar which should not be in the list (its distance is 13550.189).

Request:

/v1/autocomplete?text=hard rock cafe&focus.point.lat=40.744243&focus.point.lon=-73.990342

@missinglink
Copy link
Copy Markdown
Member

missinglink commented May 21, 2019

The boost values are all messed up, in #1287 I set them all back to 1 and it works fine.
I suspect that once my work in #1287 is finished and we merge these PRs then the Hard Rock test case will fix itself.

@Joxit
Copy link
Copy Markdown
Member Author

Joxit commented May 21, 2019

Okay, I will wait your PR to fix this then.

@Joxit
Copy link
Copy Markdown
Member Author

Joxit commented Oct 2, 2019

Rebase to master with the new pelias parser 😄

I added some acceptance tests too

@Joxit
Copy link
Copy Markdown
Member Author

Joxit commented Oct 3, 2019

Some news about this PR

I did some acceptance tests, and there is no change between master and this branch and, as is, this PR is a great improvement for Pelias API.

There are some limitations, such as the search of Parijs, Frankrijk, due to hierarchy index limitation.
Since we have pelias/whosonfirst#446 and the support for multi-lang names property in OSM, we can search for both OSM venues/addresses and WOF by their non-default/English name.

This means that if we do an autocomplete search of Parijs (with lang=nl) we will get Paris 👍 but if you do an autocomplete search of Parijs, Frankrijk we will get no results 👎.

Why ?

Because the API will search a document with a name property Parijs AND in its hierarchy (country, locality...) Frankrijk. But in our ES index, we only have the default name in document hierarchy. So Frankrijk is not in the Parijs hierarchy... Only France is present.

How to get rid of this ?

We do not have many solutions... I found only 2.

  1. We add alternative names in our hierarchy to takes advantage of ES fuzzy search when we autocomplete things.
  2. We do 2 requests, my example is for the search of Pont Neuf, Parijs, Frankrijk:
    1. The first one is searching ES admin information, something like source=wof AND (name.lang='Parijs' or name.lang='Frankrijk') and take their ids.
    2. The second one is searching in ES Pont Neuf which is in the hierarchy found lately, something like name.lang='Pont Neuf' AND (country.id IN ('whosonfirst:country:85633147', 'whosonfirst:locality:101751119') OR locality IN ('whosonfirst:country:85633147', 'whosonfirst:locality:101751119'))

IMO, the first solution is the simplest, but will increase significantly the ES index because this must be done for all 650M documents... The second issue will be "how to get this data at import time ?" using WOF + placehoder for alt names (more http call + complexity) ? Add alt names in WOF Admin lookup (more memory usage) ?

Thoughts @missinglink @orangejulius ?

@Joxit Joxit force-pushed the joxit/feat/autocomplete-multi-lang-index branch from e266722 to 5cd3882 Compare October 16, 2019 15:25
@Joxit
Copy link
Copy Markdown
Member Author

Joxit commented Oct 16, 2019

Sync again with master

@Joxit Joxit force-pushed the joxit/feat/autocomplete-multi-lang-index branch from 5cd3882 to 7dc1666 Compare October 30, 2019 09:04
@Joxit Joxit force-pushed the joxit/feat/autocomplete-multi-lang-index branch from 7dc1666 to 4441892 Compare November 8, 2019 13:58
@Joxit
Copy link
Copy Markdown
Member Author

Joxit commented Nov 8, 2019

Sync with master

@Joxit
Copy link
Copy Markdown
Member Author

Joxit commented Nov 8, 2019

Added one commit, now I use the multi_match leaf query from pelias/query#114, the code is cleaner and easier to review

@Joxit Joxit force-pushed the joxit/feat/autocomplete-multi-lang-index branch from cb171a5 to d8bf26f Compare November 8, 2019 16:31
Copy link
Copy Markdown
Member

@missinglink missinglink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, firstly sorry for not getting to this sooner 🙇‍♂️

The code is nice & clean and I think this is good to merge.
Although not explicitly specified the best fields query type (the default) is the correct one to use here.

Nice work 👍

@Joxit
Copy link
Copy Markdown
Member Author

Joxit commented Nov 25, 2019

I used phrase type because it's a mix between best_field and the phrase type already used for autocomplete.

The phrase and phrase_prefix types behave just like best_fields, but they use a match_phrase or match_phrase_prefix query instead of a match query.

Acceptance tests for this PR are here : pelias/acceptance-tests#505 🚀

@missinglink
Copy link
Copy Markdown
Member

Oh sorry, I think I got confused 👍

@orangejulius
Copy link
Copy Markdown
Member

I finally tested this out and it is awesome.

It indeed doesn't appear to break anything, but should offer huge improvements out of the box for people who have their browser set to non-English languages.

Yet again, another PR I should have dealt with far, far sooner. <3 @Joxit

I rebased this again and fixed a tiny issue that had cropped up with some string validation (only an issue because I let this PR sit for so long).

I'm going to do a little performance testing on this (somehow), but expect it to be merged soon.

@orangejulius orangejulius self-assigned this May 26, 2020
@orangejulius
Copy link
Copy Markdown
Member

Another update here, we have been testing this out with a few Geocode Earth customers (it's behind a feature flag they can opt into) and it all looks amazing so far. I plan to merge this soon :)

@Joxit
Copy link
Copy Markdown
Member Author

Joxit commented Jun 2, 2020

Yeeeeees !! 😄 I'm glad this PR works great and is appreciated 😄

@Joxit Joxit force-pushed the joxit/feat/autocomplete-multi-lang-index branch from 37d5261 to 6ac599d Compare June 4, 2020 13:55
@Joxit
Copy link
Copy Markdown
Member Author

Joxit commented Jun 4, 2020

Sync with master for conflicts

@slvlirnoff
Copy link
Copy Markdown

  1. We add alternative names in our hierarchy to takes advantage of ES fuzzy search when we autocomplete things.

  2. We do 2 requests, my example is for the search of Pont Neuf, Parijs, Frankrijk:

    1. The first one is searching ES admin information, something like source=wof AND (name.lang='Parijs' or name.lang='Frankrijk') and take their ids.
    2. The second one is searching in ES Pont Neuf which is in the hierarchy found lately, something like name.lang='Pont Neuf' AND (country.id IN ('whosonfirst:country:85633147', 'whosonfirst:locality:101751119') OR locality IN ('whosonfirst:country:85633147', 'whosonfirst:locality:101751119'))

IMO, the first solution is the simplest, but will increase significantly the ES index because this must be done for all 650M documents... The second issue will be "how to get this data at import time ?" using WOF + placehoder for alt names (more http call + complexity) ? Add alt names in WOF Admin lookup (more memory usage) ?

Thoughts @missinglink @orangejulius ?

A small workaround, that makes results more noisy but at least is quite easy to put in place:
if query.lang !== default indexed admin then change the admin part of the query to a 'should' instead of a 'must'

orangejulius added a commit to pelias/acceptance-tests that referenced this pull request Jun 25, 2020
@orangejulius orangejulius force-pushed the joxit/feat/autocomplete-multi-lang-index branch from 6ac599d to c108a37 Compare June 26, 2020 13:48
@orangejulius
Copy link
Copy Markdown
Member

Okay, today's the day :)

🥳 🎉 🎈

@orangejulius orangejulius merged commit a3fc0b6 into master Jun 26, 2020
@orangejulius orangejulius deleted the joxit/feat/autocomplete-multi-lang-index branch June 26, 2020 13:51
@Joxit
Copy link
Copy Markdown
Member Author

Joxit commented Jun 26, 2020

Yeaaaaaah !!!! :D 🚀

orangejulius added a commit to pelias/acceptance-tests that referenced this pull request Aug 25, 2021
These were actually all fixed back in
pelias/api#1300, but we hadn't marked them
passing until now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Autocomplete] No Results for Foreign Language Queries

4 participants