-
Notifications
You must be signed in to change notification settings - Fork 531
Geospatial search #8239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Geospatial search #8239
Conversation
IQSS/8188-cache_version_differences
|
Per discussion at the DCM that the improved geospatial indexing could be useful independent of adding further geospatial search/display capabilities, I'm moving this PR from draft to review-ready. Not sure what the Rocky shellspec issue is - probably worth re-running the build to see if there's any real issue. |
pdurbin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qqmyers this looks promising! Can you please merge from develop and add a release note? What about docs? I feel like we should add something...
pdurbin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment about testing.
| } | ||
| } | ||
|
|
||
| for(String metadataBlockName : metadataBlocksWithValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewatched Jim's recent "Experimenting with Geospatial indexing" talk ( https://osf.io/84pnw ) and it reminded me that I asked for an test to be added to SearchIT. I still want this and I'm happy to try to add it, if it make sense for me to jump in the code. 😄
Co-authored-by: Philip Durbin <[email protected]>
we may want to test for that, but not here.
|
I think the indexing code assumes you're talking about a point (or line) if you leave one coord out so it just replicates the value for the other extreme. So ~feature. |
With only westLongitude added and the other three coordinates left empty, we were getting the following exception. A null check was added to prevent this. Command [DatasetCreate dataset:132] failed: Exception thrown from bean: javax.ejb.EJBTransactionRolledbackException: Exception thrown from bean: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/collection1: ERROR: [doc=dataset_132_draft] Error adding field 'solr_bboxtype'='ENVELOPE(null,null,null,null)' msg=Unable to parse shape given formats "lat,lon", "x y" or as WKT because java.text.ParseException: Expected a number input: ENVELOPE(null,null,null,null)
|
@qqmyers ok, but I did go ahead and fix a bug when there was only one coordinate: e5187b2 Next up: names. I haven't heard any objections to On the Solr field side we currently have Any reason not to shorted to just I don't know what to say about RPT. SpatialRecursivePrefixTreeFieldType is a mouthful. I'm not in love with Update: In 28215fb I renamed the Solr fields:
Feedback welcome! |
|
I added some error checking and updated the docs based on how the code is now. I talked to @atrisovic about this feature and she immediately asked "Can you sort by distance?" Good question! 😄 I haven't looked into this but maybe? See https://solr.apache.org/guide/8_11/spatial-search.html#distance-sorting-or-boosting-function-queries for I'm also ok with this being API only for now. That's how I wrote it up in the User Guide: https://dataverse-guide--8239.org.readthedocs.build/en/8239/user/find-use-data.html#geospatial-search I'm ready for review. We could keep hacking but there's lots of other stuff to do as well. 😄 I think this is an excellent toehold into geospatial search! |
| junkRadius.then().assertThat() | ||
| .statusCode(BAD_REQUEST.getStatusCode()) | ||
| .body("message", CoreMatchers.equalTo("Non-number radius supplied.")); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a test that's actually a valid lat, long and radius?
or a valid lat with a non-numeric long, etc.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 364e347 I added a test for if lat/long is too big. It turns out Solr provides a pretty useful error, which was already being bubbled up.
I also took a swing at sending a huge value like "99999999999999999999999999999999" to geo_radius but Solr didn't give an error. I dunno, we already test that this number is positive. I say we let the use send bigger and bigger numbers (these are in kilometers). Eventually, they'll figure out they just aren't going to get a geospatial hit. 😄
|
|
||
| } catch (Exception ex) { | ||
| return error(Response.Status.BAD_REQUEST, ex.getLocalizedMessage()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to catch a nonnumeric exception here and give more feedback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, getGeoPoint and getGeoRadius will already throw exceptions with feedback about non-numeric values.
For example, if the user does this:
/api/search?q=*&geo_point=41.9580775,-70.6621063&geo_radius=junk
They'll get this error:
{"status":"ERROR","message":"Non-number radius supplied."}
That said, I may not be following the use cases, the concerns. I'm happy to add more tests for these.
sekmiller
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly looked at Phil's most recent changes. Looks good, just a couple of suggestions/questions about testing and error handling.
pdurbin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm replying to review feedback.
src/main/java/edu/harvard/iq/dataverse/search/IndexServiceBean.java
Outdated
Show resolved
Hide resolved
|
|
||
| } catch (Exception ex) { | ||
| return error(Response.Status.BAD_REQUEST, ex.getLocalizedMessage()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, getGeoPoint and getGeoRadius will already throw exceptions with feedback about non-numeric values.
For example, if the user does this:
/api/search?q=*&geo_point=41.9580775,-70.6621063&geo_radius=junk
They'll get this error:
{"status":"ERROR","message":"Non-number radius supplied."}
That said, I may not be following the use cases, the concerns. I'm happy to add more tests for these.
| junkRadius.then().assertThat() | ||
| .statusCode(BAD_REQUEST.getStatusCode()) | ||
| .body("message", CoreMatchers.equalTo("Non-number radius supplied.")); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 364e347 I added a test for if lat/long is too big. It turns out Solr provides a pretty useful error, which was already being bubbled up.
I also took a swing at sending a huge value like "99999999999999999999999999999999" to geo_radius but Solr didn't give an error. I dunno, we already test that this number is positive. I say we let the use send bigger and bigger numbers (these are in kilometers). Eventually, they'll figure out they just aren't going to get a geospatial hit. 😄
sekmiller
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks.
|
grooming
|

What this PR does / why we need it: Adds bounding box indexing in preparation for providing geosearch or other features
Which issue(s) this PR closes:
Closes #
Special notes for your reviewer:
Suggestions on how to test this:
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: design doc: https://docs.google.com/document/d/1aQCIM6wbAdgMNLTVSUhvuHQKzWDAvQ5XoYGbfint3P4/edit?usp=sharing