Skip to content

Conversation

@qqmyers
Copy link
Member

@qqmyers qqmyers commented Nov 11, 2021

What this PR does / why we need it: Adds bounding box indexing in preparation for providing geosearch or other features

Which issue(s) this PR closes:

Closes #

Special notes for your reviewer:

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation: design doc: https://docs.google.com/document/d/1aQCIM6wbAdgMNLTVSUhvuHQKzWDAvQ5XoYGbfint3P4/edit?usp=sharing

@coveralls
Copy link

coveralls commented Nov 11, 2021

Coverage Status

Coverage increased (+0.008%) to 19.989% when pulling b5383f4 on GlobalDataverseCommunityConsortium:GDCC/geosearch into 11abccf on IQSS:develop.

@qqmyers qqmyers marked this pull request as ready for review July 11, 2022 17:38
@qqmyers
Copy link
Member Author

qqmyers commented Jul 11, 2022

Per discussion at the DCM that the improved geospatial indexing could be useful independent of adding further geospatial search/display capabilities, I'm moving this PR from draft to review-ready. Not sure what the Rocky shellspec issue is - probably worth re-running the build to see if there's any real issue.

@pdurbin pdurbin self-assigned this Sep 8, 2022
Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qqmyers this looks promising! Can you please merge from develop and add a release note? What about docs? I feel like we should add something...

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment about testing.

}
}

for(String metadataBlockName : metadataBlocksWithValue) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewatched Jim's recent "Experimenting with Geospatial indexing" talk ( https://osf.io/84pnw ) and it reminded me that I asked for an test to be added to SearchIT. I still want this and I'm happy to try to add it, if it make sense for me to jump in the code. 😄

@qqmyers
Copy link
Member Author

qqmyers commented Sep 15, 2022

I'm not sure it is possible to make an IT test yet. The solr query that works is as shown:
image

In the UI, I don't think we can submit filterqueries and with the API, you can submit the filter query part but I haven't seen how to send the pt and d parameters yet. If that is possible, an IT test would be straight forward.

Otherwise, to test one could use the solr console (dropping the jetty.host-localhost recommended for our solr install to make it accessible, and possibly doing an aws ec2 authorize-security-group-ingress --group-id <id> --protocol tcp --port 8983 --cidr <your ip>/32 to let a browser access the console.)

@qqmyers
Copy link
Member Author

qqmyers commented Oct 25, 2022

I think the indexing code assumes you're talking about a point (or line) if you leave one coord out so it just replicates the value for the other extreme. So ~feature.

With only westLongitude added and the other three coordinates left
empty, we were getting the following exception. A null check was added
to prevent this.

Command [DatasetCreate dataset:132] failed: Exception thrown from bean:
javax.ejb.EJBTransactionRolledbackException: Exception thrown from
bean:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://localhost:8983/solr/collection1: ERROR:
[doc=dataset_132_draft] Error adding field
'solr_bboxtype'='ENVELOPE(null,null,null,null)' msg=Unable to parse
shape given formats "lat,lon", "x y" or as WKT because
java.text.ParseException: Expected a number input:
ENVELOPE(null,null,null,null)
@pdurbin
Copy link
Member

pdurbin commented Oct 25, 2022

@qqmyers ok, but I did go ahead and fix a bug when there was only one coordinate: e5187b2

Next up: names. I haven't heard any objections to geo_point and geo_radius so I'm planning on sticking with them. Unless we want spatial_point and spatial_radius? I'm pretty sure we're limiting ourselves to Earth, though, so I think "geo" is fine (and shorter!). 🌎

On the Solr field side we currently have solr_bboxtype and solr_srpt. None of our existing fields start with solr_ so I'm thinking of lopping that off.

Any reason not to shorted to just bbox? Our Solr fields usually don't end with type and just bbox is what's shown in the docs: https://solr.apache.org/guide/8_11/spatial-search.html#bboxfield

I don't know what to say about RPT. SpatialRecursivePrefixTreeFieldType is a mouthful. I'm not in love with solr_srpt but the only example I see in the docs is location_rpt. Perhaps we should go with that.


Update: In 28215fb I renamed the Solr fields:

  • solr_srpt is now geolocation
  • solr_bboxtype is now boundingBox

Feedback welcome!

@pdurbin pdurbin changed the title GDCC/Geosearch indexing Geospatial search Oct 26, 2022
@pdurbin pdurbin removed their assignment Oct 26, 2022
@pdurbin
Copy link
Member

pdurbin commented Oct 26, 2022

I added some error checking and updated the docs based on how the code is now.

I talked to @atrisovic about this feature and she immediately asked "Can you sort by distance?" Good question! 😄 I haven't looked into this but maybe? See https://solr.apache.org/guide/8_11/spatial-search.html#distance-sorting-or-boosting-function-queries for sort=geodist() asc. I'd need more data to test with.

I'm also ok with this being API only for now. That's how I wrote it up in the User Guide: https://dataverse-guide--8239.org.readthedocs.build/en/8239/user/find-use-data.html#geospatial-search

I'm ready for review. We could keep hacking but there's lots of other stuff to do as well. 😄 I think this is an excellent toehold into geospatial search!

@scolapasta
Copy link
Contributor

Moved back to "This Sprint" so that @pdurbin's recent changes can be reviewed. (@pdurbin reviewed the initial PR and was fine with the changes Jim made.)

@sekmiller sekmiller self-assigned this Nov 15, 2022
junkRadius.then().assertThat()
.statusCode(BAD_REQUEST.getStatusCode())
.body("message", CoreMatchers.equalTo("Non-number radius supplied."));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a test that's actually a valid lat, long and radius?
or a valid lat with a non-numeric long, etc.?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 364e347 I added a test for if lat/long is too big. It turns out Solr provides a pretty useful error, which was already being bubbled up.

I also took a swing at sending a huge value like "99999999999999999999999999999999" to geo_radius but Solr didn't give an error. I dunno, we already test that this number is positive. I say we let the use send bigger and bigger numbers (these are in kilometers). Eventually, they'll figure out they just aren't going to get a geospatial hit. 😄


} catch (Exception ex) {
return error(Response.Status.BAD_REQUEST, ex.getLocalizedMessage());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to catch a nonnumeric exception here and give more feedback?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, getGeoPoint and getGeoRadius will already throw exceptions with feedback about non-numeric values.

For example, if the user does this:

/api/search?q=*&geo_point=41.9580775,-70.6621063&geo_radius=junk

They'll get this error:

{"status":"ERROR","message":"Non-number radius supplied."}

That said, I may not be following the use cases, the concerns. I'm happy to add more tests for these.

Copy link
Contributor

@sekmiller sekmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly looked at Phil's most recent changes. Looks good, just a couple of suggestions/questions about testing and error handling.

@pdurbin pdurbin self-assigned this Nov 18, 2022
Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm replying to review feedback.


} catch (Exception ex) {
return error(Response.Status.BAD_REQUEST, ex.getLocalizedMessage());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, getGeoPoint and getGeoRadius will already throw exceptions with feedback about non-numeric values.

For example, if the user does this:

/api/search?q=*&geo_point=41.9580775,-70.6621063&geo_radius=junk

They'll get this error:

{"status":"ERROR","message":"Non-number radius supplied."}

That said, I may not be following the use cases, the concerns. I'm happy to add more tests for these.

junkRadius.then().assertThat()
.statusCode(BAD_REQUEST.getStatusCode())
.body("message", CoreMatchers.equalTo("Non-number radius supplied."));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 364e347 I added a test for if lat/long is too big. It turns out Solr provides a pretty useful error, which was already being bubbled up.

I also took a swing at sending a huge value like "99999999999999999999999999999999" to geo_radius but Solr didn't give an error. I dunno, we already test that this number is positive. I say we let the use send bigger and bigger numbers (these are in kilometers). Eventually, they'll figure out they just aren't going to get a geospatial hit. 😄

Copy link
Contributor

@sekmiller sekmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks.

@kcondon kcondon self-assigned this Nov 30, 2022
@kcondon kcondon merged commit b62a8a5 into IQSS:develop Nov 30, 2022
@pdurbin pdurbin added this to the 5.13 milestone Dec 1, 2022
@mreekie mreekie added the pm.netcdf-hdf5.d All 3 aims are currently under this deliverable label Mar 30, 2023
@mreekie
Copy link

mreekie commented Mar 30, 2023

grooming

  • Looking at the NetCDF deliverables.
  • This looks like it's a related issue that is related to active work, but was done before we were trying to tag each issue
  • Adding it to the backlog as cleared, and tagging it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Feature: Geospatial FY26 Sprint 14 FY26 Sprint 14 (2025-12-31 - 2026-01-14) pm.netcdf-hdf5.d All 3 aims are currently under this deliverable

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

8 participants