Skip to content

Comments

BUG Return response in to_authority when not proritized authority#3239

Closed
snowman2 wants to merge 1 commit intorasterio:mainfrom
snowman2:to_authority
Closed

BUG Return response in to_authority when not proritized authority#3239
snowman2 wants to merge 1 commit intorasterio:mainfrom
snowman2:to_authority

Conversation

@snowman2
Copy link
Member

@snowman2 snowman2 commented Nov 7, 2024

Alternative to #3206

@snowman2 snowman2 added the bug label Nov 7, 2024
@snowman2 snowman2 added this to the 1.4.2 milestone Nov 7, 2024
for auth_name in preferred_authorities:
try:
return auth_name, matches[auth_name][0]
except (KeyError, IndexError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be curious what the generated c code looks like for the whole function this versus the original if statements.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snowman2 how about sorting matches using preferred_authorities.index(auth_name) if auth_name in preferred_authorities else 10 as a sort key, and then take the first one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about sorting matches

What improvement do you see from sorting?

Copy link
Member

@sgillies sgillies Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to avoid using try/except for logic when there are other approaches, leaving it more for handing unexpected errors or inherently non-deterministic behavior. Also, the whole intent of the function is to sort preferentially, so I think it's best to do that explicitly.

Copy link
Contributor

@groutr groutr Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would something like this work?

if not matches:
    return None
for auth_name in preferred_authorities:
    if auth_name in matches:
        return auth_name, matches[auth_name][0]
else:
    auth_name, codes = max(matches.items(), key=lambda x: x[1][0])
    return auth_name, codes[0]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@groutr something like that.

Looking at the code of to_authority() and _matches(), however, I'm no longer sure that the latter returns a data structure that allows the former to return the best match.

GDAL has a method that finds authority matches and returns them in descending confidence order. I think my intention for "preferred authorities" was that we could break ties in one direction in case there were multiple authority/code matches with the same confidence. I had the understanding at the time, and I think I saw it in practice, that the ESRI authority duplicated items in the EPSG database, and that it would be best to return ("EPSG", 00000) in such cases.

However, _matches() loses information about confidence when it collects matches by authority name. If we changed it (it's a "private" method, so we can if we want), we could have it return a list, every item of which could be a list of authority/code matches with the same confidence level. Something like:

[
  [("ESRI", 10000), ("EPSG", 20000), ...],
  [("FOO", 30000)],
  ...
]

Then, to_authority() could iterate over these items, return the one that is most preferred, or move on to the next item in the list.

Does that seem okay to you all?

Copy link
Contributor

@groutr groutr Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preserving the confidence info makes a lot of sense to me. What if _matches() just returned a flat list? I'm not seeing the utility of having the results grouped by confidence level.

# First element is the confidence level (0-100).
[(100, "ESRI", 10000),
(95, "EPSG", 20000),
(80, "ESRI", 10500),
(75, "FOO", 30000),
  ...
]

We know the output OSRFindMatches is already sorted in descending order, which is useful to preserve. If we wanted to groupby confidence level it becomes easy with:

from itertools import groupby
groupby(matches, key=itemgetter(0))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list_authority would be a good reference for the design. It returns a list of namedtuple that contains the match confidence information. The docstrings show an example of that.

@snowman2
Copy link
Member Author

Going to close this in favor of a different implementation following the sorting modifications @sgillies suggested above. I am not sure when I will have a free moment, so anyone interested is welcome to move this forward.

@snowman2 snowman2 closed this Nov 10, 2024
@snowman2 snowman2 deleted the to_authority branch November 26, 2024 02:22
sgillies added a commit that referenced this pull request Nov 27, 2024
)

* Rewrite _matches() to better support to_authority() and to_epsg()

Resolves #3239

* Remove list() call and update change log
sgillies added a commit that referenced this pull request Dec 2, 2024
* Eliminate boundless reads in merge, new compositing implementation (#3234)

* Eliminate boundless reads in merge, new compositing implementation

Resolves #3228

* Fix nits

* Compare non-zero mean of arrays

* Implement and use Window.align()

Combines the effects of Window.round_offsets and
Window.round_lengths, producing the same effect as gdal_merge.py

* Remove unused math import

* Add more docs about align()

Also increase the fraction used in round_offsets() to be
consistent with align()

* Move align() to a private func, use two existing methods in tests

* Add a test for merging WarpedVRTs (#3237)

Resolves #3196

* Backport of #3217 (#3243)

* Backport of #3217

* Update change log

* Increment GDAL and Python versions for CI (#3244)

* Rewrite _matches() to better support to_authority() and to_epsg() (#3255)

* Rewrite _matches() to better support to_authority() and to_epsg()

Resolves #3239

* Remove list() call and update change log

* Use to_epsg() in is_epsg_code() (#3258)

* Use to_epsg() in is_epsg_code()

Resolves #3248

* Update change log

* Allow MRF compression to surface in properties (#3259)

* Allow MRF compression to surface in properties

Resolves #3256

* Update change log

* Register drivers at most once per process (#3260)

* Register drivers at most once per process

Resolves #3250

* Appease flake8

* Add a note about GDAL_SKIP in the change log

* Support all GDALFillNodata() options in rasterio.fill (#3265)

* Support all GDALFillNodata() options

Resolve #3175.

* Cast values to str and update docs

* Update change log

* Prevent rasterio from trying to open a dataset object (#3266)

Resolves #3105.

* Fix typos discovered by codespell (#3264) (#3267)

* Fix typos discovered by codespell

* crasher



---------

Co-authored-by: Christian Clauss <[email protected]>

* Fix erroneous masking of 0-valued data (#3268)

* Fix erroneous masking of 0-valued data

Resolves #3245

* Add an assertion about data values and update change log

* This is 1.4.3

---------

Co-authored-by: Christian Clauss <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants