BUG Return response in to_authority when not proritized authority#3239
BUG Return response in to_authority when not proritized authority#3239snowman2 wants to merge 1 commit intorasterio:mainfrom
Conversation
| for auth_name in preferred_authorities: | ||
| try: | ||
| return auth_name, matches[auth_name][0] | ||
| except (KeyError, IndexError): |
There was a problem hiding this comment.
I'd be curious what the generated c code looks like for the whole function this versus the original if statements.
There was a problem hiding this comment.
@snowman2 how about sorting matches using preferred_authorities.index(auth_name) if auth_name in preferred_authorities else 10 as a sort key, and then take the first one?
There was a problem hiding this comment.
how about sorting matches
What improvement do you see from sorting?
There was a problem hiding this comment.
I'd like to avoid using try/except for logic when there are other approaches, leaving it more for handing unexpected errors or inherently non-deterministic behavior. Also, the whole intent of the function is to sort preferentially, so I think it's best to do that explicitly.
There was a problem hiding this comment.
Would something like this work?
if not matches:
return None
for auth_name in preferred_authorities:
if auth_name in matches:
return auth_name, matches[auth_name][0]
else:
auth_name, codes = max(matches.items(), key=lambda x: x[1][0])
return auth_name, codes[0]There was a problem hiding this comment.
@groutr something like that.
Looking at the code of to_authority() and _matches(), however, I'm no longer sure that the latter returns a data structure that allows the former to return the best match.
GDAL has a method that finds authority matches and returns them in descending confidence order. I think my intention for "preferred authorities" was that we could break ties in one direction in case there were multiple authority/code matches with the same confidence. I had the understanding at the time, and I think I saw it in practice, that the ESRI authority duplicated items in the EPSG database, and that it would be best to return ("EPSG", 00000) in such cases.
However, _matches() loses information about confidence when it collects matches by authority name. If we changed it (it's a "private" method, so we can if we want), we could have it return a list, every item of which could be a list of authority/code matches with the same confidence level. Something like:
[
[("ESRI", 10000), ("EPSG", 20000), ...],
[("FOO", 30000)],
...
]
Then, to_authority() could iterate over these items, return the one that is most preferred, or move on to the next item in the list.
Does that seem okay to you all?
There was a problem hiding this comment.
Preserving the confidence info makes a lot of sense to me. What if _matches() just returned a flat list? I'm not seeing the utility of having the results grouped by confidence level.
# First element is the confidence level (0-100).
[(100, "ESRI", 10000),
(95, "EPSG", 20000),
(80, "ESRI", 10500),
(75, "FOO", 30000),
...
]
We know the output OSRFindMatches is already sorted in descending order, which is useful to preserve. If we wanted to groupby confidence level it becomes easy with:
from itertools import groupby
groupby(matches, key=itemgetter(0))There was a problem hiding this comment.
list_authority would be a good reference for the design. It returns a list of namedtuple that contains the match confidence information. The docstrings show an example of that.
|
Going to close this in favor of a different implementation following the sorting modifications @sgillies suggested above. I am not sure when I will have a free moment, so anyone interested is welcome to move this forward. |
) * Rewrite _matches() to better support to_authority() and to_epsg() Resolves #3239 * Remove list() call and update change log
* Eliminate boundless reads in merge, new compositing implementation (#3234) * Eliminate boundless reads in merge, new compositing implementation Resolves #3228 * Fix nits * Compare non-zero mean of arrays * Implement and use Window.align() Combines the effects of Window.round_offsets and Window.round_lengths, producing the same effect as gdal_merge.py * Remove unused math import * Add more docs about align() Also increase the fraction used in round_offsets() to be consistent with align() * Move align() to a private func, use two existing methods in tests * Add a test for merging WarpedVRTs (#3237) Resolves #3196 * Backport of #3217 (#3243) * Backport of #3217 * Update change log * Increment GDAL and Python versions for CI (#3244) * Rewrite _matches() to better support to_authority() and to_epsg() (#3255) * Rewrite _matches() to better support to_authority() and to_epsg() Resolves #3239 * Remove list() call and update change log * Use to_epsg() in is_epsg_code() (#3258) * Use to_epsg() in is_epsg_code() Resolves #3248 * Update change log * Allow MRF compression to surface in properties (#3259) * Allow MRF compression to surface in properties Resolves #3256 * Update change log * Register drivers at most once per process (#3260) * Register drivers at most once per process Resolves #3250 * Appease flake8 * Add a note about GDAL_SKIP in the change log * Support all GDALFillNodata() options in rasterio.fill (#3265) * Support all GDALFillNodata() options Resolve #3175. * Cast values to str and update docs * Update change log * Prevent rasterio from trying to open a dataset object (#3266) Resolves #3105. * Fix typos discovered by codespell (#3264) (#3267) * Fix typos discovered by codespell * crasher --------- Co-authored-by: Christian Clauss <[email protected]> * Fix erroneous masking of 0-valued data (#3268) * Fix erroneous masking of 0-valued data Resolves #3245 * Add an assertion about data values and update change log * This is 1.4.3 --------- Co-authored-by: Christian Clauss <[email protected]>
Alternative to #3206