API, Core: Add geometry and geography types support#12346
API, Core: Add geometry and geography types support#12346rdblue merged 16 commits intoapache:mainfrom
Conversation
b39e32c to
904f503
Compare
904f503 to
896bc19
Compare
|
@rdblue @szehon-ho @flyrain please review when you have time 🙏🏻 |
szehon-ho
left a comment
There was a problem hiding this comment.
Hey , really sorry for delay, let's work on it this week. Some early comments
|
|
||
| private final Geometry geometry; | ||
|
|
||
| public Geography(Geometry geometry) { |
There was a problem hiding this comment.
Why do we need interop between Geography and Geometry? I assume we have just have a class for each.
There was a problem hiding this comment.
We define Geography this way to reuse the data structures of coordinates and geospatial objects (Points, LineString, etc.), as well as WKB/WKT functions provided by JTS. Geography does not interoperate with Geometry, they are different classes but have the same data structure.
There was a problem hiding this comment.
Can we have a base class, and two implementing classes then? its confusing as it is.
There was a problem hiding this comment.
I'll try removing Geometry and Geography from iceberg-api, and use ByteBuffer as the underlying Java class for these types.
| return String.valueOf(value).startsWith((String) literal.value()); | ||
| case NOT_STARTS_WITH: | ||
| return !String.valueOf(value).startsWith((String) literal.value()); | ||
| case ST_INTERSECTS: |
There was a problem hiding this comment.
Lets split out predicate pruning support. I feel 80% of the changes is to support those, let's focus in this pr to get the API right.
There was a problem hiding this comment.
Predicates and pruning were removed from this PR.
896bc19 to
4d1bb83
Compare
build.gradle
Outdated
|
|
||
| dependencies { | ||
| implementation project(path: ':iceberg-bundled-guava', configuration: 'shadow') | ||
| api libs.jts.core |
There was a problem hiding this comment.
I am not comfortable leaking the JTS API from Iceberg API, as this forces all the consumer to now depend on JTS. Can we instead encapsulate this dep in iceberg-core? Going to check with other Iceberg PMC's on this.
At very least this should be implementation.
| if (primitive.typeId() == Type.TypeID.GEOMETRY) { | ||
| Types.GeometryType geometryType = (Types.GeometryType) primitive; | ||
| generator.writeStartObject(); | ||
| generator.writeStringField("type", "geometry"); |
There was a problem hiding this comment.
Nit: use the constant for "type"
There was a problem hiding this comment.
Replaced constant literals such as "type", "geometry", "geography", "crs" and "algorithm" with constants.
|
|
||
| static void toJson(Type.PrimitiveType primitive, JsonGenerator generator) throws IOException { | ||
| generator.writeString(primitive.toString()); | ||
| if (primitive.typeId() == Type.TypeID.GEOMETRY) { |
There was a problem hiding this comment.
Nit: use switch statement like rest of the code.
There was a problem hiding this comment.
Refactored to use switch-case.
| generator.writeStringField("type", "geometry"); | ||
| String crs = geometryType.crs(); | ||
| if (!crs.isEmpty()) { | ||
| generator.writeStringField("crs", geometryType.crs()); |
There was a problem hiding this comment.
Nit: "crs" seems used frequently enough to warrant a constant.
| return TYPES.get(lowerTypeString); | ||
| } | ||
|
|
||
| if (lowerTypeString.startsWith("geometry")) { |
There was a problem hiding this comment.
Why cant we put the startsWith in the regex, like DECIMAL and FIXED
There was a problem hiding this comment.
We cannot match the regex with lowerTypeString, because we want to extract the original CRS from the type instead of a lower case one. I'll see how to make it more concise and get rid of startsWith.
There was a problem hiding this comment.
So we consider CRS to be case sensitive?
The capture group of a case insensitive regular expression returns the matched characters without modification. So if CRS is case sensitive, you can still use a regular expression, like this:
Pattern pattern = Pattern.compile("geometry\\(([^)]*)\\)", Pattern.CASE_INSENSITIVE);This test will pass:
Matcher m = pattern.matcher("GEOMETRY(aBc)");
assertThat(m.matches()).isTrue();
assertThat(m.group(1)).isEqualTo("aBc");I'd prefer using a case insensitive pattern on typeString like this, rather than startsWith and substring with a hard-coded length.
There was a problem hiding this comment.
Switched to using case insensitive regex pattern matching.
| return new Literals.GeometryLiteral(value); | ||
| } | ||
|
|
||
| static Literal<Geography> of(Geography value) { |
There was a problem hiding this comment.
I guess here, let's specifically not leak the Geography in the API, we should just have a wrapper class as well.
There was a problem hiding this comment.
I'll remove GeometryLiteral ad GeographyLiteral. The iceberg expressions should only work with bounding boxes, so there's no need to involve full-fledged Geometry and Geography objects.
|
|
||
| private final Geometry geometry; | ||
|
|
||
| public Geography(Geometry geometry) { |
There was a problem hiding this comment.
Can we have a base class, and two implementing classes then? its confusing as it is.
| } | ||
|
|
||
| public static GeographyType of(String crs) { | ||
| return of(crs, DEFAULT_ALGORITHM); |
There was a problem hiding this comment.
another option is to allow null for algorithm right? and the code can default it , as per the spec?
There was a problem hiding this comment.
Fixed. The algorithm field is null when no algorithm is specified.
4d1bb83 to
73f6022
Compare
…ry and geography types to ByteBuffer
|
I have removed dependency to JTS from iceberg-api, now the underlying Java type of Geometry and Geography are I have removed code for converting between ByteBuffer and geospatial types, removed GeometryLiteral/GeographyLiteral, as well as default initial/write value support for geospatial fields. I'll proceed adding them once we are all agree on the API of primitive geospatial types. |
| class Identity<T> implements Transform<T, T> { | ||
| private static final Identity<?> INSTANCE = new Identity<>(); | ||
|
|
||
| private static final Set<Type.TypeID> UNSUPPORTED_TYPES = |
| } | ||
|
|
||
| public static GeometryType of(String crs) { | ||
| return new GeometryType(crs == null ? "" : crs); |
There was a problem hiding this comment.
(nit) Maybe crs == null can be crs == null || crs.isEmpty()
There was a problem hiding this comment.
I believed that including crs.isEmpty() was redundant in this situation, so I didn't add it, as we would return an empty string regardless.
There was a problem hiding this comment.
The empty string should not be allowed. That is an invalid CRS because it has no information. Null is fine to support here if you expect to have situations in which callers will find it convenient to pass null to mean "use the default".
| // We need to set outputDimension = 4 for XYM geometries to make JTS WKTWriter or WKBWriter work | ||
| // correctly. | ||
| // The WKB/WKT writers will ignore Z ordinate for XYM geometries. | ||
| if (!Double.isNaN(coordinate.getZ())) { |
There was a problem hiding this comment.
Perhaps we could do (not a must).
if (Coordinate.NULL_ORDINATE != coordinate.getZ())There was a problem hiding this comment.
I'm afraid that we cannot do this. Coordinate.NULL_ORDINATE is NaN, comparing it with other floating point number (including NaN) using != will always yield true.
| import org.locationtech.jts.geom.Geometry; | ||
| import org.locationtech.jts.geom.GeometryFactory; | ||
|
|
||
| public class TestGeometryUtil { |
There was a problem hiding this comment.
It seems like you covered Coordinate and its subclasses except ExtendedCoordinate, am I right?
There was a problem hiding this comment.
Yes. As far as I know, ExtendedCoordinate is part of JTS example code so we don't need to handle it. JTS will only be used internally in iceberg and it exchanges geospatial data with other systems through WKB/Bounds byte buffer, so there's no need to take non-standard extensions of Coordinate into consideration.
There was a problem hiding this comment.
Thanks for the explanation.
szehon-ho
left a comment
There was a problem hiding this comment.
Thanks @Kontinuation for removing the api dependency on jts from iceberg-api, this looks a lot closer to me
| } | ||
|
|
||
| public static EdgeInterpolationAlgorithm fromName(String algorithmName) { | ||
| try { |
There was a problem hiding this comment.
Precondition, check null and throw exception?
There was a problem hiding this comment.
Added precondition with error message: Invalid edge interpolation algorithm: null.
| return EdgeInterpolationAlgorithm.valueOf(algorithmName.toUpperCase(Locale.ENGLISH)); | ||
| } catch (IllegalArgumentException e) { | ||
| throw new IllegalArgumentException( | ||
| String.format("Invalid edge interpolation algorithm name: %s", algorithmName), e); |
There was a problem hiding this comment.
Nit: can remove redundant 'name'
There was a problem hiding this comment.
Fixed. Now the error message is Invalid edge interpolation algorithm: %s
| } | ||
|
|
||
| public static GeometryType get() { | ||
| return of(""); |
There was a problem hiding this comment.
why not , new GeometryType("")
There was a problem hiding this comment.
Fixed. Now we are using null as default value as requested. For now it is new GeometryType(null).
| } | ||
|
|
||
| public static GeographyType get() { | ||
| return of(""); |
There was a problem hiding this comment.
Fixed. Now it is new GeographyType(null, null).
| .isThrownBy(() -> required("field").ofType(Types.StringType.get()).build()) | ||
| .withMessage("Id cannot be null"); | ||
|
|
||
| assertThat(Types.fromPrimitiveString("geometry")).isEqualTo(Types.GeometryType.get()); |
There was a problem hiding this comment.
this should be in its own test
There was a problem hiding this comment.
In this suite, these cases would be added to fromPrimitiveString and fromTypeName.
There was a problem hiding this comment.
Moved tests to fromTypeName and fromPrimitiveString.
| } | ||
|
|
||
| private static Types.GeometryType geometryFromJson(JsonNode json) { | ||
| String crs = JsonUtil.getStringOrNull("crs", json); |
There was a problem hiding this comment.
nit: these (and following) can be replaced by the constants?
There was a problem hiding this comment.
This is reverted since we are not encoding geospatial types as JSON.
| private final String crs; | ||
|
|
||
| private GeometryType(String crs) { | ||
| this.crs = crs; |
There was a problem hiding this comment.
Precondition that crs is not null?
There was a problem hiding this comment.
I think it would be easier to allow null to indicate the default. That's easier to detect for toString to show geometry instead of geometry().
There was a problem hiding this comment.
We allow null and not allow empty string for CRS. Additionally we check that the CRS should not have leading or trailing spaces to make sure that the value of CRS will be persisted after converting to string and parsing it back.
There was a problem hiding this comment.
Curious, why not always have toString show the default value like Geography? Geometry("OGC:CRS84"). Is it to save some bytes ? Seems to be easier imo.
There was a problem hiding this comment.
Geography does not always show the default CRS. It happens when the edge algorithm is not using default value but CRS is left unset. We have to fill in the default CRS as the first argument in this case.
I thought that the behavior of toString is to omit optional parameters when possible. See https://github.com/apache/iceberg/pull/12346/files/0c451cac53f4098680aa0ae518af70dc45efaeb1#r1994430054.
| } | ||
|
|
||
| private static int getOutputDimension(Geometry geom) { | ||
| int dimension = 2; |
There was a problem hiding this comment.
nit: make a constant to avoid instantiating it every time?
There was a problem hiding this comment.
This file is removed as we don't need WKB/WKT serialization according to the latest spec.
|
Also, (as can't comment on files that are not in the change) Do we need to add Geo types to following places?
|
core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java
Outdated
Show resolved
Hide resolved
| case FIXED: | ||
| case BINARY: | ||
| case GEOMETRY: | ||
| case GEOGRAPHY: |
There was a problem hiding this comment.
Why is this included? I don't think that we want to translate these into Hive types.
There was a problem hiding this comment.
I've reverted this change. Does it mean that Hive catalog won't support geospatial columns?
There was a problem hiding this comment.
When this is needed, we can add it. But there's no value in translating to binary right now.
| Matcher geometry = GEOMETRY_PARAMETERS.matcher(typeString); | ||
| if (geometry.matches()) { | ||
| String crs = geometry.group(1); | ||
| Preconditions.checkArgument(!crs.contains(","), "Invalid CRS: %s", crs); |
There was a problem hiding this comment.
Why is this invalid? This should pass the string as-is. It is up to others to determine what is a valid CRS.
There was a problem hiding this comment.
Comma in CRS will disrupt the parsing of geography types such as geography(crs,with,comma, spherical), so I treat crs containing comma as invalid to avoid ambiguity.
There was a problem hiding this comment.
I'll remove this check and pass the CRS string as-is, since it won't cause ambiguity for geometry types.
| } | ||
|
|
||
| public String crs() { | ||
| return crs; |
There was a problem hiding this comment.
If CRS is null, should this return DEFAULT_CRS?
There was a problem hiding this comment.
I think it is better to return DEFAULT_CRS, so that the callers don't have to handle both nulls and default CRS.
We can make GeographyType.algorithm return EdgeAlgorithm.SPHERICAL when algorithm is null as well.
|
Looks great! There are some minor nits, but I think it makes sense to commit this to move forward. Thanks @Kontinuation and @szehon-ho! |
|
Thanks , great work @Kontinuation , and thanks @rdblue ! |
This adds 2 primitive types to
iceberg-apiandiceberg-corefor supporting geospatial data types, partially implementing the iceberg geo spec: #10981The newly added primitive types are:
geometry(C): geometries with linear/planar edge interpolationgeography(C, A): geometries with non-linear edge interpolation, the algorithm for interpolating edges is parameterized byA.