Conversation
|
|
||
| if (invalidFields.size() > 0) { | ||
| throw new ValidationException("Invalid Schema, please check fields: " + | ||
| Joiner.on(",").join(invalidFields.iterator())); |
There was a problem hiding this comment.
Nit: missing a space in the separator after ,.
|
I think this is actually easier to implement than the version here. Instead of running a new validator class, I think you can add this validation to the existing private void addField(String name, int fieldId) {
String fullName = name;
if (!fieldNames().isEmpty()) {
fullName = DOT.join(DOT.join(fieldNames().descendingIterator()), name);
}
Integer existingFieldId = nameToId.put(fullName, fieldId);
if (existingFieldId != null) {
throw new ValidationException(
"Invalid schema: multiple fields for name %s: %s and %s", fullName, existingFieldId, fieldId);
}
}Then all you'd need to do is to access |
|
@rdblue, Good idea. Let me update it |
|
@rdblue, sorry I forgot that this visitor is different from For example, when we define a mixed typed schema such as: Schema schema = new Schema(
required(1, "list_of_lists",
ListType.ofOptional(2, ListType.ofOptional(3, StructType.of(
required(4, "id", LongType.get()))
))));It throws exception like: The map returned by |
|
@chenjunjiedada, ignoring the problem when the name is "element" seems to solve that case: Integer existingFieldId = nameToId.put(fullName, fieldId);
if (existingFieldId != null && !"element".equals(name)) {
throw new ValidationException(
"Invalid schema: multiple fields for name %s: %s and %s", fullName, existingFieldId, fieldId);
}I think that's a reasonable fix for now. I think we should also go back and fix how these indexes are produced. The field names are only used in |
|
@chenjunjiedada, I submitted a fix for the indexing problem with the |
|
Sure, will take a look. |
68ad613 to
e9b963b
Compare
| this.aliasToId = aliases != null ? ImmutableBiMap.copyOf(aliases) : null; | ||
|
|
||
| // validate the schema through IndexByName visitor | ||
| TypeUtil.indexByName(struct); |
There was a problem hiding this comment.
As long as indexByName is being called, we may as well capture the result and use it for the name to ID map. Can you update these constructors to call lazyNameToId instead?
There was a problem hiding this comment.
Since lazyNameToId has one more check for uniqueness for value, some unit tests failed due to they don't have unique field ID such as testMixedTypes. And I think the lazy initialization is not lazy if we put it in a constructor since NameToId is always null when constructing. So can we just call IndexByName in constructor? It would be an easy way.
There was a problem hiding this comment.
Can we fix those tests by reassigning new IDs to types? I think that would be a better solution.
There was a problem hiding this comment.
Do you mean update scheme definition in unit tests by reassigning new IDs? For example, there is a constant field SUPPORTED_PRIMITIVES in testMixedTypes that is used by other nested fields, do we need to remove the constant and use raw definition with new IDs? Or do you mean to use reassignIds from TypeUtils?
There was a problem hiding this comment.
No, I mean to use TypeUtil.assignFreshIds.
82633f0 to
e9b963b
Compare
| ))) | ||
| ); | ||
|
|
||
| Schema schema = new Schema(TypeUtil.assignFreshIds(structType, this::assignNewId).asStructType().fields()); |
There was a problem hiding this comment.
You should be able to use (new AtomicInteger(0))::incrementAndGet instead. There is no need to create global state in the test cases.
|
@rdblue , ready for merge? |
…tadata table (apache#4720) (apache#619) Co-authored-by: Prashant Singh <[email protected]>
This fixes #598.