fix(spec): validate schema types by format version#2417
Conversation
|
@CTTY @liurenjie1024 please help take a look. |
|
Hi Manu, in terms of implementation, I think implementing |
ffd534c to
f5201b1
Compare
31327b2 to
13b6d45
Compare
|
@CTTY Please take another look. Thanks! |
13b6d45 to
cb2d9e8
Compare
cb2d9e8 to
7d855c2
Compare
|
I would recommend for future commits, please don't use force push in Iceberg repos. It removes the ability to do a diff on what's changed since last commit. My understanding is that Github will squash commits when merging into |
| } | ||
|
|
||
| #[test] | ||
| fn test_table_metadata_builder_rejects_v2_list_element_requiring_v3() { |
There was a problem hiding this comment.
Thank you for including, I was going to request a nested data type unit test.
| } | ||
|
|
||
| /// Validates that all schema types are supported by the table format version. | ||
| pub fn validate_compatible_with_format_version( |
There was a problem hiding this comment.
It would be nice to have a 4+ layer deep unit test and an integration test with 100+ columns and multiple deeply nested structs to see how this validations affects Table creation times.
There was a problem hiding this comment.
I don't see the point of adding such a test. Even if it costs time for deep nested structs, DDL is not a frequent action in practice, isn't it? Anyway, I don't think this should be part of unit tests.
| let primitive = field_type.as_primitive_type()?; | ||
| let min_format_version = primitive.min_format_version()?; | ||
| (format_version < min_format_version).then(|| { | ||
| let column_name = self.name_by_field_id(field.id).unwrap_or(field.name.as_str()); |
There was a problem hiding this comment.
Is it possible to cause this unwrap_or to throw an error? I believe that would cause a panic and a crash at runtime, but I can't figure out if its possible to pass a field.name that would cause the as_str() to panic.
There was a problem hiding this comment.
I don't see how it can panic. Could you please give an example?
| let min_format_version = primitive.min_format_version()?; | ||
| (format_version < min_format_version).then(|| { |
There was a problem hiding this comment.
Do you need to do a comparison here? I thought the point of implementing the match in primitive.min_format_version() was so it would do the comparison for you.
a7a9fce to
d6d8187
Compare
|
@Kurtiscwright Sorry if it has made your review harder. I've got used to running |
d6d8187 to
448dcb9
Compare
Reject schema types that require a newer table format and update the DataFusion catalog registration path to request v3 only when the converted schema needs it. This keeps ordinary CREATE TABLE defaults on v2 while allowing timestamp_ns schemas to pass validation. Co-authored-by: Codex <codex@openai.com>
Move the table-format comparison behind PrimitiveType so schema validation asks the primitive for the required format version only when the requested format does not support it. Co-authored-by: Codex <codex@openai.com>
448dcb9 to
d7f11a3
Compare
Which issue does this PR close?
timestamp_ns/timestamptz_nstypes #2418.What changes are included in this PR?
Reject schema types that require a newer table format (e.g. v3). This keeps ordinary CREATE TABLE defaults on v2 while allowing timestamp_ns schemas to pass validation.
Are these changes tested?
Add UTs
Co-authored-by: @codex