fix: support Java HadoopCatalog's v<n>.metadata.json naming #2537
Open
SreeramGarlapati wants to merge 1 commit into
…taLocation

Java's HadoopCatalog writes metadata files as v1.metadata.json, v2.metadata.json etc., while iceberg-rust expects <version>-<uuid>.metadata.json. This causes MetadataLocation::from_str to fail when reading tables originally created by Java's HadoopCatalog that were later registered into a proper catalog (REST, Glue, DynamoDB).

The fix extends parse_file_name to try the standard format first, then fall back to the v<n> format. The id field becomes Option<Uuid> (None for Hadoop format). Display faithfully reproduces the original format, but after with_next_version() the output always uses iceberg-rust's <version>-<uuid> convention since it generates a new UUID.

Closes apache#2533

Co-authored-by: rawataaryan9 <rawataaryan9@users.noreply.github.com>
Summary
MetadataLocation::from_str fails when parsing metadata file paths produced by Java's HadoopCatalog (v1.metadata.json, v2.metadata.json, etc.) because it only accepts the <version>-<uuid>.metadata.json format.

Scope: what this enables, what it does not

Supported workflow (the bug this fixes): A table is created by Java's HadoopCatalog and then re-registered into a pointer-based catalog (REST, Glue, DynamoDB). iceberg-rust reads metadata via the catalog's current pointer, and on the next commit calls parse_file_name(...) against the existing v<n>.metadata.json path. Before this fix that call fails. After this fix it succeeds, and the next metadata file written by iceberg-rust uses iceberg-rust's <version>-<uuid> convention (the catalog updates its pointer; filename format is irrelevant for pointer-based lookup).

Not supported (explicit non-goal): Committing from iceberg-rust directly against a HadoopCatalog-rooted table. Java's HadoopTableOperations.findVersion() discovers the current metadata by filesystem scan for v<n>.metadata.json. After iceberg-rust writes <version>-<uuid>.metadata.json and updates version-hint.text, a subsequent Java HadoopCatalog reader will not recognize the filename pattern. This PR is read + re-registration interop, not a full HadoopCatalog implementation.

Changes

- parse_file_name now falls back to the v<n> format when the standard <version>-<uuid> format doesn't match
- The id field becomes Option<Uuid> (None for Hadoop-style locations; the field is private, so no public API break)
- Display faithfully reproduces the original format for round-trip correctness
- After with_next_version(), output always uses iceberg-rust's <version>-<uuid> convention (it generates a new UUID)

Why this is safe

- previous_metadata_location in table metadata uses the raw string from table.metadata_location(), not Display of the parsed struct, so the original path is always preserved
- No code path calls Display on a parsed MetadataLocation without first calling with_next_version() (which always produces a UUID)
- Pointer-based catalogs record the new <version>-<uuid> path; filename format is irrelevant for pointer-based discovery

Closes #2533
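For reviewers who want the fallback order spelled out, here is a minimal std-only sketch of the idea, not the code in this PR. ParsedName, looks_like_uuid and parse_digits are hypothetical names, the id is kept as a String instead of a Uuid to avoid a crate dependency, and the codec handling (dropping everything after the first dot in the stem) is a simplifying assumption.

```rust
/// Hypothetical stand-in for the pieces extracted from a metadata file name.
#[derive(Debug, PartialEq)]
struct ParsedName {
    version: i32,
    /// `None` for Hadoop-style `v<n>` names. Real code would hold a `Uuid`.
    id: Option<String>,
}

/// Loose UUID shape check: 36 chars, hyphens at 8/13/18/23, hex digits elsewhere.
fn looks_like_uuid(s: &str) -> bool {
    s.len() == 36
        && s.chars().enumerate().all(|(i, c)| match i {
            8 | 13 | 18 | 23 => c == '-',
            _ => c.is_ascii_hexdigit(),
        })
}

/// Parse a non-empty run of ASCII digits (rejects signs such as "+5").
fn parse_digits(s: &str) -> Option<i32> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

/// Try `<version>-<uuid>` first, then fall back to `v<n>`.
fn parse_file_name(file_name: &str) -> Option<ParsedName> {
    let stem = file_name.strip_suffix(".metadata.json")?;
    // Assumption: an optional codec such as ".gz" follows the first dot.
    let stem = stem.split_once('.').map_or(stem, |(head, _codec)| head);

    // 1. Standard iceberg-rust format: <version>-<uuid>
    if let Some((version, id)) = stem.split_once('-') {
        if let Some(version) = parse_digits(version) {
            if looks_like_uuid(id) {
                return Some(ParsedName { version, id: Some(id.to_string()) });
            }
        }
    }

    // 2. Fallback, Java HadoopCatalog format: v<n>
    let version = parse_digits(stem.strip_prefix('v')?)?;
    Some(ParsedName { version, id: None })
}
```

Trying the standard format first means existing iceberg-rust paths take the same branch as before, and only names that fail it reach the v<n> branch.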
Test plan
- Parsing v1.metadata.json, v123.metadata.json, v5.gz.metadata.json: verified in new test cases
- Round trip: from_str(s).to_string() == s for both formats
- with_next_version() produces <version>-<uuid> format
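The Display and with_next_version() expectations above can be illustrated with a small std-only model. This is a sketch under assumptions, not the PR's implementation: MetadataLocationSketch is a hypothetical type, the `<table_location>/metadata/` prefix and five-digit zero padding are assumed path conventions, the fresh id is passed in by the caller (real code would generate a UUID), and compression suffixes are left out.

```rust
use std::fmt;

/// Hypothetical model of a parsed metadata location.
struct MetadataLocationSketch {
    table_location: String,
    version: i32,
    /// `None` when the location was parsed from a Hadoop-style `v<n>` name.
    id: Option<String>,
}

impl MetadataLocationSketch {
    /// Bump the version and always attach an id, so the result never
    /// renders in the Hadoop `v<n>` style.
    /// Assumption: `new_id` stands in for a freshly generated UUID.
    fn with_next_version(&self, new_id: &str) -> Self {
        Self {
            table_location: self.table_location.clone(),
            version: self.version + 1,
            id: Some(new_id.to_string()),
        }
    }
}

impl fmt::Display for MetadataLocationSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.id {
            // Standard format: zero-padded version plus id.
            Some(id) => write!(
                f,
                "{}/metadata/{:05}-{}.metadata.json",
                self.table_location, self.version, id
            ),
            // Hadoop format: reproduce the original v<n> name.
            None => write!(
                f,
                "{}/metadata/v{}.metadata.json",
                self.table_location, self.version
            ),
        }
    }
}
```

Branching on the optional id keeps a parsed Hadoop path printable in its original form, while any location derived through with_next_version() can only take the standard branch.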