
fix: support Java HadoopCatalog's v<n>.metadata.json naming #2537

Open
SreeramGarlapati wants to merge 1 commit into apache:main from SreeramGarlapati:fix/metadata-location-java-hadoop-format

SreeramGarlapati commented May 30, 2026

Summary

MetadataLocation::from_str fails when parsing metadata file paths produced by Java's HadoopCatalog (v1.metadata.json, v2.metadata.json, etc.) because it accepts only the <version>-<uuid>.metadata.json format.

Scope: what this enables, what it does not

Supported workflow (the bug this fixes): A table is created by Java's HadoopCatalog and then re-registered into a pointer-based catalog (REST, Glue, DynamoDB). iceberg-rust reads metadata via the catalog's current pointer, and on the next commit calls parse_file_name(...) against the existing v<n>.metadata.json path. Before this fix that call fails; after this fix it succeeds, and the next metadata file written by iceberg-rust uses iceberg-rust's <version>-<uuid> convention (the catalog updates its pointer; filename format is irrelevant for pointer-based lookup).
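The point that filename format is irrelevant for pointer-based lookup can be illustrated with a toy catalog. PointerCatalog below is hypothetical: an in-memory stand-in for REST, Glue, or DynamoDB that stores one opaque location string per table and swaps it only if the caller's expected pointer is still current. Real catalogs differ in how they implement that check; the sketch only shows that neither lookup nor commit ever inspects the file name.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory model of a pointer-based catalog. It stores one
/// opaque metadata-location string per table and never parses the file name.
#[derive(Default)]
pub struct PointerCatalog {
    pointers: HashMap<String, String>,
}

impl PointerCatalog {
    /// Re-register an existing table, e.g. one created by Java's HadoopCatalog.
    pub fn register(&mut self, table: &str, metadata_location: &str) {
        self.pointers
            .insert(table.to_string(), metadata_location.to_string());
    }

    /// Readers resolve the current metadata purely by following the pointer.
    pub fn current(&self, table: &str) -> Option<&str> {
        self.pointers.get(table).map(String::as_str)
    }

    /// Swap the pointer only if it still equals `expected`, so a commit
    /// based on stale metadata is rejected (assumed optimistic check).
    pub fn commit(&mut self, table: &str, expected: &str, new_location: &str) -> Result<(), String> {
        match self.pointers.get_mut(table) {
            Some(cur) if cur.as_str() == expected => {
                *cur = new_location.to_string();
                Ok(())
            }
            Some(cur) => Err(format!("stale commit: pointer is {cur}")),
            None => Err(format!("unknown table {table}")),
        }
    }
}
```

Registering a table at .../v2.metadata.json and then committing .../00003-<uuid>.metadata.json moves the pointer without either name ever being parsed by the catalog.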

Not supported (explicit non-goal): Committing from iceberg-rust directly against a HadoopCatalog-rooted table. Java's HadoopTableOperations.findVersion() discovers the current metadata by scanning the filesystem for v<n>.metadata.json files. If iceberg-rust writes <version>-<uuid>.metadata.json and updates version-hint.text, a subsequent Java HadoopCatalog reader will not recognize the new filename pattern. This PR provides read and re-registration interop, not a full HadoopCatalog implementation.
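The naming mismatch behind this non-goal can be modeled in a few lines. The function below is a deliberately simplified, hypothetical reader: latest_hadoop_version is not Java's HadoopTableOperations code, and it ignores version-hint.text and compressed names. It recognizes only v<n>.metadata.json and returns the highest n, so a <version>-<uuid> file written by iceberg-rust is never seen.

```rust
/// Hypothetical, simplified model of a HadoopCatalog-style reader: only
/// `v<n>.metadata.json` names count, and the highest `n` wins.
/// Not Java's actual implementation; version-hint.text and `.gz` are ignored.
fn latest_hadoop_version(file_names: &[&str]) -> Option<u64> {
    file_names
        .iter()
        .filter_map(|name| {
            let n = name.strip_prefix('v')?.strip_suffix(".metadata.json")?;
            // Require a plain run of ASCII digits between `v` and the suffix.
            if !n.is_empty() && n.bytes().all(|b| b.is_ascii_digit()) {
                n.parse::<u64>().ok()
            } else {
                None
            }
        })
        .max()
}
```

Given v1.metadata.json, v2.metadata.json and a newer 00003-<uuid>.metadata.json, this model still reports version 2, which is why committing through HadoopCatalog itself is out of scope.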

Changes

  • Extended parse_file_name to fall back to the v<n> format when the standard <version>-<uuid> format doesn't match
  • Made the id field Option<Uuid> (None for Hadoop-style locations; field is private so no public API break)
  • Display faithfully reproduces the original format for round-trip correctness
  • After with_next_version(), output always uses iceberg-rust's <version>-<uuid> convention (generates a new UUID)
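To make the fallback order concrete, here is a minimal, dependency-free Rust sketch of the parsing and formatting rules listed above. It is not the code in this PR: MetadataLocationSketch, parse_digits, looks_like_uuid, and the new_id parameter of with_next_version are hypothetical. The UUID is kept as a validated String instead of uuid::Uuid, only a .gz infix is handled as compression, and five-digit zero padding for <version> is an assumption.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-in for iceberg-rust's MetadataLocation (not its real API).
#[derive(Debug, Clone, PartialEq)]
pub struct MetadataLocationSketch {
    table_location: String,
    version: u64,
    id: Option<String>, // None => Java HadoopCatalog `v<n>` naming
    gz: bool,           // `.gz` infix before `.metadata.json`
}

/// Accept only a non-empty run of ASCII digits (rejects "+1", "x", "").
fn parse_digits(s: &str) -> Option<u64> {
    if !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit()) { s.parse().ok() } else { None }
}

/// 8-4-4-4-12 hex groups separated by hyphens.
fn looks_like_uuid(s: &str) -> bool {
    s.len() == 36
        && s.char_indices().all(|(i, c)| match i {
            8 | 13 | 18 | 23 => c == '-',
            _ => c.is_ascii_hexdigit(),
        })
}

/// Try `<version>-<uuid>` first, then fall back to `v<n>`.
fn parse_file_name(stem: &str) -> Option<(u64, Option<String>)> {
    if let Some((v, id)) = stem.split_once('-') {
        if looks_like_uuid(id) {
            return parse_digits(v).map(|v| (v, Some(id.to_string())));
        }
    }
    parse_digits(stem.strip_prefix('v')?).map(|v| (v, None))
}

impl FromStr for MetadataLocationSketch {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let (dir, file) = s.rsplit_once('/').ok_or("missing '/'")?;
        let table_location = dir.strip_suffix("/metadata").ok_or("not under metadata/")?;
        let stem = file.strip_suffix(".metadata.json").ok_or("missing .metadata.json")?;
        let (stem, gz) = match stem.strip_suffix(".gz") {
            Some(rest) => (rest, true),
            None => (stem, false),
        };
        let (version, id) =
            parse_file_name(stem).ok_or_else(|| format!("unrecognized file name: {file}"))?;
        Ok(Self { table_location: table_location.to_string(), version, id, gz })
    }
}

impl fmt::Display for MetadataLocationSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let gz = if self.gz { ".gz" } else { "" };
        match &self.id {
            // Reproduce whichever format was parsed, for round-trip correctness.
            Some(id) => write!(f, "{}/metadata/{:05}-{}{}.metadata.json", self.table_location, self.version, id, gz),
            None => write!(f, "{}/metadata/v{}{}.metadata.json", self.table_location, self.version, gz),
        }
    }
}

impl MetadataLocationSketch {
    /// The real method generates a fresh UUID; here the caller passes one in
    /// (hypothetical parameter) so the result is deterministic.
    pub fn with_next_version(&self, new_id: &str) -> Self {
        Self { version: self.version + 1, id: Some(new_id.to_string()), ..self.clone() }
    }
}
```

For example, parsing s3://bucket/tbl/metadata/v1.metadata.json and printing it reproduces the v1 path, while calling with_next_version on it yields a 00002-<uuid>.metadata.json location, mirroring the round-trip and version-bump items in the test plan.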

Why this is safe

  • previous_metadata_location in table metadata uses the raw string from table.metadata_location(), not Display of the parsed struct — the original path is always preserved
  • No code path calls Display on a parsed MetadataLocation without first calling with_next_version() (which always produces a UUID)
  • Once iceberg-rust commits via a pointer-based catalog, that catalog stores the new <version>-<uuid> path; filename format is irrelevant for pointer-based discovery

Closes #2533

Test plan

  • Parse v1.metadata.json, v123.metadata.json, v5.gz.metadata.json — verified in new test cases
  • Round-trip: from_str(s).to_string() == s for both formats
  • Version bump: parse Hadoop format → with_next_version() → produces <version>-<uuid> format
  • All 1300+ existing tests pass unchanged
  • Full workspace compiles cleanly

…taLocation

Java's HadoopCatalog writes metadata files as v1.metadata.json, v2.metadata.json
etc., while iceberg-rust expects <version>-<uuid>.metadata.json. This causes
MetadataLocation::from_str to fail when reading tables originally created by
Java's HadoopCatalog that were later registered into a proper catalog (REST,
Glue, DynamoDB).

The fix extends parse_file_name to try the standard format first, then fall back
to the v<n> format. The id field becomes Option<Uuid> (None for Hadoop format).
Display faithfully reproduces the original format, but after with_next_version()
the output always uses iceberg-rust's <version>-<uuid> convention since it
generates a new UUID.

Closes apache#2533

Co-authored-by: rawataaryan9 <rawataaryan9@users.noreply.github.com>
SreeramGarlapati force-pushed the fix/metadata-location-java-hadoop-format branch from a9ad50e to 93ed812 on May 30, 2026 04:34

Development

Successfully merging this pull request may close these issues.

MetadataLocation parser rejects Java HadoopCatalog's v<n>.metadata.json naming
