Description
Avro allows attaching custom properties to any type, including primitives.
In avro-rs, primitive schemas such as Schema::Long, Schema::Int, etc. are modeled as bare enum variants with no field in which to store custom properties, and Schema::custom_attributes() returns None for them.
As a result, parsing an object-form primitive schema silently drops all extra properties, and serializing the parsed schema emits just "long", so the metadata cannot be inspected or round-tripped.
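One way to picture a fix is the minimal sketch below. It assumes a hypothetical, stripped-down schema enum (not the apache_avro API) in which each primitive variant carries an attribute map, so custom_attributes() can return Some for primitives too. The enum, its variants, and the long_with_custom_prop helper are illustrative only; a real change would more likely store serde_json::Value values than plain strings.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified schema model (NOT the apache_avro API): each
// primitive variant carries a map of custom attributes instead of being a
// bare unit variant like Schema::Long.
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Int { attributes: BTreeMap<String, String> },
    Long { attributes: BTreeMap<String, String> },
    Double { attributes: BTreeMap<String, String> },
}

impl Schema {
    // In this sketch, primitives return Some(...) instead of None.
    fn custom_attributes(&self) -> Option<&BTreeMap<String, String>> {
        match self {
            Schema::Int { attributes }
            | Schema::Long { attributes }
            | Schema::Double { attributes } => Some(attributes),
        }
    }
}

// Hypothetical helper: the object-form schema from the reproducer,
// {"type": "long", "custom-prop": "value"}, with its property kept.
fn long_with_custom_prop() -> Schema {
    let mut attributes = BTreeMap::new();
    attributes.insert("custom-prop".to_string(), "value".to_string());
    Schema::Long { attributes }
}

fn main() {
    let schema = long_with_custom_prop();
    let attrs = schema.custom_attributes().expect("primitives keep attributes");
    // Prints: Some("value")
    println!("{:?}", attrs.get("custom-prop"));
}
```

The trade-off is that every match on a primitive variant must now use `{ .. }` patterns, which is a breaking change for downstream code matching on bare variants.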
Reproducer:

    use apache_avro::Schema;
    use apache_avro_test_helper::TestResult;
    use serde_json::json;

    #[test]
    fn test1() -> TestResult {
        let input = json!({
            "type": "long",
            "custom-prop": "value"
        });
        let schema = Schema::parse(&input)?;
        assert!(matches!(schema, Schema::Long));

        let serialized = serde_json::to_string(&schema)?;
        assert!(
            serialized.contains("custom-prop"),
            "Expected serialized schema to include custom property key, but it was dropped. Serialized: {serialized}. Parsed schema: {schema:?}"
        );
        Ok(())
    }

Output:

    Expected serialized schema to include custom property key, but it was dropped. Serialized: "long". Parsed schema: Long
Iceberg encodes types such as timestamptz in Avro as a primitive long annotated with extra properties, e.g. {"type":"long","logicalType":"timestamp-micros","adjust-to-utc":true}, or a similar form depending on precision. The adjust-to-utc flag is what distinguishes timestamptz from timestamp.
Because apache-avro cannot represent custom properties on primitive types, the adjust-to-utc flag is dropped during parsing and the original schema semantics cannot be preserved.
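The sketch below illustrates why keeping the properties matters for round-tripping. It assumes a hypothetical PrimitiveSchema type and AttrValue enum (neither is part of apache-avro): when a primitive has no extra properties, the short form "long" is lossless, but the Iceberg timestamptz schema above must be written back in object form to keep adjust-to-utc. The JSON writer is a deliberately tiny hand-rolled one that only escapes quotes and backslashes; a real implementation would go through serde.

```rust
use std::collections::BTreeMap;

// Hypothetical attribute value type; a real implementation would more
// likely store serde_json::Value.
#[derive(Debug, Clone, PartialEq)]
enum AttrValue {
    Str(String),
    Bool(bool),
}

// Hypothetical primitive schema that keeps its logical type and any
// custom attributes (names are illustrative, not the apache_avro API).
#[derive(Debug, Clone, PartialEq)]
struct PrimitiveSchema {
    type_name: &'static str,
    logical_type: Option<String>,
    attributes: BTreeMap<String, AttrValue>,
}

// Minimal JSON string quoting: escapes backslashes and double quotes only.
fn quote(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

impl PrimitiveSchema {
    fn to_json(&self) -> String {
        if self.logical_type.is_none() && self.attributes.is_empty() {
            // No extra properties: the short form is lossless.
            return quote(self.type_name);
        }
        let mut fields = vec![format!("\"type\":{}", quote(self.type_name))];
        if let Some(lt) = &self.logical_type {
            fields.push(format!("\"logicalType\":{}", quote(lt)));
        }
        // BTreeMap iterates in key order, so the output is deterministic.
        for (key, value) in &self.attributes {
            let v = match value {
                AttrValue::Str(s) => quote(s),
                AttrValue::Bool(b) => b.to_string(),
            };
            fields.push(format!("{}:{}", quote(key), v));
        }
        format!("{{{}}}", fields.join(","))
    }
}

// The Iceberg timestamptz schema from the issue, with adjust-to-utc kept.
fn iceberg_timestamptz() -> PrimitiveSchema {
    let mut attributes = BTreeMap::new();
    attributes.insert("adjust-to-utc".to_string(), AttrValue::Bool(true));
    PrimitiveSchema {
        type_name: "long",
        logical_type: Some("timestamp-micros".to_string()),
        attributes,
    }
}

fn main() {
    println!("{}", iceberg_timestamptz().to_json());
}
```

Running main prints {"type":"long","logicalType":"timestamp-micros","adjust-to-utc":true}, i.e. the schema survives the round trip instead of collapsing to "long".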