GH-838: [FlightSQL][JDBC] Fix timestamp unit conversion and sub-millisecond precision in PreparedStatement parameter binding#1081
Conversation
This comment has been minimized.
This comment has been minimized.
…aredStatement parameter binding
c43b2e8 to
6d63367
Compare
…ecision in PreparedStatement parameter binding Intercept raw java.sql.Timestamp objects at the JDBC layer before Avatica serializes them to epoch milliseconds (losing sub-ms nanos). Propagate the original Timestamp through ArrowFlightMetaImpl and AvaticaParameterBinder to TimestampAvaticaParameterConverter, which reconstructs full-precision epoch values from Timestamp.getTime() + Timestamp.getNanos(). Add unit tests for micro/nano/milli/second precision with raw Timestamp, and integration tests exercising the full setTimestamp() path through the mock FlightSQL server.
jbonofre
left a comment
There was a problem hiding this comment.
Good one ! Thanks @dmitry-chirkov-dremio !
I just have a suggestion for the convertFromMillis method.
...ain/java/org/apache/arrow/driver/jdbc/converter/impl/TimestampAvaticaParameterConverter.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
overall LGTM. I'd probably just add a few tests for edge cases. Examples:
- Change the Timestamp object after setting the parameter. If we're keeping a reference to the object, is it possible to change it after the bind?
- Setting timestamp with null on a string with multiple parameters
- Test setObject
- Use setObject for a timestamp parameter, before/after setTimestamp
|
I think there's still a risk of having stale parameters. The final query should be The issue is mainly with the approach taken. Having a map of index to timestamp means that there are other scenarios where the index can be updated but the map isn't unless we override all the setters functions to keep it up-to-date. I'm not sure what is the best approach here. |
c65e38c to
25b1ec5
Compare
|
Putting this back to draft as @xborder and I are looking into synchronization/staleness issues introduced by this approach. |
Rationale for this change
The FlightSQL JDBC driver's
PreparedStatement.setTimestamp()has two bugs (#838):Unit scaling bug:
TimestampAvaticaParameterConverterwrites raw epoch millisecond values directly into Arrow timestamp vectors regardless of the vector's time unit. This causes data corruption when the server uses SECOND, MICROSECOND, or NANOSECOND precision — the millisecond value is misinterpreted as the target unit (e.g., epoch millis treated as epoch micros → dates near 1970).Precision loss bug: The underlying Avatica library serializes
java.sql.Timestampinto along(epoch milliseconds) viaTypedValue.ofJdbc(), discarding the sub-millisecond nanoseconds fromTimestamp.getNanos()before they reach the Arrow converters. This means even with correct unit scaling, microsecond and nanosecond precision is permanently lost.What changes are included in this PR?
Fix 1 — Unit scaling (converter layer):
TimestampAvaticaParameterConverterto store theArrowType.Timestampand scale the epoch millisecond value to the target unit before writing to the vector:Math.floorDivFix 2 — Sub-millisecond precision preservation (JDBC interception layer):
ArrowFlightPreparedStatement: OverridessetTimestamp()to capture the rawjava.sql.Timestampobjects (with full nanosecond precision) before Avatica serializes them to millis. Also overridesclearParameters()to clean up.ArrowFlightMetaImpl: AddedgetRawTimestamps()to bridge the captured timestamps from the statement to the parameter binder.AvaticaParameterBinder: Updatedbind()andBinderVisitorto propagate the optional rawTimestampto converters.TimestampAvaticaParameterConverter: AddedbindParameter(vector, typedValue, index, rawTimestamp)overload andconvertFromTimestamp(Timestamp)that reconstructs full-precision epoch values fromTimestamp.getTime()(epoch seconds) +Timestamp.getNanos()(fractional second in nanos), avoiding the Avatica millis truncation.ListAvaticaParameterConverter,LargeListAvaticaParameterConverter,FixedSizeListAvaticaParameterConverter: Updated to passnullas the raw timestamp argument to the newBinderVisitorconstructor.Fix 3 — Stale raw timestamp handling (overwrite edge cases):
ArrowFlightPreparedStatement: Keeps the side-channelrawTimestampsmap in sync forsetObject(...)overwrites by clearing the affected entry.TimestampAvaticaParameterConverter: Only uses arawTimestampwhen the incomingTypedValue.typeis stillJAVA_SQL_TIMESTAMP, so stale side-channel entries are ignored if a later setter such assetLong(...)overwrites the parameter.Are these changes tested?
Yes.
TimestampAvaticaParameterConverterTest): cover all Arrow timestamp vector units (with and without timezone) for millis-based scaling, the rawTimestamppath for micro/nano precision preservation and milli/second truncation, null-rawTimestampfallback, negative epoch handling, and the stale-map case where a later non-timestamp setter must ignore a stale raw timestamp.ArrowFlightPreparedStatementTest): cover the fullsetTimestamp()→ interception → binding → mock server validation path, plus overwrite scenarios includingsetObject(...)aftersetTimestamp(...),setLong(...)aftersetTimestamp(...), andsetTimestamp(...)aftersetObject(...).Are there any user-facing changes?
This PR includes breaking changes to public APIs.
PreparedStatement#setTimestamp()now correctly converts timestamp values for non-millisecond precision columns. Previously, timestamps inserted via the FlightSQL JDBC driver into SECOND, MICROSECOND, or NANOSECOND columns were silently corrupted. Sub-millisecond precision (microseconds, nanoseconds) is now preserved when the target column supports it.This PR contains a "Critical Fix". The bug caused incorrect data to be produced — epoch milliseconds were written as-is into microsecond/nanosecond vectors, resulting in wildly wrong dates (e.g., 2024 → 1970). Additionally, sub-millisecond precision was silently discarded.
Closes #838