[AURON #1985] Optimize native metrics retrieval by passing keys directly#1982
[AURON #1985] Optimize native metrics retrieval by passing keys directly#1982lsm1 wants to merge 2 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR optimizes native SQL metric registration by letting operators request only the specific native metric keys they need, instead of always creating the full default metric set and filtering afterward.
Changes:
- Refactors
NativeHelper.getDefaultNativeMetricsto accept an explicitSet[String]of metric keys and create only those metrics. - Updates native Spark plan operators to pass their required metric keys directly to
getDefaultNativeMetrics. - Updates native file scan metrics construction to use the new keyed API.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeWindowBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeUnionBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeTakeOrderedBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeSortMergeJoinBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeSortBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeShuffledHashJoinBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeProjectBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeParquetInsertIntoHiveTableBase.scala | Passes explicit metric key set to getDefaultNativeMetrics for native write path metrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeLocalLimitBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeGlobalLimitBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeGenerateBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeFilterBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeExpandBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeBroadcastJoinBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeBroadcastExchangeBase.scala | Switches from “all defaults” to an explicit full key set when registering native metrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeAggBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/ConvertToNativeBase.scala | Passes explicit metric key set to getDefaultNativeMetrics. |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeHelper.scala | Introduces keyed metric creation via defaultNativeMetricCreators; updates file scan metric assembly. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeShuffleExchangeExec.scala | Passes explicit spill/shuffle metric key set to getDefaultNativeMetrics for shuffle write metrics. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeHelper.scala
Outdated
Show resolved
Hide resolved
spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeHelper.scala
Show resolved
Hide resolved
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
892a9d1 to
8825a55
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| override lazy val metrics: Map[String, SQLMetric] = SortedMap[String, SQLMetric]() ++ Map( | ||
| NativeHelper | ||
| .getDefaultNativeMetrics(sparkContext) | ||
| .filterKeys(Set("stage_id", "output_rows", "elapsed_compute")) | ||
| .getDefaultNativeMetrics(sparkContext, Set("stage_id", "output_rows", "elapsed_compute")) | ||
| .toSeq: _*) |
There was a problem hiding this comment.
getDefaultNativeMetrics now returns a Map directly, but this call site still does .toSeq and rebuilds another Map just to merge into a SortedMap. You can reduce allocations (and keep deterministic ordering) by merging the returned map directly (e.g., SortedMap.empty ++ NativeHelper.getDefaultNativeMetrics(...)).
| def getNativeFileScanMetrics(sc: SparkContext): Map[String, SQLMetric] = TreeMap( | ||
| getDefaultNativeMetrics(sc) | ||
| .filterKeys(Set("stage_id", "output_rows", "elapsed_compute")) | ||
| .toSeq ++ getDefaultNativeFileMetrics(sc).toSeq: _*) | ||
| getDefaultNativeMetrics( | ||
| sc, | ||
| Set("stage_id", "output_rows", "elapsed_compute")).toSeq ++ getDefaultNativeFileMetrics( | ||
| sc).toSeq: _*) |
There was a problem hiding this comment.
getNativeFileScanMetrics still materializes getDefaultNativeMetrics(...).toSeq and reconstructs a TreeMap via varargs. Since getDefaultNativeMetrics now returns a map directly, you can avoid the intermediate sequence/varargs by combining maps with ++ (e.g., getDefaultNativeMetrics(...) ++ getDefaultNativeFileMetrics(sc)), preserving sorted output while being a bit lighter.
Which issue does this PR close?
Closes #1985
Rationale for this change
The previous native metrics retrieval fetched all default metrics for every operator. This changes allows specific operators to request only the relevant metric keys, reducing unnecessary overhead.
What changes are included in this PR?
Updated
getDefaultNativeMetricsto accept aSet[String]of metric keys instead of returning a fixed map.Are there any user-facing changes?
No
How was this patch tested?