ENH: Covariance CPU SPMD support #3507
Conversation
/intelci: run
```cpp
// Simple allreduce of centered crossproducts is incorrect because each
// rank uses its local mean. Un-center before allreduce, then re-center
// with global statistics after.
if (!desc.get_assume_centered()) {
```
Would it be simpler to use DAAL's Distributed<step2Master> algorithm for the aggregation?
https://github.com/uxlfoundation/oneDAL/blob/main/samples/daal/cpp/mpi/sources/covariance_dense_distributed_mpi.cpp#L106
Interesting suggestion, will look into this
/intelci: run
Fixes the ABI breakage introduced by the CPU SPMD algorithms in PRs #3507 and #3585. Makes the binding type of the destructors (~compute_input(), ~partial_compute_input(), ~train_input(), etc.) stably "WEAK" and independent of the compiler's logic by adding their non-default implementations. Also fixes the `set_responses` method in KNN. (cherry picked from commit 0a5f4cb) Co-authored-by: Victoriya Fedotova <victoriya.s.fedotova@intel.com>
/intelci: run
Pull request overview
This PR adds CPU distributed (SPMD) execution support for the covariance algorithm in oneDAL’s oneAPI C++ implementation, enabling multi-rank aggregation and unblocking CPU execution of existing SPMD tests and samples.
Changes:
- Enable CPU SPMD dispatch for covariance compute operations.
- Implement CPU backend SPMD aggregation logic for covariance (partial -> allreduce -> finalize).
- Add new distributed covariance samples for MPI and CCL; enable CPU execution of the covariance SPMD integration test.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| cpp/oneapi/dal/algo/covariance/detail/compute_ops.cpp | Switches covariance CPU dispatch to a universal SPMD-capable kernel dispatcher. |
| cpp/oneapi/dal/algo/covariance/backend/cpu/compute_kernel_dense.cpp | Adds the CPU SPMD path: compute local partials, allreduce partial statistics, and finalize on aggregated partials. |
| cpp/oneapi/dal/algo/covariance/test/spmd.cpp | Allows the covariance SPMD integration test to run on CPU policies. |
| samples/oneapi/cpp/mpi/sources/covariance_distr_mpi.cpp | New MPI distributed covariance sample using dal::preview::compute. |
| samples/oneapi/cpp/ccl/sources/covariance_distr_ccl.cpp | New CCL distributed covariance sample using dal::preview::compute. |
```cpp
static compute_result<Task> call_daal_spmd_kernel(const context_cpu& ctx,
                                                  const detail::descriptor_base<Task>& desc,
                                                  const detail::compute_parameters<Task>& params,
                                                  const table& data) {
    auto& comm = ctx.get_communicator();
    const std::int64_t component_count = data.get_column_count();

    // Compute partial results locally on this rank's data
    partial_compute_input<Task> partial_input(data);
    auto partial_result =
        partial_compute_kernel_cpu<Float, method::by_default, Task>{}(ctx, desc, partial_input);
```
call_daal_spmd_kernel() receives params but never uses it; this means the CPU hyperparameters selected in compute_parameters_cpu (block size, grain size, etc.) are ignored in SPMD mode. Consider passing params through to the DAAL online/finalize calls (instead of default-constructing detail::compute_parameters{} inside partial/finalize kernels), or otherwise document/encode that these parameters are intentionally unused for SPMD.
```diff
 compute_parameters<Task> select_parameters(const Policy& ctx,
                                            const descriptor_base<Task>& desc,
                                            const compute_input<Task>& input) const {
-    using kernel_dispatcher_t = dal::backend::kernel_dispatcher<KERNEL_SINGLE_NODE_CPU(
+    using kernel_dispatcher_t = dal::backend::kernel_dispatcher<KERNEL_UNIVERSAL_SPMD_CPU(
         parameters::compute_parameters_cpu<Float, Method, Task>)>;
     return kernel_dispatcher_t{}(ctx, desc, input);
```
KERNEL_UNIVERSAL_SPMD_CPU dispatcher uses context_cpu{} for host_policy (see cpp/oneapi/dal/backend/dispatcher.hpp:231-233), which drops any non-default settings carried by the passed policy (e.g., enabled/disabled CPU extensions). To preserve single-node behavior while enabling SPMD, consider using a kernel_dispatcher with both KERNEL_SINGLE_NODE_CPU(...) (for host_policy) and KERNEL_UNIVERSAL_SPMD_CPU(...) (for spmd_host_policy), similar to how compute_ops_dpc.cpp mixes kernel specs.
```cpp
static compute_result<Task> call_daal_spmd_kernel(const context_cpu& ctx,
                                                  const detail::descriptor_base<Task>& desc,
                                                  const detail::compute_parameters<Task>& params,
                                                  const table& data) {
```
params is unused in call_daal_spmd_kernel(), and the SPMD path currently ignores covariance CPU hyperparameters (compute_parameters) that are applied in the single-node path via convert_parameters(params). Either thread params through the partial/finalize kernels (so distributed runs honor the same tuning knobs) or explicitly mark/remove the parameter to avoid unused-parameter warnings and clarify that hyperparameters are intentionally not supported in SPMD mode.
```diff
 const table& data) {
+    // SPMD covariance currently does not thread compute_parameters through
+    // partial/finalize kernels, so keep the argument for interface
+    // compatibility and mark it as intentionally unused here.
+    (void)params;
```
```cpp
comm.allreduce(nobs_ary).wait();
comm.allreduce(sums_ary).wait();
comm.allreduce(crossproduct_ary).wait();
```
The SPMD implementation performs three separate allreduce() calls with an immediate .wait() after each one. This introduces extra synchronization/latency; consider reducing the number of collectives (e.g., pack nobs, sums, and crossproduct into a single buffer and allreduce once) or at least issuing the collectives first and waiting after to allow potential overlap.
```diff
-comm.allreduce(nobs_ary).wait();
-comm.allreduce(sums_ary).wait();
-comm.allreduce(crossproduct_ary).wait();
+auto nobs_request = comm.allreduce(nobs_ary);
+auto sums_request = comm.allreduce(sums_ary);
+auto crossproduct_request = comm.allreduce(crossproduct_ary);
+nobs_request.wait();
+sums_request.wait();
+crossproduct_request.wait();
```
/intelci: run
Description
This PR adds distributed (SPMD) support for Covariance on CPU.
Checklist:
- Completeness and readability
- Testing: no new failures in the CIs.
- Performance