
gh-146393: Optimize float division operations by mutating uniquely-referenced operands in place (JIT only)#146397

Open
eendebakpt wants to merge 6 commits into python:main from eendebakpt:jit_float_truediv

Conversation

Contributor

@eendebakpt eendebakpt commented Mar 24, 2026

We optimize float divisions for the case where one of the operands is a unique reference. This is similar to #146307, but with an added guard against division by zero.

  • We do not add opcodes in tier 1.
  • For tier 2 we specialize both the case where one of the operands is a unique reference and the case where there are none. The no-unique-reference case, _BINARY_TRUEDIV_FLOAT (also used when uniqueness information is missing), brings no performance improvement by itself, but it propagates types better. Since this opcode has guards, the type is propagated even when the input comes from locals.
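The dispatch the bullets describe can be sketched as a small Python model (a hypothetical helper for illustration; the real logic lives in the tier-2 optimizer in C, and the priority of LHS over RHS is an assumption, though the uop names are taken from this PR):

```python
def select_truediv_uop(lhs_is_unique_float: bool, rhs_is_unique_float: bool) -> str:
    """Model of which uop the optimizer would emit for a float / float."""
    if lhs_is_unique_float:
        # Mutate the left operand in place; a guard protects against zero divisor.
        return "_BINARY_OP_TRUEDIV_FLOAT_INPLACE"
    if rhs_is_unique_float:
        # Mutate the right operand in place instead.
        return "_BINARY_OP_TRUEDIV_FLOAT_INPLACE_RIGHT"
    # No unique reference: plain specialization, kept for its type propagation.
    return "_BINARY_TRUEDIV_FLOAT"
```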

Micro-benchmarks (min of 3 runs, 2M iterations)

| Pattern | main (ns/iter) | branch (ns/iter) | Speedup | Notes |
|---|---|---|---|---|
| `(a+b) * c` | 10.8 | 10.9 | -- | baseline (multiply, already optimized) |
| `(a+b) + (c+d)` | 18.0 | 18.1 | -- | baseline (add, already optimized) |
| `a / b` | 20.6 | 10.8 | 1.9x | speculative guards + truediv specialization |
| `(a+b) / c` | 26.4 | 11.0 | 2.4x | inplace LHS, guard inserted for c |
| `(2.0+x) / y` | 25.1 | 10.9 | 2.3x | inplace LHS, guard inserted for y |
| `c / (a+b)` | 26.0 | 11.2 | 2.3x | inplace RHS, guard inserted for c |
| `(a/b) / (c/d)` | 41.3 | 19.1 | 2.2x | speculative guards enable inplace chain |
| `(a/b) + (c/d)` | 29.1 | 19.0 | 1.5x | speculative guards enable inplace add |

All patterns are measured as `total += <expr>` in a tight loop.

Benchmark script
"""Benchmark for float true division tier 2 specialization.

Usage:
    ./python bench_truediv.py
"""
import timeit

N = 2_000_000
INNER = 1000


def bench(label, fn):
    iters = N // INNER
    times = [timeit.timeit(fn, number=iters) for _ in range(3)]
    t = min(times)
    print(f"  {label}: {t/N*1e9:.1f} ns/iter")


def f_chain_mul(n, a, b, c):
    t = 0.0
    for i in range(n):
        t += (a + b) * c
    return t


def f_div(n, a, b):
    t = 0.0
    for i in range(n):
        t += a / b
    return t


def f_chain_div(n, a, b, c):
    t = 0.0
    for i in range(n):
        t += (a + b) / c
    return t


def f_2px_div_y(n, x, y):
    t = 0.0
    for i in range(n):
        t += (2.0 + x) / y
    return t


def f_div_rhs(n, a, b, c):
    t = 0.0
    for i in range(n):
        t += c / (a + b)
    return t


def f_ab_div_cd(n, a, b, c, d):
    t = 0.0
    for i in range(n):
        t += (a / b) / (c / d)
    return t


def f_ab_add_cd(n, a, b, c, d):
    t = 0.0
    for i in range(n):
        t += (a / b) + (c / d)
    return t


def f_add_chain(n, a, b, c, d):
    t = 0.0
    for i in range(n):
        t += (a + b) + (c + d)
    return t


# Warmup
f_chain_mul(10000, 2.0, 3.0, 4.0)
f_div(10000, 10.0, 3.0)
f_chain_div(10000, 2.0, 3.0, 4.0)
f_2px_div_y(10000, 3.0, 4.0)
f_div_rhs(10000, 2.0, 3.0, 4.0)
f_ab_div_cd(10000, 10.0, 3.0, 4.0, 5.0)
f_ab_add_cd(10000, 10.0, 3.0, 4.0, 5.0)
f_add_chain(10000, 1.0, 2.0, 3.0, 4.0)

print("Float truediv benchmark (min of 3 runs):")
bench("(a+b) * c              (baseline) ", lambda: f_chain_mul(INNER, 2.0, 3.0, 4.0))
bench("(a+b) + (c+d)          (baseline) ", lambda: f_add_chain(INNER, 1.0, 2.0, 3.0, 4.0))
bench("a / b                  (spec div) ", lambda: f_div(INNER, 10.0, 3.0))
bench("(a+b) / c              (inplace L)", lambda: f_chain_div(INNER, 2.0, 3.0, 4.0))
bench("(2.0+x) / y            (inplace L)", lambda: f_2px_div_y(INNER, 3.0, 4.0))
bench("c / (a+b)              (inplace R)", lambda: f_div_rhs(INNER, 2.0, 3.0, 4.0))
bench("(a/b) / (c/d)          (spec div) ", lambda: f_ab_div_cd(INNER, 10.0, 3.0, 4.0, 5.0))
bench("(a/b) + (c/d)          (spec div) ", lambda: f_ab_add_cd(INNER, 10.0, 3.0, 4.0, 5.0))

Analysis

The inplace truediv kicks in when at least one operand is a uniquely-referenced float (e.g. the result of a prior add or multiply). The optimizer emits _BINARY_OP_TRUEDIV_FLOAT_INPLACE or _BINARY_OP_TRUEDIV_FLOAT_INPLACE_RIGHT, saving one PyFloat_FromDouble allocation and deallocation per iteration.

The optimization works well for several cases. For some (e.g. `(a/b) + (c/d)`) the performance gain comes not from an inplace division but from better type propagation, which allows the `+` to be specialized inplace. The `a / b` pattern is also faster because of better type propagation and the `+=` in the test script.

In typical code, intermediate results are often stored in local variables. For these cases it is important to pick up (speculative) type information as soon as possible.
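A minimal sketch of that local-variable pattern (a hypothetical function for illustration; the guard insertion and type propagation happen inside the JIT and are not observable from Python code):

```python
def f(n, a, b, c):
    t = 0.0
    for _ in range(n):
        x = a + b      # intermediate result stored in a local variable
        t += x / c     # a float guard on x lets the division specialize
    return t

# Plain Python semantics are unchanged; only the tier-2 trace differs.
result = f(4, 2.0, 3.0, 4.0)   # 4 iterations of (2.0 + 3.0) / 4.0
```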

…izer

Add inplace float true division ops that the tier 2 optimizer emits
when at least one operand is a known float:

- _BINARY_OP_TRUEDIV_FLOAT_INPLACE (unique LHS)
- _BINARY_OP_TRUEDIV_FLOAT_INPLACE_RIGHT (unique RHS)

The optimizer inserts _GUARD_TOS_FLOAT / _GUARD_NOS_FLOAT for
operands not yet known to be float, enabling specialization in
expressions like `(a + b) / c`.

Also marks the result of all NB_TRUE_DIVIDE operations as unique
float in the abstract interpreter, enabling downstream inplace ops
even for generic `a / b` (the `+=` can reuse the division result).

Speeds up chain division patterns by ~2.3x and simple `total += a/b`
by ~1.5x.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
eendebakpt and others added 5 commits March 25, 2026 00:01
Operations that always return a new float (true division, float**int,
int**negative_int, mixed int/float arithmetic) now mark their result
as PyJitRef_MakeUnique. This enables downstream operations to mutate
the result in place instead of allocating a new float.

Int results are NOT marked unique because small ints are cached/immortal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
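The int/float distinction above is observable from plain Python (a CPython implementation detail, not a language guarantee): repeated small-int arithmetic yields the same cached object, while float arithmetic yields a fresh object each time.

```python
n = 255
# Small ints are cached (and immortal) in CPython, so both sums
# evaluate to the very same object; mutating it in place would be unsafe.
assert (n + 1) is (n + 1)

h = 0.5
# Float results are fresh allocations, so a uniquely referenced
# result is a safe candidate for in-place reuse.
assert (h + 1.0) is not (h + 1.0)
```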
Only set the result of NB_TRUE_DIVIDE to float when both operands
are known int/float. Types like Fraction and Decimal override
__truediv__ and return non-float results. The unconditional type
propagation caused _POP_TOP_FLOAT to be emitted for Fraction results,
crashing with an assertion failure.

Fixes the segfault in test_math.testRemainder and
test_random.test_binomialvariate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
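The failure mode this commit describes is easy to reproduce at the type level: NB_TRUE_DIVIDE on a Fraction does not return a float, so unconditionally tagging the result as float was wrong.

```python
from fractions import Fraction

r = Fraction(1, 3) / Fraction(1, 6)
# __truediv__ is overridden: the result is a Fraction, not a float,
# so the abstract interpreter must not assume a float result here.
assert type(r) is Fraction
assert r == 2
```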
@eendebakpt eendebakpt marked this pull request as ready for review March 25, 2026 12:02