[BOT ISSUE] OpenAI: chat completions streaming discards logprobs from chunks #180
Description
Summary
When chat.completions.create(logprobs=True, stream=True) is used, the logprobs data present in each streaming chunk is discarded. The span output hardcodes "logprobs": None instead of accumulating the per-token log probabilities across chunks.
Non-streaming calls correctly capture logprobs because the full choices dict is logged directly (line 204 of oai.py). Streaming calls lose this data because _postprocess_streaming_results (lines 288–356) only accumulates content, tool_calls, role, and finish_reason from deltas, and then hardcodes logprobs to None at line 353.
What is missing
Each ChatCompletionChunk.choices[0].logprobs contains a ChoiceLogprobs object with:
- content: list of per-token log probabilities and top alternatives
- refusal: list of per-token log probabilities for refusal tokens
These are available in every streaming chunk but never read by the wrapper. The assembled span output sets:
"logprobs": None, # line 353 — hardcodedUsers who set logprobs=True for confidence scoring, model evaluation, or token-level analysis get correct data in non-streaming mode but silently lose it when switching to stream=True.
Braintrust docs status
not_found — The OpenAI integration docs document wrap_openai() and streaming token usage, but do not mention logprobs capture.
Upstream sources
- OpenAI Python SDK type: ChatCompletionChunk.choices[0].logprobs → ChoiceLogprobs (source)
- OpenAI API reference: logprobs parameter in chat completions
Local files inspected
- py/src/braintrust/oai.py: _postprocess_streaming_results (lines 288–356): accumulates content and tool_calls from deltas but ignores logprobs on each Choice; hardcodes output logprobs to None at line 353
- Non-streaming path (line 204): logs log_response["choices"], which includes the full logprobs object (correct)
- py/src/braintrust/wrappers/test_openai.py: no test for logprobs=True with stream=True
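A hedged sketch of the shape that missing regression test could take. The real test would exercise wrap_openai over a mocked streaming client; here assemble_span_output is a hypothetical stand-in for the assembly done by _postprocess_streaming_results, shown only to illustrate the assertion:

```python
# Hypothetical sketch of a regression test for logprobs=True + stream=True.
# assemble_span_output stands in for the wrapper's streaming assembly; the
# chunk dicts mimic ChatCompletionChunk.choices[0].

def assemble_span_output(chunks):
    """Join content deltas and (in the fixed version) accumulate
    logprobs instead of dropping them."""
    content, logprob_entries = [], []
    for chunk in chunks:
        choice = chunk["choices"][0]
        if choice["delta"].get("content"):
            content.append(choice["delta"]["content"])
        lp = choice.get("logprobs")
        if lp and lp.get("content"):
            logprob_entries.extend(lp["content"])
    return {
        "content": "".join(content),
        "logprobs": {"content": logprob_entries} if logprob_entries else None,
    }


def test_streaming_preserves_logprobs():
    chunks = [
        {"choices": [{"delta": {"content": "a"},
                      "logprobs": {"content": [{"token": "a", "logprob": -0.2}]}}]},
        {"choices": [{"delta": {"content": "b"},
                      "logprobs": {"content": [{"token": "b", "logprob": -0.3}]}}]},
    ]
    out = assemble_span_output(chunks)
    # The key assertion: streaming must not silently null out logprobs.
    assert out["logprobs"] is not None
    assert [e["token"] for e in out["logprobs"]["content"]] == ["a", "b"]


test_streaming_preserves_logprobs()
```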