Describe the feature or problem you'd like to solve
Add option to disable streaming responses from the LLM provider
Proposed solution
Add a way to make Copilot CLI send stream:false in its requests to a BYOK/custom OpenAI-compatible provider, e.g. an env var COPILOT_PROVIDER_DISABLE_STREAMING=true and/or a config field.
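A minimal sketch of how the proposed option could be applied when the request body is built. COPILOT_PROVIDER_DISABLE_STREAMING is the variable proposed above, not an existing CLI setting. The helper names (streaming_disabled, build_chat_body), the accepted truthy values and the provider-type strings are hypothetical and chosen only for illustration. The provider check follows the scope note under "Additional context" (openai/azure only).

```python
import json
import os
from collections.abc import Mapping
from typing import Optional

# Hypothetical: the env var proposed in this issue; the CLI may not read it today.
ENV_VAR = "COPILOT_PROVIDER_DISABLE_STREAMING"
TRUTHY = {"1", "true", "yes", "on"}

# Assumption: only provider types whose chat/completions endpoint accepts
# "stream": false are affected (openai/azure, per this issue's scope note).
NON_STREAMING_CAPABLE = {"openai", "azure"}


def streaming_disabled(env: Mapping[str, str]) -> bool:
    """True if the (proposed) env var asks for non-streaming requests."""
    return env.get(ENV_VAR, "").strip().lower() in TRUTHY


def build_chat_body(model, messages, tools, provider_type,
                    env: Optional[Mapping[str, str]] = None) -> dict:
    """Build an OpenAI-compatible chat/completions request body.

    Streaming stays the default; it is switched off only when the option is
    set and the provider type supports non-streaming responses.
    """
    env = os.environ if env is None else env
    disable = streaming_disabled(env) and provider_type in NON_STREAMING_CAPABLE
    return {"model": model, "messages": messages, "tools": tools, "stream": not disable}


if __name__ == "__main__":
    body = build_chat_body(
        "my-model",
        [{"role": "user", "content": "hi"}],
        [],
        "openai",
        env={ENV_VAR: "true"},
    )
    print(json.dumps(body))
```

Keeping streaming as the default means nothing changes for users who leave the variable unset.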
Example prompts or workflows
Some self-hosted backends have streaming-only bugs that corrupt tool-call arguments. On our vLLM + Gemma deployment, the streaming tool-call parser duplicates characters at chunk boundaries in large tool-call arguments (e.g. file contents passed to write/edit): <html → <<htmlhtml, <h1 → <<hh1. The identical request with stream:false is clean. Minimal repro, with the same body and only stream differing:
# stream:false → arguments contain a clean document
# stream:true → reconstructed arguments contain "<<htmlhtml", "<<headhead", ...
curl -s "$EP/v1/chat/completions" -H 'Content-Type: application/json' -d '{"model":"...","stream":<bool>,"tools":[{"type":"function","function":{"name":"write_file","parameters":{"type":"object","properties":{"path":{"type":"string"},"content":{"type":"string"}}}}}],"messages":[{"role":"user","content":"Call write_file with index.html and a complete HTML document as content."}]}'
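For context on what "reconstructed arguments" means here: with stream:true, an OpenAI-compatible server sends tool-call arguments as text fragments in choices[].delta.tool_calls[].function.arguments, and the client concatenates them per tool-call index. The sketch below assumes that chunk layout; reconstruct_tool_calls and simulate_stream are hypothetical helpers, not Copilot CLI code. Any duplication the server's streaming parser introduces passes through this concatenation unchanged, so the client has no reliable way to repair it.

```python
import json


def reconstruct_tool_calls(sse_lines):
    """Rebuild tool calls from OpenAI-style streaming (SSE) lines.

    Argument text arrives as fragments in
    choices[].delta.tool_calls[].function.arguments and is concatenated
    per tool-call index, in arrival order.
    """
    calls = {}
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators, keep-alives and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for tc in delta.get("tool_calls") or []:
                slot = calls.setdefault(tc["index"], {"name": "", "arguments": ""})
                fn = tc.get("function") or {}
                if fn.get("name"):
                    slot["name"] = fn["name"]
                slot["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


def simulate_stream(name, fragments):
    """Build SSE lines that split one tool call's arguments into fragments."""
    deltas = [{"index": 0, "id": "call_0", "type": "function",
               "function": {"name": name, "arguments": fragments[0]}}]
    deltas += [{"index": 0, "function": {"arguments": f}} for f in fragments[1:]]
    lines = []
    for tc in deltas:
        chunk = {"choices": [{"index": 0, "delta": {"tool_calls": [tc]}}]}
        lines.append("data: " + json.dumps(chunk))
        lines.append("")  # SSE events are separated by a blank line
    lines.append("data: [DONE]")
    return lines


if __name__ == "__main__":
    # A correct server may split the arguments anywhere, even mid-tag.
    lines = simulate_stream(
        "write_file", ['{"path": "index.html", ', '"content": "<ht', 'ml></html>"}']
    )
    call = reconstruct_tool_calls(lines)[0]
    print(call["name"], json.loads(call["arguments"])["content"])
    # prints: write_file <html></html>
```

Because the duplicated characters fall inside a JSON string value, corrupted arguments usually still parse, so the damage only shows up in the written file or in a failed edit match.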
Because the CLI always streams its requests to the provider, every edit comes back corrupted (e.g. <<hh1), which leads to "No match found" loops and makes the CLI unusable against these backends until the upstream fix ships.
Additional context
- Today the only workaround is a local stream-stripping proxy that forces stream:false upstream. A first-class CLI option would remove the need for that.
- Scope can be limited to openai/azure provider types where non-streaming chat/completions is supported.
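The stream-stripping proxy mentioned above has to do two things: rewrite the client's request to stream:false, and, if the client asked for a stream, replay the complete response as server-sent events. A minimal sketch of both transformations, assuming the OpenAI chat.completion and chat.completion.chunk formats. The HTTP plumbing is omitted, and force_non_streaming and completion_to_sse are hypothetical names. Sending the whole message as a single delta is a simplification that a client which concatenates deltas should accept.

```python
import json
from typing import Any


def force_non_streaming(request_body: dict[str, Any]) -> tuple[dict[str, Any], bool, bool]:
    """Return (upstream body, client wanted a stream, client wanted a usage chunk)."""
    wants_stream = bool(request_body.get("stream", False))
    wants_usage = bool((request_body.get("stream_options") or {}).get("include_usage", False))
    upstream = dict(request_body, stream=False)
    # stream_options is only meaningful with stream: true; some servers reject it otherwise.
    upstream.pop("stream_options", None)
    return upstream, wants_stream, wants_usage


def completion_to_sse(completion: dict[str, Any], include_usage: bool = False) -> list[str]:
    """Replay a non-streaming chat.completion as chat.completion.chunk SSE events."""
    base = {
        "id": completion.get("id", ""),
        "object": "chat.completion.chunk",
        "created": completion.get("created", 0),
        "model": completion.get("model", ""),
    }
    events = []
    for choice in completion.get("choices", []):
        index = choice.get("index", 0)
        message = choice.get("message") or {}
        delta = {"role": message.get("role", "assistant")}
        if message.get("content") is not None:
            delta["content"] = message["content"]
        if message.get("tool_calls"):
            # Streamed tool calls carry an explicit index; non-streamed ones do not.
            delta["tool_calls"] = [
                {"index": i, **tc} for i, tc in enumerate(message["tool_calls"])
            ]
        # One chunk with the full message, then one carrying finish_reason.
        events.append({**base, "choices": [
            {"index": index, "delta": delta, "finish_reason": None}]})
        events.append({**base, "choices": [
            {"index": index, "delta": {}, "finish_reason": choice.get("finish_reason", "stop")}]})
    if include_usage and "usage" in completion:
        # With include_usage, usage arrives in a final chunk with empty choices.
        events.append({**base, "choices": [], "usage": completion["usage"]})
    lines = [f"data: {json.dumps(event)}\n\n" for event in events]
    lines.append("data: [DONE]\n\n")
    return lines
```

In a full proxy, the upstream JSON would be returned unchanged when the client did not ask for a stream; otherwise the strings from completion_to_sse would be written to the client with Content-Type: text/event-stream.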