Add support for infinite output model fallback #2631
When a response exceeds its length limit and the model doesn't support assistant prefill, we currently throw an error. This PR adds support for falling back to a dedicated "infinite output" model in such cases.
Changes
- Add `--infinite-output-model` CLI argument
- Add `infinite_output_model` support to the `Model` class

Impact
This is particularly valuable for users of models with lower output-token limits that don't support prefill.
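As a hypothetical usage sketch (assuming aider's CLI; the model names here are illustrative, not defaults from this PR), the new flag could pair a main model that lacks prefill with a fallback that handles continuation:

```shell
# Illustrative only: the main model doesn't support assistant prefill, so
# length-limited responses fall back to the designated infinite-output model.
aider --model gemini/gemini-1.5-pro \
      --infinite-output-model anthropic/claude-3-5-sonnet-20241022
```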
Implementation Notes
The flow is now: when a response hits its output length limit, continue via assistant prefill if the model supports it; otherwise, if an infinite output model is configured, switch to it to continue the response; only if neither is available do we raise the original error.
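The flow above can be sketched roughly as follows (a minimal illustration with assumed names, not aider's actual internals):

```python
# Hypothetical sketch of the fallback flow described in this PR.
# `Model`, its fields, and `resolve_continuation_model` are illustrative.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Model:
    name: str
    supports_prefill: bool = False
    infinite_output_model: Optional["Model"] = None


def resolve_continuation_model(model: Model) -> Model:
    """Pick the model used to continue a length-limited response."""
    if model.supports_prefill:
        # Continue in place via assistant prefill, as before.
        return model
    if model.infinite_output_model is not None:
        # New behavior: fall back to the dedicated infinite-output model.
        return model.infinite_output_model
    # Neither prefill nor a fallback is available: surface the error.
    raise RuntimeError(
        f"{model.name} hit its output limit and supports no continuation"
    )
```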
I haven't added any default infinite output model configurations. The current convention is that default models (main/weak/editor) come from the same provider. Since the whole point of infinite output models is to fall back to a different provider when the main one doesn't support prefill, adding defaults would break that convention.
We could add defaults (e.g. falling back to Claude for Gemini users), but I kept this PR focused on just the core mechanism.