
feat: add model capability preset picker for OpenAI Compatible provider #11902

Draft
roomote-v0[bot] wants to merge 6 commits into main from feature/openai-compatible-model-capability-presets

Conversation

@roomote-v0
Contributor

roomote-v0 bot commented Mar 9, 2026

Related GitHub Issue

Closes: #11674

Description

This PR attempts to address Issue #11674 by adding a Model Capability Preset dropdown to the OpenAI Compatible provider settings.

When users select the OpenAI Compatible provider, they now see a searchable dropdown that lists all known models across every provider Roo supports. Selecting a model automatically populates the capability fields (context window, max tokens, image support, prompt caching, pricing, etc.) so users running local models via llama.cpp, exo, NVIDIA NIM, or similar tools get the same model-specific behavior as native provider users.

Key implementation details:

  • New modelCapabilityPresets aggregation in packages/types that collects model definitions from Anthropic, OpenAI, DeepSeek, Gemini, MiniMax, Mistral, Moonshot/Kimi, Qwen, SambaNova, xAI, and ZAi/GLM
  • Searchable combobox UI in OpenAICompatible.tsx using the existing Command/Popover components, grouped by provider
  • "Custom (configure manually)" option preserves the current behavior for models not in the preset list
  • Users can still override any auto-populated field after selecting a preset
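The aggregation and apply-on-select behavior described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: modelCapabilityPresets is the name given in the description, but the ModelCapabilityPreset fields, the applyPreset helper, the openAiCustomModelInfo key, and the example numbers are all assumptions.

```typescript
// Illustrative only — field and helper names are assumptions, not the PR's code.
interface ModelCapabilityPreset {
  provider: string // group label shown in the combobox
  modelId: string // e.g. "kimi-k2.5"
  contextWindow: number
  maxTokens: number
  supportsImages: boolean
  supportsPromptCache: boolean
  inputPrice?: number // may be irrelevant for local inference
  outputPrice?: number
}

// Selecting a preset copies its capability fields into the provider settings;
// "custom" leaves everything untouched for manual configuration.
function applyPreset(
  settings: Record<string, unknown>,
  preset: ModelCapabilityPreset | "custom",
): Record<string, unknown> {
  if (preset === "custom") return { ...settings }
  return {
    ...settings,
    openAiCustomModelInfo: {
      contextWindow: preset.contextWindow,
      maxTokens: preset.maxTokens,
      supportsImages: preset.supportsImages,
      supportsPromptCache: preset.supportsPromptCache,
      inputPrice: preset.inputPrice,
      outputPrice: preset.outputPrice,
    },
  }
}

// Hypothetical preset entry with placeholder capability numbers.
const kimiPreset: ModelCapabilityPreset = {
  provider: "Moonshot",
  modelId: "kimi-k2.5",
  contextWindow: 262144,
  maxTokens: 16384,
  supportsImages: false,
  supportsPromptCache: true,
}

console.log(applyPreset({ openAiBaseUrl: "http://localhost:8000/v1" }, kimiPreset))
```

The key design point is that the preset only seeds the fields; because the result is ordinary settings state, every value remains user-editable afterwards.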

Trade-offs:

  • Cloud-only routing providers (OpenRouter, Requesty, etc.) and platform-locked providers (Bedrock, Vertex, etc.) are excluded since their model IDs do not map to local inference
  • Pricing fields are populated from the preset but may not be relevant for local inference -- users can clear them

Test Procedure

  • Unit tests added for modelCapabilityPresets data (6 tests in packages/types)
  • Unit tests added for preset picker UI rendering (4 tests in webview-ui)
  • All existing tests continue to pass
  • Run: cd packages/types && npx vitest run, then cd webview-ui && npx vitest run src/components/settings/providers/__tests__/OpenAICompatible.spec.tsx

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue
  • Scope: Changes are focused on the linked issue
  • Self-Review: Performed a self-review of the code
  • Testing: New tests added covering the changes
  • Documentation Impact: No documentation updates required
  • Contribution Guidelines: Read and agree to the Contributor Guidelines

Documentation Updates

  • No documentation updates are required.

Additional Notes

Feedback and guidance are welcome.


roomote added 2 commits March 9, 2026 18:49
Adds a searchable dropdown to the OpenAI Compatible provider settings
that lets users select from all known model capabilities across every
provider Roo supports (Anthropic, OpenAI, DeepSeek, Gemini, MiniMax,
Mistral, Moonshot/Kimi, Qwen, SambaNova, xAI, ZAi/GLM).

When a preset is selected, the model capability fields (context window,
max tokens, image support, prompt caching, pricing, etc.) are
automatically populated. Users can still choose "Custom" to configure
everything manually as before.

Changes:
- packages/types: new all-model-capabilities.ts aggregating presets
- webview-ui: preset picker dropdown in OpenAICompatible.tsx
- i18n: English translation keys for the new UI
- Tests for both the preset data and the UI component

Addresses #11674
@bozoweed

bozoweed commented Mar 9, 2026

@roomote please ensure all translations are correctly defined thx

@roomote-v0
Contributor Author

roomote-v0 bot commented Mar 9, 2026


Added capabilityPreset translations to all 17 non-EN locales (ca, de, es, fr, hi, id, it, ja, ko, nl, pl, pt-BR, ru, tr, vi, zh-CN, zh-TW). All JSON files validated, missing translations script reports zero gaps, and all existing tests pass.


@bozoweed

This appears to work fine now <3 Please merge this, or do something like it. I can finally use Kimi K2.5 on the NVIDIA NIM API, which is very useful, and it closes the issue about wrong model usage on OpenAI Compatible endpoints.

What does this PR solve?

  • The issue with Kimi K2.5 not supporting thinking correctly
  • The issue with DeepSeek V3.2 with thinking enabled
  • All issues about wrong model usage during LLM auto turns

Why should we merge this?
@hannesrudolph told me a few days ago that this should already have been fixed (in the initial issue about NVIDIA API usage in Roo Code), and to open a new issue if it still failed. That is why I opened a new issue: it was not solved at all, and Roo Code still did not use models correctly. With this change everything seems to work like a charm. I don't know what you think about these changes, but I can guarantee it is working perfectly on NVIDIA now. I built this PR and am currently using it in my VS Code.

Thanks in advance <3

@bozoweed

Forget that... it appears the model is still locked in a thinking loop. Kimi K2.5 can loop with thinking only, because of its internal thinking loop, as the official documentation explains. But Roo Code won't let the model only think: Roo absolutely wants content or a tool_call, which is not aligned with how Kimi K2.5 is meant to be used. @roomote

@roomote-v0
Contributor Author

roomote-v0 bot commented Mar 10, 2026


Added preserveReasoning: true to the kimi-k2.5 model definition in moonshot.ts. This ensures that when Kimi K2.5 is selected via the preset picker, Roo Code properly handles the model's internal thinking/reasoning tokens instead of requiring content or tool_call responses. All lint, type checks, and tests passed.


@bozoweed

No, you don't get it... I get the "[ERROR] You did not use a tool in your previous response! ...." error message. That means the case "error": at ChatRow.js line 1290 has a really bad condition, causing an error on the internal thinking loop. @roomote

@roomote-v0
Contributor Author

roomote-v0 bot commented Mar 10, 2026


Hi @bozoweed -- I investigated the case "error": handling in ChatRow.tsx (line 1290). The code there correctly checks for MODEL_NO_TOOLS_USED and MODEL_NO_ASSISTANT_MESSAGES marker strings and renders proper, translated error messages via the ErrorRow component. There's also a fallback for generic errors.

The [ERROR] You did not use a tool in your previous response! message you're seeing is feedback injected into the conversation by the agent loop on the backend -- it's sent to the model to prompt it to use a tool. ChatRow is just displaying it. This isn't a bug in ChatRow's condition handling, and it's unrelated to this PR's changes (which add model capability presets for OpenAI Compatible provider).

If you're consistently hitting this error with a specific model, it's likely a model behavior issue (the model isn't calling tools when expected). I'd suggest opening a separate issue with details about which model/provider you're using so the team can investigate the agent loop behavior.

@bozoweed

@roomote Thanks for the advice, but when I did that, no one wanted to work on it; everyone said the same thing you just said... But you were wrong: the NVIDIA NIM API replicates the same way of using models as their original providers. For example, Kimi K2.5 on the NVIDIA NIM API works EXACTLY like on the Moonshot provider (only the endpoint is OpenAI-like; everything in the background is the same). So yes, this is related to OpenAI Compatible, and yes, you have to fix it. I said this for Kimi K2.5, but it appears to be the same for other open models, like the DeepSeek ones, etc. So please follow the official model usage instead of trying to work around it by changing the parsing. Kimi K2.5, Kimi K2 Thinking, etc. have specific usage requirements that differ from other, more basic models, so yes, you have to do something on OpenAI Compatible, because many other providers expose models with their specific usage through an OpenAI-style endpoint. Please read the docs before answering.

@roomote-v0
Contributor Author

roomote-v0 bot commented Mar 10, 2026


Fixed the reported issues. When selecting a preset with preserveReasoning (e.g. Kimi K2.5, Kimi K2 thinking), the R1 format is now auto-enabled so reasoning/thinking blocks work correctly via OpenAI Compatible endpoints. Also added a capability flags indicator and new tests. All local checks passed.


…t picker

When selecting a model preset with preserveReasoning (e.g. Kimi K2.5,
Kimi K2 thinking), the R1 format setting is now automatically enabled
so reasoning/thinking blocks work correctly via OpenAI Compatible endpoints.

Also adds visual indicator showing applied capability flags after
selecting a preset, and new tests covering the auto-enable behavior.
@bozoweed

@roomote I'm not sure that is the real issue; we can already enable R1 using the checkbox, right? Please read the docs, buddy:

Model Usage

Once the local inference service is up, you can interact with it through the chat endpoint:

def simple_chat(client: openai.OpenAI, model_name: str):
    messages = [
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": [{"type": "text", "text": "which one is bigger, 9.11 or 9.9? think carefully."}]},
    ]
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        temperature=1.0,
        max_tokens=4096
    )
    print(f"k2 answer: {response.choices[0].message.content}")
    print("=====below is reasoning content======")
    print(f"reasoning content: {response.choices[0].message.reasoning_content}")

NOTE

The recommended temperature for Kimi K2 Thinking is temperature = 1.0. If no special instructions are required, the system prompt above is a good default.
Tool Calling

Kimi K2 Thinking has the same tool calling settings as Kimi K2 Instruct. To enable them, you need to pass the list of available tools in each request, then the model will autonomously decide when and how to invoke them.

The following example demonstrates calling a weather tool end-to-end:

Your tool implementation

def get_weather(city: str) -> dict:
    return {"weather": "Sunny"}

Tool schema definition

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Retrieve current weather information. Call this when the user asks about the weather.",
        "parameters": {
            "type": "object",
            "required": ["city"],
            "properties": {
                "city": {
                    "type": "string",
                    "description": "Name of the city"
                }
            }
        }
    }
}]

Map tool names to their implementations

tool_map = {
    "get_weather": get_weather
}

def tool_call_with_client(client: OpenAI, model_name: str):
    messages = [
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": "What's the weather like in Beijing today? Use the tool to check."}
    ]
    finish_reason = None
    while finish_reason is None or finish_reason == "tool_calls":
        completion = client.chat.completions.create(
            model=model_name,
            messages=messages,
            temperature=1.0,
            tools=tools,  # tool list defined above
            tool_choice="auto"
        )
        choice = completion.choices[0]
        finish_reason = choice.finish_reason
        if finish_reason == "tool_calls":
            messages.append(choice.message)
            for tool_call in choice.message.tool_calls:
                tool_call_name = tool_call.function.name
                tool_call_arguments = json.loads(tool_call.function.arguments)
                tool_function = tool_map[tool_call_name]
                tool_result = tool_function(**tool_call_arguments)
                print("tool_result:", tool_result)

                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "name": tool_call_name,
                    "content": json.dumps(tool_result)
                })
    print("-" * 100)
    print(choice.message.content)

The tool_call_with_client function implements the pipeline from user query to tool execution. This pipeline requires the inference engine to support Kimi K2 Thinking's native tool-parsing logic. For more information, see the Tool Calling Guide.

Using thinking models

Both the kimi-k2-thinking and kimi-k2.5 models have powerful thinking capabilities, supporting deep reasoning and multi-step tool use to solve complex problems.

    kimi-k2-thinking: A dedicated thinking model with thinking forcibly enabled
    [Recommended] kimi-k2.5: A model that can enable or disable thinking capability, enabled by default. You can disable thinking by using {"type": "disabled"}

If you are doing benchmark testing with kimi api, please refer to this benchmark best practice.

You can simply use it by switching the model parameter:

$ curl https://api.moonshot.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MOONSHOT_API_KEY" \
  -d '{
    "model": "kimi-k2-thinking",
    "messages": [
      {
        "role": "system",
        "content": "You are Kimi."
      },
      {
        "role": "user",
        "content": "Please explain why 1+1=2."
      }
    ],
    "temperature": 1.0
  }'

For the kimi-k2.5 model, thinking is enabled by default, no need to manually specify it:

$ curl https://api.moonshot.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MOONSHOT_API_KEY" \
  -d '{
    "model": "kimi-k2.5",
    "messages": [
      {
        "role": "system",
        "content": "You are Kimi."
      },
      {
        "role": "user",
        "content": "Please explain why 1+1=2."
      }
    ]
  }'

Please refer to Disable Thinking Capability Example

In the API response for kimi-k2-thinking or kimi-k2.5 (with thinking enabled) models, we use the reasoning_content field as the carrier for the model's reasoning. About the reasoning_content field:

    In the OpenAI SDK, the ChoiceDelta and ChatCompletionMessage types do not provide a reasoning_content field directly, so you cannot access it via .reasoning_content. You must use hasattr(obj, "reasoning_content") to check whether the field exists, and if so, use getattr(obj, "reasoning_content") to retrieve its value.
    If you use other frameworks or interface directly with the HTTP API, you can obtain the reasoning_content field at the same level as the content field.
    In streaming output (stream=True), the reasoning_content field always appears before the content field. In your business logic, you can detect whether the content field has started to determine if the reasoning (inference process) is finished.
    Tokens in reasoning_content are also controlled by the max_tokens parameter: the sum of tokens in reasoning_content and content must be less than or equal to max_tokens.
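For a client written in TypeScript (as Roo Code is), those access rules can be sketched as below. This is a sketch under the assumptions in the quoted docs, not Roo Code's actual parser: reasoning_content is a provider extension absent from the SDK's delta types, and the first delta carrying content marks the end of the reasoning phase.

```typescript
// reasoning_content is a provider extension, not part of the OpenAI SDK types.
interface StreamDelta {
  content?: string
  reasoning_content?: string
}

// Accumulate a stream of deltas into separate reasoning and answer strings.
function collectStream(deltas: StreamDelta[]): { reasoning: string; answer: string } {
  let reasoning = ""
  let answer = ""
  for (const d of deltas) {
    if (d.reasoning_content) reasoning += d.reasoning_content
    if (d.content) answer += d.content
  }
  return { reasoning, answer }
}

// Per the docs, reasoning_content always precedes content in a stream, so a
// delta that carries content signals that the reasoning phase is over.
function reasoningFinished(d: StreamDelta): boolean {
  return typeof d.content === "string" && d.content.length > 0
}

const result = collectStream([
  { reasoning_content: "9.9 = 9.90, and 9.90 > 9.11. " },
  { content: "9.9 is bigger than 9.11." },
])
console.log(result)
```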

Both kimi-k2-thinking and kimi-k2.5 (with thinking enabled) are designed to perform deep reasoning across multiple tool calls, enabling them to tackle highly complex tasks.

To get reliable results, whether using kimi-k2-thinking or kimi-k2.5 (with thinking enabled by default), always follow these configuration rules:

    1. Include the entire reasoning content from the context (the reasoning_content field) in your input. The model will decide which parts are necessary and forward them for further reasoning.
    2. Set max_tokens ≥ 16,000 to ensure the full reasoning_content and final content can be returned without truncation.
    3. Set temperature = 1.0 to get the best performance. Note that the kimi-k2.5 model uses a fixed temperature of 1.0.
    4. Enable streaming (stream = true). Because thinking models return both reasoning_content and regular content, the response is larger than usual. Streaming delivers a better user experience and helps avoid network-timeout issues.
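The four rules above translate into a request body like the following. A minimal sketch assuming an OpenAI-compatible chat completions endpoint, using only parameter names that appear in these docs:

```typescript
// Request body following the four configuration rules for thinking models.
const requestBody = {
  model: "kimi-k2.5",
  messages: [
    { role: "system", content: "You are Kimi." },
    { role: "user", content: "Please explain why 1+1=2." },
  ],
  max_tokens: 16000, // rule 2: room for reasoning_content plus content
  temperature: 1.0, // rule 3: recommended (fixed for kimi-k2.5)
  stream: true, // rule 4: large responses stream better
}

console.log(JSON.stringify(requestBody, null, 2))
```

Rule 1 (feeding reasoning_content back into the context) is handled when appending the assistant message between turns, as the multi-step example below shows.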

We walk through a complete example that shows how to properly use thinking models together with official tools for multi-step tool call and extended reasoning.

The example below demonstrates a "Daily News Report Generation" scenario. The model will sequentially call official tools like date (to get the date) and web_search (to search today's news), and will present deep reasoning throughout this process.

import os
import json
import httpx
import openai

class FormulaChatClient:
    def __init__(self, base_url: str, api_key: str):
        """Initialize Formula client"""
        self.base_url = base_url
        self.api_key = api_key
        self.openai = openai.Client(
            base_url=base_url,
            api_key=api_key,
        )
        self.httpx = httpx.Client(
            base_url=base_url,
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=30.0,
        )
        # Using kimi-k2-thinking model
        # If using kimi-k2.5 model, change to "kimi-k2.5". Thinking is enabled by default
        self.model = "kimi-k2-thinking"

    def get_tools(self, formula_uri: str):
        """Get tool definitions from Formula API"""
        response = self.httpx.get(f"/formulas/{formula_uri}/tools")
        response.raise_for_status()

        try:
            return response.json().get("tools", [])
        except json.JSONDecodeError:
            print(f"Error: Unable to parse JSON (status code: {response.status_code})")
            print(f"Response content: {response.text[:500]}")
            raise

    def call_tool(self, formula_uri: str, function: str, args: dict):
        """Call an official tool"""
        response = self.httpx.post(
            f"/formulas/{formula_uri}/fibers",
            json={"name": function, "arguments": json.dumps(args)},
        )
        response.raise_for_status()
        fiber = response.json()

        if fiber.get("status", "") == "succeeded":
            return fiber["context"].get("output") or fiber["context"].get("encrypted_output")

        if "error" in fiber:
            return f"Error: {fiber['error']}"
        if "error" in fiber.get("context", {}):
            return f"Error: {fiber['context']['error']}"
        return "Error: Unknown error"

    def close(self):
        """Close the client connection"""
        self.httpx.close()

Initialize client

base_url = os.getenv("MOONSHOT_BASE_URL", "https://api.moonshot.ai/v1")
api_key = os.getenv("MOONSHOT_API_KEY")

if not api_key:
    raise ValueError("MOONSHOT_API_KEY environment variable not set. Please set your API key.")

print(f"Base URL: {base_url}")
print(f"API Key: {api_key[:10]}...{api_key[-10:] if len(api_key) > 20 else api_key}\n")

client = FormulaChatClient(base_url, api_key)

Define the official tool Formula URIs to use

formula_uris = [
    "moonshot/date:latest",
    "moonshot/web-search:latest"
]

Load all tool definitions and build mapping

print("Loading official tools...")
all_tools = []
tool_to_uri = {}  # function.name -> formula_uri

for uri in formula_uris:
    try:
        tools = client.get_tools(uri)
        for tool in tools:
            func = tool.get("function")
            if func:
                func_name = func.get("name")
                if func_name:
                    tool_to_uri[func_name] = uri
                    all_tools.append(tool)
                    print(f"  Loaded tool: {func_name} from {uri}")
    except Exception as e:
        print(f"  Warning: Failed to load tool {uri}: {e}")
        continue

print(f"Loaded {len(all_tools)} tools in total\n")

if not all_tools:
    raise ValueError("No tools loaded. Please check API key and network connection.")

Initialize message list

messages = [
    {
        "role": "system",
        "content": "You are Kimi, a professional news analyst. You excel at collecting, analyzing, and organizing information to generate high-quality news reports.",
    },
]

User request to generate today's news report

user_request = "Please help me generate a daily news report including important technology, economy, and society news."
messages.append({
    "role": "user",
    "content": user_request
})

print(f"User request: {user_request}\n")

Begin multi-step conversation loop

max_iterations = 10  # Prevent infinite loops
for iteration in range(max_iterations):
    try:
        completion = client.openai.chat.completions.create(
            model=client.model,
            messages=messages,
            max_tokens=1024 * 32,
            tools=all_tools,
            temperature=1.0,
        )
    except openai.AuthenticationError as e:
        print(f"Authentication error: {e}")
        print("Please check if the API key is correct and has the required permissions")
        raise
    except Exception as e:
        print(f"Error while calling the model: {e}")
        raise

    # Get response
    message = completion.choices[0].message

    # Print reasoning process
    if hasattr(message, "reasoning_content"):
        print(f"=============Reasoning round {iteration + 1} starts=============")
        reasoning = getattr(message, "reasoning_content")
        if reasoning:
            print(reasoning[:500] + "..." if len(reasoning) > 500 else reasoning)
        print(f"=============Reasoning round {iteration + 1} ends=============\n")

    # Add assistant message to context (preserve reasoning_content)
    messages.append(message)

    # If the model did not call any tools, conversation is done
    if not message.tool_calls:
        print("=============Final Answer=============")
        print(message.content)
        break

    # Handle tool calls
    print(f"The model decided to call {len(message.tool_calls)} tool(s):\n")

    for tool_call in message.tool_calls:
        func_name = tool_call.function.name
        args = json.loads(tool_call.function.arguments)

        print(f"Calling tool: {func_name}")
        print(f"Arguments: {json.dumps(args, ensure_ascii=False, indent=2)}")

        # Get corresponding formula_uri
        formula_uri = tool_to_uri.get(func_name)
        if not formula_uri:
            print(f"Error: Could not find Formula URI for tool {func_name}")
            continue

        # Call the tool
        result = client.call_tool(formula_uri, func_name, args)

        # Print result (truncate if too long)
        if len(str(result)) > 200:
            print(f"Tool result: {str(result)[:200]}...\n")
        else:
            print(f"Tool result: {result}\n")

        # Add tool result to message list
        tool_message = {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": func_name,
            "content": result
        }
        messages.append(tool_message)

print("\nConversation completed!")

Cleanup

client.close()

This process demonstrates how the kimi-k2-thinking or kimi-k2.5 (with thinking enabled) model uses deep reasoning to plan and execute complex multi-step tasks, with detailed reasoning steps (reasoning_content) preserved in the context to ensure accurate tool use at every stage.

Q: Why keep reasoning_content in the context?
A: Keeping the reasoning_content ensures the model maintains reasoning continuity in multi-step reasoning scenarios, especially when calling tools. The server will automatically handle these fields; users do not need to manage them manually.

Q: Does reasoning_content count toward token usage?
A: Yes, reasoning_content counts towards your input/output token quota. For detailed pricing, please refer to MoonshotAI's pricing documentation.

@roomote-v0
Contributor Author

roomote-v0 bot commented Mar 10, 2026


Fixed the preset picker to comprehensively apply model-specific settings: auto-sets modelTemperature from preset's defaultTemperature (e.g. Kimi K2 models get temp=1.0), properly toggles R1 format for non-reasoning models, and adds missing locale translations. All tests and checks pass.


…cale translations

- Auto-set modelTemperature when selecting a preset with defaultTemperature (e.g. Kimi K2 models require temp=1.0)
- Reset openAiR1FormatEnabled to false when selecting non-reasoning models (not just on "custom")
- Reset modelTemperature to null when selecting "custom" preset
- Add defaultTemp flag to capability display showing auto-applied temperature
- Add missing appliedFlags/flags translation keys to all non-EN locales
- Add tests for temperature auto-apply and R1 reset behavior
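The selection logic this commit describes can be sketched as follows. A hypothetical sketch: the real handler lives in OpenAICompatible.tsx, and only preserveReasoning, openAiR1FormatEnabled, and modelTemperature are names taken from the thread; defaultTemperature and onPresetSelected are assumptions.

```typescript
// Hypothetical shape of the per-preset flags the handler inspects.
interface PresetFlags {
  preserveReasoning?: boolean
  defaultTemperature?: number
}

// Decide the R1-format and temperature settings for a newly selected preset.
function onPresetSelected(preset: PresetFlags | "custom"): {
  openAiR1FormatEnabled: boolean
  modelTemperature: number | null
} {
  if (preset === "custom") {
    // "Custom" resets both fields so the user configures everything manually.
    return { openAiR1FormatEnabled: false, modelTemperature: null }
  }
  return {
    // R1 format is auto-enabled only for reasoning-preserving models.
    openAiR1FormatEnabled: preset.preserveReasoning === true,
    // e.g. Kimi K2 presets would carry defaultTemperature = 1.0.
    modelTemperature: preset.defaultTemperature ?? null,
  }
}

console.log(onPresetSelected({ preserveReasoning: true, defaultTemperature: 1.0 }))
```

The point of resetting rather than leaving stale values is that switching from a reasoning preset to a plain one must not keep R1 format or a model-specific temperature enabled.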


Development

Successfully merging this pull request may close these issues.

[ENHANCEMENT] For OpenAI Compatible endpoints provide a dropdown from which users can select among all the models that Roo has specific capabilities for

2 participants