Draft: Add new providers via adapters by andrejvysny · Pull Request #60 · huggingface/ml-intern

andrejvysny · 2026-04-22T16:46:32Z

Summary

builds on top of feat: Refactor existing LLM resolution into provider adapters #55 (provider adapter refactor)
adds new provider adapters for LM Studio, OpenRouter, OpenCode Zen, and OpenCode Go
adds session-scoped model switching, local LM Studio discovery, and restore-summary UX fixes

Verification

python3 -m py_compile agent/core/provider_adapters.py agent/core/llm_params.py
uv run --with pytest pytest tests/test_provider_adapters.py

Resolve conflicts by taking upstream for llm_params.py (will be rewritten as thin adapter dispatcher) and main.py (model_switcher extraction supersedes our changes). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Split NativeAdapter into AnthropicAdapter (thinking config + output_config.effort) and OpenAIAdapter (reasoning_effort top-level). Each adapter owns its accepted effort set and raises UnsupportedEffortError in strict mode, preserving the effort_probe cascade with zero changes to effort_probe.py or agent_loop.py. llm_params.py becomes a thin dispatcher delegating to resolve_adapter().build_params() while keeping the litellm effort-validation patch and re-exporting UnsupportedEffortError. model_switcher.py reads suggested models from the adapter registry instead of maintaining a separate SUGGESTED_MODELS list. backend/routes/agent.py replaces AVAILABLE_MODELS with build_model_catalog(). OpenCodeGoAdapter deferred to PR huggingface#60. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Restore the existing web model behavior so PR huggingface#55 stays a behavior-preserving refactor while keeping shared runtime and CLI validation logic.

Classify auth, credits, and missing-model failures once so the CLI, model switcher, and health checks show clean user-facing errors instead of raw provider traces.

andrejvysny · 2026-04-23T11:51:00Z

Heads up: this PR builds on #55 and expands the adapter-based provider layer with additional providers / routing support.

Overlap notes:

addresses the local/OpenAI-compatible backend request in Feature request: Allow routing to local OpenAI-compatible backends (Ollama / vLLM / LM Studio) #67
overlaps with Add local OpenAI-compatible backend routing #68, which implements a narrower local-backend slice
partially addresses Support for Open-source #56 by expanding support beyond the current built-in providers

If the final version of this PR includes Ollama/local model support directly, it may also supersede the earlier overlapping attempts in #44 and #47.

# Conflicts: # agent/core/llm_params.py

…_params

Incorporate Bedrock adapter, error handling for missing adapters, upstream main merge, and cleanup from PR#55. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Native LiteLLM adapter with reasoning_effort support (low/medium/high), minimal->low normalization, suggested models (2.5 Pro, Flash). Supersedes PR#95 approach with proper adapter pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

akseljoonas · 2026-04-24T14:08:20Z

@claude please review this PR. Focus on correctness, security, LiteLLM/Bedrock routing impact, and whether it's safe to merge against current main. Keep it concise and prefer inline comments where it matters.

claude · 2026-04-24T14:08:39Z

Claude finished @akseljoonas's task in 3m 5s —— View job

PR Review: Add new providers via adapters

Post initial comment
Read CLAUDE.md and gather context
Review changed files (provider_adapters.py, llm_params.py, agent_loop.py, model_switcher.py, llm_errors.py)
Review backend changes (routes/agent.py, session_manager.py)
Review frontend changes (ChatInput.tsx, CustomModelDialog.tsx, model.ts)
Review tests
Analyze security, correctness, LiteLLM/Bedrock routing impact
Post final review

Summary

Solid architecture — the adapter hierarchy is clean and the tests are thorough. Two things need to be fixed before merging: a quota/auth gate that fires incorrectly on OpenRouter→Anthropic routes, and blocking I/O in an async endpoint. Everything else is either low-risk or advisory.

🔴 Blocker 1 — `_is_anthropic_model` matches OpenRouter Anthropic routes

File: backend/routes/agent.py:44

def _is_anthropic_model(model_id: str) -> bool:
    return "anthropic" in model_id   # ← too broad

"anthropic" in "openrouter/anthropic/claude-sonnet-4.5" is True. This means:

_require_hf_for_anthropic (line 48) will 403 non-HF-org users trying to use their own OpenRouter key to access Claude — they paid for it, and HF isn't billing them.
_enforce_claude_quota (line 89) will charge their daily HF Claude quota for a session that is billed to their OpenRouter account, not the Space's ANTHROPIC_API_KEY. Users could hit their HF daily cap from OpenRouter usage.

The pre-existing check was correct when only anthropic/ was possible. Adding OpenRouter as a new provider in this PR is what breaks it.

Fix: return model_id.startswith("anthropic/")

Fix this →

🔴 Blocker 2 — Blocking `urlopen` in async `GET /api/config/model`

File: agent/core/provider_adapters.py:96

with urlopen(f"{api_base}/models", timeout=_DISCOVERY_TIMEOUT_SECONDS) as response:

urllib.request.urlopen is synchronous. It is called from _discover_models() → available_models() → get_available_models() → build_model_catalog() → GET /api/config/model (an async def route). Three adapters probe on each cold cache: Ollama, LM Studio, vLLM. When all three are unreachable this blocks the event loop for up to 6 seconds, freezing every concurrent request/SSE stream on the server.

The cache (30s TTL) limits re-probing but not the cold-start cost.

Fix: Wrap in asyncio.to_thread at the call site in the route, or use httpx.AsyncClient inside _discover_models. The simplest surgical fix is to make GET /api/config/model do the discovery in a thread:

@router.get("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/config/model")
async def get_model() -> dict:
    return await asyncio.to_thread(
        build_model_catalog, session_manager.config.model_name
    )

Fix this →

🟡 Medium — LM Studio `use_raw_model_name=True` may send the wrong model id on the wire

File: agent/core/provider_adapters.py:397-399

"model": model_name if self.use_raw_model_name else f"openai/{model_id}",

With use_raw_model_name=True, model_name is lm_studio/google/gemma-3-12b. LiteLLM doesn't know the lm_studio/ prefix, so it either (a) fails to route, or (b) sends the full string — including the prefix — to LM Studio's API, which then can't find lm_studio/google/gemma-3-12b. The Ollama pattern ("openai/{model_id}") avoids this because LiteLLM strips openai/ before putting the id on the wire.

The tests only assert the params dict shape, not that LiteLLM routes correctly. This should be verified against a live LM Studio instance. If LM Studio expects google/gemma-3-12b without the namespace prefix, the fix is to drop use_raw_model_name and use the default f"openai/{model_id}" instead.

🟡 Medium — `restore-session-summary` has no message-count limit

File: backend/routes/agent.py:278 / backend/session_manager.py:226

messages = body.get("messages")
if not isinstance(messages, list) or not messages:
    raise HTTPException(status_code=400, detail="Missing 'messages' array")

An authenticated user can send an arbitrarily large messages list (100k items) triggering an expensive LLM summarization call billed to the Space's API key. A simple guard at the route level (e.g. len(messages) > 2000) would contain this.

🟢 Low — `openai-compat/` switch succeeds even when `OPENAI_COMPAT_BASE_URL` is unset

File: backend/routes/agent.py:363 / agent/core/provider_adapters.py:530-533

is_valid_model_name("openai-compat/my-model") returns True regardless of env state, so the /session/{id}/model endpoint accepts the switch. The ValueError("OPENAI_COMPAT_BASE_URL is required") only fires at the next LLM call, producing an unrelated 500. Better to check at switch time:

adapter = resolve_adapter(model_id)
if adapter and not adapter.should_show():
    raise HTTPException(status_code=400, detail=f"Provider not configured for: {model_id}")

🟢 Low — LiteLLM patch is fragile but well-guarded

File: agent/core/llm_params.py:11-42

The monkey-patch on AnthropicConfig._is_opus_4_6_model is robust: _hf_agent_patched guards double-application, the entire function is wrapped in except Exception: return, and getattr guards against missing attributes. Risk here is low. Leave a changelog note for the next major LiteLLM upgrade.

What's good

Architecture: The ProviderAdapter hierarchy is clean. OpenAICompatAdapter as a base for Ollama/LM Studio/OpenRouter/OpenCode is a good reuse story.
Discovery: Negative caching (storing empty tuple on failure) + 30s TTL is appropriate for local services.
_discover_models error handling: Catches OSError, URLError, TimeoutError, ValueError — covers all realistic failure modes.
Test coverage: 391 lines of adapter tests cover happy paths, strict/non-strict effort rejection, and catalog visibility logic. The test_llm_errors.py suite is solid.
Session-scoped model switching: Correctly deep-copies config per session; quota gate fires at message-submit not at switch time.
CustomModelDialog: Cleans up state on close, disables backdrop click while submitting, shows preview of final model id — UX is sensible.

Merge verdict

Not yet — blockers 1 (quota/gate misfire) and 2 (event-loop blocking) should be fixed first. Both are small targeted changes. Once those land this is in good shape to merge.

- Fix blocking urlopen in async GET /api/config/model via asyncio.to_thread - Fix LM Studio use_raw_model_name sending prefixed model id to API - Fix _is_anthropic_model matching openrouter/anthropic/ routes - Add message-count limit (2000) on restore-session-summary - Add provider config check on model switch (openai-compat/ env guard) - Cache _all_adapter_prefixes with lru_cache - Extract useModelCatalog hook from ChatInput (635 -> 528 lines) - Memoize provider groups calculation - Add Enter key support to CustomModelDialog - Fix useEffect dependency loop on selectedModelInfo Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

lewtun · 2026-06-12T15:33:35Z

Apologies for the late follow-up; we're catching up on the backlog.

Closing this as close-outdated.

This draft is conflicted, depends on stale PR #55, and has unresolved blocker feedback.

If this is still relevant, please open a fresh PR against current main with the narrower/current approach.

andrejvysny added 5 commits April 22, 2026 15:58

Refactor LLM param resolution into adapters

267618d

Merge upstream main into provider adapters

a0eba84

Remove unused session_hf_token parameter from NativeAdapter.build_params

dcb4aff

Unify model catalog around provider adapters

2f85650

Add OpenCode Go provider adapter

f1f9116

Pro777 mentioned this pull request Apr 23, 2026

Feature request: Allow routing to local OpenAI-compatible backends (Ollama / vLLM / LM Studio) #67

Closed

andrejvysny and others added 2 commits April 23, 2026 10:29

Merge upstream main into provider adapters

a5bdf29

Resolve conflicts by taking upstream for llm_params.py (will be rewritten as thin adapter dispatcher) and main.py (model_switcher extraction supersedes our changes). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

andrejvysny changed the title ~~Add configurable provider registry and new LLM provider adapters~~ Draft: Add new LLM provider adapters Apr 23, 2026

Merge upstream main into provider adapters

e5b113a

andrejvysny closed this Apr 23, 2026

andrejvysny force-pushed the provider-adapters-phase-2 branch from fb54391 to 5d357ba Compare April 23, 2026 10:46

andrejvysny reopened this Apr 23, 2026

andrejvysny changed the title ~~Draft: Add new LLM provider adapters~~ Refactor existing LLM resolution into provider adapters Apr 23, 2026

andrejvysny changed the title ~~Refactor existing LLM resolution into provider adapters~~ Draft: Add new providers via adapters Apr 23, 2026

andrejvysny added 2 commits April 23, 2026 13:05

Trim provider adapters to params and validation

a4a7532

Restore the existing web model behavior so PR huggingface#55 stays a behavior-preserving refactor while keeping shared runtime and CLI validation logic.

Clean provider error handling

d559a37

Classify auth, credits, and missing-model failures once so the CLI, model switcher, and health checks show clean user-facing errors instead of raw provider traces.

andrejvysny mentioned this pull request Apr 23, 2026

feat: Refactor existing LLM resolution into provider adapters #55

Closed

andrejvysny added 2 commits April 23, 2026 14:27

Add Phase 1 provider adapters

a3a1b8f

Build adapter-driven model catalog and picker

8b1abd9

statikkkkk mentioned this pull request Apr 24, 2026

Add local OpenAI-compatible backend routing #68

Closed

andrejvysny and others added 7 commits April 24, 2026 11:02

Merge remote-tracking branch 'origin/main' into feat/provider-adapters

958c893

# Conflicts: # agent/core/llm_params.py

feat: add error handling for missing provider adapter in _resolve_llm…

88a95a6

…_params

refactor: revert unnecessary formatting changes

590496f

refactor: cleanup before merge

cc6cff7

Merge feat/provider-adapters (PR#55) into phase-2 branch

0307e23

Incorporate Bedrock adapter, error handling for missing adapters, upstream main merge, and cleanup from PR#55. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: remove outdated TODO list

4ac4f1a

lewtun mentioned this pull request May 4, 2026

ML Intern backlog prioritization report - 2026-05-04 #223

Closed

lewtun added the close-outdated Closed because the PR is stale, conflicted, superseded, or targets old code. label Jun 12, 2026

lewtun closed this Jun 12, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Draft: Add new providers via adapters#60

Draft: Add new providers via adapters#60
andrejvysny wants to merge 20 commits into
huggingface:mainfrom
andrejvysny:provider-adapters-phase-2

andrejvysny commented Apr 22, 2026 •

edited

Loading

Uh oh!

andrejvysny commented Apr 23, 2026

Uh oh!

akseljoonas commented Apr 24, 2026

Uh oh!

claude Bot commented Apr 24, 2026 •

edited

Loading

Uh oh!

lewtun commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

andrejvysny commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Verification

Uh oh!

andrejvysny commented Apr 23, 2026

Uh oh!

akseljoonas commented Apr 24, 2026

Uh oh!

claude Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Add new providers via adapters

Summary

🔴 Blocker 1 — _is_anthropic_model matches OpenRouter Anthropic routes

🔴 Blocker 2 — Blocking urlopen in async GET /api/config/model

🟡 Medium — LM Studio use_raw_model_name=True may send the wrong model id on the wire

🟡 Medium — restore-session-summary has no message-count limit

🟢 Low — openai-compat/ switch succeeds even when OPENAI_COMPAT_BASE_URL is unset

🟢 Low — LiteLLM patch is fragile but well-guarded

What's good

Merge verdict

Uh oh!

lewtun commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

andrejvysny commented Apr 22, 2026 •

edited

Loading

claude Bot commented Apr 24, 2026 •

edited

Loading

🔴 Blocker 1 — `_is_anthropic_model` matches OpenRouter Anthropic routes

🔴 Blocker 2 — Blocking `urlopen` in async `GET /api/config/model`

🟡 Medium — LM Studio `use_raw_model_name=True` may send the wrong model id on the wire

🟡 Medium — `restore-session-summary` has no message-count limit

🟢 Low — `openai-compat/` switch succeeds even when `OPENAI_COMPAT_BASE_URL` is unset