Skip to content

Draft: Add new providers via adapters#60

Closed
andrejvysny wants to merge 20 commits into
huggingface:mainfrom
andrejvysny:provider-adapters-phase-2
Closed

Draft: Add new providers via adapters#60
andrejvysny wants to merge 20 commits into
huggingface:mainfrom
andrejvysny:provider-adapters-phase-2

Conversation

@andrejvysny

@andrejvysny andrejvysny commented Apr 22, 2026

Copy link
Copy Markdown

Summary

Verification

  • python3 -m py_compile agent/core/provider_adapters.py agent/core/llm_params.py
  • uv run --with pytest pytest tests/test_provider_adapters.py

andrejvysny and others added 2 commits April 23, 2026 10:29
Resolve conflicts by taking upstream for llm_params.py (will be
rewritten as thin adapter dispatcher) and main.py (model_switcher
extraction supersedes our changes).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split NativeAdapter into AnthropicAdapter (thinking config +
output_config.effort) and OpenAIAdapter (reasoning_effort top-level).
Each adapter owns its accepted effort set and raises
UnsupportedEffortError in strict mode, preserving the effort_probe
cascade with zero changes to effort_probe.py or agent_loop.py.

llm_params.py becomes a thin dispatcher delegating to
resolve_adapter().build_params() while keeping the litellm
effort-validation patch and re-exporting UnsupportedEffortError.

model_switcher.py reads suggested models from the adapter registry
instead of maintaining a separate SUGGESTED_MODELS list.

backend/routes/agent.py replaces AVAILABLE_MODELS with
build_model_catalog().

OpenCodeGoAdapter deferred to PR huggingface#60.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@andrejvysny andrejvysny changed the title Add configurable provider registry and new LLM provider adapters Draft: Add new LLM provider adapters Apr 23, 2026
@andrejvysny andrejvysny force-pushed the provider-adapters-phase-2 branch from fb54391 to 5d357ba Compare April 23, 2026 10:46
@andrejvysny andrejvysny reopened this Apr 23, 2026
@andrejvysny andrejvysny changed the title Draft: Add new LLM provider adapters Refactor existing LLM resolution into provider adapters Apr 23, 2026
@andrejvysny andrejvysny changed the title Refactor existing LLM resolution into provider adapters Draft: Add new providers via adapters Apr 23, 2026
Restore the existing web model behavior so PR huggingface#55 stays a behavior-preserving refactor while keeping shared runtime and CLI validation logic.
Classify auth, credits, and missing-model failures once so the CLI, model switcher, and health checks show clean user-facing errors instead of raw provider traces.
@andrejvysny

Copy link
Copy Markdown
Author

Heads up: this PR builds on #55 and expands the adapter-based provider layer with additional providers / routing support.

Overlap notes:

If the final version of this PR includes Ollama/local model support directly, it may also supersede the earlier overlapping attempts in #44 and #47.

andrejvysny and others added 7 commits April 24, 2026 11:02
Incorporate Bedrock adapter, error handling for missing adapters,
upstream main merge, and cleanup from PR#55.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Native LiteLLM adapter with reasoning_effort support (low/medium/high),
minimal->low normalization, suggested models (2.5 Pro, Flash).
Supersedes PR#95 approach with proper adapter pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@akseljoonas

Copy link
Copy Markdown
Contributor

@claude please review this PR. Focus on correctness, security, LiteLLM/Bedrock routing impact, and whether it's safe to merge against current main. Keep it concise and prefer inline comments where it matters.

@claude

claude Bot commented Apr 24, 2026

Copy link
Copy Markdown

Claude finished @akseljoonas's task in 3m 5s —— View job


PR Review: Add new providers via adapters

  • Post initial comment
  • Read CLAUDE.md and gather context
  • Review changed files (provider_adapters.py, llm_params.py, agent_loop.py, model_switcher.py, llm_errors.py)
  • Review backend changes (routes/agent.py, session_manager.py)
  • Review frontend changes (ChatInput.tsx, CustomModelDialog.tsx, model.ts)
  • Review tests
  • Analyze security, correctness, LiteLLM/Bedrock routing impact
  • Post final review

Summary

Solid architecture — the adapter hierarchy is clean and the tests are thorough. Two things need to be fixed before merging: a quota/auth gate that fires incorrectly on OpenRouter→Anthropic routes, and blocking I/O in an async endpoint. Everything else is either low-risk or advisory.


🔴 Blocker 1 — _is_anthropic_model matches OpenRouter Anthropic routes

File: backend/routes/agent.py:44

def _is_anthropic_model(model_id: str) -> bool:
    return "anthropic" in model_id   # ← too broad

"anthropic" in "openrouter/anthropic/claude-sonnet-4.5" is True. This means:

  1. _require_hf_for_anthropic (line 48) will 403 non-HF-org users trying to use their own OpenRouter key to access Claude — they paid for it, and HF isn't billing them.
  2. _enforce_claude_quota (line 89) will charge their daily HF Claude quota for a session that is billed to their OpenRouter account, not the Space's ANTHROPIC_API_KEY. Users could hit their HF daily cap from OpenRouter usage.

The pre-existing check was correct when only anthropic/ was possible. Adding OpenRouter as a new provider in this PR is what breaks it.

Fix: return model_id.startswith("anthropic/")

Fix this →


🔴 Blocker 2 — Blocking urlopen in async GET /api/config/model

File: agent/core/provider_adapters.py:96

with urlopen(f"{api_base}/models", timeout=_DISCOVERY_TIMEOUT_SECONDS) as response:

urllib.request.urlopen is synchronous. It is called from _discover_models()available_models()get_available_models()build_model_catalog()GET /api/config/model (an async def route). Three adapters probe on each cold cache: Ollama, LM Studio, vLLM. When all three are unreachable this blocks the event loop for up to 6 seconds, freezing every concurrent request/SSE stream on the server.

The cache (30s TTL) limits re-probing but not the cold-start cost.

Fix: Wrap in asyncio.to_thread at the call site in the route, or use httpx.AsyncClient inside _discover_models. The simplest surgical fix is to make GET /api/config/model do the discovery in a thread:

@router.get("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/config/model")
async def get_model() -> dict:
    return await asyncio.to_thread(
        build_model_catalog, session_manager.config.model_name
    )

Fix this →


🟡 Medium — LM Studio use_raw_model_name=True may send the wrong model id on the wire

File: agent/core/provider_adapters.py:397-399

"model": model_name if self.use_raw_model_name else f"openai/{model_id}",

With use_raw_model_name=True, model_name is lm_studio/google/gemma-3-12b. LiteLLM doesn't know the lm_studio/ prefix, so it either (a) fails to route, or (b) sends the full string — including the prefix — to LM Studio's API, which then can't find lm_studio/google/gemma-3-12b. The Ollama pattern ("openai/{model_id}") avoids this because LiteLLM strips openai/ before putting the id on the wire.

The tests only assert the params dict shape, not that LiteLLM routes correctly. This should be verified against a live LM Studio instance. If LM Studio expects google/gemma-3-12b without the namespace prefix, the fix is to drop use_raw_model_name and use the default f"openai/{model_id}" instead.


🟡 Medium — restore-session-summary has no message-count limit

File: backend/routes/agent.py:278 / backend/session_manager.py:226

messages = body.get("messages")
if not isinstance(messages, list) or not messages:
    raise HTTPException(status_code=400, detail="Missing 'messages' array")

An authenticated user can send an arbitrarily large messages list (100k items) triggering an expensive LLM summarization call billed to the Space's API key. A simple guard at the route level (e.g. len(messages) > 2000) would contain this.


🟢 Low — openai-compat/ switch succeeds even when OPENAI_COMPAT_BASE_URL is unset

File: backend/routes/agent.py:363 / agent/core/provider_adapters.py:530-533

is_valid_model_name("openai-compat/my-model") returns True regardless of env state, so the /session/{id}/model endpoint accepts the switch. The ValueError("OPENAI_COMPAT_BASE_URL is required") only fires at the next LLM call, producing an unrelated 500. Better to check at switch time:

adapter = resolve_adapter(model_id)
if adapter and not adapter.should_show():
    raise HTTPException(status_code=400, detail=f"Provider not configured for: {model_id}")

🟢 Low — LiteLLM patch is fragile but well-guarded

File: agent/core/llm_params.py:11-42

The monkey-patch on AnthropicConfig._is_opus_4_6_model is robust: _hf_agent_patched guards double-application, the entire function is wrapped in except Exception: return, and getattr guards against missing attributes. Risk here is low. Leave a changelog note for the next major LiteLLM upgrade.


What's good

  • Architecture: The ProviderAdapter hierarchy is clean. OpenAICompatAdapter as a base for Ollama/LM Studio/OpenRouter/OpenCode is a good reuse story.
  • Discovery: Negative caching (storing empty tuple on failure) + 30s TTL is appropriate for local services.
  • _discover_models error handling: Catches OSError, URLError, TimeoutError, ValueError — covers all realistic failure modes.
  • Test coverage: 391 lines of adapter tests cover happy paths, strict/non-strict effort rejection, and catalog visibility logic. The test_llm_errors.py suite is solid.
  • Session-scoped model switching: Correctly deep-copies config per session; quota gate fires at message-submit not at switch time.
  • CustomModelDialog: Cleans up state on close, disables backdrop click while submitting, shows preview of final model id — UX is sensible.

Merge verdict

Not yet — blockers 1 (quota/gate misfire) and 2 (event-loop blocking) should be fixed first. Both are small targeted changes. Once those land this is in good shape to merge.

- Fix blocking urlopen in async GET /api/config/model via asyncio.to_thread
- Fix LM Studio use_raw_model_name sending prefixed model id to API
- Fix _is_anthropic_model matching openrouter/anthropic/ routes
- Add message-count limit (2000) on restore-session-summary
- Add provider config check on model switch (openai-compat/ env guard)
- Cache _all_adapter_prefixes with lru_cache
- Extract useModelCatalog hook from ChatInput (635 -> 528 lines)
- Memoize provider groups calculation
- Add Enter key support to CustomModelDialog
- Fix useEffect dependency loop on selectedModelInfo

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@lewtun lewtun added the close-outdated Closed because the PR is stale, conflicted, superseded, or targets old code. label Jun 12, 2026
@lewtun

lewtun commented Jun 12, 2026

Copy link
Copy Markdown
Member

Apologies for the late follow-up; we're catching up on the backlog.

Closing this as close-outdated.

This draft is conflicted, depends on stale PR #55, and has unresolved blocker feedback.

If this is still relevant, please open a fresh PR against current main with the narrower/current approach.

@lewtun lewtun closed this Jun 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

close-outdated Closed because the PR is stale, conflicted, superseded, or targets old code.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants