vllm-project / vllm Public

Pull requests: vllm-project/vllm

3,378 Open 25,419 Closed

Pull requests list

feat(serve): Add /readiness/stages endpoint for model-ready autoscaling

Labels: documentation, frontend

#45551 opened Jun 13, 2026 by zhihuidu-amd

Fix fp8 cache gather block table bounds

#45550 opened Jun 13, 2026 by johnnyychiu

Fix LMCache MP LoRA cache salt keying

Labels: kv-connector, v1

#45549 opened Jun 13, 2026 by johnnyychiu

[Chore] Consolidate reasoning/tool parser attributes into unified Parser in chat serving

Labels: frontend

#45548 opened Jun 13, 2026 by sfeng33 Collaborator

Fix gpt-oss required tool choice constraints

Labels: frontend, gpt-oss, tool-calling

#45547 opened Jun 13, 2026 by johnnyychiu

[Bug Fix] [MiniMax-M3] Implement EAGLE3 support on the AMD MiniMax M3

Labels: bug, ready, rocm

#45546 opened Jun 13, 2026 by functionstackx

Add Kimi video chunk splitting

Labels: frontend, multi-modality

#45545 opened Jun 13, 2026 by maxlang

[Bugfix] Default tie_weights to sharing the weight (fix tied quantized embeddings, e.g. ModelOpt Gemma)

Labels: bug

#45544 opened Jun 13, 2026 by mikekg Contributor

3 of 4 tasks

[Bugfix] CuMemAllocator: zero discarded-tag pages on wake_up

Labels: bug

#45542 opened Jun 13, 2026 by terafin

1 of 2 tasks

[Bugfix] Fix SamplingParams repr, docstrings, top_k validation order, and convert StructuredOutputsParams to msgspec.Struct

Labels: bug

#45541 opened Jun 13, 2026 by shernshiou

1 of 4 tasks

Fix moe_wna16 sorted token bounds

#45539 opened Jun 13, 2026 by johnnyychiu

Fix cp_gather_cache block table bounds

#45537 opened Jun 13, 2026 by johnnyychiu

[Model] Wire quant_config/prefix into input embeddings for GPTNeoX and Llama

Labels: llama

#45535 opened Jun 13, 2026 by KKothuri

Allow memory profiling free memory increases

Labels: v1

#45534 opened Jun 13, 2026 by johnnyychiu

[BugFix] Fix clang spinloop mwaitx include

Labels: bug, ready, verified

#45532 opened Jun 13, 2026 by johnnyychiu

[Bugfix] Bounds-check moe_permute reverse-map write (#45492)

Labels: bug

#45530 opened Jun 13, 2026 by waynehacking8 Contributor

fix: resolve SyntaxWarning for invalid escape sequences in OpenAI protocol files

Labels: frontend

#45529 opened Jun 13, 2026 by codewithyug06

[v1] Initialize InputBatch in initialize_kv_cache instead of __init__

Labels: v1

#45528 opened Jun 13, 2026 by wenyili Contributor

[Bugfix][Kernel] Fix int32/uint overflow in merge_attn_states and permute_cols

Labels: bug

#45527 opened Jun 13, 2026 by techmonk000

[Frontend] Add /health/decode endpoint for engine forward-progress liveness (re-opens #45097)

Labels: frontend, v1

#45526 opened Jun 13, 2026 by terafin

[Bugfix] Fix HF-format Mistral3 crash with --tokenizer-mode mistral

Labels: bug, mistral, multi-modality

#45525 opened Jun 13, 2026 by discobot

3 of 4 tasks

[Core] Expose engine pause/resume state as prometheus metrics

Labels: v1

#45524 opened Jun 13, 2026 by dangoldbj Contributor

4 tasks

[Docs] Document NIXL KV connector metrics aggregation semantics for TP > 1

Labels: documentation, kv-connector

#45523 opened Jun 13, 2026 by Sugumaran-Balasubramaniyan

fix: cache bad_words tokenization to avoid 'Already borrowed' errors under concurrency

#45522 opened Jun 13, 2026 by Oxygen56 Contributor

3 tasks done

Revert "[Render] Add /derender endpoints for disaggregated postprocessing" (#43606)

Labels: frontend

#45521 opened Jun 13, 2026 by vllm-agent Contributor • Draft
