- [2026-06-10] DefTruth, Butterfingrz (2026). FFPA: Efficient Flash Prefill Attention for Large Head Dimensions via Split-D. Zenodo.
🎉🎉🎉
xlite-dev
Repositories
- cutlass (Public, forked from NVIDIA/cutlass)
CUDA Templates and Python DSLs for High-Performance Linear Algebra
- flash-attention (Public, forked from Dao-AILab/flash-attention)
Fast and memory-efficient exact attention
- diffusers (Public, forked from huggingface/diffusers)
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
- sglang (Public, forked from sgl-project/sglang)
SGLang is a fast serving framework for large language models and vision language models.
- vllm (Public, forked from vllm-project/vllm)
A high-throughput and memory-efficient inference and serving engine for LLMs.
- Awesome-DiT-Inference (Public)
📚 A curated list of awesome diffusion inference papers with code: sampling, caching, quantization, parallelism, etc. 🎉
- cache-dit (Public, forked from vipshop/cache-dit)
A PyTorch-native inference engine with caching, parallelism, and quantization for Diffusion Transformers.
- cllms-for-copilot (Public, forked from appledragon/cllms-for-copilot)
Pick Qwen, GLM, MiniMax, Xiaomi MiMo, Moonshot Kimi, and Tencent Hunyuan models from the Copilot Chat model picker, with support for vision, thinking, and BYOK.
- ffpa-attn (Public)
🤖 FFPA: Extends FlashAttention-2 via Split-D for large head dimensions; 1.5×–3× faster than SDPA, reaching up to 430 TFLOPS on H200. 🎉
