vllm.v1.attention.backends.mla.prefill.selector ¶
Selector for MLA prefill backends.
This module provides functions for selecting the appropriate MLA prefill backend based on device capabilities and configuration.
MLAPrefillSelectorConfig ¶
Bases: NamedTuple
Hashable configuration for MLA prefill backend selection.
This is analogous to AttentionSelectorConfig and contains model-specific configuration needed to select an MLA prefill backend, extracted from VllmConfig into a hashable form for caching.
Source code in vllm/v1/attention/backends/mla/prefill/selector.py
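Because the selection result is cached, the config must be hashable. A minimal sketch of what such a NamedTuple-based config could look like (the field names here are illustrative assumptions, not vLLM's actual definition):

```python
from typing import NamedTuple

# Hypothetical selector config; real fields are extracted from VllmConfig.
class MLAPrefillSelectorConfig(NamedTuple):
    qk_nope_head_dim: int
    qk_rope_head_dim: int
    v_head_dim: int
    dtype: str

cfg = MLAPrefillSelectorConfig(128, 64, 128, "bfloat16")
# NamedTuples are hashable, so the config can serve as a cache key.
assert hash(cfg) == hash(MLAPrefillSelectorConfig(128, 64, 128, "bfloat16"))
```

Equal configs hash identically, which is what lets repeated selections with the same model configuration hit the cache instead of re-running backend probing.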
_auto_select_mla_prefill_backend cached ¶
_auto_select_mla_prefill_backend(
device_capability: DeviceCapability,
selector_config: MLAPrefillSelectorConfig,
) -> type[MLAPrefillBackend]
Auto-select the best available MLA prefill backend.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| device_capability | DeviceCapability | The device's compute capability. | required |
| selector_config | MLAPrefillSelectorConfig | Hashable configuration for backend selection. | required |
Returns:
| Type | Description |
|---|---|
| type[MLAPrefillBackend] | The selected prefill backend class. |
Source code in vllm/v1/attention/backends/mla/prefill/selector.py
_get_mla_prefill_backend_priorities ¶
_get_mla_prefill_backend_priorities(
device_capability: DeviceCapability,
) -> list[MLAPrefillBackendEnum]
Get MLA prefill backend priorities based on device capability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| device_capability | DeviceCapability | The device's compute capability. | required |
Returns:
| Type | Description |
|---|---|
| list[MLAPrefillBackendEnum] | List of backends in priority order (highest priority first). |
Source code in vllm/v1/attention/backends/mla/prefill/selector.py
get_mla_prefill_backend ¶
get_mla_prefill_backend(
vllm_config: VllmConfig,
) -> type[MLAPrefillBackend]
Select the MLA prefill backend based on configuration and device.
This function first checks for an explicit user preference via mla.prefill_backend in AttentionConfig, then falls back to automatic priority-based selection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vllm_config | VllmConfig | The vLLM configuration. | required |
Returns:
| Type | Description |
|---|---|
| type[MLAPrefillBackend] | The selected prefill backend class. |
Source code in vllm/v1/attention/backends/mla/prefill/selector.py
is_deepseek_r1_mla_compatible ¶
is_deepseek_r1_mla_compatible(
vllm_config: VllmConfig,
) -> bool
Check if model has DeepSeek R1 compatible MLA dimensions.
DeepSeek R1 MLA dimensions are:
- qk_nope_head_dim = 128
- qk_rope_head_dim = 64
- v_head_dim = 128
These dimensions are required for optimized backends like TRTLLM_RAGGED, FLASHINFER, and CUDNN on Blackwell.
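The check itself reduces to comparing the model's head dimensions against these fixed values. A sketch using the dimensions stated above (the flat-argument signature is a simplification of the VllmConfig-based one):

```python
def is_deepseek_r1_mla_compatible(
    qk_nope_head_dim: int, qk_rope_head_dim: int, v_head_dim: int
) -> bool:
    # The reference values come from the DeepSeek R1 MLA dimensions
    # documented above; any deviation disqualifies the optimized paths.
    return (
        qk_nope_head_dim == 128
        and qk_rope_head_dim == 64
        and v_head_dim == 128
    )
```

A model with, say, qk_rope_head_dim = 32 would return False here and be routed away from the Blackwell-specific optimized backends.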