vllm.model_executor.layers.fused_moe.shared_fused_moe

SharedFusedMoE

Bases: FusedMoE

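A FusedMoE operation that also computes the results of shared experts. If an all2all communicator is in use, the shared expert computation can be interleaved with the fused all2all dispatch communication step. forward returns a pair: when shared_experts is None, the first element is None and the second is the result of FusedMoE.forward; otherwise the result of FusedMoE.forward is returned unchanged.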

Source code in vllm/model_executor/layers/fused_moe/shared_fused_moe.py

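from typing import Optional

import torch

from vllm.model_executor.layers.fused_moe.layer import FusedMoE


class SharedFusedMoE(FusedMoE):
    """
    A FusedMoE operation that also computes the results of shared experts.
    If an all2all communicator is being used the shared expert computation
    can be interleaved with the fused all2all dispatch communication step.
    """

    def forward(
        self,
        hidden_states: torch.Tensor,
        router_logits: torch.Tensor,
    ) -> tuple[Optional[torch.Tensor], torch.Tensor]:
        result = super().forward(
            hidden_states=hidden_states,
            router_logits=router_logits,
        )
        if self.shared_experts is None:
            # No shared experts: pad the result so callers always unpack a pair.
            return None, result
        # With shared experts, the base class result is passed through as-is
        # (per the return annotation, it is already a pair of tensors).
        return result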
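
The return convention above can be illustrated with a small, dependency-free sketch. Everything in it is hypothetical: `ToyFusedMoE`, `ToySharedFusedMoE`, and `combine` are stand-ins invented for illustration, plain Python lists replace tensors, and the base class is only assumed to return a pair when shared experts are present. Summing the two outputs in `combine` is likewise an assumption about how a caller might use the pair, not something this module prescribes.

```python
from typing import Callable, Optional

Vector = list[float]


class ToyFusedMoE:
    """Hypothetical stand-in for the base class; not vLLM's FusedMoE."""

    def __init__(self, shared_experts: Optional[Callable[[Vector], Vector]] = None):
        self.shared_experts = shared_experts

    def forward(self, hidden_states: Vector, router_logits: Vector):
        # Placeholder "routed experts": scale the input by the largest logit.
        scale = max(router_logits)
        fused_out = [scale * x for x in hidden_states]
        if self.shared_experts is None:
            return fused_out
        # Assumed behavior: a (shared, fused) pair when shared experts exist.
        return self.shared_experts(hidden_states), fused_out


class ToySharedFusedMoE(ToyFusedMoE):
    """Mirrors the control flow of SharedFusedMoE.forward shown above."""

    def forward(
        self, hidden_states: Vector, router_logits: Vector
    ) -> tuple[Optional[Vector], Vector]:
        result = super().forward(hidden_states, router_logits)
        if self.shared_experts is None:
            return None, result
        return result


def combine(shared_out: Optional[Vector], fused_out: Vector) -> Vector:
    # Assumed caller-side step: add the shared output when there is one.
    if shared_out is None:
        return fused_out
    return [s + f for s, f in zip(shared_out, fused_out)]


layer = ToySharedFusedMoE(shared_experts=lambda h: [x + 1.0 for x in h])
shared_out, fused_out = layer.forward([1.0, 2.0], [0.5, 2.0])
final = combine(shared_out, fused_out)
```

Because the first element may be `None`, a caller can always unpack two values and branch on the first one, regardless of whether shared experts are configured.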