vllm.tool_parsers.gemma4_utils ¶

Gemma4 tool call parsing utilities for offline inference.

Standalone functions that parse decoded model text to extract tool calls from Gemma4 models. These are pure-Python utilities with zero heavy dependencies — they work on raw decoded strings from any inference backend (vLLM, HuggingFace, TGI, etc.).

For the OpenAI-compatible API server tool parser (streaming + non-streaming), see vllm.tool_parsers.gemma4_tool_parser. For thinking/reasoning output parsing, see vllm.reasoning.gemma4_utils.

Usage with vLLM offline inference::

from vllm import LLM, SamplingParams
from vllm.tool_parsers.gemma4_utils import (
    parse_tool_calls,
    has_tool_response_tag,
)

llm = LLM(model="google/gemma-4-it")
outputs = llm.generate(prompt, SamplingParams(...))
text = tokenizer.decode(outputs[0].outputs[0].token_ids, skip_special_tokens=False)

# Extract tool calls
tool_calls = parse_tool_calls(text)
for tc in tool_calls:
    print(f"{tc['name']}({tc['arguments']})")

Ported from transformers.models.gemma4.utils_gemma4 so that vLLM users do not need a transformers dependency for output parsing.

_parse_tool_arguments ¶

_parse_tool_arguments(args_str: str) -> dict[str, str]

Parse tool call arguments from the Gemma4 compact format.

Handles the key:<|"|>value<|"|> format used by Gemma4, with fallback to heuristic key-value extraction. Also tolerates the slightly different key: "value" format (space + plain quotes) that some chat templates produce.

Parameters:

Name	Type	Description	Default
`args_str`	`str`	Raw argument string from inside `call:name{...}`.	required

Returns:

Type	Description
`dict[str, str]`	Dictionary of argument name → value.

Source code in vllm/tool_parsers/gemma4_utils.py

def _parse_tool_arguments(args_str: str) -> dict[str, str]:
    """Parse tool call arguments from the Gemma4 compact format.

    Handles the ``key:<|"|>value<|"|>`` format used by Gemma4, with fallback
    to heuristic key-value extraction. Also tolerates the slightly different
    ``key: "value"`` format (space + plain quotes) that some chat templates
    produce.

    Args:
        args_str: Raw argument string from inside ``call:name{...}``.

    Returns:
        Dictionary of argument name → value.
    """
    if not args_str or not args_str.strip():
        return {}

    # Replace Gemma4 escape tokens with standard quotes.
    cleaned = args_str.replace(_ESCAPE_TOKEN, '"')

    # Try JSON parsing first (handles nested values, arrays, etc.).
    try:
        parsed = json.loads("{" + cleaned + "}")
        # Ensure all values are strings for consistency.
        return {k: str(v) if not isinstance(v, str) else v for k, v in parsed.items()}
    except (json.JSONDecodeError, ValueError):
        pass

    # Fallback: extract key:"value" pairs (allow optional space after colon).
    arguments = {}
    for key, value in re.findall(r'(\w+):\s*"([^"]*)"', cleaned):
        arguments[key] = value

    if not arguments:
        # Last resort: extract key:value pairs (unquoted).
        for key, value in re.findall(r"(\w+):\s*([^,}]+)", args_str):
            arguments[key] = value.strip().strip('"').replace(_ESCAPE_TOKEN, "")

    return arguments

has_tool_response_tag ¶

has_tool_response_tag(text: str) -> bool

Check if model output properly ends with a tool response tag.

Some Gemma4 models sometimes emit <eos> instead of <|tool_response> after a tool call. This helper detects whether the model used the proper termination, so callers can decide whether to inject <|tool_response> into the next prompt.

Parameters:

Name	Type	Description	Default
`text`	`str`	Decoded model output text.	required

Returns:

Type	Description
`bool`	`True` if the output ends with `<\|tool_response>`
`bool`	(proper behavior), `False` otherwise.

Example::

>>> from vllm.tool_parsers.gemma4_utils import has_tool_response_tag
>>> if not has_tool_response_tag(model_output):
...     # Model used <eos> instead — inject <|tool_response> manually
...     next_prompt = "<|tool_response>" + tool_result

Source code in vllm/tool_parsers/gemma4_utils.py

def has_tool_response_tag(text: str) -> bool:
    """Check if model output properly ends with a tool response tag.

    Some Gemma4 models sometimes emit ``<eos>`` instead of
    ``<|tool_response>`` after a tool call. This helper detects
    whether the model used the proper termination, so callers can
    decide whether to inject ``<|tool_response>`` into the next prompt.

    Args:
        text: Decoded model output text.

    Returns:
        ``True`` if the output ends with ``<|tool_response>``
        (proper behavior), ``False`` otherwise.

    Example::

        >>> from vllm.tool_parsers.gemma4_utils import has_tool_response_tag
        >>> if not has_tool_response_tag(model_output):
        ...     # Model used <eos> instead — inject <|tool_response> manually
        ...     next_prompt = "<|tool_response>" + tool_result
    """
    stripped = text.rstrip()
    return stripped.endswith(_TOOL_RESPONSE_START_TAG)

parse_tool_calls ¶

parse_tool_calls(
    text: str, *, strict: bool = False
) -> list[dict]

Parse tool calls from decoded Gemma4 model output.

Uses a tiered parsing strategy to handle known output variations in Gemma4 models, which may emit non-standard tool call formats.

Parsing tiers

Standard: <|tool_call>call:name{args}<tool_call|> (special token IDs 48/49 in decoded text)
Fallback (when strict=False): bare call:name{args} patterns, including <call>name{args} (fragmented tokens from multimodal inputs)

Parameters:

Name	Type	Description	Default
`text`	`str`	Decoded model output text (from `tokenizer.decode(..., skip_special_tokens=False)`).	required
`strict`	`bool`	If `True`, only match the standard `<\|tool_call>` format. If `False` (default), also try fallback patterns for known Gemma4 output variations.	`False`

Returns:

Type	Description
`list[dict]`	A list of dicts, each with keys: - `"name"`: The tool function name (e.g. `"get_weather"`). - `"arguments"`: A dict of argument name → value.

Example::

>>> from vllm.tool_parsers.gemma4_utils import parse_tool_calls
>>> output = tokenizer.decode(outputs[0], skip_special_tokens=False)
>>> tool_calls = parse_tool_calls(output)
>>> for tc in tool_calls:
...     print(f"Call: {tc['name']}({tc['arguments']})")

Source code in vllm/tool_parsers/gemma4_utils.py

def parse_tool_calls(text: str, *, strict: bool = False) -> list[dict]:
    """Parse tool calls from decoded Gemma4 model output.

    Uses a tiered parsing strategy to handle known output variations in
    Gemma4 models, which may emit
    non-standard tool call formats.

    Parsing tiers:
        1. **Standard**: ``<|tool_call>call:name{args}<tool_call|>``
           (special token IDs 48/49 in decoded text)
        2. **Fallback** (when ``strict=False``): bare ``call:name{args}``
           patterns, including ``<call>name{args}`` (fragmented tokens from
           multimodal inputs)

    Args:
        text: Decoded model output text (from ``tokenizer.decode(...,
            skip_special_tokens=False)``).
        strict: If ``True``, only match the standard ``<|tool_call>`` format.
            If ``False`` (default), also try fallback patterns for
            known Gemma4 output variations.

    Returns:
        A list of dicts, each with keys:
            - ``"name"``: The tool function name (e.g. ``"get_weather"``).
            - ``"arguments"``: A dict of argument name → value.

    Example::

        >>> from vllm.tool_parsers.gemma4_utils import parse_tool_calls
        >>> output = tokenizer.decode(outputs[0], skip_special_tokens=False)
        >>> tool_calls = parse_tool_calls(output)
        >>> for tc in tool_calls:
        ...     print(f"Call: {tc['name']}({tc['arguments']})")
    """
    results = []

    # Tier 1: Standard format with special tokens.
    # <|tool_call>call:name{args}<tool_call|>
    # Note: Some Gemma4 models emit <turn|> instead of <tool_call|>.
    standard_pattern = r"<\|tool_call\>call:(\w+)\{(.*?)\}(?:<tool_call\|>|<turn\|>)"
    for match in re.finditer(standard_pattern, text, re.DOTALL):
        name, args_str = match.group(1), match.group(2)
        results.append(
            {
                "name": name,
                "arguments": _parse_tool_arguments(args_str),
            }
        )

    if results or strict:
        return results

    # Tier 2: Fallback for known Gemma4 output variations.
    # Matches: <call>name{args}, call:name{args}, or bare call:name{args}<eos>
    fallback_pattern = r"(?:<call>|(?:^|\s)call:)(\w+)\{(.*?)\}"
    for match in re.finditer(fallback_pattern, text, re.DOTALL):
        name, args_str = match.group(1), match.group(2)
        results.append(
            {
                "name": name,
                "arguments": _parse_tool_arguments(args_str),
            }
        )

    return results