---
summary: "Place outbound and accept inbound voice calls via Twilio, Telnyx, or Plivo, with optional realtime voice and streaming transcription"
read_when:
  - You want to place an outbound voice call from OpenClaw
  - You are configuring or developing the voice-call plugin
  - You need realtime voice or streaming transcription on telephony
title: "Voice call plugin"
sidebarTitle: "Voice call"
---

Voice calls for OpenClaw via a plugin. Supports outbound notifications, multi-turn conversations, full-duplex realtime voice, streaming transcription, and inbound calls with allowlist policies.

**Current providers:** `twilio` (Programmable Voice + Media Streams), `telnyx` (Call Control v2), `plivo` (Voice API + XML transfer + GetInput speech), `mock` (dev/no network).

The Voice Call plugin runs **inside the Gateway process**. If you use a remote Gateway, install and configure the plugin on the machine running the Gateway, then restart the Gateway to load it.

## Quick start

```bash
openclaw plugins install @openclaw/voice-call
```

```bash
PLUGIN_SRC=./path/to/local/voice-call-plugin
openclaw plugins install "$PLUGIN_SRC"
cd "$PLUGIN_SRC" && pnpm install
```

If npm reports the OpenClaw-owned package as deprecated, that package version is from an older external package train; use a current packaged OpenClaw build or the local folder path until a newer npm package is published.

Restart the Gateway afterwards so the plugin loads.

Set config under `plugins.entries.voice-call.config` (see [Configuration](#configuration) below for the full shape). At minimum: `provider`, provider credentials, `fromNumber`, and a publicly reachable webhook URL.

```bash
openclaw voicecall setup
```

The default output is readable in chat logs and terminals. It checks plugin enablement, provider credentials, webhook exposure, and that only one audio mode (`streaming` or `realtime`) is active. Use `--json` for scripts.

```bash
openclaw voicecall smoke
openclaw voicecall smoke --to "+15555550123"
```

Both are dry runs by default. Add `--yes` to actually place a short outbound notify call:

```bash
openclaw voicecall smoke --to "+15555550123" --yes
```

For Twilio, Telnyx, and Plivo, setup must resolve to a **public webhook URL**. If `publicUrl`, the tunnel URL, the Tailscale URL, or the serve fallback resolves to loopback or private network space, setup fails instead of starting a provider that cannot receive carrier webhooks.

## Configuration

If `enabled: true` but the selected provider is missing credentials, Gateway startup logs a setup-incomplete warning with the missing keys and skips starting the runtime. Commands, RPC calls, and agent tools still return the exact missing provider configuration when used.

Voice-call credentials accept SecretRefs. `plugins.entries.voice-call.config.twilio.authToken` and `plugins.entries.voice-call.config.tts.providers.*.apiKey` resolve through the standard SecretRef surface; see [SecretRef credential surface](/reference/secretref-credential-surface).

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        enabled: true,
        config: {
          provider: "twilio", // or "telnyx" | "plivo" | "mock"
          fromNumber: "+15550001234", // or TWILIO_FROM_NUMBER for Twilio
          toNumber: "+15550005678",

          twilio: {
            accountSid: "ACxxxxxxxx",
            authToken: "...",
          },
          telnyx: {
            apiKey: "...",
            connectionId: "...",
            // Telnyx webhook public key from the Mission Control Portal
            // (Base64; can also be set via TELNYX_PUBLIC_KEY).
            publicKey: "...",
          },
          plivo: {
            authId: "MAxxxxxxxxxxxxxxxxxxxx",
            authToken: "...",
          },

          // Webhook server
          serve: {
            port: 3334,
            path: "/voice/webhook",
          },

          // Webhook security (recommended for tunnels/proxies)
          webhookSecurity: {
            allowedHosts: ["voice.example.com"],
            trustedProxyIPs: ["100.64.0.1"],
          },

          // Public exposure (pick one)
          // publicUrl: "https://example.ngrok.app/voice/webhook",
          // tunnel: { provider: "ngrok" },
          // tailscale: { mode: "funnel", path: "/voice/webhook" },

          outbound: {
            defaultMode: "notify", // notify | conversation
          },

          streaming: { enabled: true /* see Streaming transcription */ },
          realtime: { enabled: false /* see Realtime voice */ },
        },
      },
    },
  },
}
```

- Twilio, Telnyx, and Plivo all require a **publicly reachable** webhook URL.
- `mock` is a local dev provider (no network calls).
- Telnyx requires `telnyx.publicKey` (or `TELNYX_PUBLIC_KEY`) unless `skipSignatureVerification` is true.
- `skipSignatureVerification` is for local testing only.
- On ngrok free tier, set `publicUrl` to the exact ngrok URL; signature verification is always enforced.
- `tunnel.allowNgrokFreeTierLoopbackBypass: true` allows Twilio webhooks with invalid signatures **only** when `tunnel.provider="ngrok"` and `serve.bind` is loopback (ngrok local agent). Local dev only.
- Ngrok free-tier URLs can change or add interstitial behavior; if `publicUrl` drifts, Twilio signatures fail. Production: prefer a stable domain or a Tailscale funnel.
- `streaming.preStartTimeoutMs` closes sockets that never send a valid `start` frame.
- `streaming.maxPendingConnections` caps total unauthenticated pre-start sockets.
- `streaming.maxPendingConnectionsPerIp` caps unauthenticated pre-start sockets per source IP.
- `streaming.maxConnections` caps total open media stream sockets (pending + active).

Older configs using `provider: "log"`, `twilio.from`, or legacy `streaming.*` OpenAI keys are rewritten by `openclaw doctor --fix`.
Runtime fallback still accepts the old voice-call keys for now, but the rewrite path is `openclaw doctor --fix` and the compat shim is temporary.

Auto-migrated streaming keys:

- `streaming.sttProvider` → `streaming.provider`
- `streaming.openaiApiKey` → `streaming.providers.openai.apiKey`
- `streaming.sttModel` → `streaming.providers.openai.model`
- `streaming.silenceDurationMs` → `streaming.providers.openai.silenceDurationMs`
- `streaming.vadThreshold` → `streaming.providers.openai.vadThreshold`

## Realtime voice conversations

`realtime` selects a full-duplex realtime voice provider for live call audio. It is separate from `streaming`, which only forwards audio to realtime transcription providers.

`realtime.enabled` cannot be combined with `streaming.enabled`. Pick one audio mode per call.

Current runtime behavior:

- `realtime.enabled` is supported for Twilio Media Streams.
- `realtime.provider` is optional. If unset, Voice Call uses the first registered realtime voice provider.
- Bundled realtime voice providers: Google Gemini Live (`google`) and OpenAI (`openai`), registered by their provider plugins.
- Provider-owned raw config lives under `realtime.providers.`.
- Voice Call exposes the shared `openclaw_agent_consult` realtime tool by default. The realtime model can call it when the caller asks for deeper reasoning, current information, or normal OpenClaw tools.
- If `realtime.provider` points at an unregistered provider, or no realtime voice provider is registered at all, Voice Call logs a warning and skips realtime media instead of failing the whole plugin.
- Consult session keys reuse the existing voice session when available, then fall back to the caller/callee phone number so follow-up consult calls keep context during the call.
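
The one-audio-mode rule above (`realtime.enabled` and `streaming.enabled` are mutually exclusive) can be checked before a config is committed. The following is a minimal TypeScript sketch of such a check, not the plugin's own validation code: `AudioModeConfig` and `resolveAudioMode` are hypothetical names, and only the two `enabled` flags come from the documented config shape.

```typescript
// Hypothetical slice of the voice-call config: only the two documented
// `enabled` flags are modeled here.
interface AudioModeConfig {
  streaming?: { enabled?: boolean };
  realtime?: { enabled?: boolean };
}

type AudioMode = "realtime" | "streaming" | "none";

// Returns the single active audio mode, or throws when both are enabled.
function resolveAudioMode(config: AudioModeConfig): AudioMode {
  const realtime = config.realtime?.enabled === true;
  const streaming = config.streaming?.enabled === true;
  if (realtime && streaming) {
    throw new Error(
      "realtime.enabled cannot be combined with streaming.enabled; pick one audio mode",
    );
  }
  if (realtime) return "realtime";
  if (streaming) return "streaming";
  return "none";
}

// Same flags as the Configuration example: streaming on, realtime off.
console.log(resolveAudioMode({ streaming: { enabled: true }, realtime: { enabled: false } })); // prints "streaming"
```

A check like this fits naturally in a pre-commit or CI step; `openclaw voicecall setup` reports the same conflict at runtime.
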

### Tool policy

`realtime.toolPolicy` controls the consult run:

| Policy           | Behavior                                                                                                                                 |
| ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `safe-read-only` | Expose the consult tool and limit the regular agent to `read`, `web_search`, `web_fetch`, `x_search`, `memory_search`, and `memory_get`. |
| `owner`          | Expose the consult tool and let the regular agent use the normal agent tool policy.                                                      |
| `none`           | Do not expose the consult tool. Custom `realtime.tools` are still passed through to the realtime provider.                               |

### Realtime provider examples

Defaults: API key from `realtime.providers.google.apiKey`, `GEMINI_API_KEY`, or `GOOGLE_GENERATIVE_AI_API_KEY`; model `gemini-2.5-flash-native-audio-preview-12-2025`; voice `Kore`.

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          provider: "twilio",
          inboundPolicy: "allowlist",
          allowFrom: ["+15550005678"],
          realtime: {
            enabled: true,
            provider: "google",
            instructions: "Speak briefly. Call openclaw_agent_consult before using deeper tools.",
            toolPolicy: "safe-read-only",
            providers: {
              google: {
                apiKey: "${GEMINI_API_KEY}",
                model: "gemini-2.5-flash-native-audio-preview-12-2025",
                voice: "Kore",
              },
            },
          },
        },
      },
    },
  },
}
```

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          realtime: {
            enabled: true,
            provider: "openai",
            providers: {
              openai: { apiKey: "${OPENAI_API_KEY}" },
            },
          },
        },
      },
    },
  },
}
```

See [Google provider](/providers/google) and [OpenAI provider](/providers/openai) for provider-specific realtime voice options.

## Streaming transcription

`streaming` selects a realtime transcription provider for live call audio.

Current runtime behavior:

- `streaming.provider` is optional. If unset, Voice Call uses the first registered realtime transcription provider.
- Bundled realtime transcription providers: Deepgram (`deepgram`), ElevenLabs (`elevenlabs`), Mistral (`mistral`), OpenAI (`openai`), and xAI (`xai`), registered by their provider plugins.
- Provider-owned raw config lives under `streaming.providers.`.
- If `streaming.provider` points at an unregistered provider, or none is registered, Voice Call logs a warning and skips media streaming instead of failing the whole plugin.

### Streaming provider examples

Defaults: API key `streaming.providers.openai.apiKey` or `OPENAI_API_KEY`; model `gpt-4o-transcribe`; `silenceDurationMs: 800`; `vadThreshold: 0.5`.

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          streaming: {
            enabled: true,
            provider: "openai",
            streamPath: "/voice/stream",
            providers: {
              openai: {
                apiKey: "sk-...", // optional if OPENAI_API_KEY is set
                model: "gpt-4o-transcribe",
                silenceDurationMs: 800,
                vadThreshold: 0.5,
              },
            },
          },
        },
      },
    },
  },
}
```

Defaults: API key `streaming.providers.xai.apiKey` or `XAI_API_KEY`; endpoint `wss://api.x.ai/v1/stt`; encoding `mulaw`; sample rate `8000`; `endpointingMs: 800`; `interimResults: true`.

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          streaming: {
            enabled: true,
            provider: "xai",
            streamPath: "/voice/stream",
            providers: {
              xai: {
                apiKey: "${XAI_API_KEY}", // optional if XAI_API_KEY is set
                endpointingMs: 800,
                language: "en",
              },
            },
          },
        },
      },
    },
  },
}
```

## TTS for calls

Voice Call uses the core `messages.tts` configuration for streaming speech on calls. You can override it under the plugin config with the **same shape**; the override deep-merges with `messages.tts`.

```json5
{
  tts: {
    provider: "elevenlabs",
    providers: {
      elevenlabs: {
        voiceId: "pMsXgVXv3BLzUgSXRplE",
        modelId: "eleven_multilingual_v2",
      },
    },
  },
}
```

**Microsoft speech is ignored for voice calls.** Telephony audio needs PCM; the current Microsoft transport does not expose telephony PCM output.
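
The deep-merge of a plugin-level `tts` override with `messages.tts` described above can be pictured with a small TypeScript sketch. This is an illustration under stated assumptions, not the plugin's merge code: `deepMerge` and the `Json` types are hypothetical, and the rule that arrays and scalars are replaced rather than combined is an assumption about the merge semantics. The sample values are taken from the TTS examples on this page.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively merge `override` into `base`: nested objects merge key by key;
// scalars and arrays from `override` replace the base value (assumed rule).
function deepMerge(base: JsonObject, override: JsonObject): JsonObject {
  const result: JsonObject = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const existing = result[key];
    result[key] = isObject(existing) && isObject(value) ? deepMerge(existing, value) : value;
  }
  return result;
}

// Core `messages.tts` config.
const coreTts: JsonObject = {
  provider: "openai",
  providers: { openai: { voice: "alloy" } },
};

// Plugin-level `tts` override for voice calls only.
const pluginTts: JsonObject = {
  providers: { openai: { model: "gpt-4o-mini-tts", voice: "marin" } },
};

const effectiveTts = deepMerge(coreTts, pluginTts);
console.log(JSON.stringify(effectiveTts, null, 2));
```

In this sketch the effective call config keeps `provider: "openai"` from the core config, while the OpenAI `voice` becomes `marin` and `model` becomes `gpt-4o-mini-tts` from the override.
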

Behavior notes:

- Legacy `tts.` keys inside plugin config (`openai`, `elevenlabs`, `microsoft`, `edge`) are repaired by `openclaw doctor --fix`; committed config should use `tts.providers.`.
- Core TTS is used when Twilio media streaming is enabled; otherwise calls fall back to provider-native voices.
- If a Twilio media stream is already active, Voice Call does not fall back to TwiML ``. If telephony TTS is unavailable in that state, the playback request fails instead of mixing two playback paths.
- When telephony TTS falls back to a secondary provider, Voice Call logs a warning with the provider chain (`from`, `to`, `attempts`) for debugging.
- When Twilio barge-in or stream teardown clears the pending TTS queue, queued playback requests settle instead of hanging callers awaiting playback completion.

### TTS examples

```json5
{
  messages: {
    tts: {
      provider: "openai",
      providers: {
        openai: { voice: "alloy" },
      },
    },
  },
}
```

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          tts: {
            provider: "elevenlabs",
            providers: {
              elevenlabs: {
                apiKey: "elevenlabs_key",
                voiceId: "pMsXgVXv3BLzUgSXRplE",
                modelId: "eleven_multilingual_v2",
              },
            },
          },
        },
      },
    },
  },
}
```

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          tts: {
            providers: {
              openai: {
                model: "gpt-4o-mini-tts",
                voice: "marin",
              },
            },
          },
        },
      },
    },
  },
}
```

## Inbound calls

Inbound policy defaults to `disabled`. To enable inbound calls, set:

```json5
{
  inboundPolicy: "allowlist",
  allowFrom: ["+15550001234"],
  inboundGreeting: "Hello! How can I help?",
}
```

`inboundPolicy: "allowlist"` is a low-assurance caller-ID screen. The plugin normalizes the provider-supplied `From` value and compares it to `allowFrom`. Webhook verification authenticates provider delivery and payload integrity, but it does **not** prove PSTN/VoIP caller-number ownership. Treat `allowFrom` as caller-ID filtering, not strong caller identity.

Auto-responses use the agent system.
Tune with `responseModel`, `responseSystemPrompt`, and `responseTimeoutMs`.

### Spoken output contract

For auto-responses, Voice Call appends a strict spoken-output contract to the system prompt:

```text
{"spoken":"..."}
```

Voice Call extracts speech text defensively:

- Ignores payloads marked as reasoning/error content.
- Parses direct JSON, fenced JSON, or inline `"spoken"` keys.
- Falls back to plain text and removes likely planning/meta lead-in paragraphs.

This keeps spoken playback focused on caller-facing text and avoids leaking planning text into audio.

### Conversation startup behavior

For outbound `conversation` calls, first-message handling is tied to live playback state:

- Barge-in queue clear and auto-response are suppressed only while the initial greeting is actively speaking.
- If initial playback fails, the call returns to `listening` and the initial message remains queued for retry.
- Initial playback for Twilio streaming starts on stream connect without extra delay.
- Barge-in aborts active playback and clears queued-but-not-yet-playing Twilio TTS entries. Cleared entries resolve as skipped, so follow-up response logic can continue without waiting on audio that will never play.
- Realtime voice conversations use the realtime stream's own opening turn. Voice Call does **not** post a legacy `` TwiML update for that initial message, so outbound `` sessions stay attached.

### Twilio stream disconnect grace

When a Twilio media stream disconnects, Voice Call waits **2000 ms** before auto-ending the call:

- If the stream reconnects during that window, auto-end is canceled.
- If no stream re-registers after the grace period, the call is ended to prevent stuck active calls.

## Stale call reaper

Use `staleCallReaperSeconds` to end calls that never receive a terminal webhook (for example, notify-mode calls that never complete). The default is `0` (disabled).

Recommended ranges:

- **Production:** `120`–`300` seconds for notify-style flows.
- Keep this value **higher than `maxDurationSeconds`** so normal calls can finish. A good starting point is `maxDurationSeconds + 30–60` seconds.

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          maxDurationSeconds: 300,
          staleCallReaperSeconds: 360,
        },
      },
    },
  },
}
```

## Webhook security

When a proxy or tunnel sits in front of the Gateway, the plugin reconstructs the public URL for signature verification. These options control which forwarded headers are trusted:

- Allowlist hosts from forwarding headers.
- Trust forwarded headers without an allowlist.
- Only trust forwarded headers when the request remote IP matches the list.

Additional protections:

- Webhook **replay protection** is enabled for Twilio and Plivo. Replayed valid webhook requests are acknowledged but skipped for side effects.
- Twilio conversation turns include a per-turn token in `` callbacks, so stale/replayed speech callbacks cannot satisfy a newer pending transcript turn.
- Unauthenticated webhook requests are rejected before body reads when the provider's required signature headers are missing.
- The voice-call webhook uses the shared pre-auth body profile (64 KB / 5 seconds) plus a per-IP in-flight cap before signature verification.

Example with a stable public host:

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          publicUrl: "https://voice.example.com/voice/webhook",
          webhookSecurity: {
            allowedHosts: ["voice.example.com"],
          },
        },
      },
    },
  },
}
```

## CLI

```bash
openclaw voicecall call --to "+15555550123" --message "Hello from OpenClaw"
openclaw voicecall start --to "+15555550123" # alias for call
openclaw voicecall continue --call-id --message "Any questions?"
openclaw voicecall speak --call-id --message "One moment"
openclaw voicecall dtmf --call-id --digits "ww123456#"
openclaw voicecall end --call-id
openclaw voicecall status --call-id
openclaw voicecall tail
openclaw voicecall latency # summarize turn latency from logs
openclaw voicecall expose --mode funnel
```

`latency` reads `calls.jsonl` from the default voice-call storage path. Use `--file ` to point at a different log and `--last ` to limit analysis to the last N records (default 200). Output includes p50/p90/p99 for turn latency and listen-wait times.

## Agent tool

Tool name: `voice_call`.

| Action          | Args                      |
| --------------- | ------------------------- |
| `initiate_call` | `message`, `to?`, `mode?` |
| `continue_call` | `callId`, `message`       |
| `speak_to_user` | `callId`, `message`       |
| `send_dtmf`     | `callId`, `digits`        |
| `end_call`      | `callId`                  |
| `get_status`    | `callId`                  |

This repo ships a matching skill doc at `skills/voice-call/SKILL.md`.

## Gateway RPC

| Method               | Args                      |
| -------------------- | ------------------------- |
| `voicecall.initiate` | `to?`, `message`, `mode?` |
| `voicecall.continue` | `callId`, `message`       |
| `voicecall.speak`    | `callId`, `message`       |
| `voicecall.dtmf`     | `callId`, `digits`        |
| `voicecall.end`      | `callId`                  |
| `voicecall.status`   | `callId`                  |

## Related

- [Talk mode](/nodes/talk)
- [Text-to-speech](/tools/tts)
- [Voice wake](/nodes/voicewake)