fix(cli): streamline local model probes

Peter Steinberger
2026-04-27 23:02:26 +01:00
parent d7dcd0e21e
commit 42dddbbe78
14 changed files with 605 additions and 56 deletions

View File

@@ -16,6 +16,7 @@ Docs: https://docs.openclaw.ai
### Fixes
- CLI/Ollama: run local `infer model run` through the lean provider completion path and skip global model discovery for one-shot local probes, so Ollama smoke tests no longer pay the full chat-agent/tool startup cost or hang before the native `/api/chat` request. Fixes #72851. Thanks @TotalRes2020.
- Channels/commands: make generated `/dock-*` commands switch the active session reply route through `session.identityLinks` instead of falling through to normal chat. Fixes #69206; carries forward #73033. Thanks @clawbones and @michaelatamuk.
- Providers/Cloudflare AI Gateway: strip assistant prefill turns from Anthropic Messages payloads when thinking is enabled, so Claude requests through Cloudflare AI Gateway no longer fail Anthropic's conversation-ending validation. Fixes #72905; carries forward #73005. Thanks @AaronFaby and @sahilsatralkar.
- Gateway/startup: keep primary-model startup prewarm on scoped metadata preparation, let native approval bootstraps retry outside channel startup, and skip the global hook runner when no `gateway_start` hook is registered, so clean post-ready sidecar work stays off the critical path. Refs #72846. Thanks @RayWoo, @livekm0309, and @mrz1836.

View File

@@ -130,7 +130,8 @@ This table maps common inference tasks to the corresponding infer command.
- Stateless execution commands default to local.
- Gateway-managed state commands default to gateway.
- The normal local path does not require the gateway to be running.
- `model run` is one-shot. MCP servers opened through the agent runtime for that command are retired after the reply for both local and `--gateway` execution, so repeated scripted invocations do not keep stdio MCP child processes alive.
- Local `model run` is a lean one-shot provider completion. It resolves the configured agent model and auth, but does not start a chat-agent turn, load tools, or open bundled MCP servers.
- `model run --gateway` still uses the Gateway agent runtime so it can exercise the same routed runtime path as a normal Gateway-backed turn. MCP servers opened through that runtime are retired after the reply, so repeated scripted invocations do not keep stdio MCP child processes alive.
## Model
@@ -143,10 +144,22 @@ openclaw infer model providers --json
openclaw infer model inspect --name gpt-5.5 --json
```
Use full `<provider/model>` refs to smoke-test a specific provider without
starting the Gateway or loading the full agent tool surface:
```bash
openclaw infer model run --local --model anthropic/claude-sonnet-4-6 --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model cerebras/zai-glm-4.7 --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model google/gemini-2.5-flash --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model groq/llama-3.1-8b-instant --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model mistral/mistral-small-latest --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model openai/gpt-4.1 --prompt "Reply with exactly: pong" --json
```
Notes:
- `model run` reuses the agent runtime so provider/model overrides behave like normal agent execution.
- Because `model run` is intended for headless automation, it does not retain per-session bundled MCP runtimes after the command finishes.
- Local `model run` is the narrowest CLI smoke for provider/model/auth health because it sends only the supplied prompt to the selected model.
- Use `model run --gateway` when you need to test Gateway routing, agent-runtime setup, or Gateway-managed provider state instead of the lean local completion path.
- `model auth login`, `model auth logout`, and `model auth status` manage saved provider auth state.
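For scripted probes, the `--json` output can be checked mechanically instead of grepping stdout. A minimal sketch of such a check; the envelope fields (`ok`, `transport`, `outputs`) are assumptions based on this probe's JSON output, not a stable contract:

```typescript
// Sketch: decide whether a scripted
// `openclaw infer model run --local ... --json` probe succeeded.
interface ProbeEnvelope {
  ok?: boolean;
  transport?: string;
  provider?: string;
  model?: string;
  outputs?: Array<{ text?: string }>;
}

function probeSucceeded(stdout: string): boolean {
  // The CLI may print log lines before the envelope; keep only the
  // trailing JSON object.
  const trimmed = stdout.trim();
  const start = trimmed.lastIndexOf("\n{");
  const raw = start >= 0 ? trimmed.slice(start + 1) : trimmed;
  let payload: ProbeEnvelope;
  try {
    payload = JSON.parse(raw) as ProbeEnvelope;
  } catch {
    return false;
  }
  // A healthy probe reports ok on the local transport with non-empty text.
  return (
    payload.ok === true &&
    payload.transport === "local" &&
    (payload.outputs?.[0]?.text?.trim().length ?? 0) > 0
  );
}
```

A CI wrapper can run the probe per provider and fail fast on the first envelope that does not pass this check.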
## Image

View File

@@ -239,14 +239,20 @@ Compatibility notes for stricter OpenAI-compatible backends:
```
- Some smaller or stricter local backends are unstable with OpenClaw's full
agent-runtime prompt shape, especially when tool schemas are included. If the
backend works for tiny direct `/v1/chat/completions` calls but fails on normal
OpenClaw agent turns, first try
agent-runtime prompt shape, especially when tool schemas are included. First
verify the provider path with the lean local probe:
```bash
openclaw infer model run --local --model <provider/model> --prompt "Reply with exactly: pong" --json
```
If that succeeds but normal OpenClaw agent turns fail, first try
`agents.defaults.experimental.localModelLean: true` to drop heavyweight
default tools like `browser`, `cron`, and `message`; this is an experimental
flag, not a stable default-mode setting. See
[Experimental Features](/concepts/experimental-features). If that still fails, try
`models.providers.<provider>.models[].compat.supportsTools: false`.
- If the backend still fails only on larger OpenClaw runs, the remaining issue
is usually upstream model/server capacity or a backend bug, not OpenClaw's
transport layer.
@@ -264,10 +270,11 @@ Compatibility notes for stricter OpenAI-compatible backends:
- Context errors? Lower `contextWindow` or raise your server limit.
- OpenAI-compatible server returns `messages[].content ... expected a string`?
Add `compat.requiresStringContent: true` on that model entry.
- Direct tiny `/v1/chat/completions` calls work, but `openclaw infer model run`
fails on Gemma or another local model? Disable tool schemas first with
`compat.supportsTools: false`, then retest. If the server still crashes only
on larger OpenClaw prompts, treat it as an upstream server/model limitation.
- Direct tiny `/v1/chat/completions` calls work, but `openclaw infer model run --local`
fails on Gemma or another local model? Check the provider URL, model ref, auth
marker, and server logs first; local `model run` does not include agent tools.
If local `model run` succeeds but larger agent turns fail, reduce the agent
tool surface with `localModelLean` or `compat.supportsTools: false`.
- Tool calls show up as raw JSON/XML/ReAct text, or the provider returns an
empty `tool_calls` array? Do not add a proxy that blindly converts assistant
text into tool execution. Fix the server chat template/parser first. If the

View File

@@ -185,7 +185,7 @@ When you set `OLLAMA_API_KEY` (or an auth profile) and **do not** define `models
| Token limits | Sets `maxTokens` to the default Ollama max-token cap used by OpenClaw |
| Costs | Sets all costs to `0` |
This avoids manual model entries while keeping the catalog aligned with the local Ollama instance.
This avoids manual model entries while keeping the catalog aligned with the local Ollama instance. You can use a full ref such as `ollama/<pulled-model>:latest` in local `infer model run`; OpenClaw resolves that installed model from Ollama's live catalog without requiring a hand-written `models.json` entry.
```bash
# See what models are available
@@ -193,6 +193,31 @@ ollama list
openclaw models list
```
For a narrow text-generation smoke test that avoids the full agent tool surface,
use local `infer model run` with a full Ollama model ref:
```bash
OLLAMA_API_KEY=ollama-local \
openclaw infer model run \
--local \
--model ollama/llama3.2:latest \
--prompt "Reply with exactly: pong" \
--json
```
That path still uses OpenClaw's configured provider, auth, and native Ollama
transport, but it does not start a chat-agent turn or load MCP/tool context. If
this succeeds while normal agent replies fail, troubleshoot the model's agent
prompt/tool capacity next.
Live-verify the local text path, native stream path, and embeddings against
local Ollama with:
```bash
OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_OLLAMA=1 OPENCLAW_LIVE_OLLAMA_WEB_SEARCH=0 \
pnpm test:live -- extensions/ollama/ollama.live.test.ts
```
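Before probing, it can help to confirm the model is actually pulled. Ollama's local API lists the installed catalog at `GET /api/tags`; a minimal sketch of matching a full OpenClaw ref against that response (the helper name is illustrative):

```typescript
// Sketch: check whether an "ollama/<name>:<tag>" ref matches a model
// reported by Ollama's GET /api/tags endpoint, whose response has the
// shape { models: [{ name: "llama3.2:latest", ... }] }.
interface OllamaTagsResponse {
  models?: Array<{ name?: string }>;
}

function isModelPulled(tags: OllamaTagsResponse, fullRef: string): boolean {
  // Strip the OpenClaw provider prefix ("ollama/") to get Ollama's own name.
  const name = fullRef.startsWith("ollama/")
    ? fullRef.slice("ollama/".length)
    : fullRef;
  return (tags.models ?? []).some((m) => m.name === name);
}
```

A live check would `fetch("http://127.0.0.1:11434/api/tags")` and pass the parsed body in; when the ref is missing, `ollama pull` is the fix rather than any OpenClaw configuration change.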
To add a new model, simply pull it with Ollama:
```bash

View File

@@ -369,6 +369,57 @@ describe("ollama plugin", () => {
});
});
it("resolves dynamic local models from Ollama without generating PI models.json", async () => {
const provider = registerProvider();
const previous = process.env.OLLAMA_API_KEY;
process.env.OLLAMA_API_KEY = "ollama-local";
buildOllamaProviderMock.mockResolvedValueOnce({
baseUrl: "http://127.0.0.1:11434",
api: "ollama",
models: [
{
id: "llama3.2:latest",
name: "llama3.2:latest",
reasoning: false,
input: ["text"],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 8192,
maxTokens: 2048,
},
],
});
try {
await provider.prepareDynamicModel?.({
config: {},
provider: "ollama",
modelId: "llama3.2:latest",
modelRegistry: { find: vi.fn(() => null) },
} as never);
expect(
provider.resolveDynamicModel?.({
config: {},
provider: "ollama",
modelId: "llama3.2:latest",
modelRegistry: { find: vi.fn(() => null) },
} as never),
).toMatchObject({
provider: "ollama",
id: "llama3.2:latest",
api: "ollama",
baseUrl: "http://127.0.0.1:11434",
});
expect(buildOllamaProviderMock).toHaveBeenCalledWith(undefined, { quiet: true });
} finally {
if (previous === undefined) {
delete process.env.OLLAMA_API_KEY;
} else {
process.env.OLLAMA_API_KEY = previous;
}
}
});
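The prepare/resolve split exercised above boils down to an async discovery step that fills a cache which a later synchronous lookup reads. A generic sketch of that pattern, with types simplified rather than the plugin's real interfaces:

```typescript
// Sketch: two-phase dynamic model resolution. prepare() may do async
// discovery; resolve() must stay synchronous, so it only reads the cache.
type CachedModel = { id: string; baseUrl: string };

const cache = new Map<string, CachedModel[]>();

// NUL is a safe separator because it cannot appear in provider ids or URLs.
const cacheKey = (provider: string, baseUrl?: string) =>
  `${provider}\0${baseUrl ?? ""}`;

async function prepare(
  provider: string,
  baseUrl: string,
  discover: (url: string) => Promise<CachedModel[]>,
): Promise<void> {
  cache.set(cacheKey(provider, baseUrl), await discover(baseUrl));
}

function resolve(
  provider: string,
  baseUrl: string,
  modelId: string,
): CachedModel | undefined {
  return cache.get(cacheKey(provider, baseUrl))?.find((m) => m.id === modelId);
}
```

Keying on provider plus base URL means two Ollama endpoints configured side by side cannot serve each other's cached catalogs.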
it("skips implicit localhost discovery when a custom remote Ollama provider is configured", async () => {
const provider = registerProvider();

View File

@@ -7,8 +7,13 @@ import {
type ProviderAuthMethodNonInteractiveContext,
type ProviderAuthResult,
type ProviderDiscoveryContext,
type ProviderRuntimeModel,
} from "openclaw/plugin-sdk/plugin-entry";
import { buildApiKeyCredential } from "openclaw/plugin-sdk/provider-auth";
import type {
ModelDefinitionConfig,
ModelProviderConfig,
} from "openclaw/plugin-sdk/provider-model-shared";
import {
buildOpenAICompatibleReplayPolicy,
OPENAI_COMPATIBLE_REPLAY_HOOKS,
@@ -57,6 +62,44 @@ function usesOllamaOpenAICompatTransport(model: {
);
}
const dynamicModelCache = new Map<string, ProviderRuntimeModel[]>();
function buildDynamicCacheKey(provider: string, baseUrl: string | undefined): string {
return `${provider}\0${baseUrl ?? ""}`;
}
function hasOllamaDiscoverySignal(providerConfig: ModelProviderConfig | undefined): boolean {
return (
Boolean(process.env.OLLAMA_API_KEY?.trim()) ||
shouldUseSyntheticOllamaAuth(providerConfig) ||
Boolean(providerConfig?.apiKey)
);
}
function toDynamicOllamaModel(params: {
provider: string;
providerConfig: ModelProviderConfig;
model: ModelDefinitionConfig;
}): ProviderRuntimeModel {
const input = (params.model.input ?? ["text"]).filter(
(value): value is "text" | "image" => value === "text" || value === "image",
);
return {
id: params.model.id,
name: params.model.name ?? params.model.id,
provider: params.provider,
api: "ollama",
baseUrl: readProviderBaseUrl(params.providerConfig) ?? "",
reasoning: params.model.reasoning ?? false,
input: input.length > 0 ? input : ["text"],
cost: params.model.cost ?? { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: params.model.contextWindow ?? 8192,
maxTokens: params.model.maxTokens ?? 8192,
...(params.model.compat ? { compat: params.model.compat as never } : {}),
...(params.model.params ? { params: params.model.params } : {}),
};
}
export default definePluginEntry({
id: "ollama",
name: "Ollama Provider",
@@ -215,6 +258,36 @@ export default definePluginEntry({
},
shouldDeferSyntheticProfileAuth: ({ resolvedApiKey }) =>
resolvedApiKey?.trim() === OLLAMA_DEFAULT_API_KEY,
prepareDynamicModel: async (ctx) => {
const providerConfig = resolveConfiguredOllamaProviderConfig({
config: ctx.config,
providerId: ctx.provider,
});
if (!hasOllamaDiscoverySignal(providerConfig)) {
return;
}
const baseUrl = readProviderBaseUrl(providerConfig);
const provider = await buildOllamaProvider(baseUrl, { quiet: true });
dynamicModelCache.set(
buildDynamicCacheKey(ctx.provider, baseUrl),
(provider.models ?? []).map((model) =>
toDynamicOllamaModel({
provider: ctx.provider,
providerConfig: provider,
model,
}),
),
);
},
resolveDynamicModel: (ctx) => {
const providerConfig = resolveConfiguredOllamaProviderConfig({
config: ctx.config,
providerId: ctx.provider,
});
return dynamicModelCache
.get(buildDynamicCacheKey(ctx.provider, readProviderBaseUrl(providerConfig)))
?.find((model) => model.id === ctx.modelId);
},
buildUnknownModelHint: () =>
"Ollama requires authentication to be registered as a provider. " +
'Set OLLAMA_API_KEY="ollama-local" (any value works) or run "openclaw configure". ' +

View File

@@ -1,3 +1,8 @@
import { spawnSync } from "node:child_process";
import * as fsSync from "node:fs";
import fs from "node:fs/promises";
import os from "node:os";
import path from "node:path";
import { describe, expect, it } from "vitest";
import { createOllamaEmbeddingProvider } from "./src/embedding-provider.js";
import { createOllamaStreamFn } from "./src/stream.js";
@@ -20,7 +25,133 @@ async function collectStreamEvents<T>(stream: AsyncIterable<T>): Promise<T[]> {
return events;
}
async function withTempOpenClawState<T>(run: (paths: { root: string }) => Promise<T>): Promise<T> {
const root = await fs.mkdtemp(path.join(os.tmpdir(), "openclaw-ollama-cli-live-"));
try {
await fs.writeFile(
path.join(root, "openclaw.json"),
JSON.stringify(
{
models: {
providers: {
ollama: {
api: "ollama",
baseUrl: OLLAMA_BASE_URL,
apiKey: "ollama-local",
models: [],
},
},
},
},
null,
2,
),
);
return await run({ root });
} finally {
await fs.rm(root, { recursive: true, force: true });
}
}
async function runOpenClawCli(args: string[], env: NodeJS.ProcessEnv) {
const outputRoot = fsSync.mkdtempSync(path.join(os.tmpdir(), "openclaw-ollama-cli-output-"));
const stdoutPath = path.join(outputRoot, "stdout.txt");
const stderrPath = path.join(outputRoot, "stderr.txt");
const stdoutFd = fsSync.openSync(stdoutPath, "w");
const stderrFd = fsSync.openSync(stderrPath, "w");
let stdoutClosed = false;
let stderrClosed = false;
try {
const result = spawnSync(process.execPath, ["openclaw.mjs", ...args], {
cwd: process.cwd(),
env,
timeout: 90_000,
stdio: ["ignore", stdoutFd, stderrFd],
});
fsSync.closeSync(stdoutFd);
stdoutClosed = true;
fsSync.closeSync(stderrFd);
stderrClosed = true;
return {
exitCode: result.status ?? (result.error ? 1 : 0),
stdout: fsSync.readFileSync(stdoutPath, "utf8"),
stderr: fsSync.readFileSync(stderrPath, "utf8"),
};
} finally {
if (!stdoutClosed) {
fsSync.closeSync(stdoutFd);
}
if (!stderrClosed) {
fsSync.closeSync(stderrFd);
}
fsSync.rmSync(outputRoot, { recursive: true, force: true });
}
}
function parseJsonEnvelope(stdout: string): Record<string, unknown> {
const trimmed = stdout.trim();
const jsonStart = trimmed.lastIndexOf("\n{");
const rawJson = jsonStart >= 0 ? trimmed.slice(jsonStart + 1) : trimmed;
return JSON.parse(rawJson) as Record<string, unknown>;
}
function buildCliEnv(root: string): NodeJS.ProcessEnv {
return {
PATH: process.env.PATH,
HOME: process.env.HOME,
USER: process.env.USER,
TMPDIR: process.env.TMPDIR,
NODE_PATH: process.env.NODE_PATH,
NODE_OPTIONS: process.env.NODE_OPTIONS,
OPENCLAW_LIVE_TEST: "1",
OPENCLAW_LIVE_OLLAMA: "1",
OPENCLAW_LIVE_OLLAMA_WEB_SEARCH: "0",
OPENCLAW_STATE_DIR: path.join(root, "state"),
OPENCLAW_CONFIG_PATH: path.join(root, "openclaw.json"),
OPENCLAW_NO_RESPAWN: "1",
OPENCLAW_TEST_FAST: "1",
OLLAMA_API_KEY: "ollama-local",
};
}
describe.skipIf(!LIVE)("ollama live", () => {
it("runs infer model run through the local CLI path without PI model discovery", async () => {
await withTempOpenClawState(async ({ root }) => {
const result = await runOpenClawCli(
[
"infer",
"model",
"run",
"--local",
"--model",
`ollama/${CHAT_MODEL}`,
"--prompt",
"Reply with exactly one word: pong",
"--json",
],
buildCliEnv(root),
);
expect(result.exitCode).toBe(0);
expect(result.stderr).not.toContain("[agents/auth-profiles]");
expect(result.stdout.trim(), result.stderr).not.toHaveLength(0);
const payload = parseJsonEnvelope(result.stdout) as {
ok?: boolean;
transport?: string;
provider?: string;
model?: string;
outputs?: Array<{ text?: string }>;
};
expect(payload).toMatchObject({
ok: true,
transport: "local",
provider: "ollama",
model: CHAT_MODEL,
});
expect(payload.outputs?.[0]?.text?.trim().length ?? 0).toBeGreaterThan(0);
});
}, 120_000);
it("runs native chat with a custom provider prefix and normalized tool schemas", async () => {
const streamFn = createOllamaStreamFn(OLLAMA_BASE_URL);
let payload:

View File

@@ -15,6 +15,9 @@ vi.mock("../plugins/plugin-registry.js", () => ({
{
origin: "bundled",
nonSecretAuthMarkers: ["gcp-vertex-credentials", "ollama-local"],
providerAuthEnvVars: {
ollama: ["OLLAMA_API_KEY"],
},
},
],
}),
@@ -163,6 +166,20 @@ async function withoutEnv<T>(key: string, fn: () => Promise<T>): Promise<T> {
}
}
async function withEnv<T>(key: string, value: string, fn: () => Promise<T>): Promise<T> {
const previous = process.env[key];
process.env[key] = value;
try {
return await fn();
} finally {
if (previous === undefined) {
delete process.env[key];
} else {
process.env[key] = previous;
}
}
}
function createCustomProviderConfig(
baseUrl: string,
modelId = "llama3",
@@ -809,6 +826,30 @@ describe("resolveApiKeyForProvider", () => {
mode: "api-key",
});
});
it("prefers non-secret local env markers over ambient profiles", async () => {
const resolved = await withEnv("OLLAMA_API_KEY", "ollama-local", () =>
resolveApiKeyForProvider({
provider: "ollama",
store: {
version: 1,
profiles: {
"ollama:default": {
type: "api_key",
provider: "ollama",
key: "ollama-cloud-profile", // pragma: allowlist secret
},
},
},
}),
);
expect(resolved).toMatchObject({
apiKey: "ollama-local",
mode: "api-key",
});
expect(resolved.source).toContain("OLLAMA_API_KEY");
});
});
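The precedence under test can be restated as a standalone sketch: a non-secret local marker from config or env wins over a saved auth profile. Names and the marker check below are illustrative, not OpenClaw's actual API:

```typescript
// Sketch: resolve an API key with non-secret local markers taking
// precedence over stored auth profiles.
type Resolved = { apiKey: string; source: string };

// A "non-secret marker" is a well-known placeholder like "ollama-local",
// used to enable a local provider without a real credential.
const isNonSecretMarker = (key: string) => key === "ollama-local";

function resolveApiKey(params: {
  configKey?: string;
  envVar?: { name: string; value: string };
  profileKey?: string;
}): Resolved | undefined {
  if (params.configKey && isNonSecretMarker(params.configKey)) {
    return { apiKey: params.configKey, source: "config" };
  }
  if (params.envVar && isNonSecretMarker(params.envVar.value)) {
    return { apiKey: params.envVar.value, source: `env:${params.envVar.name}` };
  }
  if (params.profileKey) {
    return { apiKey: params.profileKey, source: "profile" };
  }
  return undefined;
}
```

Checking the marker branches before touching the profile store is what keeps an ambient cloud profile from hijacking a deliberately local setup.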
describe("resolveApiKeyForProvider synthetic local auth for custom providers", () => {

View File

@@ -523,6 +523,22 @@ export async function resolveApiKeyForProvider(params: {
}
const providerConfig = resolveProviderConfig(cfg, provider);
const configuredLocalKey = resolveUsableCustomProviderApiKey({ cfg, provider });
if (configuredLocalKey && isNonSecretApiKeyMarker(configuredLocalKey.apiKey)) {
return {
apiKey: configuredLocalKey.apiKey,
source: configuredLocalKey.source,
mode: "api-key",
};
}
const localMarkerEnv = resolveEnvApiKey(provider);
if (localMarkerEnv && isNonSecretApiKeyMarker(localMarkerEnv.apiKey)) {
return {
apiKey: localMarkerEnv.apiKey,
source: localMarkerEnv.source,
mode: "api-key",
};
}
const store = params.store ?? ensureAuthProfileStore(params.agentDir);
const order = resolveAuthProfileOrder({
cfg,

View File

@@ -1,5 +1,5 @@
import { beforeEach, describe, expect, it, vi } from "vitest";
import { discoverModels } from "../pi-model-discovery.js";
import { discoverAuthStorage, discoverModels } from "../pi-model-discovery.js";
import { createProviderRuntimeTestMock } from "./model.provider-runtime.test-support.js";
vi.mock("../model-suppression.js", () => ({
@@ -55,6 +55,8 @@ import {
beforeEach(() => {
resetMockDiscoverModels(discoverModels);
vi.mocked(discoverModels).mockClear();
vi.mocked(discoverAuthStorage).mockClear();
mockGetOpenRouterModelCapabilities.mockReset();
mockGetOpenRouterModelCapabilities.mockReturnValue(undefined);
mockLoadOpenRouterModelCapabilities.mockReset();
@@ -110,6 +112,27 @@ function resolveModelAsyncForTest(
}
describe("resolveModel", () => {
it("skips PI auth and model discovery during dynamic model resolution", async () => {
const result = await resolveModelAsync(
"openrouter",
"openrouter/auto",
"/tmp/agent",
undefined,
{
runtimeHooks: createRuntimeHooks(),
skipPiDiscovery: true,
},
);
expect(result.error).toBeUndefined();
expect(result.model).toMatchObject({
provider: "openrouter",
id: "openrouter/auto",
});
expect(discoverAuthStorage).not.toHaveBeenCalled();
expect(discoverModels).not.toHaveBeenCalled();
});
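The `skipPiDiscovery` behavior above amounts to: when the flag is set, resolve only from already-known models and never invoke the discovery layer. A toy sketch of that guard, not the real resolver:

```typescript
// Sketch: resolve a model from a static registry, invoking the (possibly
// expensive) discovery callback only when skipping is not requested.
type ModelEntry = { provider: string; id: string };

async function resolveModelLean(
  provider: string,
  id: string,
  registry: Map<string, ModelEntry>,
  opts: {
    skipDiscovery?: boolean;
    discover?: () => Promise<ModelEntry | undefined>;
  },
): Promise<ModelEntry | undefined> {
  const known = registry.get(`${provider}/${id}`);
  if (known || opts.skipDiscovery) {
    // Never fall through to discovery on the lean path.
    return known;
  }
  return opts.discover?.();
}
```

The one-shot probe pays the cost of this trade: an unknown model yields `undefined` immediately instead of a slow discovery pass, which is exactly what keeps the local probe fast.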
it("defaults model input to text when discovery omits input", () => {
mockDiscoveredModel(discoverModels, {
provider: "custom",

View File

@@ -1,16 +1,29 @@
import type { Model } from "@mariozechner/pi-ai";
import { beforeAll, beforeEach, describe, expect, it, vi } from "vitest";
const hoisted = vi.hoisted(() => ({
resolveModelMock: vi.fn(),
resolveModelAsyncMock: vi.fn(),
getApiKeyForModelMock: vi.fn(),
applyLocalNoAuthHeaderOverrideMock: vi.fn(),
setRuntimeApiKeyMock: vi.fn(),
resolveCopilotApiTokenMock: vi.fn(),
prepareProviderRuntimeAuthMock: vi.fn(),
prepareModelForSimpleCompletionMock: vi.fn((params: { model: unknown }) => params.model),
completeMock: vi.fn(),
}));
vi.mock("@mariozechner/pi-ai", () => ({
complete: hoisted.completeMock,
}));
vi.mock("./pi-embedded-runner/model.js", () => ({
resolveModel: hoisted.resolveModelMock,
resolveModelAsync: hoisted.resolveModelAsyncMock,
}));
vi.mock("./simple-completion-transport.js", () => ({
prepareModelForSimpleCompletion: hoisted.prepareModelForSimpleCompletionMock,
}));
vi.mock("./model-auth.js", () => ({
@@ -26,21 +39,30 @@ vi.mock("../plugins/provider-runtime.runtime.js", () => ({
prepareProviderRuntimeAuth: hoisted.prepareProviderRuntimeAuthMock,
}));
let completeWithPreparedSimpleCompletionModel: typeof import("./simple-completion-runtime.js").completeWithPreparedSimpleCompletionModel;
let prepareSimpleCompletionModel: typeof import("./simple-completion-runtime.js").prepareSimpleCompletionModel;
beforeAll(async () => {
({ prepareSimpleCompletionModel } = await import("./simple-completion-runtime.js"));
({ completeWithPreparedSimpleCompletionModel, prepareSimpleCompletionModel } =
await import("./simple-completion-runtime.js"));
});
beforeEach(() => {
hoisted.resolveModelMock.mockReset();
hoisted.resolveModelAsyncMock.mockReset();
hoisted.getApiKeyForModelMock.mockReset();
hoisted.applyLocalNoAuthHeaderOverrideMock.mockReset();
hoisted.setRuntimeApiKeyMock.mockReset();
hoisted.resolveCopilotApiTokenMock.mockReset();
hoisted.prepareProviderRuntimeAuthMock.mockReset();
hoisted.prepareModelForSimpleCompletionMock.mockReset();
hoisted.completeMock.mockReset();
hoisted.applyLocalNoAuthHeaderOverrideMock.mockImplementation((model: unknown) => model);
hoisted.prepareModelForSimpleCompletionMock.mockImplementation(
(params: { model: unknown }) => params.model,
);
hoisted.completeMock.mockResolvedValue({ content: [{ type: "text", text: "ok" }] });
hoisted.resolveModelMock.mockReturnValue({
model: {
@@ -52,6 +74,9 @@ beforeEach(() => {
},
modelRegistry: {},
});
hoisted.resolveModelAsyncMock.mockImplementation((...args: unknown[]) =>
Promise.resolve(hoisted.resolveModelMock(...args)),
);
hoisted.getApiKeyForModelMock.mockResolvedValue({
apiKey: "sk-test",
source: "env:TEST_API_KEY",
@@ -405,4 +430,86 @@ describe("prepareSimpleCompletionModel", () => {
}),
);
});
it("can skip Pi model/auth discovery for config-scoped one-shot completions", async () => {
hoisted.resolveModelAsyncMock.mockResolvedValueOnce({
model: {
provider: "ollama",
id: "llama3.2:latest",
},
authStorage: {
setRuntimeApiKey: hoisted.setRuntimeApiKeyMock,
},
modelRegistry: {},
});
hoisted.getApiKeyForModelMock.mockResolvedValueOnce({
apiKey: "ollama-local",
source: "models.json (local marker)",
mode: "api-key",
});
const result = await prepareSimpleCompletionModel({
cfg: undefined,
provider: "ollama",
modelId: "llama3.2:latest",
skipPiDiscovery: true,
});
expect(result).not.toHaveProperty("error");
expect(hoisted.resolveModelMock).not.toHaveBeenCalled();
expect(hoisted.resolveModelAsyncMock).toHaveBeenCalledWith(
"ollama",
"llama3.2:latest",
undefined,
undefined,
{
skipPiDiscovery: true,
},
);
});
});
describe("completeWithPreparedSimpleCompletionModel", () => {
it("prepares provider-owned stream APIs before running a completion", async () => {
const model = {
provider: "ollama",
id: "llama3.2:latest",
name: "llama3.2:latest",
api: "ollama",
baseUrl: "http://127.0.0.1:11434",
reasoning: false,
input: ["text"],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 8192,
maxTokens: 1024,
} satisfies Model<"ollama">;
const preparedModel = {
...model,
api: "openclaw-ollama-simple-test",
};
hoisted.prepareModelForSimpleCompletionMock.mockReturnValueOnce(preparedModel);
await completeWithPreparedSimpleCompletionModel({
model,
auth: {
apiKey: "ollama-local",
source: "models.json (local marker)",
mode: "api-key",
},
context: {
messages: [{ role: "user", content: "pong", timestamp: 1 }],
},
});
expect(hoisted.prepareModelForSimpleCompletionMock).toHaveBeenCalledWith({ model });
expect(hoisted.completeMock).toHaveBeenCalledWith(
preparedModel,
{
messages: [{ role: "user", content: "pong", timestamp: 1 }],
},
{
apiKey: "ollama-local",
},
);
});
});
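The lean path reduces a completion result to plain text by keeping only text blocks, as the CLI does before building its envelope. A minimal sketch of that reduction, with the block shape simplified:

```typescript
// Sketch: join the text blocks of a completion result into one string,
// ignoring non-text blocks (tool calls, images, etc.).
type ContentBlock = { type: string; text?: string };

function extractText(content: ContentBlock[]): string {
  return content
    .map((block) => (block.type === "text" && block.text ? block.text : ""))
    .join("")
    .trim();
}
```

An empty result after trimming is a useful signal in its own right: the probe reached the model but got no usable text back, which the envelope can report as zero outputs.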

View File

@@ -15,7 +15,8 @@ import {
resolveDefaultModelForAgent,
resolveModelRefFromString,
} from "./model-selection.js";
import { resolveModel } from "./pi-embedded-runner/model.js";
import { resolveModel, resolveModelAsync } from "./pi-embedded-runner/model.js";
import { prepareModelForSimpleCompletion } from "./simple-completion-transport.js";
type SimpleCompletionAuthStorage = {
setRuntimeApiKey: (provider: string, apiKey: string) => void;
@@ -158,8 +159,13 @@ export async function prepareSimpleCompletionModel(params: {
profileId?: string;
preferredProfile?: string;
allowMissingApiKeyModes?: ReadonlyArray<AllowedMissingApiKeyMode>;
skipPiDiscovery?: boolean;
}): Promise<PreparedSimpleCompletionModel> {
const resolved = resolveModel(params.provider, params.modelId, params.agentDir, params.cfg);
const resolved = params.skipPiDiscovery
? await resolveModelAsync(params.provider, params.modelId, params.agentDir, params.cfg, {
skipPiDiscovery: true,
})
: resolveModel(params.provider, params.modelId, params.agentDir, params.cfg);
if (!resolved.model) {
return {
error: resolved.error ?? `Unknown model: ${params.provider}/${params.modelId}`,
@@ -233,6 +239,7 @@ export async function prepareSimpleCompletionModelForAgent(params: {
modelRef?: string;
preferredProfile?: string;
allowMissingApiKeyModes?: ReadonlyArray<AllowedMissingApiKeyMode>;
skipPiDiscovery?: boolean;
}): Promise<PreparedSimpleCompletionModelForAgent> {
const selection = resolveSimpleCompletionSelectionForAgent({
cfg: params.cfg,
@@ -252,6 +259,7 @@ export async function prepareSimpleCompletionModelForAgent(params: {
profileId: selection.profileId,
preferredProfile: params.preferredProfile,
allowMissingApiKeyModes: params.allowMissingApiKeyModes,
skipPiDiscovery: params.skipPiDiscovery,
});
if ("error" in prepared) {
return {
@@ -272,7 +280,8 @@ export async function completeWithPreparedSimpleCompletionModel(params: {
context: Parameters<typeof complete>[1];
options?: SimpleCompletionModelOptions;
}) {
return await complete(params.model, params.context, {
const completionModel = prepareModelForSimpleCompletion({ model: params.model });
return await complete(completionModel, params.context, {
...params.options,
apiKey: params.auth.apiKey,
});

View File

@@ -34,9 +34,25 @@ const mocks = vi.hoisted(() => ({
),
resolveMemorySearchConfig: vi.fn(() => null),
loadModelCatalog: vi.fn(async () => []),
agentCommand: vi.fn(async () => ({
payloads: [{ text: "local reply" }],
meta: { agentMeta: { provider: "openai", model: "gpt-5.4" } },
prepareSimpleCompletionModelForAgent: vi.fn(async () => ({
selection: {
provider: "openai",
modelId: "gpt-5.4",
agentDir: "/tmp/agent",
},
model: {
provider: "openai",
id: "gpt-5.4",
maxTokens: 128,
},
auth: {
apiKey: "sk-test",
source: "env:TEST_API_KEY",
mode: "api-key",
},
})),
completeWithPreparedSimpleCompletionModel: vi.fn(async () => ({
content: [{ type: "text", text: "local reply" }],
})),
callGateway: vi.fn(async ({ method }: { method: string }) => {
if (method === "tts.status") {
@@ -131,11 +147,6 @@ vi.mock("../config/config.js", () => ({
loadConfig: mocks.loadConfig as typeof import("../config/config.js").loadConfig,
}));
vi.mock("../agents/agent-command.js", () => ({
agentCommand:
mocks.agentCommand as unknown as typeof import("../agents/agent-command.js").agentCommand,
}));
vi.mock("../agents/agent-scope.js", () => ({
resolveDefaultAgentId: () => "main",
resolveAgentDir: () => "/tmp/agent",
@@ -146,6 +157,13 @@ vi.mock("../agents/model-catalog.js", () => ({
mocks.loadModelCatalog as typeof import("../agents/model-catalog.js").loadModelCatalog,
}));
vi.mock("../agents/simple-completion-runtime.js", () => ({
prepareSimpleCompletionModelForAgent:
mocks.prepareSimpleCompletionModelForAgent as unknown as typeof import("../agents/simple-completion-runtime.js").prepareSimpleCompletionModelForAgent,
completeWithPreparedSimpleCompletionModel:
mocks.completeWithPreparedSimpleCompletionModel as unknown as typeof import("../agents/simple-completion-runtime.js").completeWithPreparedSimpleCompletionModel,
}));
vi.mock("../agents/auth-profiles.js", () => ({
loadAuthProfileStoreForRuntime:
mocks.loadAuthProfileStoreForRuntime as unknown as typeof import("../agents/auth-profiles.js").loadAuthProfileStoreForRuntime,
@@ -291,7 +309,8 @@ describe("capability cli", () => {
return store;
});
mocks.resolveMemorySearchConfig.mockReset().mockReturnValue(null);
mocks.agentCommand.mockClear();
mocks.prepareSimpleCompletionModelForAgent.mockClear();
mocks.completeWithPreparedSimpleCompletionModel.mockClear();
mocks.callGateway.mockClear().mockImplementation((async ({ method }: { method: string }) => {
if (method === "tts.status") {
return { enabled: true, provider: "openai" };
@@ -362,7 +381,8 @@ describe("capability cli", () => {
argv: ["capability", "model", "run", "--prompt", "hello", "--json"],
});
expect(mocks.agentCommand).toHaveBeenCalledTimes(1);
expect(mocks.prepareSimpleCompletionModelForAgent).toHaveBeenCalledTimes(1);
expect(mocks.completeWithPreparedSimpleCompletionModel).toHaveBeenCalledTimes(1);
expect(mocks.callGateway).not.toHaveBeenCalled();
expect(mocks.runtime.writeJson).toHaveBeenCalledWith(
expect.objectContaining({
@@ -372,20 +392,30 @@ describe("capability cli", () => {
);
});
it("runs local model probes without chat-agent prompt policy or tools", async () => {
it("runs local model probes through the lean completion path", async () => {
await runRegisteredCli({
register: registerCapabilityCli as (program: Command) => void,
argv: ["capability", "model", "run", "--prompt", "hello", "--json"],
});
expect(mocks.agentCommand).toHaveBeenCalledWith(
expect(mocks.prepareSimpleCompletionModelForAgent).toHaveBeenCalledWith(
expect.objectContaining({
cleanupBundleMcpOnRunEnd: true,
modelRun: true,
promptMode: "none",
agentId: "main",
allowMissingApiKeyModes: ["aws-sdk"],
skipPiDiscovery: true,
}),
);
expect(mocks.completeWithPreparedSimpleCompletionModel).toHaveBeenCalledWith(
expect.objectContaining({
context: {
messages: [
expect.objectContaining({
role: "user",
content: "hello",
}),
],
},
}),
expect.anything(),
expect.anything(),
);
});

View File

@@ -4,7 +4,6 @@ import path from "node:path";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";
import type { Command } from "commander";
import { agentCommand } from "../agents/agent-command.js";
import { resolveAgentDir, resolveDefaultAgentId } from "../agents/agent-scope.js";
import {
listProfilesForProvider,
@@ -13,6 +12,10 @@ import {
import { updateAuthProfileStoreWithLock } from "../agents/auth-profiles/store.js";
import { resolveMemorySearchConfig } from "../agents/memory-search.js";
import { loadModelCatalog } from "../agents/model-catalog.js";
import {
completeWithPreparedSimpleCompletionModel,
prepareSimpleCompletionModelForAgent,
} from "../agents/simple-completion-runtime.js";
import { getRuntimeConfig } from "../config/config.js";
import { resolveAgentModelPrimaryValue } from "../config/model-input.js";
import type { OpenClawConfig } from "../config/types.openclaw.js";
@@ -79,7 +82,6 @@ import {
runWebSearch,
} from "../web-search/runtime.js";
import { runCommandWithRuntime } from "./cli-utils.js";
import { createDefaultDeps } from "./deps.js";
import { removeCommandByName } from "./program/command-tree.js";
import { collectOption } from "./program/helpers.js";
@@ -576,34 +578,54 @@ async function runModelRun(params: {
const cfg = getRuntimeConfig();
const agentId = resolveDefaultAgentId(cfg);
if (params.transport === "local") {
const result = await agentCommand(
{
message: params.prompt,
agentId,
model: params.model,
json: false,
modelRun: true,
promptMode: "none",
cleanupBundleMcpOnRunEnd: true,
const prepared = await prepareSimpleCompletionModelForAgent({
cfg,
agentId,
modelRef: params.model,
allowMissingApiKeyModes: ["aws-sdk"],
skipPiDiscovery: true,
});
if ("error" in prepared) {
throw new Error(prepared.error);
}
const result = await completeWithPreparedSimpleCompletionModel({
model: prepared.model,
auth: prepared.auth,
context: {
messages: [
{
role: "user",
content: params.prompt,
timestamp: Date.now(),
},
],
},
{
...defaultRuntime,
log: () => {},
options: {
maxTokens:
typeof prepared.model.maxTokens === "number" && Number.isFinite(prepared.model.maxTokens)
? prepared.model.maxTokens
: undefined,
},
createDefaultDeps(),
);
});
const text = result.content
.map((block) => (block.type === "text" ? block.text : ""))
.join("")
.trim();
return {
ok: true,
capability: "model.run",
transport: "local" as const,
provider: result?.meta?.agentMeta?.provider,
model: result?.meta?.agentMeta?.model,
provider: prepared.selection.provider,
model: prepared.selection.modelId,
attempts: [],
outputs: (result?.payloads ?? []).map((payload) => ({
text: payload.text,
mediaUrl: payload.mediaUrl,
mediaUrls: payload.mediaUrls,
})),
outputs: text
? [
{
text,
mediaUrl: null,
},
]
: [],
} satisfies CapabilityEnvelope;
}