fix(cron): preserve job model fallbacks

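The fallback policy this change documents can be summarized in a short sketch. The names below (`sketchCronFallbacksOverride`, `payloadModel`, `payloadFallbacks`, `agentFallbacks`, `defaultFallbacks`) are hypothetical and flatten the real config lookup into plain fields. It only illustrates the behavior the new tests assert and is not the actual `resolveCronFallbacksOverride` implementation.

```typescript
// Hypothetical, simplified model of the cron fallback policy described in this change.
// All names here are illustrative assumptions, not the real OpenClaw API.
type FallbackInputs = {
  payloadModel?: string; // cron `payload.model` / `--model`
  payloadFallbacks?: string[]; // cron `payload.fallbacks`
  agentFallbacks?: string[]; // agent-specific fallback override, if any
  defaultFallbacks?: string[]; // `agents.defaults.model.fallbacks`
};

function sketchCronFallbacksOverride(input: FallbackInputs): string[] | undefined {
  // An explicit payload list always wins; an empty list makes the run strict.
  if (input.payloadFallbacks !== undefined) {
    return [...input.payloadFallbacks];
  }
  // No job primary: leave the default model path to the fallback runner.
  if (input.payloadModel === undefined) {
    return input.agentFallbacks;
  }
  // Job primary: reuse configured fallbacks, or pass an explicit empty override
  // so the agent primary is not appended as a hidden retry target.
  return input.agentFallbacks ?? input.defaultFallbacks ?? [];
}

const configured = ["openai/gpt-5.4", "google/gemini-3-pro"];

// Job primary with configured fallbacks: the configured chain is kept.
console.log(
  sketchCronFallbacksOverride({
    payloadModel: "google/gemini-2.0-flash",
    defaultFallbacks: configured,
  }),
);

// Job primary with `fallbacks: []`: strict run, no fallback.
console.log(
  sketchCronFallbacksOverride({
    payloadModel: "google/gemini-2.0-flash",
    payloadFallbacks: [],
    defaultFallbacks: configured,
  }),
);
```

Returning `undefined` when no payload model is set mirrors the last test case in this commit, where the default model path is left to the fallback runner.
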
2026-05-01 06:36:23 +08:00 · 2026-04-28 00:02:14 +01:00
parent da6d8940a0
commit ff2b2e769f
8 changed files with 131 additions and 9 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -21,6 +21,7 @@ Docs: https://docs.openclaw.ai
 - Control UI/Agents: redact tool-call args, partial/final results, derived exec output, and configured custom secret patterns before streaming tool events to the Control UI, so tool output cannot expose provider or channel credentials. Fixes #72283. (#72319) Thanks @volcano303 and @BunsDev.
 - Agents/sessions: keep `sessions_history` recall redaction enabled even when general log redaction is disabled, and clarify that safety-boundary UI/tool/diagnostic payloads still redact independently of `logging.redactSensitive`. Carries forward #72319. Thanks @volcano303 and @BunsDev.
 - Providers/Codex: pass agent and workspace directories into provider stream wrappers so Codex native `web_search` activation can evaluate the correct auth context, and smoke-test the built status-message runtime by resolving the emitted bundle name. Carries forward #67843; refs #65909. Thanks @neilofneils404.
+- Cron/models: keep `payload.model` as a per-job primary that can use configured fallbacks, while still letting `payload.fallbacks: []` make cron runs strict and avoid hidden agent-primary retries. Refs #73023. Thanks @pavelyortho-cyber.
 - Models/fallbacks: treat user-selected session models as exact choices, so `/model ollama/...` and model-picker switches fail visibly when the selected provider is unreachable instead of answering from an unrelated configured fallback. Fixes #73023. Thanks @pavelyortho-cyber.
 - CLI/model probes: fail local `infer model run` probes when the provider returns no text output, so unreachable local providers and empty completions no longer look like successful smoke tests. Refs #73023. Thanks @pavelyortho-cyber.
 - CLI/Ollama: run local `infer model run` through the lean provider completion path and skip global model discovery for one-shot local probes, so Ollama smoke tests no longer pay full chat-agent/tool startup cost or hang before the native `/api/chat` request. Fixes #72851. Thanks @TotalRes2020.
--- a/docs/automation/cron-jobs.md
+++ b/docs/automation/cron-jobs.md
@@ -129,7 +129,9 @@ This fires ~5–6 times per month instead of 0–1 times per month. OpenClaw use
  Restrict which tools the job can use, for example `--tools exec,read`.
 </ParamField>

-`--model` uses the selected allowed model for that job. If the requested model is not allowed, cron logs a warning and falls back to the job's agent/default model selection instead. Configured fallback chains still apply, but a plain model override with no explicit per-job fallback list no longer appends the agent primary as a hidden extra retry target.
+`--model` uses the selected allowed model as that job's primary model. It is not the same as a chat-session `/model` override: configured fallback chains still apply when the job primary fails. If the requested model is not allowed, cron logs a warning and falls back to the job's agent/default model selection instead.
+
+Cron jobs can also carry payload-level `fallbacks`. When present, that list replaces the configured fallback chain for the job. Use `fallbacks: []` in the job payload/API when you want a strict cron run that tries only the selected model. If a job has `--model` but neither payload nor configured fallbacks, OpenClaw passes an explicit empty fallback override so the agent primary is not appended as a hidden extra retry target.

 Model-selection precedence for isolated jobs is:

@@ -257,7 +259,7 @@ Query-string tokens are rejected.
      -d '{"message":"Summarize inbox","name":"Email","model":"openai/gpt-5.4"}'
    ```

-    Fields: `message` (required), `name`, `agentId`, `wakeMode`, `deliver`, `channel`, `to`, `model`, `thinking`, `timeoutSeconds`.
+    Fields: `message` (required), `name`, `agentId`, `wakeMode`, `deliver`, `channel`, `to`, `model`, `fallbacks`, `thinking`, `timeoutSeconds`.

  </Accordion>
  <Accordion title="Mapped hooks (POST /hooks/<name>)">
@@ -375,7 +377,9 @@ Model override note:
 - `openclaw cron add|edit --model ...` changes the job's selected model.
 - If the model is allowed, that exact provider/model reaches the isolated agent run.
 - If it is not allowed, cron warns and falls back to the job's agent/default model selection.
-- Configured fallback chains still apply, but a plain `--model` override with no explicit per-job fallback list no longer falls through to the agent primary as a silent extra retry target.
+- Configured fallback chains still apply because cron `--model` is a job primary, not a session `/model` override.
+- Payload `fallbacks` replaces configured fallbacks for that job; `fallbacks: []` disables fallback and makes the run strict.
+- A plain `--model` with no explicit or configured fallback list does not fall through to the agent primary as a silent extra retry target.
  </Note>

 ## Configuration
--- a/docs/cli/cron.md
+++ b/docs/cli/cron.md
@@ -98,9 +98,16 @@ Note: cron job definitions live in `jobs.json`, while pending runtime state live
 `cron add|edit --model <ref>` selects an allowed model for the job.

 <Warning>
-If the model is not allowed, cron warns and falls back to the job's agent or default model selection. Configured fallback chains still apply, but a plain model override with no explicit per-job fallback list no longer appends the agent primary as a hidden extra retry target.
+If the model is not allowed, cron warns and falls back to the job's agent or default model selection.
 </Warning>

+Cron `--model` is a **job primary**, not a chat-session `/model` override. That means:
+
+- Configured model fallbacks still apply when the selected job model fails.
+- Per-job payload `fallbacks` replaces the configured fallback list when present.
+- An empty per-job fallback list (`fallbacks: []` in the job payload/API) makes the cron run strict.
+- When a job has `--model` but neither payload nor configured fallbacks, OpenClaw passes an explicit empty fallback override so the agent primary is not appended as a hidden retry target.
+
 ### Isolated cron model precedence

 Isolated cron resolves the active model in this order:
--- a/docs/concepts/model-failover.md
+++ b/docs/concepts/model-failover.md
@@ -24,7 +24,7 @@ For a normal text run, OpenClaw evaluates candidates in this order:
    Resolve the active session model and auth-profile preference.
  </Step>
  <Step title="Build candidate chain">
-    Build the model candidate chain from the configured model or an auto-selected fallback model, then `agents.defaults.model.fallbacks` in order. Explicit user model selections are strict and do not silently fall back to a different model.
+    Build the model candidate chain from the current model selection and the fallback policy for that selection source. Configured defaults, cron job primaries, and auto-selected fallback models can use configured fallbacks; explicit user session selections are strict.
  </Step>
  <Step title="Try the current provider">
    Try the current provider with auth-profile rotation/cooldown rules.
@@ -54,6 +54,16 @@ This is intentionally narrower than "save and restore the whole session". The re

 That prevents a failed fallback retry from overwriting newer unrelated session mutations such as manual `/model` changes or session rotation updates that happened while the attempt was running.

+## Selection source policy
+
+OpenClaw separates the selected provider/model from why it was selected. That source controls whether the fallback chain is allowed:
+
+- **Configured default**: `agents.defaults.model.primary` (or an agent-specific primary) uses the configured fallback chain.
+- **Auto fallback override**: a runtime fallback writes `providerOverride`, `modelOverride`, and `modelOverrideSource: "auto"` before retrying. That auto override can keep walking the configured fallback chain and is cleared by `/new`, `/reset`, and `sessions.reset`.
+- **User session override**: `/model`, the model picker, `session_status(model=...)`, and `sessions.patch` write `modelOverrideSource: "user"`. That is an exact session selection. If the selected provider/model fails before producing a reply, OpenClaw reports the failure instead of answering from an unrelated configured fallback.
+- **Legacy session override**: older session entries may have `modelOverride` without `modelOverrideSource`. OpenClaw treats those as user overrides so an explicit old selection is not silently converted into fallback behavior.
+- **Cron payload model**: a cron job `payload.model` / `--model` is a job primary, not a user session override. It uses configured fallbacks unless the job provides `payload.fallbacks`; `payload.fallbacks: []` makes the cron run strict.
+
 ## Auth storage (keys + OAuth)

 OpenClaw uses **auth profiles** for both API keys and OAuth tokens.
@@ -207,7 +217,7 @@ If all profiles for a provider fail, OpenClaw moves to the next model in `agents

 Overloaded and rate-limit errors are handled more aggressively than billing cooldowns. By default, OpenClaw allows one same-provider auth-profile retry, then switches to the next configured model fallback without waiting. Provider-busy signals such as `ModelNotReadyException` land in that overloaded bucket. Tune this with `auth.cooldowns.overloadedProfileRotations`, `auth.cooldowns.overloadedBackoffMs`, and `auth.cooldowns.rateLimitedProfileRotations`.

-When a run starts from the configured primary or an auto-selected fallback override, OpenClaw can walk the configured fallback chain. Explicit user selections (for example `/model ollama/qwen3.5:27b`, the model picker, or one-off CLI provider/model overrides) are strict: if that provider/model is unreachable or fails before producing a reply, OpenClaw reports the failure instead of answering from an unrelated fallback.
+When a run starts from the configured primary, a cron job primary, or an auto-selected fallback override, OpenClaw can walk the configured fallback chain. Explicit user selections (for example `/model ollama/qwen3.5:27b`, the model picker, `sessions.patch`, or one-off CLI provider/model overrides) are strict: if that provider/model is unreachable or fails before producing a reply, OpenClaw reports the failure instead of answering from an unrelated fallback.

 ### Candidate chain rules

@@ -219,7 +229,8 @@ OpenClaw builds the candidate list from the currently requested `provider/model`
    - Explicit configured fallbacks are deduplicated but not filtered by the model allowlist. They are treated as explicit operator intent.
    - If the current run is already on a configured fallback in the same provider family, OpenClaw keeps using the full configured chain.
    - If the current run is on a different provider than config and that current model is not already part of the configured fallback chain, OpenClaw does not append unrelated configured fallbacks from another provider.
-    - When the run started from an override, the configured primary is appended at the end so the chain can settle back onto the normal default once earlier candidates are exhausted.
+    - When no explicit fallback override is supplied to the fallback runner, the configured primary is appended at the end so the chain can settle back onto the normal default once earlier candidates are exhausted.
+    - When a caller supplies `fallbacksOverride`, the runner uses exactly the requested model plus that override list. An empty list disables model fallback and prevents the configured primary from being appended as a hidden retry target.
  </Accordion>
 </AccordionGroup>

--- a/docs/concepts/models.md
+++ b/docs/concepts/models.md
@@ -53,6 +53,15 @@ OpenClaw selects models in this order:
  </Accordion>
 </AccordionGroup>

+## Selection source and fallback behavior
+
+The same `provider/model` can mean different things depending on where it came from:
+
+- Configured defaults (`agents.defaults.model.primary` and agent-specific primaries) are the normal starting point and use `agents.defaults.model.fallbacks`.
+- Auto fallback selections are temporary recovery state. They are stored with `modelOverrideSource: "auto"` so later turns can keep using the fallback chain without probing a known-bad primary first.
+- User session selections are exact. `/model`, the model picker, `session_status(model=...)`, and `sessions.patch` store `modelOverrideSource: "user"`; if that selected provider/model is unreachable, OpenClaw fails visibly instead of falling through to another configured model.
+- Cron `--model` / payload `model` is a per-job primary. It still uses configured fallbacks unless the job supplies explicit payload `fallbacks` (use `fallbacks: []` for a strict cron run).
+
 ## Quick model policy

 - Set your primary to the strongest latest-generation model available to you.
@@ -156,7 +165,7 @@ You can switch models for the current session without restarting:
    - If the agent is idle, the next run uses the new model right away.
    - If a run is already active, OpenClaw marks a live switch as pending and only restarts into the new model at a clean retry point.
    - If tool activity or reply output has already started, the pending switch can stay queued until a later retry opportunity or the next user turn.
-    - A user-selected `/model` ref is strict for that session: if the selected provider/model is unreachable, the reply fails visibly instead of silently answering from `agents.defaults.model.fallbacks`.
+    - A user-selected `/model` ref is strict for that session: if the selected provider/model is unreachable, the reply fails visibly instead of silently answering from `agents.defaults.model.fallbacks`. This is different from configured defaults and cron job primaries, which can still use fallback chains.
    - `/model status` is the detailed view (auth candidates and, when configured, provider endpoint `baseUrl` + `api` mode).
  </Accordion>
  <Accordion title="Ref parsing">
--- a/src/cron/isolated-agent/run-fallback-policy.test.ts
+++ b/src/cron/isolated-agent/run-fallback-policy.test.ts
@@ -0,0 +1,86 @@
+import { describe, expect, it } from "vitest";
+import type { OpenClawConfig } from "../../config/types.openclaw.js";
+import type { CronJob } from "../types.js";
+import { resolveCronFallbacksOverride } from "./run-fallback-policy.js";
+
+function makeJob(payload: CronJob["payload"]): CronJob {
+  return {
+    id: "cron-fallback-policy",
+    name: "Cron fallback policy",
+    schedule: { kind: "cron", expr: "0 9 * * *", tz: "UTC" },
+    sessionTarget: "isolated",
+    payload,
+    state: {},
+  } as CronJob;
+}
+
+function makeConfig(fallbacks?: string[]): OpenClawConfig {
+  return {
+    agents: {
+      defaults: {
+        model: {
+          primary: "anthropic/claude-opus-4-6",
+          ...(fallbacks !== undefined ? { fallbacks } : {}),
+        },
+      },
+    },
+  };
+}
+
+describe("resolveCronFallbacksOverride", () => {
+  it("keeps configured fallbacks for cron payload model overrides", () => {
+    expect(
+      resolveCronFallbacksOverride({
+        cfg: makeConfig(["openai/gpt-5.4", "google/gemini-3-pro"]),
+        agentId: "main",
+        job: makeJob({
+          kind: "agentTurn",
+          message: "summarize",
+          model: "google/gemini-2.0-flash",
+        }),
+      }),
+    ).toEqual(["openai/gpt-5.4", "google/gemini-3-pro"]);
+  });
+
+  it("returns an empty override for payload model overrides without configured fallbacks", () => {
+    expect(
+      resolveCronFallbacksOverride({
+        cfg: makeConfig(),
+        agentId: "main",
+        job: makeJob({
+          kind: "agentTurn",
+          message: "summarize",
+          model: "google/gemini-2.0-flash",
+        }),
+      }),
+    ).toEqual([]);
+  });
+
+  it("lets payload fallbacks override the configured fallback policy", () => {
+    expect(
+      resolveCronFallbacksOverride({
+        cfg: makeConfig(["openai/gpt-5.4"]),
+        agentId: "main",
+        job: makeJob({
+          kind: "agentTurn",
+          message: "summarize",
+          model: "google/gemini-2.0-flash",
+          fallbacks: [],
+        }),
+      }),
+    ).toEqual([]);
+  });
+
+  it("leaves the default model path to the fallback runner when no payload model is set", () => {
+    expect(
+      resolveCronFallbacksOverride({
+        cfg: makeConfig(["openai/gpt-5.4"]),
+        agentId: "main",
+        job: makeJob({
+          kind: "agentTurn",
+          message: "summarize",
+        }),
+      }),
+    ).toBeUndefined();
+  });
+});
--- a/src/cron/isolated-agent/run-fallback-policy.ts
+++ b/src/cron/isolated-agent/run-fallback-policy.ts
@@ -17,6 +17,7 @@ export function resolveCronFallbacksOverride(params: {
      cfg: params.cfg,
      agentId: params.agentId,
      hasSessionModelOverride: hasCronPayloadModelOverride,
+      modelOverrideSource: hasCronPayloadModelOverride ? "auto" : undefined,
    })
  );
 }
--- a/src/cron/isolated-agent/run.test-harness.ts
+++ b/src/cron/isolated-agent/run.test-harness.ts
@@ -293,13 +293,16 @@ function resetRunConfigMocks(): void {
  resolveAgentConfigMock.mockReturnValue(undefined);
  resolveEffectiveModelFallbacksMock.mockReset();
  resolveEffectiveModelFallbacksMock.mockImplementation(
-    ({ cfg, agentId, hasSessionModelOverride }) => {
+    ({ cfg, agentId, hasSessionModelOverride, modelOverrideSource }) => {
      const agentFallbacksOverride = resolveAgentModelFallbacksOverrideMock(cfg, agentId) as
        | string[]
        | undefined;
      if (!hasSessionModelOverride) {
        return agentFallbacksOverride;
      }
+      if (modelOverrideSource !== "auto") {
+        return [];
+      }
      const defaultFallbacks = resolveAgentModelFallbackValues(cfg?.agents?.defaults?.model);
      return agentFallbacksOverride ?? defaultFallbacks;
    },