diff --git a/README.md b/README.md
index c38b063..8c993f4 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@
 [![License](https://img.shields.io/badge/license-MIT-green)](./LICENSE)
 [![QQ Bot](https://img.shields.io/badge/QQ_Bot-API_v2-red)](https://bot.q.qq.com/wiki/)
-[![Platform](https://img.shields.io/badge/platform-OpenClaw-orange)](https://github.com/tencent-connect/openclaw-qq)
+[![Platform](https://img.shields.io/badge/platform-OpenClaw-orange)](https://github.com/tencent-connect/openclaw-qqbot)
 [![Node.js](https://img.shields.io/badge/Node.js->=18-339933?logo=node.js&logoColor=white)](https://nodejs.org/)
 [![TypeScript](https://img.shields.io/badge/TypeScript-5.9-3178C6?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
@@ -50,6 +50,8 @@ Scan to join the QQ group chat
 🎙️ Voice Messages (STT) — AI understands voice messages, auto-transcribes speech to text
+With STT configured, the plugin automatically transcribes voice messages to text before passing them to AI. The whole process is transparent to the user — sending voice feels as natural as sending text.
+
 > **You**: *(send a voice message)* "What's the weather like tomorrow in Shenzhen?"
 >
 > **QQBot**: Tomorrow (March 7, Saturday) Shenzhen weather forecast 🌤️ ...
@@ -61,6 +63,8 @@ Scan to join the QQ group chat
 📄 File Understanding — Send any file, AI reads and understands it
+Send any file to the bot — novels, reports, spreadsheets — AI automatically recognizes the content and gives an intelligent reply.
+
 > **You**: *(send a TXT file of "War and Peace")*
 >
 > **QQBot**: Got it! You uploaded the Chinese version of "War and Peace" by Leo Tolstoy. This appears to be the opening of Chapter 1...
@@ -72,6 +76,8 @@ Scan to join the QQ group chat
 🖼️ Image Understanding — Vision-capable models can see and describe images
+If your main model supports vision (e.g. Tencent Hunyuan `hunyuan-vision`), AI can understand images too. This is a general multimodal capability, not plugin-specific.
+
 > **You**: *(send an image)*
 >
 > **QQBot**: Haha, so cute! Is that a QQ penguin in a lobster costume? 🦞🐧 ...
@@ -87,6 +93,12 @@ Scan to join the QQ group chat
 >
 > **QQBot**: Here you go! 🐱
+AI uses the `` tag to send images. Both local file paths and URLs are supported. Formats: jpg/png/gif/webp/bmp.
+
+```
+~/.openclaw/qqbot/images/cute-cat.png
+```
+
 Image Generation Demo
@@ -98,6 +110,12 @@ Scan to join the QQ group chat
 >
 > **QQBot**: *(sends a voice message)*
+AI uses the `` tag to send voice messages. Formats: mp3/wav/silk/ogg. Works without ffmpeg.
+
+```
+~/.openclaw/qqbot/tts/joke.silk
+```
+
 TTS Voice Demo
@@ -109,6 +127,12 @@ Scan to join the QQ group chat
 >
 > **QQBot**: *(sends a .txt file)*
+AI uses the `` tag to send files. PDF, Excel, ZIP, TXT — any format, up to 20MB.
+
+```
+~/.openclaw/qqbot/downloads/war-and-peace-ch1.txt
+```
+
 File Sending Demo
@@ -120,11 +144,40 @@ Scan to join the QQ group chat
 >
 > **QQBot**: *(sends a video)*
+AI uses the `` tag to send videos. Both local files and URLs are supported. Large files (>5MB) auto-show "uploading..." status.
+
+```
+~/.openclaw/qqbot/downloads/demo.mp4
+```
+
 Video Sending Demo
-> For a deep dive into rich media capabilities, see the [Media Guide](docs/qqbot-media-guide.md).
+### Rich Media Tag Reference
+
+| Tag | Direction | Usage | Notes |
+|-----|-----------|-------|-------|
+| `path` | Send | Image | Local path or URL, jpg/png/gif/webp/bmp |
+| `path` | Send | Voice | mp3/wav/silk/ogg, no ffmpeg required |
+| `path` | Send | File | Any format, up to 20MB |
+| `path` | Send | Video | Local path or URL |
+| Voice message | Receive | STT | Auto-transcribe with configured STT model |
+| File attachment | Receive | File | Auto-download and feed content to AI |
+| Image attachment | Receive | Vision | Requires vision-capable model |
+
+### Try It Yourself
+
+| Direction | You say | AI does |
+|-----------|---------|---------|
+| Receive voice | Send a voice message asking about weather | STT auto-transcribes, AI replies with text |
+| Receive file | Send a file to the bot | AI reads file content, gives intelligent reply |
+| Send image | "Draw me a cat" | Calls drawing tool, sends image back |
+| Send voice | "Tell me a joke in voice" | TTS generates voice, sends voice message |
+| Send file | "Generate a file for me" | Creates file, sends via `` |
+| Send video | "Send me a video" | Sends video via `` |
+
+**Under the hood:** Tag variant auto-correction (30+ variants like ``, ``, `<qqimg>` are all recognized), upload caching (dedup within short windows), ordered queue delivery, and multi-layer audio format fallback.
 ---
@@ -165,7 +218,7 @@ Scan to join the QQ group chat
 **Option A: One-Click Install & Run (Recommended)**
 ```bash
-git clone https://github.com/tencent-connect/openclaw-qq.git && cd openclaw-qq
+git clone https://github.com/tencent-connect/openclaw-qqbot.git && cd openclaw-qqbot
 bash ./scripts/upgrade-and-run.sh --appid YOUR_APPID --secret YOUR_SECRET
 ```
@@ -174,7 +227,7 @@ The script handles everything: cleanup old plugins → install deps → register
 **Option B: Manual Step-by-Step**
 ```bash
-git clone https://github.com/tencent-connect/openclaw-qq.git && cd openclaw-qq
+git clone https://github.com/tencent-connect/openclaw-qqbot.git && cd openclaw-qqbot
 npm install --omit=dev
 openclaw plugins install .
 ```
@@ -412,7 +465,7 @@ bash ./scripts/pull-latest.sh --repo # use a different repo
 ### From Source
 ```bash
-git clone https://github.com/tencent-connect/openclaw-qq.git && cd openclaw-qq
+git clone https://github.com/tencent-connect/openclaw-qqbot.git && cd openclaw-qqbot
 npm install --omit=dev
 bash ./scripts/upgrade.sh
 openclaw plugins install .
@@ -424,7 +477,7 @@ openclaw gateway restart
 ## 📚 Documentation
-- [Rich Media Guide](docs/qqbot-media-guide.md) — images, voice, video, files
+- [Rich Media Guide](docs/qqbot-media-guide.md) — detailed STT/TTS config examples and tag usage
 - [Command Reference](docs/commands.md) — OpenClaw CLI commands
 - [Changelog](docs/changelog/) — release notes ([latest: 1.5.4](docs/changelog/1.5.4.md))
diff --git a/README.zh.md b/README.zh.md
index 401db32..8a5d875 100644
--- a/README.zh.md
+++ b/README.zh.md
@@ -10,7 +10,7 @@
 [![License](https://img.shields.io/badge/license-MIT-green)](./LICENSE)
 [![QQ Bot](https://img.shields.io/badge/QQ_Bot-API_v2-red)](https://bot.q.qq.com/wiki/)
-[![Platform](https://img.shields.io/badge/platform-OpenClaw-orange)](https://github.com/tencent-connect/openclaw-qq)
+[![Platform](https://img.shields.io/badge/platform-OpenClaw-orange)](https://github.com/tencent-connect/openclaw-qqbot)
 [![Node.js](https://img.shields.io/badge/Node.js->=18-339933?logo=node.js&logoColor=white)](https://nodejs.org/)
 [![TypeScript](https://img.shields.io/badge/TypeScript-5.9-3178C6?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
@@ -47,6 +47,8 @@
 🎙️ 语音消息(STT) — 配置 STT 后,自动将语音转录为文字理解
+配置 STT 后,插件会自动将语音转录为文字再交给 AI 处理。整个过程对用户完全透明——发语音就像发文字一样自然,AI 听得懂你在说什么。
+
 > **你**:*(发送一段语音)*"明天深圳天气怎么样"
 >
 > **QQBot**:明天(3月7日 周六)深圳的天气预报 🌤️ ...
@@ -58,6 +60,8 @@
 📄 文件理解 — 发文件给 AI,自动识别内容并智能回复
+用户发文件给 AI,AI 同样能接住。不管是一本小说还是一份报告,AI 会自动识别文件内容并给出智能回复。
+
 > **你**:*(发送《战争与和平》TXT 文件)*
 >
 > **QQBot**:收到!你上传了列夫·托尔斯泰的《战争与和平》中文版文本。从内容来看,这是第一章的开头……你想让我做什么?
@@ -69,6 +73,8 @@
 🖼️ 图片理解 — 主模型支持视觉能力时,发图片 AI 也能看懂
+如果主模型支持视觉(如腾讯混元 `hunyuan-vision`),用户发图片 AI 也能看懂。这是多模态模型的通用能力,非插件专属功能。
+
 > **你**:*(发送一张图片)*
 >
 > **QQBot**:哈哈,好可爱!这是QQ企鹅穿上小龙虾套装吗?🦞🐧 ...
@@ -84,6 +90,12 @@
 >
 > **QQBot**:给你画好了!🐱
+AI 通过 `` 标签发送图片,支持本地文件路径和网络 URL。格式:jpg/png/gif/webp/bmp。
+
+```
+~/.openclaw/qqbot/images/cute-cat.png
+```
+
 发图片演示
@@ -95,6 +107,12 @@
 >
 > **QQBot**:*(发送一条语音消息)*
+AI 通过 `` 标签发送语音消息。格式:mp3/wav/silk/ogg,无需安装 ffmpeg。
+
+```
+~/.openclaw/qqbot/tts/joke.silk
+```
+
 发语音演示
@@ -106,6 +124,12 @@
 >
 > **QQBot**:*(发送 .txt 文件)*
+AI 通过 `` 标签发送文件。PDF、Excel、ZIP、TXT,什么格式都能发,最大 20MB。
+
+```
+~/.openclaw/qqbot/downloads/战争与和平_第一章.txt
+```
+
 发文件演示
@@ -117,11 +141,40 @@
 >
 > **QQBot**:*(发送视频)*
+AI 通过 `` 标签发送视频,支持本地文件和公网 URL。大文件(>5MB)自动提示"正在上传..."。
+
+```
+~/.openclaw/qqbot/downloads/demo.mp4
+```
+
 发视频演示
-> 富媒体能力(图片、语音、视频、文件)的完整说明请参阅 [富媒体指南](docs/qqbot-media-guide.md)。
+### 富媒体标签参考
+
+| 标签 | 方向 | 用途 | 说明 |
+|------|------|------|------|
+| `路径` | 发送 | 图片 | 本地路径或 URL,jpg/png/gif/webp/bmp |
+| `路径` | 发送 | 语音 | mp3/wav/silk/ogg,无需 ffmpeg |
+| `路径` | 发送 | 文件 | 任意格式,最大 20MB |
+| `路径` | 发送 | 视频 | 本地路径或 URL |
+| 语音消息 | 接收 | STT | 自动转录,需配置 STT 模型 |
+| 文件附件 | 接收 | 文件 | 自动下载并将内容交给 AI |
+| 图片附件 | 接收 | 视觉 | 需主模型支持视觉能力 |
+
+### 体验演示场景
+
+| 方向 | 你说 | AI 做 |
+|------|------|-------|
+| 接收语音 | 发送一段语音提问天气 | STT 自动转录,AI 理解后文字回复 |
+| 接收文件 | 发送一个文件给机器人 | AI 识别文件内容,智能分析回复 |
+| 发送图片 | "帮我画一只猫咪" | 调用绘图工具,生成图片发回 |
+| 发送语音 | "用语音讲个笑话" | TTS 生成语音,直接发送语音消息 |
+| 发送文件 | "帮我生成一个文件" | 生成文件并通过 `` 发送 |
+| 发送视频 | "发个视频给我" | 通过 `` 直接发送视频 |
+
+**底层细节:** 标签容错(自动纠正 ``、``、`<qqimg>` 等 30 多种变体写法)、上传缓存(短时间内相同文件自动复用)、有序队列发送(混合消息按顺序投递,单项失败不影响其他)、音频格式多层降级。
 ---
@@ -161,7 +214,7 @@
 **方式一:一键安装并启动(推荐)**
 ```bash
-git clone https://github.com/tencent-connect/openclaw-qq.git && cd openclaw-qq
+git clone https://github.com/tencent-connect/openclaw-qqbot.git && cd openclaw-qqbot
 bash ./scripts/upgrade-and-run.sh --appid YOUR_APPID --secret YOUR_SECRET
 ```
@@ -170,7 +223,7 @@ bash ./scripts/upgrade-and-run.sh --appid YOUR_APPID --secret YOUR_SECRET
 **方式二:手动分步安装**
 ```bash
-git clone https://github.com/tencent-connect/openclaw-qq.git && cd openclaw-qq
+git clone https://github.com/tencent-connect/openclaw-qqbot.git && cd openclaw-qqbot
 npm install --omit=dev
 openclaw plugins install .
 ```
@@ -408,7 +461,7 @@ bash ./scripts/pull-latest.sh --repo # 使用其他仓库地
 ### 从源码升级
 ```bash
-git clone https://github.com/tencent-connect/openclaw-qq.git && cd openclaw-qq
+git clone https://github.com/tencent-connect/openclaw-qqbot.git && cd openclaw-qqbot
 npm install --omit=dev
 bash ./scripts/upgrade.sh
 openclaw plugins install .
@@ -420,7 +473,7 @@ openclaw gateway restart
 ## 📚 文档
-- [富媒体指南](docs/qqbot-media-guide.md) — 图片、语音、视频、文件
+- [富媒体指南](docs/qqbot-media-guide.md) — 详细的 STT/TTS 配置示例和标签用法
 - [命令参考](docs/commands.md) — OpenClaw CLI 常用命令
 - [更新日志](docs/changelog/) — 各版本变更记录([最新: 1.5.4](docs/changelog/1.5.4.md))