🖊️ XiaoXi

Open-source ESP32 voice + vision hardware for AI Agent platforms.
Dumb terminal + Agent brain. Compile once, configure via a Web page.

💡 Project Introduction

What is XiaoXi and why does it exist?

🎯

Dumb Terminal Design

The ESP32 only handles audio I/O (plus the camera on vision versions) and WiFi. All intelligence lives on the Agent backend. Compile the firmware once, then change everything via the Web config page.

🔄

Switch Backend in Seconds

Change the Agent backend address in the Web config page: no recompiling, no reflashing. Works with Hermes, xiaozhi, or any compatible backend.
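
Because the backend address is just a runtime setting, the config page should reject obviously bad values before saving them. Below is a minimal Python sketch of such a check, assuming the backend is reached over WebSocket (`ws://` or `wss://`). The function name `validate_backend_url` and the example addresses are hypothetical; they are not taken from the XiaoXi firmware, which is built with ESP-IDF rather than Python.

```python
from urllib.parse import urlparse

# Hypothetical rule: the device talks to the Agent backend over WebSocket,
# so only ws:// and wss:// addresses with a real host are accepted.
ALLOWED_SCHEMES = {"ws", "wss"}


def validate_backend_url(url: str) -> bool:
    """Return True if `url` looks like a usable WebSocket backend address."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ALLOWED_SCHEMES:
        return False
    if not parsed.hostname:  # e.g. "ws://:8000/ws" has no host
        return False
    try:
        port = parsed.port  # raises ValueError for non-numeric or out-of-range ports
    except ValueError:
        return False
    return port is None or port > 0
```

For example, `validate_backend_url("ws://192.168.1.20:8000/ws")` passes, while an `http://` address or a port such as 99999 is rejected before it ever reaches device storage.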

🛠️

Full Agent Capabilities

Connect to Hermes for tool calling, smart home control, calendar, search, and MCP tools: everything an AI Agent can do.

💰

Ultra Low Cost

Hardware BOM from ¥29 (about $4 USD). Open-source firmware, open-source hardware. No monthly fees.

📱

Pen-Sized Form Factor

Pocket-sized AI voice assistant. Custom PCB designed to fit inside a pen barrel. Also available in desk form.

🔓

Fully Open Source

MIT License. Firmware, hardware schematics, PCB designs, documentation — all open.

⚡ XiaoXi vs XiaoZhi (Original)

Feature | XiaoZhi | XiaoXi
Backend | Hardcoded to the official server | Configurable, switch freely
Settings | Recompile + reflash | Web page, instant
Switch LLM | Modify firmware | Backend side; ESP32 firmware unchanged
Add Tools | Modify firmware | Backend side; ESP32 firmware unchanged
Setup | PC client required | Built-in Web page
HW Cost | ~¥50 | From ¥29

🏗️ System Architecture

ESP32 = dumb terminal. Agent backend = brain.

graph LR
  subgraph Device["ESP32 Device"]
    MIC["Microphone\nI2S INMP441"]
    SPK["Speaker\nI2S MAX98357"]
    BTN["Button / Wake Word"]
    CODEC["Audio Codec\nOpus Encode/Decode"]
    WIFI["WiFi Manager"]
    WEB["Web Config Page\nAP Hotspot 192.168.4.1"]
    CAM["Camera OV2640\nVision versions"]
  end

  subgraph NET["Network"]
    WIFI2["WiFi / Hotspot\nWebSocket"]
  end

  subgraph Backend["Agent Backend - Brain"]
    ASR["ASR\nWhisper / SenseVoice"]
    LLM["LLM\nDeepSeek / Qwen\nClaude / GPT / Local"]
    TTS["TTS\nEdge TTS / GPT-SoVITS\nOpenAI TTS"]
    CTX["Context Manager\nMulti-turn Memory"]
    TOOLS["Tool Calling / MCP\nWeather Search SmartHome\nCalendar Custom Tools"]
    VISION["Vision\nGPT-4o / Qwen-VL"]
    ADMIN["Web Admin\nPersona API Key\nVoice History"]
  end

  MIC --> CODEC
  CODEC --> WIFI
  BTN --> CODEC
  WIFI --> WIFI2
  WIFI2 --> ASR
  ASR --> LLM
  LLM --> TTS
  LLM --> CTX
  LLM --> TOOLS
  CAM --> VISION
  TTS --> WIFI2
  WIFI2 --> CODEC
  CODEC --> SPK
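
The Opus codec block on the device turns microphone PCM into fixed-length frames before they go over WebSocket. The sketch below works out the per-frame sizes in Python, assuming 16 kHz mono 16-bit audio and 60 ms frames; these parameters are assumed values, not confirmed settings for XiaoXi, and the helper names are hypothetical.

```python
# Assumed audio parameters (not confirmed for XiaoXi).
SAMPLE_RATE_HZ = 16_000
CHANNELS = 1
BYTES_PER_SAMPLE = 2  # 16-bit PCM
FRAME_DURATION_MS = 60

# Opus frame durations from RFC 6716; 2.5 ms is left out to keep integer maths.
OPUS_FRAME_DURATIONS_MS = (5, 10, 20, 40, 60)


def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """PCM samples per channel that go into one Opus frame."""
    if frame_ms not in OPUS_FRAME_DURATIONS_MS:
        raise ValueError(f"unsupported Opus frame duration: {frame_ms} ms")
    return sample_rate_hz * frame_ms // 1000


def pcm_bytes_per_frame(sample_rate_hz: int, frame_ms: int,
                        channels: int = CHANNELS,
                        bytes_per_sample: int = BYTES_PER_SAMPLE) -> int:
    """Raw PCM bytes the encoder consumes for one frame."""
    return samples_per_frame(sample_rate_hz, frame_ms) * channels * bytes_per_sample


def frames_per_second(frame_ms: int) -> float:
    """How many encoded frames the device sends each second."""
    return 1000 / frame_ms
```

At these assumed settings each frame holds 960 samples (1,920 bytes of raw PCM), and the device sends roughly 16.7 encoded frames per second.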
      
📱

Web Config Page

The ESP32 creates a WiFi access point (AP). Connect your phone to it → open 192.168.4.1 in a browser → change the Agent address, WiFi credentials, volume, or device name. No USB needed.
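
When the phone submits the form, the device has to merge the posted fields into its stored settings. A minimal Python sketch, assuming a standard `application/x-www-form-urlencoded` POST; the field names (`wifi_ssid`, `agent_url`, `volume`, and so on), the defaults, and the 0-100 volume range are hypothetical.

```python
from urllib.parse import parse_qs

# Hypothetical settings and defaults; the real page may use other names.
DEFAULTS = {
    "wifi_ssid": "",
    "wifi_password": "",
    "agent_url": "",
    "volume": 70,
    "device_name": "XiaoXi",
}

TEXT_FIELDS = ("wifi_ssid", "wifi_password", "agent_url", "device_name")


def apply_config_form(body: str, current: dict) -> dict:
    """Merge a urlencoded POST body into a copy of the current settings."""
    # parse_qs decodes %XX escapes and '+' as space; blank fields are dropped,
    # so an empty input box leaves the stored value unchanged.
    fields = parse_qs(body)
    updated = dict(current)
    for key in TEXT_FIELDS:
        if key in fields:
            updated[key] = fields[key][0]
    if "volume" in fields:
        volume = int(fields["volume"][0])  # ValueError on non-numeric input
        updated["volume"] = max(0, min(100, volume))  # clamp to 0-100
    return updated
```

Returning a new dict instead of mutating the stored one makes it easy to validate the result (for example, the backend URL) before committing it to flash.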

🔄

Home & Away

Home: ESP32 → WiFi → LAN → Agent. Away: ESP32 → phone hotspot → Internet → Agent. The device switches between the two automatically.
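
The page does not say how the automatic switch chooses a network. One plausible rule, sketched below in Python, is to try saved networks in priority order and skip any that are missing from the scan or too weak. The `SavedNetwork` type, the priority convention, and the -85 dBm cutoff are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SavedNetwork:
    ssid: str
    password: str
    priority: int  # hypothetical convention: lower number = preferred


def pick_network(saved: list[SavedNetwork], scan: dict[str, int],
                 min_rssi_dbm: int = -85) -> Optional[SavedNetwork]:
    """Choose a saved network from scan results (SSID -> RSSI in dBm)."""
    visible = [n for n in saved
               if n.ssid in scan and scan[n.ssid] >= min_rssi_dbm]
    if not visible:
        return None  # no saved network is usable right now
    # Lowest priority number wins; a stronger signal breaks ties.
    return min(visible, key=lambda n: (n.priority, -scan[n.ssid]))
```

With home WiFi at priority 0 and the phone hotspot at priority 1, the device prefers home WiFi whenever it is in range and falls back to the hotspot otherwise.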

🔄

OTA Updates

Upload new firmware through the Web page: no USB cable, no build tools. Just drag and drop the .bin file.
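
Before writing an upload to flash, the device can cheaply reject files that are obviously not firmware. This Python sketch checks the first byte against 0xE9, the magic byte that starts an ESP-IDF app image header, and compares the size with the OTA partition. The function names and the 4 KiB chunk size are hypothetical; on the device itself, ESP-IDF's `esp_ota_begin` / `esp_ota_write` / `esp_ota_end` calls perform the real write and image validation.

```python
ESP_IMAGE_MAGIC = 0xE9  # first byte of an ESP-IDF app image header
CHUNK_SIZE = 4096       # hypothetical write size


def check_firmware_upload(data: bytes, partition_size: int) -> None:
    """Raise ValueError if `data` is clearly not a flashable app image."""
    if not data:
        raise ValueError("empty upload")
    if data[0] != ESP_IMAGE_MAGIC:
        raise ValueError("not an ESP app image (bad magic byte)")
    if len(data) > partition_size:
        raise ValueError("image is larger than the OTA partition")


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Yield the image in fixed-size pieces, as an OTA writer would consume it."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]
```

These checks only catch wrong or oversized files early; they do not replace the full validation ESP-IDF performs when the OTA write is finished.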

📦 Four Product Versions

Feature | Pen Basic | Pen Eye | Desk Standard | Desk Eye
Chip | ESP32-C3 | ESP32-CAM | ESP32-S3 | S3 Mini
Trigger | Button | Button | Wake word + button | Wake word + button
Camera | No | ✅ OV2640 | No | ✅ OV2640
Screen | No | No | ✅ OLED | ✅ OLED
BOM | ~¥29 | ~¥55 | ~¥55 | ~¥63
Price | ¥99-149 | ¥199-299 | ¥199-249 | ¥249-349

⬇️ Downloads

Firmware, schematics, PCB files, 3D models

📦

Firmware (.bin)

Coming soon — Pre-compiled firmware for each version.

🔧

PCB Schematics

Coming soon — KiCad / Altium source files + Gerber for JLCPCB.

🖨️

3D Models (STL)

Coming soon — 3D printable enclosure for each version.

📋

BOM List

Coming soon — Complete bill of materials with purchase links.

🔧 Parts & Tools

What you need to build XiaoXi

💻

Firmware Development

ESP-IDF v5.5.2 — Espressif official SDK
VS Code + ESP-IDF Plugin — Recommended IDE
Python 3.10+ — Build system

🔌

Key Components

ESP32-C3 / S3 — Main controller
INMP441 — I2S digital microphone
MAX98357A — I2S audio amplifier
OV2640 — 2MP camera (vision versions)
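
The INMP441 delivers 24-bit two's-complement samples left-justified in a 32-bit I2S slot, so firmware usually narrows each slot to 16-bit PCM before encoding. Here is a host-side Python sketch of that conversion, assuming little-endian 32-bit words as read from the ESP32 I2S driver. The function name and the default shift of 16 (which keeps the top 16 of the 24 data bits) are illustrative choices; a smaller shift adds gain at the cost of clipping.

```python
import struct


def i2s32_to_pcm16(raw: bytes, shift: int = 16) -> list[int]:
    """Convert little-endian 32-bit I2S slots to clamped 16-bit PCM samples."""
    if len(raw) % 4:
        raise ValueError("buffer length must be a multiple of 4 bytes")
    count = len(raw) // 4
    slots = struct.unpack(f"<{count}i", raw)  # signed 32-bit words
    out = []
    for slot in slots:
        value = slot >> shift  # arithmetic shift keeps the sign
        out.append(max(-32768, min(32767, value)))  # clamp to int16 range
    return out
```

With the default shift, a full-scale slot of 0x7FFFFF00 becomes 32767 and a small negative slot such as -256 becomes -1.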

📡

Agent Backend

Hermes Agent — Recommended, full-featured
xiaozhi-server — Docker self-hosted
xiaozhi.me — Official free service
Any compatible WebSocket Agent backend
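
A compatible backend needs to agree with the device on how a session starts. The Python sketch below builds a JSON `hello` message modeled on the xiaozhi WebSocket protocol as we understand it, advertising Opus audio parameters, and checks the server's reply. Treat the exact field names and values as assumptions and verify them against the protocol documentation of the backend you use.

```python
import json


def build_hello(sample_rate: int = 16000, frame_ms: int = 60) -> str:
    """Build a session-opening message (field names are assumptions)."""
    message = {
        "type": "hello",
        "version": 1,
        "transport": "websocket",
        "audio_params": {
            "format": "opus",
            "sample_rate": sample_rate,
            "channels": 1,
            "frame_duration": frame_ms,
        },
    }
    return json.dumps(message, separators=(",", ":"))


def is_server_hello(text: str) -> bool:
    """Accept a JSON object reply of type 'hello' over the websocket transport."""
    try:
        reply = json.loads(text)
    except json.JSONDecodeError:
        return False
    return (isinstance(reply, dict)
            and reply.get("type") == "hello"
            and reply.get("transport") == "websocket")
```

Keeping the handshake in plain JSON is what lets different backends (Hermes, xiaozhi-server, or your own) talk to the same unchanged firmware, as long as they speak the same message format.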

📖 Documentation

Guides, references, and technical docs

📄

Firmware Code Analysis

Deep dive into xiaozhi-esp32 firmware architecture — audio pipeline, wake word engine, WebSocket protocol, OTA system.

English · Chinese (中文)

📦

Product Line Definition

Four versions — BOM cost, pricing, component list, market comparison, data flow.

English · Chinese (中文)

🏗️

Architecture Diagram

System architecture — ESP32 device layer, network layer, Agent backend layer.

English · Chinese (中文, HTML)

🚀 Getting Started

Quick start guide

graph TD
  A["1. Get Hardware"] --> B["2. Flash Firmware"]
  B --> C["3. Power On"]
  C --> D["4. Connect to AP Hotspot"]
  D --> E["5. Configure WiFi + Agent Address"]
  E --> F["6. Talk to XiaoXi!"]

  style A fill:#1e293b,stroke:#22d3ee,color:#e2e8f0
  style B fill:#1e293b,stroke:#22d3ee,color:#e2e8f0
  style C fill:#1e293b,stroke:#22d3ee,color:#e2e8f0
  style D fill:#1e293b,stroke:#22d3ee,color:#e2e8f0
  style E fill:#1e293b,stroke:#22d3ee,color:#e2e8f0
  style F fill:#065f46,stroke:#34d399,color:#e2e8f0
      

Get Hardware

Order an ESP32-S3 or ESP32-C3 dev board, an INMP441 mic, a MAX98357A amp, and a speaker. The total is under ¥50 on Taobao.

Flash Firmware

Download a pre-built .bin from this site (coming soon), or compile from source with ESP-IDF v5.5.2. Flash via USB.

Configure

Power on → connect to the XiaoXi AP hotspot → open 192.168.4.1 → set your WiFi and Agent backend address.

Set Up Backend

Install Hermes Agent on your PC, run xiaozhi-server in Docker, or connect to the official xiaozhi.me server.

Talk!

Press the button (pen versions) or say the wake word (desk versions, which also have a button). Ask anything; XiaoXi replies in about 3 seconds.
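
Between the trigger and the reply, the device moves through a few phases. A minimal Python state machine sketch of one exchange follows; the state and event names (including the interrupt and `error` handling) are hypothetical and not taken from the XiaoXi or xiaozhi firmware.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # waiting for button or wake word
    LISTENING = auto()  # streaming microphone audio to the backend
    THINKING = auto()   # backend runs ASR + LLM; device just waits
    SPEAKING = auto()   # playing TTS audio from the backend


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.IDLE, "button_down"): State.LISTENING,
    (State.IDLE, "wake_word"): State.LISTENING,
    (State.LISTENING, "button_up"): State.THINKING,    # push-to-talk release
    (State.LISTENING, "silence"): State.THINKING,      # end of speech detected
    (State.THINKING, "tts_start"): State.SPEAKING,
    (State.SPEAKING, "tts_stop"): State.IDLE,
    (State.SPEAKING, "button_down"): State.LISTENING,  # interrupt the reply
}


def step(state: State, event: str) -> State:
    """Return the next state; unrecognised events leave it unchanged."""
    if event == "error":
        return State.IDLE  # any network or backend error resets the device
    return TRANSITIONS.get((state, event), state)
```

Keeping every transition in one table makes behaviors easy to audit, for example that pressing the button while XiaoXi is speaking interrupts the reply and starts a new turn.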

Customize

Change LLM, TTS voice, persona prompt, tools — all from the Agent backend. ESP32 firmware stays the same.

📬 Links & Contact

Join us, contribute, or just say hi

🐙

GitHub Repository

R2129487/hermes-xiaoxi
Star ⭐ · Issues · Pull Requests

🔗

Related Projects

xiaozhi-esp32: Original firmware (27k⭐)
xiaozhi-esp32-server: Backend server (9.7k⭐)
Hermes Agent: AI Agent platform

🤝

Contributors Welcome

Looking for:
• Hardware / PCB designers
• ESP32 firmware engineers
• Frontend developers
• Documentation writers