Open source ESP32 voice + vision hardware for AI Agent platforms.
Dumb terminal + Agent brain. Compile once, configure via Web page.
What is XiaoXi and why does it exist?
ESP32 only handles audio I/O and WiFi. All intelligence lives on the Agent backend. Compile firmware once, change everything via Web config page.
Change Agent backend address in Web config — no recompilation, no reflashing. Hermes, xiaozhi, or any compatible backend.
Connect to Hermes for tool calling, smart home, calendar, search, MCP tools — everything an AI Agent can do.
Hardware BOM from ¥29 (under $4 USD). Open source firmware, open source hardware. No monthly fees.
Pocket-sized AI voice assistant. Custom PCB designed to fit inside a pen barrel. Also available in desk form.
MIT License. Firmware, hardware schematics, PCB designs, documentation — all open.
| Feature | XiaoZhi | XiaoXi |
|---|---|---|
| Backend | Hardcoded to official | Configurable, switch freely |
| Settings | Recompile + reflash | Web page, instant |
| Switch LLM | Modify firmware | Backend side, ESP32 doesn't know |
| Add Tools | Modify firmware | Backend side, ESP32 doesn't know |
| Setup | PC client required | Built-in Web page |
| HW Cost | ~¥50 | From ¥29 |
ESP32 = dumb terminal. Agent backend = brain.
graph LR
subgraph Device["ESP32 Device"]
MIC["Microphone\nI2S INMP441"]
SPK["Speaker\nI2S MAX98357"]
BTN["Button / Wake Word"]
CODEC["Audio Codec\nOpus Encode/Decode"]
WIFI["WiFi Manager"]
WEB["Web Config Page\nAP Hotspot 192.168.4.1"]
CAM["Camera OV2640\nVision versions"]
end
subgraph NET["Network"]
WIFI2["WiFi / Hotspot\nWebSocket"]
end
subgraph Backend["Agent Backend - Brain"]
ASR["ASR\nWhisper / SenseVoice"]
LLM["LLM\nDeepSeek / Qwen\nClaude / GPT / Local"]
TTS["TTS\nEdge TTS / GPT-SoVITS\nOpenAI TTS"]
CTX["Context Manager\nMulti-turn Memory"]
TOOLS["Tool Calling / MCP\nWeather Search SmartHome\nCalendar Custom Tools"]
VISION["Vision\nGPT-4o / Qwen-VL"]
ADMIN["Web Admin\nPersona API Key\nVoice History"]
end
MIC --> CODEC
CODEC --> WIFI
BTN --> CODEC
WIFI --> WIFI2
WIFI2 --> ASR
ASR --> LLM
LLM --> TTS
LLM --> CTX
LLM --> TOOLS
CAM --> VISION
TTS --> WIFI2
WIFI2 --> CODEC
CODEC --> SPK
ESP32 creates WiFi AP. Phone connects → browser 192.168.4.1 → change Agent address, WiFi, volume, device name. No USB needed.
Home: ESP32 → WiFi → LAN → Agent. Outside: ESP32 → Phone hotspot → Internet → Agent. Auto switch.
Upload new firmware via Web page. No USB cable, no compile tools. Just drag & drop the .bin file.
| Pen Basic | Pen Eye | Desk Standard | Desk Eye | |
|---|---|---|---|---|
| Chip | ESP32-C3 | ESP32-CAM | ESP32-S3 | S3 Mini |
| Trigger | Button | Button | Wake word + button | Wake word + button |
| Camera | ❌ | ✅ OV2640 | ❌ | ✅ OV2640 |
| Screen | ❌ | ❌ | ✅ OLED | ✅ OLED |
| BOM | ~¥29 | ~¥55 | ~¥55 | ~¥63 |
| Price | ¥99-149 | ¥199-299 | ¥199-249 | ¥249-349 |
Firmware, schematics, PCB files, 3D models
Coming soon — Pre-compiled firmware for each version.
Coming soon — KiCad / Altium source files + Gerber for JLCPCB.
Coming soon — 3D printable enclosure for each version.
Coming soon — Complete bill of materials with purchase links.
What you need to build XiaoXi
ESP-IDF v5.5.2 — Espressif official SDK
VS Code + ESP-IDF Plugin — Recommended IDE
Python 3.10+ — Build system
ESP32-C3 / S3 — Main controller
INMP441 — I2S digital microphone
MAX98357A — I2S audio amplifier
OV2640 — 2MP camera (vision versions)
Hermes Agent — Recommended, full-featured
xiaozhi-server — Docker self-hosted
xiaozhi.me — Official free service
Any compatible WebSocket Agent backend
Guides, references, and technical docs
Deep dive into xiaozhi-esp32 firmware architecture — audio pipeline, wake word engine, WebSocket protocol, OTA system.
Four versions — BOM cost, pricing, component list, market comparison, data flow.
Quick start guide
graph TD
A["1. Get Hardware"] --> B["2. Flash Firmware"]
B --> C["3. Power On"]
C --> D["4. Connect to AP Hotspot"]
D --> E["5. Configure WiFi + Agent Address"]
E --> F["6. Talk to XiaoXi!"]
style A fill:#1e293b,stroke:#22d3ee,color:#e2e8f0
style B fill:#1e293b,stroke:#22d3ee,color:#e2e8f0
style C fill:#1e293b,stroke:#22d3ee,color:#e2e8f0
style D fill:#1e293b,stroke:#22d3ee,color:#e2e8f0
style E fill:#1e293b,stroke:#22d3ee,color:#e2e8f0
style F fill:#065f46,stroke:#34d399,color:#e2e8f0
Order an ESP32-S3 or ESP32-C3 dev board + INMP441 mic + MAX98357A amp + speaker. Total under ¥50 from Taobao.
Download pre-built .bin from this site (coming soon), or compile from source with ESP-IDF v5.5.2. Flash via USB.
Power on → connect to XiaoXi AP hotspot → open 192.168.4.1 → set your WiFi and Agent backend address.
Install Hermes Agent on your PC, or use xiaozhi-server Docker, or connect to xiaozhi.me official server.
Press button (pen version) or say wake word (desk version). Ask anything. XiaoXi replies in ~3 seconds.
Change LLM, TTS voice, persona prompt, tools — all from the Agent backend. ESP32 firmware stays the same.
Join us, contribute, or just say hi
xiaozhi-esp32 — Original firmware (27k⭐)
xiaozhi-esp32-server — Backend server (9.7k⭐)
Hermes Agent — AI Agent platform
Looking for:
• Hardware / PCB designers
• ESP32 firmware engineers
• Frontend developers
• Documentation writers