Speech2Motion is a real-time streaming system that converts speech input into synchronized 3D character animations. The system provides intelligent motion matching based on speech content, keywords, and timing, enabling natural and expressive character animations for interactive applications.
- Real-time Streaming: Supports streaming speech-to-motion conversion with low latency
- Multi-version APIs: Provides V1, V2, and V3 API versions with different capabilities
- Intelligent Matching: Advanced keyword matching for both motion and speech text content
- Memory Management: User session memory to avoid repetitive animations
- Flexible Data Sources: Supports multiple data backends (SQLite, MySQL, MinIO, filesystem)
- Motion Blending: Smooth transitions between different motion sequences
- Avatar Support: Multi-avatar support with customizable rest poses
- Extensible Architecture: Modular design with pluggable filters and readers
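Motion blending can be pictured as a crossfade: over a short window, each frame at the end of the outgoing clip is interpolated toward the matching frame at the start of the incoming clip. The sketch below is a minimal illustration under stated assumptions, not Speech2Motion's actual merging code. It assumes each motion is a list of frames, each frame a flat list of float channels, and uses plain linear interpolation. The names `seconds_to_frames` and `crossfade` and the 30 fps default are hypothetical.

```python
from typing import List

Frame = List[float]  # hypothetical: one pose as a flat list of channel values


def seconds_to_frames(seconds: float, fps: int = 30) -> int:
    """Convert a duration to a whole number of frames (hypothetical 30 fps default)."""
    return max(0, round(seconds * fps))


def crossfade(a: List[Frame], b: List[Frame], blend_frames: int) -> List[Frame]:
    """Join clip `a` to clip `b`, linearly blending the last `blend_frames`
    frames of `a` with the first `blend_frames` frames of `b`."""
    n = min(blend_frames, len(a), len(b))
    if n == 0:
        return a + b  # nothing to blend: simple concatenation
    blended = []
    for i in range(n):
        # Weight of clip b ramps from 1/(n+1) to n/(n+1) across the window.
        w = (i + 1) / (n + 1)
        fa, fb = a[len(a) - n + i], b[i]
        blended.append([(1 - w) * x + w * y for x, y in zip(fa, fb)])
    # The overlap window replaces n frames from each clip.
    return a[: len(a) - n] + blended + b[n:]
```

For example, `crossfade(idle, wave, seconds_to_frames(0.2))` would overlap the two clips by six frames. For rotation channels stored as quaternions, spherical interpolation (slerp) would normally replace the linear mix used here.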
The system consists of several key components:
- Streaming APIs: Handle real-time speech input and motion generation
- Motion Database: SQLite/MySQL database with motion metadata and binary files
- Filter Pipeline: Multi-stage filtering system for motion selection
- Timeline Management: Frame-based timeline for motion sequencing
- Memory System: User session management to track seen motions
- Text Processing: Jieba-based text segmentation for keyword extraction
- Motion Merging: Interpolation and blending for smooth transitions
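To make the filter pipeline, keyword matching, and session memory more concrete, here is a minimal Python sketch of how such stages could compose. It is not the project's implementation: the `Motion` record, the filter factories, and the `SessionMemory` class are hypothetical, and the speech text is assumed to be already segmented into keywords (Speech2Motion uses Jieba for that step).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Motion:
    """Hypothetical motion record: an ID plus keyword tags from its metadata."""
    motion_id: str
    keywords: Set[str]


@dataclass
class SessionMemory:
    """Hypothetical per-user memory of motion IDs that were already played."""
    seen: Dict[str, Set[str]] = field(default_factory=dict)

    def mark_seen(self, user_id: str, motion_id: str) -> None:
        self.seen.setdefault(user_id, set()).add(motion_id)

    def has_seen(self, user_id: str, motion_id: str) -> bool:
        return motion_id in self.seen.get(user_id, set())


# A filter takes the candidate list and returns a (possibly smaller) list.
Filter = Callable[[List[Motion]], List[Motion]]


def keyword_filter(speech_keywords: Set[str]) -> Filter:
    """Keep motions sharing at least one keyword, highest overlap first."""
    def apply(candidates: List[Motion]) -> List[Motion]:
        scored = [(len(m.keywords & speech_keywords), m) for m in candidates]
        # sorted() is stable, so ties keep their database order.
        ranked = sorted(scored, key=lambda pair: -pair[0])
        return [m for score, m in ranked if score > 0]
    return apply


def unseen_filter(memory: SessionMemory, user_id: str) -> Filter:
    """Drop motions this user has already seen, unless that would drop all of them."""
    def apply(candidates: List[Motion]) -> List[Motion]:
        fresh = [m for m in candidates if not memory.has_seen(user_id, m.motion_id)]
        return fresh or candidates
    return apply


def run_pipeline(candidates: List[Motion], filters: List[Filter]) -> List[Motion]:
    """Apply each filter stage in order."""
    for stage in filters:
        candidates = stage(candidates)
    return candidates
```

A call such as `run_pipeline(db, [keyword_filter({"hello"}), unseen_filter(memory, "user-1")])` first ranks by keyword overlap and then removes repeats. Falling back to the full candidate list when every match has already been seen keeps the character moving instead of freezing; the real system may make a different trade-off.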
To use Speech2Motion, you need to download the offline motion database and set up the required directory structure.
- Download the motion database:
  - Google Drive: motion_data.zip
  - Baidu Cloud: motion_data.zip
  - Choose whichever source works best on your network and download the compressed motion database file.
- Extract and organize the data:
  - Extract the downloaded file into your project root directory.
  - Ensure the following directory structure is created:
```
├─configs
├─data
│ ├─motion_files
│ │ └─motion
│ ├─restpose_npz
│ └─motion_database.db
├─docs
├─speech2motion
└─tools
```
- `data/motion_files/`: A folder for storing binary motion files.
- `data/restpose_npz/`: A folder for storing rest-pose data in NPZ format.
- `data/motion_database.db`: A SQLite file that contains the motion database.
- The `data` directory will be mounted into the Docker container at `/workspace/speech2motion/data`.
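After extracting, you can sanity-check the layout with a short Python script. This is a hypothetical helper, not part of Speech2Motion. It uses only the standard library (`zipfile`, `pathlib`, `sqlite3`), assumes `motion_data.zip` unpacks directly into the `data/...` paths shown in the tree above (adjust if the archive has a different top-level folder), and lists the tables in `motion_database.db` without assuming any particular schema.

```python
import sqlite3
import zipfile
from pathlib import Path
from typing import List

# Paths taken from the directory tree above, relative to the project root.
REQUIRED_DIRS = ["data/motion_files/motion", "data/restpose_npz"]
DATABASE = "data/motion_database.db"


def extract_archive(archive: Path, project_root: Path) -> None:
    """Unpack the downloaded archive into the project root (assumed layout)."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(project_root)


def check_layout(project_root: Path) -> List[str]:
    """Return a list of problems; an empty list means the layout looks right."""
    problems = []
    for rel in REQUIRED_DIRS:
        if not (project_root / rel).is_dir():
            problems.append(f"missing directory: {rel}")
    if not (project_root / DATABASE).is_file():
        problems.append(f"missing database file: {DATABASE}")
    return problems


def list_tables(project_root: Path) -> List[str]:
    """List table names in the SQLite motion database (schema not assumed)."""
    db_path = (project_root / DATABASE).resolve()
    # Open read-only via a URI so a wrong path cannot silently create an empty DB.
    conn = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Run from the project root, `check_layout(Path("."))` should return an empty list once the data is in place, and `list_tables(Path("."))` shows which tables the downloaded database actually contains.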
The easiest way to get started with Speech2Motion is to use the pre-built Docker image:
Linux/macOS:

```bash
# Pull and run the pre-built image
docker run -it \
  -p 18084:18084 \
  -v "$(pwd)/data:/workspace/speech2motion/data" \
  dlp3d/speech2motion:latest
```

Windows:

```powershell
# Pull and run the pre-built image
docker run -it -p 18084:18084 -v .\data:/workspace/speech2motion/data dlp3d/speech2motion:latest
```

Command explanation:
- `-it`: Runs the container interactively with a terminal attached.
- `-p 18084:18084`: Maps port 18084 in the container to port 18084 on your host machine.
- `-v "$(pwd)/data:/workspace/speech2motion/data"` (Linux/macOS): Mounts your local `data` directory to the container's data directory. The quotes keep the path intact if it contains spaces.
- `-v .\data:/workspace/speech2motion/data` (Windows): Mounts your local `data` directory to the container's data directory.
- `dlp3d/speech2motion:latest`: Uses the pre-built public image, which `docker run` pulls automatically if it is not already present locally.
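Once the container is running, a quick way to confirm the port mapping works is to check that something accepts TCP connections on host port 18084. The sketch below is a hypothetical helper built only on Python's `socket` module. It checks reachability only and does not call any Speech2Motion API endpoint, since those routes are not described here.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 18084)` returns `True` once the container has started listening, which is handy in scripts that launch the container and then immediately send requests.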
Prerequisites:
- Ensure you have a `data` directory in your project root.
- Make sure Docker is installed and running on your system.
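Both prerequisites can be checked before launching the container. The following is a hypothetical pre-flight helper, not something shipped with Speech2Motion: it looks for the `data` directory, finds the Docker CLI with `shutil.which`, and treats a non-zero exit from `docker info` as a sign that the Docker daemon is not reachable.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def check_prerequisites(project_root: Path, docker_cmd: str = "docker") -> List[str]:
    """Return human-readable problems; an empty list means ready to run."""
    problems = []
    if not (project_root / "data").is_dir():
        problems.append("no 'data' directory in the project root")
    docker = shutil.which(docker_cmd)
    if docker is None:
        problems.append(f"'{docker_cmd}' was not found on PATH")
        return problems
    try:
        # `docker info` fails when the CLI cannot reach the Docker daemon.
        result = subprocess.run([docker, "info"], capture_output=True, timeout=30)
    except subprocess.TimeoutExpired:
        problems.append("`docker info` timed out; the daemon may be unresponsive")
        return problems
    if result.returncode != 0:
        problems.append("Docker is installed but the daemon is not reachable")
    return problems
```

Calling `check_prerequisites(Path("."))` from the project root and printing any returned messages gives a quick yes/no before running the `docker run` command.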
For detailed information about installation, API usage, configuration, and development, see the project documentation.
The documentation includes:
- Installation Guide: Step-by-step environment setup and dependency installation
- API Documentation: Detailed API specifications and usage examples
- Configuration: Local and production configuration options
- Development Guide: Project structure, testing, and contribution guidelines
This project is licensed under the MIT License. See the LICENSE file for details.
The MIT License is a permissive open-source license that allows you to:
- Use the software for any purpose
- Modify and distribute the software
- Include the software in proprietary applications
- Sell the software
The only requirement is that you include the original copyright notice and license text in any copies or substantial portions of the software.