Speech2Motion

English Documentation | 中文文档 (Chinese documentation)

Overview

Speech2Motion is a real-time streaming system that converts speech input into synchronized 3D character animations. The system provides intelligent motion matching based on speech content, keywords, and timing, enabling natural and expressive character animations for interactive applications.

Key Features

  • Real-time Streaming: Supports streaming speech-to-motion conversion with low latency
  • Multi-version APIs: Provides V1, V2, and V3 API versions with different capabilities
  • Intelligent Matching: Advanced keyword matching for both motion and speech text content
  • Memory Management: User session memory to avoid repetitive animations
  • Flexible Data Sources: Supports multiple data backends (SQLite, MySQL, MinIO, filesystem)
  • Motion Blending: Smooth transitions between different motion sequences
  • Avatar Support: Multi-avatar support with customizable rest poses
  • Extensible Architecture: Modular design with pluggable filters and readers
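
The motion-blending feature above can be illustrated as a linear cross-fade between two pose sequences. This is a minimal NumPy sketch under assumed conventions (frames laid out as `(frames, joints, 3)` arrays; the function name and layout are illustrative, not the project's actual API):

```python
import numpy as np

def crossfade(motion_a: np.ndarray, motion_b: np.ndarray, blend_frames: int) -> np.ndarray:
    """Blend the tail of motion_a into the head of motion_b.

    Both motions are (frames, joints, 3) arrays; the last `blend_frames`
    frames of motion_a are linearly interpolated toward the first
    `blend_frames` frames of motion_b.
    """
    # Blend weights ramp from 0 (all A) to 1 (all B) across the overlap.
    w = np.linspace(0.0, 1.0, blend_frames).reshape(-1, 1, 1)
    overlap = (1 - w) * motion_a[-blend_frames:] + w * motion_b[:blend_frames]
    # Concatenate A's body, the blended overlap, then B's remainder.
    return np.concatenate([motion_a[:-blend_frames], overlap, motion_b[blend_frames:]])
```

A real system would typically interpolate joint rotations with slerp rather than blending positions linearly, but the cross-fade structure is the same.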

System Architecture

The system consists of several key components:

  • Streaming APIs: Handle real-time speech input and motion generation
  • Motion Database: SQLite/MySQL database with motion metadata and binary files
  • Filter Pipeline: Multi-stage filtering system for motion selection
  • Timeline Management: Frame-based timeline for motion sequencing
  • Memory System: User session management to track seen motions
  • Text Processing: Jieba-based text segmentation for keyword extraction
  • Motion Merging: Interpolation and blending for smooth transitions
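
The keyword-matching step above can be sketched in plain Python. In the real pipeline the text is first segmented with Jieba (e.g. `jieba.lcut`); here a naive whitespace split stands in, and the motion IDs and index shape are made up for illustration:

```python
def match_motions(speech_text: str, motion_index: dict[str, list[str]]) -> list[str]:
    """Return motion IDs whose keywords appear in the speech text.

    motion_index maps a keyword to the motion IDs tagged with it.
    The production system would segment the text with jieba instead
    of this simple lowercase whitespace split.
    """
    matches: list[str] = []
    for token in speech_text.lower().split():
        for motion_id in motion_index.get(token, []):
            if motion_id not in matches:  # keep first-hit order, no duplicates
                matches.append(motion_id)
    return matches
```

The session-memory component described above would then filter this candidate list to avoid replaying motions the user has recently seen.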

Data Preparation

To use Speech2Motion, you need to download the offline motion database and set up the required directory structure.

Download Motion Database

  1. Download the motion database:

    • Google Drive Download: motion_data.zip
    • Baidu Cloud: motion_data.zip
    • Choose whichever mirror suits your network environment and download the compressed motion-database file
  2. Extract and organize the data:

    • Extract the downloaded file to your project root directory
    • Ensure the following directory structure is created:
├─configs
├─data
│  ├─motion_files
│  │  └─motion
│  ├─restpose_npz
│  └─motion_database.db
├─docs
├─speech2motion
└─tools

Directory Structure Explanation

  • data/motion_files/: stores the binary motion clips (under its motion/ subfolder).
  • data/restpose_npz/: stores rest-pose data in NPZ format.
  • data/motion_database.db: the SQLite database containing the motion metadata.
  • The data directory is mounted into the Docker container at /workspace/speech2motion/data.
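
Before launching the container, you can verify the layout with a small helper (a sketch; the paths simply mirror the structure shown above):

```python
from pathlib import Path

# Required paths, relative to the project root (see the directory tree above).
REQUIRED = [
    "data/motion_files/motion",   # binary motion clips
    "data/restpose_npz",          # rest poses in NPZ format
    "data/motion_database.db",    # SQLite motion metadata
]

def check_data_layout(root: str = ".") -> list[str]:
    """Return the required data paths that are missing under `root`."""
    return [p for p in REQUIRED if not (Path(root) / p).exists()]
```

An empty return value means the data directory is ready to mount.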

Quick Start

Using Docker

The easiest way to get started with Speech2Motion is using the pre-built Docker image:

Linux/macOS:

# Pull and run the pre-built image
docker run -it \
  -p 18084:18084 \
  -v $(pwd)/data:/workspace/speech2motion/data \
  dlp3d/speech2motion:latest

Windows:

# Pull and run the pre-built image
docker run -it -p 18084:18084 -v .\data:/workspace/speech2motion/data dlp3d/speech2motion:latest

Command Explanation:

  • -p 18084:18084: Maps the container's port 18084 to your host machine's port 18084
  • -v $(pwd)/data:/workspace/speech2motion/data (Linux/macOS): Mounts your local data directory to the container's data directory
  • -v .\data:/workspace/speech2motion/data (Windows): Mounts your local data directory to the container's data directory
  • dlp3d/speech2motion:latest: Uses the pre-built public image

Prerequisites:

  • Ensure you have a data directory in your project root
  • Make sure Docker is installed and running on your system
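
Once the container is running, you can confirm the mapped port is reachable with a generic TCP probe (this only checks that something is listening on 18084, not that the API itself is healthy):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# After `docker run`, port_open("localhost", 18084) should report True.
```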

Documentation

For detailed information about installation, API usage, configuration, and development, please visit our comprehensive documentation:

📖 Complete Documentation

The documentation includes:

  • Installation Guide: Step-by-step environment setup and dependency installation
  • API Documentation: Detailed API specifications and usage examples
  • Configuration: Local and production configuration options
  • Development Guide: Project structure, testing, and contribution guidelines

License

This project is licensed under the MIT License. See the LICENSE file for details.

The MIT License is a permissive open-source license that allows you to:

  • Use the software for any purpose
  • Modify and distribute the software
  • Include the software in proprietary applications
  • Sell the software

The only requirement is that you include the original copyright notice and license text in any copies or substantial portions of the software.

