Hey 👋🏽, I'm cpuimage
AI engineer working on AIGC, inference optimization, and audio/video/image algorithms.
I build real-world AI systems, accelerate models, and share open-source work here on GitHub.
If my projects help you, feel free to buy me a coffee. ☕️
- AIGC engineering (Stable Diffusion, FLUX, SDXL, high‑res synthesis)
- Inference optimization (TensorRT, FP16, Flash Attention, async pipelines)
- Audio/video/image algorithms (TTS, matting, OpenGL effects)
- Training stability & numerical optimization
- Multi‑time CTO experience in AI companies
- 👨🏽‍💻 Worked at leading tech companies, including Baidu, KingSoft, and others.
- 🧩 Multi‑time CTO for AI companies (AIGC, image generation, inference optimization).
- 📱 Developed algorithms for multiple applications:
- 💡 Delivered AI‑based technical customization services and shipped several production‑level AI projects.
I work across Stable Diffusion, inference acceleration, training stability, and audio/video algorithms.
- A Trimap‑Free Solution for Real‑Time Automatic Portrait Matting on Mobile Devices
- A Robust Optimizer With Accelerated Convergence Capability in Deep Learning
- A General and Adaptive Robust Loss Structure Scheme
- A Robust Loss Weighting Solution For Learning Long‑Tail Data
- Image Synthesis and Semantic Manipulation Using Stable Diffusion Networks
- Stable Diffusion Architecture Optimization And Deployment On Mobile Devices
- A Robust Solution For Accelerated Training Convergence And Learning Long‑Tail Data
- Arbitrary Resolution Super‑Resolution Solution for Real‑World Images
- Accelerate Stable Diffusion FP16 Inference Deployment with TensorRT
- Port Stable Diffusion X4 Upscaler to TensorFlow (FP16 supported)
- Port Stable Diffusion PromptGen (GPT‑2) to TensorFlow + ONNX Inference
- Stable Diffusion Architectural Distillation
- Content‑aware 3‑view Synthesis for Game Art
- Super‑Resolution Solution based on Stable Diffusion
- Video Editing Techniques based on Stable Diffusion
- Port Stable Diffusion XL 1.0 to TensorFlow (FP16 supported)
- A Plug‑And‑Play Algorithm for Asynchronous Inference with Frequency‑Domain Reconstruction
- Stable Diffusion Inference with PyTorch Weights and WebUI‑like Features in Keras 3.x
- FLUX.1 FP16 Inference Deployment + Low‑Memory LoRA Training
- LLM from Scratch with PyTorch
- Enhanced FaceFusion: Decoupled Modules & Optimized Inference
- Ultra High‑Resolution Portrait Retouching
- Training‑Free Universal High‑Resolution Synthesis for Any Video Model
- Chunked Flash Attention in Keras
- Robustness and Speed: An Adaptive, Efficient Optimizer for Stable Training
  - Learning‑Rate‑Free
  - Warmup‑Free
  - Normalization‑Free
  - Corrected Gradient Accumulation
  - Long‑Tailed Gradient Mitigation
  - Accelerated Convergence
  - Memory‑Efficient
- Loss Regularization for Better Generalization
- Dynamic Loss Weighting for Multi‑Task Learning
- Parameter‑Free Weight Regularization
- Adaptive Moving‑Average BatchNorm Stabilization
- Memory‑Efficient LLM Training
- Numerical Stability via Scalable Parallel Compensated Reductions
- MozzyTokenizer: Adaptive Byte‑Level Tokenizer
- Real‑time MMSE‑STSA speech enhancement (embedded implementation)
I’m open to collaboration on AIGC, inference optimization, and audio/image algorithms.
Reach me on:
For paid technical services or consulting:


