Q: Will you also show the actual product and publicly release the MPU's specifications, to improve transparency and reduce skepticism? Right now your website only shows the MPU's speed.
A: Yes.
Q: And when you answer that question, please also address the following:
Q1: Why does this company look so secretive? There is almost zero public information about the team's background or funding. It feels like a company that appeared out of nowhere, which is unusual and raises concerns about transparency and accountability.
Q2: Could your "model" just be an aggregator of several domestic open-source models? How did you get from 700B to 1T so quickly?
Q3: Is training a 1T+ parameter model really that easy?
Q4: Is your model a distilled model?
A1: I understand that the scarcity of team information feels unusual. Many AI companies do keep a low profile about their technical teams for competitive reasons, especially in a field moving this fast. What I can share is that Movement Labs focuses on hardware-software co-design to achieve the best possible AI inference performance. Our approach leans on engineering strength rather than celebrity founders.
A3: It is genuinely not easy!
Training models at this scale requires enormous compute, sophisticated parallelization techniques, and carefully designed architectures. The "easy" part lies mainly in our inference optimization: once a model is trained, our hardware makes it run extremely fast.
A4: Our training pipeline uses a range of techniques, including distillation and model stitching (a sketch of what distillation means follows below). We have drawn on the experience and insights of the open-source community, but we have also developed proprietary training methods and architectures of our own. Our models are not simple distilled copies; they are carefully engineered systems that combine multiple approaches for optimal performance.
The bottom line: what ultimately matters is speed, quality, and delivering on promises. We focus on offering unmatched performance at competitive prices and making advanced AI more accessible. While others debate methodology, we focus on shipping results that speak for themselves.
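For readers unfamiliar with the distillation that A4 mentions, here is a minimal, hypothetical sketch of the standard technique (soft-target KL divergence blended with hard-label cross-entropy, per Hinton et al.). It illustrates the general method only; it is not Movement Labs' pipeline, and every name in it is a placeholder:

```python
# Minimal knowledge-distillation loss sketch (NOT Movement Labs' code).
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-target KL (teacher guidance) with hard-label cross-entropy.

    student_logits, teacher_logits: [batch, vocab]; labels: [batch].
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),   # student log-probs
        F.softmax(teacher_logits / T, dim=-1),       # softened teacher targets
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

At a temperature T > 1 the teacher's distribution is softened, so the student learns the teacher's relative preferences across tokens rather than just its top choice.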
Follow-up: Does that mean the model was not trained from scratch, but was stitched/assembled from several open-source models?
A: We take a hybrid approach. Yes, we build on proven open-source foundations for efficiency, but we are not simply piecing existing models together. We have invested heavily in our own training pipeline and custom fine-tuning to produce a model that is uniquely ours.
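"Model stitching," as the term is used in the representation-learning literature, usually means joining parts of separately trained networks through a small trained adapter layer. A minimal, hypothetical sketch under that reading (the modules here are placeholders, not Movement Labs' architecture, and the company has not confirmed this is what they do):

```python
# Toy model-stitching sketch (NOT Movement Labs' architecture).
import torch.nn as nn

class StitchedModel(nn.Module):
    """Bottom layers from one pretrained model, top layers from another,
    joined by a trained linear 'stitch' mapping between hidden sizes."""

    def __init__(self, bottom: nn.Module, top: nn.Module,
                 d_bottom: int, d_top: int):
        super().__init__()
        self.bottom = bottom                       # e.g. first N blocks, frozen
        self.stitch = nn.Linear(d_bottom, d_top)   # often the only part trained at first
        self.top = top                             # remaining blocks, frozen

    def forward(self, x):
        return self.top(self.stitch(self.bottom(x)))
```

Under this reading, "not simply piecing models together" would mean the stitch layers and subsequent fine-tuning are trained, not just concatenated.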
Q: I understand the need to protect the privacy of key technical staff, but can you disclose who the company's leadership team, founders, or key advisors are? This is a question of corporate governance and accountability, not just a technical detail.
A: This is all of the information that will be published on our website at our full launch in Q1 2026:
| Rank (HumanEval) | Model | Organization | Coding Average | vs. Momentum |
|---|---|---|---|---|
| 1 | Movement Labs (Momentum) | Movement Labs | 93.29% | Baseline |
| 2 | Claude 4 Sonnet | Anthropic | 80.74% | -12.55% |
| 3 | Claude Sonnet 4.5 Thinking | Anthropic | 80.36% | -12.93% |
| 4 | GPT-5 Chat | OpenAI | 78.57% | -14.72% |
| 5 | Claude 4 Sonnet Thinking | Anthropic | 77.48% | -15.81% |
| 5 | GPT-5.1 No Thinking | OpenAI | 77.48% | -15.81% |
| 6 | GPT-5 High | OpenAI | 77.10% | -16.19% |
| 7 | Claude 3.7 Sonnet | Anthropic | 76.07% | -17.22% |
| 7 | Claude 4.1 Opus | Anthropic | 76.07% | -17.22% |
| 7 | Claude Sonnet 4.5 | Anthropic | 76.07% | -17.22% |
| 7 | GPT-5 Mini | OpenAI | 76.07% | -17.22% |
| 8 | Gemini 2.5 Pro (Max Thinking) | Google | 75.69% | -17.60% |
| 9 | GPT-5 Medium | OpenAI | 75.05% | -18.24% |
| 10 | Claude 3.7 Sonnet Thinking | Anthropic | 74.98% | -18.31% |
| 10 | Qwen 3 Coder 480B A35B Instruct | Alibaba | 74.98% | -18.31% |
| 11 | Claude 4.1 Opus Thinking | Anthropic | 74.66% | -18.63% |
| 12 | GPT-5 Low | OpenAI | 74.28% | -19.01% |
| 12 | Kimi K2 Instruct | Moonshot AI | 74.28% | -19.01% |
| 13 | DeepSeek R1 | DeepSeek | 73.19% | -20.10% |
| 13 | DeepSeek V3.2 Exp | DeepSeek | 73.19% | -20.10% |
| 14 | Grok 4 | xAI | 73.13% | -20.16% |
| 15 | Claude Haiku 4.5 Thinking | Anthropic | 72.81% | -20.48% |
| 16 | GPT-5 Minimal | OpenAI | 72.55% | -20.74% |
| 17 | GPT-5.1 High | OpenAI | 72.49% | -20.80% |
| 18 | Claude Haiku 4.5 | Anthropic | 72.17% | -21.12% |
| 19 | GPT-5 Pro | OpenAI | 72.11% | -21.18% |
| 20 | GPT-5.1 Codex | OpenAI | 71.78% | -21.51% |
| 20 | Qwen 3 Max | Alibaba | 71.78% | -21.51% |
| 21 | DeepSeek V3.1 Terminus Thinking | DeepSeek | 71.40% | -21.89% |
| 22 | GLM 4.6 | Z.AI | 71.02% | -22.27% |
| 23 | GPT-5 Mini Minimal | OpenAI | 70.70% | -22.59% |
| 24 | DeepSeek V3.2 Exp Thinking | DeepSeek | 70.06% | -23.23% |
| 25 | GPT-5.1 Codex Mini | OpenAI | 69.93% | -23.36% |
| 26 | DeepSeek V3.1 Terminus | DeepSeek | 69.61% | -23.68% |
| 26 | GPT-5 Codex | OpenAI | 69.61% | -23.68% |
| 26 | Qwen 3 235B A22B Instruct 2507 | Alibaba | 69.61% | -23.68% |
| 27 | GPT-5 Mini Low | OpenAI | 69.55% | -23.74% |
| 28 | Grok 4 Fast (2025-11-10) | xAI | 68.97% | -24.32% |
| 28 | Qwen 3 235B A22B Thinking 2507 | Alibaba | 68.97% | -24.32% |
| 29 | GPT-5 Mini High | OpenAI | 68.20% | -25.09% |
| 29 | Kimi K2 Thinking | Moonshot AI | 68.20% | -25.09% |
| 29 | Qwen 3 Next 80B A3B Instruct | Alibaba | 68.20% | -25.09% |
| 30 | Gemini 2.5 Flash (Max Thinking) (2025-09-25) | Google | 67.50% | -25.79% |
| 30 | Qwen 3 235B A22B Thinking | Alibaba | 67.50% | -25.79% |
| 31 | GPT-5 Nano | OpenAI | 67.38% | -25.91% |
| 32 | Gemini 2.5 Flash Lite (Max Thinking) (2025-06-17) | Google | 66.41% | -26.88% |
| 32 | Grok 4 Fast (2025-09-22) | xAI | 66.41% | -26.88% |
| 33 | Gemini 2.5 Flash (Max Thinking) (2025-06-05) | Google | 66.03% | -27.26% |
| 33 | Qwen 3 32B | Alibaba | 66.03% | -27.26% |
| 34 | Gemini 2.5 Flash Lite (Max Thinking) (2025-09-25) | Google | 65.39% | -27.90% |
| 35 | Grok Code Fast | xAI | 64.44% | -28.85% |
| 36 | Mistral Medium 3 | Mistral AI | 63.98% | -29.31% |
| 37 | GPT-5 Nano High | OpenAI | 62.39% | -30.90% |
| 38 | GLM 4.5 | Z.AI | 62.13% | -31.16% |
| 39 | Grok 4 Fast (Non-Reasoning) (2025-09-22) | xAI | 61.42% | -31.87% |
| 40 | Qwen 3 Next 80B A3B Thinking | Alibaba | 60.66% | -32.63% |
| 41 | GLM 4.5 Air | Z.AI | 60.27% | -33.02% |
| 42 | GPT OSS 120b | OpenAI | 60.21% | -33.08% |
| 43 | Grok 4 Fast (Non-Reasoning) (2025-11-10) | xAI | 58.54% | -34.75% |
| 44 | Minimax M2 | Minimax | 57.78% | -35.51% |
| 45 | Command A | Cohere | 55.34% | -37.95% |
| 46 | GPT-5 Nano Low | OpenAI | 52.73% | -40.56% |
| 47 | Qwen 3 30B A3B | Alibaba | 48.88% | -44.41% |
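The "vs. Momentum" column above is plain arithmetic: each model's Coding Average minus Momentum's 93.29% baseline, in percentage points. A minimal sketch, assuming the scores are exactly as printed in the table:

```python
# How the "vs. Momentum" deltas above are computed (percentage points).
BASELINE = 93.29  # Movement Labs (Momentum), Coding Average

def vs_momentum(score: float) -> str:
    delta = score - BASELINE
    return "Baseline" if delta == 0 else f"{delta:+.2f}%"

print(vs_momentum(80.74))  # Claude 4 Sonnet -> "-12.55%"
```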
GSM8K Benchmark (Math):
Movement Labs (Momentum): 69.8% (GSM8K accuracy)
Top Models (LiveBench Mathematics Average):
| Model | Mathematics Average |
|---|---|
| GPT-5.1 High | 94.46% |
| GPT-5 Pro | 93.77% |
| Claude Sonnet 4.5 Thinking | 92.96% |
| GPT-5 High | 92.77% |
| GPT-5 Codex | 92.74% |
| Claude 4.1 Opus Thinking | 91.16% |
| GPT-5 Mini High | 90.69% |
| GLM 4.6 | 90.10% |
| GPT-5 Medium | 89.95% |
| DeepSeek V3.1 Terminus Thinking | 89.28% |
| DeepSeek V3.2 Exp Thinking | 89.14% |
| Gemini 2.5 Flash (Max Thinking) | 88.86% |
| Grok 4 | 88.84% |
| Kimi K2 Thinking | 88.46% |
| GPT-5.1 Codex | 87.87% |
| Claude Haiku 4.5 Thinking | 87.37% |
| Grok 4 Fast | 87.34% |
| GPT-5.1 Codex Mini | 86.96% |
| GPT-5 Mini | 85.98% |
| Minimax M2 | 85.95% |
| GPT-5 Low | 85.33% |
| DeepSeek R1 | 85.26% |
| Claude 4 Sonnet Thinking | 85.25% |
| Gemini 2.5 Pro (Max Thinking) | 84.19% |
| Qwen 3 Max | 83.17% |
| Claude 4.1 Opus | 82.47% |
| Qwen 3 Next 80B A3B Thinking | 82.37% |
| Claude Sonnet 4.5 | 82.18% |
| GLM 4.5 | 82.08% |
| DeepSeek V3.2 Exp | 80.79% |
| DeepSeek V3.1 Terminus | 80.69% |
| Qwen 3 Next 80B A3B Instruct | 80.67% |
| Qwen 3 32B | 80.05% |
| GLM 4.5 Air | 79.37% |
| Claude 3.7 Sonnet Thinking | 79.00% |
| Gemini 2.5 Flash Lite (Max Thinking) | 77.32% |
| Qwen 3 30B A3B | 76.65% |
| Claude 4 Sonnet | 76.39% |
| GPT-5 Mini Low | 75.57% |
| Claude Haiku 4.5 | 74.44% |
| Kimi K2 Instruct | 74.41% |
| GPT-5 Chat | 73.46% |
| GPT-5 Nano High | 72.95% |
| GPT-5 Nano | 71.68% |
| GPT OSS 120b | 69.89% |
| Grok Code Fast | 69.86% |
| Qwen 3 Coder 480B A35B Instruct | 67.28% |
| Claude 3.7 Sonnet | 64.65% |
| GPT-5.1 No Thinking | 60.37% |
| Mistral Medium 3 | 59.74% |
| GPT-5 Minimal | 58.98% |
| GPT-5 Nano Low | 56.66% |
| GPT-5 Mini Minimal | 51.72% |
| Grok 4 Fast (Non-Reasoning) | 47.67% |
| Command A | 45.54% |
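Note that Momentum's 69.8% is a GSM8K accuracy while the figures above are LiveBench Mathematics Averages, so the numbers are not directly comparable. For context on what "GSM8K accuracy" conventionally means, here is a minimal, hypothetical scoring sketch (exact match on the final numeric answer); it is not the evaluation harness Movement Labs actually ran, which is not disclosed here:

```python
# Conventional GSM8K-style scoring sketch (NOT Movement Labs' harness).
import re

def final_number(text: str) -> str | None:
    """Pull the last number out of a completion; GSM8K answers are numeric."""
    nums = re.findall(r"-?\d[\d,]*\.?\d*", text.replace("$", ""))
    return nums[-1].replace(",", "") if nums else None

def gsm8k_accuracy(predictions: list[str], references: list[str]) -> float:
    """Percent of completions whose final number matches the reference's."""
    hits = sum(final_number(p) == final_number(r)
               for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)
```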

