The Qwen3-Coder-480B-A35B-Instruct-FP8 model is now live on Hugging Face

Our advanced large model is now far, far ahead of Claude Sonnet-4 and the other Chinese open-source models.jpg

Try it at https://chat.qwen.ai


Far, far ahead!!

CLIs seem to be all the rage lately

Where can I use the Qwen CLI again?

Let's go, time to test it

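It really is far ahead.
I'm not sure how Hyperbolic calculates its costs, so I've left those out.
I originally wanted to measure how Kimi K2 differs across providers, and then… :laughing:

These runs keep the original project settings; tuning the prompt should push the scores higher.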

qwen3-coder (hyperbolic)

Aider Polyglot: 60.4

- dirname: 2025-07-22-19-36-56--qwen-qwen3-coder-hyperbolic-02
  test_cases: 225
  model: openai/Qwen/Qwen3-Coder-480B-A35B-Instruct
  edit_format: diff
  commit_hash: f38200c-dirty
  pass_rate_1: 32.4
  pass_rate_2: 60.4
  pass_num_1: 73
  pass_num_2: 136
  percent_cases_well_formed: 95.1
  error_outputs: 14
  num_malformed_responses: 14
  num_with_malformed_responses: 11
  user_asks: 97
  lazy_comments: 0
  syntax_errors: 0
  indentation_errors: 0
  exhausted_context_windows: 0
  prompt_tokens: 3002019
  completion_tokens: 429041
  test_timeouts: 4
  total_tests: 225
  command: aider --model openai/Qwen/Qwen3-Coder-480B-A35B-Instruct
  date: 2025-07-22
  versions: 0.85.3.dev
  seconds_per_case: 39.4
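The percentages in this log can be recomputed from the raw counts. A minimal Python sketch, assuming `pass_rate_N = 100 * pass_num_N / test_cases` and `percent_cases_well_formed = 100 * (test_cases - num_with_malformed_responses) / test_cases`, rounded to one decimal place (these formulas are inferred from the numbers, not taken from aider's source):

```python
def rate(count: int, total: int) -> float:
    """Percentage of `total`, rounded to one decimal place like the report."""
    return round(100 * count / total, 1)

# Values copied from the qwen3-coder (hyperbolic) run above
run = {
    "test_cases": 225,
    "pass_num_1": 73,
    "pass_num_2": 136,
    "num_with_malformed_responses": 11,
}

pass_rate_1 = rate(run["pass_num_1"], run["test_cases"])
pass_rate_2 = rate(run["pass_num_2"], run["test_cases"])
# Assumption: a case counts as well formed if none of its responses was malformed
well_formed = rate(run["test_cases"] - run["num_with_malformed_responses"], run["test_cases"])

print(pass_rate_1, pass_rate_2, well_formed)
```

The same formulas reproduce the kimi-k2 numbers below (e.g. 125/225 gives 55.6).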

kimi-k2 (Together)

Aider Polyglot: 55.6

- dirname: 2025-07-22-18-22-34--kimi-k2-together-02
  test_cases: 225
  model: openrouter/moonshotai/kimi-k2
  edit_format: diff
  commit_hash: f38200c-dirty
  pass_rate_1: 20.9
  pass_rate_2: 55.6
  pass_num_1: 47
  pass_num_2: 125
  percent_cases_well_formed: 93.3
  error_outputs: 17
  num_malformed_responses: 15
  num_with_malformed_responses: 15
  user_asks: 72
  lazy_comments: 0
  syntax_errors: 0
  indentation_errors: 0
  exhausted_context_windows: 0
  prompt_tokens: 2465203
  completion_tokens: 367131
  test_timeouts: 5
  total_tests: 225
  command: aider --model openrouter/moonshotai/kimi-k2
  date: 2025-07-22
  versions: 0.85.3.dev
  seconds_per_case: 41.9
  total_cost: 3.5666

costs: $0.0159/test-case, $3.57 total, $3.57 projected
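The cost line is plain division of `total_cost` by the number of cases run. A small Python sketch, assuming "projected" means the per-case cost extrapolated to `total_tests` (which equals the total here because all 225 cases completed):

```python
def cost_summary(total_cost: float, completed: int, total_tests: int) -> str:
    """Format a cost line in the same shape as the benchmark output."""
    per_case = total_cost / completed
    # Assumption: "projected" extrapolates the per-case cost to the full suite
    projected = per_case * total_tests
    return f"${per_case:.4f}/test-case, ${total_cost:.2f} total, ${projected:.2f} projected"

print(cost_summary(3.5666, 225, 225))  # kimi-k2 (Together)
print(cost_summary(2.0141, 225, 225))  # kimi-k2 (deepinfra)
```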

kimi-k2 (deepinfra)

Aider Polyglot: 51.1

- dirname: 2025-07-22-17-55-27--kimi-k2-deepinfra-02
  test_cases: 225
  model: openrouter/moonshotai/kimi-k2
  edit_format: diff
  commit_hash: f38200c-dirty
  pass_rate_1: 20.4
  pass_rate_2: 51.1
  pass_num_1: 46
  pass_num_2: 115
  percent_cases_well_formed: 96.0
  error_outputs: 9
  num_malformed_responses: 9
  num_with_malformed_responses: 9
  user_asks: 50
  lazy_comments: 0
  syntax_errors: 0
  indentation_errors: 0
  exhausted_context_windows: 0
  prompt_tokens: 2197170
  completion_tokens: 366197
  test_timeouts: 6
  total_tests: 225
  command: aider --model openrouter/moonshotai/kimi-k2
  date: 2025-07-22
  versions: 0.85.3.dev
  seconds_per_case: 24.0
  total_cost: 2.0141

costs: $0.0090/test-case, $2.01 total, $2.01 projected
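To put the three runs side by side, here is a small Python sketch that uses only numbers from the logs above. "Tokens per case" is a derived figure, (prompt_tokens + completion_tokens) / test_cases, and is not something aider reports itself:

```python
# (label, pass_rate_2, seconds_per_case, prompt_tokens, completion_tokens) from the three logs
runs = [
    ("qwen3-coder (hyperbolic)", 60.4, 39.4, 3002019, 429041),
    ("kimi-k2 (Together)", 55.6, 41.9, 2465203, 367131),
    ("kimi-k2 (deepinfra)", 51.1, 24.0, 2197170, 366197),
]
TEST_CASES = 225

# Rank by second-attempt pass rate, highest first
summary = []
for label, pass2, secs, prompt, completion in sorted(runs, key=lambda r: r[1], reverse=True):
    tokens_per_case = round((prompt + completion) / TEST_CASES)
    summary.append((label, pass2, secs, tokens_per_case))

for label, pass2, secs, tpc in summary:
    print(f"{label:<26} pass_2={pass2:5.1f}%  {secs:5.1f}s/case  {tpc:,} tokens/case")
```

Note that the deepinfra run is the fastest per case while scoring lowest, so a single pass rate does not capture the whole trade-off.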

Is this benchmark reliable?

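The scores should be fine. Published benchmark numbers can basically always be reproduced; nobody wants to get caught contradicting their own claims.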

Next to Qwen's benchmark numbers, Kimi-chan looks downright modest.

Waiting for the experts to test it.

Hmm, I'll find a provider and test it.. supporting homegrown models :smiling_face_with_three_hearts:

These are all way below DeepSeek:
DeepSeek R1 (0528) 71.4%
o3-pro (high) 84.9%
gemini-2.5-pro-preview-06-05 (32k think) 83.1%

Don't compare them with thinking models, the time spent per task is completely different.

This model is pretty slow too, try it yourself.

Some R1 providers output tokens really fast; they're just a bit pricier.

"Try it at https://chat.qwen.ai": what does that mean, are you referring to 2api?

It felt decent when I tried it in Qwen Chat.