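Based on this post.
Models that do not appear on the Aider leaderboard have been removed.
Updated through gemini-2.5-flash (its Aider score comes from Google's own testing).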
| Model | Global Average | Reasoning Average | Aider Correct (%) | Mathematics Average | Data Analysis Average | Language Average | Instruction Following (IF) Average |
|---|---|---|---|---|---|---|---|
| o3 High | 82.60 | 93.33 | 79.6 | 84.67 | 75.8 | 76 | 86.17 |
| Gemini 2.5 Pro Experimental | 79.90 | 87.53 | 72.9 | 89.16 | 79.89 | 69.31 | 80.59 |
| o4-Mini High | 77.74 | 88.11 | 72 | 84.9 | 70.43 | 66.05 | 84.96 |
| Claude 3.7 Sonnet Thinking | 73.94 | 76.17 | 64.9 | 79 | 74.05 | 68.27 | 81.25 |
| o1 High | 72.94 | 77.47 | 61.7 | 79.28 | 65.47 | 72.15 | 81.55 |
| o3 Mini High | 70.53 | 74.36 | 60.4 | 76.55 | 70.64 | 56.86 | 84.36 |
| Gemini 2.5 Flash Preview | 69.99 | 73.47 | 51.1 | 81.45 | 75.49 | 59.43 | 79.02 |
| Grok 3 Mini Beta (High) | 69.93 | 87.61 | 49.3 | 77 | 67.87 | 59.09 | 78.7 |
| DeepSeek R1 | 68.88 | 76.58 | 56.9 | 77.91 | 67.6 | 54.77 | 79.49 |
| o3 Mini Medium | 66.39 | 69 | 53.8 | 71.68 | 66.56 | 54.12 | 83.16 |
| Claude 3.7 Sonnet | 62.87 | 49.11 | 60.4 | 64.65 | 63.37 | 63.19 | 76.49 |
| QwQ 32B | 62.01 | 76.72 | 20.9 | 76.08 | 65.03 | 51.48 | 81.83 |
| Gemini 2.0 Pro Experimental | 61.63 | 61.75 | 35.6 | 68.54 | 68.02 | 52.5 | 83.38 |
| GPT-4.5 Preview | 61.45 | 54.42 | 44.9 | 67.94 | 64.33 | 64.76 | 72.33 |
| GPT-4.1 | 59.99 | 44.39 | 52.4 | 62.39 | 69.13 | 54.55 | 77.05 |
| DeepSeek V3 | 59.91 | 44.28 | 55.1 | 71.44 | 60.37 | 46.82 | 81.47 |
| Grok 3 Beta | 59.61 | 48.53 | 53.3 | 62.75 | 54.53 | 53.8 | 84.74 |
| Gemini 2.0 Flash Thinking Experimental | 59.13 | 61.5 | 18.2 | 74.81 | 69.37 | 48.43 | 82.47 |
| ChatGPT-4o | 56.94 | 48.81 | 45.3 | 55.72 | 70.47 | 49.43 | 71.92 |
| Gemini 2.0 Flash | 54.23 | 44.25 | 22.2 | 63.19 | 67.55 | 42.39 | 85.79 |
| Claude 3.5 Sonnet | 54.03 | 43.22 | 51.6 | 50.54 | 55.03 | 54.48 | 69.3 |
| Qwen2.5 Max | 53.14 | 38.53 | 21.8 | 56.87 | 67.93 | 58.37 | 75.35 |
| GPT-4.1 Mini | 53.02 | 53.78 | 32.4 | 58.78 | 64.87 | 38 | 70.31 |
| o1 Mini | 52.08 | 51.33 | 32.9 | 60.26 | 57.92 | 44.66 | 65.4 |
| Llama 4 Maverick 17B 128E Instruct | 50.74 | 43.83 | 15.6 | 60.58 | 59.03 | 49.65 | 75.75 |
| Claude 3.5 Haiku | 39.85 | 26.19 | 28 | 34.84 | 48.45 | 39.71 | 61.88 |
| GPT-4.1 Nano | 36.99 | 35.58 | 8.9 | 42.39 | 46.59 | 30.96 | 57.54 |
| GPT-4o Mini | 33.99 | 25.64 | 3.6 | 38.05 | 49.96 | 29.88 | 56.8 |
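
The post does not say how Global Average is computed, but the numbers are consistent with a plain unweighted mean of the six category columns (Reasoning, Aider, Mathematics, Data Analysis, Language, IF). The sketch below checks that assumption against a few rows copied from the table; the names `ROWS` and `category_mean` are made up for illustration and are not from the original post.

```python
# Assumption: Global Average = unweighted mean of the six category columns.
# Each entry: model -> (listed Global Average, [Reasoning, Aider, Math,
# Data Analysis, Language, IF]), copied from the table above.
ROWS = {
    "o3 High": (82.60, [93.33, 79.6, 84.67, 75.8, 76, 86.17]),
    "Gemini 2.5 Pro Experimental": (79.90, [87.53, 72.9, 89.16, 79.89, 69.31, 80.59]),
    "Claude 3.7 Sonnet Thinking": (73.94, [76.17, 64.9, 79, 74.05, 68.27, 81.25]),
    "GPT-4o Mini": (33.99, [25.64, 3.6, 38.05, 49.96, 29.88, 56.8]),
}


def category_mean(scores):
    """Return the unweighted mean of a list of category scores."""
    return sum(scores) / len(scores)


def matches_listed(listed, scores, tolerance=0.01):
    """True if the recomputed mean agrees with the listed value up to rounding."""
    return abs(category_mean(scores) - listed) < tolerance


if __name__ == "__main__":
    for model, (listed, scores) in ROWS.items():
        status = "ok" if matches_listed(listed, scores) else "mismatch"
        print(f"{model}: listed={listed}, recomputed={category_mean(scores):.3f} ({status})")
```

If the assumption holds, the Aider column carries the same weight as each of the other five categories. That would explain why models with a low Aider score, such as QwQ 32B, rank well below what their reasoning and math scores alone would suggest.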