Design Arena
The Intelligence Company
500 posts
@Designarena
World's first benchmark for real-world design with 4M+ creators and counting. Made by @intelligence_ai
designarena.ai
Joined June 2025
9
Following
13.7K
Followers
  • Design Arena
    The Intelligence Company
    @Designarena
    Jun 17
    Kimi K2.7 Code by @Kimi_Moonshot is 5th overall among open weight models on Design Arena with an Elo of 1312. This is in the same performance band as MiniMax M3 by @MiniMax_AI. With an average generation time of 337.6 seconds, Kimi K2.7 Code is 78.8 seconds faster than Kimi K2.6
    6.6K
  • Design Arena
    The Intelligence Company
    @Designarena
    Jun 16
    BREAKING: Riverflow Pro 2.5, a reasoning model by @riverflow_ai that calls a mix of proprietary and open diffusion models, has scored 1st on Image Arena (Models + Routers), 1st on Graphic Design Arena, and 1st in Image Edit (Models + Routers). Riverflow Pro 2.5 averages 10 Elo
    23K
  • Design Arena
    The Intelligence Company
    @Designarena
    Jun 16
    BREAKING: GLM-5.2 is now 1st on Design Arena. With an Elo of 1360, GLM-5.2 has jumped ahead of the now unavailable Claude Fable 5. And it's open weights. This is an improvement of 4 positions and 27 Elo points to achieve one of the highest Elo scores in our code categories
    1.9M
    Design Arena
    The Intelligence Company
    @Designarena
    Jun 16
    Agentic evaluations coming soon!
    40K
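The Elo figures quoted in the posts above can be read as pairwise preference odds. The sketch below uses the standard logistic Elo expected-score formula with a 400-point scale. Whether Design Arena computes its ratings this way is an assumption, and the function name `elo_expected_score` is hypothetical. The 27-point gap is the gain reported for GLM-5.2.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head wins for A over B under the standard Elo model.

    Assumption: classic logistic Elo with a 400-point scale; the leaderboard's
    actual rating method is not described in the posts.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# A 27-point gap, as in the reported GLM-5.2 gain (1360 vs. an implied 1333).
gap_27 = elo_expected_score(1360, 1333)

# Equal ratings give even odds by construction.
even = elo_expected_score(1300, 1300)

print(f"27-point gap: {gap_27:.3f}, equal ratings: {even:.3f}")
```

Under this model a gap of a few dozen points shifts the expected preference rate only modestly above 50%, so small Elo differences are best read as "same performance band".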
  • Design Arena
    The Intelligence Company
    @Designarena
    Jun 16
    Replying to @Designarena
    Try it now on Design Arena (designarena.ai)
    51K
  • Design Arena
    The Intelligence Company
    @Designarena
    Jun 15
    BREAKING: Reve 2.0 by @reve debuts at 2nd on Image Editing Arena with an Elo of 1325. Reve establishes a new Pareto frontier for Preference vs. Speed, faster than any model at this preference level with an average generation time of 86.8 seconds. Reve is now the highest-ranked
    9.5K
  • Design Arena reposted
    Grace Li
    The Intelligence Company
    @grx_xce
    Jun 15
    BREAKING: Le Chaton Fat has fully saturated our benchmark. We are at a loss for words. In response, we are retiring Design Arena. Congratulations to the @MistralAI team, and thanks for putting us on vacation.
    91K
  • Design Arena
    The Intelligence Company
    @Designarena
    Jun 13
    Introducing Real-World Agentic Evaluations on Design Arena, our new series of evaluations measuring end-to-end agentic model performance. Using real-world sessions and apps created by our 4M+ users, we analyzed agent traces to capture how models behave during deployment and in
    3.7K
    Design Arena
    The Intelligence Company
    @Designarena
    Jun 13
    Replying to @Designarena
    Our first set of evaluations is now live, with more to follow. View our Agentic Evaluations now at Design Arena (designarena.ai)
    1.3K
  • Design Arena
    The Intelligence Company
    @Designarena
    Jun 13
    Replying to @Designarena
    Real-World Reach & Daily Usage: Design Arena users can publish their winning apps for other community members to see. Using Wilson Score Intervals, we calculated the average unique views and real user views with apps from each model - normalized as deviations from the table
    548
    Design Arena
    The Intelligence Company
    @Designarena
    Jun 13
    User Retention: We tracked how often, on average, users returned to an app a week after its creation, measuring whether models were building apps worth revisiting.
    461
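The reach post above mentions Wilson Score Intervals without showing the calculation. Below is a minimal sketch of the standard Wilson interval for a binomial proportion. How Design Arena applies it to view counts is not described, so the example quantity (the share of published apps that received at least one view) and the function name `wilson_interval` are assumptions for illustration only.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    )
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical data: 8 of 10 apps viewed vs. 80 of 100 apps viewed.
small_sample = wilson_interval(8, 10)
large_sample = wilson_interval(80, 100)
print(small_sample, large_sample)
```

Both samples have the same observed rate, but the interval for the smaller sample is wider. This makes the interval useful for comparing models with very different numbers of published apps.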
  • Design Arena
    The Intelligence Company
    @Designarena
    Jun 12
    Kimi-K2.7-Code by @Kimi_Moonshot is now available on Design Arena! Built upon Kimi K2.6, Kimi-K2.7-Code introduces improvements in coding and agent performance, reasoning efficiency, and long-horizon coding, marking it as their strongest coding model yet. Congrats to the
    Kimi.ai
    @Kimi_Moonshot
    Jun 12
    🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced! 🔷 Improved coding & agent performance over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite. 🔷 Reasoning efficiency: Less overthinking, with 30% lower
    26K
  • Design Arena
    The Intelligence Company
    @Designarena
    Jun 12
    Article
    Reve 2.0 establishes Reve as the top independent foundation image model lab
    We are excited to introduce Reve 2.0 – Reve’s most capable image generation model to date. With this release, Reve becomes the highest-ranked independent foundation image model lab on Design Arena....
    4.1K
  • Design Arena
    The Intelligence Company
    @Designarena
    Jun 12
    Replying to @Designarena
    What this means for model selection: Opus 4.8 is a step backward for UI-focused, single-turn tasks. It's worse than Opus 4.7 in both workflow and agentic settings, and substantially worse in single-turn pipelines. For teams choosing a Claude model for design work, Opus 4.7, Opus
    1K
    Design Arena
    The Intelligence Company
    @Designarena
    Jun 12
    We will continue monitoring Opus 4.8 performance and how it compares to other models. Fable analysis coming soon. Congratulations to the @AnthropicAI team on the launch, and try out Opus 4.8 for free on DesignArena.ai.
    959
  • Design Arena
    The Intelligence Company
    @Designarena
    Jun 12
    Replying to @Designarena
    This may be a direct result of Opus 4.8’s over-optimization on tool use, as it rarely uses tools that write files directly and instead prefers to use bash commands that directly create files. Since these commands require intricate escaping, it’s easy to make these sorts of
    819
    Design Arena
    The Intelligence Company
    @Designarena
    Jun 12
    But there is a bright spot: Opus 4.8 is very good at backend work! Opus 4.8 has real strengths in database design, API scaffolding, and auth implementation, as shown by its 1st-place position on Design Arena’s Agentic Web Dev Backend Evaluation. Since these are easily checked
    788
  • Design Arena
    The Intelligence Company
    @Designarena
    Jun 12
    Replying to @Designarena
    Model Behavior #3: The Return of Anti-Patterns. We also see a significant regression in the anti-patterns that Claude Opus 4.8 uses compared with Opus 4.7. The model tends to use grid overlays (5.3% of generations) and floating/bobbing hero images (7.4% of
    736
