{"id":3833,"date":"2026-03-19T10:59:42","date_gmt":"2026-03-19T18:59:42","guid":{"rendered":"https:\/\/mlcommons.org\/?post_type=benchmarks&#038;p=3833"},"modified":"2026-03-19T13:25:37","modified_gmt":"2026-03-19T21:25:37","slug":"endpoints","status":"publish","type":"benchmarks","link":"https:\/\/mlcommons.org\/benchmarks\/endpoints\/","title":{"rendered":"MLPerf Endpoints"},"content":{"rendered":"\n<section class=\"block-hero-benchmark wp-elements-f21150dec01a117e97b94c767e83f28f wp-block-acf-benchmarkhero has-text-color has-white-color has-background has-ml-blue-background-color\" id=\"block-hero-benchmark-block_83d6c31e7647785ee0ccf4030ac84d9d\">\n  <div class=\"hero-benchmark\">\n    <div class=\"hero-benchmark__inner\">\n      <div class=\"hero-benchmark__title\"  data-animate=\"fade\">\n        <div>\n                      <p class=\"subheading\">Benchmark Suite Results<\/p>\n                    <h1>MLPerf Endpoints<\/h1>\n        <\/div>\n      <\/div>\n      <div class=\"hero-benchmark__content\">\n        <div  data-animate=\"fade\">\n          \n\n<p>No matter the workload, no matter the hardware, no matter the deployer, GenAI is an API endpoint. <strong>That is why we have started the MLPerf Endpoints Working Group at MLCommons.<\/strong><\/p>\n\n\n\n<p>The mission of this working group is to build a trusted, open, fair, and reproducible MLPerf benchmark for evaluating GenAI endpoint performance. MLPerf has become the industry standard, but it needs to evolve to bring <strong>clarity and velocity<\/strong> to the GenAI era. Combined with quality targets and backed by a consensus-based governance model, we believe this work on endpoints will expand the AI community and advance capabilities for the entire ecosystem.<\/p>\n\n\n\n<p>If it has an API, we can measure it. 
If this mission speaks to you, we invite you to join us.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-white-outline\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/sd5pd.share.hsforms.com\/2yI9ulJRFT0ajNfRYPmTBPA\" target=\"_blank\" rel=\"noreferrer noopener\">Join the Working Group<\/a><\/div>\n\n\n\n<div class=\"wp-block-button is-style-white-outline\"><a class=\"wp-block-button__link wp-element-button\" href=\"http:\/\/Endpoints.MLCommons.org\">Try Endpoints<\/a><\/div>\n<\/div>\n\n        <\/div>\n      <\/div>\n    <\/div>\n  <\/div>\n<\/section>\n\n\n\n<div id=\"results\" class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\" style=\"margin-top:var(--wp--preset--spacing--4);margin-bottom:var(--wp--preset--spacing--4)\">\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>We released the first demonstration version of MLPerf Endpoints on 19 March 2026 at GTC with the support of over 30 organizations, including submissions from five member organizations: AMD, Intel, Google, Krai, and NVIDIA. 
You can try it <a href=\"https:\/\/endpoints.mlcommons.org\/\">here<\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><a href=\"https:\/\/endpoints.mlcommons.org\/\"><img loading=\"lazy\" decoding=\"async\" width=\"1080\" height=\"608\" src=\"https:\/\/mlcommons.org\/wp-content\/uploads\/2026\/03\/2kngif.gif\" alt=\"\" class=\"wp-image-3856\" style=\"aspect-ratio:1.7763459513041324;width:1142px;height:auto\"\/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:25%\"><\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:50%\"><\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:25%\"><\/div>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\">See How Systems Really Perform Across Their Full Operating Range<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Start with what matters to you.<\/strong> Select the model most relevant to your use case, then compare systems side-by-side on a Throughput vs. Interactivity graph \u2014 instantly see the real-world tradeoffs across the full operating range, not just peak numbers.<\/li>\n\n\n\n<li><strong>No more single-point measurements.<\/strong> Concurrency X-axis graphs for Throughput, Time to First Token (TTFT), and Interactivity show you the complete performance surface. Use the concurrency selector to focus on the region that matches your production workload \u2014 for example, systems handling at least 10 simultaneous users.<\/li>\n\n\n\n<li><strong>Find the right system for your use case.<\/strong> Filter and compare systems by accelerator, software stack, and more. 
Whether you&#8217;re evaluating on-prem infrastructure or managed API endpoints, the data is right there.<\/li>\n\n\n\n<li><strong>Hover over any run to understand utilization.<\/strong> Each point on the curve shows tokens\/sec for that run relative to the system&#8217;s peak throughput \u2014 so you can see not just what a system can do, but how hard it&#8217;s working to do it.<\/li>\n\n\n\n<li><strong>Click any point to get the full picture.<\/strong> Every data point links to a detailed run report covering the System Under Test (SUT) summary, model and dataset, node-level hardware and software descriptions (including heterogeneous and disaggregated systems), and complete run data: concurrency, TTFT, TPOT, tokens\/sec, queries per second, and more.<\/li>\n\n\n\n<li><strong>Transparent, reproducible, and auditable.<\/strong> Every result is peer-reviewed and self-contained. The detail you see here is the same detail available to anyone \u2014 buyers, analysts, and the industry at large.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Goals<\/h2>\n\n\n\n<p><a href=\"https:\/\/mlcommons.org\/benchmarks\/inference-datacenter\/\">MLPerf Inference<\/a> is the industry standard benchmark for AI system performance and efficiency. 
MLPerf Endpoints extends this foundation for the GenAI era, delivering on five key goals:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Velocity<\/strong> \u2014 shift from a fixed bi-annual schedule to a continuous and flexible rolling submission process; easier to set up, run, and submit means more tests in less time; rapidly update the suite with 0-day support for new models and platforms, meaning vendors can include real MLPerf Endpoints results in new product launch material and buyers can request MLPerf Endpoints scores in RFPs<\/li>\n\n\n\n<li><strong>Clarity<\/strong> \u2014 measure performance mirroring real customer deployment experience; Pareto curves measure performance across a broader range of use cases; easier to understand results and compare across systems visually, making MLPerf Endpoints data more accessible to a wide range of users (including system purchasers)<\/li>\n\n\n\n<li><strong>API endpoint-centric architecture<\/strong> \u2014 simplified, production-ready, lightweight, and decoupled; if a system has an API, it can be benchmarked; measures everything from on-prem systems to managed endpoints<\/li>\n\n\n\n<li><strong>Standardized Pareto curves<\/strong> \u2014 each run captures TTFT, Throughput, Interactivity, and Query Latency; customers can match results to different production use cases (e.g., high usage during the day, low at night)<\/li>\n\n\n\n<li><strong>Broad participation<\/strong> \u2014 diverse members and solutions, encompassing developers, enterprise buyers, CSPs, OEMs, and open-source contributors<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Principles<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Relevant<\/strong>\n<ul class=\"wp-block-list\">\n<li>Focus on the important problems; mirror customer deployments to ensure relevance<\/li>\n\n\n\n<li>GenAI performance is a complex, non-linear, and multi-dimensional surface<\/li>\n\n\n\n<li>Real-world traffic involves &#8220;long-tail&#8221; queries and 
latency always explodes as utilization peaks; measuring averages ignores reality<\/li>\n\n\n\n<li>Pareto curves accurately and realistically measure performance for the full range of customer use cases<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Fair and neutral<\/strong>\n<ul class=\"wp-block-list\">\n<li>MLCommons is a non-profit, committed to neutrality and fairness for all<\/li>\n\n\n\n<li>Clear rules for submissions under different categories, which apply to all submitters<\/li>\n\n\n\n<li>Architectural neutrality across hardware, software stacks, and deployment models<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Reproducible<\/strong>\n<ul class=\"wp-block-list\">\n<li>Extensive rules and well-documented workloads; robust peer-review of results with auditing<\/li>\n\n\n\n<li>Decoupled client-server architecture \u2014 submitted results are self-contained and reproducible by third parties<\/li>\n\n\n\n<li>Enables customers and the whole ecosystem to trust results and develop best practices<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Inclusive<\/strong>\n<ul class=\"wp-block-list\">\n<li>Well-structured governance with robust participation that drives industry consensus<\/li>\n\n\n\n<li>Open-source codebase, broadly accessible; standard OpenAI-compatible API interface<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Call for Participation<\/h2>\n\n\n\n<p>In an era of black-box benchmarks, <a href=\"https:\/\/sd5pd.share.hsforms.com\/2yI9ulJRFT0ajNfRYPmTBPA\">help us build<\/a> the one the industry actually trusts. 
Rolling submissions start in <strong>Q2 2026<\/strong>.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Enterprise &amp; IT Buyers<\/strong> \u2014 influence the standard by joining the advisory council; ensure the gold-standard benchmark tests what matters to you; benefit from trusted, up-to-date performance data that simplifies RFP requirements<\/li>\n\n\n\n<li><strong>Infrastructure &amp; Software<\/strong> (OEMs, CSPs, ODMs) \u2014 demonstrate leadership by shaping the spec and adding results to the rolling leaderboard<\/li>\n\n\n\n<li><strong>Model Developers &amp; API Providers<\/strong> \u2014 scale the roadmap by integrating next-gen SOTA models and build managed roadmaps for cloud endpoints<\/li>\n\n\n\n<li><strong>Researchers &amp; Community<\/strong> \u2014 anchor your science by using MLPerf Endpoints for reproducible baselines and contribute feedback<\/li>\n<\/ul>\n<\/div>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-alt-hover\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/sd5pd.share.hsforms.com\/2yI9ulJRFT0ajNfRYPmTBPA\">Join us!<\/a><\/div>\n<\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n","protected":false},"excerpt":{"rendered":"<p>Measures how fast systems can process inputs and produce results using a trained model.<\/p>\n","protected":false},"featured_media":0,"parent":0,"menu_order":4,"template":"","bm-cats":[82],"class_list":["post-3833","benchmarks","type-benchmarks","status-publish","hentry","bm-cats-mlperf"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Benchmark MLPerf Endpoints<\/title>\n<meta name=\"description\" content=\"The MLPerf Endpoints \u2014 a new approach to benchmarking generative AI services that reflects how they&#039;re actually deployed.\" \/>\n<meta name=\"robots\" 
content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/mlcommons.org\/benchmarks\/endpoints\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Benchmark MLPerf Endpoints\" \/>\n<meta property=\"og:description\" content=\"The MLPerf Endpoints \u2014 a new approach to benchmarking generative AI services that reflects how they&#039;re actually deployed.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/mlcommons.org\/benchmarks\/endpoints\/\" \/>\n<meta property=\"og:site_name\" content=\"MLCommons\" \/>\n<meta property=\"article:modified_time\" content=\"2026-03-19T21:25:37+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/mlcommons.org\/wp-content\/uploads\/2026\/03\/2kngif.gif\" \/>\n\t<meta property=\"og:image:width\" content=\"1080\" \/>\n\t<meta property=\"og:image:height\" content=\"608\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/gif\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/mlcommons.org\\\/benchmarks\\\/endpoints\\\/\",\"url\":\"https:\\\/\\\/mlcommons.org\\\/benchmarks\\\/endpoints\\\/\",\"name\":\"Benchmark MLPerf Endpoints\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/mlcommons.org\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/mlcommons.org\\\/benchmarks\\\/endpoints\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/mlcommons.org\\\/benchmarks\\\/endpoints\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/mlcommons.org\\\/wp-content\\\/uploads\\\/2026\\\/03\\\/2kngif.gif\",\"datePublished\":\"2026-03-19T18:59:42+00:00\",\"dateModified\":\"2026-03-19T21:25:37+00:00\",\"description\":\"The MLPerf Endpoints \u2014 a new approach to benchmarking generative AI services that reflects how they're actually deployed.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/mlcommons.org\\\/benchmarks\\\/endpoints\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/mlcommons.org\\\/benchmarks\\\/endpoints\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/mlcommons.org\\\/benchmarks\\\/endpoints\\\/#primaryimage\",\"url\":\"https:\\\/\\\/mlcommons.org\\\/wp-content\\\/uploads\\\/2026\\\/03\\\/2kngif.gif\",\"contentUrl\":\"https:\\\/\\\/mlcommons.org\\\/wp-content\\\/uploads\\\/2026\\\/03\\\/2kngif.gif\",\"width\":1080,\"height\":608},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/mlcommons.org\\\/benchmarks\\\/endpoints\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/mlcommons.org\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"MLPerf 
Endpoints\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/mlcommons.org\\\/#website\",\"url\":\"https:\\\/\\\/mlcommons.org\\\/\",\"name\":\"MLCommons\",\"description\":\"Better AI for Everyone\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/mlcommons.org\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Benchmark MLPerf Endpoints","description":"The MLPerf Endpoints \u2014 a new approach to benchmarking generative AI services that reflects how they're actually deployed.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/mlcommons.org\/benchmarks\/endpoints\/","og_locale":"en_US","og_type":"article","og_title":"Benchmark MLPerf Endpoints","og_description":"The MLPerf Endpoints \u2014 a new approach to benchmarking generative AI services that reflects how they're actually deployed.","og_url":"https:\/\/mlcommons.org\/benchmarks\/endpoints\/","og_site_name":"MLCommons","article_modified_time":"2026-03-19T21:25:37+00:00","og_image":[{"url":"https:\/\/mlcommons.org\/wp-content\/uploads\/2026\/03\/2kngif.gif","width":1080,"height":608,"type":"image\/gif"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/mlcommons.org\/benchmarks\/endpoints\/","url":"https:\/\/mlcommons.org\/benchmarks\/endpoints\/","name":"Benchmark MLPerf Endpoints","isPartOf":{"@id":"https:\/\/mlcommons.org\/#website"},"primaryImageOfPage":{"@id":"https:\/\/mlcommons.org\/benchmarks\/endpoints\/#primaryimage"},"image":{"@id":"https:\/\/mlcommons.org\/benchmarks\/endpoints\/#primaryimage"},"thumbnailUrl":"https:\/\/mlcommons.org\/wp-content\/uploads\/2026\/03\/2kngif.gif","datePublished":"2026-03-19T18:59:42+00:00","dateModified":"2026-03-19T21:25:37+00:00","description":"The MLPerf Endpoints \u2014 a new approach to benchmarking generative AI services that reflects how they're actually deployed.","breadcrumb":{"@id":"https:\/\/mlcommons.org\/benchmarks\/endpoints\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/mlcommons.org\/benchmarks\/endpoints\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/mlcommons.org\/benchmarks\/endpoints\/#primaryimage","url":"https:\/\/mlcommons.org\/wp-content\/uploads\/2026\/03\/2kngif.gif","contentUrl":"https:\/\/mlcommons.org\/wp-content\/uploads\/2026\/03\/2kngif.gif","width":1080,"height":608},{"@type":"BreadcrumbList","@id":"https:\/\/mlcommons.org\/benchmarks\/endpoints\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/mlcommons.org\/"},{"@type":"ListItem","position":2,"name":"MLPerf Endpoints"}]},{"@type":"WebSite","@id":"https:\/\/mlcommons.org\/#website","url":"https:\/\/mlcommons.org\/","name":"MLCommons","description":"Better AI for 
Everyone","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/mlcommons.org\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"meta_box":[],"_links":{"self":[{"href":"https:\/\/mlcommons.org\/wp-json\/wp\/v2\/benchmarks\/3833","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mlcommons.org\/wp-json\/wp\/v2\/benchmarks"}],"about":[{"href":"https:\/\/mlcommons.org\/wp-json\/wp\/v2\/types\/benchmarks"}],"wp:attachment":[{"href":"https:\/\/mlcommons.org\/wp-json\/wp\/v2\/media?parent=3833"}],"wp:term":[{"taxonomy":"bm-cats","embeddable":true,"href":"https:\/\/mlcommons.org\/wp-json\/wp\/v2\/bm-cats?post=3833"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}