Introducing llmaz: Easy, advanced inference platform for large language models on Kubernetes

InftyAI’s llmaz is an advanced inference platform designed to streamline the deployment and management of large language models (LLMs) on Kubernetes. By integrating state-of-the-art inference backends, llmaz brings cutting-edge research to the cloud, offering a production-ready solution for LLMs.

Key Features of llmaz:

  • Easy-to-Use Kubernetes Integration: deploy and manage LLMs within Kubernetes clusters, leveraging Kubernetes’ robust orchestration capabilities.
  • Advanced Inference Backends: Utilize state-of-the-art inference backends to ensure efficient and scalable model serving.
  • Production-Ready: Designed for production environments, llmaz offers reliability and performance for enterprise applications.

The deployment of a model is quite simple in llmaz.

Here’s a toy example of deploying deepseek-ai/DeepSeek-R1; all you need to do is apply an OpenModel and a Playground.

apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: deepseek-r1
spec:
  familyName: deepseek
  source:
    modelHub:
      modelID: deepseek-ai/DeepSeek-R1
  inferenceConfig:
    flavors:
      - name: default # Configure GPU type
        requests:
          nvidia.com/gpu: 1

apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: deepseek-r1
spec:
  replicas: 1
  modelClaim:
    modelName: deepseek-r1
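
Once both resources are applied, llmaz provisions the serving workload and exposes an OpenAI-compatible endpoint through the chosen backend. A hypothetical smoke test, assuming llmaz creates a Service named after the Playground (the actual Service name may differ between releases):

kubectl apply -f model.yaml -f playground.yaml
kubectl port-forward svc/deepseek-r1-lb 8080:8080
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-r1", "prompt": "Hello", "max_tokens": 16}'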

Latest Release: v0.1.3

The latest release, v0.1.3, shipped on April 23, 2025. It includes several enhancements and bug fixes that improve the platform’s stability and performance. For detailed information on the changes introduced in this release, please refer to the release notes.

Integrations

Broad Backend Support: llmaz supports a wide range of advanced inference backends for different scenarios, such as vLLM, Text-Generation-Inference, SGLang, and llama.cpp. Find the full list of supported backends here.

Broad Model Provider Support: llmaz supports a wide range of model providers, such as HuggingFace, ModelScope, and object stores.

AI Gateway Support: capabilities such as token-based rate limiting and model routing, through integration with Envoy AI Gateway.
Built-in ChatUI: out-of-the-box chatbot support through integration with Open WebUI, offering capabilities like function calling, RAG, and web search; see the configurations here.

As an easy-to-use yet advanced inference platform, llmaz uses LeaderWorkerSet as the underlying workload to support both single-host and multi-host inference scenarios.

llmaz supports horizontal scaling with HPA by default and will integrate with autoscaling components like Cluster-Autoscaler and Karpenter for smart scaling across different clouds; a sketch follows below.
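
Because LeaderWorkerSet exposes the scale subresource, a standard HPA can drive replica counts. A minimal sketch, assuming the Playground materializes a LeaderWorkerSet of the same name (the target kind, name, and metric here are illustrative, not llmaz’s documented defaults):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: deepseek-r1-hpa
spec:
  scaleTargetRef:
    apiVersion: leaderworkerset.x-k8s.io/v1
    kind: LeaderWorkerSet
    name: deepseek-r1
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80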

About the Founder: Kante Yin

Kante Yin is a prominent figure in the Kubernetes community, serving as a SIG Scheduling approver and a top committer to LWS (LeaderWorkerSet) and Kueue. His contributions to Kubernetes scheduling and workload management have been instrumental in advancing cloud-native technologies. Kante’s expertise and leadership continue to drive innovation in the Kubernetes ecosystem.

Compared to other inference platforms, llmaz stands out with its extensible, cloud-native design, making it lightweight and efficient. Its architecture is optimized for scalability and resource efficiency, enabling seamless integration into modern cloud environments while maintaining high performance.

OSPP 2025 (Open Source Promotion Plan)

The Open Source Promotion Plan is a summer program launched in 2020 by the Open Source Software Supply Chain Promotion Plan of the Institute of Software, Chinese Academy of Sciences. It aims to encourage university students to actively participate in the development and maintenance of open source software, to cultivate and discover outstanding developers, to promote the growth of excellent open source communities, and to help build the open source software supply chain.

llmaz has two projects in OSPP 2025 (student registration and application: May 9 – June 9); you are welcome to join our community:

  1. KEDA-based Serverless Elastic Scaling for llmaz
  2. Enabling Efficient Model and Container Image Distribution in LLMaz with Dragonfly

For more information about llmaz and its features, visit the GitHub repository.


KubeCon On-site Notes: From Headlamp to the MCP Wave

Beyond the headline technical projects, a number of lightweight tools and community projects drew wide attention at this conference. Below are my notes and impressions on some of the highlights.

Headlamp

As a Kubernetes community project, it is making a very competitive debut.

  • A viable replacement: Headlamp’s feature set is already rich enough to replace the traditional Kubernetes Dashboard and parts of KubeSphere.
  • A Microsoft-style desktop experience: in the keynote demo, Microsoft showed its intent to turn Headlamp into a must-have desktop app for every Kubernetes user. Compared with competitors like Lens, its lightweight, convenient, self-hostable design left a strong impression: users just add each cluster’s token or certificate and can start managing right away.

ETCD Operator

Interest vs. reality: the project drew plenty of attention at launch, but actual participation is currently very low; it sits firmly in the “help wanted” stage.

Cross-domain Collaboration and Challenges

Defining autoscaling metrics: how to define elasticity metrics in LLM scenarios remains a hard problem.

Current support for S3-based model storage is rather frustrating, which leaves plenty of room for future design improvements in the community.

Inference: comparing the vLLM production stack, KServe, AIBrix, and llmaz, my impression is that KServe carries a lot of historical baggage; bound by existing users and products, it can hardly attempt a disruptive refactor, which is somewhat worrying. AIBrix and llmaz are both just starting out: AIBrix has ByteDance’s backing, while llmaz aims to be more lightweight.

The MCP Wave: New Projects and Support Springing Up

Another wave of MCP enthusiasm this week:

In addition, clusterpedia also needs a solution in this space, and the Kubernetes MCP projects launched by manusa and silenceper are actively exploring it.

Project diversity: many popular new projects are adding MCP support, and some are adding it directly into existing projects.

Examples

Dagger’s integration with MCP

The MCP discussion in k8sgpt

Steering Annual Reports and SIG Updates

The Steering committee summarized the annual reports of the SIGs/WGs, highlighting where more contributors are needed: maintainers of each SIG listed open problems and pending features in their reports, a clear call for help as the community keeps iterating and improving.

Highlights from the Booths and the Demo Theater

The expo floor drew plenty of attention as well:

  • Wiz booth: showcased a rich security tooling UI; the demos conveyed a solid footing in the security space.
  • Demo Theater: many sponsor demos shone, such as Google’s on-site 65k-node demonstration, which left a deep impression on attendees.
  • Popular themes: observability, security, and AI + Gateway were the hottest directions, and the Kubeflow area was another highlight.

The Venue and Personal Impressions

Among the many on-site experiences, a few are worth mentioning:

  • Travel hiccups: I arrived early on the first day in poor shape, and at the Maintainer Summit I mainly joined the Steering AMA. The guesthouse I booked also fell short of expectations; I would suggest avoiding Booking.com for accommodation.
  • Top End User award: for the first time, a KubeCon outside China gave this award to a Chinese company, Ant Group; JD.com and DiDi had won it before, but that was at KubeCon China. The judging criteria focus mostly on community contributions.
  • International perspective: the Japan roundtable showed growth in both the number of maintainers and the range of topics.
  • End users: the end-user talks at KubeCon EU covered every sector from industry to agriculture.
  • Venue layout: the room layout was a bit odd this time; Rooms A-H were buzzing, while rooms on the third floor or tucked away (like Room IJ) felt rather quiet.
  • Talks in review
    • Hot topics: AI-related sessions (LLM, Ollama, benchmarks, DRA, k8sgpt) drew the most attention; Argo, Cilium, OTel, and Platform Engineering also pulled sizable audiences.
    • Project showcases: the Project Lightning sessions were packed; elsewhere, the honeycomb visualization in the keynote worked well, and Karpenter, Cluster API, and vCluster (my topic) all sparked lively discussion afterwards.
    • Quieter corners: some maintainer-oriented or low-level topics, such as storage, saw relatively low attendance this time.
  • Catering: compared with past events, the food seems to have slipped back to its old “bad” state (Paris, as I recall, was excellent); the main problem was that everything was cold.

With more than 12,500 attendees, this was an all-time high; the cloud-native wave shows no sign of cooling. Keep it up 👍


llmaz: Revolutionizing LLM Deployment on Kubernetes

In the rapidly evolving field of AI, large language models (LLMs) are powering applications from intelligent chatbots to content generation engines. However, deploying these models at scale comes with significant challenges—ranging from resource management and scalability to performance optimization. This is where llmaz comes into play. Developed by InftyAI, llmaz is a production-ready inference platform designed to simplify the deployment of LLMs on Kubernetes. In this post, we’ll explore what makes llmaz unique and compare it with two other notable platforms: AIBrix and KServe.

Introducing llmaz

llmaz is built with ease-of-use and high performance in mind. Its main goal is to remove the complexity of deploying LLMs in production environments. Key features include:

  • Easy Deployment: llmaz enables users to launch LLM services with minimal configuration, making it accessible even to teams without deep Kubernetes expertise.
  • Multiple Inference Backends: The platform supports a variety of backends such as vLLM, Text-Generation-Inference (TGI), SGLang, and even llama.cpp, offering flexibility to optimize for different performance requirements.
  • Model Caching and Distribution: With an out-of-the-box model cache system powered by Manta, llmaz optimizes resource usage and accelerates model loading across clusters.
  • Accelerator Fungibility: llmaz allows a single LLM to be served on multiple types of hardware accelerators. This feature is crucial for optimizing both cost and performance.
  • Advanced Inference Techniques: Incorporating cutting-edge methods like speculative decoding and splitwise, llmaz improves inference efficiency and overall throughput.

These features position llmaz as a specialized solution for organizations that need to deploy LLMs efficiently on Kubernetes, providing a robust platform that handles the heavy lifting of infrastructure management.

Comparing llmaz, AIBrix, and KServe

While llmaz focuses specifically on LLM deployment, it exists in an ecosystem with other notable platforms. Here’s how llmaz compares with AIBrix and KServe:

AIBrix: Scalable GenAI Inference

AIBrix is an open-source platform that emphasizes scalable and cost-efficient GenAI inference. Its core strengths include:

  • High-Density LoRA Management: Designed to support lightweight, low-rank adaptations of models, enabling efficient resource utilization.
  • LLM Gateway and Dynamic Autoscaling: Features a robust routing system to manage traffic across multiple replicas and dynamically scales inference resources based on real-time demand.
  • Heterogeneous GPU Inference: Optimizes deployments by supporting a mix of GPU types to achieve cost-effective performance.

AIBrix’s batteries-included approach makes it an attractive option for enterprises looking to handle large workloads without compromising on scalability or cost efficiency.

KServe: General-Purpose ML Inference

KServe, part of the Kubeflow ecosystem, offers a standardized, serverless platform for ML inference. While it can serve LLMs, its design is broader, targeting a wide range of ML models:

  • Versatile Model Serving: KServe deploys any ML model as a scalable service, with support for multiple frameworks.
  • Auto Scaling and Traffic Management: It automatically scales model replicas based on traffic, ensuring smooth performance during peak loads.
  • Robust Monitoring and Security: With integrated metrics, monitoring, and enterprise-grade security features, KServe is well-suited for complex production environments.

Although KServe is versatile and benefits from a large community (with over 2000 GitHub stars), it might require additional configuration to fully optimize LLM-specific tasks compared to llmaz and AIBrix.

Detailed Comparison

Below is a table that highlights the key aspects of each platform:

| Aspect | llmaz | AIBrix | KServe |
| --- | --- | --- | --- |
| Specialization | Tailored for LLM inference on Kubernetes | Focused on scalable GenAI inference, especially for LLMs | General ML inference platform; can support LLMs with extra setup |
| Ease of Use | Minimal configuration; highly user-friendly | Batteries-included approach; designed for enterprise use | May require additional configuration for LLM-specific optimizations |
| Supported Backends | vLLM, TGI, SGLang, llama.cpp, etc. | Built on vLLM with specific optimizations | Supports various ML frameworks; custom integration may be needed |
| Model Management | Integrated model cache (powered by Manta) | Unified AI runtime for model downloading and management | Model serving and basic caching mechanisms; specifics vary |
| Accelerator Support | Supports diverse hardware accelerators for cost/performance optimization | Heterogeneous GPU inference for cost-effective deployments | General support; lacks LLM-specific accelerator optimizations |
| Scalability | Horizontal scaling with Kubernetes HPA, Cluster-Autoscaler, etc. | Advanced autoscaling with distributed inference for large workloads | Automatic scaling based on traffic; integrated within the Kubeflow ecosystem |
| Community & Adoption | Active, though in an alpha stage (~100 GitHub stars) | Growing community (~500 GitHub stars); enterprise-focused | Mature and widely adopted (>2000 GitHub stars); part of the Kubeflow ecosystem |
| Licensing | Apache License 2.0 | Apache License 2.0 | Apache License 2.0 |

Sources: llmaz GitHub page, AIBrix documentation, KServe GitHub page.

Key Considerations

  • Specialization vs. Versatility:
    llmaz and AIBrix are designed with LLMs in mind, offering advanced optimizations like speculative decoding and specialized autoscaling. KServe, meanwhile, is a more general-purpose platform that can handle a variety of ML models but might need extra customization for LLM workloads.
  • Community and Support:
    A larger community can be a double-edged sword. KServe benefits from extensive support and mature integrations, while llmaz, though currently smaller in community size, offers early adopters a chance to influence its development direction.
  • Cost and Performance:
    llmaz and AIBrix both excel at optimizing resource usage through features like accelerator fungibility and heterogeneous GPU support. For organizations with intensive LLM requirements, these optimizations could lead to significant cost savings and performance improvements.
  • Licensing and Flexibility:
    All three platforms are open-source and licensed under Apache License 2.0, allowing for high flexibility and customization without proprietary constraints.

Conclusion

llmaz is paving the way for a new era in LLM deployment on Kubernetes, offering an accessible yet powerful solution for managing large language models in production. Its design focuses on minimizing configuration overhead while maximizing performance through advanced features and multi-backend support. When compared to AIBrix and KServe, llmaz stands out for its LLM-centric approach, making it an excellent choice for organizations focused on next-generation AI applications.

  • llmaz is ideal if you’re looking for a solution tailored specifically for LLMs.
  • AIBrix might appeal more if you require enterprise-grade scalability and cost efficiency.
  • KServe is the go-to for broader ML inference scenarios with a robust, general-purpose framework.

As AI continues to evolve, choosing the right platform depends on your specific needs—whether it’s ease of deployment, advanced scalability, or versatility. With all three platforms being open source, organizations have the flexibility to experiment, integrate, and scale their AI applications without being locked into proprietary systems.

Happy deploying, and may your models always infer efficiently!


2024 Year-End Review

I spent fairly little time on the community this year: my focus at work shifted quite a bit, and raising our daughter with my wife took a lot of time too, so here is a running account of the year. Our daughter went from 5 months to 17 months, and the change is astonishing: learning to walk and talk, calling me Dad for the first time, her first trip out, her first flight and train ride, and her first illnesses. My wife has worked incredibly hard through all of it. Our daughter has also entered the “no, no, no” phase, full of her own ideas, yet considerate and polite, cute and mischievous at once, remarkably like my wife as a child. In learning how to parent, I have grown a lot myself.

In January I tried the Signal Team Lead role in SIG Release, and through the v1.30 release cycle from January to April I took on a lot of release team work. I gained a great deal, learning how the release team and its shadow program (for growing newcomers) work. Still, the time zone and my communication skills made it challenging, and the heavy load of chores really disrupts day-to-day feature work.

March brought KCD Shanghai plus KubeCon Paris. Jet lag kept me below par for the whole Paris trip. I took part in the Steering AMA for the first time, and perhaps the only time. Attending the Kubernetes Contributor Summit again, I was delighted to see so many Kubernetes maintainers and contributors, and this time also many new members of the release team. Of the following three days I remember little besides the fatigue; I felt I had only beaten the jet lag by the time the conference ended.

Kubernetes Steering Committee: Genesis, Bootstrap, Now & Future – Nabarun Pal & Paco Xu

At the KCD Shanghai track on cloud-native newcomers and open source education in March, I gave the talk “01 - Kubernetes Community: From Newcomer to Influencer”, inviting Nikhita and Madhav to join through recorded videos, hoping that this kind of sharing attracts more people into the community. KCD Shanghai also gave me a first look at how students in China take part in open source. I have been trying to get more involved in Linux Foundation APAC’s SIG Education work; this year I encouraged colleagues to join LFX Mentorship, and I hope more Chinese students join that program too. OSPP (the Open Source Promotion Plan) has been run well in China, and I am trying to contribute there as well. I was also fortunate to take part in an open source course at the University of International Business and Economics, and I can see the government’s growing attention to open source, with incentive policies on the way. I truly hope all of this opens people’s minds; more diversity and inclusiveness may be what it takes to change many problems at the societal level.

In August, KubeCon + CloudNativeCon + Open Source Summit + AI_dev China 2024 was held in Hong Kong. Being a co-chair at an event of this scale for the first time was thrilling, and I kept trying to make the conference more attractive; luckily, Linus joining raised the buzz a whole notch. Several keynotes were excellent, though the AI content was a bit too dense. (I have thought about this a lot since: it is simply where the era is heading, and inevitably everyone will work on something AI-related. As a pessimist, I remain rather gloomy about the future: as AI programming improves and costs fall, the impact on the programmer profession will be substantial. How to use AI to help every industry, and how to integrate with it, may be the key questions ahead. We are now driving AI’s continued rise, and what follows may be an era of drastic change; the singularity may arrive within our lifetimes.) I also organized a roundtable for Kubernetes’ tenth anniversary, “Kubernetes Community Panel: A Decade of Evolution and Future Trends”, which felt very worthwhile.

Keynote: Kubernetes Community and Cloud Native Activities in China – Paco Xu & Wei Cai. I was a little nervous for my first keynote and did not cover a technical topic. Since joining Steering, my lens has been the community’s growth, development, and sustainability; I still need more time to build up the fundamentals.

It was also my first trip to Hong Kong, and it left a lot of impressions. I really like the city: from years of TVB dramas to Hong Kong films, I have deep feelings for its culture. The strongest impression was how beautiful Victoria Harbour is, especially at night. I may not have met many people, but overall the interactions felt friendly. Traveling with our daughter was tiring: she could not walk yet and had to be carried, disliked riding in cars 😄, and wanted her mom whenever she was sleepy, so my wife was worn out. I hope to spend more time in Hong Kong in the future.

🔊 [Kubernetes Podcast from Google]: Leading Kubernetes into its Second Decade. My first interview in English; a marvelous experience.

Over the National Day holiday I went to Nanjing, twelve years after graduation; time really flies. Familiar yet strange: Jiming Temple and Xuanwu Lake were lovely despite the crowds, but there was no time to stroll the Sun Yat-sen Mausoleum or wander the campus. Who knows when the next visit will be.

In November I served as a Track Chair at KubeCon Salt Lake City, and I am doing the same for KubeCon EU. I learned a lot in the process; to my surprise, competition in Europe is considerably fiercer than in North America, though overall the NA maintainer pool and proposal quality are no weaker. It is a pity I have never attended KubeCon NA in person; the visa may be a big barrier.

🎗️🌍 Kaiyuanshe 2024 China Open Source Pioneers 33: being selected as an open source pioneer was a real surprise; I still do not know who nominated me. Thank you very much.

With five KubeCons worldwide in 2025, there will probably be even more to do than in 2024, and I may need to work out how to balance conferences, the community, and everything else. I am really looking forward to London. On KubeCon proposals: I am grateful to several customers whose on-call questions dug all the way to the root cause and gave me deeper understanding and reflection in several areas; both my 2023 topic on Pod startup and my 2025 topic on large scale and multi-tenancy grew out of solving real problems with customers on call.

I barely watched football this year. Valencia have dropped into the relegation zone, and I hear the new Mestalla is under construction again; I hope to catch a match at my team’s stadium (a place I once dreamed of playing in) before the new ground opens. Liverpool were a bright spot this year; Spain won the Euros, but I hardly watched the tournament. Yamal is genuinely good, yet my old passion for the Spanish side seems gone.

I watched plenty of PUBG esports this year. Apart from a strong spring season, 17 Gaming took second place at the last two PGS events, but the roster feels like it has peaked; after the year-end rebuild, xdd and mingming look promising for 2025, but I am done expecting much from 17.

I did not follow many shows this year: a few episodes of 边水往事 (Escape from the Trilateral Slopes), a rewatch of part of the Hikaru no Go live-action series along with the anime, and a fresh start at playing Go, though my level is limited. The recent Ke Jie affair was truly infuriating.

Lately my entertainment and horizon-broadening rely entirely on radio and podcasts: plenty of unofficial history of the Five Dynasties and Ten Kingdoms and the Northern and Southern Dynasties as company, plus a lot of 忽左忽右. I used to enjoy 帆看世界 riffing on La Liga and the sports world, but having mostly dropped football, I now listen to 日谈 and other shows.

To close, the 40 questions from https://stephango.com/40-questions 😄

  1. What did you do this year that you’d never done before?
    • For KubeCon, I worked as co-chair of KubeCon China and Track Chair of KubeCon NA/EU. Indecisive as I am, review work always takes me longer than others. Track chairs and co-chairs judge based on other reviewers’ scores, which also means judging how diligent those reviewers were; much of my time went into unearthing surprisingly good proposals among the low-scored ones and verifying them. When scores were very close, the calls were genuinely hard to make and balance. (There was also one major communication mishap, which may have kept part of my scores from reaching the co-chairs.)
    • Answering questions in English live at the Steering AMA in France, for the first time.
    • First visits to Guangzhou, Hong Kong, and France, the first two with our daughter in tow. We also took her back to our hometown twice, over National Day and Spring Festival.
    • My first book-sharing session: The Singularity Is Near (Youtiao Coffee reading club).
  2. Did you keep your new year’s resolutions?
    • Definitely failed at losing weight 😓
    • My family’s health seems to have worsened too 😓. In the new year we really need to take healthy living seriously.
  3. Did anyone close to you give birth?
    • No
  4. Did anyone close to you die?
    • No
  5. What cities/states/countries did you visit?
    • Paris, France (KubeCon): the Louvre is far too big to finish, and a morning run along the river by the Eiffel Tower was fantastic.
    • Hong Kong (KubeCon): Victoria Harbour is gorgeous.
    • Guangzhou, a nice city with so many flowers.
  6. What would you like to have next year that you lacked this year?
    • Soccer. I probably played fewer than 15 times in 2024; I hope to exercise more regularly in 2025.
    • More focus on projects; I currently spend too much time on activities (talks, reviewing, organizing events).
  7. What date(s) from this year will remain etched upon your memory, and why?
    • August 21-23: to host the KubeCon+CloudNativeCon China 2024 as a co-chair is the first one, and I have a keynote as well. I also meet Linus there.
  8. What was your biggest achievement of the year?
  9. What was your biggest failure?
    • SIG Release Shadow Program in Q1. I don’t think I did a good job as the Release Signal team lead during the release cycle; the process exposed serious communication problems on my part.
  10. What other hardships did you face?
    • Balancing life and work.
    • Balancing Steering, kubeadm, SIG-Node, Cloud-native Activities/Conferences.
  11. Did you suffer illness or injury?
    • Do influenza, COVID-19, and rhinitis count?
  12. What was the best thing you bought?
    • Beats headphones and Apple watch.
  13. Whose behavior merited celebration?
    • My wife. Busy on her work and our daughter without any rest.
  14. Whose behavior made you appalled?
    • On the bright side: AI
    • On the dark side: many international events, such as Russia-Ukraine, Gaza, and the Myanmar-Thailand telecom scams
  15. Where did most of your money go?
    • My daughter. (1 year old)

Too many questions; I will fill in the rest next time...

  1. What did you get really, really, really excited about?
  2. What song will always remind you of this year?
  3. Compared to this time last year, are you: happier or sadder? Thinner or fatter? Richer or poorer?
    • Both (happier most of the time, sadder at others); fatter (80kg to 85kg); poorer (housing prices fell 📉)
  4. What do you wish you’d done more of?
    • More coding and tech deep dive; More involvement in new contributor orientation
  5. What do you wish you’d done less of?
    • Less on events
  6. How are you spending the holidays?
    • Family.
  7. Did you fall in love this year?
  8. Do you hate anyone now that you didn’t hate this time last year?
  9. What was your favorite show?
  10. What was the best book you read?
  11. What was your greatest musical discovery of the year?
  12. What was your favorite film?
  13. What was your favorite meal?
  14. What did you want and get?
  15. What did you want and not get?
  16. What did you do on your birthday?
    • It is also my daughter’s 1-year-old birthday.
  17. What one thing would have made your year immeasurably more satisfying?
  18. How would you describe your personal fashion this year?
  19. What kept you sane?
  20. Which celebrity/public figure did you admire the most?
  21. What political issue stirred you the most?
  22. Who did you miss?
  23. Who was the best new person you met?
  24. What valuable life lesson did you learn this year?
  25. What is a quote that sums up your year?


2023 Year-End Review: Work

January: the CNCF blog published “Worth the wait – Kubernetes Community Days Chengdu 2022”, my recap of organizing KCD Chengdu in 2022, which the pandemic pushed from the start of the year to its end. My organizing skills were a glaring weak point and the event was not quite a success, but I learned a lot from colleagues in marketing along the way, and it was a rare chance to gather with the open source folks in Chengdu.

February: as a kubeadm reviewer, I confirmed that the maintainer track talk Kubeadm Deep Dive was accepted for KubeCon in the Netherlands; thanks to Lubomir for the support and to Rohit for initiating it. I then spent a great deal of time practicing spoken English.

April: 2023 KubeCon EU: Kubeadm Deep Dive – Rohit Anand, NEC & Paco Xu, DaoCloud, my first talk in English. The KubeCon EU experience was great: the day before the conference I was recognized running into Antonio Ojea on the street, and on day one I bumped into Jordan Liggitt while looking for a room. The Kubernetes Contributor Summit was wonderful; I met many online friends, most of them community maintainers who have helped me a lot. I never expected KubeCon EU to be so packed; even the kubeadm maintainer session was full, which took me completely by surprise. With limited time I stuck mostly to the Kubernetes maintainer sessions and found it hard to sit through much else, but I still gained a lot.

May: I became a kubeadm maintainer, and I increasingly feel how much heavier a maintainer’s responsibilities are; feature development and PR merges call for more deliberation. Since becoming a reviewer in early 2021, my work has been scattered, and I have not shipped any major kubeadm feature; I mainly explored kubeadm-operator, but the use cases were too few and I eventually dropped it.

June: my first time reviewing KubeCon proposals, as a Program Committee member for KubeCon China 2023. It greatly expanded my knowledge and horizons and gave me a better feel for what the community and industry are practicing and heading toward. It was less reviewing than research and study; I simply picked the topics I liked and judged to be high-value with broad application.

Separately, the KCD Beijing organized by Iceber was a big success; it struck me as an excellent arrangement: let people do what they are best at, and you get twice the result for half the effort.

August: two proposals were confirmed for KubeCon Shanghai, Kubernetes SIG Node Intro and Deep Dive and How Can Pod Start-up Be Accelerated on Nodes in Large Clusters?. The company’s acceptance rate was quite high, which was a joy, and it was another chance to catch up with old friends.

September: early in the month I was selected as a shadow for the first time, becoming a v1.29 CI Signal shadow. I kept at it and became the Kubernetes v1.30 Release Signal Team Lead in the next release.

Then came KubeCon Shanghai, where I met many Chinese developers and old colleagues and gained a lot. Compared with KubeCon China 2019, though, it felt much quieter: more companies are using cloud native and the user base has multiplied, but with fewer overseas contributors attending and sharing, the atmosphere still is not what it was before the pandemic.

October: unexpectedly, I was elected to the Kubernetes Steering Committee. My mindset going in was honestly just to give it a try; I was lucky to attend two contributor summits this year, and since I straddle several SIGs I probably know more people. But precisely because my energy is spread thin, my expertise does not stand out, which leads to “no big project and low review quality”, something I still need to fix. Joining the committee has shown me much more of the state of community governance and community health: on the one hand I worry about the community’s continued prosperity; on the other, I have met more people striving for the Kubernetes ecosystem and can learn from deeper exchanges with them.

November: it was a pity I could not attend KubeCon in person, but receiving the SIG Node 2023 Kubernetes Contributor Award made me very happy.

Missing the keynote was the biggest regret; I hope to make up for it next year.

DaoCloud turned seven. The road here has had many twists; may things keep getting better.

December: I gave a talk at the OpenAtom Developer Conference in Wuxi on Kubernetes LTS progress and the outlook for annual upgrades, and was then selected as a 2024 LFAPAC Evangelist. I hope to publicize community activities more in China and attract more students and contributors into open source.

I also joined the Program Committee for KubeCon EU 2024. I hope we appear at overseas KubeCons more and more often, and I am looking forward to all the cloud-native events of 2024.

Here’s hoping 2024 brings more interesting things to share.


2022 Year-End Review: Time to Start Looking After My Health

Life

The year with the most illness and injury; my durability rating probably needs to drop from B to C.

  • I had planned a gastroscopy before the Shanghai lockdown, since my digestion has always been poor.
    • The gastroscopy and colonoscopy slipped from April to June, and I ended up booked into the “painful” version without anesthesia. It was genuinely the worst pain of my life, surpassing my first external hemorrhoid surgery, which started before the anesthetic kicked in. Never having given birth, I suspect it could rival labor pain, and it just kept going 😓.
  • Mid-year, hemorrhoids plus an anal fissure took nearly two months to heal completely.
    • The two weeks before the surgery were agonizing.
  • At year-end I caught COVID in the first wave but recovered around the second; it lasted three weeks, and though the symptoms were mild, it was a real ordeal.

Family

  • My grandfather passed away. He always struck me as humble, kind, and pure. With his cerebral atrophy in recent years, he could hardly be said to have been enjoying his old age. Independent from the age of 12, a young squad leader in the Korean War, and later a selfless Party secretary, he was always my role model. I always felt that after my grandmother passed away, he was never as happy; he had lost the one person who was always there to talk with him. Companionship matters so much. I must also say my father is the most devoted of sons; years of caregiving have been truly hard on him.
  • Because of the pandemic, we did not go home for Spring Festival at the start of the year. When I finally made it back in July, I ended up sick for a long while.
  • My wife had a minor surgery early in the year and recovered smoothly; may everything stay well. My mom visited Shanghai for a few days.
  • A little secret, to be told next year.
  • My wife changed jobs and we moved; the new place is quite comfortable.
  • Early in the year we learned our off-plan apartment was at serious risk of never being finished. My first petition and my first protest, quite an experience; the apartment is still up in the air.
  • My aunt’s tumor, after responding to chemotherapy, spread again and required another surgery. Nothing about it has been easy.
  • My maternal grandparents caught COVID; fortunately their symptoms were mild.

Work

  • KCD was postponed from early in the year to year-end; turnout was low and it did not go as well as hoped. I put a lot of care into the process, but it fell short of expectations, and I am still reflecting on it.
  • KubeCon
    • Missing KubeCon Valencia was a real pity: a rare intersection of my favorite football club’s city and my own line of work. For 2023 I hope to make Amsterdam, though that also looks difficult.
  • Not much progress in the community; my time this year went mostly into kubelet and kubeadm. I spent some time on kubeadm-operator, but overall it still feels close to a toy. A fair amount of time went into image- and cgroup v2-related issues, and perhaps next year will bring more progress there. Many surrounding open source projects I only got to know at the surface, without really going deep.

Leisure

This year we got a puppy named Aoligei. The silly girl turned one just yesterday: a bit greedy, not all that affectionate, only loves kicking her radish toy, and already chubby at over 6 jin (about 3 kg). May she grow up healthy and happy every day.

Film, TV & Games

What stuck with me this year: watching Game of Thrones and following Muyu Shuixin’s year-long Water Margin commentary; I even bought the Four Great Classical Novels but barely read them. Also recommended: The Book of Fish (a look at how Koreans grappled with Neo-Confucian philosophy, science, and Christianity; quite rewarding, less fiery than The Age of Awakening, but it lets you stand at the edge of the “Chinese historical-cultural sphere” and think about what rationality is and where the road leads). I finished The Wind Blows from Longxi but would not recommend it; it rather fizzled toward the end, though Three Kingdoms fans might enjoy it.

On the subway commute I finished the audiobook of The Three-Body Problem, and listened to plenty of 准风月谈, 电影侦探, and football shows (most of all 帆看世界 and 老梁嘴硬 😓).

I watched a lot of PUBG tournaments. My favorite team, 17 Gaming, achieved a year-long “grand slam of runner-up finishes”: one third place and second everywhere else. They actually played well with a very high floor, but never looked dominant. With 17shou retiring, the roster may wobble; I hope they still land some wins.

Football

Messi missed a penalty early in the year as PSG went out, then won the World Cup at year-end; within the tournament I actually preferred Mbappé. Honestly, it is hard to like the roughness of some of today’s teams. Reading a Messi biography recently, the memories of him being kicked around by Chelsea and knocked out by Mourinho feel like a kind of karma: Messi himself never liked crude, physical battles, yet Argentina’s title leaned precisely on his teammates’ relatively rough play. I really cannot warm to Acuña or Paredes; De Paul, who gives everything without playing that dirty, is more my kind of player. And the times Messi’s sides lost to Bayern were, to me, matches won genuinely through football: the Germans relied on physique, not on fouls and thuggery, and that is a big difference.

In the Champions League it was a magical year for Real Madrid and Benzema; this year’s Messi and Benzema truly made you marvel. The World Cup was a huge regret for Benzema, though team unity often matters more. The greatest pity was Gayà: after renewing with Valencia and making the World Cup squad, he got injured in training and missed the tournament. Born in 1995, he will be 31 by the next World Cup; who knows whether he will ever play in one.

Finally, I really want to watch a match live, especially at some stadiums abroad, and to join a championship parade if Valencia ever win something; the last time was 2019, I think.


I also hope professional football in China pays more attention to its fans and spends time on outreach and grassroots promotion. This was probably the year I played the least in a decade: lockdown plus injuries cost me at least five months, and I was not playing weekly the rest of the time either.

2023

  • A little secret: next year.
  • A small goal: to travel abroad. Before the pandemic I visited Japan in 2019, so at least I have been abroad once.
  • Health may be what matters most. I hope the new year goes smoothly for my wife; that Mom stops pushing so hard at work and looks after her health; and that Dad drinks less so it doesn’t cause trouble.


A DNS Failure Mode: ndots

Figure source: https://mrkaran.dev/posts/ndots-kubernetes/?utm_sq=gbj18v3zpx. There is plenty to discuss about DNS, and ndots/search is one of those topics.

The community-curated Kubernetes failure stories (https://codeberg.org/hjacobs/kubernetes-failure-stories) mention several potential problems caused by CoreDNS’s default settings, and the ndots setting is among the most common. When resolving an external domain such as abc.com, ndots:5 means the name (having fewer than five dots) is first expanded with every suffix in the search list before being queried as-is, which puts unnecessary pressure on DNS. For workloads that resolve many external domains, it is advisable to configure ndots on the Pod.

The official documentation shows how to set ndots manually: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-dns-config

apiVersion: v1
kind: Pod
metadata:
  namespace: default
  name: dns-example
spec:
  containers:
    - name: test
      image: nginx
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 1.2.3.4
    searches:
      - ns1.svc.cluster-domain.example
      - my.dns.search.suffix
    options:
      - name: ndots
        value: "2"

Kyverno provides a policy engine that can set Pod defaults such as ndots:
https://kyverno.io/policies/other/add_ndots/?policytypes=Pod

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-ndots
  annotations:
    policies.kyverno.io/title: Add ndots
    policies.kyverno.io/category: Sample
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/description: >-
      The ndots value controls where DNS lookups are first performed in a cluster
      and needs to be set to a lower value than the default of 5 in some cases.
      This policy mutates all Pods to add the ndots option with a value of 1.      
spec:
  background: false
  rules:
  - name: add-ndots
    match:
      resources:
        kinds:
        - Pod
    mutate:
      patchStrategicMerge:
        spec:
          dnsConfig:
            options:
              - name: ndots
                value: "1"

On the subject of DNS, I recommend two TGIK episodes:

https://github.com/vmware-tanzu/tgik/blob/master/episodes/122/README.md

https://github.com/vmware-tanzu/tgik/blob/master/episodes/147/README.md

Worth a watch if you are interested.


Kubernetes Scheduler: Learning kube-scheduler

Official Documentation

Kubernetes Scheduler https://kubernetes.io/docs/concepts/scheduling/kube-scheduler/

Scheduling happens in two steps: filtering, then scoring.

Filtering

  • PodFitsHostPorts: Checks if a Node has free ports (the network protocol kind) for the Pod ports the Pod is requesting.
  • PodFitsHost: Checks if a Pod specifies a specific Node by its hostname.
  • PodFitsResources: Checks if the Node has free resources (e.g., CPU and memory) to meet the requirements of the Pod.
  • PodMatchNodeSelector: Checks if a Pod’s Node Selector matches the Node’s label(s).
  • NoVolumeZoneConflict: Evaluate if the Volumes that a Pod requests are available on the Node, given the failure zone restrictions for that storage.
  • NoDiskConflict: Evaluates if a Pod can fit on a Node due to the volumes it requests, and those that are already mounted.
  • MaxCSIVolumeCount: Decides how many CSI volumes should be attached, and whether that’s over a configured limit.
  • CheckNodeMemoryPressure: If a Node is reporting memory pressure, and there’s no configured exception, the Pod won’t be scheduled there.
  • CheckNodePIDPressure: If a Node is reporting that process IDs are scarce, and there’s no configured exception, the Pod won’t be scheduled there.
  • CheckNodeDiskPressure: If a Node is reporting storage pressure (a filesystem that is full or nearly full), and there’s no configured exception, the Pod won’t be scheduled there.
  • CheckNodeCondition: Nodes can report that they have a completely full filesystem, that networking isn’t available or that kubelet is otherwise not ready to run Pods. If such a condition is set for a Node, and there’s no configured exception, the Pod won’t be scheduled there.
  • PodToleratesNodeTaints: checks if a Pod’s tolerations can tolerate the Node’s taints.
  • CheckVolumeBinding: Evaluates if a Pod can fit due to the volumes it requests. This applies for both bound and unbound PVCs.

In short: free ports, whether a hostname is specified, sufficient resources, NodeSelector match, volume-related checks, memory/disk/PID pressure, node conditions, and taints and tolerations.

Scoring

  • SelectorSpreadPriority: Spreads Pods across hosts, considering Pods that belong to the same Service, StatefulSet, or ReplicaSet.
  • InterPodAffinityPriority: Computes a sum by iterating through the elements of weightedPodAffinityTerm and adding “weight” to the sum if the corresponding PodAffinityTerm is satisfied for that node; the node(s) with the highest sum are the most preferred.
  • LeastRequestedPriority: Favors nodes with fewer requested resources. In other words, the more Pods that are placed on a Node, and the more resources those Pods use, the lower the ranking this policy will give.
  • MostRequestedPriority: Favors nodes with most requested resources. This policy will fit the scheduled Pods onto the smallest number of Nodes needed to run your overall set of workloads.
  • RequestedToCapacityRatioPriority: Creates a requestedToCapacity based ResourceAllocationPriority using default resource scoring function shape.
  • BalancedResourceAllocation: Favors nodes with balanced resource usage.
  • NodePreferAvoidPodsPriority: Prioritizes nodes according to the node annotation scheduler.alpha.kubernetes.io/preferAvoidPods. You can use this to hint that two different Pods shouldn’t run on the same Node.
  • NodeAffinityPriority: Prioritizes nodes according to node affinity scheduling preferences indicated in PreferredDuringSchedulingIgnoredDuringExecution. You can read more about this in Assigning Pods to Nodes
  • TaintTolerationPriority: Prepares the priority list for all the nodes, based on the number of intolerable taints on the node. This policy adjusts a node’s rank taking that list into account.
  • ImageLocalityPriority: Favors nodes that already have the container images for that Pod cached locally.
  • ServiceSpreadingPriority: For a given Service, this policy aims to make sure that the Pods for the Service run on different nodes. It favours scheduling onto nodes that don’t have Pods for the Service already assigned there. The overall outcome is that the Service becomes more resilient to a single Node failure.
  • CalculateAntiAffinityPriorityMap: This policy helps implement pod anti-affinity.
  • EqualPriorityMap: Gives an equal weight of one to all nodes.

In short: spreading Pods out, Pod affinity, preferred node affinity, node resource usage, taints, and bonus points for nodes that already have the required images, among others.

Optimizing Scheduling Speed for Large Clusters

Scheduler Performance Tuning: https://kubernetes.io/docs/concepts/scheduling/scheduler-perf-tuning/

Percentage of Nodes to Score means that in a very large cluster we do not need to score every node. In a 1000-node cluster where 500 nodes pass filtering, scoring only 10-20 of them and picking a suitable one is enough. This scoring scope can be tuned as a percentage, which users can adjust dynamically based on cluster size.
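
As a sketch, the percentage is set via the scheduler’s component configuration; note that the KubeSchedulerConfiguration apiVersion varies across Kubernetes releases (kubescheduler.config.k8s.io/v1 in recent ones):

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
# Score only half of the nodes that pass filtering, then pick the best of those.
percentageOfNodesToScore: 50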

Solving Uneven Zone Distribution: Pod Topology Spread Constraints

https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        foo: bar

The example above checks skew per zone. Suppose there are two zones, A and B, each with several nodes, and currently 2 pods in zone A and 1 pod in zone B.

When scheduling the next pod, with maxSkew of 1 it must land in zone B; only if maxSkew were raised to 2 or 3 could the next pod possibly land in zone A.

A known issue:

  • Scaling down a Deployment may result in an imbalanced Pod distribution.

Pod Overhead: Accounting for Extra Pod Costs

https://kubernetes.io/docs/concepts/configuration/pod-overhead/

Pods have some resource overhead. In our traditional linux container (Docker) approach, the accounted overhead is limited to the infra (pause) container, but also invokes some overhead accounted to various system components including: Kubelet (control loops), Docker, kernel (various resources), fluentd (logs).

This mainly accounts for the extra overhead from the kubelet, Docker, and the kernel. Enabling the feature requires turning on the PodOverhead feature gate in several places, including the kubelet and the scheduler; a sketch of the RuntimeClass side follows below.
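
Overhead is declared on a RuntimeClass and added to the Pod’s resource requests at admission and scheduling time. A minimal sketch with illustrative numbers (the handler name depends on your container runtime setup; older clusters use the node.k8s.io/v1beta1 API):

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-containers
handler: kata
overhead:
  podFixed:
    # Fixed per-Pod cost charged on top of the containers' own requests.
    cpu: "250m"
    memory: "120Mi"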

kube-scheduler Startup Flags

https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/

In addition, Kubernetes supports multiple schedulers, and a pod can specify which scheduler to use:

https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/

If the default scheduler does not meet your scheduling needs, you can implement your own scheduler and configure it in the cluster, or run several schedulers side by side: default-scheduler handles pods by default, while selected pods are scheduled by your custom scheduler, as the sketch below shows.
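
A Pod opts into a particular scheduler via spec.schedulerName; Pods without the field are handled by default-scheduler. A minimal sketch, assuming a custom scheduler deployed under the (hypothetical) name my-scheduler:

apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled
spec:
  schedulerName: my-scheduler # picked up by the custom scheduler, not default-scheduler
  containers:
  - name: app
    image: nginx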

Kubernetes Native Scheduling and Extension Points

The basic strategy: first filter on scheduling constraints to get the list of feasible hosts, then score them and pick the highest.

Here is a case study, from https://itnext.io/keep-you-kubernetes-cluster-balanced-the-secret-to-high-availability-17edf60d9cb7 (highly recommended).

Open Source Projects for Scheduling Optimization

https://github.com/topics/k8s-sig-scheduling currently lists three scheduling-related projects: poseidon, kube-batch, and descheduler, plus a personal project, resbalancer.

Their use cases differ: kube-batch targets batch scheduling; descheduler rebalances pods after the fact, a kind of second-pass balancing; and poseidon tries to improve placement by feeding network flow data into scheduling decisions.

Descheduler

Descheduler exists to make up for the shortcomings of Kubernetes’ own one-shot scheduling. It runs as a scheduled task and, following its implemented policies, rebalances the distribution of pods across the cluster.

The rescheduling task can run as a Kubernetes Job. For example, if business traffic bottoms out at 2 AM, the Job can run then, say once a week, to keep the cluster’s placement reasonably balanced; or run the descheduler a few hours after a release day. A policy sketch follows below.
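
As a sketch of how such a run is configured, the descheduler reads a policy file; the example below (thresholds are illustrative, API group per descheduler v1alpha1) evicts pods from loaded nodes until underutilized nodes approach the target thresholds:

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        # Nodes below all of these are considered underutilized.
        thresholds:
          cpu: 20
          memory: 20
          pods: 20
        # Pods are evicted from other nodes until usage approaches these targets.
        targetThresholds:
          cpu: 50
          memory: 50
          pods: 50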

A similar project is https://github.com/pusher/k8s-spot-rescheduler, which, in AWS Kubernetes clusters, reschedules pods from heavily loaded nodes onto a different node group; roughly speaking, the two node groups are labeled, and pods are drained from one label onto the other.

Kube-Batch

A batch scheduler for machine learning, big data, and HPC workloads. The gang scheduler in Kubeflow is implemented with kube-batch.

https://www.jianshu.com/p/042692685cf4


Building on kube-batch, Huawei launched Volcano: https://github.com/volcano-sh/scheduler


Poseidon (alpha: https://github.com/kubernetes-sigs/poseidon/releases/tag/v0.8; no updates since the alpha release in May, with the main branch last updated Apr 4, 2019)

Kubernetes supports third-party schedulers. Firmament is written in C++ while Kubernetes is written in Go, so Poseidon serves as the bridge that integrates the Firmament scheduler into Kubernetes.

Firmament is a flow-network-based scheduler. It uses efficient batching techniques, optimizing with a min-cost max-flow algorithm; this optimization, combined with Firmament’s scheduling policies, achieves very good pod placement.

https://zhuanlan.zhihu.com/p/35161270

As in the case study above, this is where we need the descheduler:

https://github.com/kubernetes-sigs/descheduler
