Orchestrating AI Models in Kubernetes: Deploying Ollama as a Native Container Runtime
Samuel Veloso, Lucas Fernández
KubeCon + CloudNativeCon Europe 2025 · Session
This talk, presented by Samuel Veloso and Lucas Fernández at KubeCon EU, delves into an innovative approach for deploying Artificial Intelligence (AI) models within Kubernetes environments. Specifically, it focuses on **Ollama**, an increasingly popular command-line interface (CLI) tool designed for running large language models (LLMs) and other AI models locally with remarkable simplicity. While Ollama excels at local deployments, the speakers address the challenge of scaling and orchestrating these models in a production-grade Kubernetes cluster in a truly native fashion, moving beyond traditional Helm chart deployments.
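To give a sense of the simplicity the talk builds on: a locally running Ollama server exposes a REST API on port 11434, and a completion is a single HTTP call. The sketch below is not from the talk; it assumes a default local Ollama installation with a `llama3` model already pulled.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Ollama's local /api/generate endpoint; "stream": false asks for
	// one complete JSON response instead of a stream of chunks.
	body, _ := json.Marshal(map[string]any{
		"model":  "llama3",
		"prompt": "Why is Kubernetes popular?",
		"stream": false,
	})
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The response object carries the generated text in its "response" field.
	raw, _ := io.ReadAll(resp.Body)
	var parsed struct {
		Response string `json:"response"`
	}
	if err := json.Unmarshal(raw, &parsed); err != nil {
		panic(err)
	}
	fmt.Println(parsed.Response)
}
```

It is exactly this one-process, one-port model that is easy to run on a laptop but nontrivial to scale and schedule across a cluster, which is the gap the talk addresses.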
AI review
This talk presents a genuinely novel and deeply technical approach to integrating AI models, specifically Ollama, directly into Kubernetes as first-class citizens. By leveraging RuntimeClass and a custom containerd shim, the speakers have engineered an elegant solution that bypasses traditional application-layer deployments, treating models as native container runtimes. This work addresses a critical gap in MLOps, enabling scalable, private, and Kubernetes-idiomatic orchestration of LLMs, making it a foundational piece of research for anyone serious about AI infrastructure.
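As a rough illustration of the RuntimeClass mechanism the review refers to, the client-go sketch below registers a RuntimeClass and schedules a Pod onto it. This is not the speakers' implementation: the handler name `ollama`, the shim it would resolve to, and the image `ollama/llama3` are all assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	nodev1 "k8s.io/api/node/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Connect using the default kubeconfig (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	// RuntimeClass: maps a cluster-visible name to a containerd handler.
	// containerd would resolve the handler to a shim binary configured on
	// the node (e.g. a hypothetical containerd-shim-ollama-v1).
	rc := &nodev1.RuntimeClass{
		ObjectMeta: metav1.ObjectMeta{Name: "ollama"},
		Handler:    "ollama",
	}
	if _, err := clientset.NodeV1().RuntimeClasses().Create(ctx, rc, metav1.CreateOptions{}); err != nil {
		panic(err)
	}

	// A Pod opts into the custom runtime via runtimeClassName, so the model
	// is started by the shim rather than by an ordinary app container.
	runtimeClassName := "ollama"
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "llama3", Namespace: "default"},
		Spec: corev1.PodSpec{
			RuntimeClassName: &runtimeClassName,
			Containers: []corev1.Container{{
				Name:  "model",
				Image: "ollama/llama3", // hypothetical model image reference
			}},
		},
	}
	if _, err := clientset.CoreV1().Pods("default").Create(ctx, pod, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
	fmt.Println("RuntimeClass and model Pod created")
}
```

On the node side this presumes a matching runtime entry in containerd's config.toml for the `ollama` handler; the appeal of the approach is that once that wiring exists, models schedule, scale, and garbage-collect like any other Pod.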