Introduction

In the rapidly evolving world of AI, agents powered by Large Language Models (LLMs) unlock everything from conversational support to automated content creation. Yet deploying them reliably in production remains a challenge. In this tutorial, we'll show how to stand up an LLM-driven agent on Azure AKS; every step maps one-for-one to AWS or GCP if that's your stack.

Figure 1. High-level architecture of the agent stack: what you'll have running by the end of this guide.

This tutorial is part of our AI-Engineering series and complements our earlier post, Introduction to AI Engineering.

Prerequisites

To follow along, you should be familiar with:

Azure services (Key Vault, Application Insights, Log Analytics, AKS)

We'll deploy everything on Azure, but identical concepts apply to other clouds.

Kubernetes concepts and management

AKS handles scaling and rollout. Alternatives such as Docker Swarm or Mesos work too, but the Kubernetes ecosystem makes life easier.

Terraform for Infrastructure-as-Code

Terraform lets us declare-not-click our infrastructure and reuse the code on any cloud.

Helm Charts

Helm bundles our Kubernetes resources into versioned, repeatable releases.

Overview of the Stack

We'll deploy our agent on an Azure AKS cluster and configure logging, tracing, and CI/CD via Helm Charts.

Features of our agent

The agent will be able to:

  • Answer questions about our website
  • Draft e-mails
  • Convert currencies on the fly
  • Gracefully respond to off-topic queries

Components of our agent

The agent relies on four building blocks:

  • Data pipeline (crawler → embeddings → ChromaDB → Azure)
  • Agent tools (retriever, e-mail draft/send)
  • LLM planner
  • Short-term memory
Figure 2. Overview of an AI agent [1]

Concretely, our agent will look like this:
Figure 3. Architecture of our agent
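
To make the division of labour concrete, here is a deliberately simplified sketch of one agent turn. All names are hypothetical stand-ins: the real retriever queries ChromaDB, the e-mail tool talks to a mail service, and the planner and composer are LLM calls rather than the hard-coded choices shown here.

# Hypothetical tool stubs; production versions hit ChromaDB, a mail API, an FX API.
def retrieve_docs(query: str) -> str:
    return f"[website passages matching '{query}']"

def draft_email(to: str, topic: str) -> str:
    return f"[draft e-mail to {to} about {topic}]"

def convert_currency(amount: float, src: str, dst: str) -> str:
    return f"[{amount} {src} in {dst}]"

TOOLS = {f.__name__: f for f in (retrieve_docs, draft_email, convert_currency)}

def run_turn(memory: list[dict], user_message: str) -> str:
    """One agent turn: plan -> (optional) tool call -> compose answer."""
    memory.append({"role": "user", "content": user_message})
    # The LLM planner would pick a tool and its arguments from the conversation;
    # hard-coded here to keep the sketch self-contained.
    tool_name, args = "retrieve_docs", {"query": user_message}
    memory.append({"role": "tool", "content": TOOLS[tool_name](**args)})
    # A second LLM call would compose the final reply from memory.
    return f"Answer based on {memory[-1]['content']}"

memory: list[dict] = []   # short-term memory, kept per session
print(run_turn(memory, "What does MLAB do?"))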

Step-by-Step Tutorial

Step 1 – Set up AKS, Log Analytics & Container Registry

We create the cluster, enable logging through a Log Analytics workspace, and provision a private container registry to which we push the agent image; an RBAC role assignment lets AKS pull that image.
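
In practice we declare this step in Terraform; purely as an illustration of what gets provisioned, here is a minimal sketch using the azure-mgmt-containerservice Python SDK. The subscription ID, resource group, region, and cluster names are placeholders.

from azure.identity import DefaultAzureCredential
from azure.mgmt.containerservice import ContainerServiceClient
from azure.mgmt.containerservice.models import (
    ManagedCluster,
    ManagedClusterAgentPoolProfile,
    ManagedClusterIdentity,
)

credential = DefaultAzureCredential()
aks = ContainerServiceClient(credential, subscription_id="<subscription-id>")

cluster = ManagedCluster(
    location="switzerlandnorth",
    dns_prefix="mlab-agent",
    identity=ManagedClusterIdentity(type="SystemAssigned"),
    agent_pool_profiles=[
        # System node pool; kept small because the model gets its own pool later.
        ManagedClusterAgentPoolProfile(
            name="system", mode="System", count=2, vm_size="Standard_D4s_v5"
        )
    ],
)
poller = aks.managed_clusters.begin_create_or_update("mlab-rg", "mlab-aks", cluster)
print(poller.result().provisioning_state)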

Step 2 – Deploy the agent

AKS handles container orchestration, auto-scaling, and load-balancing. We choose VM sizes optimised for our model's GPU and memory needs.
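
For example, a dedicated GPU node pool can sit alongside the system pool. The sketch below reuses the aks client from Step 1; the T4 SKU and the taint are assumptions to adapt to your model's requirements.

from azure.mgmt.containerservice.models import AgentPool

# User node pool with NVIDIA T4 GPUs, tainted so only the model server lands on it.
gpu_pool = AgentPool(
    count=1,
    vm_size="Standard_NC4as_T4_v3",
    mode="User",
    node_taints=["sku=gpu:NoSchedule"],
)
aks.agent_pools.begin_create_or_update(
    "mlab-rg", "mlab-aks", "gpupool", gpu_pool
).result()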

Step 3 – Integrate OpenAI models

Supply your OpenAI API key (or Azure OpenAI endpoint) and update the Helm values file.
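
On the application side, the agent reads those values at start-up. A minimal sketch with the openai Python package; the environment variable names are assumptions, so wire them to whatever your Helm values (and Key Vault) inject.

import os

from openai import AzureOpenAI, OpenAI

if os.getenv("AZURE_OPENAI_ENDPOINT"):
    # Azure OpenAI: endpoint and key injected by the Helm release.
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
    )
else:
    # Plain OpenAI API.
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model=os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)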

Step 4 – Add observability with OpenTelemetry

Reliability starts with instrumenting the code itself. The Python snippet below wires OpenTelemetry into our agent so that every prompt, tool call, and model token is traced, logged, and counted. A lightweight Collector sidecar then streams those signals to Azure:

  • Traces – each request becomes a trace with spans for plan-build → tool-call → LLM-compose, enabling slow-path replay in Application Insights.
  • Logs – structured JSON (prompt, response, token count) lands in Log Analytics for ad-hoc search.
  • Metrics – token/sec, queue age, and GPU utilisation feed KEDA auto-scaling and SLO dashboards.

import logging
import os

from opentelemetry import trace
from opentelemetry.instrumentation.logging import LoggingInstrumentor
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.resources import Resource, SERVICE_NAME
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
    SimpleSpanProcessor,
)

AZURE_CONNECTION_STRING = os.getenv("AZURE_MONITOR_CONNECTION_STRING")

# ---------- Tracing ----------
resource = Resource(attributes={SERVICE_NAME: "mlab-agent"})
trace.set_tracer_provider(TracerProvider(resource=resource))
tracer_provider = trace.get_tracer_provider()

if AZURE_CONNECTION_STRING:
    from azure.monitor.opentelemetry.exporter import AzureMonitorTraceExporter

    # Batch spans and ship them to Application Insights.
    trace_exporter = AzureMonitorTraceExporter(
        connection_string=AZURE_CONNECTION_STRING
    )
    tracer_provider.add_span_processor(BatchSpanProcessor(trace_exporter))
    print("🔗 Azure Monitor trace exporter enabled.")
else:
    # No connection string set: print spans to stdout for local development.
    tracer_provider.add_span_processor(
        SimpleSpanProcessor(ConsoleSpanExporter())
    )
    print("🖥️ Console span exporter enabled for local dev.")

# ---------- Logging ----------
# Inject trace/span IDs into stdlib log records so logs and traces correlate.
LoggingInstrumentor().instrument(set_logging_format=True)

if AZURE_CONNECTION_STRING:
    from azure.monitor.opentelemetry.exporter import AzureMonitorLogExporter

    log_exporter = AzureMonitorLogExporter(
        connection_string=AZURE_CONNECTION_STRING
    )

    logger_provider = LoggerProvider(resource=resource)
    logger_provider.add_log_record_processor(BatchLogRecordProcessor(log_exporter))

    # Route everything logged at INFO and above through OpenTelemetry
    # into Log Analytics.
    otel_handler = LoggingHandler(level=logging.INFO, logger_provider=logger_provider)
    logging.getLogger().addHandler(otel_handler)

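The snippet covers traces and logs; metrics follow the same pattern. A minimal sketch extending the same module, reusing resource and AZURE_CONNECTION_STRING from above; the counter name and the completion_tokens variable are illustrative assumptions.

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

from azure.monitor.opentelemetry.exporter import AzureMonitorMetricExporter

# Export metrics to Azure Monitor on a fixed interval
# (assumes AZURE_CONNECTION_STRING is set, as above).
metric_exporter = AzureMonitorMetricExporter(connection_string=AZURE_CONNECTION_STRING)
reader = PeriodicExportingMetricReader(metric_exporter)
metrics.set_meter_provider(MeterProvider(resource=resource, metric_readers=[reader]))

meter = metrics.get_meter("mlab-agent")
token_counter = meter.create_counter(
    "llm.tokens", unit="{token}", description="LLM tokens consumed per request"
)

# Inside the request handler, after each model call:
completion_tokens = 42  # in practice: response.usage.completion_tokens
token_counter.add(completion_tokens, {"direction": "output"})
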
With this in place, you can open Application Insights and:

  • Expand a trace to see every agent decision step → was the Currency-Converter tool slow, or did the LLM skip it?
  • Create an alert when P95 plan-build latency exceeds 500 ms.
  • Drill into logs for any span ID to view the exact prompt and response.

We'll explore dashboards and auto-alerts in a dedicated monitoring post, but this wiring gives you live debugging from Day 1.

Conclusion

A reliable agent requires a solid backbone—elastic compute, IaC, observability, and CI/CD. With this stack, you can iterate safely and scale confidently. Questions or feedback? Contact us!

Machine Learning Architects Basel

Machine Learning Architects Basel (MLAB) is part of the Swiss Digital Network. We help customers deploy and scale data & AI products.

References & Acknowledgements

  1. Memory in Agent Systems