Introduction
In the rapidly evolving world of AI, agents powered by Large Language Models (LLMs) unlock everything from conversational support to automated content creation. Yet deploying them reliably in production remains a challenge. In this tutorial, we'll show how to stand up an LLM-driven agent on Azure AKS; every step maps 1-for-1 to AWS or GCP if that's your stack.

This tutorial is part of our AI-Engineering series and complements our earlier post on Introduction to AI Engineering.
Prerequisites
To follow along, you should be familiar with:
Azure services (Key Vault, Application Insights, Log Analytics, AKS)
We'll deploy everything on Azure, but identical concepts apply to other clouds.
Kubernetes concepts and management
AKS handles scaling and rollout. Alternatives such as Docker Swarm or Mesos work too, but the Kubernetes ecosystem makes life easier.
Terraform for Infrastructure-as-Code
Terraform lets us declare-not-click our infrastructure and reuse the code on any cloud.
Helm Charts
Helm bundles our Kubernetes resources into versioned, repeatable releases.
Overview of the Stack
We'll deploy our agent on an Azure AKS cluster and configure logging, tracing, and CI/CD via Helm Charts.
Features of our agent
The agent will be able to:
- Answer questions about our website
- Draft e-mails
- Convert currencies on the fly
- Gracefully respond to off-topic queries
Components of our agent
The agent relies on four building blocks:
- Data pipeline (crawler → embeddings → ChromaDB → Azure)
- Agent tools (retriever, e-mail draft/send)
- LLM planner
- Short-term memory
Concretely, our agent will look like this:
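To make the four building blocks concrete, here is a minimal sketch of how they could be wired together. The class, function, and tool names are our own illustrations, not the production code:

```python
from dataclasses import dataclass, field


@dataclass
class ShortTermMemory:
    """Keeps only the most recent conversation turns for the planner."""
    max_turns: int = 10
    turns: list = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})
        self.turns = self.turns[-self.max_turns:]


def retrieve_docs(query: str) -> str:
    """Tool stub: a real implementation would query the ChromaDB embeddings."""
    return f"[passages matching '{query}']"


def draft_email(topic: str) -> str:
    """Tool stub: a real implementation would ask the LLM to write the draft."""
    return f"Subject: {topic}\n\nDear customer, ..."


TOOLS = {"retriever": retrieve_docs, "email_draft": draft_email}


def plan(question: str) -> str:
    """Planner stub: in production the LLM decides which tool to invoke."""
    return "retriever" if "website" in question.lower() else "email_draft"


memory = ShortTermMemory(max_turns=4)
memory.add("user", "What does your website say about pricing?")
tool_name = plan(memory.turns[-1]["content"])
print(TOOLS[tool_name]("pricing"))  # the planner picked the retriever tool
```

The real planner replaces the keyword check with an LLM call, but the data flow — memory feeds the planner, the planner selects a tool — stays the same.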
Step-by-Step Tutorial
Step 1 – Set up AKS, Log Analytics & Container Registry
We create the cluster, enable logging, provision a private container registry, and push the agent image to it; RBAC permissions let AKS pull the image.
Step 2 – Deploy the agent
AKS handles container orchestration, auto-scaling, and load-balancing. We choose VM sizes optimised for our model's GPU and memory needs.
Step 3 – Integrate OpenAI models
Supply your OpenAI API key (or Azure OpenAI endpoint) and update the Helm values file.
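Inside the cluster, the key typically reaches the pod as an environment variable populated from a Kubernetes Secret (sourced from Key Vault via the Helm values). A minimal sketch of how the agent might resolve its model configuration at start-up — the variable and deployment names here are assumptions, not a fixed contract:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Resolved LLM endpoint settings for the agent."""
    api_key: str
    base_url: str
    model: str


def load_model_config(env=os.environ) -> ModelConfig:
    """Prefer an Azure OpenAI endpoint when configured, else the public API."""
    if env.get("AZURE_OPENAI_ENDPOINT"):
        return ModelConfig(
            api_key=env["AZURE_OPENAI_API_KEY"],
            base_url=env["AZURE_OPENAI_ENDPOINT"],
            model=env.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4o"),
        )
    return ModelConfig(
        api_key=env["OPENAI_API_KEY"],
        base_url="https://api.openai.com/v1",
        model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
    )
```

Keeping this resolution in one place means the Helm values file only has to set environment variables; the agent code never hard-codes an endpoint.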
Step 4 – Add observability with OpenTelemetry
Reliability starts with instrumenting the code itself. The Python snippet below wires OpenTelemetry into our agent so that every prompt, tool call, and model token is traced, logged, and counted. A lightweight Collector sidecar then streams those signals to Azure:
- Traces – each request becomes a trace with spans for plan-build → tool-call → LLM-compose, enabling slow-path replay in Application Insights.
- Logs – structured JSON (prompt, response, token count) land in Log Analytics for ad-hoc search.
- Metrics – token/sec, queue age, and GPU utilisation feed KEDA auto-scaling and SLO dashboards.
```python
import logging
import os

from opentelemetry import trace
from opentelemetry.instrumentation.logging import LoggingInstrumentor
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
    SimpleSpanProcessor,
)

from my_agent.utils import init_logger  # your helper

AZURE_CONNECTION_STRING = os.getenv("AZURE_MONITOR_CONNECTION_STRING")

# ---------- Tracing ----------
resource = Resource(attributes={SERVICE_NAME: "mlab-agent"})
trace.set_tracer_provider(TracerProvider(resource=resource))
tracer_provider = trace.get_tracer_provider()

if AZURE_CONNECTION_STRING:
    from azure.monitor.opentelemetry.exporter import AzureMonitorTraceExporter

    trace_exporter = AzureMonitorTraceExporter(
        connection_string=AZURE_CONNECTION_STRING
    )
    tracer_provider.add_span_processor(BatchSpanProcessor(trace_exporter))
    print("🔗 Azure Monitor trace exporter enabled.")
else:
    tracer_provider.add_span_processor(
        SimpleSpanProcessor(ConsoleSpanExporter())
    )
    print("🖥️ Console span exporter enabled for local dev.")

# ---------- Logging ----------
# Inject trace/span IDs into log records so logs correlate with traces.
LoggingInstrumentor().instrument(set_logging_format=True)

if AZURE_CONNECTION_STRING:
    from azure.monitor.opentelemetry.exporter import AzureMonitorLogExporter

    log_exporter = AzureMonitorLogExporter(
        connection_string=AZURE_CONNECTION_STRING
    )
    log_provider = LoggerProvider(resource=resource)
    log_provider.add_log_record_processor(BatchLogRecordProcessor(log_exporter))
    # Route stdlib logging through OpenTelemetry into Log Analytics.
    otel_handler = LoggingHandler(level=logging.INFO, logger_provider=log_provider)
    logging.getLogger().addHandler(otel_handler)
```
With this in place, you can open Application Insights and:
- Expand a trace to see every agent decision step → was the Currency-Converter tool slow, or did the LLM skip it?
- Create an alert when P95 plan-build latency exceeds 500 ms.
- Drill into logs for any span ID to view the exact prompt and response.
We'll explore dashboards and auto-alerts in a dedicated monitoring post, but this wiring gives you live debugging from Day 1.
Conclusion
A reliable agent requires a solid backbone—elastic compute, IaC, observability, and CI/CD. With this stack, you can iterate safely and scale confidently. Questions or feedback? Contact us!
Machine Learning Architects Basel
Machine Learning Architects Basel (MLAB) is part of the Swiss Digital Network. We help customers deploy and scale data & AI products.