Let's Do DevOps

đŸ”„Building Audit Logging for Multi-Platform AI Bots with Python and AWS CloudWatchđŸ”„

aka, who is actually using our bots today?

Kyler Middleton
Nov 25, 2025
∙ Paid

This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!

These articles are supported by readers. Please consider subscribing to support my writing more of these articles <3 :)

Hey all!

We’ve built three AI bots across our organization: VeraSlack, VeraTeams, and VeraResearch. All three run on AWS Bedrock using Claude Sonnet. They answer questions, search knowledge bases, and help employees find information.

When we first deployed these bots, we relied on AWS Bedrock’s built-in logging. Every API call generates a log entry with timestamps and request IDs. Our Lambda functions logged to CloudWatch. We figured we had full visibility. If security needed to audit a conversation, they’d just look at the logs.

The first time our security team asked, “What did Bob from Finance ask the bot last Tuesday?”, we couldn’t answer. We had thousands of Bedrock API log entries, but no way to connect any of them to Bob’s question.

Agentic bots don’t make one API call per user question. They make dozens.

An employee asks: “What’s our process for requesting SSL certificates?” That single question triggers the bot to query the knowledge base, call a reranking service, search Confluence, check PagerDuty, maybe query Jira, synthesize the results with Bedrock, and format a response. That’s fifteen separate API calls across multiple services.

Each API call logs separately. Each log entry is atomic—just a request ID, timestamp, and parameters. No user name. No original question. No conversation context.

To reconstruct what Bob asked, you’d need to find his username in Slack’s event logs, correlate that to a Lambda execution timestamp, connect it to fifteen different Bedrock API calls scattered across log groups, and piece together the conversation.
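To make the correlation gap concrete, here’s a minimal, purely illustrative sketch of what those atomic log entries look like. The service names and parameters are hypothetical, not our actual schema: each entry carries its own request ID and nothing that ties it back to the user or the original question.

```python
import uuid

# Hypothetical illustration: each downstream call logs atomically,
# with its own request ID and no link to the user or the question.
def atomic_log(service, params):
    return {"request_id": str(uuid.uuid4()), "service": service, "params": params}

question = "What's our process for requesting SSL certificates?"

# One user question fans out into many independent, unlinked entries.
calls = [
    atomic_log("bedrock:retrieve", {"kb_id": "kb-placeholder"}),
    atomic_log("rerank", {"top_k": 5}),
    atomic_log("confluence:search", {"query": "SSL certificates"}),
    atomic_log("bedrock:invoke_model", {"model": "claude-sonnet"}),
]

# Nothing in these entries mentions the user or the question,
# so there is nothing to join on when security comes asking.
assert all("user" not in c and "question" not in c for c in calls)
```

Every entry is internally complete but externally anonymous, which is exactly why reconstructing Bob’s conversation means grepping across log groups by timestamp.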

Pro tip: This is awful. I don’t want to spend all my time reading logs.

The problem gets worse with conversational bots. VeraResearch uses Bedrock Agents and can have multi-turn conversations. A five-minute troubleshooting session generates hundreds of log entries. Bedrock logs API requests, but has no concept of “user,” “conversation,” or “session.”

We needed to answer basic questions: What is Bob asking? What information are we providing about sensitive topics? Which bot is being used most? Bedrock’s logging couldn’t answer these.

So we built our own audit logging system. One that captures user identity, the original question, conversation context, the bot’s response, and ties it together with a single session ID.
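The shape of such an audit record can be sketched in a few lines. This is a minimal illustration of the idea (field names like `session_id` and `platform` are my placeholders, not the exact schema from the article): build one structured JSON record per conversation turn, with everything you need to answer “what did Bob ask?” in a single entry. In a Lambda function, printing a JSON line lands it in the function’s CloudWatch log group, where it can be queried with CloudWatch Logs Insights.

```python
import json
import time
import uuid

def build_audit_record(user, platform, question, response, context=None,
                       session_id=None):
    """Build one structured audit record for a single conversation turn.

    Reusing the same session_id across turns ties a multi-turn
    conversation together. Field names are illustrative placeholders.
    """
    return {
        "session_id": session_id or str(uuid.uuid4()),
        "timestamp_ms": int(time.time() * 1000),
        "platform": platform,       # e.g. "slack" or "teams"
        "user": user,               # resolved identity, not a raw platform ID
        "question": question,       # the original user question, verbatim
        "response": response,       # the bot's final answer
        "context": context or [],   # prior turns in this session, if any
    }

record = build_audit_record(
    user="bob@example.com",
    platform="slack",
    question="What's our process for requesting SSL certificates?",
    response="Open a request with the PKI team via the cert portal...",
)

# In a Lambda handler, a printed JSON line goes straight to CloudWatch,
# queryable later by user, question text, or session ID.
print(json.dumps(record))
```

With one record per turn keyed on `session_id`, the “find Bob’s question from last Tuesday” query becomes a single filter instead of a cross-log-group archaeology project.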

In this article, I’ll walk you through why platform logging fails for agentic bots, how we designed an audit system across Slack and Teams, and the platform-specific challenges we solved.
