Let's Do DevOps

đŸ”„Solving AWS Bedrock's Enterprise Logging Problem: Adding Bot Context to Model Invocation LogsđŸ”„

aka, who is using up all our AI tokens?

Kyler Middleton
Oct 14, 2025

Hey all! Today we’re tackling a problem that’s been driving me crazy since we started scaling our AWS Bedrock bot deployments across the enterprise. You’d think that when you have multiple AI bots running in production, AWS would give you some reasonable way to separate and monitor their activities. You’d be wrong.

Here’s the issue: every single AWS Bedrock model invocation in a region gets dumped into one massive CloudWatch log group. Doesn’t matter if you have five different bots serving different teams, or if Bot A is handling customer support while Bot B is doing internal documentation queries. Everything goes into `/aws/bedrock/modelinvocations` and good luck figuring out which log entry came from which bot.

This creates a monitoring nightmare. When someone asks “how much is our Slack bot costing us?” or “why did the support bot give a weird response yesterday?” you’re stuck grepping through thousands of log entries that all look identical. The raw Bedrock logs contain the model inputs and outputs, but zero context about which application made the call or who the actual user was.

After dealing with this mess for way too long, I finally built a solution that transforms these useless raw logs into something actually meaningful for enterprise monitoring. We’re talking about a Kinesis Firehose pipeline with a custom Lambda processor that parses the Bedrock JSON and adds the context that should have been there in the first place.
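To make the pipeline shape concrete, here’s a minimal sketch of the standard Kinesis Firehose record-transformation contract that any processor Lambda like this has to follow: Firehose hands the function a batch of base64-encoded records, and expects each one back with its `recordId`, a result status, and re-encoded data. This is not the actual Lambda from the gist below, and the enrichment step is deliberately left as a placeholder comment.

```python
import base64
import json

def lambda_handler(event, context):
    """Kinesis Firehose transformation handler: decode each record,
    (optionally) enrich the Bedrock log entry, and re-encode it."""
    out = []
    for record in event["records"]:
        entry = json.loads(base64.b64decode(record["data"]))

        # Enrichment (adding botName, user, query, botOutput) would
        # happen here; this sketch just passes the entry through.

        out.append({
            "recordId": record["recordId"],     # must echo the original id
            "result": "Ok",                     # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(
                (json.dumps(entry) + "\n").encode()
            ).decode(),
        })
    return {"records": out}
```

The trailing newline matters for downstream consumers like Splunk, which typically expect newline-delimited JSON events.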

The enhancement adds four key fields that make all the difference: `botName` (extracted from the IAM role), `botOutput` (what the bot actually said to the user), `user` (who asked the question), and `query` (what they asked). Suddenly, instead of cryptic JSON blobs, you have logs that tell a story about real conversations happening in your enterprise.
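As a rough illustration of how those four fields might be derived, here’s a hedged sketch. The `identity.arn` field is part of Bedrock’s invocation-log schema, but the exact layout of `inputBodyJson`/`outputBodyJson` varies by model, and the “user prefixed onto the prompt” convention below is purely a hypothetical example of how a bot could smuggle user context into the prompt; the real parsing lives in the linked gist.

```python
def extract_bot_fields(entry):
    """Derive botName, user, query, and botOutput from a Bedrock
    model-invocation log entry. Field paths beyond identity.arn are
    assumptions that depend on the model and your bot's prompt format."""
    enriched = dict(entry)

    # botName: the IAM role the bot assumed, e.g.
    # arn:aws:sts::123456789012:assumed-role/slack-support-bot/session
    arn = entry.get("identity", {}).get("arn", "")
    if "assumed-role/" in arn:
        enriched["botName"] = arn.split("assumed-role/")[1].split("/")[0]

    # user/query: assumes the bot formats prompts as "user: question"
    # (hypothetical convention, not part of the Bedrock schema)
    messages = entry.get("input", {}).get("inputBodyJson", {}).get("messages", [])
    if messages:
        prompt = messages[-1].get("content", "")
        user, sep, query = prompt.partition(": ")
        if sep:
            enriched["user"], enriched["query"] = user, query

    # botOutput: the model's reply text (layout again varies by model)
    contents = entry.get("output", {}).get("outputBodyJson", {}).get("content", [])
    if contents:
        enriched["botOutput"] = contents[0].get("text", "")

    return enriched
```

With fields like these in every event, “cost by bot” and “what did the bot tell this user” become simple field filters instead of grep archaeology.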

This approach works for any AWS Bedrock deployment, but it’s especially valuable if you’re running multiple bots or need to track usage by team, project, or individual users. The processed logs flow into whatever monitoring system you’re already using - in our case, Splunk - where they become the foundation for dashboards, alerting, and cost attribution.

Let’s walk through how this whole system works and why it’s become essential for managing AI deployments at enterprise scale.

If you don’t care about the write-up and just want the code, you can find the modified CloudWatch-to-Splunk Lambda that adds the bot information here:

gist.github.com/KyMidd/f2d216682b0c40ca06d74742f1cc56f7

© 2025 Kyler Middleton