🔥 Solving AWS Bedrock's Enterprise Logging Problem: Adding Bot Context to Model Invocation Logs 🔥
aka, who is using up all our AI tokens?
Hey all! Today we're tackling a problem that's been driving me crazy since we started scaling our AWS Bedrock bot deployments across the enterprise. You'd think that when you have multiple AI bots running in production, AWS would give you some reasonable way to separate and monitor their activities. You'd be wrong.
Here's the issue: every single AWS Bedrock model invocation in a region gets dumped into one massive CloudWatch log group. Doesn't matter if you have five different bots serving different teams, or if Bot A is handling customer support while Bot B is doing internal documentation queries. Everything goes into `/aws/bedrock/modelinvocations` and good luck figuring out which log entry came from which bot.
This creates a monitoring nightmare. When someone asks "how much is our Slack bot costing us?" or "why did the support bot give a weird response yesterday?" you're stuck grepping through thousands of log entries that all look identical. The raw Bedrock logs contain the model inputs and outputs, but zero context about which application made the call or who the actual user was.
After dealing with this mess for way too long, I finally built a solution that transforms these useless raw logs into something actually meaningful for enterprise monitoring. We're talking about a Kinesis Firehose pipeline with a custom Lambda processor that parses the Bedrock JSON and adds the context that should have been there in the first place.
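To make that concrete, here's a trimmed-down sketch of what the processor Lambda's skeleton could look like (Python). The envelope handling follows the standard CloudWatch Logs subscription format - gzipped, base64-encoded batches of log events - and the `enrich_bedrock_event` helper it calls is sketched in the next section. Treat this as a starting point, not the exact production code:

```python
import base64
import gzip
import json


def lambda_handler(event, context):
    """Firehose transformation Lambda: unwrap the CloudWatch Logs envelope,
    enrich each Bedrock invocation log, and re-emit it for the Splunk sink."""
    output = []
    for record in event["records"]:
        # CloudWatch Logs subscription payloads arrive gzipped and base64-encoded
        payload = json.loads(gzip.decompress(base64.b64decode(record["data"])))

        # CONTROL_MESSAGE handshakes carry no log events; drop them
        if payload.get("messageType") != "DATA_MESSAGE":
            output.append({"recordId": record["recordId"], "result": "Dropped"})
            continue

        # Each logEvent's message is one raw Bedrock invocation log (JSON)
        enriched = [
            enrich_bedrock_event(json.loads(e["message"]))
            for e in payload["logEvents"]
        ]
        data = "\n".join(json.dumps(e) for e in enriched)
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(data.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```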
The enhancement adds four key fields that make all the difference: `botName` (extracted from the IAM role), `botOutput` (what the bot actually said to the user), `user` (who asked the question), and `query` (what they asked). Suddenly, instead of cryptic JSON blobs, you have logs that tell a story about real conversations happening in your enterprise.
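The extraction itself is the interesting part. Here's a hedged sketch of how those four fields could be pulled out of a raw invocation log entry. The `identity.arn`, `input.inputBodyJson`, and `output.outputBodyJson` fields come from Bedrock's invocation log schema (with text data logging enabled), but the conventions here - bot name embedded in the IAM role name, user passed as the STS session name, Anthropic-style messages format - are assumptions you'd adjust to match your own bots:

```python
def enrich_bedrock_event(log: dict) -> dict:
    """Add botName, user, query, and botOutput to a raw Bedrock
    model-invocation log entry (hypothetical field layout)."""
    # Assumed-role ARNs look like:
    # arn:aws:sts::123456789012:assumed-role/<role-name>/<session-name>
    # We name each bot's role after the bot, so the role name IS the bot name.
    arn = log.get("identity", {}).get("arn", "")
    parts = arn.split("/")
    log["botName"] = parts[1] if len(parts) >= 2 else "unknown"

    # Convention (not enforced by Bedrock): bots pass the human user
    # as the STS session name, so it shows up as the last ARN segment.
    log["user"] = parts[2] if len(parts) >= 3 else "unknown"

    # Pull the last user message out of the request body...
    input_body = log.get("input", {}).get("inputBodyJson", {})
    user_msgs = [
        m for m in input_body.get("messages", []) if m.get("role") == "user"
    ]
    log["query"] = extract_text(user_msgs[-1]) if user_msgs else ""

    # ...and the model's reply out of the response body.
    output_body = log.get("output", {}).get("outputBodyJson", {})
    log["botOutput"] = extract_text(output_body)
    return log


def extract_text(message: dict) -> str:
    """Flatten a messages-API content list (or plain string) into text."""
    content = message.get("content", "")
    if isinstance(content, str):
        return content
    return " ".join(
        block.get("text", "") for block in content if isinstance(block, dict)
    )
```

One nice property of pulling `botName` and `user` from the caller's ARN: the bots themselves don't need any code changes, as long as they follow the role- and session-naming convention.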
This approach works for any AWS Bedrock deployment, but it's especially valuable if you're running multiple bots or need to track usage by team, project, or individual users. The processed logs flow into whatever monitoring system you're already using - in our case, Splunk - where they become the foundation for dashboards, alerting, and cost attribution.
Let's walk through how this whole system works and why it's become essential for managing AI deployments at enterprise scale.
If you don't care about the write-up and just want the code, you can find the modified CloudWatch-to-Splunk Lambda that adds bot information here: