🔥Building Better GenAI: Pre-Contextualizing Knowledge Base Results🔥
aka, the knowledge base would really like to know this
This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!
These articles are supported by readers, please consider subscribing to support me writing more of these articles <3 :)
Hey all!
I’m calling my “Building a Slack Bot with AI Capabilities” series done, but I’m not done improving and building GenAI tools. This article will continue building the maturity of GenAI by establishing patterns that make these tools more powerful.
When you ask Vera, the bot we’ve built over the past 9 entries in the Building… series, a question, a LOT of stuff happens in the background. It’s mostly invisible to you, but the bot:
Walks the Slack thread to construct a Bedrock-compatible conversation
Flattens the conversation into a string-ified convo, then queries the Bedrock Knowledge Base (which is really OpenSearch in a trenchcoat) for 25-50 relevant vectorized results
Uses the reranker() model to match results to your question and improve the fidelity of the results
Queries the actual model for an answer, including all the compiled context
And it generally does a very good job, particularly for “chat with the knowledge base” style queries. However, that’s not always what we want. Sometimes we have highly structured data, and we want to standardize the user’s input, or shim in other contextual data derived from it, before we query the knowledge base.
I’ve been playing around with adding an additional step, highlighted in the picture below, that derives structured data from the unstructured information the user provides, to help the knowledge base return better results. It changes the pattern to look like this - the second step is the new one, and there’s a code sketch of the whole flow just after the list:
Walks the Slack thread to construct a Bedrock-compatible conversation
Queries a model with instructions on how to standardize the data and which keywords would help return high-quality info, then ephemerally stores that answer and provides it to the knowledge base dip
Flattens the conversation into a string-ified convo, then queries the Bedrock Knowledge Base (which is really OpenSearch in a trenchcoat) for 25-50 relevant vectorized results
Uses the reranker() model to match results to your question and improve the fidelity of the results
Queries the actual model for an answer, including all the compiled context
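If it helps to see that new flow as code, here’s a minimal sketch of the pattern. Everything here is illustrative - the function names and signatures are placeholders rather than the real codebase, and error handling is omitted:

def answer_question(slack_thread, bedrock_client):
    # 1. Walk the Slack thread into a Bedrock-compatible conversation
    conversation = build_conversation_from_thread(slack_thread)

    # 2. NEW: ask a model to derive structured context (keywords, CIDRs, etc.)
    #    and keep it in the conversation only for this request
    conversation.append({"role": "user", "content": [{"text": initial_model_context_instructions}]})
    derived_context = ai_request_text_only(bedrock_client, conversation)
    conversation.append({"role": "assistant", "content": [{"text": derived_context}]})

    # 3. Flatten the conversation and query the knowledge base for 25-50 results
    flat_query = flatten_conversation(conversation)
    candidates = query_knowledge_base(flat_query, num_results=50)

    # 4. Rerank the candidates against the question to improve fidelity
    top_results = rerank(flat_query, candidates)

    # 5. Query the actual model for the final answer, with all compiled context
    return ai_request_streaming(bedrock_client, conversation, top_results)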
This sounds really abstract, so let’s add an example. I’m building “NetBot”, a tool to help answer the “is it the network?” type questions engineers ask all the time. This tool is trained both on our large general data store (Confluence) and specifically on highly structured data (the IOS and ASA configs that filter traffic across our network).
When folks ask “does my VPN permit me to host 1.2.3.4?”, the knowledge base fails terribly - some/most network device rules don’t permit traffic to a host, they permit it to a CIDR, or to a whole VPC/vNet. Knowledge base fetching using RAG generally isn’t smart enough to figure that out, so we have a model with system prompt instructions to:
Assistant should find any specific (/32) IPs in the thread, and specify the Class B and Class C CIDRs for the network.
Assistant should provide no other information.
If there are no IPs, assistant must not respond at all.
With that context, the NetBot knowledge base results are markedly more related to the question, since they are now able to find rules that would permit or deny the flow, but which don’t have an exact keyword match.
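To make that expansion concrete: given a /32 host, the model is being asked to produce the surrounding /16 and /24 networks. Here’s the same mapping done deterministically with Python’s ipaddress module, purely to illustrate what the prompt asks for (the bot derives this with the model, not with code):

import ipaddress

def expand_host_to_cidrs(ip: str) -> dict:
    """Given a /32 host IP, return the Class B (/16) and Class C (/24)
    networks it falls in - the same mapping the system prompt asks the
    model to derive from the user's message."""
    addr = ipaddress.ip_address(ip)
    return {
        "host": f"{addr}/32",
        "class_c": str(ipaddress.ip_network(f"{addr}/24", strict=False)),
        "class_b": str(ipaddress.ip_network(f"{addr}/16", strict=False)),
    }

print(expand_host_to_cidrs("1.2.3.4"))
# {'host': '1.2.3.4/32', 'class_c': '1.2.3.0/24', 'class_b': '1.2.0.0/16'}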
Notably, NetBot isn’t finished, and I may transition the whole thing to agentic skills (via MCP?) in order to get it working to the level I want. However, this pattern has proven useful for all sorts of other use cases, so I wanted to share how it might improve your AI architectures.
Let’s walk through how this is implemented - I wrote it all out so you can easily steal, er, borrow it!
Implementation - Constants
Let’s walk through the code. If you want to follow along, I’ll be walking through this codebase:
First, the constants. We add a feature flag on line 2, so we can easily enable or disable this step for testing, even after a code update.
Then on line 3, we add a status message we’ll send back to the user. It’s only shown while we’re on this initial step, which lasts just a second or two.
Then on line 4, a heredoc that holds the instructions. I haven’t included mine here, but this is where you tell the model which keywords or derived information to populate - there’s an example just after the block below.
# Initial context step
enable_initial_model_context_step = False
initial_model_user_status_message = "Adding additional context :waiting:"
initial_model_context_instructions = f"""
Assistant should...
"""
Implementation - Context Conversation
Within the handle_message_event function, which is a master function that handles all message events for this particular Slack bot, we have a feature flag check on line 4. If we have our context step enabled, we follow this path. If not, we skip it entirely.
On line 5, we call our function to update the Slack response - this is often the first response to the user, but that doesn’t matter, since the function is written idempotently (there’s a sketch of that pattern after the code block). For more details, you can read here.
We take our initial conversation - so far just the Slack conversational context (the Slack thread read as a conversation) - and append the system prompt instructions as a user text block.
def handle_message_event(client, body, say, bedrock_client, app, token, registered_bot_id):
    # ...
    # Before we fetch the knowledge base, do an initial turn with the AI to add context
    if enable_initial_model_context_step:
        message_ts = update_slack_response(
            say, client, message_ts, channel_id, thread_ts,
            initial_model_user_status_message,
        )
        # Append to conversation
        conversation.append(
            {
                "role": "user",
                "content": [
                    {
                        "text": initial_model_context_instructions,
                    }
                ],
            }
        )
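As an aside, the idempotent behavior of update_slack_response boils down to “post once, then edit in place.” Here’s a minimal sketch of that pattern using the Slack Bolt say helper and the Web API chat_update call - a simplification, not the real helper:

def update_slack_response(say, client, message_ts, channel_id, thread_ts, text):
    """Post a status message the first time, edit it in place on every
    subsequent call. Simplified sketch - the real helper handles more."""
    if message_ts is None:
        # First update: post a new threaded message and remember its timestamp
        result = say(text=text, thread_ts=thread_ts)
        return result["ts"]
    # Later updates: edit the existing message rather than posting again
    client.chat_update(channel=channel_id, ts=message_ts, text=text)
    return message_ts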
After that, we need to send the conversation to the model to get the context.
On line 4, we request a response. Note the very last argument, a new flag, which says that for this particular request we just want the text response returned to this function invocation, not a streaming response back to Slack.
Then on line 7, we append that response to the conversation as an assistant text block.
def handle_message_event(client, body, say, bedrock_client, app, token, registered_bot_id):
    # ...
    # Ask the AI for a response
    ai_response = ai_request(bedrock_client, conversation, say, thread_ts, client, message_ts, channel_id, False)
    # Append to conversation
    conversation.append(
        {
            "role": "assistant",
            "content": [
                {
                    "text": ai_response,
                }
            ],
        }
    )
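For the curious, the shape of ai_request with that final flag looks roughly like the sketch below. This builds on the Bedrock Converse APIs (converse for a one-shot response, converse_stream for streaming), but the parameter names, model ID, and streaming loop are my illustration rather than the exact function from the repo:

def ai_request(bedrock_client, conversation, say, thread_ts, client,
               message_ts, channel_id, stream_to_slack=True):
    """Ask Bedrock for a response. When stream_to_slack is False, return the
    text to the caller instead of streaming tokens back to the Slack thread.
    Sketch only - the model ID and parameter names are illustrative."""
    if not stream_to_slack:
        # One-shot request: the caller (the pre-context step) wants the text back
        response = bedrock_client.converse(
            modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
            messages=conversation,
        )
        return response["output"]["message"]["content"][0]["text"]

    # Streaming request: push tokens back into the Slack thread as they arrive
    response = bedrock_client.converse_stream(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        messages=conversation,
    )
    buffer = ""
    for event in response["stream"]:
        if "contentBlockDelta" in event:
            buffer += event["contentBlockDelta"]["delta"]["text"]
            message_ts = update_slack_response(
                say, client, message_ts, channel_id, thread_ts, buffer,
            )
    return buffer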
What Happens Next
Immediately after this, we flatten the conversation and send it to the knowledge base to fetch candidate results, which we then rerank to keep the most related ones.
Hopefully, with this additional context information, the responses from the knowledge base are more relevant, particularly for Cisco ASA Access Control Lists and other structured data that you might be querying.
Note that this contextual data is never returned to the user, nor is it retained anywhere in state. It’s simply generated, and used to help the knowledge base, then thrown away.
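If you’re curious what the knowledge base dip looks like in boto3 terms, it’s roughly the call below - a sketch using the bedrock-agent-runtime retrieve API, with a placeholder knowledge base ID, and with the rerank pass left out for brevity:

import boto3

def query_knowledge_base(flattened_conversation: str, num_results: int = 50) -> list:
    """Fetch candidate chunks from the Bedrock Knowledge Base for the
    flattened conversation (including the pre-context turn). Sketch only -
    the knowledge base ID is a placeholder."""
    agent_runtime = boto3.client("bedrock-agent-runtime")
    response = agent_runtime.retrieve(
        knowledgeBaseId="YOUR_KB_ID",
        retrievalQuery={"text": flattened_conversation},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": num_results}
        },
    )
    # Each result carries the chunk text plus a relevance score; the rerank
    # step then reorders these against the user's actual question.
    return response["retrievalResults"]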
Summary
In this article we talked about an additional step I’m playing with to “pre-contextualize” user input and make knowledge base responses more relevant.
We created some constants, including a feature flag, a status message, and instructions for the pre-contextualizer step. Then we walked through how I’ve implemented the logic, including a call-and-response option for ai_request rather than streaming tokens back to Slack like we always did before.
I’m not entirely sure this is the pattern I’ll stick with - it’s possible I’m trying to implement partial agentic functionality in a non-agentic way. I’m certainly not happy with the fidelity of the answers I’m able to get yet, and I see gaps: say a user provides the name of a host, which needs to be mapped to an IP (from the Confluence knowledge base, or by querying the cloud infrastructure where it’s housed), which then needs to be mapped to a Class B or Class C CIDR.
That’s hard to predict, and not a solved problem yet. As I figure this out I’ll report back on how to improve this process. I’ll also publish all the code so y’all can use it, too.
Thanks all! Good luck out there.
kyler