Building Better GenAI: Pre-Contextualizing Knowledge Base Results
aka, the knowledge base would really like to know this
This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!
These articles are supported by readers - please consider subscribing to support me writing more of them <3 :)
Hey all!
I'm calling my "Building a Slack Bot with AI Capabilities" series done, but I'm not done improving and building GenAI tools. This article will continue building the maturity of GenAI by establishing patterns that make these tools more powerful.
When you ask Vera, the bot we've built over the past 9 entries in the Building... series, a question, a LOT of stuff happens in the background. It's mostly invisible to you, but it:
Walks the Slack thread to construct a Bedrock-compatible conversation
Flattens the conversation to a string-ified convo, then queries the Bedrock Knowledge Base (which is really OpenSearch in a trenchcoat) for 25-50 relevant vectorized results
Uses the reranker() model to match results to your question and improve the fidelity of the results
Queries the actual model for an answer, including all the compiled context
And it generally does a very good job, particularly for "chat with the knowledge base" style queries. However, that's not always what we want. Sometimes we have highly structured data and we want to standardize or shim in other contextual data derived from what the user has entered before we query the knowledge base.
I've been playing around with adding an additional step, highlighted in the picture below, that derives structured data from the unstructured information the user provides, to help the knowledge base return better results. It changes the pattern to look like this (the new step is the second one; a sketch of the full flow follows the list):
Walks the Slack thread to construct a Bedrock-compatible conversation
Queries a model with instructions on how to standardize the data and what keywords would be needed to return high-quality info, ephemerally stores that answer, and provides it to the knowledge base dip
Flattens the conversation to a string-ified convo, then queries the Bedrock Knowledge Base (which is really OpenSearch in a trenchcoat) for 25-50 relevant vectorized results
Uses the reranker() model to match results to your question and improve the fidelity of the results
Queries the actual model for an answer, including all the compiled context
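Sketched as code, the new flow might look something like this - purely illustrative, with hypothetical helper names rather than the real functions in the codebase:

# Illustrative sketch of the new flow - helper names are hypothetical
def answer_question(client, body, bedrock_client):
    # 1. Walk the Slack thread into a Bedrock-compatible conversation
    conversation = walk_slack_thread(client, body)

    # 2. NEW: one extra model turn that standardizes the user's input and
    #    derives keywords, appended to the conversation before the KB dip
    conversation = derive_initial_context(bedrock_client, conversation)

    # 3. Flatten to a string and fetch 25-50 vectorized results from the KB
    flat_query = flatten_conversation(conversation)
    results = query_knowledge_base(flat_query)

    # 4. Rerank the results against the question to improve fidelity
    best_results = rerank(flat_query, results)

    # 5. Query the actual model for an answer, with all the compiled context
    return ai_request(bedrock_client, conversation, best_results)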
This sounds really abstract, so let's add an example. I'm building "NetBot", a tool to help answer the "is it the network?" type questions engineers ask all the time. This tool is trained both on our large general data store (Confluence) and on highly structured data (the Cisco IOS and ASA configs that filter traffic across our network).
When folks ask "does my VPN permit me to host 1.2.3.4", the knowledge base fails terribly - most network device rules don't permit traffic to a single host; they permit it to a CIDR, or a whole VPC/vNet. Knowledge base fetching using RAG generally isn't smart enough to figure that out, so we have a model with system prompt instructions to:
Assistant should find any specific (/32) IPs in the thread, and specify the Class B and Class C CIDRs for the network.
Assistant should provide no other information.
If there are no IPs, assistant must not respond at all.
With that context, the NetBot knowledge base results are markedly more related to the question, since they now include rules that would permit or deny the flow but don't contain an exact keyword match.
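To make the derivation concrete, here's what the model is being asked to compute, shown with Python's ipaddress module (runnable on its own, independent of the bot):

import ipaddress

# A specific (/32) host IP found in the thread
ip = ipaddress.ip_address("1.2.3.4")

# The Class B (/16) and Class C (/24) networks that contain it -
# these are the CIDRs a firewall rule is likely written against
class_b = ipaddress.ip_network(f"{ip}/16", strict=False)
class_c = ipaddress.ip_network(f"{ip}/24", strict=False)

print(class_b, class_c)  # 1.2.0.0/16 1.2.3.0/24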
Notably, NetBot isn't finished, and I may transition the whole thing to agentic skills (via MCP?) in order to get it working at the level I want. However, this pattern has proven useful for all sorts of other use cases, so I wanted to share how it might improve your AI architectures.
Let's walk through how this is implemented - I wrote it all out so you can easily steal, er, borrow it!
Implementation - Constants
Let's walk through the code. If you want to follow along, this is the codebase:
First, the constants. We add a feature flag on line 2 so we can easily enable or disable this for testing, even after a code update.
Then on line 3, we add a status message we'll send back to the user. It's only shown while we're on this initial step, which lasts just a second or two.
Then on line 4, a heredoc that holds the instructions. I haven't included mine here, but yours could tell the model which keywords to populate or what information to derive (a NetBot-flavored example follows the code block).
# Initial context step
enable_initial_model_context_step = False
initial_model_user_status_message = "Adding additional context :waiting:"
initial_model_context_instructions = f"""
Assistant should...
"""
Implementation - Context Conversation
Within the handle_message_event function, which is the master function that handles all message events for this particular Slack bot, we have a feature flag check on line 4. If our context step is enabled, we follow this path; if not, we skip it entirely.
On line 5, we call our function to update the Slack response - this is often the first response to the user, but that doesn't matter, since this function is written idempotently. For more details, you can read here.
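If you haven't read that article, a minimal sketch of an idempotent updater, assuming the standard Slack SDK calls, looks something like this - post the status message once, then edit it in place on every later call:

def update_slack_response(say, client, message_ts, channel_id, thread_ts, text):
    # Sketch only: edit the existing status message if we've posted one,
    # otherwise post a new one and remember its timestamp
    if message_ts:
        client.chat_update(channel=channel_id, ts=message_ts, text=text)
    else:
        response = say(text=text, thread_ts=thread_ts)
        message_ts = response["ts"]
    return message_ts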
We take our conversation so far - the Slack thread read as a conversational context - and append the system prompt instructions as a user message.
def handle_message_event(client, body, say, bedrock_client, app, token, registered_bot_id):
    # ...
    # Before we fetch the knowledge base, do an initial turn with the AI to add context
    if enable_initial_model_context_step:
        message_ts = update_slack_response(
            say, client, message_ts, channel_id, thread_ts,
            initial_model_user_status_message,
        )
        # Append to conversation
        conversation.append(
            {
                "role": "user",
                "content": [
                    {
                        "text": initial_model_context_instructions,
                    }
                ],
            }
        )
After that, we need to send the conversation to the model to get the context.
On line 4, we request a response. Note the very last flag - a new one - that tells ai_request to return a plain text response to this function invocation for this particular request, rather than streaming the response back to Slack.
Then on line 7, we append the response to the conversation as assistant text.
def handle_message_event(client, body, say, bedrock_client, app, token, registered_bot_id):
    # ...
    # Ask the AI for a response
    ai_response = ai_request(bedrock_client, conversation, say, thread_ts, client, message_ts, channel_id, False)

    # Append to conversation
    conversation.append(
        {
            "role": "assistant",
            "content": [
                {
                    "text": ai_response,
                }
            ],
        }
    )
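For reference, here's a minimal sketch of what the non-streaming branch inside ai_request might look like, assuming the boto3 Converse API and a model_id defined with the other constants:

def ai_request(bedrock_client, conversation, say, thread_ts, client, message_ts, channel_id, stream=True):
    # Sketch only: when stream is False, make one blocking call and return
    # the text to the caller instead of streaming tokens back to Slack
    if not stream:
        response = bedrock_client.converse(
            modelId=model_id,  # assumed constant, e.g. an Anthropic model ID
            messages=conversation,
        )
        return response["output"]["message"]["content"][0]["text"]
    # ... streaming path (bedrock_client.converse_stream) elided ...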
What Happens Next
Immediately after this, we flatten the conversation and send it to the knowledge base to get keyword-based responses, which we then rerank to keep the most relevant ones.
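That flatten-and-fetch step, sketched with the boto3 bedrock-agent-runtime retrieve call (the knowledge_base_id and the result count are illustrative):

def fetch_knowledge_base_results(agent_client, conversation, knowledge_base_id):
    # Flatten the structured conversation into one string-ified query
    flat_query = "\n".join(
        block["text"]
        for message in conversation
        for block in message["content"]
    )
    # Vector search against the knowledge base (OpenSearch in a trenchcoat)
    response = agent_client.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": flat_query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": 50}
        },
    )
    # These results then go to the reranker (not shown here)
    return response["retrievalResults"]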
Hopefully, with this additional context information, the responses from the knowledge base are more relevant, particularly for Cisco ASA Access Control Lists and other structured data you might be querying.
Note that this contextual data is never returned to the user, nor is it retained anywhere in state. It's simply generated, used to help the knowledge base, then thrown away.
Summary
In this article we talked about an additional step I'm playing with to "pre-contextualize" user input and make knowledge base responses more relevant.
We created some constants - a feature flag, a status message, and instructions for the pre-contextualizer step. Then we walked through how I've implemented the logic, including a call-and-response option for ai_request, rather than streaming tokens back to Slack like we always did before.
I'm not entirely sure this is the pattern I'll end up using - it's possible I'm trying to implement partial agentic functionality in a non-agentic way. I'm certainly not happy with the fidelity of the answers I'm able to get yet, and I see issues where, say, a user provides the name of a host, which needs to be mapped to an IP (from the Confluence knowledge base, or by querying the cloud infrastructure where it's housed), and then mapped to a Class B or Class C CIDR.
That's hard to predict, and not a solved problem yet. As I figure this out I'll report back on how to improve this process. I'll also publish all the code so y'all can use it, too.
Thanks all! Good luck out there.
kyler