
🔥Building a Slack Bot with AI Capabilities - Part 8, ReRanking Bedrock Knowledge Base Vectors to Improve Result Relevancy🔥
aka, just give me those GOOD vectors please
This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!
These articles are supported by readers; please consider subscribing to support me writing more of these articles <3 :)
This article is part of a series of articles, because 1 article would be absolutely massive.
Part 1: Covers how to build a Slack bot in WebSocket mode and connect to it with Python
Part 4: How to convert your local script to an event-driven, serverless, cloud-based app in AWS Lambda
Part 7: Streaming token responses from AWS Bedrock to your AI Slack bot using converse_stream()
Part 8 (this article!): ReRanking knowledge base responses to improve AI model response efficacy
Part 9: Adding a Lambda Receiver tier to reduce cost and improve Slack response time
Hey all!
So far in this series we’ve created a Slack app, connected it to a tiered Python-based Lambda system, and built all the logic to relay all that sweet sweet Slack context over to the Bedrock AI.
It’s all serverless, we’re using a RAG knowledge base against Confluence to make sure our bot has memorized all our internal enterprise information, and we’re even streaming tokens from Bedrock to start responding right away (and be as cool as platforms like ChatGPT).
We’re even using converse(), AWS’s front-end API that maps our conversation to each model’s requested API format, so we can easily swap between different AI models.
In this article, we’re going to talk about ReRanking. ReRanking is an underrated technology. It reads your input question/conversation and all your knowledge base results, and gives you back a list of the best ones, with their relevancy score added as metadata. It takes only about a quarter of a second to run on 50 results, the most I’ve scaled it to.
And it improves the accuracy and relevancy of your bot’s answers DRAMATICALLY
Seriously, it’s so much better.
I have a mentor at work who used this analogy:
Your knowledge base using Retrieve() is like a librarian scooping up books off a shelf that are near to your keywords. Are those books related to your question? No idea, but they look similar!
A ReRanker reads all the books the librarian picked out, hands back just the ones that are most related to your question, and tells you which ones to read first.
Empowering your model with weighted responses, and skipping the text vectors that are least likely to answer the question your users are asking, is INCREDIBLY IMPACTFUL.
You get it. You’ve read this far, and you’re ready to improve the fidelity of your knowledge base results. All for the low low cost of almost no time, and almost no money. Let’s do this.
Getting the ReRanker
First of all, we need to get access. Head to your target AWS account, in your target region, to the Bedrock service, and find Providers → Amazon → Rerank 1.0. If you’re reading this later and there’s a newer version, use that.
If you see “Access granted” like below, you’re good to go. If not, you need to request access (or work with your DevOps/platform team to request access).
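If you’d rather sanity-check from code, here’s a minimal sketch using boto3’s Bedrock control-plane client to confirm a rerank model exists in your region (the region value is my assumption - use yours):

# Optional sanity check: list Bedrock foundation models and look for a reranker.
# The region here is an assumption - swap in your own.
import boto3

bedrock = boto3.client("bedrock", region_name="us-west-2")

rerank_model_ids = [
    model["modelId"]
    for model in bedrock.list_foundation_models()["modelSummaries"]
    if "rerank" in model["modelId"].lower()
]
print(rerank_model_ids)  # Expect something like ['amazon.rerank-v1:0']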
Implementing ReRanking
Let’s go to our Python script. As of writing, we’re working in the script here.
First of all, on line 2, we previously had “5” knowledge base results. That’s because if we go much higher than that, we end up overloading our model with unrelated information, or flat-out just TOO MUCH information. However, we’re going to use the ReRanker to filter down to truly relevant information after the KB dip, so let’s crank it WAY up to 50.
The max the API currently supports is 100, by the way, and I’m reasonably confident we could go that high for slightly slower, slightly more accurate results.
Next, some new keys. On line 5, we have a feature flag to enable/disable the ReRanker if you’re not happy with it. Then on line 6, the number of results we should filter down to. To be clear, that’ll be a subset of the results the knowledge base gives it - it’ll pick the best `n` (in this case 5), and score them.
On line 7, the model we’ll use for ReRanking. I’m not aware of the specific names of other reranking models, but reranking is a standard task models are trained for - feel free to mess with them! We’ll use Amazon’s own rerank-v1:0.
# Knowledge base information
knowledgeBaseContextNumberOfResults = 50

# Rerank configuration
enable_rerank = True
rerank_number_of_results = 5
rerank_model_id = "amazon.rerank-v1:0"
Double-Checking the Knowledge Base
Here’s a partial snippet of our ask_bedrock_llm_with_knowledge_base() function. Previously we just did a KB dip and returned the information. However, we need to be a little more structured here since we’re going to be manipulating the knowledge base results.
On line 4, we’re going to be building a list of maps of knowledge base results, each including its text and URL. Those fields are always populated by the knowledge base for Confluence.
If you use data sources other than Confluence, it looks like you’ll need to update this logic.
On line 10, we check if enable_rerank is True - this is our feature flag. If not, we skip this logic entirely and just return the knowledge base responses as structured above.
If it is, we call our rerank_text() function - that’s entirely new, and we’ll cover it next. In order to call it, it needs our flat_conversation (the question we’re asking), the kb_responses (the current knowledge base array), and the bedrock_client (so it can make an authenticated call to the ReRanker API).
We capture the response (line 12) as kb_responses, overwriting what was provided by the knowledge base dip.
# Function to retrieve info from RAG with knowledge base
def ask_bedrock_llm_with_knowledge_base(flat_conversation, knowledge_base_id, bedrock_client) -> str:
    # Structure response
    kb_responses = [
        {
            "text": result['content']['text'],
            "url": result['location']['confluenceLocation']['url']
        } for result in kb_response['retrievalResults']
    ]
    if enable_rerank:
        # Rerank the knowledge base results
        kb_responses = rerank_text(
            flat_conversation,
            kb_responses,
            bedrock_client
        )
    return kb_responses
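One quick note: the `kb_response` used above comes from the KB dip itself, which is elided from this partial snippet. If you’re rebuilding from scratch, that call looks roughly like the sketch below - `bedrock_agent_client` being a boto3 "bedrock-agent-runtime" client is my assumption, so check the repo for the exact client and variable names:

# A sketch of the elided KB dip that produces kb_response above.
# bedrock_agent_client is assumed to be a boto3 "bedrock-agent-runtime" client.
kb_response = bedrock_agent_client.retrieve(
    knowledgeBaseId=knowledge_base_id,
    retrievalQuery={
        "text": flat_conversation
    },
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": knowledgeBaseContextNumberOfResults
        }
    }
)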
ReRank the Vectors
Now let’s actually write our reranking function. We receive the flat_conversation (context/question), the kb_responses array, and the bedrock client.
The kb_responses data currently looks like lines 4-14 (but with real data, obviously).
# Reranking knowledge base results
def rerank_text(flat_conversation, kb_responses, bedrock_client):

    # Data looks like this:
    # [
    #     {
    #         "text": "text",
    #         "url": "url",
    #     },
    #     {
    #         "text": "text",
    #         "url": "url",
    #     }
    # ]
Now, the reranker doesn’t want all that structure from above. It doesn’t care about the “text” key name, or the url at all - it doesn’t use that information. It just wants that sweet sweet potential response text.
So first of all, we build a new list with just the text it wants. We initialize on line 2, and then iterate over every map in the array to extract the text.
Then we flatten it all, on line 11.
Then on line 14, we construct the body the reranker is looking for - the query (question/conversation context), the documents (the knowledge base array of text), and the top_n (the number of results you want it to provide back).
You can totally have it score every single knowledge base entry, but winnowing down responses from a broad knowledge base dip is a really powerful way to concentrate the AI’s response and speed up the larger model’s response time.
Plus, if you’re not going to append all those results to the AI model’s context, then why get them scored and sent back anyway?
    # Format kb_responses into a list of sources
    kb_responses_text = []
    for kb_response in kb_responses:
        kb_responses_text.append(
            [
                kb_response['text']
            ]
        )

    # Flatten
    kb_responses_text = [item[0] for item in kb_responses_text]

    # Construct body
    body = json.dumps(
        {
            "query": flat_conversation,
            "documents": kb_responses_text,
            "top_n": rerank_number_of_results,
        }
    )
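As an aside, the append-then-flatten dance above could be collapsed into a single list comprehension if you prefer - an equivalent sketch:

    # Equivalent to the append-and-flatten above: pull each text straight into a flat list
    kb_responses_text = [kb_response["text"] for kb_response in kb_responses]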
Next, we invoke our reranker model and send it everything it requires.
    # Fetch ranks
    rank_response = bedrock_client.invoke_model(
        modelId=rerank_model_id,
        accept="application/json",
        contentType="application/json",
        body=body,
    )
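If you’re worried about this call failing at runtime (throttling, a mistyped model ID), it may be worth a defensive wrapper. Here’s a sketch - falling back to the un-reranked results is my assumption, not the repo’s actual behavior:

# Optional defensive wrapper: fall back to un-reranked results on failure.
# The fallback behavior here is an assumption, not the repo's actual code.
from botocore.exceptions import ClientError

try:
    rank_response = bedrock_client.invoke_model(
        modelId=rerank_model_id,
        accept="application/json",
        contentType="application/json",
        body=body,
    )
except ClientError as error:
    print(f"Rerank call failed, returning un-reranked results: {error}")
    return kb_responses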
We load the response in and decode it, and we can see the response looks like lines 7-16. Note that none of this is the actual text that was sent - it simply gives you each item’s index in the array you sent, plus a relevance score.
The results are notably sorted by score, highest first, which is helpful.
    # Decode response
    rank_response_body = json.loads(
        rank_response['body'].read().decode()
    )

    # The "results" key of the response looks like this:
    # [
    #     {
    #         "index": 9,
    #         "relevance_score": 0.9438672242987702
    #     },
    #     {
    #         "index": 0,
    #         "relevance_score": 0.9343951625409306
    #     }
    # ]
This isn’t very many lines of code, but there’s a lot going on here. Thus far, we have our knowledge base responses (the full array of information from the knowledge base) and the rank_response_body results - the reranker’s list of array indices and their scores.
We want to walk through the results from the reranker, so that’s what we do with the `for` clause on line 9.
For every item, we look up the text and URL of the item based on the array index.
Very important here: the items in your knowledge base response and the items in your flattened list (and thus the rerank response) must all be in the same order, or the array index wouldn’t point at the right item at all.
The relevancy score isn’t already in the kb_responses, so we add that key as we build each new item.
Then, on line 12, we return the freshly enriched and filtered list of kb_responses.
    # Iterate through the rank response and reorder the kb_responses and add relevance_score
    # We're also filtering just for the most relevant results according to rerank_number_of_results
    ranked_kb_responses = [
        {
            # Use the index value in rank_response to find the correct kb_response
            "text": kb_responses[rank_response['index']]["text"],
            "url": kb_responses[rank_response['index']]["url"],
            "relevance_score": rank_response['relevance_score']
        } for rank_response in rank_response_body['results']
    ]

    return ranked_kb_responses
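To make the data shapes concrete, here’s roughly how a call might look end to end - the question, documents, and score are all made up for illustration (and note that rerank_number_of_results should be no higher than the number of documents for a tiny list like this):

# Illustrative only - made-up conversation and knowledge base results
example_kb_responses = [
    {"text": "Deploys are triggered from the main branch.", "url": "https://example.atlassian.net/wiki/1"},
    {"text": "The office coffee machine is on floor 3.", "url": "https://example.atlassian.net/wiki/2"},
]

ranked = rerank_text("How do we deploy?", example_kb_responses, bedrock_client)
# ranked -> [
#     {
#         "text": "Deploys are triggered from the main branch.",
#         "url": "https://example.atlassian.net/wiki/1",
#         "relevance_score": 0.97  # illustrative value
#     },
#     ...
# ]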
Summary
And that’s it! This is one of the shorter articles in this series, but there’s a lot of meaning and polish embedded in it, so I didn’t want to water it down.
I didn’t realize the power of reranking when I started building this bot and learning these things, but it can be an incredibly efficient and powerful tool for incrementally walking forward the fidelity of the information provided by your AI tools.
Remember, all the code is here and MIT licensed so you can build it yourself:
Hope you’re enjoying the ride! Good luck out there.
kyler