
🔥Building a Slack Bot with AI Capabilities - Part 8, ReRanking Bedrock Knowledge Base Vectors to Improve Result Relevancy🔥
aka, just give me those GOOD vectors please
This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!
These articles are supported by readers; please consider subscribing to support me writing more of these articles <3 :)
This article is part of a series of articles, because 1 article would be absolutely massive.
Part 1: Covers how to build a Slack bot in WebSocket mode and connect to it with Python
Part 4: How to convert your local script to an event-driven, serverless, cloud-based app in AWS Lambda
Part 7: Streaming token responses from AWS Bedrock to your AI Slack bot using converse_stream()
Part 8 (this article!): ReRanking knowledge base responses to improve AI model response efficacy
Part 9: Adding a Lambda Receiver tier to reduce cost and improve Slack response time
Hey all!
So far in this series we’ve created a Slack app, connected it to a tiered Python-based Lambda system, and built all the logic to relay all that sweet sweet Slack context over to the Bedrock AI.
It’s all serverless, we’re using a RAG knowledge base against Confluence to make sure our bot has memorized all our internal enterprise information, and we’re even streaming tokens from Bedrock to start responding right away (and be as cool as platforms like ChatGPT).
We’re even using converse(), AWS’s front-end API that maps our conversation to each model’s requested API format, so we can easily swap between different AI models.
In this article, we’re going to talk about ReRanking. ReRanking is an underrated technology. It reads your input question/conversation and all your knowledge base results, and gives you back a list of the best ones, with their relevancy score added as metadata. It takes only about a quarter of a second to run on 50 results, the most I’ve scaled it to.
And it improves the accuracy and relevancy of your bot’s answers DRAMATICALLY
Seriously, it’s so much better.
I have a mentor at work who used this analogy:
Your knowledge base using Retrieve() is like a librarian scooping up books off a shelf that are near to your keywords. Are those books related to your question? No idea, but they look similar!
A ReRanker reads all the books the librarian picked out, hands back just the ones that are most related to your question, and tells you which ones to read first.
Empowering your model with weighted responses, and skipping the text vectors that are least likely to answer the question your users are asking, is INCREDIBLY IMPACTFUL.
You get it. You’ve read this far, and you’re ready to improve the fidelity of your knowledge base results. All for the low low cost of almost no time, and almost no money. Let’s do this.
Getting the ReRanker
First of all, we need to get access. Head to your target AWS account, in your target region, to the Bedrock service, and find Providers → Amazon → Rerank 1.0. If you’re reading this later and there’s a newer version, use that.
If you see “Access granted” like below, you’re good to go. If not, you need to request access (or work with your DevOps/platform team to request access).
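If you’d rather sanity-check from code, here’s a minimal sketch using boto3’s Bedrock control-plane client to confirm a rerank model exists in your region (the region value is my assumption - use yours):

# Optional sanity check: list Bedrock foundation models and look for a reranker.
# The region here is an assumption - swap in your own.
import boto3

bedrock = boto3.client("bedrock", region_name="us-west-2")

rerank_model_ids = [
    model["modelId"]
    for model in bedrock.list_foundation_models()["modelSummaries"]
    if "rerank" in model["modelId"].lower()
]
print(rerank_model_ids)  # Expect something like ['amazon.rerank-v1:0']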
Implementing ReRanking
Let’s go to our Python script. As of writing, we’re working in the script here.
First of all, on line 2, we previously had “5” knowledge base results. That’s because if we go much higher than that, we end up overloading our model with unrelated information, or flat-out just TOO MUCH information. However, we’re going to use the ReRanker to filter down to truly relevant information after the KB dip, so let’s crank it WAY up to 50.
The max the API currently supports is 100, by the way, and I’m reasonably confident we could go that high for slightly slower, slightly more accurate results.
Next, some new keys. On line 5, we have a feature flag to enable/disable the ReRanker if you’re not happy with it. Then on line 6, the number of results we should filter down to. To be clear, that’ll be a subset of the results the knowledge base gives it - it’ll pick the best `n` (in this case 5), and score them.
On line 7, the model we’ll use for ReRanking. I’m not aware of the specific names of other reranking models, but reranking is a standard task models are trained for - feel free to mess with them! We’ll use Amazon’s own rerank-v1:0.
# Knowledge base information
knowledgeBaseContextNumberOfResults = 50

# Rerank configuration
enable_rerank = True
rerank_number_of_results = 5
rerank_model_id = "amazon.rerank-v1:0"
Double-Checking the Knowledge Base
Here’s a partial snippet of our ask_bedrock_llm_with_knowledge_base() function. Previously we just did a KB dip and returned the information. However, we need to be a little more structured here since we’re going to be manipulating the knowledge base results.
On line 4, we’re going to be building a list of maps of knowledge base results, each including its text and URL. Those fields are always populated by the knowledge base for Confluence.
If you use data sources other than Confluence, it looks like you’ll need to update this logic.
On line 10, we check if enable_rerank is True - this is our feature flag. If not, we skip this logic entirely and just return the knowledge base responses as structured above.
If it is, we call our rerank_text() function - that’s entirely new, and we’ll cover it next. In order to call it, it needs our flat_conversation (the question we’re asking), the kb_responses (the current knowledge base array), and the bedrock_client (so it can make an authenticated call to the ReRanker API).
We capture the response (line 12) as kb_responses, overwriting what was provided by the knowledge base dip.
# Function to retrieve info from RAG with knowledge base
def ask_bedrock_llm_with_knowledge_base(flat_conversation, knowledge_base_id, bedrock_client) -> str:
    # Structure response
    kb_responses = [
        {
            "text": result['content']['text'],
            "url": result['location']['confluenceLocation']['url']
        } for result in kb_response['retrievalResults']
    ]
    if enable_rerank:
        # Rerank the knowledge base results
        kb_responses = rerank_text(
            flat_conversation,
            kb_responses,
            bedrock_client
        )
    return kb_responses
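One quick note: the `kb_response` used above comes from the KB dip itself, which is elided from this partial snippet. If you’re rebuilding from scratch, that call looks roughly like the sketch below - `bedrock_agent_client` being a boto3 "bedrock-agent-runtime" client is my assumption, so check the repo for the exact client and variable names:

# A sketch of the elided KB dip that produces kb_response above.
# bedrock_agent_client is assumed to be a boto3 "bedrock-agent-runtime" client.
kb_response = bedrock_agent_client.retrieve(
    knowledgeBaseId=knowledge_base_id,
    retrievalQuery={
        "text": flat_conversation
    },
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": knowledgeBaseContextNumberOfResults
        }
    }
)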
ReRank the Vectors
Now let’s actually write our reranking function. We receive the flat_conversation (context/question), the kb_responses array, and the bedrock client.
The kb_responses data currently looks like lines 4-14 (but with real data, obviously).
# Reranking knowledge base results
def rerank_text(flat_conversation, kb_responses, bedrock_client):

    # Data looks like this:
    # [
    #     {
    #         "text": "text",
    #         "url": "url",
    #     },
    #     {
    #         "text": "text",
    #         "url": "url",
    #     }
    # ]
Now, the reranker doesn’t want all that structure from above. It doesn’t care about the “text” key name, or the url at all - it doesn’t use that information. It just wants that sweet sweet potential response text.
So first of all, we build a new list with just the text it wants. We initialize on line 2, and then iterate over every map in the array to extract the text.
Then we flatten it all, on line 11.
Then on line 14, we construct the body the reranker is looking for - the query (question/conversation context), the documents (the knowledge base array of text), and the top_n (the number of results you want it to provide back).
You can totally have it score every single knowledge base entry, but winnowing down responses from a broad knowledge base dip is a really powerful way to concentrate the AI’s response and speed up the larger model’s response time.
Plus, if you’re not going to append all those results to the AI model’s context, then why get them scored and sent back anyway?
    # Format kb_responses into a list of sources
    kb_responses_text = []
    for kb_response in kb_responses:
        kb_responses_text.append(
            [
                kb_response['text']
            ]
        )

    # Flatten
    kb_responses_text = [item[0] for item in kb_responses_text]

    # Construct body
    body = json.dumps(
        {
            "query": flat_conversation,
            "documents": kb_responses_text,
            "top_n": rerank_number_of_results,
        }
    )
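As an aside, the append-then-flatten dance above could be collapsed into a single list comprehension if you prefer - an equivalent sketch:

    # Equivalent to the append-and-flatten above: pull each text straight into a flat list
    kb_responses_text = [kb_response["text"] for kb_response in kb_responses]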
Next, we invoke our reranker model and send it everything it requires.
    # Fetch ranks
    rank_response = bedrock_client.invoke_model(
        modelId=rerank_model_id,
        accept="application/json",
        contentType="application/json",
        body=body,
    )
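If you’re worried about this call failing at runtime (throttling, a mistyped model ID), it may be worth a defensive wrapper. Here’s a sketch - falling back to the un-reranked results is my assumption, not the repo’s actual behavior:

# Optional defensive wrapper: fall back to un-reranked results on failure.
# The fallback behavior here is an assumption, not the repo's actual code.
from botocore.exceptions import ClientError

try:
    rank_response = bedrock_client.invoke_model(
        modelId=rerank_model_id,
        accept="application/json",
        contentType="application/json",
        body=body,
    )
except ClientError as error:
    print(f"Rerank call failed, returning un-reranked results: {error}")
    return kb_responses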
We load the response in and decode it, and we can see the response looks like lines 7-16. Note that none of this is the actual text that was sent - it simply gives you each item’s index in the array you sent, plus a relevance score.
The results are notably sorted by score, highest first, which is helpful.
    # Decode response
    rank_response_body = json.loads(
        rank_response['body'].read().decode()
    )

    # The "results" key of the response looks like this:
    # [
    #     {
    #         "index": 9,
    #         "relevance_score": 0.9438672242987702
    #     },
    #     {
    #         "index": 0,
    #         "relevance_score": 0.9343951625409306
    #     }
    # ]
This isn’t very many lines of code, but there’s a lot going on here. Thus far, we have our knowledge base responses (the full array of information from the knowledge base) and the rank_response_body results - the reranker’s list of array indices and their scores.
We want to walk through the results from the reranker, so that’s what we do with the `for` clause on line 9.
For every item, we look up the text and URL of the item based on the array index.
Very important here: the items in your knowledge base response and the items in your flattened list (and thus the rerank response) must all be in the same order, or the array index wouldn’t point at the right item at all.
The relevancy score isn’t already in the kb_responses, so we add that key as we build each new item.
Then, on line 12, we return the freshly enriched and filtered list of kb_responses.
    # Iterate through the rank response and reorder the kb_responses and add relevance_score
    # We're also filtering just for the most relevant results according to rerank_number_of_results
    ranked_kb_responses = [
        {
            # Use the index value in rank_response to find the correct kb_response
            "text": kb_responses[rank_response['index']]["text"],
            "url": kb_responses[rank_response['index']]["url"],
            "relevance_score": rank_response['relevance_score']
        } for rank_response in rank_response_body['results']
    ]

    return ranked_kb_responses
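To make the data shapes concrete, here’s roughly how a call might look end to end - the question, documents, and score are all made up for illustration (and note that rerank_number_of_results should be no higher than the number of documents for a tiny list like this):

# Illustrative only - made-up conversation and knowledge base results
example_kb_responses = [
    {"text": "Deploys are triggered from the main branch.", "url": "https://example.atlassian.net/wiki/1"},
    {"text": "The office coffee machine is on floor 3.", "url": "https://example.atlassian.net/wiki/2"},
]

ranked = rerank_text("How do we deploy?", example_kb_responses, bedrock_client)
# ranked -> [
#     {
#         "text": "Deploys are triggered from the main branch.",
#         "url": "https://example.atlassian.net/wiki/1",
#         "relevance_score": 0.97  # illustrative value
#     },
#     ...
# ]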
Summary
And that’s it! This is one of the shorter articles in this series, but there’s a lot of meaning and polish embedded in it, so I didn’t want to water it down.
I didn’t realize the power of reranking when I started building this bot and learning these things, but it can be an incredibly efficient and powerful tool for incrementally walking forward the fidelity of the information provided by your AI tools.
Remember, all the code is here and MIT licensed so you can build it yourself:
Hope you’re enjoying the ride! Good luck out there.
kyler