Helpbot at The Texas Tribune

24 Sep 2024

Problem Statement

We use Slack at The Texas Tribune, and like many other organizations we use a dedicated “help” channel for staff to get assistance with IT problems, account access needs, office supply requests, and so on. The channel receives a high volume of questions, many of them routine and frequently repeated, so team leads and senior management have to monitor it closely to ensure timely responses.

Proposed Solution

To address questions like these, we created an FAQ document in a question-and-answer format. Our engineering manager also converted several existing tables into that Q&A format, so each common question now has a documented answer.

From there, the proposed solution was to use AI to guide people to those resources. We started building a Retrieval-Augmented Generation (RAG) bot, integrated with Slack, to point staff to the right people and places efficiently.

Implementation

Attempt 1

We first tested Google Cloud’s Agent Builder. While it was easy to set up and integrate with Slack, it didn’t offer much control over its settings, and its answers were inconsistent at times. Those limitations kept us from going any further with the product.

Attempt 2

We turned to AWS for a more custom solution and found exactly what we needed with AWS Bedrock. It provided greater control over how we work with different models and parameters, offering the flexibility we were looking for.

How does this work?

We use AWS Bedrock to interact with LLMs and integrate with Slack via AWS Lambda. Let’s walk through the figure below to understand the process: a user asks a question in Slack, the Lambda function sends it to Amazon Bedrock, Bedrock processes the query, and the Lambda posts the response back to Slack. For more details on this integration, check out the AWS Machine Learning blog (the diagram below is sourced from that post).


Helpbot architecture (diagram from the AWS Machine Learning blog).
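To make the flow concrete, here is a minimal sketch of what a Lambda handler along these lines could look like. It is not our production code: the agent IDs, environment variable names, and helper functions are placeholders, and a real handler would also need to verify Slack request signatures and deal with Slack’s retry behavior.

```python
import json
import os
import urllib.request

import boto3

# Bedrock Agents are invoked through the bedrock-agent-runtime service.
agent_runtime = boto3.client("bedrock-agent-runtime")

# Placeholder configuration; in practice these come from the deployment, not code.
AGENT_ID = os.environ["AGENT_ID"]
AGENT_ALIAS_ID = os.environ["AGENT_ALIAS_ID"]
SLACK_BOT_TOKEN = os.environ["SLACK_BOT_TOKEN"]


def ask_agent(question: str, session_id: str) -> str:
    """Send the user's question to the Bedrock agent and collect the streamed answer."""
    response = agent_runtime.invoke_agent(
        agentId=AGENT_ID,
        agentAliasId=AGENT_ALIAS_ID,
        sessionId=session_id,  # lets the agent keep per-user context
        inputText=question,
    )
    # invoke_agent returns an event stream; the answer arrives in chunks.
    parts = []
    for event in response["completion"]:
        chunk = event.get("chunk")
        if chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)


def post_to_slack(channel: str, text: str) -> None:
    """Post the answer back to the channel with Slack's chat.postMessage API."""
    request = urllib.request.Request(
        "https://slack.com/api/chat.postMessage",
        data=json.dumps({"channel": channel, "text": text}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {SLACK_BOT_TOKEN}",
        },
    )
    urllib.request.urlopen(request)


def lambda_handler(event, context):
    """Entry point: Slack Events API -> Bedrock agent -> Slack reply."""
    body = json.loads(event["body"])

    # Slack sends a one-time challenge when the Events API URL is first configured.
    if body.get("type") == "url_verification":
        return {"statusCode": 200, "body": body["challenge"]}

    message = body["event"]
    answer = ask_agent(message["text"], session_id=message["user"])
    post_to_slack(message["channel"], answer)
    return {"statusCode": 200, "body": "ok"}
```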


To implement a RAG system based solely on our documents, I converted the Google Docs to PDF, stored the PDF files in S3, and used AWS Bedrock’s knowledge base feature for the vector data (embeddings). The knowledge base is a managed solution for storing vector representations of documents, which makes retrieval fast and efficient.
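Keeping the knowledge base in sync with the bucket can be scripted with boto3. The sketch below is an illustration rather than our exact setup; the bucket name and IDs are placeholders.

```python
import boto3

s3 = boto3.client("s3")
# Knowledge base management lives in the bedrock-agent service.
bedrock_agent = boto3.client("bedrock-agent")

# Placeholder names and IDs, for illustration only.
BUCKET = "helpbot-documents"
KNOWLEDGE_BASE_ID = "KB123EXAMPLE"
DATA_SOURCE_ID = "DS456EXAMPLE"


def upload_and_sync(pdf_path: str) -> str:
    """Upload a PDF to the bucket backing the knowledge base, then re-ingest it."""
    key = pdf_path.rsplit("/", 1)[-1]
    s3.upload_file(pdf_path, BUCKET, key)

    # The ingestion job chunks and embeds the new document and updates the vector store.
    job = bedrock_agent.start_ingestion_job(
        knowledgeBaseId=KNOWLEDGE_BASE_ID,
        dataSourceId=DATA_SOURCE_ID,
    )
    return job["ingestionJob"]["ingestionJobId"]


if __name__ == "__main__":
    print(upload_and_sync("faq.pdf"))
```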

Models for embedding and text generation

For embeddings, we used Amazon’s Titan Embeddings G1 - Text v1.2 and opted for a semantic chunking strategy when processing the documents; during initial testing, semantic chunking produced the best results for our use case. Next, I chose Amazon Bedrock’s Agents feature because we wanted a tool similar to Google’s Agent Builder. You can learn more about Agents and their capabilities here. One of the key advantages of using Agents is that other team members can modify them directly in the AWS console: updating the model, knowledge base, prompts, or parameters doesn’t require extensive code changes. Finally, for text generation I went with Claude 3 Sonnet, which has performed exceptionally well for our use case.
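For quick experiments while comparing models, the knowledge base can also be queried directly, without the agent in the loop. Here is a rough sketch using Bedrock’s retrieve_and_generate API; the knowledge base ID, the region in the model ARN, and the sample question are placeholders.

```python
import boto3

runtime = boto3.client("bedrock-agent-runtime")

# Placeholder values for illustration.
KNOWLEDGE_BASE_ID = "KB123EXAMPLE"
CLAUDE_3_SONNET_ARN = (
    "arn:aws:bedrock:us-east-1::foundation-model/"
    "anthropic.claude-3-sonnet-20240229-v1:0"
)


def ask_knowledge_base(question: str) -> str:
    """Retrieve relevant chunks and generate an answer in a single call."""
    response = runtime.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": KNOWLEDGE_BASE_ID,
                "modelArn": CLAUDE_3_SONNET_ARN,
            },
        },
    )
    return response["output"]["text"]


if __name__ == "__main__":
    print(ask_knowledge_base("How do I request access to the CMS?"))
```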

To minimize maintenance, we decided early on to build on serverless infrastructure. I used AWS SAM, which lets us keep most of the stack in our codebase. This setup includes AWS CloudFormation, Lambda, Secrets Manager, and CloudWatch.
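As one small example of how those pieces fit together, the Lambda can fetch the Slack credentials from Secrets Manager at runtime instead of hard-coding them; the secret name and key below are made up for the sketch.

```python
import json

import boto3

secrets = boto3.client("secretsmanager")

# Placeholder secret name; the real name would be defined alongside the SAM template.
SECRET_NAME = "helpbot/slack"


def get_slack_bot_token() -> str:
    """Read the Slack bot token stored as a JSON secret in Secrets Manager."""
    value = secrets.get_secret_value(SecretId=SECRET_NAME)
    return json.loads(value["SecretString"])["bot_token"]
```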

Bonus: The Texas Tribune Festival

Just before The Texas Tribune Festival 2024, we uploaded the info desk document to S3 and added it to the helpbot’s knowledge base, so the bot could answer Festival-related questions. However, we noticed that responses to questions involving tables were less accurate, which reinforced the importance of structuring documents in a clear question-and-answer format. As a potential improvement, we could consider using the Claude model instead of the Titan model when generating embeddings, which is a great idea for a future blog post! 🤔

Results

The helpbot has been incredibly useful so far. Users can ask questions either in the all-help public channel or through direct messages with the bot. What makes this bot, particularly the Claude model, stand out is its strong understanding of human intent. You can phrase questions naturally, even with minor errors, and it will still provide accurate answers—assuming the information is covered in the Q&A document.

Lessons Learned

There are many valuable lessons we’ve learned from this experience, outlined below:

Improvements

Conclusion

The helpbot is a simple yet powerful tool. It has saved time for the people who previously had to check the Slack channel constantly, and it has given us a great space to test these tools and learn from them. The lessons we learned about evaluation, metrics, and data types will help us greatly in future iterations.



Thank you to Ashley Hebler and Darla Cameron for bringing these ideas and helping me throughout the process. It’s been a real treat to bring this to life and learn together from it. And thank you to all the users who patiently waited for this tool to get better and provided consistent feedback!


Republished from the engineering blog at The Texas Tribune.