From Zero to Hero: Want to cheaply build a robust multiuser chat assistant? 🤖
By Jerry D Boonstra

Introduction
I have spent a few months diving into the LLM application development scene, and wow is there a lot going on!
With my new-found knowledge, I built a multi-user chat assistant at work to help our data scientists. I was challenged to do it quickly and at ultra-low cost, and I will share some of what I learned in the process.
In this series of regularly published articles I will delve into the nuts and bolts of building a similar multi-user chat assistant with AWS Serverless and OpenAI. Expect me to improve the shared application and add new posts to the series over the coming weeks.
The series as planned
- (this article) 👉 From Zero to Hero: Want to cheaply build a robust multiuser chat assistant?
- Build and deploy your first multi-user Chat Assistant, using AWS Lambda and OpenAI Assistant API and TypeScript
- Add Logging Traces and Ratings to your Application
- Add Unit Tests and Evals to your Application
- Fine-tune your LLM Application to balance Accuracy, Robustness and Cost
- …?
Some things you should know
Before we go further, this series makes a few assumptions about your use case:
- You are OK with sharing your input data with OpenAI, within their ToS
- A multi-turn chatbot -style interface is a suitable one for your business problem
- You think AWS serverless technologies are cool (enough :)), and you have admin access for your account
- You are comfortable with TypeScript
If these do not hold, this series might not be for you.
Beyond a Demo
To paraphrase AI consultant Hamel Husain:
Many people focus exclusively on changing the behavior of the system… which prevents them from improving their LLM products beyond a demo.
Success with AI hinges on how fast you can iterate. In addition to a process for changing your system, you must have processes and tools for:
- Evaluating quality (ex: tests)
- Debugging issues (ex: logging & inspecting data)
Evaluating quality as part of a continuous improvement process is important if we want a robust application that works “beyond a demo”.
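The evaluation step above can be sketched as a tiny automated test harness. This is a minimal sketch: the `llm` function, the stubbed model, and the example case are all hypothetical stand-ins, not part of any real API.

```typescript
// A minimal eval sketch: score a model answer against expected keywords.
// `Llm` is a hypothetical stand-in for a real model-call function.
type Llm = (prompt: string) => Promise<string>;

interface EvalCase {
  prompt: string;
  mustContain: string[]; // keywords a good answer should mention
}

async function runEvals(llm: Llm, cases: EvalCase[]): Promise<number> {
  let passed = 0;
  for (const c of cases) {
    const answer = (await llm(c.prompt)).toLowerCase();
    if (c.mustContain.every((k) => answer.includes(k.toLowerCase()))) {
      passed++;
    }
  }
  return passed / cases.length; // pass rate between 0 and 1
}

// Demonstration with a stubbed model:
const stubLlm: Llm = async () => "Pandas DataFrames support merge and join.";
const cases: EvalCase[] = [
  { prompt: "How do I combine two DataFrames?", mustContain: ["merge"] },
];
runEvals(stubLlm, cases).then((rate) => console.log(rate)); // prints 1
```

Keyword matching is crude, but even a handful of cases like this, run on every change, catches regressions that eyeballing a demo will miss.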
Controlled Change
A good continuous improvement process for LLM-based applications has a quality evaluation step, and looks like this:

It’s a lot. So how do you really start?
First Steps
Here is a suggested series of steps to go through to get a robust application solution, starting from zero.
Step 0: Understand the Problem
Have enough of an understanding of your problem to be able to describe it in natural language.
This is step zero because it’s the most critical step.
First Prompt
Write down your first prompt, prioritizing these aspects:
- Clarity: Ensure the prompt is clear and unambiguous. Avoid vague language and be specific about what you’re asking.
- Context: Provide sufficient context to frame the question or task. This might include background information, specific scenarios, or examples.
- Conciseness: Keep the prompt as brief as possible while retaining necessary detail. This helps the model focus on the main task without unnecessary complexity.
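As a concrete (and entirely hypothetical) illustration, a first prompt for a data-science assistant might combine those three qualities like this:

```
You are an assistant for the data science team at Acme Corp (a made-up company).
Answer questions about our internal analytics datasets.
Context: the team works primarily in Python with pandas and SQL.
If you are unsure of an answer, say so instead of guessing.
Keep answers under 200 words.
```

It states the task plainly (clarity), names the tools and domain (context), and stays short (conciseness).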
Step 1: Choose Model and Tools
OpenAI Assistant supports multiple models and multiple tools per model.
If you choose gpt-3.5-turbo or newer, you can use any of these tools:
- Code Interpreter: allows Assistants to write and run Python code in a sandboxed execution environment
- File Search: augments the Assistant with knowledge from outside its model, such as proprietary product information or documents provided by your users
- Function calling: allows you to describe functions to the Assistants API and have it intelligently return the functions that need to be called along with their arguments.
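The tool list is just structured data you pass when creating an assistant. Below is a sketch of what that payload can look like; the shapes follow the OpenAI Assistants API, but the `get_weather` function is a hypothetical example of function calling, and the `Tool` type is defined locally for illustration.

```typescript
// Tool definitions you would pass to the Assistants API when creating
// an assistant. The local `Tool` type mirrors the API's tool shapes.
type Tool =
  | { type: "code_interpreter" }
  | { type: "file_search" }
  | {
      type: "function";
      function: {
        name: string;
        description: string;
        parameters: Record<string, unknown>; // JSON Schema for the arguments
      };
    };

const tools: Tool[] = [
  { type: "code_interpreter" },
  { type: "file_search" },
  {
    type: "function",
    function: {
      name: "get_weather", // hypothetical function your backend implements
      description: "Get the current weather for a city",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];

// With the official `openai` Node SDK, creating the assistant then looks
// roughly like this (not run here):
//
//   const assistant = await openai.beta.assistants.create({
//     model: "gpt-4o",
//     instructions: "You are a helpful data-science assistant.",
//     tools,
//   });

console.log(tools.length); // 3
```

Note that for function calling the model only returns the function name and arguments; your application is responsible for actually executing the function and sending the result back.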
Step 2: Do Prompt Engineering in the Playground
Prompt engineering is a discipline for developing and optimizing prompts to efficiently apply and build with large language models (LLMs) for a wide variety of applications and use cases.
- To get started, it’s helpful to browse categorized OpenAI example prompts and PromptingGuide.ai example prompts.
- To further improve your prompts, you should familiarize yourself with PromptingGuide.ai’s detailed guide on this topic.
- With your chosen model, do iterative prompt engineering until you get something that works most of the time. More on this below.
Use the OpenAI Assistant Playground, the quickest way to prove the concept.
When in the playground:
- Sprinkle in as much context as your application needs. Providing context can take the form of prompting plus:
  - The user can upload documents.
  - The administrator can add content from additional documents with Retrieval Augmented Generation (RAG). This is where we use standard NLP techniques for search to identify the subset of documents that may be relevant and supply them to the LLM.
- Providing few-shot examples often helps for otherwise hard-to-address cases.
- Prompt chaining, where a task is split into subtasks to create a chain of prompt operations, is useful for approaching complex tasks.
- For the most deterministic output, use the lowest temperature (t=0).
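The prompt-chaining idea above can be sketched as a pipeline where each subtask’s output feeds the next prompt. This is a sketch only: the `Llm` type is a hypothetical stand-in for a real model call, and the two subtasks are illustrative.

```typescript
// Prompt chaining: split a task into subtasks and feed each subtask's
// output into the next prompt. `Llm` is a hypothetical model-call function.
type Llm = (prompt: string) => Promise<string>;

async function summarizeChain(llm: Llm, document: string): Promise<string> {
  // Subtask 1: extract the key facts from the document.
  const facts = await llm(`List the key facts in this text:\n${document}`);
  // Subtask 2: summarize using only the extracted facts.
  return llm(`Write a one-paragraph summary using only these facts:\n${facts}`);
}

// Demonstration with a stubbed model that simply echoes its prompt:
const echo: Llm = async (p) => p;
summarizeChain(echo, "Q2 revenue grew 12%.").then((out) => console.log(out));
```

Because each subtask has a narrow job, each individual prompt stays simple, and you can inspect or eval the intermediate outputs separately.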
You will eventually hit a good-enough threshold where you can stop iterating and lock in your prompt and model choice.
Step 3: Build a prototype application
Once you have something that seems to be working, it’s time to get it in front of some users!
For this you’ll need a multiuser application, which is the topic of our next article in the series.