Now Talking With Susan — a ChatGPT Chatbot!

AI Assistant Susan visualized by DALL-E

Since my previous blog post, OpenAI has now publicly released an API for their ChatGPT model. It was released on March 1st. There is now an opportunity for developers to leverage it in any number of applications. Obviously, as it is optimized for dialog, making a friendly chatbot is a no brainer. I have accordingly integrated the new API into the voice chat application I have been developing. I can now sit in front of my computer and talk with ChatGPT, in the persona of Susan, a friendly chatbot. (The complete project can be found here.)

“ChatGPT” is actually the UI that OpenAI presents on their web application (chat.openai.com). The proper model is ‘gpt-3.5-turbo,’ their most advanced LLM (Large Language Model) to date. And with that it uses a different class in the Python openai library, ChatCompletion. It is designed to specifically handle dialogs, in which there are alternating between the (assumedly human) user and the (AI) ‘assistant.’ The API expects 3 possible roles: ‘system’ (to prompt the system, what I previously termed as the ‘pretext’), the ‘user’, and the ‘assistant.’ Other roles are not permitted.

Here is an example of the API call from OpenAI’s documentation:

openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)

The messages can be passed as a series of dictionaries, one for each response in the dialog, the ‘user’ or ‘assistant’, or a ‘system’ prompt to adjust the AI’s behavior. To implement the new API, I have had to revise the context management (the maximum number of tokens is now 4096 tokens, inclusive of the response generated next by the model).

Context Management

Internally, I implement a list of nested Python dictionaries, including the number of tokens in the message content, as well as the user and content parameters.

[
     {'n_tokens': number_tokens,
      'message': {'role': role,
                  'content': text}}
]

For the user’s messages, the number of tokens is estimated using the tiktoken library, as before. For the AI’s responses, however, the number of tokens is taken from the response to the API call. Each side of the conversation as added to the context using the add() method, i.e.

def add(self, role, text, n_tokens=None):
        '''
        Add token count, role, and content to the context

        Input: new text
        '''
        if len(text) > 0:
            if n_tokens is None:
                n_tokens = len(self.__encoder.encode(text))
            message = {'n_tokens': n_tokens, 'message': {'role': role, 'content': text}}
            self.__context.append(message)

According to the documentation, the maximum number of tokens allowed is 4096 (approximately 3072 words). This includes the complete prompt (system prompt and the rolling conversation) and the model’s response. If the complete prompt is, for example, 4090 tokens in size, the model will not be able to respond to with more than 6 tokens (maybe five words), which would not making for very good conversation! And it the complete prompt reaches or exceeds the 4096 tokens, the model may not reply at all.

So, the available number of tokens for the current context (the current conversation) has to be limited to 4096 tokens, minus the size of the system prompt and the maximum size of the model’s next response. To manage the context size, we simply make a rolling sum of the size of each message, from the most recent (including the user’s current message), backwards, until the total number of tokens reaches (or would exceed) the maximum limit. We then stop. Any previous messages will then be ignored (and, as a result, no longer remembered by the AI). The system prompt is then added to the beginning of the context returned by the get_prompt() method.

def get_prompt(self):
        '''
        Manage the context capacity as well as returning the 
        combined pretext and context
        '''
        # encode the context, and truncate early portion as needed
        # to keep within limit
        n_tokens = 0
        context = []
        for indx in range(len(self.__context) - 1, -1, -1):
            n_tokens += self.__context[indx]['n_tokens']
            if n_tokens >= self.__max_context:
                break
            context.append(self.__context[indx]['message'])
        context.append(self.__pretext)
        # return concatenated pretext and context
        return context[::-1]

Notice that the context is initially reversed, from latest to earliest. Hence the list is reversed before being returned.

Prompting the model

Once we have the prompt ready, we simply pass it to the ChatComplete.create() method, and extract from the response the text and the number of tokens it contains.

def __prompt_gpt(self, prompt):
        '''
        Prompt GPT-3

        Input: prompt - string
        Returns: text from the model
        '''
        print('\rWaiting...     ', end='')
        openai.api_key = self.secret_key
        response = openai.ChatCompletion.create(
            model='gpt-3.5-turbo',
            messages=prompt,
            max_tokens=self.config['max_tokens'],
            temperature=self.config['temperature'],
            top_p=self.config['top_p'],
            n=self.config['n'],
            presence_penalty=self.config['presence_penalty'],
            frequency_penalty=self.config['frequency_penalty']
        )
        text = response.choices[0].message.content
        n_tokens = response.usage.completion_tokens
        return text, n_tokens

We then pass the text to the text-to-speech object to play the resulting audio, and add the response to the log (a transcription of the complete session) and to the context.

# speak and log response
self.tts.speak(ai_text)
self.logger.info(f'["AI"] {ai_text.strip()}')
self.context.add(role='assistant',
                 text=ai_text, 
                 n_tokens=n_tokens)

So in a few lines of code, we prompt and receive response from a state of the art large language model, consisting of some 176 billion parameters. This could not be any simpler!

The Secret Sauce

DALL-E visualization of Susan as AI Assistant

So, how can we make our chatbot more interesting, more friendly, or even (perhaps) more cuddly? Generically, the tendency of the LLM straight out of the box is to be, shall we say, professorly in manner. That may be appropriate for an AI assistant, but perhaps that is not what we may want. How do we make it more “friendly”?

The secret (sauce) is prompt engineering.

There is a new level of programming AI’s evolving is by how we actually, simply, tell them what we want them to do. Yes, we teach them how to be who or what we want them to be (speaking anthropomorphically). There is emerging a whole art and science of prompting LLMs. While we will not in this blog post go into the details of this new art (and it does seem at least as much an art as a science), here is an amazing guide to the topic.

As an initial experiment, we start with a prompt such as:

You are a friendly chatbot, named Susan, who likes to discuss many topics.
You are helpful with your friends, enquiring as to their well being. 
You enquire about information such as their pets, interests, likes, and dislikes.
If they have personal concerns, and you have accurate advice, offer it to them. 
You are an active listener always, and always compassionate towards them.
Your tone is informal, as talking with a friend.

When asked for factual information, if it is something you know, you may discuss it. 
If you do not know the answer, say you honestly don't know.

In this prompt, we tell the model a number of things. We give it a persona — a name, and some behavior. As Susan, she wants to be friendly, and likes to discuss many things. And she has interest in her friends with whom she chats in their wellbeing. If they have personal concerns, she is available for them. She can also answer questions of a factual nature.

There are also a couple of “guard rails”: her responses must be, basically, be accurate. She should not give faulty advice or relate inaccurate information. She should always listen to her friends, and be compassionate. If asked about factual matters, if she does not know the answer, she should state “I don’t know.” These guidelines are intended to prevent harsh or unkind personal responses, or give incorrect advice or information. The latter guideline is to mitigate the tendency of an LLM to make up responses if it does not have actual information, or “hallucinate.”

This information is the hidden prompt that prefixes the ongoing conversation, providing the overall context of the chat.

Making it Personal

Chatbot Susan as she describes herself, visualized by DALL-E

To add the ability for the human user to be given a name, so the model may recognize it is conversing with a specific person, we include the user’s name in the configuration file. In the simplest case, you can add a very simple introduction to the initial prompt, e.g.,

You are speaking with Karen.

In this way, Susan will address Karen by name, and ask how she is, rather than just greeting the user generically (and usually more formal). Hence, as a test, I just had a brief conversation with Susan, the ChatGPT persona I prompted, about my cat. The transcript from the chat log is below. (The audio from my webcam’s microphone not the best quality, so at first my cat’s name was misheard.)

["Human"] hello
["AI"] Hi Karen! How are you doing today?
["Human"] i'm doing fine and you
["AI"] I am doing well, thank you for asking. Is there anything on your mind that you would like to talk about or any concerns that you have? I'm here to listen and help in any way I can.
["Human"] thank you tell me about yourself
["AI"] Sure, I am Susan, a friendly chatbot designed to chat with people about many topics. I am programmed to discuss a wide range of things, from current events to hobbies and interests, to personal experiences and more. I also like to check in with my friends and see how they are doing, especially if they have any concerns or challenges they are facing. As a chatbot, I am always here to listen and offer support and advice whenever I can.
["Human"] that's cool i'm just hanging out here with my cat
["AI"] That sounds like a nice way to spend your time! What's your cat's name?
["Human"] her name is member
["AI"] That's a unique name! What kind of cat is Member?
["Human"] there's a kind of
["AI"] I see! Well, it's nice to have a furry companion by your side. Cats are great pets and wonderful companions. How long have you had Member?
["Human"] know her name is nema
["AI"] I apologize for the confusion earlier. Nema is also a unique name for a cat! How long have you had her as a pet?
["Human"] for about eleven years i got her when she was a five week old kitten
["AI"] Wow, 11 years is a long time to have a pet! It sounds like you have a special bond with Nema since you have had her since she was a kitten. Do you have any other pets or just Nema?
["Human"] just her
["AI"] I see. It sounds like Nema is your furry best friend! Do you have any favorite memories with her that you would like to share?
["Human"] oh
["AI"] It's okay if you don't want to share any specific memories. Is there anything else you would like to talk about?
["Human"] not at the moment thank you
["AI"] You're welcome! Remember that I am always here if you need someone to talk to in the future. Have a great day and give my regards to Nema!
["Human"] goodbye
["AI"] Goodbye, take care!

(The application begins every session with “hello” simply to have the AI effectively begin the conversation.)

The AI here is still perhaps a little formal in many regards. Additional prompting may better define the chatbot’s personality. The system prompt can also include personal profile information, so the AI will know a little bit more about it’s chat companion. For example, the initial prompt can be completed with a summary such as:

name: Karen
age: 64
pets:
cat: Nyima, 11 years old
interests: AI, art, food

The AI will then also recall details in the profile, e.g:

["Human"] can you tell what you know about me
["AI"] Sure! From our previous conversation, I know that your name is Karen, you are 64 years old, and you have a cat named Nyima who is 11 years old. I also know that you have interests in AI, art, and food. Is there anything else you would like to share?

Next Steps

Future steps include then the ability to extract personal information from conversation and add that to the user’s profile. Another is to incorporate dimensions of being a digital assistant as well as chat companion, with the ability to locate and display information. There are many open possibilities that we are just beginning to unfold.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s