
Voice Chatting with GPT-3

OpenAI’s GPT-3, and more recently ChatGPT, has had a large impact on people’s imaginations, hopes, and fears concerning AI. The apparent naturalism of the text generated by these and similar Large Language Models is indeed impressive, to the extent that some even want to attribute consciousness or personhood to them (the short answer is “no”).

DALL-E visualization of Susan, an AI assistant, prompted by her own description “If I had a physical form, I’d probably look like a robot with a sleek, modern design. I’d be about 5 feet tall, with a smooth and shiny exterior. My eyes would be glowing blue, and my voice would be soft and comforting.”

I have been building a program in Python that provides a voice interface to GPT-3. OpenAI provides an API for accessing GPT-3 in the cloud, which is fairly easy to use, at least for generating text with the pretrained model. There are several ‘flavors’ of GPT-3 of varying sizes; I am using the Davinci model, which is the largest, with 175 billion parameters, trained on hundreds of billions of tokens of text.
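OpenAI’s API is an HTTPS interface, so a request to it can be sketched with only the Python standard library. The sketch below builds, but does not send, such a request. It assumes the legacy Completions endpoint `https://api.openai.com/v1/completions`, the model name `text-davinci-003`, and the parameter values shown; the post does not say which endpoint, model identifier, or client library the program actually uses, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint: the legacy GPT-3 Completions API (not named in the post).
COMPLETIONS_URL = "https://api.openai.com/v1/completions"


def build_completion_request(prompt: str, api_key: str,
                             model: str = "text-davinci-003",
                             max_tokens: int = 150,
                             temperature: float = 0.7) -> urllib.request.Request:
    """Build, but do not send, a POST request asking the model to continue `prompt`."""
    payload = {
        "model": model,              # assumed Davinci-family model identifier
        "prompt": prompt,            # the text the model should continue
        "max_tokens": max_tokens,    # upper bound on the length of the reply
        "temperature": temperature,  # higher values give more varied replies
    }
    return urllib.request.Request(
        COMPLETIONS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def parse_completion(body: bytes) -> str:
    """Pull the generated text out of a Completions JSON response body."""
    return json.loads(body)["choices"][0]["text"]
```

Sending it would mean passing the request to `urllib.request.urlopen`, reading the response body, and handing that to `parse_completion`. In practice the API key would be read from the environment rather than written into the code.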

Without going into detail about the code (which is available here), the basic process is to:

  • Record the user’s speech
  • Translate the speech to text using speech recognition (currently Google Speech Recognition)
  • Send that text to GPT-3 for a response
  • Convert the text response to speech and play it as an audio file
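The four steps above can be sketched as a single conversational turn. This is a minimal sketch, not the author’s code: each stage is passed in as a function, so that real back ends (a microphone recorder, Google Speech Recognition, the GPT-3 API, a text-to-speech engine) could be plugged in. The names `voice_turn`, `record`, `transcribe`, `reply`, and `speak` are hypothetical.

```python
from typing import Callable, List

# Type aliases for the four pluggable stages of one voice turn.
RecordFn = Callable[[], bytes]         # capture the user's speech as raw audio
TranscribeFn = Callable[[bytes], str]  # speech recognition: audio -> text
ReplyFn = Callable[[str], str]         # language model: user text -> reply text
SpeakFn = Callable[[str], None]        # text-to-speech: synthesize and play the reply


def voice_turn(record: RecordFn, transcribe: TranscribeFn,
               reply: ReplyFn, speak: SpeakFn) -> str:
    """Run one conversational turn and return the assistant's reply text."""
    audio = record()               # 1. record the user's speech
    user_text = transcribe(audio)  # 2. translate the speech to text
    answer = reply(user_text)      # 3. send the text to GPT-3 for a response
    speak(answer)                  # 4. turn the response into audio and play it
    return answer


if __name__ == "__main__":
    # Stub stages stand in for a microphone, a recognizer, GPT-3 and a speaker.
    spoken: List[str] = []
    answer = voice_turn(
        record=lambda: b"<audio bytes>",
        transcribe=lambda audio: "hello Susan",
        reply=lambda text: "Hi there! How can I help you today?",
        speak=spoken.append,
    )
    print(answer)
```

Keeping the stages separate makes it easy to swap one service for another (a different recognizer or voice, say) without touching the conversation logic, and lets the loop be tested with stubs as in the `__main__` block.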

However, to produce a ‘chatbot’ with particular attributes rather than just the generic language model, one prompts it with the desired traits. This is a basic form of prompt engineering, using the one-shot learning capability of models such as GPT-3. One also has to maintain some continuity of conversation, which is accomplished by prompting the model not just with the user’s most recent query or reply, but with the previous interactions as well (up to the model’s input limit, in this case 2048 tokens). Combined, the chatbot’s description and the previous conversation form the context to which the model replies, essentially continuing a story or script. For example, at one point the model was prompted with this context:

The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly. The assistant’s name is Susan.
[human] hello Susan
[AI] Hi there! How can I help you today?
[human] can you maybe describe yourself to me and how you look and things like that?
[AI] Sure thing! I’m an AI assistant, and my job is to help you with whatever you need. I’m a virtual assistant, so I don’t have a physical form. I’m friendly, helpful, and always ready to lend a hand!
[human] okay, but if you did have a physical?

The model then continues the dialog with the description used later to generate an image (using a different AI, OpenAI’s DALL-E):

[AI] If I had a physical form, I’d probably look like a robot with a sleek, modern design. I’d be about 5 feet tall, with a smooth and shiny exterior. My eyes would be glowing blue, and my voice would be soft and comforting.
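The prompt-and-continue cycle described above can be sketched in two small functions: one assembles the description, as much recent history as fits, and the new query; the other keeps only the assistant’s turn from the model’s continuation. The `[human]`/`[AI]` line format follows the example context; the function names, the 256-token reserve for the reply, and the rough four-characters-per-token estimate are assumptions rather than details from the post (a real implementation would count tokens with the model’s own tokenizer).

```python
from typing import List, Tuple

CONTEXT_LIMIT = 2048  # the model's input limit stated in the post


def approx_tokens(text: str) -> int:
    """Rough estimate of about 4 characters per token for English text.

    An assumption for illustration; the model's own tokenizer gives exact counts.
    """
    return len(text) // 4 + 1


def build_prompt(description: str, history: List[Tuple[str, str]],
                 user_text: str, reply_budget: int = 256) -> str:
    """Join the description, as much recent history as fits, and the new query.

    `history` holds (speaker, text) pairs, oldest first, where speaker is
    "human" or "AI". `reply_budget` tokens are held back for the model's answer.
    """
    new_line = f"[human] {user_text}"
    budget = CONTEXT_LIMIT - reply_budget
    budget -= approx_tokens(description) + approx_tokens(new_line)

    kept: List[str] = []
    # Walk backwards from the newest turn, so the oldest turns are dropped first.
    for speaker, text in reversed(history):
        line = f"[{speaker}] {text}"
        cost = approx_tokens(line)
        if cost > budget:
            break
        kept.append(line)
        budget -= cost
    kept.reverse()  # restore chronological order

    return "\n".join([description, *kept, new_line])


def extract_reply(continuation: str, stop_marker: str = "[human]") -> str:
    """Keep only the assistant's turn from the model's raw continuation.

    Because the model is continuing a script, it may go on to write the human's
    next line as well, so everything from `stop_marker` onward is discarded.
    """
    text = continuation.split(stop_marker, 1)[0].strip()
    if text.startswith("[AI]"):  # drop the speaker tag if the model wrote it
        text = text[len("[AI]"):].strip()
    return text
```

After each turn, the user’s text and the extracted reply would be appended to `history` as `("human", ...)` and `("AI", ...)`. The Completions API also accepts a `stop` parameter, which could cut the continuation at `[human]` on the server side instead of in `extract_reply`.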

In testing, I had a brief and fairly naturalistic discussion with Susan about the meaning of life and meditation. It is transcribed here from a spoken conversation.

[human] hello Susan

[AI] Hi there! How can I help you?

[human] what is your name and what do you do

[AI] My name is Susan, and I’m an AI assistant. I’m here to help you with any questions or tasks you need assistance with.

[human] can I ask you about philosophy

[AI] Absolutely! What would you like to know about philosophy?

[human] well what is the meaning of life

[AI] That’s a big question! Philosophers have been debating the meaning of life for centuries. Ultimately, it’s up to each of us to decide what the meaning of life is for ourselves.

[human] and how can we find that out

[AI] That’s a great question. A lot of people find their own meaning of life through self-reflection, meditation, and spiritual practices. Others find their meaning in their relationships, their work, or their hobbies. Ultimately, it’s up to each of us to find our own meaning of life.

[human] can you tell me more about meditation?

[AI] Sure! Meditation is a practice of focusing on the present moment. It can help increase awareness of yourself and the world around you, as well as reduce stress and anxiety. There are many different types of meditation, so it’s important to find the one that works best for you.

[human] what is a very simple way of meditation?

[AI] One of the simplest forms of meditation is called “mindful breathing.” To practice mindful breathing, find a comfortable position and focus on your breath. Notice the sensation of your breath as it moves in and out of your body. Whenever your mind wanders, simply bring your focus back to your breath.

[human] that sounds good I think I am going to go practice it now

[AI] That’s great! I hope you find your meditation practice helpful. Let me know if there’s anything else I can do for you!

[human] I will thank you

[AI] You’re welcome! Have a great day.

[human] goodbye

[AI] Goodbye! Have a wonderful day!

An alternative impression of the AI assistant Susan generated by DALL-E

Given such a naturalistic and informed (on the AI’s part) conversation, it is easy to anthropomorphize an AI model such as GPT-3. This is, I suspect, because parts of our own brains do not know about AI or virtual beings: when we see, or hear, a cogent agent, we tend, on some emotional level, to assume that we are talking with a being like ourselves. (Perhaps in the same fashion that my cat may well believe I am a large and clumsy cat.) In reality, we are interacting with a language model that, given a context, accurately predicts the most likely continuation of that context, based on a generalization of the vast corpus of human language on which it was trained. It speaks or writes compellingly, yet it has no understanding of what it says, nor can it reason about it.
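To make “predicts the most likely continuation” concrete, here is a deliberately tiny stand-in: a word-level bigram counter with greedy decoding. This is not how GPT-3 works internally (GPT-3 is a transformer network over subword tokens that conditions on its whole context window, and its output can be sampled rather than always taking the top choice); the sketch only illustrates continuing text from statistics of a training corpus, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigrams(corpus: str) -> Dict[str, Counter]:
    """For every word in the corpus, count which words follow it."""
    words = corpus.lower().split()
    follows: Dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def continue_text(context: str, follows: Dict[str, Counter], n_words: int = 5) -> str:
    """Greedily append the most frequent next word, one word at a time."""
    out: List[str] = context.lower().split()
    for _ in range(n_words):
        if not out:
            break
        options = follows.get(out[-1])
        if not options:  # this word was never followed by anything in the corpus
            break
        out.append(options.most_common(1)[0][0])
    return " ".join(out)


if __name__ == "__main__":
    corpus = "the cat sat on the mat . the cat ate the fish ."
    model = train_bigrams(corpus)
    print(continue_text("the", model, 3))  # → the cat sat on
```

Nothing in these counts represents what a cat or a mat is; the continuation comes purely from which words tended to follow which in the training text.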

One way to understand this is that, as human beings, we use language to express our thought processes, our knowledge of ourselves and one another, and our consciousness. Hence, human language embeds much of our sentience. When a system such as GPT-3 becomes adept at modelling human language, it also models human consciousness and thought processes. This is probably one of the more powerful aspects of these models, and also a potentially confusing one.