Part-2: Pavai.Talkie (C3P0) reimagines C-3PO’s hands-free voice dialog system using AI in 2024

Minyang Chen
4 min read · Mar 6, 2024


In a previous article (Pavai.Vocei), I discussed my motivation and objectives for reimagining C-3PO’s capabilities. In this article, I will cover the second application: Pavai.Talkie.

What is Pavai.Talkie?

This is a voice assistant CLI application that runs in the terminal and handles hands-free voice chat, with generative AI models as the backend. It works similarly to Amazon Alexa, Apple Siri, Microsoft Cortana, or Google Assistant.

How is it different from cloud-based voice assistants?

This app runs privately and locally on a single PC, together with all of its models, which avoids exposing private data. Cloud systems also tend to rely on NLP processing, whereas the attempt here is to use generative AI models as the technological advancement. In addition, Talkie generates human-like voices in real time with a touch of styled tone.

Lastly, a droid system needs all of its modules set up to run in standalone mode, as C-3PO does, and it needs to be aware of the risks and dangers in its environment.

Hands-free Dialog System

In short, when Talkie runs it listens continuously for voice activity. It monitors for a wake-up word, then the question, then a trigger word that closes the question so the system can send the query to the LLM.

The longer explanation is that the system runs as a dual-loop system that keeps listening for questions and provides answers to them.

See diagram below:

Dual-loop system diagram

In the dual-loop system, the world loop watches for a new conversation or for hard stop commands like “reset” or “say again”. The conversation loop keeps the parts of the question in order as precisely as possible.
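The two loops can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual code: it assumes speech has already been transcribed into short text phrases, and the names `world_loop`, `conversation_loop` and `ask_llm` are hypothetical. The exact behaviour of “reset” and “say again” is also an assumption (here they clear and repeat the last answer).

```python
from typing import Callable, Iterable, Iterator, List

WAKE_WORDS = {"mark", "c-3po"}      # a subset of the wake-up words in the article
ENDING_WORDS = {"please", "over"}   # a subset of the ending words in the article


def conversation_loop(phrases: Iterator[str]) -> str:
    """Inner loop: collect question fragments in order until an ending word."""
    parts: List[str] = []
    for phrase in phrases:
        if phrase in ENDING_WORDS:
            break
        parts.append(phrase)
    return " ".join(parts)


def world_loop(phrases: Iterable[str],
               ask_llm: Callable[[str, str], str]) -> List[str]:
    """Outer loop: watch for wake-up words and hard stop commands."""
    stream = iter(phrases)  # shared with the inner loop, so no phrase is read twice
    spoken: List[str] = []
    last_answer = ""
    for phrase in stream:
        if phrase == "reset":            # assumed meaning: forget the last answer
            last_answer = ""
            spoken.append("[reset]")
        elif phrase == "say again":      # assumed meaning: repeat the last answer
            spoken.append(last_answer)
        elif phrase in WAKE_WORDS:
            question = conversation_loop(stream)
            last_answer = ask_llm(phrase, question)
            spoken.append(last_answer)
        # Any other phrase is treated as background speech and ignored.
    return spoken
```

In a real run `ask_llm` would call the model and the returned strings would be passed to text-to-speech; passing a stub function makes the loop easy to try out in isolation.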

The dialog format

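The pattern currently implemented is as follows:

[Wake-up word] → [Question] → [Ending word, e.g. “please”]

Instead of “please”, you can end with a standard walkie-talkie code word such as “over” or “roger”. The reason is that this tells the system the question is complete.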

Wake-up word

Speak a wake-up word first so the system knows which persona voice should answer. For example, say “mark” to wake up the mark_real voice, which says hi and responds to you.

Other wake-up words supported at the moment (I think Star Wars fans will know who they are):

  • “anthony anthony”
  • “skywalker”
  • “yoda master”
  • “princess leia”
  • “c-3po”

For those not in the Star Wars universe yet: Anthony Daniels is the actor who plays C-3PO in the movies, the character that inspired this project. I think that covers the whole party in an easy way.

Ending word

At the end of the question, it’s important to tell the system that you have finished by saying an ending word like “please”, or a standard walkie-talkie code word like “over”, “roger that”, “thanks”…
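The wake-up word, question and ending word pattern can be sketched as a small parser over the transcribed text. This is only an illustration under assumptions: the word lists are copied from this article, while `Utterance` and `parse_utterance` are hypothetical names and not part of the Pavai code base.

```python
import re
from typing import NamedTuple, Optional

# Multi-word entries come first so the longest phrase is matched first.
WAKE_WORDS = ("anthony anthony", "princess leia", "yoda master",
              "skywalker", "c-3po", "mark")
ENDING_WORDS = ("roger that", "please", "thanks", "over")


class Utterance(NamedTuple):
    wake_word: Optional[str]  # None when no wake-up word was heard
    question: str             # text between the wake-up word and the ending word
    complete: bool            # True only when an ending word closed the question


def parse_utterance(text: str) -> Utterance:
    # Lowercase and drop punctuation, keeping the hyphen needed for "c-3po".
    cleaned = re.sub(r"[^a-z0-9\- ]+", " ", text.lower())
    cleaned = " ".join(cleaned.split())

    wake_word = None
    for word in WAKE_WORDS:
        if cleaned == word or cleaned.startswith(word + " "):
            wake_word = word
            cleaned = cleaned[len(word):].strip()
            break

    complete = False
    for word in ENDING_WORDS:
        if cleaned == word or cleaned.endswith(" " + word):
            complete = True
            cleaned = cleaned[: len(cleaned) - len(word)].strip()
            break

    return Utterance(wake_word, cleaned, complete)
```

For example, `parse_utterance("Mark, what is the Force? Please.")` yields the wake-up word `mark`, the question `what is the force`, and `complete=True`. Without an ending word, `complete` stays `False`, so the caller knows to keep listening.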

System Startup

The system automatically performs a scan of the resources and conducts a sanity test on the services by default. The results of this scan are reported at the end. You should hear Jane’s voice delivering the status and task updates.

Voice System

By default, Talkie uses Jane as the system voice for the start-up system checks, then hands over to the user-specified persona voice for answering questions. The default persona is “mark_real”. Guess who this is? The real Luke Skywalker 🙂

Screenshots

Let’s take a look at some screenshots of running the app in the terminal.

Startup System Health Check
Ask question
Activate Princess Leia Voice Response

Challenges Remain

Implementing a hands-free dialog system proved to be challenging. One difficult situation is that, when the microphone and speaker are on at the same time, the microphone captures the system’s own spoken response and transcribes it. I still haven’t figured out how to disable the microphone while the system is talking without breaking the dialog loop. If someone knows how to fix this, please help.
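One possible direction, offered only as an untested idea and not as the project's code: leave the microphone stream open and simply discard captured frames while speech is playing, so the dialog loop itself never stops. The sketch below assumes playback is a blocking call; `SpeakingGate`, `speak` and `accept` are hypothetical names.

```python
import threading
from typing import Callable


class SpeakingGate:
    """Tracks whether the assistant is currently speaking."""

    def __init__(self) -> None:
        self._speaking = threading.Event()

    def speak(self, play: Callable[[], None]) -> None:
        """Run a blocking playback function with the gate closed."""
        self._speaking.set()
        try:
            play()
        finally:
            # Reopen the gate even if playback raises an error.
            self._speaking.clear()

    def accept(self, frame: bytes) -> bool:
        """Return True when a captured audio frame should go to transcription."""
        return not self._speaking.is_set()
```

The capture thread would call `accept(frame)` on every audio frame and drop the frame when it returns `False`. In practice the speaker may still be audible for a moment after playback returns, so a short delay before reopening the gate may be needed; that part is not shown here.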

The dialog system has a few things hard-coded for now and needs some redesign to feel more natural. My attempt to use natural speech pauses to trigger the LLM call was not successful, because a pause does not always make clear whether the question is complete or only partially spoken. As a result, ending words are still needed to signal that the question is ready.

What is next?

If you like my adventure and are excited to learn more about it, explore the code here and have some fun with it.

Github Repo: https://github.com/PavAI-Research/pavai-c3po.git

Friendly reminder:

Please keep in mind that this is an alpha release with limited testing on Linux with an Nvidia GPU. If you have instructions for a tested setup on another system, please let me know so I can update the instructions.

Suggestions and help to make this a reality are welcome.

“May the AI force be with you”

Still interested? Continue reading the other parts:

Part-1: Pavai.Vocei (C3P0)

Part-2: Pavai.Talkie (C3P0)

Part-3: Pavai.Workspace

Credits:

Thanks to all the incredible open-source community projects in the AI development ecosystem that made this possible.
