Protect Your Privacy By Rolling Your Own Virtual Assistant

Everyone can agree that voice assistance is an extremely useful technology. Believe it or not, IBM created the first system of its kind in 1961, and it was able to recognize 16 words and digits. Since then, it has been a bumpy road getting this technology right. Oftentimes we even gave up on it altogether because it was so inaccurate in its infancy. However, a lot of time has passed and developers have worked hard to perfect it.

Now that we can rely on decent voice recognition, we have another thing to worry about: data mining. For a voice assistant to respond instantly to your commands, its microphone has to be on at all times. And depending on the provider, your microphone recordings go straight to a data center. Amazon has even confirmed that you cannot delete your audio recordings, as it stores them on its servers.

So what if I told you we could roll our own privacy-minded, highly customizable virtual assistant? We can with Python. It’s considered a top ten programming language and is great for beginners to software development. Python offers a number of packages that let us build our assistant in only 65 lines of code. Continue on to learn how to open up a terminal, search Google and YouTube, or even put your device to sleep, all from simple voice commands.

Setting up our environment

For our project we will need to install Python and Pip, the Python package manager, on our machine. We can easily install both with Chocolatey.

If you want to learn how to install Chocolatey, read my article on it here.

Execute the following command in an elevated PowerShell terminal to install Python and Pip.

choco install python pip

You may have to re-open the terminal after you install Python so that the environment variables are set properly. Test it out to make sure Python was installed properly.

python --version

If the command prints a version number, Python is installed correctly.

Now that Python is on the system we need the following non-standard library packages:

Speech Recognition, this package will turn our voice commands into text.
Google Text to Speech, we’ll need this to turn text to speech.
PlaySound, this package is going to play our virtual assistant’s voice for us.
PyAudio, we need this for low-level recording purposes.

To install all at once, use the following command.

pip install speechrecognition playsound gtts

The PyAudio package in the Pip repository is not compatible with Python 3.8, but fortunately the Laboratory for Fluorescence Dynamics created an updated package located here. Its listing is named PyAudio‑0.2.11‑cp38‑cp38‑win_amd64.whl.

Use the following command to fetch and install.

pip install https://download.lfd.uci.edu/pythonlibs/w3jqiv8s/PyAudio-0.2.11-cp38-cp38-win_amd64.whl

We should have everything to build our script now.
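
Before moving on, it can help to confirm that Python can actually find the four packages. The following is only a small sketch: it assumes the import names are speech_recognition, gtts, playsound and pyaudio, and the helper name missing_modules is made up for this example.

```python
import importlib.util

# Import names assumed for the four packages installed above.
REQUIRED_MODULES = ["speech_recognition", "gtts", "playsound", "pyaudio"]


def missing_modules(names):
    """Return the names from the list that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages are installed.")
```

If anything is reported as missing, re-run the matching pip install command before continuing.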

Creating our virtual assistant

To build our assistant we only need one script file. Create a new Python script and name it whatever you want. Make sure the file name ends in “.py”. Open it up in your favorite editor so we can start building.

Here is how it’s built.

First, we need to import our packages. The imports we didn’t talk about yet are commented.

import speech_recognition as speech
import subprocess              # For opening up applications
import playsound
import os                      # For removing audio files and hibernating
import uuid                    # For creating random file names
from gtts import gTTS
import webbrowser              # For opening the default web browser

To recognize speech, we need to create an instance of the Recognizer class from the Speech Recognition package. Without this, we can’t listen for voice commands.

recognizer = speech.Recognizer()

For our assistant to talk back to us we need to create a speak function. Here we take text, convert it to audio with Google’s Text to Speech, save it to an MP3, play the MP3, then delete it.

def speak(audio_data):
    tts = gTTS(text=audio_data, lang='en')
    audio_mp3 = str(uuid.uuid4()) + ".mp3"
    tts.save(audio_mp3)
    playsound.playsound(audio_mp3)
    os.remove(audio_mp3)
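
One thing to watch with speak is that the MP3 is left behind if saving or playback raises an error. The sketch below shows one possible way to guarantee the cleanup with try/finally. The function name speak_with_cleanup and its synthesize and play parameters are invented for this example, so treat it as an illustration rather than part of the script.

```python
import os
import tempfile


def speak_with_cleanup(text, synthesize, play):
    """Write speech for text to a temporary MP3, play it, and always delete it.

    synthesize(text, path) must write the audio file to path.
    play(path) must play the file at path.
    """
    # Reserve a unique temporary file name, then close the handle
    # so other code is free to write to and read from the path.
    handle, path = tempfile.mkstemp(suffix=".mp3")
    os.close(handle)
    try:
        synthesize(text, path)
        play(path)
    finally:
        # Runs whether or not synthesis or playback failed.
        if os.path.exists(path):
            os.remove(path)
```

With the packages from this article, the two callables could be lambda text, path: gTTS(text=text, lang='en').save(path) and playsound.playsound.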

To listen for our commands, create a record function. This will eventually run in a constant loop. I found that a timeout and a phrase time limit of 2-3 seconds work best, because they keep capture times short while still leaving enough time to fit in a detailed command.

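def record():
    with speech.Microphone() as mic_source:
        voice_data = ''
        try:
            # Wait up to 3 seconds for speech to start, then capture at most 3 seconds
            audio = recognizer.listen(mic_source, timeout=3, phrase_time_limit=3)
            # recognize_google sends the captured audio to Google's Web Speech API
            voice_data = recognizer.recognize_google(audio)
        except speech.WaitTimeoutError:
            # Nobody spoke before the timeout, so return an empty string
            pass
        except speech.UnknownValueError:
            print("no value")
        except speech.RequestError:
            print("error")
        print(voice_data)
        return voice_data.lower()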
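
Keep in mind that recognize_google hands the recording to Google's Web Speech API. The Speech Recognition package also offers recognize_sphinx, which runs on the local machine once the separate pocketsphinx package is installed. The sketch below shows how the engine could be swapped. The function name transcribe and its offline flag are made up for this example, and accuracy with the offline engine may well be lower.

```python
def transcribe(recognizer, audio, offline=True):
    """Return the recognized text in lower case.

    recognizer is expected to be a speech_recognition.Recognizer and audio
    the value returned by recognizer.listen(). With offline=True the audio
    is processed locally by recognize_sphinx (requires pocketsphinx);
    otherwise it is sent to Google through recognize_google.
    """
    if offline:
        engine = recognizer.recognize_sphinx
    else:
        engine = recognizer.recognize_google
    return engine(audio).lower()
```

Inside record, the call to recognizer.recognize_google(audio) could then become transcribe(recognizer, audio), with the same except blocks left in place.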

The following response function is the bread and butter of our assistant. This is where we match a command with the action we want to take on our computer. There is an additional helper function called contains that allows multiple commands for the same action.

This is where the customizing comes in. You can adapt this part to add more commands for whatever you desire. Feel free to modify and add to this but for this demonstration we will:

Open up PowerShell with the subprocess module for the “shell” or “powershell” command.
Search Google by asking for a search term, then open the results in the browser with the “search” or “google” command.
Search YouTube by asking for a search term, then open the results in the browser with the “youtube” or “tube” command.
Lastly, with the “suspend” or “hibernate” command, we use the standard library module, os, to send a hibernate command to Windows.

from urllib.parse import quote_plus  # Standard library; makes search terms URL-safe

def response(voice_data):
    if contains(["shell", "powershell"], voice_data):
        # Open PowerShell in its own console window (Windows only)
        subprocess.Popen(["powershell.exe"], creationflags=subprocess.CREATE_NEW_CONSOLE)

    elif contains(["search", "google"], voice_data):
        speak("search what?")
        search = record()
        # Skip the search if nothing was understood
        if search != "":
            url = "https://google.com/search?q=" + quote_plus(search)
            webbrowser.get().open(url)

    elif contains(["youtube", "tube"], voice_data):
        speak("search what?")
        search = record()
        if search != "":
            url = "https://youtube.com/results?search_query=" + quote_plus(search)
            webbrowser.get().open(url)

    elif contains(["suspend", "hibernate"], voice_data):
        speak("are you sure?")
        shutdown = record()
        # Hibernate only on an explicit "yes", so silence or a misheard reply does nothing
        if "yes" in shutdown:
            os.system("shutdown.exe /h")

    else:
        speak("command not recognized")

def contains(terms, voice_data):
    # True if any of the terms appears anywhere in the spoken text, otherwise False
    return any(term in voice_data for term in terms)
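
Because contains looks for substrings, a phrase such as “researching” also triggers the “search” command. As the list of commands grows, a table of keywords matched on whole words can be easier to maintain. The following is a rough sketch of that idea. The name find_action and the action labels in the table are placeholders invented for this example.

```python
def find_action(commands, voice_data):
    """Return the first action whose keywords share a whole word with voice_data."""
    words = set(voice_data.lower().split())
    for keywords, action in commands:
        if words & set(keywords):
            return action
    return None


# Hypothetical command table; the labels stand in for real functions.
commands = [
    (("shell", "powershell"), "open_shell"),
    (("search", "google"), "search_google"),
    (("youtube", "tube"), "search_youtube"),
    (("suspend", "hibernate"), "hibernate"),
]

print(find_action(commands, "open powershell please"))  # open_shell
print(find_action(commands, "researching things"))      # None
```

Adding a command then only means adding one more row to the table.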

The last thing we need to do is make an infinite loop that listens for key commands and processes them if they are there. The following is a while loop that records for up to 3 seconds and, if audio is captured, passes it to response.

while True:
    voice_data = record()
    if voice_data != "":
        response(voice_data)
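
The loop above never ends on its own, which can be awkward when the script runs without a terminal window. One option is a spoken stop word. This is only a sketch: run_assistant, its parameters and the stop words “exit” and “quit” are assumptions made for this example.

```python
def run_assistant(record, respond, stop_words=("exit", "quit")):
    """Listen until a stop word is heard; pass every other phrase to respond."""
    while True:
        voice_data = record()
        if voice_data == "":
            continue  # Nothing was captured, so listen again
        if any(word in stop_words for word in voice_data.split()):
            break     # A stop word ends the loop
        respond(voice_data)
```

With the functions from this article, the call would be run_assistant(record, response).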

Now run the script and test it out. For this you can use pythonw.exe to start it without a terminal window.

pythonw <NameOfScript.py>

If you want to time your commands precisely, watch for the microphone icon that pops up every time the script is listening for audio to capture.

Go ahead and test out some commands! Enjoy no data going off to some corporation’s data center.

Take it even further

Pretty cool, huh? Not only is your data not getting stored for anyone to listen to, this virtual assistant is also highly configurable to whatever your needs are. Take it even further! Create a startup task so you don’t have to run the script every time you start your device. Is there a really neat Python package out there you can make use of or automate? Just import it and create functionality with a new command. There is nothing stopping you from re-creating the same features as Cortana, Alexa, Siri, and Google Assistant. Most likely, you will be able to do even more! Now enjoy a digital assistant you know you can trust!

Michael Rinderle

Michael has been a professional in the information technology field for over 20 years, specializing in software engineering and systems administration. He studied network security and holds a software engineering degree from Milwaukee Area Technical College, along with thousands of hours of self-taught learning. He mainly writes about technology, current events, and coding. Michael is also the founder of Sof Digital, a U.S.-based software development firm. His hobbies are archery, turntablism, disc golf and rally racing.