Desktop Assistant (Jarvis): What It Is and How It Works

Desktop Assistant — known as Jarvis — is an open-source virtual assistant written in Python. It listens for your voice, carries out tasks on your computer, answers questions, searches the web, sends emails, plays music, and speaks responses back to you using text-to-speech. Whether you prefer a graphical interface or just your microphone, Jarvis is designed to make everyday desktop tasks faster and more hands-free.

What Jarvis Can Do

Jarvis combines several capabilities into a single conversational interface:

Voice recognition — captures audio from your microphone and transcribes it using the Google Speech Recognition engine via the SpeechRecognition library.
Text-to-speech (TTS) — responds out loud using pyttsx3, an offline TTS engine that works with SAPI5 on Windows and eSpeak on Ubuntu.
Web browsing — opens popular websites (Google, YouTube, Wikipedia, Amazon; plus GitHub on Windows) and fires off searches on your configured search engine.
Wikipedia lookups — fetches two-sentence summaries directly from Wikipedia.
Email — composes and sends emails via SMTP using credentials stored in config.ini.
Music playback — plays and controls MP3 files from a configured folder using pygame.mixer.
System time and date — announces the current time and date on request (Ubuntu entry point only).
Graphical interface — displays a scrollable chat log and a Speak button inside a Tkinter window so you can follow the conversation visually.
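
As an illustration of the email capability listed above, here is a minimal sketch that uses only Python's standard library. The helper names, the [EMAIL] section, and its keys (server, port, username, password) are assumptions made for this example; they are not taken from the project's config.ini or commands.py.

```python
import configparser
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a plain-text email (hypothetical helper, not from the project)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_message(msg, config_path="config.ini"):
    """Send msg using SMTP credentials read from config.ini.

    The [EMAIL] section and its key names are assumed for illustration.
    """
    config = configparser.ConfigParser()
    config.read(config_path)
    section = config["EMAIL"]
    # Assumes the SMTP server accepts STARTTLS on the configured port.
    with smtplib.SMTP(section["server"], section.getint("port")) as smtp:
        smtp.starttls()
        smtp.login(section["username"], section["password"])
        smtp.send_message(msg)
```

Keeping message construction separate from sending means the first function can be exercised without network access or real credentials.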

Architecture Overview

The project’s source lives entirely in the src/ directory and is split into five modules:

File	Purpose
`Jarvis2.py`	Ubuntu entry point. Self-contained: embeds TTS, speech recognition, and command dispatch in one file.
`Jarvis2_4windows.py`	Windows entry point. Reads `config.ini` at startup, then delegates to `actions.py` and `commands.py`.
`actions.py`	Windows helpers — `speak()`, `wish_me()`, `search_engine_selector()`, and voice-settings mutators (`change_rate`, `change_voice`, `change_volume`). Used by `Jarvis2_4windows.py` only; initialises the SAPI5 TTS engine.
`commands.py`	Individual command handlers (`command_wikipedia`, `command_open`, `command_search`, `command_mail`, `command_play_music`, etc.).
`gui.py`	Tkinter window definition — the 700×500 `root` window, scrollable `Listbox` chat log, and the Speak button.
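
The split between an entry point and separate command handlers, as described in the table above, could be sketched roughly as follows. The handler names echo those in commands.py, but their signatures, return values, and the keyword table are assumptions for this example rather than the project's actual code.

```python
def command_open(target):
    """Placeholder handler: the real one would open a website."""
    return f"opening {target}"


def command_search(terms):
    """Placeholder handler: the real one would run a web search."""
    return f"searching for {terms}"


# Hypothetical mapping from the first spoken word to a handler.
HANDLERS = {
    "open": command_open,
    "search": command_search,
}


def dispatch(query):
    """Route a transcribed phrase to the handler named by its first word.

    Returns the handler's result, or None when no handler matches.
    """
    words = query.lower().strip().split(maxsplit=1)
    if not words:
        return None
    keyword = words[0]
    rest = words[1] if len(words) > 1 else ""
    handler = HANDLERS.get(keyword)
    if handler is None:
        return None
    return handler(rest)
```

A dictionary lookup keeps the entry point short; adding a command then means adding one handler and one table entry.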

The Windows entry point (Jarvis2_4windows.py) checks for config.ini before doing anything else. If the file is missing it prints an error and exits. The Ubuntu entry point (Jarvis2.py) is configuration-free and runs directly.
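
The startup check described above might look something like this minimal sketch. The function name load_config, the error wording, and the exit code are assumptions; only the behaviour (print an error and exit when config.ini is missing) comes from the description.

```python
import configparser
import sys
from pathlib import Path


def load_config(path="config.ini"):
    """Return a parsed config, or print an error and exit if the file is missing."""
    config_file = Path(path)
    if not config_file.is_file():
        print(f"Error: {config_file} not found.", file=sys.stderr)
        sys.exit(1)
    config = configparser.ConfigParser()
    config.read(config_file)
    return config
```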

Key Features

Offline Text-to-Speech

Uses pyttsx3 for TTS, so no internet connection is required to speak responses. On Windows, the voice, speaking rate (default 150 words per minute), and volume are adjustable at runtime or via config.ini.
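
For readers unfamiliar with pyttsx3, the sketch below shows how rate, volume, and voice are typically set through its setProperty and getProperty calls. The function names here are hypothetical and only loosely mirror the helpers in actions.py; pyttsx3 is imported lazily so the rest of the code can be read and tested without it installed.

```python
def make_engine():
    """Create a pyttsx3 engine (SAPI5 on Windows, eSpeak on Linux)."""
    import pyttsx3  # imported lazily; requires the pyttsx3 package
    return pyttsx3.init()


def apply_voice_settings(engine, rate=150, volume=1.0, voice_index=0):
    """Apply speaking rate, volume, and voice to a pyttsx3-style engine."""
    engine.setProperty("rate", rate)
    # pyttsx3 expects volume as a float between 0.0 and 1.0.
    engine.setProperty("volume", max(0.0, min(1.0, volume)))
    voices = engine.getProperty("voices")
    if voices:
        # Clamp the index so an out-of-range value still selects a voice.
        index = max(0, min(voice_index, len(voices) - 1))
        engine.setProperty("voice", voices[index].id)


def speak(engine, text):
    """Queue text and block until it has been spoken."""
    engine.say(text)
    engine.runAndWait()
```

With pyttsx3 installed, a typical call sequence would be engine = make_engine(), then apply_voice_settings(engine), then speak(engine, "Hello").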

Google Speech Recognition

Captures audio with PyAudio and transcribes it through the Google Speech Recognition API. Energy threshold (default 300) and pause threshold (0.5 s) are tunable.
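
The following sketch shows how the two thresholds mentioned above map onto attributes of a SpeechRecognition Recognizer. The function names are hypothetical, and returning None on failure is an assumption about error handling, not the project's documented behaviour. The library is imported lazily because it needs PyAudio and a microphone at runtime.

```python
def configure_recognizer(recognizer, energy_threshold=300, pause_threshold=0.5):
    """Set the sensitivity and end-of-phrase silence on a Recognizer-like object."""
    recognizer.energy_threshold = energy_threshold  # minimum audio energy treated as speech
    recognizer.pause_threshold = pause_threshold    # seconds of silence that end a phrase
    return recognizer


def listen_once():
    """Record one phrase from the default microphone and return it as lowercase text.

    Returns None if the speech is unintelligible or the service is unreachable.
    """
    import speech_recognition as sr  # requires SpeechRecognition and PyAudio

    recognizer = configure_recognizer(sr.Recognizer())
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio).lower()
    except sr.UnknownValueError:
        return None  # audio was captured but could not be transcribed
    except sr.RequestError:
        return None  # the recognition service could not be reached
```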

Configurable Search Engine

Pick Google, Bing, DuckDuckGo, or YouTube as your default search engine in config.ini (Windows only). Any other value is treated as a custom URL and checked to see whether it is live; if the check fails, Jarvis falls back to Google.
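
One way to picture the selection logic is the sketch below. It is not the project's search_engine_selector(): the function names, the URL table, and the optional is_reachable callback (a stand-in for whatever liveness check the project performs) are all assumptions for this example.

```python
from urllib.parse import quote_plus, urlparse

# Query-URL prefixes for the four named engines.
SEARCH_URLS = {
    "google": "https://www.google.com/search?q=",
    "bing": "https://www.bing.com/search?q=",
    "duckduckgo": "https://duckduckgo.com/?q=",
    "youtube": "https://www.youtube.com/results?search_query=",
}


def select_search_prefix(value, is_reachable=None):
    """Resolve a config value to a search-URL prefix.

    Known engine names map to their prefix. Anything else is accepted only
    if it looks like an http(s) URL and, when a checker is supplied, that
    checker reports it as reachable. Otherwise Google is used.
    """
    cleaned = value.strip()
    key = cleaned.lower()
    if key in SEARCH_URLS:
        return SEARCH_URLS[key]
    parsed = urlparse(cleaned)
    looks_like_url = parsed.scheme in ("http", "https") and bool(parsed.netloc)
    if looks_like_url and (is_reachable is None or is_reachable(cleaned)):
        return cleaned
    return SEARCH_URLS["google"]


def build_search_url(prefix, query):
    """Append a URL-encoded query to the chosen prefix."""
    return prefix + quote_plus(query)
```

Passing the reachability check in as a callback keeps the selection logic free of network calls, so it can be tested offline.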

Debug / Type Mode

Set debug = True in config.ini to type commands at the terminal instead of speaking — useful for testing without a microphone (Windows entry point only).
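
A rough sketch of how such a switch could work is shown below. Placing the debug key in the [DEFAULT] section is an assumption, as are the function names and the "Command: " prompt; the listen argument stands in for whatever microphone routine the entry point uses.

```python
import configparser


def is_debug(config_text):
    """Read the debug flag from config text; missing flag means False."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    # Assumes the key lives in [DEFAULT]; the real section name may differ.
    return config.getboolean("DEFAULT", "debug", fallback=False)


def read_command(debug, listen, prompt=input):
    """Return the next command, typed when debug is on, spoken otherwise."""
    if debug:
        return prompt("Command: ").strip().lower()
    return listen()
```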

Music Playback

Plays, pauses, unpauses, and stops MP3 files via pygame.mixer. On Windows, point musicpath in config.ini to your music folder.
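
The playback controls above correspond directly to functions in pygame.mixer.music. The sketch below is illustrative only: the helper names are hypothetical, and pygame is imported lazily so that the folder-scanning part runs without an audio device.

```python
from pathlib import Path


def list_mp3s(music_path):
    """Return the MP3 files in a folder, sorted by path."""
    folder = Path(music_path)
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".mp3"
    )


def play(track):
    """Load one file and start playback (requires pygame and an audio device)."""
    import pygame
    pygame.mixer.init()
    pygame.mixer.music.load(str(track))
    pygame.mixer.music.play()


def control(action):
    """Apply 'pause', 'unpause', or 'stop' to the current track."""
    import pygame
    actions = {
        "pause": pygame.mixer.music.pause,
        "unpause": pygame.mixer.music.unpause,
        "stop": pygame.mixer.music.stop,
    }
    actions[action]()
```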

Cross-Platform

Runs on Windows (with SAPI5) and Ubuntu (with eSpeak). Separate entry points and requirement files keep platform-specific dependencies isolated.

System Requirements

Requirement	Details
Python	3.9 or later (3.9 recommended — all dependencies are pinned against it)
Microphone	Required for voice input; can be bypassed with `debug = True`
Internet	Required for Google Speech Recognition and Wikipedia lookups
OS	Windows 10+ or Ubuntu (tested with espeak installed)
Ubuntu extra	`espeak` system package (`sudo apt-get install espeak`) and `portaudio19-dev` for PyAudio

Next Steps

Installation

Install Python dependencies on Windows or Ubuntu, or download the pre-built binary installer.

Quickstart

Clone the repo, drop in your config.ini, and give Jarvis your first voice command in under five minutes.
