Discord Voice Translator V2

A comprehensive microservices-based Discord bot that captures voice-channel audio, transcribes it, translates the transcript, and posts formatted messages to a text channel. Built with GPU acceleration and multi-language support.

🎯 Features

  • Real-time Voice Capture: Records Discord voice channel audio
  • GPU-Accelerated Transcription: Runs faster-whisper on an NVIDIA GPU (RTX 4000 recommended)
  • Multi-Language Translation: Local NLLB models + Google Translate fallback
  • Smart Language Logic: Always covers English, German, and Korean (the detected language is shown as the transcript, the others as translations)
  • Professional Discord Messages: Formatted with speaker info, duration, and timestamps
  • Microservices Architecture: Scalable, maintainable, containerized services
  • Comprehensive Monitoring: Health checks, logging, and web dashboard

🏗️ Architecture

Services Overview

Discord Voice → [Recorder] → Raw PCM → [Audio Processor] → WAV 
    ↓
[Whisper Service] → Transcription → [Translator] → Multi-language
    ↓
[Transcriber] → Discord Message → [Dashboard] → Monitoring

Core Services

  1. Recorder: Discord bot that captures voice channel audio
  2. Audio Processor: Converts PCM to optimized WAV files
  3. Whisper Service: GPU-accelerated speech-to-text transcription
  4. Translator: Local NLLB + Google Translate multi-language translation
  5. Transcriber: Workflow orchestrator and Discord message formatter
  6. Dashboard: Web interface for monitoring and management
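The Audio Processor's PCM-to-WAV step can be pictured with Python's standard `wave` module. This is a minimal sketch, not the service's code: it assumes the Recorder hands over 16-bit signed little-endian PCM at 48 kHz stereo (what Discord's Opus stream typically decodes to), downmixes it to mono, and naively decimates it to 16 kHz, the sample rate Whisper models work at. The name `pcm_to_wav` is hypothetical.

```python
import array
import io
import sys
import wave

# Assumed input format (hypothetical; check what the Recorder actually emits).
IN_RATE = 48_000      # Discord voice decodes to 48 kHz
IN_CHANNELS = 2       # stereo
OUT_RATE = 16_000     # Whisper models operate on 16 kHz audio
SAMPLE_WIDTH = 2      # 16-bit signed samples


def pcm_to_wav(pcm: bytes) -> bytes:
    """Downmix 48 kHz stereo s16le PCM to 16 kHz mono and wrap it in a WAV container."""
    samples = array.array("h")
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # raw PCM is little-endian; array uses native byte order

    # Average each left/right pair into one mono sample (stays within int16 range).
    mono = array.array("h", ((samples[i] + samples[i + 1]) // 2
                             for i in range(0, len(samples) - 1, IN_CHANNELS)))

    # Naive decimation: keep every 3rd sample (48k / 16k = 3). No anti-aliasing filter.
    mono_16k = mono[::IN_RATE // OUT_RATE]

    if sys.byteorder == "big":
        mono_16k.byteswap()  # WAV sample data must be little-endian

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(OUT_RATE)
        wav.writeframes(mono_16k.tobytes())
    return buf.getvalue()
```

Dropping two of every three samples without low-pass filtering can alias high frequencies; a production pipeline would more likely use a proper resampler such as ffmpeg. faster-whisper also resamples input on its own, so the conversion here mainly keeps the intermediate files small.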

Infrastructure

  • PostgreSQL: Primary database for all structured data
  • Redis: Message queue and caching layer
  • Nginx: Reverse proxy for web services
  • Docker Compose: Container orchestration with GPU support

🚀 Quick Start

Prerequisites

  • Docker and Docker Compose
  • NVIDIA GPU with the NVIDIA Container Toolkit installed (Docker GPU runtime)
  • Discord Bot Token
  • Google Translate API Key (optional, for premium translations)

1. Clone Repository

git clone <your-repo-url>
cd discord-voice-translator-v2

2. Configure Environment

cp .env.example .env
# Edit .env with your configuration

3. Required Environment Variables

# Discord Bot Configuration
DISCORD_TOKEN=your_discord_bot_token_here
CLIENT_ID=your_bot_client_id_here
GUILD_ID=your_guild_id_here

# Database Configuration (the password inside POSTGRES_URL must match POSTGRES_PASSWORD)
POSTGRES_PASSWORD=your_secure_postgres_password
POSTGRES_URL=postgresql://postgres:your_secure_postgres_password@postgres:5432/voice_translator

# Translation APIs (optional)
GOOGLE_TRANSLATE_API_KEY=your_google_translate_api_key

# GPU Configuration
WHISPER_MODEL=large-v2
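Because `POSTGRES_URL` repeats the password from `POSTGRES_PASSWORD`, a mismatch between the two is an easy mistake. Below is a minimal Python sketch of a start-up check, assuming only the variable names listed above; `check_env` is a hypothetical helper, not part of the project. It reports missing required variables and confirms that the URL's scheme, password, and database name parse as expected.

```python
import os
from urllib.parse import unquote, urlsplit

# Names taken from the .env listing; GOOGLE_TRANSLATE_API_KEY is optional, so it is not checked.
REQUIRED = ["DISCORD_TOKEN", "CLIENT_ID", "GUILD_ID", "POSTGRES_PASSWORD", "POSTGRES_URL"]


def check_env(env=None) -> list[str]:
    """Return human-readable problems; an empty list means the config looks sane."""
    env = os.environ if env is None else env
    problems = [f"missing {name}" for name in REQUIRED if not env.get(name)]

    url = env.get("POSTGRES_URL")
    if url:
        parts = urlsplit(url)
        if parts.scheme not in ("postgresql", "postgres"):
            problems.append(f"unexpected scheme {parts.scheme!r} in POSTGRES_URL")
        # Percent-decode so passwords containing special characters compare correctly.
        password = unquote(parts.password) if parts.password else None
        if env.get("POSTGRES_PASSWORD") and password != env["POSTGRES_PASSWORD"]:
            problems.append("POSTGRES_URL password does not match POSTGRES_PASSWORD")
        if not parts.path.lstrip("/"):
            problems.append("POSTGRES_URL has no database name")
    return problems
```

A service could call `check_env()` at start-up and refuse to run if the list is non-empty. Note that a password containing characters such as `@` or `/` has to be percent-encoded inside the URL.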

4. Deploy with Docker Compose

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Start with admin interfaces
docker-compose --profile admin up -d

5. Discord Bot Setup

  1. Invite the bot to your server with these permissions:

    • Connect to voice channels
    • Speak in voice channels
    • Send messages
    • View channels
  2. Use slash commands:

    • /join - Start recording in your voice channel
    • /leave - Stop recording and leave channel
    • /status - Check current recording status

💬 Discord Message Format

English Source Example

🎤 New Transcription
Speaker: mcgyvver
🌍 Language Detected: English 🇬🇧
⏱️ Duration: 17.0 seconds
🕐 Time: 12 minutes ago

📝 Transcript
Yeah, but this is a complete prank, we'll...
Let's go to the foot.

🇩🇪 German
Ja, aber das ist ein kompletter Streich, wir werden...
Lass uns zum Fuß gehen.

🇰🇷 Korean
예, 하지만 이것은 완전한 장난입니다...
발로 가자.
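The layout above can be reproduced with a small formatting function. This is a hypothetical Python sketch (`format_message` and `Translation` are illustrative names, not the Transcriber's code). It emits plain text in the same section order and, for the time line, uses Discord's `<t:UNIX:R>` timestamp syntax, which the client renders as a relative time such as "12 minutes ago".

```python
from dataclasses import dataclass


@dataclass
class Translation:
    flag: str   # e.g. "🇩🇪"
    name: str   # e.g. "German"
    text: str


def format_message(speaker: str, language: str, language_flag: str,
                   duration_s: float, unix_time: int, transcript: str,
                   translations: list[Translation]) -> str:
    """Build the message text in the section order of the example (hypothetical helper)."""
    lines = [
        "🎤 New Transcription",
        f"Speaker: {speaker}",
        f"🌍 Language Detected: {language} {language_flag}",
        f"⏱️ Duration: {duration_s:.1f} seconds",
        # Discord renders <t:UNIX:R> client-side as a relative time.
        f"🕐 Time: <t:{unix_time}:R>",
        "",
        "📝 Transcript",
        transcript,
    ]
    for t in translations:
        lines += ["", f"{t.flag} {t.name}", t.text]
    return "\n".join(lines)
```

Discord caps plain message content at 2,000 characters, so a long transcript with several translations may need to be split or sent as an embed.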

Smart Language Logic

  • If detected = English: Shows transcript + German + Korean
  • If detected = German: Shows transcript + English + Korean
  • If detected = Korean: Shows transcript + English + German
  • If detected = Other: Shows transcript + English + German + Korean
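All four rules reduce to "translate into every primary language except the detected one". A minimal Python sketch, assuming ISO 639-1 codes `en`, `de`, and `ko`. The name `PRIMARY_LANGUAGES` echoes the setting mentioned under Adding New Languages, but its shape here is an assumption.

```python
# Assumed ISO 639-1 codes, in display order (English, German, Korean).
PRIMARY_LANGUAGES = ["en", "de", "ko"]


def target_languages(detected: str) -> list[str]:
    """Primary languages to translate into, skipping the detected source language.

    A non-primary source (e.g. "fr") excludes nothing, so all three are returned.
    """
    return [lang for lang in PRIMARY_LANGUAGES if lang != detected]


for source in ("en", "de", "ko", "fr"):
    print(source, "->", target_languages(source))
# en -> ['de', 'ko']
# de -> ['en', 'ko']
# ko -> ['en', 'de']
# fr -> ['en', 'de', 'ko']
```

Expressing the rule as a filter over one list means adding a fourth primary language needs no new branches.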

🔧 Service Configuration

GPU Requirements

  • NVIDIA GPU with 4GB+ VRAM (RTX 4000 recommended)
  • CUDA 11.8+ support
  • NVIDIA Container Toolkit (formerly nvidia-docker) configured for Docker

Model Storage

  • Whisper models cached in ./data/models/
  • NLLB translation models auto-downloaded
  • Models persist between container restarts

Performance Tuning

Whisper Models (approximate VRAM; actual usage depends on compute type and beam size):

  • large-v2: Best accuracy, ~2GB VRAM, slower
  • medium: Good balance, ~1GB VRAM, faster
  • small: Fastest, ~500MB VRAM, lower accuracy
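Choosing a model is a trade-off against available VRAM. The sketch below simply encodes the approximate figures from this list (rough values, not measurements) and picks the most accurate model that fits a given budget. `pick_whisper_model` is a hypothetical helper, and `reserve_mb` is an illustrative way to leave headroom for the NLLB model on the same GPU.

```python
# Approximate VRAM needs from the list above, most accurate model first (MB).
WHISPER_VRAM_MB = [("large-v2", 2048), ("medium", 1024), ("small", 500)]


def pick_whisper_model(free_vram_mb: int, reserve_mb: int = 0) -> str:
    """Return the most accurate listed model whose approximate footprint fits the budget.

    reserve_mb leaves headroom for other GPU users such as the NLLB translator.
    """
    budget = free_vram_mb - reserve_mb
    for name, need_mb in WHISPER_VRAM_MB:
        if need_mb <= budget:
            return name
    raise ValueError(f"no listed Whisper model fits in {budget} MB of VRAM")
```

The returned name could then be set as `WHISPER_MODEL`, or passed to faster-whisper's `WhisperModel(name, device="cuda")`.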

Translation Strategy:

  • Local NLLB: Fast, private, good quality
  • Google Translate: Premium quality, API costs
  • Automatic fallback to Google Translate (if an API key is configured) for reliability
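The fallback can be sketched as "try the local model, then the API". This Python sketch is an assumption about the control flow, not the Translator's code. The two backends are passed in as plain callables (`local` and `remote` are hypothetical), so no real NLLB or Google client API is implied. The remote backend is only tried when one is configured, for example when `GOOGLE_TRANSLATE_API_KEY` is set.

```python
import logging
from typing import Callable, Optional

# A backend takes (text, source_lang, target_lang) and returns the translated text.
Backend = Callable[[str, str, str], str]

log = logging.getLogger("translator")


def translate_with_fallback(text: str, src: str, tgt: str,
                            local: Backend,
                            remote: Optional[Backend] = None) -> tuple[str, str]:
    """Return (translation, backend_label), preferring the local model."""
    try:
        return local(text, src, tgt), "nllb"
    except Exception as exc:  # e.g. model still downloading, GPU out of memory
        if remote is None:
            raise  # no fallback configured: surface the original error
        log.warning("local translation failed (%s); falling back to remote API", exc)
        return remote(text, src, tgt), "google"
```

Returning the backend label alongside the text makes it easy to record which path served each translation, for instance in the dashboard's translation statistics.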

📊 Monitoring

Health Checks

All services include comprehensive health checks:

# Check service status
docker-compose ps

# Individual service health
curl http://localhost:8001/health  # Whisper Service
curl http://localhost:8002/health  # Translation Service
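To check both HTTP health endpoints in one go (from a cron job or CI step, say), a small standard-library Python script is enough. It assumes only what the curl commands show: the endpoints answer on ports 8001 and 8002 at `/health`. Any HTTP 2xx status counts as healthy, and nothing is assumed about the response body.

```python
import http.client
import urllib.request

# Endpoints from the curl commands above.
ENDPOINTS = {
    "whisper-service": "http://localhost:8001/health",
    "translator": "http://localhost:8002/health",
}


def check(url: str, timeout: float = 3.0) -> bool:
    """True if the endpoint answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # OSError covers refused connections, timeouts and urllib's URLError/HTTPError
        # (raised for 4xx/5xx); HTTPException covers malformed responses.
        return False


def main() -> int:
    """Print one line per service; return a shell-style exit code (0 = all healthy)."""
    results = {name: check(url) for name, url in ENDPOINTS.items()}
    for name, ok in results.items():
        print(f"{name:16} {'OK' if ok else 'DOWN'}")
    return 0 if all(results.values()) else 1
```

Calling `sys.exit(main())` from a script entry point lets the exit status gate a deploy step.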

Web Dashboard

Access monitoring dashboard at: http://localhost:3000

  • Real-time transcription activity
  • Translation statistics
  • Service performance metrics
  • User activity summaries

Admin Interfaces

PostgreSQL Admin (pgAdmin): http://localhost:8080
Redis Commander: http://localhost:8081

Both run under the admin profile (docker-compose --profile admin up -d).

🛠️ Development

Local Development

# Start infrastructure only
docker-compose up -d postgres redis

# Run individual services locally (each in its own terminal, from the repository root)
cd services/recorder && npm run dev
cd services/whisper-service && python src/api.py

Service Dependencies

Recorder → PostgreSQL, Redis
Audio Processor → Redis, PostgreSQL  
Whisper Service → Redis, PostgreSQL, GPU
Translator → Redis, PostgreSQL
Transcriber → Redis, PostgreSQL, Discord
Dashboard → PostgreSQL, Redis

Adding New Languages

  1. Add language code to NLLB_LANG_MAP in translator service
  2. Update LANGUAGE_INFO with display name and flag
  3. Modify PRIMARY_LANGUAGES if needed for default display
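The three settings might look like the following. This is a hypothetical Python shape for `NLLB_LANG_MAP`, `LANGUAGE_INFO`, and `PRIMARY_LANGUAGES`: only the names come from the steps above, and the real translator service may structure them differently. The values in `NLLB_LANG_MAP` are real FLORES-200 codes, the language tags NLLB models use. French is added as an example, followed by a consistency check that catches a language added to one table but not the other.

```python
# Hypothetical shapes; check the translator service for the real definitions.
NLLB_LANG_MAP = {            # detected (ISO 639-1) code -> NLLB / FLORES-200 code
    "en": "eng_Latn",
    "de": "deu_Latn",
    "ko": "kor_Hang",
    "fr": "fra_Latn",        # step 1: the new language
}

LANGUAGE_INFO = {            # code -> (display name, flag)
    "en": ("English", "🇬🇧"),
    "de": ("German", "🇩🇪"),
    "ko": ("Korean", "🇰🇷"),
    "fr": ("French", "🇫🇷"),   # step 2: display name and flag
}

# Step 3: add "fr" here only if French should always be shown.
PRIMARY_LANGUAGES = ["en", "de", "ko"]


def check_language_tables() -> None:
    """Raise ValueError if the tables have drifted out of sync."""
    missing_info = set(NLLB_LANG_MAP) - set(LANGUAGE_INFO)
    missing_nllb = set(LANGUAGE_INFO) - set(NLLB_LANG_MAP)
    unknown_primary = set(PRIMARY_LANGUAGES) - set(NLLB_LANG_MAP)
    if missing_info or missing_nllb or unknown_primary:
        raise ValueError(f"inconsistent language tables: info={missing_info}, "
                         f"nllb={missing_nllb}, primary={unknown_primary}")


check_language_tables()
```

Running the check at import time means a half-finished language addition fails on service start rather than at the first message in that language.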

📦 Database Schema

Key Tables

  • connections: Voice channel sessions
  • recordings: Individual user audio recordings
  • transcriptions: Speech-to-text results
  • translations: Multi-language translations
  • processing_metrics: Performance tracking
  • user_activity: Aggregated user statistics

Data Flow

Connection → Recordings → Transcriptions → Translations
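The one-to-many chain can be illustrated with an in-memory SQLite database from Python's standard library. This is only a sketch of the relationships: the project itself uses PostgreSQL, and every column name below is a hypothetical stand-in, not the real schema. Each table points at its parent through a foreign key, so one join walks from a connection down to its translations.

```python
import sqlite3

# Hypothetical columns; only the table names and their order come from the schema list.
SCHEMA = """
CREATE TABLE connections    (id INTEGER PRIMARY KEY, guild_id TEXT, channel_id TEXT);
CREATE TABLE recordings     (id INTEGER PRIMARY KEY,
                             connection_id INTEGER NOT NULL REFERENCES connections(id),
                             user_id TEXT, duration_s REAL);
CREATE TABLE transcriptions (id INTEGER PRIMARY KEY,
                             recording_id INTEGER NOT NULL REFERENCES recordings(id),
                             language TEXT, text TEXT);
CREATE TABLE translations   (id INTEGER PRIMARY KEY,
                             transcription_id INTEGER NOT NULL REFERENCES transcriptions(id),
                             target_language TEXT, text TEXT);
"""

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
db.executescript(SCHEMA)

db.execute("INSERT INTO connections VALUES (1, 'guild', 'voice')")
db.execute("INSERT INTO recordings VALUES (1, 1, 'mcgyvver', 17.0)")
db.execute("INSERT INTO transcriptions VALUES (1, 1, 'en', 'Let''s go.')")
db.executemany("INSERT INTO translations VALUES (?, 1, ?, ?)",
               [(1, "de", "Los geht's."), (2, "ko", "가자.")])

# Walk Connection -> Recordings -> Transcriptions -> Translations in one query.
rows = db.execute("""
    SELECT r.user_id, t.language, tr.target_language, tr.text
    FROM connections c
    JOIN recordings r     ON r.connection_id = c.id
    JOIN transcriptions t ON t.recording_id = r.id
    JOIN translations tr  ON tr.transcription_id = t.id
    WHERE c.id = 1
    ORDER BY tr.id
""").fetchall()
print(rows)
```

With the foreign keys in place, a translation cannot reference a transcription that does not exist, which keeps per-connection queries like this one consistent.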

🔐 Security

  • No audio data stored permanently (auto-cleanup)
  • Local AI models for privacy
  • Credentials supplied through environment variables (.env) rather than hard-coded
  • Non-root container users
  • Network isolation between services

🚨 Troubleshooting

Common Issues

GPU Not Detected:

# Check NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Discord Bot Not Responding:

  • Verify bot token and permissions
  • Check guild ID is correct
  • Ensure the bot has joined a voice channel (run /join while you are in one)

Translation Failures:

  • Check Google API key if using premium
  • Verify network connectivity
  • The local NLLB model may still be downloading (first start)

Database Connection Issues:

  • Ensure PostgreSQL is running
  • Check connection string format
  • Verify network connectivity between containers

Service Logs

# View specific service logs
docker-compose logs recorder
docker-compose logs whisper-service
docker-compose logs translator

# Follow logs in real-time
docker-compose logs -f transcriber

📈 Performance

Expected Performance (RTX 4000)

  • Transcription: 2-3x real-time (faster than speech)
  • Translation: Sub-second for typical sentences
  • End-to-End Latency: 3-8 seconds from speech to Discord message
  • Concurrent Users: 5-10 simultaneous speakers
  • Languages: 200+ translation languages via NLLB (transcription is limited to the languages Whisper supports)

Resource Usage

  • GPU Memory: 3-4GB for Whisper + NLLB
  • System RAM: 8GB recommended
  • Storage: 10GB for models + database
  • Network: Minimal (local processing)

🤝 Contributing

  1. Fork the repository
  2. Create feature branch
  3. Add tests for new functionality
  4. Update documentation
  5. Submit pull request

📄 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

  • OpenAI Whisper: Speech recognition models
  • Meta AI NLLB (No Language Left Behind): Neural machine translation
  • Discord.js: Discord API library
  • faster-whisper: Optimized Whisper inference
  • Docker Community: Containerization platform

Need help? Check the troubleshooting section or open an issue!
