Discord Voice Translator V2 🎤🌍

A microservices-based Discord bot that captures voice channel audio, transcribes it with GPU-accelerated Whisper (faster-whisper), and translates the transcripts into multiple languages in near real time.

🚀 Features

  • Real-time Voice Capture: Records individual user audio streams from Discord voice channels
  • GPU-Accelerated Transcription: Runs faster-whisper with CUDA on an NVIDIA RTX 4000 for low-latency speech-to-text
  • Multi-language Translation: Supports translation to English, German, Korean, and more
  • Microservices Architecture: Scalable, maintainable design with independent services
  • Web Dashboard: Real-time monitoring and historical analytics
  • High Performance: Optimized for low-latency processing

🏗️ Architecture

Discord Voice → [Recorder] → Raw PCM → [Audio Processor] → Optimized WAV
    ↓
[Whisper Service] → Transcription → [Translator] → Multi-language
    ↓
[Transcriber] → Discord Messages + Database → [Dashboard]

Services Overview

Service          | Purpose                       | Technology
Recorder         | Discord bot + voice capture   | Node.js, Discord.js
Audio Processor  | PCM to WAV conversion         | Python, FFmpeg
Whisper Service  | GPU-accelerated transcription | Python, faster-whisper, CUDA
Translator       | Multi-language translation    | Python, Google/DeepL APIs
Transcriber      | Workflow orchestration        | Node.js, Message queues
Dashboard        | Web monitoring interface      | Node.js, Express, WebSockets

🛠️ Prerequisites

Hardware Requirements

  • GPU: NVIDIA RTX 4000 (or compatible CUDA GPU)
  • RAM: 8GB+ recommended
  • Storage: 50GB+ for models and audio cache

Software Requirements

  • Docker with Docker Compose
  • NVIDIA Container Toolkit for GPU support
  • Unraid (recommended) or another Linux host

Quick Start

1. Clone the Repository

git clone <your-gitea-url>/discord-voice-translator-v2.git
cd discord-voice-translator-v2

2. Configure Environment

cp .env.example .env
# Edit .env with your API keys and settings

Required environment variables:

# Discord Bot
DISCORD_TOKEN=your_discord_bot_token
CLIENT_ID=your_bot_client_id
GUILD_ID=your_guild_id

# Database
POSTGRES_PASSWORD=your_secure_password

# Translation APIs
GOOGLE_TRANSLATE_API_KEY=your_google_api_key
DEEPL_API_KEY=your_deepl_api_key
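
A pre-flight check can catch a missing value before the containers start. The following is a minimal sketch, not part of the repository: it assumes all six variables listed above are mandatory, and the helpers `parse_env` and `missing_vars` are hypothetical names. The parser is deliberately simplified (no quoting or `export` handling).

```python
# Variable names come from the list above; treating all six as
# mandatory is an assumption.
REQUIRED_VARS = (
    "DISCORD_TOKEN",
    "CLIENT_ID",
    "GUILD_ID",
    "POSTGRES_PASSWORD",
    "GOOGLE_TRANSLATE_API_KEY",
    "DEEPL_API_KEY",
)


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_vars(env):
    """Return the required variable names that are absent or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

For example, `missing_vars(parse_env(open(".env").read()))` returns an empty list when every variable has a value. It does not detect placeholder values such as `your_discord_bot_token` left over from `.env.example`.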

3. Start Services

# Start core services
docker-compose up -d

# Start with admin interfaces
docker-compose --profile admin up -d

4. Verify GPU Support

# Check Whisper service GPU status
curl http://localhost:8001/health

📊 Service Endpoints

Service          | Port | Health Check | Purpose
Whisper Service  | 8001 | /health      | GPU transcription
Translator       | 8002 | /health      | Multi-language translation
Dashboard        | 3000 | /            | Web monitoring
PostgreSQL       | 5432 | -            | Database
Redis            | 6379 | -            | Message queue
pgAdmin          | 8080 | /            | Database admin
Redis Commander  | 8081 | /            | Redis admin
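
The HTTP endpoints above can be polled in one pass. This is a minimal sketch under stated assumptions: the services are reachable on `localhost`, an HTTP 200 response counts as healthy (the response body format is not documented here, so it is ignored), and the dictionary keys and function names are hypothetical.

```python
import urllib.request

# Ports and paths are taken from the table above; the host name and
# the dictionary keys are assumptions.
HEALTH_ENDPOINTS = {
    "whisper-service": "http://localhost:8001/health",
    "translator": "http://localhost:8002/health",
    "dashboard": "http://localhost:3000/",
}


def is_healthy(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts all derive from OSError.
        return False


def check_all(endpoints=HEALTH_ENDPOINTS, opener=urllib.request.urlopen):
    """Map each service name to True (healthy) or False."""
    return {
        name: is_healthy(url, opener=opener)
        for name, url in endpoints.items()
    }
```

Calling `check_all()` after `docker-compose up -d` gives a quick overview of which services are responding. The `opener` parameter exists only so the check can be exercised without a running stack.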

🎮 Discord Commands

  • /join - Join your voice channel and start recording
  • /leave - Leave voice channel and stop recording
  • /status - Show current recording status

📈 Performance Optimizations

GPU Configuration

  • Model: faster-whisper large-v2 with CUDA
  • Precision: float16 for optimal RTX 4000 performance
  • Memory: ~2GB VRAM usage for transcription
  • Speed: Transcription runs 2-3x faster than real time
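
The settings above map onto the faster-whisper API roughly as follows. This is a sketch rather than the Whisper service's actual code: the function names are hypothetical, and `load_model` requires the `faster-whisper` package and a CUDA-capable GPU.

```python
MODEL_SIZE = "large-v2"
MODEL_OPTIONS = {"device": "cuda", "compute_type": "float16"}


def load_model():
    """Load the model on the GPU (needs faster-whisper and CUDA)."""
    # Imported lazily because the dependency is heavy.
    from faster_whisper import WhisperModel

    return WhisperModel(MODEL_SIZE, **MODEL_OPTIONS)


def transcribe(model, wav_path):
    """Return (detected_language, full_text) for one audio file."""
    # transcribe() returns a lazy generator of segments plus an info
    # object; decoding happens while the generator is consumed.
    segments, info = model.transcribe(wav_path)
    text = " ".join(segment.text.strip() for segment in segments)
    return info.language, text
```

Loading the model once at service start-up and reusing it for every request avoids paying the model load time on each recording.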

Audio Processing

  • Input: Discord 48kHz stereo PCM
  • Output: 16kHz mono WAV (optimized for Whisper)
  • Conversion: FFmpeg with fallback to Python/scipy
  • Cleanup: Automatic removal of processed PCM files
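
The FFmpeg conversion step can be sketched as below. Raw PCM carries no header, so the input format has to be declared explicitly. The sample format `s16le` (signed 16-bit little-endian) is an assumption, as are the function names; the scipy fallback and the cleanup step are not shown.

```python
import subprocess


def build_ffmpeg_command(pcm_path, wav_path):
    """Build the FFmpeg call for 48 kHz stereo PCM -> 16 kHz mono WAV."""
    return [
        "ffmpeg", "-y",
        # Input description: raw samples, 48 kHz, 2 channels.
        # s16le is an assumption about the recorder's output.
        "-f", "s16le", "-ar", "48000", "-ac", "2",
        "-i", str(pcm_path),
        # Output: resample to 16 kHz and downmix to one channel.
        "-ar", "16000", "-ac", "1",
        str(wav_path),
    ]


def convert(pcm_path, wav_path):
    """Run FFmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(
        build_ffmpeg_command(pcm_path, wav_path),
        check=True,
        capture_output=True,
    )
```

Catching `CalledProcessError` (or `FileNotFoundError` when FFmpeg is not installed) around `convert` is a natural place to trigger a fallback converter.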

🗄️ Database Schema

  • connections: Voice channel session tracking
  • recordings: Individual user audio recordings
  • transcriptions: Speech-to-text results with confidence scores
  • translations: Multi-language translation results
  • processing_metrics: Performance monitoring
  • user_activity: Aggregated usage statistics

🔧 Development

Local Development

# Recorder (run from the repository root)
cd services/recorder
npm install
npm run dev

# Audio processor (run from the repository root, in a separate shell)
cd services/audio-processor
pip install -r requirements.txt
python src/processor.py

Testing GPU Support

# Verify NVIDIA Docker runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Test Whisper service
docker-compose exec whisper-service python -c "import torch; print(torch.cuda.is_available())"

📝 Logging

  • Structured Logging: JSON format for all services
  • Log Levels: Configurable via LOG_LEVEL environment variable
  • Centralized: All logs stored in /data/logs/
  • Monitoring: Real-time log viewing via dashboard
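
For the Python services, JSON logging driven by `LOG_LEVEL` could look like the sketch below. The field names (`time`, `level`, `service`, `message`) and the helper `get_logger` are assumptions, not the project's actual log schema; writing to /data/logs/ is left to the container setup.

```python
import json
import logging
import os


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def get_logger(service_name):
    """Create a logger whose level comes from LOG_LEVEL (default INFO)."""
    logger = logging.getLogger(service_name)
    level = getattr(logging, os.environ.get("LOG_LEVEL", "INFO").upper(), None)
    if not isinstance(level, int):
        level = logging.INFO  # fall back on unknown level names
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger
```

A service would call `get_logger("audio-processor")` once at start-up and log through the returned object, so every line is a single JSON document.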

🔐 Security

  • Environment Variables: Secure credential management
  • Database: PostgreSQL with connection encryption
  • Network: Internal Docker network isolation
  • API Keys: Separate service credentials

🚀 Deployment

Unraid Setup

  1. Install Community Applications plugin
  2. Add Docker Compose plugin
  3. Configure GPU passthrough
  4. Import docker-compose.yml

Production Considerations

  • SSL: Configure Nginx reverse proxy with SSL certificates
  • Backup: Regular database and configuration backups
  • Monitoring: Set up health check alerts
  • Scaling: Horizontal scaling for high-traffic Discord servers

🛠️ Troubleshooting

Common Issues

GPU Not Detected

# Check NVIDIA drivers
nvidia-smi

# Verify Docker GPU support
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Service Won't Start

# Check logs
docker-compose logs [service-name]

# Verify environment variables
docker-compose config

Audio Processing Slow

  • Verify GPU acceleration is working
  • Check available VRAM
  • Monitor CPU usage during conversion

📚 API Documentation

Once running, visit:

🤝 Contributing

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/amazing-feature
  3. Commit changes: git commit -m 'Add amazing feature'
  4. Push to branch: git push origin feature/amazing-feature
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🎯 Roadmap

  • Complete Translator service implementation
  • Build Transcriber orchestration service
  • Create web dashboard with real-time updates
  • Add voice cloning capabilities
  • Implement speaker diarization
  • Support for additional languages
  • Mobile app for remote monitoring
  • Advanced analytics and insights

💡 Credits

Built with:


🚀 Ready to translate your Discord conversations in real-time with the power of your RTX 4000!
