Discord Voice Translator V2
A microservices-based Discord bot that captures voice-channel audio, transcribes it, translates the transcript, and posts formatted messages to text channels. Built with GPU acceleration and multi-language support.
🎯 Features
- Real-time Voice Capture: Records Discord voice channel audio
- GPU-Accelerated Transcription: Uses NVIDIA RTX 4000 with faster-whisper
- Multi-Language Translation: Local NLLB models + Google Translate fallback
- Smart Language Logic: Always shows every message in English, German, and Korean (the original transcript plus translations into the remaining languages)
- Professional Discord Messages: Formatted with speaker info, duration, and timestamps
- Microservices Architecture: Scalable, maintainable, containerized services
- Comprehensive Monitoring: Health checks, logging, and web dashboard
🏗️ Architecture
Services Overview
Discord Voice → [Recorder] → Raw PCM → [Audio Processor] → WAV
↓
[Whisper Service] → Transcription → [Translator] → Multi-language
↓
[Transcriber] → Discord Message → [Dashboard] → Monitoring
Core Services
- Recorder: Discord bot that captures voice channel audio
- Audio Processor: Converts PCM to optimized WAV files
- Whisper Service: GPU-accelerated speech-to-text transcription
- Translator: Local NLLB + Google Translate multi-language translation
- Transcriber: Workflow orchestrator and Discord message formatter
- Dashboard: Web interface for monitoring and management
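As an illustration of the Audio Processor step, here is a minimal sketch of wrapping raw PCM in a WAV container with Python's standard wave module. The function name pcm_to_wav_bytes is hypothetical, and the defaults (48 kHz, stereo, 16-bit little-endian samples) are assumptions about the recorder's output, not settings confirmed by this project.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 48000,
                     channels: int = 2, sample_width: int = 2) -> bytes:
    """Wrap raw little-endian PCM samples in a WAV container.

    The defaults are assumptions (48 kHz, stereo, 16-bit); adjust them to
    whatever format the recorder actually writes.
    """
    frame_size = channels * sample_width
    if len(pcm) % frame_size != 0:
        raise ValueError("PCM length is not a whole number of frames")

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)  # header sizes are filled in when the writer closes
    return buffer.getvalue()
```

The result can be written to disk or passed on in memory. Any resampling or downmixing the real service performs to "optimize" the file is outside this sketch.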
Infrastructure
- PostgreSQL: Primary database for all structured data
- Redis: Message queue and caching layer
- Nginx: Reverse proxy for web services
- Docker Compose: Container orchestration with GPU support
🚀 Quick Start
Prerequisites
- Docker and Docker Compose
- NVIDIA GPU with Docker runtime support
- Discord Bot Token
- Google Translate API Key (optional, for premium translations)
1. Clone Repository
git clone <your-repo-url>
cd discord-voice-translator-v2
2. Configure Environment
cp .env.example .env
# Edit .env with your configuration
3. Required Environment Variables
# Discord Bot Configuration
DISCORD_TOKEN=your_discord_bot_token_here
CLIENT_ID=your_bot_client_id_here
GUILD_ID=your_guild_id_here
# Database Configuration
POSTGRES_PASSWORD=your_secure_postgres_password
POSTGRES_URL=postgresql://postgres:your_secure_postgres_password@postgres:5432/voice_translator
# Translation APIs (optional)
GOOGLE_TRANSLATE_API_KEY=your_google_translate_api_key
# GPU Configuration
WHISPER_MODEL=large-v2
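Before deploying, it can help to sanity-check the .env values. The sketch below is a hypothetical helper, not part of the repository. It assumes the Discord and database variables listed above are the required ones, and that POSTGRES_URL embeds the same password as POSTGRES_PASSWORD, as in the example.

```python
import os
from urllib.parse import urlparse

# Assumption: these are the variables the services cannot start without.
REQUIRED_VARS = ("DISCORD_TOKEN", "CLIENT_ID", "GUILD_ID",
                 "POSTGRES_PASSWORD", "POSTGRES_URL")


def check_env(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]

    url = env.get("POSTGRES_URL")
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ("postgresql", "postgres"):
            problems.append("POSTGRES_URL must start with postgresql://")
        elif env.get("POSTGRES_PASSWORD") and parsed.password != env["POSTGRES_PASSWORD"]:
            # urlparse does not percent-decode, so a URL-encoded password
            # would be reported here even if it is otherwise correct.
            problems.append("POSTGRES_URL password does not match POSTGRES_PASSWORD")
    return problems


if __name__ == "__main__":
    for problem in check_env():
        print("config problem:", problem)
```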
4. Deploy with Docker Compose
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f
# Start with admin interfaces
docker-compose --profile admin up -d
5. Discord Bot Setup
- Invite the bot to your server with these permissions:
  - Connect to voice channels
  - Speak in voice channels
  - Send messages
  - View channels
- Use slash commands:
  - /join: Start recording in your voice channel
  - /leave: Stop recording and leave the channel
  - /status: Check current recording status
💬 Discord Message Format
English Source Example
🎤 New Transcription
Speaker: mcgyvver
🌍 Language Detected: English 🇬🇧
⏱️ Duration: 17.0 seconds
🕐 Time: 12 minutes ago
📝 Transcript
Yeah, but this is a complete prank, we'll...
Let's go to the foot.
🇩🇪 German
Ja, aber das ist ein kompletter Streich, wir werden...
Lass uns zum Fuß gehen.
🇰🇷 Korean
예, 하지만 이것은 완전한 장난입니다...
발로 가자.
Smart Language Logic
- If detected = English: Shows transcript + German + Korean
- If detected = German: Shows transcript + English + Korean
- If detected = Korean: Shows transcript + English + German
- If detected = Other: Shows transcript + English + German + Korean
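The rules above reduce to "translate into every primary language except the detected one". A minimal sketch, assuming ISO 639-1 codes and a PRIMARY_LANGUAGES list shaped like this (the real constant in the translator service may look different):

```python
# Assumption: primary languages are stored as ISO 639-1 codes in display order.
PRIMARY_LANGUAGES = ["en", "de", "ko"]


def translation_targets(detected: str) -> list[str]:
    """Return the languages to translate into for a detected source language."""
    return [lang for lang in PRIMARY_LANGUAGES if lang != detected]


print(translation_targets("en"))  # ['de', 'ko']
print(translation_targets("fr"))  # ['en', 'de', 'ko']
```

Because the detected language is simply filtered out, a source outside the primary set naturally yields all three translations.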
🔧 Service Configuration
GPU Requirements
- NVIDIA GPU with 4GB+ VRAM (RTX 4000 recommended)
- CUDA 11.8+ support
- nvidia-docker runtime configured
Model Storage
- Whisper models cached in ./data/models/
- NLLB translation models auto-downloaded
- Models persist between container restarts
Performance Tuning
Whisper Models:
- large-v2: Best accuracy, ~2GB VRAM, slower
- medium: Good balance, ~1GB VRAM, faster
- small: Fastest, ~500MB VRAM, lower accuracy
Translation Strategy:
- Local NLLB: Fast, private, good quality
- Google Translate: Premium quality, API costs
- Automatic fallback for reliability
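The fallback strategy can be sketched as a small wrapper that tries the local model first and uses the second backend only when the first fails or returns nothing. The names translate_with_fallback, primary, and fallback are hypothetical; the two callables stand in for whatever NLLB and Google Translate clients the translator service actually uses.

```python
from typing import Callable, Optional

# A translator takes (text, source_lang, target_lang) and returns translated text.
Translate = Callable[[str, str, str], str]


def translate_with_fallback(text: str, source: str, target: str,
                            primary: Translate,
                            fallback: Optional[Translate] = None) -> tuple[str, str]:
    """Return (translation, backend_used), where backend_used is
    "primary" or "fallback"."""
    try:
        result = primary(text, source, target)
        if result and result.strip():
            return result, "primary"
    except Exception:
        if fallback is None:
            raise  # nothing else to try, surface the original error
    if fallback is None:
        raise RuntimeError("primary translator returned no text and no fallback is configured")
    return fallback(text, source, target), "fallback"
```

Returning the backend name alongside the text makes it easy to record which path served each request, for example in the translation statistics.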
📊 Monitoring
Health Checks
All services include comprehensive health checks:
# Check service status
docker-compose ps
# Individual service health
curl http://localhost:8001/health # Whisper Service
curl http://localhost:8002/health # Translation Service
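The same checks can be scripted. This is a minimal sketch using only the standard library; it assumes that an HTTP 200 response means "healthy" and does not inspect the response body, whose format is not documented here. The function names are hypothetical.

```python
import urllib.error
import urllib.request

# Endpoints taken from the curl commands above.
HEALTH_ENDPOINTS = {
    "whisper-service": "http://localhost:8001/health",
    "translator": "http://localhost:8002/health",
}


def http_status(url, timeout=3.0):
    """Return the HTTP status code, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the service answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, ...


def check_services(endpoints=HEALTH_ENDPOINTS, probe=http_status):
    """Map each service name to True (HTTP 200) or False (anything else)."""
    return {name: probe(url) == 200 for name, url in endpoints.items()}


if __name__ == "__main__":
    for name, healthy in check_services().items():
        print(f"{name}: {'ok' if healthy else 'DOWN'}")
```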
Web Dashboard
Access monitoring dashboard at: http://localhost:3000
- Real-time transcription activity
- Translation statistics
- Service performance metrics
- User activity summaries
Admin Interfaces
- PostgreSQL Admin (pgAdmin): http://localhost:8080
- Redis Commander: http://localhost:8081
🛠️ Development
Local Development
# Start infrastructure only
docker-compose up postgres redis -d
# Run individual services locally
cd services/recorder && npm run dev
cd services/whisper-service && python src/api.py
Service Dependencies
Recorder → PostgreSQL, Redis
Audio Processor → Redis, PostgreSQL
Whisper Service → Redis, PostgreSQL, GPU
Translator → Redis, PostgreSQL
Transcriber → Redis, PostgreSQL, Discord
Dashboard → PostgreSQL, Redis
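When a piece of infrastructure goes down, the dependency list tells you which services to expect failures from. A small sketch that encodes the list above as data; the identifiers audio-processor and dashboard are assumed to match the Compose service names, which this README does not confirm.

```python
# Dependency list from above, keyed by (assumed) Compose service name.
SERVICE_DEPENDENCIES = {
    "recorder": {"postgres", "redis"},
    "audio-processor": {"redis", "postgres"},
    "whisper-service": {"redis", "postgres", "gpu"},
    "translator": {"redis", "postgres"},
    "transcriber": {"redis", "postgres", "discord"},
    "dashboard": {"postgres", "redis"},
}


def affected_services(unavailable: str) -> list[str]:
    """Return the services that depend on the given resource, sorted by name."""
    return sorted(name for name, deps in SERVICE_DEPENDENCIES.items()
                  if unavailable in deps)


print(affected_services("gpu"))      # ['whisper-service']
print(affected_services("discord"))  # ['transcriber']
```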
Adding New Languages
- Add the language code to NLLB_LANG_MAP in the translator service
- Update LANGUAGE_INFO with the display name and flag
- Modify PRIMARY_LANGUAGES if needed for default display
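As a rough sketch of those steps, the three constants might look like the following. Their actual shape in the translator service is an assumption here, and register_language is a hypothetical helper. The NLLB codes themselves (eng_Latn, deu_Latn, kor_Hang, fra_Latn) are the standard FLORES-200 identifiers.

```python
# Assumed shapes for the constants named above; check the translator
# service for the real definitions before copying this.
NLLB_LANG_MAP = {"en": "eng_Latn", "de": "deu_Latn", "ko": "kor_Hang"}
LANGUAGE_INFO = {
    "en": {"name": "English", "flag": "🇬🇧"},
    "de": {"name": "German", "flag": "🇩🇪"},
    "ko": {"name": "Korean", "flag": "🇰🇷"},
}
PRIMARY_LANGUAGES = ["en", "de", "ko"]


def register_language(code, nllb_code, name, flag, primary=False):
    """Hypothetical helper: add a language to all relevant tables at once."""
    NLLB_LANG_MAP[code] = nllb_code
    LANGUAGE_INFO[code] = {"name": name, "flag": flag}
    if primary and code not in PRIMARY_LANGUAGES:
        PRIMARY_LANGUAGES.append(code)


# Example: add French as a supported, non-default language.
register_language("fr", "fra_Latn", "French", "🇫🇷")
```

Keeping the map and the display info in sync matters: a code present in one table but missing from the other would either fail to translate or fail to render in the Discord message.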
📦 Database Schema
Key Tables
- connections: Voice channel sessions
- recordings: Individual user audio recordings
- transcriptions: Speech-to-text results
- translations: Multi-language translations
- processing_metrics: Performance tracking
- user_activity: Aggregated user statistics
Data Flow
Connection → Recordings → Transcriptions → Translations
🔐 Security
- No audio data stored permanently (auto-cleanup)
- Local AI models for privacy
- Secure credential management
- Non-root container users
- Network isolation between services
🚨 Troubleshooting
Common Issues
GPU Not Detected:
# Check NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
Discord Bot Not Responding:
- Verify bot token and permissions
- Check guild ID is correct
- Ensure bot is in voice channel
Translation Failures:
- Check Google API key if using premium
- Verify network connectivity
- Local NLLB model may be downloading
Database Connection Issues:
- Ensure PostgreSQL is running
- Check connection string format
- Verify network connectivity between containers
Service Logs
# View specific service logs
docker-compose logs recorder
docker-compose logs whisper-service
docker-compose logs translator
# Follow logs in real-time
docker-compose logs -f transcriber
📈 Performance
Expected Performance (RTX 4000)
- Transcription: 2-3x faster than real time (a 10-second clip takes roughly 3-5 seconds)
- Translation: Sub-second for typical sentences
- End-to-End Latency: 3-8 seconds from speech to Discord message
- Concurrent Users: 5-10 simultaneous speakers
- Languages: 200+ supported via NLLB
Resource Usage
- GPU Memory: 3-4GB for Whisper + NLLB
- System RAM: 8GB recommended
- Storage: 10GB for models + database
- Network: Minimal (local processing)
🤝 Contributing
- Fork the repository
- Create feature branch
- Add tests for new functionality
- Update documentation
- Submit pull request
📄 License
MIT License - see LICENSE file for details
🙏 Acknowledgments
- OpenAI Whisper: Speech recognition models
- Facebook NLLB: Neural machine translation
- Discord.js: Discord API library
- faster-whisper: Optimized Whisper inference
- Docker Community: Containerization platform
Need help? Check the troubleshooting section or open an issue!