# Discord Voice Translator V2

A microservices-based Discord bot that captures voice channel audio, transcribes and translates it, and posts formatted messages to text channels. Built with GPU acceleration and multi-language support.

## 🎯 Features

- **Real-time Voice Capture**: Records Discord voice channel audio
- **GPU-Accelerated Transcription**: Uses NVIDIA RTX 4000 with faster-whisper
- **Multi-Language Translation**: Local NLLB models + Google Translate fallback
- **Smart Language Logic**: Always shows English, German, and Korean translations
- **Professional Discord Messages**: Formatted with speaker info, duration, and timestamps
- **Microservices Architecture**: Scalable, maintainable, containerized services
- **Comprehensive Monitoring**: Health checks, logging, and web dashboard

## 🏗️ Architecture

### Services Overview

```
Discord Voice → [Recorder] → Raw PCM → [Audio Processor] → WAV
        ↓
[Whisper Service] → Transcription → [Translator] → Multi-language
        ↓
[Transcriber] → Discord Message → [Dashboard] → Monitoring
```

### Core Services

1. **Recorder**: Discord bot that captures voice channel audio
2. **Audio Processor**: Converts PCM to optimized WAV files
3. **Whisper Service**: GPU-accelerated speech-to-text transcription
4. **Translator**: Local NLLB + Google Translate multi-language translation
5. **Transcriber**: Workflow orchestrator and Discord message formatter
6. **Dashboard**: Web interface for monitoring and management

### Infrastructure

- **PostgreSQL**: Primary database for all structured data
- **Redis**: Message queue and caching layer
- **Nginx**: Reverse proxy for web services
- **Docker Compose**: Container orchestration with GPU support

## 🚀 Quick Start

### Prerequisites

- Docker and Docker Compose
- NVIDIA GPU with Docker runtime support
- Discord Bot Token
- Google Translate API Key (optional, for premium translations)

### 1. Clone Repository

```bash
# Replace <repository-url> with the URL of this repository
git clone <repository-url>
cd discord-voice-translator-v2
```

### 2. Configure Environment

```bash
cp .env.example .env
# Edit .env with your configuration
```

### 3. Required Environment Variables

```bash
# Discord Bot Configuration
DISCORD_TOKEN=your_discord_bot_token_here
CLIENT_ID=your_bot_client_id_here
GUILD_ID=your_guild_id_here

# Database Configuration
POSTGRES_PASSWORD=your_secure_postgres_password
POSTGRES_URL=postgresql://postgres:your_secure_postgres_password@postgres:5432/voice_translator

# Translation APIs (optional)
GOOGLE_TRANSLATE_API_KEY=your_google_translate_api_key

# GPU Configuration
WHISPER_MODEL=large-v2
```

### 4. Deploy with Docker Compose

```bash
# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Start with admin interfaces
docker-compose --profile admin up -d
```

### 5. Discord Bot Setup

1. Invite the bot to your server with these permissions:
   - Connect to voice channels
   - Speak in voice channels
   - Send messages
   - View channels
2. Use slash commands:
   - `/join` - Start recording in your voice channel
   - `/leave` - Stop recording and leave channel
   - `/status` - Check current recording status

## 💬 Discord Message Format

### English Source Example

```
🎤 New Transcription
Speaker: mcgyvver
🌍 Language Detected: English 🇬🇧
⏱️ Duration: 17.0 seconds
🕐 Time: 12 minutes ago

📝 Transcript
Yeah, but this is a complete prank, we'll... Let's go to the foot.

🇩🇪 German
Ja, aber das ist ein kompletter Streich, wir werden... Lass uns zum Fuß gehen.

🇰🇷 Korean
예, 하지만 이것은 완전한 장난입니다... 발로 가자.
```

### Smart Language Logic

- **If detected = English**: Shows transcript + German + Korean
- **If detected = German**: Shows transcript + English + Korean
- **If detected = Korean**: Shows transcript + English + German
- **If detected = Other**: Shows transcript + English + German + Korean

## 🔧 Service Configuration

### GPU Requirements

- NVIDIA GPU with 4GB+ VRAM (RTX 4000 recommended)
- CUDA 11.8+ support
- nvidia-docker runtime configured

### Model Storage

- Whisper models cached in `./data/models/`
- NLLB translation models auto-downloaded
- Models persist between container restarts

### Performance Tuning

**Whisper Models**:

- `large-v2`: Best accuracy, ~2GB VRAM, slower
- `medium`: Good balance, ~1GB VRAM, faster
- `small`: Fastest, ~500MB VRAM, lower accuracy

**Translation Strategy**:

- Local NLLB: Fast, private, good quality
- Google Translate: Premium quality, API costs
- Automatic fallback for reliability

## 📊 Monitoring

### Health Checks

All services include health checks:

```bash
# Check service status
docker-compose ps

# Individual service health
curl http://localhost:8001/health  # Whisper Service
curl http://localhost:8002/health  # Translation Service
```

### Web Dashboard

Access the monitoring dashboard at `http://localhost:3000`:

- Real-time transcription activity
- Translation statistics
- Service performance metrics
- User activity summaries

### Admin Interfaces

- **PostgreSQL Admin**: `http://localhost:8080` (pgAdmin)
- **Redis Commander**: `http://localhost:8081`

## 🛠️ Development

### Local Development

```bash
# Start infrastructure only
docker-compose up postgres redis -d

# Run individual services locally
cd services/recorder && npm run dev
cd services/whisper-service && python src/api.py
```

### Service Dependencies

```
Recorder → PostgreSQL, Redis
Audio Processor → Redis, PostgreSQL
Whisper Service → Redis, PostgreSQL, GPU
Translator → Redis, PostgreSQL
Transcriber → Redis, PostgreSQL, Discord
Dashboard → PostgreSQL, Redis
```

### Adding New Languages

1. Add the language code to `NLLB_LANG_MAP` in the translator service
2. Update `LANGUAGE_INFO` with the display name and flag
3. Modify `PRIMARY_LANGUAGES` if the language should appear in the default display

## 📦 Database Schema

### Key Tables

- **connections**: Voice channel sessions
- **recordings**: Individual user audio recordings
- **transcriptions**: Speech-to-text results
- **translations**: Multi-language translations
- **processing_metrics**: Performance tracking
- **user_activity**: Aggregated user statistics

### Data Flow

```
Connection → Recordings → Transcriptions → Translations
```

## 🔐 Security

- No audio data stored permanently (auto-cleanup)
- Local AI models for privacy
- Secure credential management
- Non-root container users
- Network isolation between services

## 🚨 Troubleshooting

### Common Issues

**GPU Not Detected**:

```bash
# Check NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

**Discord Bot Not Responding**:

- Verify the bot token and permissions
- Check that the guild ID is correct
- Ensure the bot is in a voice channel

**Translation Failures**:

- Check the Google API key if using premium translations
- Verify network connectivity
- The local NLLB model may still be downloading

**Database Connection Issues**:

- Ensure PostgreSQL is running
- Check the connection string format
- Verify network connectivity between containers

### Service Logs

```bash
# View specific service logs
docker-compose logs recorder
docker-compose logs whisper-service
docker-compose logs translator

# Follow logs in real-time
docker-compose logs -f transcriber
```

## 📈 Performance

### Expected Performance (RTX 4000)

- **Transcription**: 2-3x real-time (faster than speech)
- **Translation**: Sub-second for typical sentences
- **End-to-End Latency**: 3-8 seconds from speech to Discord message
- **Concurrent Users**: 5-10 simultaneous speakers
- **Languages**: 200+ supported via NLLB

### Resource Usage

- **GPU Memory**: 3-4GB for Whisper + NLLB
- **System RAM**: 8GB recommended
- **Storage**: 10GB for models + database
- **Network**: Minimal (local processing)

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Update documentation
5. Submit a pull request

## 📄 License

MIT License - see LICENSE file for details

## 🙏 Acknowledgments

- **OpenAI Whisper**: Speech recognition models
- **Facebook NLLB**: Neural machine translation
- **Discord.js**: Discord API library
- **faster-whisper**: Optimized Whisper inference
- **Docker Community**: Containerization platform

---

**Need help?** Check the troubleshooting section or open an issue!
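
The Smart Language Logic rules described earlier can be sketched in a few lines of Python. This is a minimal illustration, not the translator service's actual code: the function name `target_languages` is hypothetical, and both the shape of `PRIMARY_LANGUAGES` (a tuple) and the use of two-letter codes such as `en`, `de`, and `ko` are assumptions.

```python
# Assumed shape: the README names PRIMARY_LANGUAGES but does not show its definition.
PRIMARY_LANGUAGES = ("en", "de", "ko")


def target_languages(detected: str) -> list[str]:
    """Return the primary languages to translate into.

    The detected source language is skipped because the original
    transcript already covers it; any other source language gets
    translations into all primary languages.
    """
    return [lang for lang in PRIMARY_LANGUAGES if lang != detected]


if __name__ == "__main__":
    for source in ("en", "de", "ko", "fr"):
        print(source, "->", target_languages(source))
```

Keeping the rule as a filter over one tuple means that adding a fourth default language only requires changing `PRIMARY_LANGUAGES`, which matches the "Adding New Languages" steps.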