# Discord Voice Translator V2 🎤🌍

A microservices-based Discord bot that captures voice channel audio, transcribes it using GPU-accelerated Whisper, and provides real-time multi-language translation.

## 🚀 Features

- **Real-time Voice Capture**: Records individual user audio streams from Discord voice channels
- **GPU-Accelerated Transcription**: Uses faster-whisper on an NVIDIA RTX 4000 for lightning-fast speech-to-text
- **Multi-language Translation**: Supports translation to English, German, Korean, and more
- **Microservices Architecture**: Scalable, maintainable design with independent services
- **Web Dashboard**: Real-time monitoring and historical analytics
- **High Performance**: Optimized for low-latency processing

## 🏗️ Architecture

```
Discord Voice → [Recorder] → Raw PCM → [Audio Processor] → Optimized WAV
                                ↓
[Whisper Service] → Transcription → [Translator] → Multi-language
                                ↓
[Transcriber] → Discord Messages + Database → [Dashboard]
```

### Services Overview

| Service | Purpose | Technology |
|---------|---------|------------|
| **Recorder** | Discord bot + voice capture | Node.js, Discord.js |
| **Audio Processor** | PCM to WAV conversion | Python, FFmpeg |
| **Whisper Service** | GPU-accelerated transcription | Python, faster-whisper, CUDA |
| **Translator** | Multi-language translation | Python, Google/DeepL APIs |
| **Transcriber** | Workflow orchestration | Node.js, Message queues |
| **Dashboard** | Web monitoring interface | Node.js, Express, WebSockets |

## 🛠️ Prerequisites

### Hardware Requirements

- **GPU**: NVIDIA RTX 4000 (or compatible CUDA GPU)
- **RAM**: 8GB+ recommended
- **Storage**: 50GB+ for models and audio cache

### Software Requirements

- **Docker** with Docker Compose
- **NVIDIA Container Toolkit** for GPU support
- **Unraid** (recommended) or Linux host

## ⚡ Quick Start

### 1. Clone the Repository

```bash
git clone /discord-voice-translator-v2.git
cd discord-voice-translator-v2
```

### 2. Configure Environment

```bash
cp .env.example .env
# Edit .env with your API keys and settings
```

Required environment variables:

```bash
# Discord Bot
DISCORD_TOKEN=your_discord_bot_token
CLIENT_ID=your_bot_client_id
GUILD_ID=your_guild_id

# Database
POSTGRES_PASSWORD=your_secure_password

# Translation APIs
GOOGLE_TRANSLATE_API_KEY=your_google_api_key
DEEPL_API_KEY=your_deepl_api_key
```

### 3. Start Services

```bash
# Start core services
docker-compose up -d

# Start with admin interfaces
docker-compose --profile admin up -d
```

### 4. Verify GPU Support

```bash
# Check Whisper service GPU status
curl http://localhost:8001/health
```

## 📊 Service Endpoints

| Service | Port | Health Check | Purpose |
|---------|------|--------------|---------|
| Whisper Service | 8001 | `/health` | GPU transcription |
| Translator | 8002 | `/health` | Multi-language translation |
| Dashboard | 3000 | `/` | Web monitoring |
| PostgreSQL | 5432 | - | Database |
| Redis | 6379 | - | Message queue |
| pgAdmin | 8080 | `/` | Database admin |
| Redis Commander | 8081 | `/` | Redis admin |

## 🎮 Discord Commands

- `/join` - Join your voice channel and start recording
- `/leave` - Leave voice channel and stop recording
- `/status` - Show current recording status

## 📈 Performance Optimizations

### GPU Configuration

- **Model**: faster-whisper large-v2 with CUDA
- **Precision**: float16 for optimal RTX 4000 performance
- **Memory**: ~2GB VRAM usage for transcription
- **Speed**: 2-3x real-time transcription

### Audio Processing

- **Input**: Discord 48kHz stereo PCM
- **Output**: 16kHz mono WAV (optimized for Whisper)
- **Conversion**: FFmpeg with fallback to Python/scipy
- **Cleanup**: Automatic removal of processed PCM files

## 🗄️ Database Schema

- **connections**: Voice channel session tracking
- **recordings**: Individual user audio recordings
- **transcriptions**: Speech-to-text results with confidence scores
- **translations**: Multi-language translation results
- **processing_metrics**: Performance monitoring
- **user_activity**: Aggregated usage statistics

## 🔧 Development

### Local Development

```bash
# Individual service development
cd services/recorder
npm install
npm run dev

# Audio processor (run from the repository root)
cd services/audio-processor
pip install -r requirements.txt
python src/processor.py
```

### Testing GPU Support

```bash
# Verify NVIDIA Docker runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Test Whisper service
docker-compose exec whisper-service python -c "import torch; print(torch.cuda.is_available())"
```

## 📝 Logging

- **Structured Logging**: JSON format for all services
- **Log Levels**: Configurable via `LOG_LEVEL` environment variable
- **Centralized**: All logs stored in `/data/logs/`
- **Monitoring**: Real-time log viewing via dashboard

## 🔐 Security

- **Environment Variables**: Secure credential management
- **Database**: PostgreSQL with connection encryption
- **Network**: Internal Docker network isolation
- **API Keys**: Separate service credentials

## 🚀 Deployment

### Unraid Setup

1. Install Community Applications plugin
2. Add Docker Compose plugin
3. Configure GPU passthrough
4. Import docker-compose.yml

### Production Considerations

- **SSL**: Configure Nginx reverse proxy with SSL certificates
- **Backup**: Regular database and configuration backups
- **Monitoring**: Set up health check alerts
- **Scaling**: Horizontal scaling for high-traffic Discord servers

## 🛠️ Troubleshooting

### Common Issues

**GPU Not Detected**

```bash
# Check NVIDIA drivers
nvidia-smi

# Verify Docker GPU support
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

**Service Won't Start**

```bash
# Check logs
docker-compose logs [service-name]

# Verify environment variables
docker-compose config
```

**Audio Processing Slow**

- Verify GPU acceleration is working
- Check available VRAM
- Monitor CPU usage during conversion

## 📚 API Documentation

Once running, visit:

- **Whisper API**: http://localhost:8001/docs
- **Translator API**: http://localhost:8002/docs
- **Dashboard**: http://localhost:3000

## 🤝 Contributing

1. Fork the repository
2. Create feature branch: `git checkout -b feature/amazing-feature`
3. Commit changes: `git commit -m 'Add amazing feature'`
4. Push to branch: `git push origin feature/amazing-feature`
5. Open a Pull Request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🎯 Roadmap

- [ ] Complete Translator service implementation
- [ ] Build Transcriber orchestration service
- [ ] Create web dashboard with real-time updates
- [ ] Add voice cloning capabilities
- [ ] Implement speaker diarization
- [ ] Support for additional languages
- [ ] Mobile app for remote monitoring
- [ ] Advanced analytics and insights

## 💡 Credits

Built with:

- [faster-whisper](https://github.com/guillaumekln/faster-whisper) for GPU-accelerated transcription
- [Discord.js](https://discord.js.org/) for Discord bot functionality
- [FastAPI](https://fastapi.tiangolo.com/) for high-performance APIs
- [PostgreSQL](https://www.postgresql.org/) for reliable data storage
- [Redis](https://redis.io/) for message queuing and caching

---

**🚀 Ready to translate your Discord conversations in real-time with the power of your RTX 4000!**
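
For readers curious about the Audio Processing step, here is a minimal Python sketch of the 48kHz stereo PCM to 16kHz mono WAV conversion. It is not the project's processor, which uses FFmpeg with a Python/scipy fallback. It uses only the standard library and assumes the raw PCM is signed 16-bit little-endian with interleaved channels; the names `pcm_to_mono_16k` and `write_wav` are hypothetical.

```python
import io
import struct
import wave

SRC_RATE = 48_000  # Discord input rate, per the Audio Processing section
DST_RATE = 16_000  # Whisper-friendly output rate, per the same section
FACTOR = SRC_RATE // DST_RATE  # 3 input frames become 1 output frame


def pcm_to_mono_16k(pcm: bytes) -> bytes:
    """Downmix interleaved s16le stereo to mono and decimate 48 kHz -> 16 kHz.

    Assumption: input is signed 16-bit little-endian, 2 channels.
    Averaging each group of 3 stereo frames is only a crude low-pass
    filter; a real resampler (FFmpeg, scipy) gives better quality.
    """
    usable = len(pcm) - len(pcm) % 4  # keep whole stereo frames (2 ch x 2 bytes)
    samples = struct.unpack(f"<{usable // 2}h", pcm[:usable])
    group = 2 * FACTOR  # 2 channels x 3 frames = 6 samples per output sample
    out = [
        sum(samples[i:i + group]) // group
        for i in range(0, len(samples) - group + 1, group)
    ]
    return struct.pack(f"<{len(out)}h", *out)


def write_wav(mono_pcm: bytes, target) -> None:
    """Write 16 kHz mono 16-bit PCM to a path or a seekable binary file object."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(DST_RATE)
        wav.writeframes(mono_pcm)
```

A caller would read a recorded `.pcm` file as bytes, pass it through `pcm_to_mono_16k`, and hand the result to `write_wav` along with an output path. Incomplete trailing frames are dropped rather than padded.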