Discord Voice Translator V2 🎤🌍
A comprehensive microservices-based Discord bot that captures voice channel audio, transcribes it using GPU-accelerated Whisper, and provides real-time multi-language translation.
🚀 Features
- Real-time Voice Capture: Records individual user audio streams from Discord voice channels
- GPU-Accelerated Transcription: Uses faster-whisper on an NVIDIA RTX 4000 for speech-to-text at 2-3x real-time speed
- Multi-language Translation: Supports translation to English, German, Korean, and more
- Microservices Architecture: Scalable, maintainable design with independent services
- Web Dashboard: Real-time monitoring and historical analytics
- High Performance: Optimized for low-latency processing
🏗️ Architecture
Discord Voice → [Recorder] → Raw PCM → [Audio Processor] → Optimized WAV
↓
[Whisper Service] → Transcription → [Translator] → Multi-language
↓
[Transcriber] → Discord Messages + Database → [Dashboard]
Services Overview
Service | Purpose | Technology |
---|---|---|
Recorder | Discord bot + voice capture | Node.js, Discord.js |
Audio Processor | PCM to WAV conversion | Python, FFmpeg |
Whisper Service | GPU-accelerated transcription | Python, faster-whisper, CUDA |
Translator | Multi-language translation | Python, Google/DeepL APIs |
Transcriber | Workflow orchestration | Node.js, Message queues |
Dashboard | Web monitoring interface | Node.js, Express, WebSockets |
🛠️ Prerequisites
Hardware Requirements
- GPU: NVIDIA RTX 4000 (or compatible CUDA GPU)
- RAM: 8GB+ recommended
- Storage: 50GB+ for models and audio cache
Software Requirements
- Docker with Docker Compose
- NVIDIA Container Toolkit for GPU support
- Unraid (recommended) or Linux host
⚡ Quick Start
1. Clone the Repository
git clone <your-gitea-url>/discord-voice-translator-v2.git
cd discord-voice-translator-v2
2. Configure Environment
cp .env.example .env
# Edit .env with your API keys and settings
Required environment variables:
# Discord Bot
DISCORD_TOKEN=your_discord_bot_token
CLIENT_ID=your_bot_client_id
GUILD_ID=your_guild_id
# Database
POSTGRES_PASSWORD=your_secure_password
# Translation APIs
GOOGLE_TRANSLATE_API_KEY=your_google_api_key
DEEPL_API_KEY=your_deepl_api_key
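A service can fail fast at startup if any of these settings is missing. The following is a minimal sketch of such a check; the variable names come from the list above, while the helper name `missing_env_vars` is hypothetical and not part of the project.

```python
import os

# Names taken from the required environment variables listed above.
REQUIRED_VARS = [
    "DISCORD_TOKEN",
    "CLIENT_ID",
    "GUILD_ID",
    "POSTGRES_PASSWORD",
    "GOOGLE_TRANSLATE_API_KEY",
    "DEEPL_API_KEY",
]


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration; pass a dict to check something
    other than the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` with no arguments inspects the running process; a service could log the returned names and exit if the list is not empty.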
3. Start Services
# Start core services
docker-compose up -d
# Start with admin interfaces
docker-compose --profile admin up -d
4. Verify GPU Support
# Check Whisper service GPU status
curl http://localhost:8001/health
📊 Service Endpoints
Service | Port | Health Check | Purpose |
---|---|---|---|
Whisper Service | 8001 | /health | GPU transcription |
Translator | 8002 | /health | Multi-language translation |
Dashboard | 3000 | / | Web monitoring |
PostgreSQL | 5432 | - | Database |
Redis | 6379 | - | Message queue |
pgAdmin | 8080 | / | Database admin |
Redis Commander | 8081 | / | Redis admin |
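The HTTP endpoints in the table can be polled from a script. This is a rough sketch using only the Python standard library; the ports and paths are those listed above, `localhost` assumes the default port mappings on the Docker host, and the dictionary keys and function names are illustrative labels rather than anything defined by the project.

```python
import urllib.error
import urllib.request

# Ports and paths from the Service Endpoints table; keys are illustrative labels.
HEALTH_URLS = {
    "whisper-service": "http://localhost:8001/health",
    "translator": "http://localhost:8002/health",
    "dashboard": "http://localhost:3000/",
}


def is_healthy(url, timeout=3.0):
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error statuses all land here.
        return False


def check_all(urls=HEALTH_URLS, probe=is_healthy):
    """Map each service label to the result of probing its URL."""
    return {name: probe(url) for name, url in urls.items()}
```

The `probe` parameter exists so the loop can be exercised without running containers; in normal use `check_all()` is called with no arguments.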
🎮 Discord Commands
- /join - Join your voice channel and start recording
- /leave - Leave voice channel and stop recording
- /status - Show current recording status
📈 Performance Optimizations
GPU Configuration
- Model: faster-whisper large-v2 with CUDA
- Precision: float16 for optimal RTX 4000 performance
- Memory: ~2GB VRAM usage for transcription
- Speed: 2-3x real-time transcription
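As a rough illustration of this configuration, the sketch below loads faster-whisper with the settings listed above and measures how the "x real-time" figure can be computed (seconds of audio divided by seconds of processing). The function names are hypothetical and this is not the Whisper Service's actual code; a long-running service would normally load the model once rather than per call.

```python
import time


def realtime_factor(audio_seconds, elapsed_seconds):
    """Seconds of audio processed per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return audio_seconds / elapsed_seconds


def transcribe(path, model_size="large-v2"):
    """Transcribe one file on the GPU and return (text, real-time factor)."""
    # Imported lazily so realtime_factor works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cuda", compute_type="float16")
    started = time.perf_counter()
    segments, info = model.transcribe(path)
    # segments is a generator; decoding happens while it is consumed.
    text = " ".join(segment.text.strip() for segment in segments)
    elapsed = time.perf_counter() - started
    return text, realtime_factor(info.duration, elapsed)
```

For example, a 30-second clip transcribed in 10 seconds gives `realtime_factor(30.0, 10.0) == 3.0`, the upper end of the range quoted above.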
Audio Processing
- Input: Discord 48kHz stereo PCM
- Output: 16kHz mono WAV (optimized for Whisper)
- Conversion: FFmpeg with fallback to Python/scipy
- Cleanup: Automatic removal of processed PCM files
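The FFmpeg conversion step could look roughly like the sketch below. It assumes the recorder writes headerless signed 16-bit little-endian samples (`s16le`), which the README does not state; the function names and file paths are placeholders.

```python
import subprocess


def build_ffmpeg_command(pcm_path, wav_path):
    """Build an FFmpeg call: raw 48 kHz stereo PCM in, 16 kHz mono WAV out."""
    return [
        "ffmpeg",
        "-y",                # overwrite the output file if it exists
        "-f", "s16le",       # assumption: signed 16-bit little-endian input
        "-ar", "48000",      # input sample rate (Discord audio)
        "-ac", "2",          # input channel count (stereo)
        "-i", pcm_path,
        "-ar", "16000",      # output sample rate expected by Whisper
        "-ac", "1",          # downmix to mono
        wav_path,
    ]


def convert(pcm_path, wav_path):
    """Run the conversion; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(
        build_ffmpeg_command(pcm_path, wav_path),
        check=True,
        capture_output=True,
    )
```

Because raw PCM has no header, the input format, rate, and channel options must appear before `-i`; the options after it describe the output file.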
🗄️ Database Schema
- connections: Voice channel session tracking
- recordings: Individual user audio recordings
- transcriptions: Speech-to-text results with confidence scores
- translations: Multi-language translation results
- processing_metrics: Performance monitoring
- user_activity: Aggregated usage statistics
🔧 Development
Local Development
# Individual service development
cd services/recorder
npm install
npm run dev
# Audio processor
cd services/audio-processor
pip install -r requirements.txt
python src/processor.py
Testing GPU Support
# Verify NVIDIA Docker runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
# Test Whisper service
docker-compose exec whisper-service python -c "import torch; print(torch.cuda.is_available())"
📝 Logging
- Structured Logging: JSON format for all services
- Log Levels: Configurable via the LOG_LEVEL environment variable
- Centralized: All logs stored in /data/logs/
- Monitoring: Real-time log viewing via dashboard
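For the Python services, JSON logging controlled by LOG_LEVEL might be set up along these lines. This is a sketch with the standard `logging` module only; the field names in the JSON payload and the `get_logger` helper are assumptions, not the project's actual log schema.

```python
import json
import logging
import os


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (illustrative fields)."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_logger(service_name):
    """Create a logger whose level follows the LOG_LEVEL environment variable."""
    logger = logging.getLogger(service_name)
    level = logging.getLevelName(os.environ.get("LOG_LEVEL", "INFO").upper())
    # getLevelName returns an int for known names and a string otherwise.
    logger.setLevel(level if isinstance(level, int) else logging.INFO)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger
```

Writing to a file under /data/logs/ would only require swapping `StreamHandler` for `logging.FileHandler` with a path of your choosing.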
🔐 Security
- Environment Variables: Secure credential management
- Database: PostgreSQL with connection encryption
- Network: Internal Docker network isolation
- API Keys: Separate service credentials
🚀 Deployment
Unraid Setup
- Install Community Applications plugin
- Add Docker Compose plugin
- Configure GPU passthrough
- Import docker-compose.yml
Production Considerations
- SSL: Configure Nginx reverse proxy with SSL certificates
- Backup: Regular database and configuration backups
- Monitoring: Set up health check alerts
- Scaling: Horizontal scaling for high-traffic Discord servers
🛠️ Troubleshooting
Common Issues
GPU Not Detected
# Check NVIDIA drivers
nvidia-smi
# Verify Docker GPU support
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
Service Won't Start
# Check logs
docker-compose logs [service-name]
# Verify environment variables
docker-compose config
Audio Processing Slow
- Verify GPU acceleration is working
- Check available VRAM
- Monitor CPU usage during conversion
📚 API Documentation
Once running, visit:
- Whisper API: http://localhost:8001/docs
- Translator API: http://localhost:8002/docs
- Dashboard: http://localhost:3000
🤝 Contributing
- Fork the repository
- Create feature branch: git checkout -b feature/amazing-feature
- Commit changes: git commit -m 'Add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🎯 Roadmap
- Complete Translator service implementation
- Build Transcriber orchestration service
- Create web dashboard with real-time updates
- Add voice cloning capabilities
- Implement speaker diarization
- Support for additional languages
- Mobile app for remote monitoring
- Advanced analytics and insights
💡 Credits
Built with:
- faster-whisper for GPU-accelerated transcription
- Discord.js for Discord bot functionality
- FastAPI for high-performance APIs
- PostgreSQL for reliable data storage
- Redis for message queuing and caching
🚀 Ready to translate your Discord conversations in real-time with the power of your RTX 4000!