# Discord Voice Translator V2

A microservices-based Discord bot that captures voice-channel audio, transcribes it, translates the transcript, and posts formatted messages to a text channel. Built with GPU acceleration and multi-language support.

## 🎯 Features

- **Real-time Voice Capture**: Records Discord voice channel audio
- **GPU-Accelerated Transcription**: Runs faster-whisper on an NVIDIA GPU (RTX 4000)
- **Multi-Language Translation**: Local NLLB models with Google Translate as a fallback
- **Smart Language Logic**: Every message covers English, German, and Korean, either as the transcript or as a translation
- **Professional Discord Messages**: Formatted with speaker, duration, and timestamp
- **Microservices Architecture**: Scalable, maintainable, containerized services
- **Comprehensive Monitoring**: Health checks, logging, and a web dashboard

## 🏗️ Architecture

### Services Overview

```
Discord Voice → [Recorder] → Raw PCM → [Audio Processor] → WAV
                                                            ↓
[Whisper Service] → Transcription → [Translator] → Multi-language
                                                         ↓
[Transcriber] → Discord Message → [Dashboard] → Monitoring
```

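To make the handoff concrete, here is a minimal in-memory model of the flow above. It assumes each stage takes a job from its inbox and publishes an enriched copy to the next stage's inbox. `queue.Queue` stands in for the Redis queues, and the stage functions, job fields, and placeholder outputs are all hypothetical, not the services' real code.

```python
from queue import Queue
from typing import Callable, Dict, List

Job = Dict[str, object]

# Hypothetical stage functions: each returns an enriched copy of the job.
def audio_processor(job: Job) -> Job:
    return {**job, "wav": f"{job['pcm']}.wav"}

def whisper_service(job: Job) -> Job:
    return {**job, "text": "hello", "language": "en"}  # placeholder transcription

def translator(job: Job) -> Job:
    return {**job, "translations": {"de": "hallo", "ko": "안녕하세요"}}  # placeholder output

def transcriber(job: Job) -> Job:
    return {**job, "message": f"{job['speaker']}: {job['text']}"}

STAGES: List[Callable[[Job], Job]] = [audio_processor, whisper_service, translator, transcriber]

def run_pipeline(job: Job) -> Job:
    """Move one recorder job through a chain of queues, one stage at a time."""
    queues: List[Queue] = [Queue() for _ in range(len(STAGES) + 1)]
    queues[0].put(job)  # the Recorder publishes the raw-PCM job
    for stage, inbox, outbox in zip(STAGES, queues, queues[1:]):
        outbox.put(stage(inbox.get()))
    return queues[-1].get()  # finished job, ready to post to Discord
```

For example, `run_pipeline({"speaker": "mcgyvver", "pcm": "rec-001"})["message"]` returns `"mcgyvver: hello"`. Because each stage only knows its own inbox and outbox, any stage can be scaled or restarted without the others noticing.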
### Core Services

1. **Recorder**: Discord bot that captures voice channel audio
2. **Audio Processor**: Converts PCM to optimized WAV files
3. **Whisper Service**: GPU-accelerated speech-to-text transcription
4. **Translator**: Local NLLB + Google Translate multi-language translation
5. **Transcriber**: Workflow orchestrator and Discord message formatter
6. **Dashboard**: Web interface for monitoring and management

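The Audio Processor's conversion step can be sketched with the standard library alone. This assumes the Recorder writes raw signed 16-bit little-endian PCM at 48 kHz stereo (the usual format of decoded Discord voice audio) and targets 16 kHz mono WAV, the sample rate Whisper works at. Averaging each group of three samples is a crude low-pass filter chosen to keep the sketch short; a production service would more likely use a proper resampler such as FFmpeg's.

```python
import array
import io
import sys
import wave

IN_RATE, OUT_RATE, CHANNELS = 48_000, 16_000, 2  # assumed input format

def pcm_to_wav(pcm: bytes) -> bytes:
    """Convert raw s16le 48 kHz stereo PCM into a 16 kHz mono WAV file (as bytes)."""
    frame_bytes = 2 * CHANNELS
    samples = array.array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % frame_bytes])  # drop a trailing partial frame
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian
    # Downmix: average the left and right sample of each frame.
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), CHANNELS)]
    # Downsample 48k -> 16k by averaging each complete group of 3 samples.
    step = IN_RATE // OUT_RATE
    resampled = array.array(
        "h", (sum(mono[i:i + step]) // step for i in range(0, len(mono) - step + 1, step))
    )
    if sys.byteorder == "big":
        resampled.byteswap()  # WAV sample data is little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(OUT_RATE)
        wav.writeframes(resampled.tobytes())
    return buf.getvalue()
```

One second of input (48,000 stereo frames) comes out as 16,000 mono frames.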
### Infrastructure

- **PostgreSQL**: Primary database for all structured data
- **Redis**: Message queue and caching layer
- **Nginx**: Reverse proxy for web services
- **Docker Compose**: Container orchestration with GPU support

## 🚀 Quick Start

### Prerequisites

- Docker and Docker Compose
- NVIDIA GPU with the NVIDIA Container Toolkit (Docker GPU runtime) installed
- Discord bot token
- Google Translate API key (optional, for premium translations)

### 1. Clone Repository

```bash
git clone <your-repo-url>
cd discord-voice-translator-v2
```

### 2. Configure Environment

```bash
cp .env.example .env
# Edit .env with your configuration
```

### 3. Required Environment Variables

```bash
# Discord Bot Configuration
DISCORD_TOKEN=your_discord_bot_token_here
CLIENT_ID=your_bot_client_id_here
GUILD_ID=your_guild_id_here

# Database Configuration
POSTGRES_PASSWORD=your_secure_postgres_password
# The password inside POSTGRES_URL must match POSTGRES_PASSWORD
POSTGRES_URL=postgresql://postgres:your_secure_postgres_password@postgres:5432/voice_translator

# Translation APIs (optional)
GOOGLE_TRANSLATE_API_KEY=your_google_translate_api_key

# GPU Configuration (Whisper model size, see Performance Tuning)
WHISPER_MODEL=large-v2
```

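Because the database password appears twice, a quick pre-flight check can catch the most common misconfiguration before `docker-compose up`. This is a sketch: the variable names come from the list above, but the checks themselves and the script are assumptions, not part of the project.

```python
import os
from typing import Dict, List
from urllib.parse import unquote, urlparse

REQUIRED = ["DISCORD_TOKEN", "CLIENT_ID", "GUILD_ID", "POSTGRES_PASSWORD", "POSTGRES_URL"]

def check_env(env: Dict[str, str]) -> List[str]:
    """Return human-readable configuration problems (an empty list means none found)."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    url = env.get("POSTGRES_URL")
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ("postgresql", "postgres"):
            problems.append("POSTGRES_URL should start with postgresql://")
        # Passwords with special characters are percent-encoded inside the URL.
        url_password = unquote(parsed.password or "")
        if env.get("POSTGRES_PASSWORD") and url_password != env["POSTGRES_PASSWORD"]:
            problems.append("POSTGRES_URL password does not match POSTGRES_PASSWORD")
        if not parsed.path.lstrip("/"):
            problems.append("POSTGRES_URL is missing a database name")
    return problems

if __name__ == "__main__":
    for issue in check_env(dict(os.environ)):
        print("config problem:", issue)
```

One way to run it against your `.env` (assuming plain `KEY=value` lines) is `set -a; . ./.env; set +a; python check_env.py`, where `check_env.py` is a hypothetical file name for the block above.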
### 4. Deploy with Docker Compose

```bash
# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Start with admin interfaces
docker-compose --profile admin up -d
```

### 5. Discord Bot Setup

1. Invite the bot to your server with these permissions:
   - Connect to voice channels
   - Speak in voice channels
   - Send messages
   - View channels

2. Use the slash commands:
   - `/join` - Start recording in your voice channel
   - `/leave` - Stop recording and leave the channel
   - `/status` - Check the current recording status

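The permissions above correspond to Discord permission bit flags, which combine into the single integer an OAuth2 invite link carries. A minimal sketch that builds such a link from your `CLIENT_ID`; it requests the `bot` and `applications.commands` scopes so the slash commands are available, and grants exactly the four permissions listed above.

```python
import os
from urllib.parse import quote, urlencode

# Permission bit flags, as defined in Discord's developer documentation.
VIEW_CHANNEL = 1 << 10
SEND_MESSAGES = 1 << 11
CONNECT = 1 << 20  # join voice channels
SPEAK = 1 << 21

def invite_url(client_id: str) -> str:
    """Build a bot invite link that requests exactly the four permissions above."""
    permissions = VIEW_CHANNEL | SEND_MESSAGES | CONNECT | SPEAK  # 3148800
    query = urlencode(
        {"client_id": client_id, "permissions": permissions, "scope": "bot applications.commands"},
        quote_via=quote,  # encode the space in the scope list as %20
    )
    return f"https://discord.com/oauth2/authorize?{query}"

if __name__ == "__main__":
    print(invite_url(os.environ.get("CLIENT_ID", "your_bot_client_id_here")))
```

Asking only for the permissions the bot uses keeps its footprint on the server minimal; server admins can still adjust them after the invite.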
## 💬 Discord Message Format

### English Source Example

```
🎤 New Transcription
Speaker: mcgyvver
🌍 Language Detected: English 🇬🇧
⏱️ Duration: 17.0 seconds
🕐 Time: 12 minutes ago

📝 Transcript
Yeah, but this is a complete prank, we'll...
Let's go to the foot.

🇩🇪 German
Ja, aber das ist ein kompletter Streich, wir werden...
Lass uns zum Fuß gehen.

🇰🇷 Korean
예, 하지만 이것은 완전한 장난입니다...
발로 가자.
```

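A hypothetical formatter that reproduces the layout above as plain text. The `LANGUAGE_INFO` contents and the function signature are illustrative. The sketch also assumes the relative time comes from Discord's `<t:UNIX:R>` timestamp markup, which Discord clients render as text like "12 minutes ago" in each reader's locale.

```python
from datetime import datetime, timezone
from typing import Dict, Tuple

# Illustrative display info (name, flag); the real table lives in the services.
LANGUAGE_INFO: Dict[str, Tuple[str, str]] = {
    "en": ("English", "🇬🇧"),
    "de": ("German", "🇩🇪"),
    "ko": ("Korean", "🇰🇷"),
}

def format_message(
    speaker: str,
    language: str,
    duration_s: float,
    created_at: datetime,
    transcript: str,
    translations: Dict[str, str],
) -> str:
    """Render one transcription in the layout shown above."""
    name, flag = LANGUAGE_INFO.get(language, (language, "🌐"))
    lines = [
        "🎤 New Transcription",
        f"Speaker: {speaker}",
        f"🌍 Language Detected: {name} {flag}",
        f"⏱️ Duration: {duration_s:.1f} seconds",
        # Discord clients render <t:UNIX:R> as relative time, e.g. "12 minutes ago".
        f"🕐 Time: <t:{int(created_at.timestamp())}:R>",
        "",
        "📝 Transcript",
        transcript,
    ]
    for code, text in translations.items():
        lang_name, lang_flag = LANGUAGE_INFO.get(code, (code, "🌐"))
        lines += ["", f"{lang_flag} {lang_name}", text]
    return "\n".join(lines)

if __name__ == "__main__":
    print(format_message("mcgyvver", "en", 17.0, datetime.now(timezone.utc),
                         "Let's go to the foot.", {"de": "Lass uns zum Fuß gehen."}))
```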
### Smart Language Logic

- **If detected = English**: Shows transcript + German + Korean
- **If detected = German**: Shows transcript + English + Korean
- **If detected = Korean**: Shows transcript + English + German
- **If detected = Other**: Shows transcript + English + German + Korean

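The four rules reduce to one: translate into every primary language except the detected one. A sketch, assuming a `PRIMARY_LANGUAGES` list of Whisper-style language codes (the name matches the setting mentioned under Adding New Languages, but the contents here are illustrative):

```python
from typing import List

# Illustrative contents: the three languages every message covers.
PRIMARY_LANGUAGES: List[str] = ["en", "de", "ko"]

def target_languages(detected: str, primary: List[str] = PRIMARY_LANGUAGES) -> List[str]:
    """Languages to translate into: every primary language except the one spoken."""
    return [code for code in primary if code != detected]
```

For example, `target_languages("de")` returns `["en", "ko"]`, and for a non-primary language such as French, `target_languages("fr")` returns all three.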
## 🔧 Service Configuration

### GPU Requirements

- NVIDIA GPU with 4GB+ VRAM (RTX 4000 recommended)
- CUDA 11.8+ support
- NVIDIA Container Toolkit (formerly nvidia-docker) configured for Docker

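To confirm from the host that a GPU meets the 4GB minimum, you can query `nvidia-smi`, which can emit CSV through its `--query-gpu` and `--format` options. The sketch below parses that output; the threshold comes from the requirement above, and the helper names are illustrative.

```python
import subprocess
from typing import List, Tuple

MIN_VRAM_MIB = 4 * 1024  # "4GB+ VRAM" from the requirements above

def parse_gpus(csv_text: str) -> List[Tuple[str, int]]:
    """Parse `name, memory.total` rows (nounits => MiB) into (name, MiB) pairs."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mem = line.rsplit(",", 1)  # split on the last comma in case the name contains one
        gpus.append((name.strip(), int(mem)))
    return gpus

def query_gpus() -> List[Tuple[str, int]]:
    """Run nvidia-smi on this machine and return every GPU with its total memory."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpus(out)

def meets_minimum(gpus: List[Tuple[str, int]]) -> bool:
    """True if at least one GPU has the required VRAM."""
    return any(mib >= MIN_VRAM_MIB for _, mib in gpus)
```

On a machine with the NVIDIA driver installed, `meets_minimum(query_gpus())` performs the live check.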
### Model Storage

- Whisper models are cached in `./data/models/`
- NLLB translation models are downloaded automatically
- Models persist between container restarts

### Performance Tuning

**Whisper Models**:
- `large-v2`: Best accuracy, ~2GB VRAM, slower
- `medium`: Good balance, ~1GB VRAM, faster
- `small`: Fastest, ~500MB VRAM, lower accuracy

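A sketch of choosing the largest model that fits, using the approximate figures above plus a hypothetical headroom reserve for the NLLB model (the Resource Usage section puts Whisper and NLLB together at 3-4GB). The numbers are rough guidance, not measurements.

```python
from typing import Dict

# Approximate VRAM needs in MiB, from the list above.
MODEL_VRAM_MIB: Dict[str, int] = {"large-v2": 2048, "medium": 1024, "small": 512}

def pick_model(free_vram_mib: int, headroom_mib: int = 1024) -> str:
    """Return the largest listed model that fits after reserving headroom (e.g. for NLLB)."""
    for name, need in sorted(MODEL_VRAM_MIB.items(), key=lambda item: item[1], reverse=True):
        if need + headroom_mib <= free_vram_mib:
            return name
    return "small"  # nothing fits comfortably; fall back to the smallest listed model
```

The chosen name could be set as `WHISPER_MODEL`, or passed directly to faster-whisper, e.g. `WhisperModel(name, device="cuda", compute_type="float16", download_root="./data/models")`; whether the Whisper service loads its model exactly this way is an assumption.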
**Translation Strategy**:
- Local NLLB: Fast, private, good quality
- Google Translate: Premium quality, API costs
- Automatic fallback for reliability

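The local-first order with a Google fallback can be sketched as a small wrapper. The two engines are passed in as plain callables, hypothetical stand-ins for the NLLB model and a Google Translate client, so the control flow is visible without either dependency. This sketch falls back only on errors.

```python
from typing import Callable, Optional, Tuple

# (text, source_lang, target_lang) -> translated text
Translator = Callable[[str, str, str], str]

def translate_with_fallback(
    text: str,
    source: str,
    target: str,
    local: Translator,
    remote: Optional[Translator] = None,
) -> Tuple[str, str]:
    """Return (translation, engine): local NLLB first, Google Translate if that fails."""
    try:
        return local(text, source, target), "nllb"
    except Exception as err:  # any local failure (model still downloading, OOM, ...) triggers fallback
        if remote is None:  # no GOOGLE_TRANSLATE_API_KEY configured
            raise RuntimeError("local translation failed and no fallback is configured") from err
        return remote(text, source, target), "google"
```

Returning the engine name alongside the text makes it easy to record which path was taken, for example in the dashboard's translation statistics.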
## 📊 Monitoring

### Health Checks

All services include health checks. Check overall status with Compose, or query a service's health endpoint directly:

```bash
# Check service status
docker-compose ps

# Individual service health
curl http://localhost:8001/health  # Whisper Service
curl http://localhost:8002/health  # Translation Service
```

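The same endpoint checks can be scripted, for example in a cron job or a CI smoke test. This sketch assumes only that a healthy endpoint returns HTTP 200; the response body is not documented here, so it is not inspected.

```python
from typing import Callable, Dict
from urllib.request import urlopen

# Endpoints from the commands above; extend if other services expose /health.
ENDPOINTS = {
    "whisper-service": "http://localhost:8001/health",
    "translator": "http://localhost:8002/health",
}

def http_status(url: str, timeout: float = 3.0) -> int:
    """GET the URL and return its HTTP status code."""
    with urlopen(url, timeout=timeout) as response:
        return response.status

def check_all(endpoints: Dict[str, str], fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each service to True if its endpoint answered 200, else False."""
    results = {}
    for name, url in endpoints.items():
        try:
            results[name] = fetch(url) == 200
        except OSError:  # URLError, HTTPError (non-2xx) and timeouts are all OSError subclasses
            results[name] = False
    return results
```

`check_all(ENDPOINTS)` returns a dict mapping each service name to a boolean; the `fetch` parameter exists so the logic can be tested without live services.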
### Web Dashboard

The monitoring dashboard is available at `http://localhost:3000` and shows:

- Real-time transcription activity
- Translation statistics
- Service performance metrics
- User activity summaries

### Admin Interfaces

These start only with the `admin` profile (`docker-compose --profile admin up -d`):

- **PostgreSQL Admin** (pgAdmin): `http://localhost:8080`
- **Redis Commander**: `http://localhost:8081`

## 🛠️ Development

### Local Development

```bash
# Start infrastructure only
docker-compose up -d postgres redis

# Run individual services locally, each in its own terminal
(cd services/recorder && npm run dev)
(cd services/whisper-service && python src/api.py)
```

### Service Dependencies

```
Recorder → PostgreSQL, Redis
Audio Processor → Redis, PostgreSQL
Whisper Service → Redis, PostgreSQL, GPU
Translator → Redis, PostgreSQL
Transcriber → Redis, PostgreSQL, Discord
Dashboard → PostgreSQL, Redis
```

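The dependency list above is a small directed graph, so a valid startup order falls out of a topological sort. A sketch using Python's `graphlib` (Python 3.9+). The keys `audio-processor` and `dashboard` are hypothetical Compose service names (the others appear in the commands in this README), and `gpu` and `discord` are external resources included only as leaf nodes.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# service -> what it needs, copied from the list above.
DEPENDENCIES: Dict[str, Set[str]] = {
    "recorder": {"postgres", "redis"},
    "audio-processor": {"redis", "postgres"},
    "whisper-service": {"redis", "postgres", "gpu"},
    "translator": {"redis", "postgres"},
    "transcriber": {"redis", "postgres", "discord"},
    "dashboard": {"postgres", "redis"},
}

def startup_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order nodes so every dependency comes before the services that need it."""
    return list(TopologicalSorter(deps).static_order())
```

The result always places `postgres` and `redis` ahead of every application service. In Compose, the same ordering is normally expressed with `depends_on`.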
### Adding New Languages

1. Add the language code to `NLLB_LANG_MAP` in the translator service
2. Add its display name and flag to `LANGUAGE_INFO`
3. Add it to `PRIMARY_LANGUAGES` if it should appear in every message by default

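A sketch of what the three steps might look like when adding French, assuming the settings are plain dictionaries and lists keyed by Whisper's two-letter codes (their real structure is not documented here). The NLLB values are FLORES-200 codes such as `eng_Latn`; `config_problems` is a hypothetical consistency check worth running after an edit.

```python
from typing import Dict, List

# Hypothetical shapes; the real settings live in the translator service.
NLLB_LANG_MAP: Dict[str, str] = {  # Whisper code -> NLLB (FLORES-200) code
    "en": "eng_Latn",
    "de": "deu_Latn",
    "ko": "kor_Hang",
    "fr": "fra_Latn",  # step 1: the newly added language
}
LANGUAGE_INFO: Dict[str, Dict[str, str]] = {  # step 2: display name and flag
    "en": {"name": "English", "flag": "🇬🇧"},
    "de": {"name": "German", "flag": "🇩🇪"},
    "ko": {"name": "Korean", "flag": "🇰🇷"},
    "fr": {"name": "French", "flag": "🇫🇷"},
}
# Step 3: add "fr" here only if it should appear in every message.
PRIMARY_LANGUAGES: List[str] = ["en", "de", "ko"]

def config_problems() -> List[str]:
    """Codes that are configured in one place but missing from another."""
    mismatched = set(NLLB_LANG_MAP) ^ set(LANGUAGE_INFO)
    unmapped_primary = set(PRIMARY_LANGUAGES) - set(NLLB_LANG_MAP)
    return sorted(mismatched | unmapped_primary)
```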
## 📦 Database Schema

### Key Tables

- **connections**: Voice channel sessions
- **recordings**: Individual user audio recordings
- **transcriptions**: Speech-to-text results
- **translations**: Multi-language translations
- **processing_metrics**: Performance tracking
- **user_activity**: Aggregated user statistics

### Data Flow

```
Connection → Recordings → Transcriptions → Translations
```

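The chain above can be illustrated with a toy schema. This sketch uses SQLite from the standard library so it runs anywhere, whereas the project uses PostgreSQL; the table names come from the Key Tables list, but every column is an assumption.

```python
import sqlite3
from typing import List, Tuple

# Toy schema: table names from the Key Tables list, columns hypothetical.
SCHEMA = """
CREATE TABLE connections (id INTEGER PRIMARY KEY, guild_id TEXT, channel_id TEXT);
CREATE TABLE recordings (id INTEGER PRIMARY KEY, connection_id INTEGER REFERENCES connections(id), user_id TEXT);
CREATE TABLE transcriptions (id INTEGER PRIMARY KEY, recording_id INTEGER REFERENCES recordings(id), language TEXT, text TEXT);
CREATE TABLE translations (id INTEGER PRIMARY KEY, transcription_id INTEGER REFERENCES transcriptions(id), target_language TEXT, text TEXT);
"""

def demo() -> List[Tuple[str, str, str]]:
    """Insert one row of each type, then walk the chain with joins."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO connections VALUES (1, 'guild', 'voice-channel')")
    db.execute("INSERT INTO recordings VALUES (1, 1, 'mcgyvver')")
    db.execute("INSERT INTO transcriptions VALUES (1, 1, 'en', 'Let''s go to the foot.')")
    db.executemany(
        "INSERT INTO translations VALUES (?, 1, ?, ?)",
        [(1, "de", "Lass uns zum Fuß gehen."), (2, "ko", "발로 가자.")],
    )
    rows = db.execute(
        """
        SELECT r.user_id, t.language, tr.target_language
        FROM connections c
        JOIN recordings r ON r.connection_id = c.id
        JOIN transcriptions t ON t.recording_id = r.id
        JOIN translations tr ON tr.transcription_id = t.id
        ORDER BY tr.target_language
        """
    ).fetchall()
    db.close()
    return rows
```

`demo()` returns one row per translation, `[("mcgyvver", "en", "de"), ("mcgyvver", "en", "ko")]`: each link in the chain is a one-to-many foreign key.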
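## 🔐 Security

- Audio is not stored permanently; recordings are cleaned up automatically
- Local AI models keep processing on your own hardware (the optional Google Translate fallback sends text to Google)
- Secure credential management (secrets supplied via `.env`)
- Non-root container users
- Network isolation between services
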
## 🚨 Troubleshooting

### Common Issues

**GPU Not Detected**:

```bash
# Check that Docker can reach the GPU through the NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

**Discord Bot Not Responding**:
- Verify the bot token and permissions
- Check that the guild ID is correct
- Ensure the bot has joined the voice channel (`/join`)

**Translation Failures**:
- Check the Google API key if you use Google Translate
- Verify network connectivity
- The local NLLB model may still be downloading

**Database Connection Issues**:
- Ensure PostgreSQL is running
- Check the `POSTGRES_URL` format
- Verify network connectivity between containers

### Service Logs

```bash
# View specific service logs
docker-compose logs recorder
docker-compose logs whisper-service
docker-compose logs translator

# Follow logs in real time
docker-compose logs -f transcriber
```

## 📈 Performance

### Expected Performance (RTX 4000)

- **Transcription**: 2-3x faster than real time
- **Translation**: Sub-second for typical sentences
- **End-to-End Latency**: 3-8 seconds from speech to Discord message
- **Concurrent Users**: 5-10 simultaneous speakers
- **Languages**: 200+ translation languages via NLLB; speech recognition is limited to the languages Whisper supports (about 100)

### Resource Usage

- **GPU Memory**: 3-4GB for Whisper + NLLB
- **System RAM**: 8GB recommended
- **Storage**: 10GB for models + database
- **Network**: Minimal (local processing)

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Update the documentation
5. Submit a pull request

## 📄 License

MIT License. See the `LICENSE` file for details.

## 🙏 Acknowledgments

- **OpenAI Whisper**: Speech recognition models
- **Facebook NLLB**: Neural machine translation
- **Discord.js**: Discord API library
- **faster-whisper**: Optimized Whisper inference
- **Docker Community**: Containerization platform

---

**Need help?** Check the troubleshooting section or open an issue!