# Discord Voice Translator V2 🎤🌍

A microservices-based Discord bot that captures voice channel audio, transcribes it with GPU-accelerated Whisper, translates it into multiple languages, and posts formatted messages to text channels.

## 🎯 Features

- **Real-time Voice Capture**: Records individual user audio streams from Discord voice channels
- **GPU-Accelerated Transcription**: Uses faster-whisper on an NVIDIA RTX 4000 for fast speech-to-text
- **Multi-Language Translation**: Local NLLB models, with Google Translate as a fallback
- **Smart Language Logic**: Every message covers English, German, and Korean, either as the original transcript or as a translation
- **Professional Discord Messages**: Formatted with speaker info, duration, and timestamps
- **Microservices Architecture**: Scalable, maintainable, containerized services
- **Comprehensive Monitoring**: Health checks, structured logging, and a web dashboard with real-time and historical analytics
- **Low Latency**: Typically 3-8 seconds from speech to Discord message

## 🏗️ Architecture

### Services Overview

```
Discord Voice → [Recorder] → Raw PCM → [Audio Processor] → WAV (16 kHz mono)
  → [Whisper Service] → Transcription → [Translator] → Multi-language text
  → [Transcriber] → Discord Message + Database → [Dashboard] → Monitoring
```

### Core Services

| Service | Purpose | Technology |
|---------|---------|------------|
| **Recorder** | Discord bot that captures voice channel audio | Node.js, Discord.js |
| **Audio Processor** | Converts raw PCM to Whisper-ready WAV | Python, FFmpeg |
| **Whisper Service** | GPU-accelerated speech-to-text transcription | Python, faster-whisper, CUDA |
| **Translator** | Multi-language translation | Python, local NLLB, Google Translate (fallback) |
| **Transcriber** | Workflow orchestration and Discord message formatting | Node.js, message queues |
| **Dashboard** | Web interface for monitoring and management | Node.js, Express, WebSockets |

## 🛠️ Prerequisites

### Hardware Requirements

- **GPU**: NVIDIA RTX 4000 (or another CUDA-capable GPU)
- **RAM**: 8GB+ recommended
- **Storage**: 50GB+ for models and audio cache

### Infrastructure

- **PostgreSQL**: Primary database for all structured data
- **Redis**: Message queue and caching layer
- **Nginx**: Reverse proxy for web services
- **Docker Compose**: Container orchestration with GPU support

### Software Requirements

- **Docker** with Docker Compose
- **NVIDIA Container Toolkit** for GPU support
- **Unraid** (recommended) or a Linux host

## ⚡ Quick Start

### Prerequisites

- Docker and Docker Compose
- NVIDIA GPU with Docker runtime support
- Discord bot token
- Google Translate API key (optional, for premium translations)

### 1. Clone the Repository

```bash
git clone <your-gitea-url>/discord-voice-translator-v2.git
cd discord-voice-translator-v2
```

### 2. Configure Environment

```bash
cp .env.example .env
# Edit .env with your API keys and settings
```

### 3. Required Environment Variables

```bash
# Discord Bot Configuration
DISCORD_TOKEN=your_discord_bot_token_here
CLIENT_ID=your_bot_client_id_here
GUILD_ID=your_guild_id_here

# Database Configuration
POSTGRES_PASSWORD=your_secure_postgres_password
POSTGRES_URL=postgresql://postgres:your_secure_postgres_password@postgres:5432/voice_translator

# Translation APIs (optional)
GOOGLE_TRANSLATE_API_KEY=your_google_translate_api_key

# GPU Configuration
WHISPER_MODEL=large-v2
```

`POSTGRES_URL` embeds the database password, so it must match `POSTGRES_PASSWORD`.
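A missing or mistyped variable often only surfaces later as a confusing runtime error. The sketch below checks the settings at startup. It assumes the variable names above. The split between required and optional variables is our own assumption: Google Translate is treated as optional, per the comment in the block. `load_config` is a hypothetical helper, not part of the project.

```python
import os
from typing import Mapping
from urllib.parse import unquote, urlsplit

# Assumed split between required and optional settings (hypothetical).
REQUIRED = ("DISCORD_TOKEN", "CLIENT_ID", "GUILD_ID", "POSTGRES_PASSWORD", "POSTGRES_URL")
OPTIONAL_DEFAULTS = {"GOOGLE_TRANSLATE_API_KEY": "", "WHISPER_MODEL": "large-v2"}


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Collect settings, failing fast on missing values or an inconsistent POSTGRES_URL."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    config = {name: env[name] for name in REQUIRED}
    for name, default in OPTIONAL_DEFAULTS.items():
        config[name] = env.get(name) or default
    # POSTGRES_URL embeds the password; a mismatch usually means only one of them was edited.
    url_password = unquote(urlsplit(config["POSTGRES_URL"]).password or "")
    if url_password != config["POSTGRES_PASSWORD"]:
        raise RuntimeError("POSTGRES_URL password does not match POSTGRES_PASSWORD")
    return config
```

Calling `load_config()` once before connecting to Discord or PostgreSQL turns a vague connection failure into a message naming the missing variable.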
### 4. Deploy with Docker Compose

```bash
# Start all services
docker-compose up -d

# Follow the logs
docker-compose logs -f

# Start with the admin interfaces enabled
docker-compose --profile admin up -d
```

### 5. Verify GPU Support

```bash
# Check the Whisper service's GPU status
curl http://localhost:8001/health
```

### 6. Discord Bot Setup

1. Invite the bot to your server with these permissions:
   - Connect (voice channels)
   - Speak (voice channels)
   - Send Messages
   - View Channels
2. Use the slash commands:
   - `/join` - Join your voice channel and start recording
   - `/leave` - Stop recording and leave the channel
   - `/status` - Show the current recording status

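Each of the four permissions above is one bit in Discord's permission bitfield:

- View Channel = `1 << 10`
- Send Messages = `1 << 11`
- Connect = `1 << 20`
- Speak = `1 << 21`

The sketch below builds an invite link from them. It assumes you invite the bot through Discord's standard OAuth2 authorize URL with the `bot` and `applications.commands` scopes; the second scope lets the bot register slash commands. `build_invite_url` is a hypothetical helper. `CLIENT_ID` is the value from `.env`.

```python
import os
from urllib.parse import quote, urlencode

# Discord permission bit flags for the permissions listed above.
VIEW_CHANNEL = 1 << 10
SEND_MESSAGES = 1 << 11
CONNECT = 1 << 20
SPEAK = 1 << 21


def build_invite_url(client_id: str) -> str:
    """Build an OAuth2 invite URL that grants the bot exactly these permissions."""
    permissions = VIEW_CHANNEL | SEND_MESSAGES | CONNECT | SPEAK
    query = urlencode(
        {
            "client_id": client_id,
            "scope": "bot applications.commands",  # applications.commands enables slash commands
            "permissions": permissions,
        },
        quote_via=quote,  # encode the space in scope as %20 rather than +
    )
    return "https://discord.com/oauth2/authorize?" + query


if __name__ == "__main__":
    print(build_invite_url(os.environ.get("CLIENT_ID", "your_bot_client_id_here")))
```

Open the printed URL while logged in with an account that can manage the target server.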
## 📊 Service Endpoints

| Service | Port | Health Check | Purpose |
|---------|------|--------------|---------|
| Whisper Service | 8001 | `/health` | GPU transcription |
| Translator | 8002 | `/health` | Multi-language translation |
| Dashboard | 3000 | `/` | Web monitoring |
| PostgreSQL | 5432 | - | Database |
| Redis | 6379 | - | Message queue |
| pgAdmin | 8080 | `/` | Database admin |
| Redis Commander | 8081 | `/` | Redis admin |

## 💬 Discord Message Format

### English Source Example

```
🎤 New Transcription
Speaker: mcgyvver
🌍 Language Detected: English 🇬🇧
⏱️ Duration: 17.0 seconds
🕐 Time: 12 minutes ago

📝 Transcript
Yeah, but this is a complete prank, we'll...
Let's go to the foot.

🇩🇪 German
Ja, aber das ist ein kompletter Streich, wir werden...
Lass uns zum Fuß gehen.

🇰🇷 Korean
예, 하지만 이것은 완전한 장난입니다...
발로 가자.
```

### Smart Language Logic

The transcript is always shown in the detected language. It is followed by translations into whichever of English, German, and Korean the speaker did not use:

- **If detected = English**: transcript + German + Korean
- **If detected = German**: transcript + English + Korean
- **If detected = Korean**: transcript + English + German
- **If detected = Other**: transcript + English + German + Korean

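These rules reduce to one idea: translate into every primary language except the detected one. The sketch below assumes ISO 639-1 codes. The names `PRIMARY_LANGUAGES` and `LANGUAGE_INFO` come from the Adding New Languages steps, but their shape here is assumed. The `target_languages` and `format_message` helpers are hypothetical illustrations, not the project's Transcriber code. The relative "Time" line is omitted.

```python
# Hypothetical shapes for the settings named in "Adding New Languages".
PRIMARY_LANGUAGES = ["en", "de", "ko"]
LANGUAGE_INFO = {
    "en": ("English", "🇬🇧"),
    "de": ("German", "🇩🇪"),
    "ko": ("Korean", "🇰🇷"),
}


def target_languages(detected: str) -> list[str]:
    """Every primary language except the detected one, in display order."""
    return [lang for lang in PRIMARY_LANGUAGES if lang != detected]


def format_message(speaker: str, detected: str, duration_s: float,
                   transcript: str, translations: dict[str, str]) -> str:
    """Render a plain-text version of the Discord message shown above."""
    name, flag = LANGUAGE_INFO.get(detected, (detected, "🏳️"))  # unknown code: show it raw
    lines = [
        "🎤 New Transcription",
        f"Speaker: {speaker}",
        f"🌍 Language Detected: {name} {flag}",
        f"⏱️ Duration: {duration_s:.1f} seconds",
        "",
        "📝 Transcript",
        transcript,
    ]
    for lang in target_languages(detected):
        if lang in translations:  # skip languages whose translation failed
            lang_name, lang_flag = LANGUAGE_INFO[lang]
            lines += ["", f"{lang_flag} {lang_name}", translations[lang]]
    return "\n".join(lines)
```

Keeping the selection rule in one small function means that adding a language to `PRIMARY_LANGUAGES` is enough to change every message.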
## 🔧 Service Configuration

### GPU Requirements

- NVIDIA GPU with 4GB+ VRAM (RTX 4000 recommended)
- CUDA 11.8+ support
- NVIDIA Container Toolkit (nvidia-docker runtime) configured

### GPU Configuration

- **Model**: faster-whisper `large-v2` on CUDA
- **Precision**: float16 for optimal RTX 4000 performance
- **Memory**: ~2GB VRAM for transcription
- **Speed**: 2-3x real-time transcription

### Model Storage

- Whisper models are cached in `./data/models/`
- NLLB translation models are downloaded automatically
- Models persist between container restarts

### Audio Processing

- **Input**: Discord 48 kHz stereo PCM
- **Output**: 16 kHz mono WAV (the sample rate Whisper expects)
- **Conversion**: FFmpeg, with a Python/scipy fallback
- **Cleanup**: Processed PCM files are removed automatically

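The conversion steps above can be sketched as follows. It assumes Discord delivers signed 16-bit little-endian samples. Under that assumption the input runs at 48,000 × 2 channels × 2 bytes = 192,000 bytes/s and the output at 16,000 × 1 × 2 = 32,000 bytes/s, so each WAV is six times smaller than its PCM source. The pure-Python path is a crude stand-in for the project's scipy fallback: it averages each group of three samples instead of using a proper resampling filter. `pcm_to_wav_bytes` and `convert_file` are hypothetical names, not the Audio Processor's actual code.

```python
import array
import io
import shutil
import subprocess
import sys
import wave

IN_RATE, IN_CHANNELS = 48_000, 2  # Discord voice PCM, assumed signed 16-bit little-endian
OUT_RATE = 16_000                 # sample rate Whisper expects
FACTOR = IN_RATE // OUT_RATE      # 3 input samples per output sample


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Downmix 48 kHz stereo s16le PCM to mono and decimate to a 16 kHz WAV.

    Averaging groups of 3 samples is only a crude low-pass filter;
    FFmpeg or scipy.signal.resample_poly give better quality.
    """
    samples = array.array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 4])  # drop any partial stereo frame
    if sys.byteorder == "big":
        samples.byteswap()  # array uses native order; the input is little-endian
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)]
    usable = len(mono) - len(mono) % FACTOR
    out = array.array("h", (sum(mono[i:i + FACTOR]) // FACTOR for i in range(0, usable, FACTOR)))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(OUT_RATE)
        wav.writeframes(out.tobytes())  # wave expects native-order samples
    return buf.getvalue()


def convert_file(src: str, dst: str) -> None:
    """Convert a raw PCM file to WAV, preferring FFmpeg when it is installed."""
    if shutil.which("ffmpeg"):
        subprocess.run(
            ["ffmpeg", "-y", "-f", "s16le", "-ar", str(IN_RATE), "-ac", str(IN_CHANNELS),
             "-i", src, "-ar", str(OUT_RATE), "-ac", "1", dst],
            check=True, capture_output=True,
        )
        return
    with open(src, "rb") as f_in, open(dst, "wb") as f_out:
        f_out.write(pcm_to_wav_bytes(f_in.read()))
```

The FFmpeg arguments must describe the raw input explicitly (`-f s16le -ar 48000 -ac 2`) because headerless PCM carries no format information.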
### Performance Tuning

**Whisper Models** (selected with `WHISPER_MODEL` in `.env`):

- `large-v2`: Best accuracy, ~2GB VRAM, slowest
- `medium`: Good balance, ~1GB VRAM, faster
- `small`: Fastest, ~500MB VRAM, lower accuracy

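A service might turn `WHISPER_MODEL` into faster-whisper's `WhisperModel` arguments as sketched below. The sketch uses float16 on CUDA, as under GPU Configuration, and int8 on CPU, a common CPU setting in faster-whisper. The `whisper_settings`, `load_model`, and `transcribe_file` helpers are hypothetical, not the project's Whisper service code.

```python
import os
from typing import Mapping


def whisper_settings(cuda_available: bool, env: Mapping[str, str] = os.environ) -> dict:
    """Choose faster-whisper constructor arguments from WHISPER_MODEL and the hardware."""
    model = env.get("WHISPER_MODEL", "large-v2")  # large-v2, medium, or small
    if cuda_available:
        # float16 on CUDA, matching the GPU Configuration section
        return {"model_size_or_path": model, "device": "cuda", "compute_type": "float16"}
    # float16 targets GPUs; int8 is a common choice for CPU-only runs
    return {"model_size_or_path": model, "device": "cpu", "compute_type": "int8"}


def load_model(cuda_available: bool = True):
    """Load the model once at startup; loading is slow, so reuse the returned object."""
    from faster_whisper import WhisperModel  # heavy dependency, imported lazily

    return WhisperModel(**whisper_settings(cuda_available))


def transcribe_file(model, path: str) -> tuple[str, str]:
    """Transcribe one WAV file and return (detected language code, full text)."""
    segments, info = model.transcribe(path, beam_size=5)
    # segments is a lazy generator: transcription actually runs while it is consumed
    text = " ".join(segment.text.strip() for segment in segments)
    return info.language, text
```

Typical use: `model = load_model()`, then `language, text = transcribe_file(model, "clip.wav")` for each recording. Here `clip.wav` is a placeholder path.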
**Translation Strategy**:

- **Local NLLB**: Fast, private, good quality
- **Google Translate**: Premium quality, incurs API costs
- **Automatic fallback**: If local NLLB translation fails, Google Translate is used (when an API key is configured)

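The local-first strategy with fallback on failure can be sketched as below. NLLB models identify languages by FLORES-200 codes (`eng_Latn`, `deu_Latn`, `kor_Hang`), while Google Translate takes ISO 639-1 codes, so each backend receives its own form. The shape of `NLLB_LANG_MAP` is assumed, and so are the `translate_with_fallback` helper and the backend callable signature. The project's translator service may differ.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("translator")

# ISO 639-1 code -> NLLB (FLORES-200) code; extend this map to add languages.
NLLB_LANG_MAP = {"en": "eng_Latn", "de": "deu_Latn", "ko": "kor_Hang"}

# A backend takes (text, source code, target code) and returns the translation.
Backend = Callable[[str, str, str], str]


def translate_with_fallback(text: str, source: str, target: str,
                            local: Backend, remote: Optional[Backend] = None) -> str:
    """Try the local NLLB backend first; on any error fall back to the remote API."""
    try:
        # A language missing from NLLB_LANG_MAP raises KeyError and also falls through.
        return local(text, NLLB_LANG_MAP[source], NLLB_LANG_MAP[target])
    except Exception:
        logger.warning("Local NLLB translation failed; trying fallback", exc_info=True)
        if remote is None:  # e.g. no GOOGLE_TRANSLATE_API_KEY configured
            raise
        return remote(text, source, target)  # Google Translate uses ISO 639-1 codes
```

Passing the backends in as callables keeps the policy testable without loading a model or calling a paid API.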
## 📊 Monitoring

### Health Checks

All services include health checks:

```bash
# Check service status
docker-compose ps

# Individual service health
curl http://localhost:8001/health  # Whisper Service
curl http://localhost:8002/health  # Translation Service
```

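The same checks can be run from Python for scripts or alerting. This minimal sketch uses only the standard library. It treats an HTTP 200 response from each `/health` endpoint as healthy, which is an assumption about what those endpoints return. `check_health` is a hypothetical helper.

```python
import urllib.request

HEALTH_ENDPOINTS = {
    "whisper-service": "http://localhost:8001/health",
    "translator": "http://localhost:8002/health",
}


def check_health(endpoints: dict[str, str], timeout: float = 3.0) -> dict[str, bool]:
    """Return {service: healthy?}; a service is healthy if /health answers HTTP 200."""
    results = {}
    for name, url in endpoints.items():
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                results[name] = response.status == 200
        except OSError:
            # URLError and HTTPError subclass OSError: refused, timed out, or non-2xx status.
            results[name] = False
    return results


if __name__ == "__main__":
    for service, healthy in check_health(HEALTH_ENDPOINTS).items():
        print(f"{service}: {'ok' if healthy else 'DOWN'}")
```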
### Web Dashboard

Access the monitoring dashboard at `http://localhost:3000`:

- Real-time transcription activity
- Translation statistics
- Service performance metrics
- User activity summaries

### Admin Interfaces

- **PostgreSQL Admin (pgAdmin)**: `http://localhost:8080`
- **Redis Commander**: `http://localhost:8081`

## 🛠️ Development

### Local Development

```bash
# Start the infrastructure only
docker-compose up -d postgres redis

# Run each service locally, in its own terminal, from the repository root
cd services/recorder && npm install && npm run dev
cd services/audio-processor && pip install -r requirements.txt && python src/processor.py
cd services/whisper-service && python src/api.py
```

### Service Dependencies

```
Recorder        → PostgreSQL, Redis
Audio Processor → Redis, PostgreSQL
Whisper Service → Redis, PostgreSQL, GPU
Translator      → Redis, PostgreSQL
Transcriber     → Redis, PostgreSQL, Discord
Dashboard       → PostgreSQL, Redis
```

### Testing GPU Support

```bash
# Verify the NVIDIA Docker runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Check that PyTorch inside the Whisper service can see the GPU
docker-compose exec whisper-service python -c "import torch; print(torch.cuda.is_available())"
```

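faster-whisper runs on CTranslate2 rather than PyTorch. `torch.cuda.is_available()` therefore reports only PyTorch's view of the GPU, and it fails if PyTorch is not installed in the image. The sketch below is a complementary check. It assumes the `ctranslate2` package, a faster-whisper dependency, is importable in the container. `gpu_report` is a hypothetical helper.

```python
def gpu_report() -> dict:
    """Summarize what CTranslate2 (faster-whisper's inference engine) can see."""
    try:
        import ctranslate2  # installed as a dependency of faster-whisper
    except ImportError:
        return {"ctranslate2": False, "cuda_devices": 0, "cuda_compute_types": []}
    count = ctranslate2.get_cuda_device_count()
    # Only query CUDA compute types when a device is actually visible.
    types = sorted(ctranslate2.get_supported_compute_types("cuda")) if count else []
    return {"ctranslate2": True, "cuda_devices": count, "cuda_compute_types": types}


if __name__ == "__main__":
    print(gpu_report())
```

Saved as a local file (for example a hypothetical `check_gpu.py`), it can be piped into the running container with `docker-compose exec -T whisper-service python - < check_gpu.py`. If `float16` appears in `cuda_compute_types`, the GPU Configuration above is supported.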
### Adding New Languages

1. Add the language code to `NLLB_LANG_MAP` in the translator service
2. Update `LANGUAGE_INFO` with its display name and flag
3. Modify `PRIMARY_LANGUAGES` if the language should appear in every message by default

## 📝 Logging

- **Structured Logging**: JSON format for all services
- **Log Levels**: Configurable via the `LOG_LEVEL` environment variable
- **Centralized**: All logs stored in `/data/logs/`
- **Monitoring**: Real-time log viewing via the dashboard

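For the Python services, JSON logs with a `LOG_LEVEL` switch need only the standard `logging` module. This is a minimal sketch. The field names and the `JsonFormatter` / `setup_logging` helpers are assumptions, not the project's actual log schema.

```python
import json
import logging
import os
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry, ensure_ascii=False)  # keep Korean/German text readable


def setup_logging(service: str) -> logging.Logger:
    """Configure the root logger from LOG_LEVEL (default INFO) with JSON output to stdout."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter(service))
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(os.environ.get("LOG_LEVEL", "INFO").upper())
    return logging.getLogger(service)
```

Writing to stdout lets `docker-compose logs` collect the lines unchanged. One JSON object per line is easy for the dashboard or any log shipper to parse.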
## 📦 Database Schema

### Key Tables

- **connections**: Voice channel sessions
- **recordings**: Individual user audio recordings
- **transcriptions**: Speech-to-text results with confidence scores
- **translations**: Multi-language translations
- **processing_metrics**: Performance tracking
- **user_activity**: Aggregated user statistics

### Data Flow

```
Connection → Recordings → Transcriptions → Translations
```

## 🔐 Security

- **Credentials**: Supplied through environment variables (`.env`), with separate API keys per service
- **Database**: PostgreSQL with connection encryption
- **Network**: Services isolated on an internal Docker network
- **Containers**: Run as non-root users
- **Audio Retention**: No audio is stored permanently; processed files are cleaned up automatically
- **Privacy**: Local Whisper and NLLB models keep audio and text on your own hardware, unless the Google Translate fallback is used

## 🚀 Deployment

### Unraid Setup

1. Install the Community Applications plugin
2. Add the Docker Compose plugin
3. Configure GPU passthrough
4. Import `docker-compose.yml`

### Production Considerations

- **SSL**: Configure the Nginx reverse proxy with SSL certificates
- **Backup**: Back up the database and configuration regularly
- **Monitoring**: Set up alerts on the health checks
- **Scaling**: Scale services horizontally for high-traffic Discord servers

## 🚨 Troubleshooting

### Common Issues

**GPU Not Detected**

```bash
# Check NVIDIA drivers
nvidia-smi

# Verify Docker GPU support (NVIDIA runtime)
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

**Service Won't Start**

```bash
# Check logs
docker-compose logs [service-name]

# Verify environment variables
docker-compose config
```

**Discord Bot Not Responding**

- Verify the bot token and permissions
- Check that `GUILD_ID` is correct
- Ensure the bot has joined your voice channel (use `/join`)

**Translation Failures**

- Check the Google API key if you use Google Translate
- Verify network connectivity
- The local NLLB model may still be downloading

**Database Connection Issues**

- Ensure PostgreSQL is running
- Check the connection string format (`POSTGRES_URL`)
- Verify network connectivity between containers

**Audio Processing Slow**

- Verify that GPU acceleration is working
- Check available VRAM (`nvidia-smi`)
- Monitor CPU usage during PCM-to-WAV conversion

### Service Logs

```bash
# View specific service logs
docker-compose logs recorder
docker-compose logs whisper-service
docker-compose logs translator

# Follow logs in real time
docker-compose logs -f transcriber
```

## 📚 API Documentation

Once the services are running, visit:

- **Whisper API**: http://localhost:8001/docs
- **Translator API**: http://localhost:8002/docs
- **Dashboard**: http://localhost:3000

## 📈 Performance

### Resource Usage

- **GPU Memory**: 3-4GB for Whisper + NLLB
- **System RAM**: 8GB recommended
- **Storage**: 10GB for models + database
- **Network**: Minimal (processing is local)

### Expected Performance (RTX 4000)

- **Transcription**: 2-3x real-time (faster than speech)
- **Translation**: Sub-second for typical sentences
- **End-to-End Latency**: 3-8 seconds from speech to Discord message
- **Concurrent Users**: 5-10 simultaneous speakers
- **Languages**: 200+ supported via NLLB

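"2-3x real-time" means the model processes audio two to three times faster than it was spoken. The transcription step for a clip therefore takes roughly the clip's duration divided by that factor. The worked example below uses the 17-second clip from the message example. It covers transcription only; queueing, conversion, translation, and posting add to the end-to-end latency.

```python
def transcription_seconds(audio_seconds: float, speedup: float) -> float:
    """Processing time for a clip at a given real-time speedup (2.0 means 2x real-time)."""
    return audio_seconds / speedup


for speedup in (2.0, 3.0):
    print(f"{speedup:.0f}x real-time: {transcription_seconds(17.0, speedup):.1f} s")
# 2x real-time: 8.5 s
# 3x real-time: 5.7 s
```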
## 🤝 Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Add tests for new functionality
4. Update the documentation
5. Commit your changes: `git commit -m 'Add amazing feature'`
6. Push to the branch: `git push origin feature/amazing-feature`
7. Open a pull request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🎯 Roadmap

- [ ] Add voice cloning capabilities
- [ ] Implement speaker diarization
- [ ] Support additional languages
- [ ] Mobile app for remote monitoring
- [ ] Advanced analytics and insights

## 🙏 Acknowledgments

Built with:

- [faster-whisper](https://github.com/guillaumekln/faster-whisper): Optimized, GPU-accelerated Whisper inference
- **OpenAI Whisper**: Speech recognition models
- **Facebook NLLB**: Neural machine translation
- [Discord.js](https://discord.js.org/): Discord API library for the bot
- [FastAPI](https://fastapi.tiangolo.com/): High-performance APIs
- [PostgreSQL](https://www.postgresql.org/): Reliable data storage
- [Redis](https://redis.io/): Message queuing and caching
- **Docker Community**: Containerization platform

---

**🚀 Ready to translate your Discord conversations in real time with the power of your RTX 4000!**

**Need help?** Check the troubleshooting section or open an issue!