Update README with comprehensive documentation

2025-07-14 10:35:57 -05:00
parent 802dca4cee
commit 92a42cd4e3

README.md

# Discord Voice Translator V2
A comprehensive microservices-based Discord bot that captures voice channel audio, transcribes it, translates it, and posts formatted messages to text channels. It is built with GPU acceleration and multi-language support.
## 🎯 Features
- **Real-time Voice Capture**: Records Discord voice channel audio
- **GPU-Accelerated Transcription**: Uses faster-whisper on an NVIDIA RTX 4000
- **Multi-Language Translation**: Local NLLB models with a Google Translate fallback
- **Smart Language Logic**: Every message covers English, German, and Korean, whatever language was spoken
- **Professional Discord Messages**: Formatted with speaker info, duration, and timestamps
- **Microservices Architecture**: Scalable, maintainable, containerized services
- **Comprehensive Monitoring**: Health checks, logging, and a web dashboard
## 🏗️ Architecture
### Services Overview
```
Discord Voice → [Recorder] → Raw PCM → [Audio Processor] → WAV
[Whisper Service] → Transcription → [Translator] → Multi-language
[Transcriber] → Discord Message → [Dashboard] → Monitoring
```
### Core Services
1. **Recorder**: Discord bot that captures voice channel audio
2. **Audio Processor**: Converts PCM to optimized WAV files
3. **Whisper Service**: GPU-accelerated speech-to-text transcription
4. **Translator**: Multi-language translation using local NLLB models and Google Translate
5. **Transcriber**: Workflow orchestrator and Discord message formatter
6. **Dashboard**: Web interface for monitoring and management
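
The Audio Processor's PCM-to-WAV step can be sketched with the Python standard library. The sketch assumes the Recorder writes raw signed 16-bit little-endian stereo PCM at 48 kHz, which is the rate Discord voice audio decodes to. It also assumes the target is the 16 kHz mono input that Whisper uses. `pcm_to_wav` and the file paths are hypothetical. The every-third-sample decimation is a deliberately naive stand-in for a real resampler such as FFmpeg, which low-pass filters first to avoid aliasing.

```python
import array
import wave

SRC_RATE = 48_000  # Discord voice audio is decoded at 48 kHz
DST_RATE = 16_000  # Whisper works on 16 kHz mono input
CHANNELS = 2       # assumed: interleaved stereo, signed 16-bit little-endian


def pcm_to_wav(pcm_path: str, wav_path: str) -> int:
    """Convert raw 48 kHz stereo s16le PCM into a 16 kHz mono WAV file.

    Hypothetical helper, not the project's Audio Processor code. Assumes a
    little-endian host such as x86-64. Returns the number of samples written.
    """
    with open(pcm_path, "rb") as f:
        data = f.read()
    data = data[: len(data) - len(data) % 2]  # drop a trailing partial sample

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(data)

    # Downmix: average each left/right pair into a single mono sample.
    mono = [
        (samples[i] + samples[i + 1]) // 2
        for i in range(0, len(samples) - 1, CHANNELS)
    ]

    # Naive 48 kHz -> 16 kHz decimation: keep every third sample.
    # A production resampler low-pass filters first to avoid aliasing.
    out = array.array("h", mono[:: SRC_RATE // DST_RATE])

    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(DST_RATE)
        w.writeframes(out.tobytes())
    return len(out)
```

The return value is the number of 16 kHz samples written, so dividing it by 16,000 gives the clip length in seconds.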
### Infrastructure
- **PostgreSQL**: Primary database for all structured data
- **Redis**: Message queue and caching layer
- **Nginx**: Reverse proxy for web services
- **Docker Compose**: Container orchestration with GPU support
## 🚀 Quick Start
### Prerequisites
- Docker and Docker Compose
- NVIDIA GPU with Docker GPU runtime support (NVIDIA Container Toolkit)
- Discord bot token
- Google Translate API key (optional, for premium translations)
### 1. Clone Repository
```bash
git clone <your-repo-url>
cd discord-voice-translator-v2
```
### 2. Configure Environment
```bash
cp .env.example .env
# Edit .env with your configuration
```
### 3. Required Environment Variables
```bash
# Discord Bot Configuration
DISCORD_TOKEN=your_discord_bot_token_here
CLIENT_ID=your_bot_client_id_here
GUILD_ID=your_guild_id_here

# Database Configuration (the password in POSTGRES_URL should match POSTGRES_PASSWORD)
POSTGRES_PASSWORD=your_secure_postgres_password
POSTGRES_URL=postgresql://postgres:your_secure_postgres_password@postgres:5432/voice_translator

# Translation APIs (optional)
GOOGLE_TRANSLATE_API_KEY=your_google_translate_api_key
```
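
A missing variable is easier to diagnose before `docker-compose up` than from a crashed container. The following is a minimal Python sketch of such a pre-flight check. It uses the variable names from the example above and treats the translation API key as optional. It runs against the process environment, for example after loading `.env` into your shell. `check_env` is hypothetical and not part of the project. The password-match rule assumes the `postgres` user in `POSTGRES_URL` authenticates with `POSTGRES_PASSWORD`.

```python
import os
from urllib.parse import urlparse

# Names taken from the example .env above; the translation API key is optional.
REQUIRED = ["DISCORD_TOKEN", "CLIENT_ID", "GUILD_ID", "POSTGRES_PASSWORD", "POSTGRES_URL"]


def check_env(env: dict[str, str]) -> list[str]:
    """Return a list of configuration problems (an empty list means OK)."""
    problems = [f"missing required variable {name}" for name in REQUIRED if not env.get(name)]

    url = env.get("POSTGRES_URL")
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ("postgresql", "postgres"):
            problems.append("POSTGRES_URL should start with postgresql://")
        # Assumed convention: the password in the URL matches POSTGRES_PASSWORD.
        # Compared as written; a percent-encoded password would need unquoting.
        if env.get("POSTGRES_PASSWORD") and parsed.password != env["POSTGRES_PASSWORD"]:
            problems.append("POSTGRES_URL password does not match POSTGRES_PASSWORD")
    return problems


if __name__ == "__main__":
    for problem in check_env(dict(os.environ)):
        print("config error:", problem)
```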
### 4. Deploy with Docker Compose
```bash
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f
# Start with admin interfaces
docker-compose --profile admin up -d
```
### 5. Discord Bot Setup
1. Invite the bot to your server with these permissions:
   - Connect to voice channels
   - Speak in voice channels
   - Send messages
   - View channels
2. Use the slash commands:
   - `/join`: Start recording in your voice channel
   - `/leave`: Stop recording and leave the channel
   - `/status`: Check the current recording status
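
The four permissions above can be packed into the integer an OAuth2 invite URL expects. This sketch uses the permission bit positions from Discord's API documentation. It requests the `applications.commands` scope alongside `bot`, because that scope lets the application register slash commands such as `/join`. `invite_url` is a hypothetical helper, and `client_id` is the `CLIENT_ID` value from `.env`.

```python
from urllib.parse import quote, urlencode

# Discord permission flags (bit positions from the Discord API documentation).
VIEW_CHANNEL = 1 << 10
SEND_MESSAGES = 1 << 11
CONNECT = 1 << 20
SPEAK = 1 << 21

BOT_PERMISSIONS = VIEW_CHANNEL | SEND_MESSAGES | CONNECT | SPEAK


def invite_url(client_id: str, permissions: int = BOT_PERMISSIONS) -> str:
    """Build an OAuth2 URL that adds the bot with the given permissions."""
    query = urlencode(
        {
            "client_id": client_id,
            "permissions": str(permissions),
            "scope": "bot applications.commands",  # bot user + slash commands
        },
        quote_via=quote,  # encode the space as %20 rather than +
    )
    return f"https://discord.com/oauth2/authorize?{query}"


print(BOT_PERMISSIONS)  # 3148800
print(invite_url("123456789012345678"))
```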
## 💬 Discord Message Format
### English Source Example
```
🎤 New Transcription
Speaker: mcgyvver
🌍 Language Detected: English 🇬🇧
⏱️ Duration: 17.0 seconds
🕐 Time: 12 minutes ago
📝 Transcript
Yeah, but this is a complete prank, we'll...
Let's go to the foot.
🇩🇪 German
Ja, aber das ist ein kompletter Streich, wir werden...
Lass uns zum Fuß gehen.
🇰🇷 Korean
예, 하지만 이것은 완전한 장난입니다...
발로 가자.
```
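
The following is a rough sketch of how a message in the layout above could be assembled as plain text. The function name, the flag and name tables, and the input shapes are hypothetical, and the real Transcriber may use Discord embeds instead. The relative "12 minutes ago" line is rendered here with Discord's `<t:UNIX:R>` timestamp markup, which each client displays as a relative time. That choice is an assumption about how the bot does it.

```python
FLAGS = {"en": "🇬🇧", "de": "🇩🇪", "ko": "🇰🇷"}
NAMES = {"en": "English", "de": "German", "ko": "Korean"}


def format_message(speaker: str, lang: str, duration_s: float, unix_ts: int,
                   transcript: str, translations: dict[str, str]) -> str:
    """Render a transcription message in the layout of the example above.

    `translations` maps language codes to translated text, in display order.
    """
    def label(code: str) -> str:
        # Unknown codes fall back to the bare code with no flag.
        return f"{NAMES.get(code, code)} {FLAGS.get(code, '')}".rstrip()

    lines = [
        "🎤 New Transcription",
        f"Speaker: {speaker}",
        f"🌍 Language Detected: {label(lang)}",
        f"⏱️ Duration: {duration_s:.1f} seconds",
        # Discord renders <t:UNIX:R> as a relative time such as "12 minutes ago".
        f"🕐 Time: <t:{unix_ts}:R>",
        "📝 Transcript",
        transcript,
    ]
    for code, text in translations.items():
        lines.append(f"{FLAGS.get(code, '')} {NAMES.get(code, code)}".strip())
        lines.append(text)
    return "\n".join(lines)
```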
### Smart Language Logic
The transcript is always shown in the detected language. Translations are then added so that English, German, and Korean are all covered:
- **Detected English**: transcript + German + Korean
- **Detected German**: transcript + English + Korean
- **Detected Korean**: transcript + English + German
- **Any other language**: transcript + English + German + Korean
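
All four rules reduce to one filter: translate into every primary language except the one that was detected. This is a minimal sketch. `PRIMARY_LANGUAGES` echoes the setting named under Adding New Languages, but its shape here is an assumption: a list of the two-letter codes Whisper reports.

```python
PRIMARY_LANGUAGES = ["en", "de", "ko"]  # assumed shape: Whisper-style language codes


def translation_targets(detected: str) -> list[str]:
    """Languages to translate into, given the language Whisper detected."""
    # The transcript already covers the detected language, so skip it;
    # for any non-primary language this returns all three primary languages.
    return [code for code in PRIMARY_LANGUAGES if code != detected]


print(translation_targets("de"))  # ['en', 'ko']
print(translation_targets("fr"))  # ['en', 'de', 'ko']
```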
## 🔧 Service Configuration
### GPU Requirements
- NVIDIA GPU with 4GB+ VRAM (RTX 4000 recommended)
- CUDA 11.8+ support
- nvidia-docker runtime configured
### Model Storage
- Whisper models cached in `./data/models/`
- NLLB translation models auto-downloaded
- Models persist between container restarts
### Performance Tuning
**Whisper Models**:
- `large-v2`: Best accuracy, ~2GB VRAM, slower
- `medium`: Good balance, ~1GB VRAM, faster
- `small`: Fastest, ~500MB VRAM, lower accuracy
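
The trade-off above can be turned into a simple rule: pick the most accurate model whose approximate footprint fits in free VRAM. The sketch takes its footprints from the list above. `pick_whisper_model` and the 512 MB headroom reserved for the CUDA context and NLLB are hypothetical. How free VRAM is measured is left to the caller.

```python
# Approximate VRAM per model in MB, from the list above (~2GB, ~1GB, ~500MB).
MODEL_VRAM_MB = {"large-v2": 2048, "medium": 1024, "small": 512}


def pick_whisper_model(free_vram_mb: int, headroom_mb: int = 512) -> str:
    """Return the most accurate model that fits, keeping some VRAM in reserve.

    `headroom_mb` is a hypothetical margin for the CUDA context and NLLB.
    Falls back to "small" when nothing fits; a caller might prefer CPU then.
    """
    for name in ("large-v2", "medium", "small"):  # most to least accurate
        if MODEL_VRAM_MB[name] + headroom_mb <= free_vram_mb:
            return name
    return "small"
```

The chosen name can then be passed to faster-whisper, for example `WhisperModel(name, device="cuda", compute_type="float16")`.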

**Translation Strategy**:
- Local NLLB: Fast, private, good quality
- Google Translate: Premium quality, but incurs API costs
- Automatic fallback for reliability
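
The fallback can be expressed as "try local, then remote". The sketch assumes local NLLB runs first and Google Translate is the fallback, which matches the feature list's "Local NLLB models with a Google Translate fallback". Both backends are injected as plain callables, and the function name and backend labels are hypothetical. The real service's NLLB and Google clients are not shown.

```python
from typing import Callable, Optional, Tuple

# (text, source_lang, target_lang) -> translated text
Translator = Callable[[str, str, str], str]


def translate_with_fallback(text: str, src: str, tgt: str,
                            local: Translator,
                            remote: Optional[Translator] = None) -> Tuple[str, str]:
    """Try the local backend first, then the remote one.

    Returns (translated_text, backend_name).
    """
    try:
        return local(text, src, tgt), "nllb"
    except Exception as exc:  # e.g. model still downloading, CUDA out of memory
        if remote is None:  # e.g. no GOOGLE_TRANSLATE_API_KEY configured
            raise RuntimeError(f"local translation failed with no fallback: {exc}") from exc
        return remote(text, src, tgt), "google"
```

Returning the backend name makes it easy to record which path served each translation, for example in the dashboard's translation statistics.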
## 📊 Monitoring
### Health Checks
All services include comprehensive health checks:
```bash
# Check service status
docker-compose ps
# Individual service health
curl http://localhost:8001/health # Whisper Service
curl http://localhost:8002/health # Translation Service
```
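
For scripted monitoring, such as a cron job, the two `curl` checks above can be wrapped in a small standard-library poller. It treats any HTTP 200 as healthy and makes no assumption about the response body, since the body format is not documented here. The endpoint names and the 3-second timeout are hypothetical choices.

```python
import urllib.request

# Health endpoints from the commands above.
ENDPOINTS = {
    "whisper-service": "http://localhost:8001/health",
    "translator": "http://localhost:8002/health",
}


def check_health(url: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError/HTTPError, refused connections and timeouts all subclass OSError
        return False


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        print(f"{name:<16} {'OK' if check_health(url) else 'DOWN'}")
```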
### Web Dashboard
Access the monitoring dashboard at `http://localhost:3000`:
- Real-time transcription activity
- Translation statistics
- Service performance metrics
- User activity summaries
### Admin Interfaces
- **PostgreSQL Admin** (pgAdmin): `http://localhost:8080`
- **Redis Commander**: `http://localhost:8081`

Start them with `docker-compose --profile admin up -d`.
## 🛠️ Development
### Local Development
```bash
# Start infrastructure only
docker-compose up -d postgres redis
# Run individual services locally, each in its own terminal, from the repository root
(cd services/recorder && npm install && npm run dev)
(cd services/whisper-service && python src/api.py)  # install its Python dependencies first
```
### Service Dependencies
```
Recorder → PostgreSQL, Redis
Audio Processor → Redis, PostgreSQL
Whisper Service → Redis, PostgreSQL, GPU
Translator → Redis, PostgreSQL
Transcriber → Redis, PostgreSQL, Discord
Dashboard → PostgreSQL, Redis
```
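
Reading the dependency list in reverse answers the troubleshooting question "what stops working if X goes down?". The sketch copies the list above verbatim, so it reflects only the dependencies listed there. `impacted_by` is a hypothetical helper.

```python
from collections import defaultdict

# Copied from the dependency list above.
DEPENDENCIES = {
    "Recorder": ["PostgreSQL", "Redis"],
    "Audio Processor": ["Redis", "PostgreSQL"],
    "Whisper Service": ["Redis", "PostgreSQL", "GPU"],
    "Translator": ["Redis", "PostgreSQL"],
    "Transcriber": ["Redis", "PostgreSQL", "Discord"],
    "Dashboard": ["PostgreSQL", "Redis"],
}


def impacted_by(dependency: str) -> list[str]:
    """Services that list `dependency` directly, i.e. what breaks if it is down."""
    reverse: dict[str, list[str]] = defaultdict(list)
    for service, deps in DEPENDENCIES.items():
        for dep in deps:
            reverse[dep].append(service)
    return sorted(reverse[dependency])


print(impacted_by("GPU"))  # ['Whisper Service']
```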
### Adding New Languages
1. Add the language code to `NLLB_LANG_MAP` in the translator service
2. Update `LANGUAGE_INFO` with the display name and flag
3. Add the language to `PRIMARY_LANGUAGES` if it should appear in every message by default
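
The three steps might look like this when adding French. The shapes of `NLLB_LANG_MAP`, `LANGUAGE_INFO`, and `PRIMARY_LANGUAGES` are assumptions, because the translator service's actual definitions are not shown here. The NLLB codes themselves (`eng_Latn`, `deu_Latn`, `kor_Hang`, `fra_Latn`) are the FLORES-200 codes that NLLB-200 models use. `nllb_code` is a hypothetical lookup helper.

```python
# Hypothetical shapes for the three settings named in the steps above.
NLLB_LANG_MAP = {      # Whisper code -> NLLB (FLORES-200) code
    "en": "eng_Latn",
    "de": "deu_Latn",
    "ko": "kor_Hang",
}
LANGUAGE_INFO = {      # display name and flag used in Discord messages
    "en": {"name": "English", "flag": "🇬🇧"},
    "de": {"name": "German", "flag": "🇩🇪"},
    "ko": {"name": "Korean", "flag": "🇰🇷"},
}
PRIMARY_LANGUAGES = ["en", "de", "ko"]

# Step 1: map the new language to its NLLB code.
NLLB_LANG_MAP["fr"] = "fra_Latn"
# Step 2: give it a display name and flag.
LANGUAGE_INFO["fr"] = {"name": "French", "flag": "🇫🇷"}
# Step 3 (optional): include it in every message by default.
PRIMARY_LANGUAGES.append("fr")


def nllb_code(whisper_code: str) -> str:
    """Look up the NLLB code, failing loudly for unmapped languages."""
    try:
        return NLLB_LANG_MAP[whisper_code]
    except KeyError:
        raise ValueError(f"no NLLB mapping for {whisper_code!r}; add it to NLLB_LANG_MAP") from None
```

Step 3 is a real cost decision. Every primary language adds one more translation, and one more section, to each message.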
## 📦 Database Schema
### Key Tables
- **connections**: Voice channel sessions
- **recordings**: Individual user audio recordings
- **transcriptions**: Speech-to-text results
- **translations**: Multi-language translations
- **processing_metrics**: Performance tracking
- **user_activity**: Aggregated user statistics
### Data Flow
```
Connection → Recordings → Transcriptions → Translations
```
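
The data flow above is a chain of one-to-many foreign keys. The sketch illustrates that chain with an in-memory SQLite database. Only the four table names and their order come from this README. Every column is hypothetical, and the real PostgreSQL schema will differ, for example in its identity columns and types.

```python
import sqlite3

# Hypothetical columns; only the table names and their order come from the schema above.
SCHEMA = """
CREATE TABLE connections    (id INTEGER PRIMARY KEY, guild_id TEXT, channel_id TEXT);
CREATE TABLE recordings     (id INTEGER PRIMARY KEY,
                             connection_id INTEGER NOT NULL REFERENCES connections(id),
                             user_id TEXT, duration_s REAL);
CREATE TABLE transcriptions (id INTEGER PRIMARY KEY,
                             recording_id INTEGER NOT NULL REFERENCES recordings(id),
                             language TEXT, text TEXT);
CREATE TABLE translations   (id INTEGER PRIMARY KEY,
                             transcription_id INTEGER NOT NULL REFERENCES transcriptions(id),
                             target_language TEXT, text TEXT);
"""

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
db.executescript(SCHEMA)

# One pass through the data flow: Connection -> Recording -> Transcription -> Translation.
conn_id = db.execute(
    "INSERT INTO connections (guild_id, channel_id) VALUES ('g1', 'c1')").lastrowid
rec_id = db.execute(
    "INSERT INTO recordings (connection_id, user_id, duration_s) VALUES (?, 'u1', 17.0)",
    (conn_id,)).lastrowid
tr_id = db.execute(
    "INSERT INTO transcriptions (recording_id, language, text) VALUES (?, 'en', 'Hello')",
    (rec_id,)).lastrowid
db.execute(
    "INSERT INTO translations (transcription_id, target_language, text) VALUES (?, 'de', 'Hallo')",
    (tr_id,))

# Walk the chain back from a translation to its voice session.
row = db.execute("""
    SELECT c.channel_id, t.text, x.target_language, x.text
    FROM translations x
    JOIN transcriptions t ON t.id = x.transcription_id
    JOIN recordings r     ON r.id = t.recording_id
    JOIN connections c    ON c.id = r.connection_id
""").fetchone()
print(row)  # ('c1', 'Hello', 'de', 'Hallo')
```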
## 🔐 Security
- No audio data stored permanently (auto-cleanup)
- Local AI models for privacy
- Secure credential management
- Non-root container users
- Network isolation between services
## 🚨 Troubleshooting
### Common Issues
**GPU Not Detected**:
```bash
# Check that containers can see the GPU through the NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```
**Discord Bot Not Responding**:
- Verify the bot token and permissions
- Check that the guild ID is correct
- Ensure the bot has joined your voice channel (`/join`)

**Translation Failures**:
- Check the Google API key if you use premium translations
- Verify network connectivity
- The local NLLB model may still be downloading

**Database Connection Issues**:
- Ensure PostgreSQL is running
- Check connection string format
- Verify network connectivity between containers
### Service Logs
```bash
# View specific service logs
docker-compose logs recorder
docker-compose logs whisper-service
docker-compose logs translator
# Follow logs in real-time
docker-compose logs -f transcriber
```
## 📈 Performance
### Expected Performance (RTX 4000)
- **Transcription**: 2-3x real-time (faster than speech)
- **Translation**: Sub-second for typical sentences
- **End-to-End Latency**: 3-8 seconds from speech to Discord message
- **Concurrent Users**: 5-10 simultaneous speakers
- **Languages**: 200+ supported via NLLB
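
A real-time factor converts directly into processing time: transcription takes the audio duration divided by the factor. This is a small worked example using the 17-second clip from the message example and the 2-3x range quoted above. `transcription_seconds` is a hypothetical helper.

```python
def transcription_seconds(audio_s: float, realtime_factor: float) -> float:
    """Processing time for a clip transcribed at `realtime_factor` times real time."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_s / realtime_factor


# The 17-second clip from the message example, at the quoted 2-3x range.
for factor in (2.0, 3.0):
    print(f"{factor:.0f}x: {transcription_seconds(17.0, factor):.1f} s")
# 2x: 8.5 s
# 3x: 5.7 s
```

At 2x, transcribing the 17-second clip alone takes about 8.5 s. This suggests the 3-8 second end-to-end figure applies mainly to shorter utterances.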
### Resource Usage
- **GPU Memory**: 3-4GB for Whisper + NLLB
- **System RAM**: 8GB recommended
- **Storage**: 10GB for models + database
- **Network**: Minimal (local processing)
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Update documentation
5. Submit a pull request
## 📄 License
MIT License. See the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
- **OpenAI Whisper**: Speech recognition models
- **Facebook NLLB**: Neural machine translation
- **Discord.js**: Discord API library
- **faster-whisper**: Optimized Whisper inference
- **Docker Community**: Containerization platform

---
**Need help?** Check the troubleshooting section or open an issue!