Update README with comprehensive documentation

2025-07-14 10:35:57 -05:00
parent 802dca4cee
commit 92a42cd4e3
1 changed file with 240 additions and 165 deletions
--- a/README.md
+++ b/README.md
@@ -1,252 +1,327 @@
-# Discord Voice Translator V2 🎤🌍
+# Discord Voice Translator V2

-A comprehensive microservices-based Discord bot that captures voice channel audio, transcribes it using GPU-accelerated Whisper, and provides real-time multi-language translation.
+A microservices-based Discord bot that captures voice channel audio, transcribes and translates it, and posts formatted messages to text channels. Built with GPU acceleration and multi-language support.

-## 🚀 Features
+## 🎯 Features

- **Real-time Voice Capture**: Records individual user audio streams from Discord voice channels
- **GPU-Accelerated Transcription**: Uses faster-whisper with NVIDIA RTX 4000 for lightning-fast speech-to-text
- **Multi-language Translation**: Supports translation to English, German, Korean, and more
- **Microservices Architecture**: Scalable, maintainable design with independent services
- **Web Dashboard**: Real-time monitoring and historical analytics
- **High Performance**: Optimized for low-latency processing
+- **Real-time Voice Capture**: Records Discord voice channel audio
+- **GPU-Accelerated Transcription**: Uses NVIDIA RTX 4000 with faster-whisper
+- **Multi-Language Translation**: Local NLLB models + Google Translate fallback
+- **Smart Language Logic**: Always shows the text in English, German, and Korean (the detected source language appears as the transcript)
+- **Professional Discord Messages**: Formatted with speaker info, duration, and timestamps
+- **Microservices Architecture**: Scalable, maintainable, containerized services
+- **Comprehensive Monitoring**: Health checks, logging, and web dashboard

 ## 🏗️ Architecture

+### Services Overview
+
 ```
-Discord Voice → [Recorder] → Raw PCM → [Audio Processor] → Optimized WAV
+Discord Voice → [Recorder] → Raw PCM → [Audio Processor] → WAV
    ↓
 [Whisper Service] → Transcription → [Translator] → Multi-language
    ↓
-[Transcriber] → Discord Messages + Database → [Dashboard]
+[Transcriber] → Discord Message → [Dashboard] → Monitoring
 ```

-### Services Overview
+### Core Services

-| Service | Purpose | Technology |
-|---------|---------|------------|
-| **Recorder** | Discord bot + voice capture | Node.js, Discord.js |
-| **Audio Processor** | PCM to WAV conversion | Python, FFmpeg |
-| **Whisper Service** | GPU-accelerated transcription | Python, faster-whisper, CUDA |
-| **Translator** | Multi-language translation | Python, Google/DeepL APIs |
-| **Transcriber** | Workflow orchestration | Node.js, Message queues |
-| **Dashboard** | Web monitoring interface | Node.js, Express, WebSockets |
+1. **Recorder**: Discord bot that captures voice channel audio
+2. **Audio Processor**: Converts PCM to optimized WAV files
+3. **Whisper Service**: GPU-accelerated speech-to-text transcription
+4. **Translator**: Local NLLB + Google Translate multi-language translation
+5. **Transcriber**: Workflow orchestrator and Discord message formatter
+6. **Dashboard**: Web interface for monitoring and management

-## 🛠️ Prerequisites
+### Infrastructure

-### Hardware Requirements
- **GPU**: NVIDIA RTX 4000 (or compatible CUDA GPU)
- **RAM**: 8GB+ recommended
- **Storage**: 50GB+ for models and audio cache
+- **PostgreSQL**: Primary database for all structured data
+- **Redis**: Message queue and caching layer
+- **Nginx**: Reverse proxy for web services
+- **Docker Compose**: Container orchestration with GPU support

-### Software Requirements
- **Docker** with Docker Compose
- **NVIDIA Container Toolkit** for GPU support
- **Unraid** (recommended) or Linux host
+## 🚀 Quick Start

-## ⚡ Quick Start
+### Prerequisites
+
+- Docker and Docker Compose
+- NVIDIA GPU with Docker runtime support
+- Discord Bot Token
+- Google Translate API Key (optional, for premium translations)
+
+### 1. Clone Repository

-### 1. Clone the Repository
 ```bash
-git clone <your-gitea-url>/discord-voice-translator-v2.git
+git clone <your-repo-url>
 cd discord-voice-translator-v2
 ```

 ### 2. Configure Environment
+
 ```bash
 cp .env.example .env
-# Edit .env with your API keys and settings
+# Edit .env with your configuration
 ```

-Required environment variables:
+### 3. Required Environment Variables
+
 ```bash
-# Discord Bot
-DISCORD_TOKEN=your_discord_bot_token
-CLIENT_ID=your_bot_client_id
-GUILD_ID=your_guild_id
+# Discord Bot Configuration
+DISCORD_TOKEN=your_discord_bot_token_here
+CLIENT_ID=your_bot_client_id_here
+GUILD_ID=your_guild_id_here

-# Database
-POSTGRES_PASSWORD=your_secure_password
+# Database Configuration
+POSTGRES_PASSWORD=your_secure_postgres_password
+POSTGRES_URL=postgresql://postgres:your_secure_postgres_password@postgres:5432/voice_translator

-# Translation APIs
-GOOGLE_TRANSLATE_API_KEY=your_google_api_key
-DEEPL_API_KEY=your_deepl_api_key
+# Translation APIs (optional)
+GOOGLE_TRANSLATE_API_KEY=your_google_translate_api_key
+
+# GPU Configuration
+WHISPER_MODEL=large-v2
 ```

-### 3. Start Services
+### 4. Deploy with Docker Compose
+
 ```bash
-# Start core services
+# Start all services
 docker-compose up -d

+# View logs
+docker-compose logs -f
+
 # Start with admin interfaces
 docker-compose --profile admin up -d
 ```

-### 4. Verify GPU Support
-```bash
-# Check Whisper service GPU status
-curl http://localhost:8001/health
+### 5. Discord Bot Setup
+
+1. Invite bot to your server with permissions:
+   - Connect to voice channels
+   - Speak in voice channels
+   - Send messages
+   - View channels
+
+2. Use slash commands:
+   - `/join` - Start recording in your voice channel
+   - `/leave` - Stop recording and leave channel
+   - `/status` - Check current recording status
+
+## 💬 Discord Message Format
+
+### English Source Example
+```
+🎤 New Transcription
+Speaker: mcgyvver
+🌍 Language Detected: English 🇬🇧
+⏱️ Duration: 17.0 seconds
+🕐 Time: 12 minutes ago
+
+📝 Transcript
+Yeah, but this is a complete prank, we'll...
+Let's go to the foot.
+
+🇩🇪 German
+Ja, aber das ist ein kompletter Streich, wir werden...
+Lass uns zum Fuß gehen.
+
+🇰🇷 Korean
+예, 하지만 이것은 완전한 장난입니다...
+발로 가자.
 ```

-## 📊 Service Endpoints
+### Smart Language Logic

-| Service | Port | Health Check | Purpose |
-|---------|------|--------------|---------|
-| Whisper Service | 8001 | `/health` | GPU transcription |
-| Translator | 8002 | `/health` | Multi-language translation |
-| Dashboard | 3000 | `/` | Web monitoring |
-| PostgreSQL | 5432 | - | Database |
-| Redis | 6379 | - | Message queue |
-| pgAdmin | 8080 | `/` | Database admin |
-| Redis Commander | 8081 | `/` | Redis admin |
+- **If detected = English**: Shows transcript + German + Korean
+- **If detected = German**: Shows transcript + English + Korean  
+- **If detected = Korean**: Shows transcript + English + German
+- **If detected = Other**: Shows transcript + English + German + Korean
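
The four rules above reduce to "translate into every primary language except the detected one". A minimal Python sketch of that rule follows; the `targets_for` helper and the use of two-letter language codes are assumptions for illustration, not the project's actual code.

```python
# Hypothetical sketch of the display rule; names and codes are assumptions.
PRIMARY_LANGUAGES = ("en", "de", "ko")  # English, German, Korean


def targets_for(detected: str) -> list[str]:
    """Return the languages to translate into for a detected source language."""
    # The source language is already shown as the transcript, so skip it.
    return [lang for lang in PRIMARY_LANGUAGES if lang != detected]


print(targets_for("en"))  # ['de', 'ko']
print(targets_for("fr"))  # ['en', 'de', 'ko']
```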

-## 🎮 Discord Commands
+## 🔧 Service Configuration

- `/join` - Join your voice channel and start recording
- `/leave` - Leave voice channel and stop recording  
- `/status` - Show current recording status
+### GPU Requirements

-## 📈 Performance Optimizations
+- NVIDIA GPU with 4GB+ VRAM (RTX 4000 recommended)
+- CUDA 11.8+ support
+- nvidia-docker runtime configured

-### GPU Configuration
- **Model**: faster-whisper large-v2 with CUDA
- **Precision**: float16 for optimal RTX 4000 performance
- **Memory**: ~2GB VRAM usage for transcription
- **Speed**: 2-3x real-time transcription
+### Model Storage

-### Audio Processing
- **Input**: Discord 48kHz stereo PCM
- **Output**: 16kHz mono WAV (optimized for Whisper)
- **Conversion**: FFmpeg with fallback to Python/scipy
- **Cleanup**: Automatic removal of processed PCM files
+- Whisper models cached in `./data/models/`
+- NLLB translation models auto-downloaded
+- Models persist between container restarts

-## 🗄️ Database Schema
+### Performance Tuning

- **connections**: Voice channel session tracking
- **recordings**: Individual user audio recordings
- **transcriptions**: Speech-to-text results with confidence scores
- **translations**: Multi-language translation results
- **processing_metrics**: Performance monitoring
- **user_activity**: Aggregated usage statistics
+**Whisper Models**:
+- `large-v2`: Best accuracy, ~2GB VRAM, slower
+- `medium`: Good balance, ~1GB VRAM, faster
+- `small`: Fastest, ~500MB VRAM, lower accuracy

-## 🔧 Development
+**Translation Strategy**:
+- Local NLLB: Fast, private, good quality
+- Google Translate: Premium quality, API costs
+- Automatic fallback for reliability
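
One way to picture the automatic fallback is an ordered list of backends that are tried until one succeeds. The Python sketch below assumes that pattern; `translate_with_fallback` and both stand-in backends are hypothetical and make no real NLLB or Google Translate calls.

```python
from typing import Callable, Sequence

# A translator takes (text, target_language) and returns the translated text.
Translator = Callable[[str, str], str]


def translate_with_fallback(text: str, target: str, backends: Sequence[Translator]) -> str:
    """Try each backend in order and return the first successful result."""
    errors = []
    for backend in backends:
        try:
            return backend(text, target)
        except Exception as exc:  # a real service would catch narrower errors
            errors.append(exc)
    raise RuntimeError(f"all {len(backends)} translation backends failed: {errors}")


# Stand-in backends for illustration only.
def local_nllb(text: str, target: str) -> str:
    raise TimeoutError("model still downloading")


def google_translate(text: str, target: str) -> str:
    return f"[{target}] {text}"


print(translate_with_fallback("Hello", "de", [local_nllb, google_translate]))
```

Which backend goes first is a policy choice: local-first keeps audio text private and avoids API costs, while the reverse order favors quality.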
+
+## 📊 Monitoring
+
+### Health Checks
+
+All services include comprehensive health checks:
+
+```bash
+# Check service status
+docker-compose ps
+
+# Individual service health
+curl http://localhost:8001/health  # Whisper Service
+curl http://localhost:8002/health  # Translation Service
+```
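
The same checks can be scripted. The Python sketch below polls the two endpoints from the curl commands above using only the standard library; it assumes a healthy service answers with HTTP 200, and the `is_healthy` and `report` helpers are hypothetical.

```python
import http.client
import urllib.request

# Endpoints taken from the curl commands above.
SERVICES = {
    "whisper-service": "http://localhost:8001/health",
    "translator": "http://localhost:8002/health",
}


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # OSError covers URLError, HTTPError, refused connections and timeouts.
        return False


def report(services: dict[str, str], probe=is_healthy) -> dict[str, bool]:
    """Probe every service and map its name to a healthy/unhealthy flag."""
    return {name: probe(url) for name, url in services.items()}


if __name__ == "__main__":
    for name, ok in report(SERVICES).items():
        print(f"{name}: {'healthy' if ok else 'unreachable'}")
```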
+
+### Web Dashboard
+
+Access monitoring dashboard at: `http://localhost:3000`
+
+- Real-time transcription activity
+- Translation statistics
+- Service performance metrics
+- User activity summaries
+
+### Admin Interfaces
+
+Available when the stack is started with `--profile admin`:
+
+- **PostgreSQL Admin** (pgAdmin): `http://localhost:8080`
+- **Redis Commander**: `http://localhost:8081`
+
+## 🛠️ Development

 ### Local Development
-```bash
-# Individual service development
-cd services/recorder
-npm install
-npm run dev

-# Audio processor
-cd services/audio-processor  
-pip install -r requirements.txt
-python src/processor.py
+```bash
+# Start infrastructure only
+docker-compose up -d postgres redis
+
+# Run individual services locally (each in its own terminal, from the repo root)
+(cd services/recorder && npm run dev)
+(cd services/whisper-service && python src/api.py)
 ```

-### Testing GPU Support
-```bash
-# Verify NVIDIA Docker runtime
-docker run --rm --gpus all nvidia/cuda:11.8-base-ubuntu22.04 nvidia-smi
+### Service Dependencies

-# Test Whisper service
-docker-compose exec whisper-service python -c "import torch; print(torch.cuda.is_available())"
+```
+Recorder → PostgreSQL, Redis
+Audio Processor → Redis, PostgreSQL  
+Whisper Service → Redis, PostgreSQL, GPU
+Translator → Redis, PostgreSQL
+Transcriber → Redis, PostgreSQL, Discord
+Dashboard → PostgreSQL, Redis
 ```

-## 📝 Logging
+### Adding New Languages

- **Structured Logging**: JSON format for all services
- **Log Levels**: Configurable via `LOG_LEVEL` environment variable
- **Centralized**: All logs stored in `/data/logs/`
- **Monitoring**: Real-time log viewing via dashboard
+1. Add language code to `NLLB_LANG_MAP` in translator service
+2. Update `LANGUAGE_INFO` with display name and flag
+3. Modify `PRIMARY_LANGUAGES` if needed for default display
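
As an illustration of the three steps, here is what adding French might look like in Python. The shapes of `NLLB_LANG_MAP`, `LANGUAGE_INFO`, and `PRIMARY_LANGUAGES` are assumptions (the real definitions in the translator service may differ), and `missing_entries` is a hypothetical consistency check. The values on the right of `NLLB_LANG_MAP` are the FLORES-200 codes that NLLB models use.

```python
# Hypothetical shapes for the three structures named above.
NLLB_LANG_MAP = {
    "en": "eng_Latn",
    "de": "deu_Latn",
    "ko": "kor_Hang",
    "fr": "fra_Latn",  # step 1: newly added language (French)
}

LANGUAGE_INFO = {
    "en": {"name": "English", "flag": "🇬🇧"},
    "de": {"name": "German", "flag": "🇩🇪"},
    "ko": {"name": "Korean", "flag": "🇰🇷"},
    "fr": {"name": "French", "flag": "🇫🇷"},  # step 2: display name and flag
}

# Step 3: left unchanged here, so French is not part of the default display.
PRIMARY_LANGUAGES = ["en", "de", "ko"]


def missing_entries() -> list[str]:
    """List codes present in only one of NLLB_LANG_MAP and LANGUAGE_INFO."""
    return sorted(set(NLLB_LANG_MAP) ^ set(LANGUAGE_INFO))


# Fail fast if a language was added to one table but not the other.
assert not missing_entries(), f"incomplete language setup: {missing_entries()}"
assert set(PRIMARY_LANGUAGES) <= set(NLLB_LANG_MAP)
```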
+
+## 📦 Database Schema
+
+### Key Tables
+
+- **connections**: Voice channel sessions
+- **recordings**: Individual user audio recordings  
+- **transcriptions**: Speech-to-text results
+- **translations**: Multi-language translations
+- **processing_metrics**: Performance tracking
+- **user_activity**: Aggregated user statistics
+
+### Data Flow
+
+```
+Connection → Recordings → Transcriptions → Translations
+```

 ## 🔐 Security

- **Environment Variables**: Secure credential management
- **Database**: PostgreSQL with connection encryption
- **Network**: Internal Docker network isolation
- **API Keys**: Separate service credentials
+- No audio data stored permanently (auto-cleanup)
+- Local AI models for privacy
+- Secure credential management
+- Non-root container users
+- Network isolation between services

-## 🚀 Deployment
-
-### Unraid Setup
-1. Install Community Applications plugin
-2. Add Docker Compose plugin
-3. Configure GPU passthrough
-4. Import docker-compose.yml
-
-### Production Considerations
- **SSL**: Configure Nginx reverse proxy with SSL certificates
- **Backup**: Regular database and configuration backups
- **Monitoring**: Set up health check alerts
- **Scaling**: Horizontal scaling for high-traffic Discord servers
-
-## 🛠️ Troubleshooting
+## 🚨 Troubleshooting

 ### Common Issues

-**GPU Not Detected**
+**GPU Not Detected**:
 ```bash
-# Check NVIDIA drivers
-nvidia-smi
-
-# Verify Docker GPU support
+# Check NVIDIA runtime
 docker run --rm --gpus all nvidia/cuda:11.8-base nvidia-smi
 ```

-**Service Won't Start**
-```bash
-# Check logs
-docker-compose logs [service-name]
+**Discord Bot Not Responding**:
+- Verify bot token and permissions
+- Check guild ID is correct
+- Ensure bot is in voice channel

-# Verify environment variables
-docker-compose config
+**Translation Failures**:
+- Check the Google Translate API key if premium translations are enabled
+- Verify network connectivity
+- The local NLLB model may still be downloading on first start
+
+**Database Connection Issues**:
+- Ensure PostgreSQL is running
+- Check connection string format
+- Verify network connectivity between containers
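
To check the connection string format, the value of `POSTGRES_URL` can be parsed with Python's standard library. This is a rough sketch; `check_postgres_url` is a hypothetical helper and only inspects the URL's shape, it does not connect to the database.

```python
from urllib.parse import urlsplit


def check_postgres_url(url: str) -> list[str]:
    """Return a list of problems found in a PostgreSQL connection URL."""
    parts = urlsplit(url)
    problems = []
    if parts.scheme not in ("postgresql", "postgres"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if not parts.username or not parts.password:
        problems.append("missing user or password")
    if not parts.hostname:
        problems.append("missing host")
    if not parts.path.strip("/"):
        problems.append("missing database name")
    return problems


url = "postgresql://postgres:your_secure_postgres_password@postgres:5432/voice_translator"
print(check_postgres_url(url))  # []
```

Note that a password containing characters such as `@` or `/` must be percent-encoded, or the URL will be split in the wrong place.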
+
+### Service Logs
+
+```bash
+# View specific service logs
+docker-compose logs recorder
+docker-compose logs whisper-service
+docker-compose logs translator
+
+# Follow logs in real-time
+docker-compose logs -f transcriber
 ```

-**Audio Processing Slow**
- Verify GPU acceleration is working
- Check available VRAM
- Monitor CPU usage during conversion
+## 📈 Performance

-## 📚 API Documentation
+### Expected Performance (RTX 4000)

-Once running, visit:
- **Whisper API**: http://localhost:8001/docs
- **Translator API**: http://localhost:8002/docs
- **Dashboard**: http://localhost:3000
+- **Transcription**: 2-3x faster than real time (a 10-second clip takes roughly 3-5 seconds)
+- **Translation**: Sub-second for typical sentences
+- **End-to-End Latency**: 3-8 seconds from speech to Discord message
+- **Concurrent Users**: 5-10 simultaneous speakers
+- **Languages**: 200+ supported via NLLB
+
+### Resource Usage
+
+- **GPU Memory**: 3-4GB for Whisper + NLLB
+- **System RAM**: 8GB recommended
+- **Storage**: 10GB for models + database
+- **Network**: Minimal (local processing)

 ## 🤝 Contributing

 1. Fork the repository
-2. Create feature branch: `git checkout -b feature/amazing-feature`
-3. Commit changes: `git commit -m 'Add amazing feature'`
-4. Push to branch: `git push origin feature/amazing-feature`
-5. Open a Pull Request
+2. Create feature branch
+3. Add tests for new functionality  
+4. Update documentation
+5. Submit pull request

 ## 📄 License

-This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+MIT License - see LICENSE file for details

-## 🎯 Roadmap
+## 🙏 Acknowledgments

- [ ] Complete Translator service implementation
- [ ] Build Transcriber orchestration service
- [ ] Create web dashboard with real-time updates
- [ ] Add voice cloning capabilities
- [ ] Implement speaker diarization
- [ ] Support for additional languages
- [ ] Mobile app for remote monitoring
- [ ] Advanced analytics and insights
-
-## 💡 Credits
-
-Built with:
- [faster-whisper](https://github.com/guillaumekln/faster-whisper) for GPU-accelerated transcription
- [Discord.js](https://discord.js.org/) for Discord bot functionality
- [FastAPI](https://fastapi.tiangolo.com/) for high-performance APIs
- [PostgreSQL](https://www.postgresql.org/) for reliable data storage
- [Redis](https://redis.io/) for message queuing and caching
+- **OpenAI Whisper**: Speech recognition models
+- **Facebook NLLB**: Neural machine translation
+- **Discord.js**: Discord API library
+- **faster-whisper**: Optimized Whisper inference
+- **Docker Community**: Containerization platform

 ---

-**🚀 Ready to translate your Discord conversations in real-time with the power of your RTX 4000!**
+**Need help?** Check the troubleshooting section or open an issue!