# Discord Voice Translator V2 🎤🌍

A microservices-based Discord bot that captures voice channel audio, transcribes it with GPU-accelerated Whisper, and translates the transcripts into multiple languages in near real time.

## 🚀 Features

- **Real-time Voice Capture**: Records individual user audio streams from Discord voice channels
- **GPU-Accelerated Transcription**: Runs faster-whisper on an NVIDIA RTX 4000 (or another CUDA GPU) for speech-to-text at roughly 2-3x real-time speed
- **Multi-language Translation**: Supports translation to English, German, Korean, and more
- **Microservices Architecture**: Scalable, maintainable design with independent services
- **Web Dashboard**: Real-time monitoring and historical analytics
- **High Performance**: float16 GPU inference and 16kHz mono audio keep end-to-end processing latency low

## 🏗️ Architecture

```
Discord Voice → [Recorder] → Raw PCM → [Audio Processor] → Optimized WAV
    ↓
[Whisper Service] → Transcription → [Translator] → Multi-language
    ↓
[Transcriber] → Discord Messages + Database → [Dashboard]
```

### Services Overview

| Service | Purpose | Technology |
|---------|---------|------------|
| **Recorder** | Discord bot + voice capture | Node.js, Discord.js |
| **Audio Processor** | PCM to WAV conversion | Python, FFmpeg |
| **Whisper Service** | GPU-accelerated transcription | Python, faster-whisper, CUDA |
| **Translator** | Multi-language translation | Python, Google/DeepL APIs |
| **Transcriber** | Workflow orchestration | Node.js, Redis message queue |
| **Dashboard** | Web monitoring interface | Node.js, Express, WebSockets |

## 🛠️ Prerequisites

### Hardware Requirements
- **GPU**: NVIDIA RTX 4000 (or another CUDA-capable NVIDIA GPU)
- **RAM**: 8GB+ recommended
- **Storage**: 50GB+ for models and audio cache

### Software Requirements
- **Docker** with Docker Compose
- **NVIDIA Container Toolkit** for GPU support
- **Unraid** (recommended) or Linux host

## ⚡ Quick Start

### 1. Clone the Repository
```bash
git clone <your-gitea-url>/discord-voice-translator-v2.git
cd discord-voice-translator-v2
```

### 2. Configure Environment
```bash
cp .env.example .env
# Edit .env with your API keys and settings
```

Required environment variables:
```bash
# Discord Bot
DISCORD_TOKEN=your_discord_bot_token
CLIENT_ID=your_bot_client_id
GUILD_ID=your_guild_id

# Database
POSTGRES_PASSWORD=your_secure_password

# Translation APIs
GOOGLE_TRANSLATE_API_KEY=your_google_api_key
DEEPL_API_KEY=your_deepl_api_key
```
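
A missing or placeholder value in `.env` tends to surface only as a failed container start, so it can help to check the file first. The following is a minimal sketch, not part of the project: it assumes every variable listed above is mandatory and that a value still beginning with `your_` is an unfilled placeholder. `parse_env`, `find_problems`, and `check_env_file` are hypothetical helper names.

```python
from pathlib import Path

# Variable names copied from the list above; treating all of them as
# mandatory is an assumption of this sketch.
REQUIRED = (
    "DISCORD_TOKEN", "CLIENT_ID", "GUILD_ID",
    "POSTGRES_PASSWORD",
    "GOOGLE_TRANSLATE_API_KEY", "DEEPL_API_KEY",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def find_problems(values):
    """Return one message per required variable that is unset or unfilled."""
    problems = []
    for name in REQUIRED:
        value = values.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif value.startswith("your_"):
            problems.append(f"{name} still holds the placeholder value")
    return problems


def check_env_file(path=".env"):
    """Read the file at `path` and return the list of problems found."""
    return find_problems(parse_env(Path(path).read_text()))
```

Calling `check_env_file()` from the repository root returns an empty list when every variable has a real value.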

### 3. Start Services
```bash
# Start core services
docker-compose up -d

# Start with admin interfaces
docker-compose --profile admin up -d
```

### 4. Verify GPU Support
```bash
# Check Whisper service GPU status
curl http://localhost:8001/health
```

## 📊 Service Endpoints

| Service | Port | Health Check | Purpose |
|---------|------|--------------|---------|
| Whisper Service | 8001 | `/health` | GPU transcription |
| Translator | 8002 | `/health` | Multi-language translation |
| Dashboard | 3000 | `/` | Web monitoring |
| PostgreSQL | 5432 | - | Database |
| Redis | 6379 | - | Message queue |
| pgAdmin | 8080 | `/` | Database admin |
| Redis Commander | 8081 | `/` | Redis admin |
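
The HTTP endpoints in the table can be polled from a script rather than one `curl` call at a time. This is a rough sketch under stated assumptions: it treats any 2xx response as healthy and ignores the response body, since the body format of `/health` is not documented here. `SERVICES`, `is_healthy`, and `check_all` are hypothetical names.

```python
import http.client
import urllib.error
import urllib.request

# URLs built from the ports and health-check paths in the table above.
SERVICES = {
    "whisper-service": "http://localhost:8001/health",
    "translator": "http://localhost:8002/health",
    "dashboard": "http://localhost:3000/",
}


def is_healthy(url, timeout=3.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and 4xx/5xx responses,
        # which urlopen reports as HTTPError (a URLError subclass).
        return False


def check_all(services=SERVICES, **kwargs):
    """Map each service name to True (up) or False (down)."""
    return {name: is_healthy(url, **kwargs) for name, url in services.items()}
```

`check_all()` returns a dictionary such as `{"whisper-service": True, ...}`; the `opener` parameter exists only so the function can be exercised without a running stack.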

## 🎮 Discord Commands

- `/join` - Join your voice channel and start recording
- `/leave` - Leave voice channel and stop recording
- `/status` - Show current recording status

## 📈 Performance Optimizations

### GPU Configuration
- **Model**: faster-whisper large-v2 with CUDA
- **Precision**: float16 for optimal RTX 4000 performance
- **Memory**: ~2GB VRAM usage for transcription
- **Speed**: Roughly 2-3x faster than real time (for example, 60 seconds of audio transcribed in 20-30 seconds)
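
The settings above map onto the faster-whisper API roughly as follows. This is a sketch, not the Whisper service's actual code: the model name and precision come from the list above, while `beam_size=5` and the helper names `load_model`, `transcribe`, and `realtime_factor` are assumptions made for illustration.

```python
import time


def realtime_factor(audio_seconds, processing_seconds):
    """Seconds of audio handled per second of compute (2.0 means 2x real time)."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def load_model(model_size="large-v2"):
    """Load the model once and reuse it; loading is slow compared to decoding."""
    # Imported lazily so the helper above works without the package installed.
    from faster_whisper import WhisperModel

    return WhisperModel(model_size, device="cuda", compute_type="float16")


def transcribe(model, wav_path):
    """Return (text, detected language, real-time factor) for one WAV file."""
    started = time.perf_counter()
    segments, info = model.transcribe(wav_path, beam_size=5)
    # `segments` is a generator, so the decoding work happens in this join.
    text = " ".join(segment.text.strip() for segment in segments)
    elapsed = time.perf_counter() - started
    return text, info.language, realtime_factor(info.duration, elapsed)
```

Measuring the factor on your own recordings is the simplest way to confirm whether the 2-3x figure holds for your GPU and audio.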

### Audio Processing
- **Input**: Discord 48kHz stereo PCM
- **Output**: 16kHz mono WAV (optimized for Whisper)
- **Conversion**: FFmpeg with fallback to Python/scipy
- **Cleanup**: Automatic removal of processed PCM files
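
The conversion step can be sketched as below. It assumes the raw PCM is signed 16-bit little-endian, which the list above does not state. The pure-Python fallback is a standard-library stand-in for the scipy path: it averages each group of three stereo frames, which is a crude box filter rather than a proper resampler. All function names are hypothetical.

```python
import array
import subprocess
import sys
import wave
from pathlib import Path

IN_RATE, IN_CHANNELS = 48_000, 2    # Discord PCM (assumed s16le)
OUT_RATE, OUT_CHANNELS = 16_000, 1  # 16kHz mono for Whisper


def ffmpeg_command(pcm_path, wav_path):
    """Build the FFmpeg call: describe the raw input, then the wanted output."""
    return [
        "ffmpeg", "-y",
        "-f", "s16le", "-ar", str(IN_RATE), "-ac", str(IN_CHANNELS),
        "-i", str(pcm_path),
        "-ar", str(OUT_RATE), "-ac", str(OUT_CHANNELS),
        str(wav_path),
    ]


def downmix_and_decimate(pcm_bytes):
    """Average 3 stereo frames (6 samples) into one mono sample."""
    samples = array.array("h")
    usable = len(pcm_bytes) - len(pcm_bytes) % 2  # drop a trailing odd byte
    samples.frombytes(pcm_bytes[:usable])
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian
    group = IN_CHANNELS * (IN_RATE // OUT_RATE)
    out = array.array("h")
    for start in range(0, len(samples) - group + 1, group):
        out.append(sum(samples[start:start + group]) // group)
    return out


def convert(pcm_path, wav_path):
    """Try FFmpeg first; fall back to the pure-Python path if it fails."""
    try:
        subprocess.run(ffmpeg_command(pcm_path, wav_path),
                       check=True, capture_output=True)
        return "ffmpeg"
    except (FileNotFoundError, subprocess.CalledProcessError):
        mono = downmix_and_decimate(Path(pcm_path).read_bytes())
        if sys.byteorder == "big":
            mono.byteswap()  # WAV sample data is little-endian
        with wave.open(str(wav_path), "wb") as wav:
            wav.setnchannels(OUT_CHANNELS)
            wav.setsampwidth(2)
            wav.setframerate(OUT_RATE)
            wav.writeframes(mono.tobytes())
        return "fallback"
```

The fallback does no anti-alias filtering, so FFmpeg (or a real resampler such as scipy's) remains the better choice for transcription quality.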

## 🗄️ Database Schema

- **connections**: Voice channel session tracking
- **recordings**: Individual user audio recordings
- **transcriptions**: Speech-to-text results with confidence scores
- **translations**: Multi-language translation results
- **processing_metrics**: Performance monitoring
- **user_activity**: Aggregated usage statistics
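
To make the relationships between the core tables concrete, here is an illustrative sketch that uses in-memory SQLite as a stand-in for PostgreSQL. Only the table names come from the list above; every column and foreign key is hypothetical and may differ from the real schema.

```python
import sqlite3

# Hypothetical columns: only the table names are taken from the README.
SCHEMA = """
CREATE TABLE connections (
    id INTEGER PRIMARY KEY,
    channel_id TEXT NOT NULL,
    started_at TEXT NOT NULL
);
CREATE TABLE recordings (
    id INTEGER PRIMARY KEY,
    connection_id INTEGER NOT NULL REFERENCES connections(id),
    user_id TEXT NOT NULL
);
CREATE TABLE transcriptions (
    id INTEGER PRIMARY KEY,
    recording_id INTEGER NOT NULL REFERENCES recordings(id),
    text TEXT NOT NULL,
    confidence REAL
);
CREATE TABLE translations (
    id INTEGER PRIMARY KEY,
    transcription_id INTEGER NOT NULL REFERENCES transcriptions(id),
    target_language TEXT NOT NULL,
    text TEXT NOT NULL
);
"""


def create_demo_db():
    """Create an in-memory database with the illustrative tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def translations_for_user(db, user_id):
    """Join translations back to the speaker who produced the audio."""
    rows = db.execute(
        """
        SELECT t.text, tr.target_language, tr.text
        FROM translations AS tr
        JOIN transcriptions AS t ON t.id = tr.transcription_id
        JOIN recordings AS r ON r.id = t.recording_id
        WHERE r.user_id = ?
        ORDER BY tr.id
        """,
        (user_id,),
    )
    return rows.fetchall()
```

The chain connection, recording, transcription, translation mirrors the processing pipeline, so each stage's output can be traced back to the session and speaker it came from.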

## 🔧 Development

### Local Development
```bash
# Recorder (run from the repository root)
cd services/recorder
npm install
npm run dev

# Audio processor (run from the repository root, e.g. in a second terminal)
cd services/audio-processor
pip install -r requirements.txt
python src/processor.py
```

### Testing GPU Support
```bash
# Verify NVIDIA Docker runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

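# Test Whisper service (this check needs PyTorch installed in the image)
docker-compose exec whisper-service python -c "import torch; print(torch.cuda.is_available())"

# Alternative without PyTorch: ask CTranslate2, the backend faster-whisper runs on
docker-compose exec whisper-service python -c "import ctranslate2; print(ctranslate2.get_cuda_device_count())"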
```

## 📝 Logging

- **Structured Logging**: JSON format for all services
- **Log Levels**: Configurable via `LOG_LEVEL` environment variable
- **Centralized**: All logs stored in `/data/logs/`
- **Monitoring**: Real-time log viewing via dashboard
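
For the Python services, JSON logging driven by `LOG_LEVEL` could look like the sketch below. The field names (`timestamp`, `level`, `service`, `message`) and the helper `get_logger` are assumptions, not the project's actual log format, and the sketch writes to a stream instead of `/data/logs/`.

```python
import json
import logging
import os
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def get_logger(service, stream=None):
    """Return a logger whose level comes from LOG_LEVEL (default INFO)."""
    logger = logging.getLogger(service)
    level = logging.getLevelName(os.environ.get("LOG_LEVEL", "INFO").upper())
    if not isinstance(level, int):
        level = logging.INFO  # unknown level name: fall back to INFO
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(stream or sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.propagate = False
    return logger
```

A call such as `get_logger("whisper-service").info("model loaded")` then emits a single JSON line that a log viewer can parse field by field.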

## 🔐 Security

- **Environment Variables**: Secure credential management
- **Database**: PostgreSQL with connection encryption
- **Network**: Internal Docker network isolation
- **API Keys**: Separate service credentials

## 🚀 Deployment

### Unraid Setup
1. Install Community Applications plugin
2. Add Docker Compose plugin
3. Configure GPU passthrough
4. Import docker-compose.yml

### Production Considerations
- **SSL**: Configure Nginx reverse proxy with SSL certificates
- **Backup**: Regular database and configuration backups
- **Monitoring**: Set up health check alerts
- **Scaling**: Horizontal scaling for high-traffic Discord servers

## 🛠️ Troubleshooting

### Common Issues

**GPU Not Detected**
```bash
# Check NVIDIA drivers
nvidia-smi

# Verify Docker GPU support
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

**Service Won't Start**
```bash
# Check logs
docker-compose logs [service-name]

# Verify environment variables
docker-compose config
```

**Audio Processing Slow**
- Verify GPU acceleration is working: `curl http://localhost:8001/health`
- Check available VRAM with `nvidia-smi`
- Monitor CPU usage during PCM-to-WAV conversion, which runs on the CPU (FFmpeg)
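
The VRAM check can be scripted with `nvidia-smi`'s CSV query mode. A small sketch, assuming `nvidia-smi` is on the `PATH`; `parse_gpu_memory` and `gpu_memory` are hypothetical helper names.

```python
import subprocess

# With "nounits", nvidia-smi prints memory values as plain MiB numbers.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(output):
    """Turn 'name, used, total' CSV lines into a list of dictionaries."""
    gpus = []
    for line in output.strip().splitlines():
        # Split from the right so a comma inside the GPU name is harmless.
        name, used, total = (field.strip() for field in line.rsplit(",", 2))
        gpus.append({
            "name": name,
            "used_mib": int(used),
            "total_mib": int(total),
            "free_mib": int(total) - int(used),
        })
    return gpus


def gpu_memory():
    """Run nvidia-smi and return per-GPU memory usage."""
    result = subprocess.run(QUERY, check=True, capture_output=True, text=True)
    return parse_gpu_memory(result.stdout)
```

If `free_mib` is low while the Whisper service is idle, another process is probably holding VRAM that the model needs.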

## 📚 API Documentation

Once the services are running, visit:
- **Whisper API**: http://localhost:8001/docs
- **Translator API**: http://localhost:8002/docs
- **Dashboard**: http://localhost:3000

## 🤝 Contributing

1. Fork the repository
2. Create feature branch: `git checkout -b feature/amazing-feature`
3. Commit changes: `git commit -m 'Add amazing feature'`
4. Push to branch: `git push origin feature/amazing-feature`
5. Open a Pull Request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🎯 Roadmap

- [ ] Complete Translator service implementation
- [ ] Build Transcriber orchestration service
- [ ] Create web dashboard with real-time updates
- [ ] Add voice cloning capabilities
- [ ] Implement speaker diarization
- [ ] Support for additional languages
- [ ] Mobile app for remote monitoring
- [ ] Advanced analytics and insights

## 💡 Credits

Built with:
- [faster-whisper](https://github.com/SYSTRAN/faster-whisper) for GPU-accelerated transcription
- [Discord.js](https://discord.js.org/) for Discord bot functionality
- [FastAPI](https://fastapi.tiangolo.com/) for high-performance APIs
- [PostgreSQL](https://www.postgresql.org/) for reliable data storage
- [Redis](https://redis.io/) for message queuing and caching

---

**🚀 Ready to translate your Discord conversations in real-time with the power of your RTX 4000!**