
# Discord Voice Translator V2

A microservices-based Discord bot that captures voice-channel audio, transcribes it, translates the transcript, and posts formatted messages to text channels. It uses GPU-accelerated transcription and supports multiple languages.

## 🎯 Features

- **Real-time Voice Capture**: Records Discord voice channel audio
- **GPU-Accelerated Transcription**: Uses NVIDIA RTX 4000 with faster-whisper
- **Multi-Language Translation**: Local NLLB models + Google Translate fallback
- **Smart Language Logic**: Every message covers English, German, and Korean (the transcript in the detected language plus translations into the others)
- **Professional Discord Messages**: Formatted with speaker info, duration, and timestamps
- **Microservices Architecture**: Scalable, maintainable, containerized services
- **Comprehensive Monitoring**: Health checks, logging, and web dashboard

## 🏗️ Architecture

### Services Overview

```
Discord Voice → [Recorder] → Raw PCM → [Audio Processor] → WAV
    ↓
[Whisper Service] → Transcription → [Translator] → Multi-language
    ↓
[Transcriber] → Discord Message → [Dashboard] → Monitoring
```

### Core Services

1. **Recorder**: Discord bot that captures voice channel audio
2. **Audio Processor**: Converts PCM to optimized WAV files
3. **Whisper Service**: GPU-accelerated speech-to-text transcription
4. **Translator**: Local NLLB + Google Translate multi-language translation
5. **Transcriber**: Workflow orchestrator and Discord message formatter
6. **Dashboard**: Web interface for monitoring and management
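
The Audio Processor step (raw PCM in, WAV out) can be sketched with Python's standard `wave` module. This is a minimal illustration, not the service's actual code: the function name `pcm_to_wav` is hypothetical, and the input format (48 kHz, stereo, signed 16-bit little-endian PCM) is an assumption about what the Recorder writes.

```python
import wave

# Assumed input format: 48 kHz, stereo, signed 16-bit little-endian PCM.
SAMPLE_RATE = 48_000
CHANNELS = 2
SAMPLE_WIDTH = 2  # bytes per sample (16-bit)


def pcm_to_wav(pcm_bytes: bytes, wav_path: str) -> float:
    """Wrap raw PCM in a WAV container and return the duration in seconds."""
    frame_size = CHANNELS * SAMPLE_WIDTH
    # Drop a trailing partial frame so the header matches the payload.
    usable = len(pcm_bytes) - (len(pcm_bytes) % frame_size)
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(pcm_bytes[:usable])
    return usable / frame_size / SAMPLE_RATE
```

The returned duration is the kind of value that could feed the "Duration" field of the Discord message. A real processor might also downmix to mono or resample before transcription.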

### Infrastructure

- **PostgreSQL**: Primary database for all structured data
- **Redis**: Message queue and caching layer
- **Nginx**: Reverse proxy for web services
- **Docker Compose**: Container orchestration with GPU support

## 🚀 Quick Start

### Prerequisites

- Docker and Docker Compose
- NVIDIA GPU with Docker runtime support
- Discord Bot Token
- Google Translate API Key (optional, for premium translations)

### 1. Clone Repository

```bash
git clone <your-repo-url>
cd discord-voice-translator-v2
```

### 2. Configure Environment

```bash
cp .env.example .env
# Edit .env with your configuration
```

### 3. Required Environment Variables

```bash
# Discord Bot Configuration
DISCORD_TOKEN=your_discord_bot_token_here
CLIENT_ID=your_bot_client_id_here
GUILD_ID=your_guild_id_here

# Database Configuration
POSTGRES_PASSWORD=your_secure_postgres_password
POSTGRES_URL=postgresql://postgres:your_secure_postgres_password@postgres:5432/voice_translator

# Translation APIs (optional)
GOOGLE_TRANSLATE_API_KEY=your_google_translate_api_key

# GPU Configuration
WHISPER_MODEL=large-v2
```
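
A service could fail fast at startup when a variable from the list above is missing. The sketch below is one possible approach, not the project's actual loader: `load_config` is a hypothetical helper, and treating `WHISPER_MODEL` as optional with a `large-v2` default is an assumption.

```python
import os

REQUIRED_VARS = [
    "DISCORD_TOKEN",
    "CLIENT_ID",
    "GUILD_ID",
    "POSTGRES_PASSWORD",
    "POSTGRES_URL",
]
# Assumption: these may be omitted; the defaults here are illustrative.
OPTIONAL_DEFAULTS = {
    "GOOGLE_TRANSLATE_API_KEY": "",
    "WHISPER_MODEL": "large-v2",
}


def load_config(env=None) -> dict:
    """Return a config dict, raising if a required variable is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    config = {name: env[name] for name in REQUIRED_VARS}
    for name, default in OPTIONAL_DEFAULTS.items():
        config[name] = env.get(name) or default
    return config
```

Calling `load_config()` with no argument reads from the process environment; passing a dict makes the check easy to test.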

### 4. Deploy with Docker Compose

```bash
# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Start with admin interfaces
docker-compose --profile admin up -d
```

### 5. Discord Bot Setup

1. Invite bot to your server with permissions:
   - Connect to voice channels
   - Speak in voice channels
   - Send messages
   - View channels

2. Use slash commands:
   - `/join` - Start recording in your voice channel
   - `/leave` - Stop recording and leave channel
   - `/status` - Check current recording status

## 💬 Discord Message Format

### English Source Example
```
🎤 New Transcription
Speaker: mcgyvver
🌍 Language Detected: English 🇬🇧
⏱️ Duration: 17.0 seconds
🕐 Time: 12 minutes ago

📝 Transcript
Yeah, but this is a complete prank, we'll...
Let's go to the foot.

🇩🇪 German
Ja, aber das ist ein kompletter Streich, wir werden...
Lass uns zum Fuß gehen.

🇰🇷 Korean
예, 하지만 이것은 완전한 장난입니다...
발로 가자.
```
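
The layout above could be assembled along these lines. This is a hedged sketch only: `format_transcription_message` and its parameters are hypothetical, and the Transcriber may well build a Discord embed rather than plain text. The time line assumes Discord's `<t:UNIX:R>` timestamp markup, which the client renders as a relative time.

```python
def format_transcription_message(
    speaker: str,
    language_label: str,
    duration_seconds: float,
    recorded_at_unix: int,
    transcript: str,
    translations: list[tuple[str, str]],
) -> str:
    """Build the message text; translations are (heading, text) pairs in display order."""
    lines = [
        "🎤 New Transcription",
        f"Speaker: {speaker}",
        f"🌍 Language Detected: {language_label}",
        f"⏱️ Duration: {duration_seconds:.1f} seconds",
        # Discord renders <t:UNIX:R> as a relative time such as "12 minutes ago".
        f"🕐 Time: <t:{recorded_at_unix}:R>",
        "",
        "📝 Transcript",
        transcript,
    ]
    for heading, text in translations:
        lines += ["", heading, text]
    return "\n".join(lines)
```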

### Smart Language Logic

- **If detected = English**: Shows transcript + German + Korean
- **If detected = German**: Shows transcript + English + Korean
- **If detected = Korean**: Shows transcript + English + German
- **If detected = Other**: Shows transcript + English + German + Korean
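
The four rules reduce to one operation: translate into every primary language except the detected one. A minimal sketch, assuming two-letter ISO 639-1 codes and a hypothetical `target_languages` helper (the real translator service may structure this differently):

```python
# Assumption: languages are identified by ISO 639-1 codes.
PRIMARY_LANGUAGES = ("en", "de", "ko")


def target_languages(detected: str) -> list[str]:
    """Return the primary languages to translate into, skipping the source language."""
    return [lang for lang in PRIMARY_LANGUAGES if lang != detected]


print(target_languages("en"))  # ['de', 'ko']
print(target_languages("fr"))  # ['en', 'de', 'ko']
```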

## 🔧 Service Configuration

### GPU Requirements

- NVIDIA GPU with 4GB+ VRAM (RTX 4000 recommended)
- CUDA 11.8+ support
- nvidia-docker runtime configured

### Model Storage

- Whisper models cached in `./data/models/`
- NLLB translation models auto-downloaded
- Models persist between container restarts

### Performance Tuning

**Whisper Models**:
- `large-v2`: Best accuracy, ~2GB VRAM, slower
- `medium`: Good balance, ~1GB VRAM, faster
- `small`: Fastest, ~500MB VRAM, lower accuracy

**Translation Strategy**:
- Local NLLB: Fast, private, good quality
- Google Translate: Premium quality, API costs
- Automatic fallback for reliability
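
The fallback behaviour can be illustrated as trying backends in order until one returns usable text. This is a sketch under assumptions: `translate_with_fallback` and the backend callables are hypothetical stand-ins, not the NLLB or Google Translate client APIs.

```python
from typing import Callable, Sequence

# A backend takes (text, source_lang, target_lang) and returns translated text.
Translator = Callable[[str, str, str], str]


def translate_with_fallback(
    text: str,
    source: str,
    target: str,
    backends: Sequence[tuple[str, Translator]],
) -> tuple[str, str]:
    """Try each (name, backend) in order; return (backend_name, translation)."""
    errors = []
    for name, backend in backends:
        try:
            result = backend(text, source, target)
        except Exception as exc:  # any backend failure triggers the next one
            errors.append(f"{name}: {exc}")
            continue
        if result and result.strip():
            return name, result
        errors.append(f"{name}: empty result")
    raise RuntimeError("All translation backends failed: " + "; ".join(errors))
```

Passing `[("nllb", local_translate), ("google", google_translate)]` would prefer the local model and only call the paid API when the local one fails. Both function names are placeholders.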

## 📊 Monitoring

### Health Checks

Each service exposes a health check. Check container status with Compose, or query a service's `/health` endpoint directly:

```bash
# Check service status
docker-compose ps

# Individual service health
curl http://localhost:8001/health  # Whisper Service
curl http://localhost:8002/health  # Translation Service
```
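
The same checks can be scripted. The sketch below polls the two endpoints shown above using only the standard library. It assumes a healthy service answers with HTTP 200; the helper names `check_health` and `check_all` are hypothetical, and the response body is not inspected because its format is not documented here.

```python
import urllib.request

# Endpoints taken from the curl commands above.
SERVICES = {
    "whisper-service": "http://localhost:8001/health",
    "translator": "http://localhost:8002/health",
}


def check_health(url: str, timeout: float = 3.0, opener=urllib.request.urlopen) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers URLError/HTTPError (both OSError subclasses), refused
        # connections, and timeouts.
        return False


def check_all(services=None, opener=urllib.request.urlopen) -> dict:
    """Map each service name to its health status."""
    services = SERVICES if services is None else services
    return {name: check_health(url, opener=opener) for name, url in services.items()}
```

`check_all()` returns a dict such as `{"whisper-service": True, "translator": False}`, which is convenient for a cron job or a CI smoke test.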

### Web Dashboard

Access monitoring dashboard at: `http://localhost:3000`

- Real-time transcription activity
- Translation statistics
- Service performance metrics
- User activity summaries

### Admin Interfaces

- **PostgreSQL Admin**: `http://localhost:8080` (pgAdmin)
- **Redis Commander**: `http://localhost:8081`

## 🛠️ Development

### Local Development

```bash
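# Start infrastructure only (flags placed before service names work on Compose v1 and v2)
docker-compose up -d postgres redis

# Run individual services locally, each in its own terminal from the repo root
(cd services/recorder && npm run dev)
(cd services/whisper-service && python src/api.py)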
```

### Service Dependencies

```
Recorder → PostgreSQL, Redis
Audio Processor → Redis, PostgreSQL
Whisper Service → Redis, PostgreSQL, GPU
Translator → Redis, PostgreSQL
Transcriber → Redis, PostgreSQL, Discord
Dashboard → PostgreSQL, Redis
```

### Adding New Languages

1. Add language code to `NLLB_LANG_MAP` in translator service
2. Update `LANGUAGE_INFO` with display name and flag
3. Modify `PRIMARY_LANGUAGES` if needed for default display
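
As an illustration of the three steps, here is what adding French might look like. The shapes of `NLLB_LANG_MAP`, `LANGUAGE_INFO`, and `PRIMARY_LANGUAGES` shown here are assumptions, since the actual definitions live in the translator service. The NLLB codes follow the FLORES-200 convention (`eng_Latn`, `deu_Latn`, `kor_Hang`, `fra_Latn`). The `find_language_gaps` helper is hypothetical and simply checks that the tables stay in sync.

```python
# Step 1: map the short code to the NLLB (FLORES-200) language code.
NLLB_LANG_MAP = {
    "en": "eng_Latn",
    "de": "deu_Latn",
    "ko": "kor_Hang",
    "fr": "fra_Latn",  # newly added
}

# Step 2: display name and flag used in Discord messages.
LANGUAGE_INFO = {
    "en": {"name": "English", "flag": "🇬🇧"},
    "de": {"name": "German", "flag": "🇩🇪"},
    "ko": {"name": "Korean", "flag": "🇰🇷"},
    "fr": {"name": "French", "flag": "🇫🇷"},  # newly added
}

# Step 3: only change this if the new language should appear by default.
PRIMARY_LANGUAGES = ["en", "de", "ko"]


def find_language_gaps(lang_map, lang_info, primary) -> list[str]:
    """List codes that are missing from one of the tables."""
    problems = [f"{c}: no LANGUAGE_INFO entry" for c in sorted(set(lang_map) - set(lang_info))]
    problems += [f"{c}: no NLLB_LANG_MAP entry" for c in sorted(set(primary) - set(lang_map))]
    return problems


assert find_language_gaps(NLLB_LANG_MAP, LANGUAGE_INFO, PRIMARY_LANGUAGES) == []
```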

## 📦 Database Schema

### Key Tables

- **connections**: Voice channel sessions
- **recordings**: Individual user audio recordings
- **transcriptions**: Speech-to-text results
- **translations**: Multi-language translations
- **processing_metrics**: Performance tracking
- **user_activity**: Aggregated user statistics

### Data Flow

```
Connection → Recordings → Transcriptions → Translations
```

## 🔐 Security

- No audio data stored permanently (auto-cleanup)
- Local AI models for privacy
- Secure credential management
- Non-root container users
- Network isolation between services

## 🚨 Troubleshooting

### Common Issues

**GPU Not Detected**:
```bash
# Check NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

**Discord Bot Not Responding**:
- Verify bot token and permissions
- Check guild ID is correct
- Ensure bot is in voice channel

**Translation Failures**:
- Check Google API key if using premium
- Verify network connectivity
- Local NLLB model may be downloading

**Database Connection Issues**:
- Ensure PostgreSQL is running
- Check connection string format
- Verify network connectivity between containers

### Service Logs

```bash
# View specific service logs
docker-compose logs recorder
docker-compose logs whisper-service
docker-compose logs translator

# Follow logs in real-time
docker-compose logs -f transcriber
```

## 📈 Performance

### Expected Performance (RTX 4000)

- **Transcription**: 2-3x faster than real time
- **Translation**: Sub-second for typical sentences
- **End-to-End Latency**: 3-8 seconds from speech to Discord message
- **Concurrent Users**: 5-10 simultaneous speakers
- **Languages**: 200+ supported via NLLB

### Resource Usage

- **GPU Memory**: 3-4GB for Whisper + NLLB
- **System RAM**: 8GB recommended
- **Storage**: 10GB for models + database
- **Network**: Minimal (local processing)

## 🤝 Contributing

1. Fork the repository
2. Create feature branch
3. Add tests for new functionality
4. Update documentation
5. Submit pull request

## 📄 License

MIT License - see LICENSE file for details

## 🙏 Acknowledgments

- **OpenAI Whisper**: Speech recognition models
- **Meta AI NLLB** (No Language Left Behind): Neural machine translation
- **Discord.js**: Discord API library
- **faster-whisper**: Optimized Whisper inference
- **Docker Community**: Containerization platform

---

**Need help?** Check the troubleshooting section or open an issue!