MAHaines/discord-voice-translator-v2

MAHaines 24beac382d Update deploy.sh

2025-07-14 17:11:17 -05:00

Discord Voice Translator V2

A microservices-based Discord bot that captures voice-channel audio, transcribes and translates it, and posts formatted messages to text channels. Transcription is GPU-accelerated, and translation covers multiple languages.

🎯 Features

Real-time Voice Capture: Records Discord voice channel audio
GPU-Accelerated Transcription: Runs faster-whisper on an NVIDIA GPU (RTX 4000 recommended)
Multi-Language Translation: Local NLLB models + Google Translate fallback
Smart Language Logic: Every message covers English, German, and Korean; the detected language appears as the transcript and the others as translations
Professional Discord Messages: Formatted with speaker info, duration, and timestamps
Microservices Architecture: Scalable, maintainable, containerized services
Comprehensive Monitoring: Health checks, logging, and web dashboard

🏗️ Architecture

Services Overview

Discord Voice → [Recorder] → Raw PCM → [Audio Processor] → WAV 
    ↓
[Whisper Service] → Transcription → [Translator] → Multi-language
    ↓
[Transcriber] → Discord Message → [Dashboard] → Monitoring

Core Services

Recorder: Discord bot that captures voice channel audio
Audio Processor: Converts PCM to optimized WAV files
Whisper Service: GPU-accelerated speech-to-text transcription
Translator: Local NLLB + Google Translate multi-language translation
Transcriber: Workflow orchestrator and Discord message formatter
Dashboard: Web interface for monitoring and management
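
The Audio Processor step (raw PCM in, WAV out) can be illustrated with Python's standard `wave` module. This is a minimal sketch, not the service's actual code: the capture format (48 kHz, 16-bit, stereo) and the function names are assumptions, since this README does not state them.

```python
import io
import wave

# Assumed capture format (not stated in this README): 48 kHz, 16-bit, stereo PCM.
SAMPLE_RATE_HZ = 48_000
SAMPLE_WIDTH_BYTES = 2
CHANNELS = 2


def pcm_to_wav_bytes(pcm: bytes,
                     sample_rate: int = SAMPLE_RATE_HZ,
                     channels: int = CHANNELS,
                     sample_width: int = SAMPLE_WIDTH_BYTES) -> bytes:
    """Wrap headerless PCM samples in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


def pcm_duration_seconds(pcm: bytes,
                         sample_rate: int = SAMPLE_RATE_HZ,
                         channels: int = CHANNELS,
                         sample_width: int = SAMPLE_WIDTH_BYTES) -> float:
    """Duration of a PCM buffer: total bytes divided by bytes per second."""
    return len(pcm) / (sample_rate * channels * sample_width)
```

A real processor would likely also downmix and resample before transcription; that step is omitted here because the README does not describe it.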

Infrastructure

PostgreSQL: Primary database for all structured data
Redis: Message queue and caching layer
Nginx: Reverse proxy for web services
Docker Compose: Container orchestration with GPU support

🚀 Quick Start

Prerequisites

Docker and Docker Compose
NVIDIA GPU with Docker runtime support
Discord Bot Token
Google Translate API Key (optional, for premium translations)

1. Clone Repository

git clone <your-repo-url>
cd discord-voice-translator-v2

2. Configure Environment

cp .env.example .env
# Edit .env with your configuration

3. Required Environment Variables

# Discord Bot Configuration
DISCORD_TOKEN=your_discord_bot_token_here
CLIENT_ID=your_bot_client_id_here
GUILD_ID=your_guild_id_here

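# Database Configuration
# The password embedded in POSTGRES_URL must match POSTGRES_PASSWORD
# (URL-encode any special characters in the URL)
POSTGRES_PASSWORD=your_secure_postgres_password
POSTGRES_URL=postgresql://postgres:your_secure_postgres_password@postgres:5432/voice_translator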

# Translation APIs (optional)
GOOGLE_TRANSLATE_API_KEY=your_google_translate_api_key

# GPU Configuration
WHISPER_MODEL=large-v2
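
Before deploying, it can help to confirm that the variables above are set and no longer hold the `.env.example` placeholders. The following is a hedged sketch of such a check; the function name is hypothetical, and the choice of which variables count as required (everything except the optional Google key and the Whisper model) is an assumption drawn from the list above.

```python
import os
from typing import Mapping, Optional

# Variables treated as mandatory here; this grouping is an assumption.
REQUIRED_VARS = (
    "DISCORD_TOKEN",
    "CLIENT_ID",
    "GUILD_ID",
    "POSTGRES_PASSWORD",
    "POSTGRES_URL",
)


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return required variables that are unset, blank, or still placeholders."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        # Heuristic: the example file uses values containing "your_".
        if not value or "your_" in value:
            missing.append(name)
    return missing
```

Calling `missing_env_vars()` with no argument inspects the current process environment; an empty list means every required variable looks configured.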

4. Deploy with Docker Compose

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Start with admin interfaces
docker-compose --profile admin up -d

5. Discord Bot Setup

Invite the bot to your server with these permissions:
- Connect to voice channels
- Speak in voice channels
- Send messages
- View channels

Then use the slash commands:
- /join - Start recording in your voice channel
- /leave - Stop recording and leave the channel
- /status - Check current recording status

💬 Discord Message Format

English Source Example

🎤 New Transcription
Speaker: mcgyvver
🌍 Language Detected: English 🇬🇧
⏱️ Duration: 17.0 seconds
🕐 Time: 12 minutes ago

📝 Transcript
Yeah, but this is a complete prank, we'll...
Let's go to the foot.

🇩🇪 German
Ja, aber das ist ein kompletter Streich, wir werden...
Lass uns zum Fuß gehen.

🇰🇷 Korean
예, 하지만 이것은 완전한 장난입니다...
발로 가자.
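
The layout of the example above could be assembled roughly as follows. This is an illustrative sketch only: the function name and parameters are hypothetical, and the real Transcriber service may build its messages differently (for instance with Discord embeds). The one Discord-specific detail used is the `<t:UNIX:R>` timestamp markup, which Discord renders as a relative time such as "12 minutes ago".

```python
def format_transcription(speaker: str,
                         language_label: str,
                         duration_seconds: float,
                         spoken_at_unix: int,
                         transcript: str,
                         translations: dict[str, str]) -> str:
    """Build the message text in the layout of the example above."""
    lines = [
        "🎤 New Transcription",
        f"Speaker: {speaker}",
        f"🌍 Language Detected: {language_label}",
        f"⏱️ Duration: {duration_seconds:.1f} seconds",
        # Discord renders <t:UNIX:R> as a relative time.
        f"🕐 Time: <t:{spoken_at_unix}:R>",
        "",
        "📝 Transcript",
        transcript,
    ]
    # One section per translation, in the order the caller supplies them.
    for heading, text in translations.items():
        lines += ["", heading, text]
    return "\n".join(lines)
```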

Smart Language Logic

If detected = English: Shows transcript + German + Korean
If detected = German: Shows transcript + English + Korean
If detected = Korean: Shows transcript + English + German
If detected = Other: Shows transcript + English + German + Korean
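
The four rules above reduce to "translate into every primary language except the detected one". A minimal sketch, assuming two-letter language codes (`en`, `de`, `ko`) and a hypothetical function name:

```python
# Assumed codes for the three languages every message should cover.
PRIMARY_LANGUAGES = ("en", "de", "ko")


def target_languages(detected: str) -> list[str]:
    """Return the primary languages that still need a translation."""
    detected = detected.lower()
    return [lang for lang in PRIMARY_LANGUAGES if lang != detected]
```

For a detected language outside the primary set (for example `fr`), nothing is filtered out, so all three translations are produced, which matches the "Other" rule.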

🔧 Service Configuration

GPU Requirements

NVIDIA GPU with 4GB+ VRAM (RTX 4000 recommended)
CUDA 11.8+ support
NVIDIA Container Toolkit (the successor to nvidia-docker) installed and configured as a Docker runtime

Model Storage

Whisper models cached in ./data/models/
NLLB translation models auto-downloaded
Models persist between container restarts

Performance Tuning

Whisper Models (VRAM figures are approximate and vary with compute type and beam size):

large-v2: Best accuracy, ~2GB VRAM, slower
medium: Good balance, ~1GB VRAM, faster
small: Fastest, ~500MB VRAM, lower accuracy

Translation Strategy:

Local NLLB: Fast, private, good quality
Google Translate: Premium quality, API costs
Automatic fallback for reliability
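
The fallback strategy can be sketched as "try each backend in order and return the first usable result". The backends here are plain callables standing in for the local NLLB model and the Google Translate client; their signature and the function name are assumptions, not the translator service's real interface.

```python
from typing import Callable, Optional, Sequence

# (text, source_lang, target_lang) -> translated text
Translate = Callable[[str, str, str], str]


def translate_with_fallback(text: str,
                            source: str,
                            target: str,
                            backends: Sequence[Translate]) -> str:
    """Return the first non-empty translation; raise if every backend fails."""
    last_error: Optional[Exception] = None
    for backend in backends:
        try:
            result = backend(text, source, target)
        except Exception as exc:  # a failing backend should not stop the chain
            last_error = exc
            continue
        if result and result.strip():
            return result
    raise RuntimeError("all translation backends failed") from last_error
```

Passing the local backend first and the API-backed one second gives the order described above: local by default, remote only when the local attempt fails or returns nothing.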

📊 Monitoring

Health Checks

All services include comprehensive health checks:

# Check service status
docker-compose ps

# Individual service health
curl http://localhost:8001/health  # Whisper Service
curl http://localhost:8002/health  # Translation Service
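
The same checks can be scripted. The sketch below polls the two endpoints listed above using only the standard library; it assumes that an HTTP 200 response means "healthy", since the response body format is not documented here, and the function names are hypothetical.

```python
import http.client
import urllib.error
import urllib.request
from typing import Mapping, Optional

# Endpoints taken from the curl commands above.
HEALTH_ENDPOINTS = {
    "whisper-service": "http://localhost:8001/health",
    "translator": "http://localhost:8002/health",
}


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Treat an HTTP 200 as healthy; any error or other status as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def check_all(endpoints: Optional[Mapping[str, str]] = None,
              timeout: float = 3.0) -> dict[str, bool]:
    """Check every endpoint and map service name to health status."""
    endpoints = HEALTH_ENDPOINTS if endpoints is None else endpoints
    return {name: is_healthy(url, timeout) for name, url in endpoints.items()}
```

Calling `check_all()` while the stack is running returns a dictionary such as `{"whisper-service": True, "translator": True}` when both services respond.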

Web Dashboard

Access monitoring dashboard at: http://localhost:3000

Real-time transcription activity
Translation statistics
Service performance metrics
User activity summaries

Admin Interfaces

PostgreSQL Admin (pgAdmin): http://localhost:8080
Redis Commander: http://localhost:8081

🛠️ Development

Local Development

# Start infrastructure only
docker-compose up -d postgres redis

# Run individual services locally (each in its own terminal, from the repository root)
cd services/recorder && npm run dev
cd services/whisper-service && python src/api.py

Service Dependencies

Recorder → PostgreSQL, Redis
Audio Processor → Redis, PostgreSQL  
Whisper Service → Redis, PostgreSQL, GPU
Translator → Redis, PostgreSQL
Transcriber → Redis, PostgreSQL, Discord
Dashboard → PostgreSQL, Redis

Adding New Languages

Add language code to NLLB_LANG_MAP in translator service
Update LANGUAGE_INFO with display name and flag
Modify PRIMARY_LANGUAGES if needed for default display
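
As an illustration of the three steps, the sketch below adds French. The names `NLLB_LANG_MAP`, `LANGUAGE_INFO`, and `PRIMARY_LANGUAGES` come from the steps above, but their exact shapes in the translator service are assumptions, and `register_language` is a hypothetical helper. The NLLB codes follow the FLORES-200 convention (`eng_Latn`, `deu_Latn`, `kor_Hang`, `fra_Latn`).

```python
# Assumed shapes; the real structures in the translator service may differ.
NLLB_LANG_MAP = {"en": "eng_Latn", "de": "deu_Latn", "ko": "kor_Hang"}
LANGUAGE_INFO = {
    "en": {"name": "English", "flag": "🇬🇧"},
    "de": {"name": "German", "flag": "🇩🇪"},
    "ko": {"name": "Korean", "flag": "🇰🇷"},
}
PRIMARY_LANGUAGES = ["en", "de", "ko"]


def register_language(code: str, nllb_code: str, name: str, flag: str,
                      primary: bool = False) -> None:
    """Apply the three steps: map the code, add display info, optionally promote."""
    NLLB_LANG_MAP[code] = nllb_code
    LANGUAGE_INFO[code] = {"name": name, "flag": flag}
    if primary and code not in PRIMARY_LANGUAGES:
        PRIMARY_LANGUAGES.append(code)


# Example: add French without changing the default display languages.
register_language("fr", "fra_Latn", "French", "🇫🇷")
```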

📦 Database Schema

Key Tables

connections: Voice channel sessions
recordings: Individual user audio recordings
transcriptions: Speech-to-text results
translations: Multi-language translations
processing_metrics: Performance tracking
user_activity: Aggregated user statistics

Data Flow

Connection → Recordings → Transcriptions → Translations

🔐 Security

No audio data stored permanently (auto-cleanup)
Local AI models for privacy
Secure credential management
Non-root container users
Network isolation between services

🚨 Troubleshooting

Common Issues

GPU Not Detected:

# Check NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Discord Bot Not Responding:

Verify bot token and permissions
Check guild ID is correct
Ensure bot is in voice channel

Translation Failures:

Check Google API key if using premium
Verify network connectivity
Local NLLB model may be downloading

Database Connection Issues:

Ensure PostgreSQL is running
Check connection string format
Verify network connectivity between containers

Service Logs

# View specific service logs
docker-compose logs recorder
docker-compose logs whisper-service
docker-compose logs translator

# Follow logs in real-time
docker-compose logs -f transcriber

📈 Performance

Expected Performance (RTX 4000)

Transcription: 2-3x faster than real time (a 10-second clip transcribes in roughly 3-5 seconds)
Translation: Sub-second for typical sentences
End-to-End Latency: 3-8 seconds from speech to Discord message
Concurrent Users: 5-10 simultaneous speakers
Languages: 200+ supported via NLLB

Resource Usage

GPU Memory: 3-4GB for Whisper + NLLB
System RAM: 8GB recommended
Storage: 10GB for models + database
Network: Minimal (local processing)

🤝 Contributing

Fork the repository
Create feature branch
Add tests for new functionality
Update documentation
Submit pull request

📄 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

OpenAI Whisper: Speech recognition models
Meta AI NLLB (No Language Left Behind): Neural machine translation
Discord.js: Discord API library
faster-whisper: Optimized Whisper inference
Docker Community: Containerization platform

Need help? Check the troubleshooting section or open an issue!
