A comprehensive microservices-based Discord bot that captures voice channel audio, transcribes it using GPU-accelerated Whisper, and provides real-time multi-language translation.

🚀 Features

Real-time Voice Capture: Records individual user audio streams from Discord voice channels
GPU-Accelerated Transcription: Uses faster-whisper on an NVIDIA RTX 4000 for fast speech-to-text
Multi-language Translation: Supports translation to English, German, Korean, and more
Microservices Architecture: Scalable, maintainable design with independent services
Web Dashboard: Real-time monitoring and historical analytics
High Performance: Optimized for low-latency processing

🏗️ Architecture

Discord Voice → [Recorder] → Raw PCM → [Audio Processor] → Optimized WAV
    ↓
[Whisper Service] → Transcription → [Translator] → Multi-language
    ↓
[Transcriber] → Discord Messages + Database → [Dashboard]

Services Overview

Service	Purpose	Technology
Recorder	Discord bot + voice capture	Node.js, Discord.js
Audio Processor	PCM to WAV conversion	Python, FFmpeg
Whisper Service	GPU-accelerated transcription	Python, faster-whisper, CUDA
Translator	Multi-language translation	Python, Google/DeepL APIs
Transcriber	Workflow orchestration	Node.js, Message queues
Dashboard	Web monitoring interface	Node.js, Express, WebSockets

🛠️ Prerequisites

Hardware Requirements

GPU: NVIDIA RTX 4000 (or compatible CUDA GPU)
RAM: 8GB+ recommended
Storage: 50GB+ for models and audio cache

Software Requirements

Docker with Docker Compose
NVIDIA Container Toolkit for GPU support
Unraid (recommended) or Linux host

⚡ Quick Start

1. Clone the Repository

git clone <your-gitea-url>/discord-voice-translator-v2.git
cd discord-voice-translator-v2

2. Configure Environment

cp .env.example .env
# Edit .env with your API keys and settings

Required environment variables:

# Discord Bot
DISCORD_TOKEN=your_discord_bot_token
CLIENT_ID=your_bot_client_id
GUILD_ID=your_guild_id

# Database
POSTGRES_PASSWORD=your_secure_password

# Translation APIs
GOOGLE_TRANSLATE_API_KEY=your_google_api_key
DEEPL_API_KEY=your_deepl_api_key
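
Before starting the stack, it can help to confirm that none of these variables is empty. The following is a minimal sketch, not part of the repository: the helper name `missing_vars` is hypothetical, and treating all six variables as mandatory is an assumption (a deployment that uses only one translation API may not need both keys).

```python
import os

# Names copied from the list above. Treating every one of them as
# mandatory is an assumption made for this sketch.
REQUIRED_VARS = [
    "DISCORD_TOKEN",
    "CLIENT_ID",
    "GUILD_ID",
    "POSTGRES_PASSWORD",
    "GOOGLE_TRANSLATE_API_KEY",
    "DEEPL_API_KEY",
]


def missing_vars(env=None):
    """Return the required variable names that are unset or blank in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_vars()` with no arguments inspects the current process environment; an empty list means every variable is set.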

3. Start Services

# Start core services
docker-compose up -d

# Start with admin interfaces
docker-compose --profile admin up -d

4. Verify GPU Support

# Check Whisper service GPU status
curl http://localhost:8001/health

📊 Service Endpoints

Service	Port	Health Check	Purpose
Whisper Service	8001	`/health`	GPU transcription
Translator	8002	`/health`	Multi-language translation
Dashboard	3000	`/`	Web monitoring
PostgreSQL	5432	-	Database
Redis	6379	-	Message queue
pgAdmin	8080	`/`	Database admin
Redis Commander	8081	`/`	Redis admin
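
The HTTP health checks in the table can be polled from a small script. This is a rough sketch under a few assumptions: the services are reachable on `localhost`, any 2xx response counts as healthy, and the dictionary keys are just labels chosen for this example.

```python
import http.client
import urllib.error
import urllib.request

# Ports and paths come from the table above. "localhost" assumes the
# containers publish their ports on the machine running this script.
HEALTH_ENDPOINTS = {
    "whisper-service": "http://localhost:8001/health",
    "translator": "http://localhost:8002/health",
    "dashboard": "http://localhost:3000/",
}


def check(url, timeout=3.0):
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def check_all(endpoints=HEALTH_ENDPOINTS, probe=check):
    """Map each service label to the result of probing its URL."""
    return {name: probe(url) for name, url in endpoints.items()}
```

`check_all()` returns a dictionary such as `{"whisper-service": True, ...}`; the `probe` parameter exists so the network call can be swapped out in tests.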

🎮 Discord Commands

/join - Join your voice channel and start recording
/leave - Leave voice channel and stop recording
/status - Show current recording status

📈 Performance Optimizations

GPU Configuration

Model: faster-whisper large-v2 with CUDA
Precision: float16 for optimal RTX 4000 performance
Memory: ~2GB VRAM usage for transcription
Speed: Transcription runs 2-3x faster than real time
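
For reference, the model and precision settings above map onto the faster-whisper Python API roughly as follows. This is an illustrative sketch rather than the Whisper service's actual code: the helper names `load_model` and `transcribe` are hypothetical, and `beam_size=5` is an example value, not a setting documented here.

```python
def load_model(model_size="large-v2", device="cuda", compute_type="float16"):
    """Load a faster-whisper model with the settings listed above."""
    # Imported lazily so this module can be imported without the package.
    from faster_whisper import WhisperModel

    return WhisperModel(model_size, device=device, compute_type=compute_type)


def transcribe(model, wav_path):
    """Return (detected_language, full_text) for one audio file."""
    segments, info = model.transcribe(wav_path, beam_size=5)
    # segments is a generator; decoding happens as it is consumed.
    text = " ".join(segment.text.strip() for segment in segments)
    return info.language, text
```

Typical use would be `model = load_model()` once at startup, then `transcribe(model, "clip.wav")` per recording, since loading the model is the expensive step.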

Audio Processing

Input: Discord 48kHz stereo PCM
Output: 16kHz mono WAV (optimized for Whisper)
Conversion: FFmpeg with fallback to Python/scipy
Cleanup: Automatic removal of processed PCM files
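
The conversion step above could be expressed as a single FFmpeg call. The sketch below assumes the raw PCM is signed 16-bit little-endian (`s16le`), which this README does not state; the function names are hypothetical and the scipy fallback is not shown.

```python
import subprocess


def build_ffmpeg_command(pcm_path, wav_path):
    """Build the FFmpeg arguments for 48 kHz stereo PCM -> 16 kHz mono WAV."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-f", "s16le",   # raw input has no header; sample format is assumed
        "-ar", "48000",  # input sample rate
        "-ac", "2",      # input channel count (stereo)
        "-i", pcm_path,
        "-ar", "16000",  # output sample rate
        "-ac", "1",      # output channel count (mono)
        wav_path,
    ]


def convert(pcm_path, wav_path):
    """Run the conversion; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(build_ffmpeg_command(pcm_path, wav_path), check=True)
```

Options placed before `-i` describe the headerless input, and options after it describe the output, which is why `-ar` and `-ac` each appear twice.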

🗄️ Database Schema

connections: Voice channel session tracking
recordings: Individual user audio recordings
transcriptions: Speech-to-text results with confidence scores
translations: Multi-language translation results
processing_metrics: Performance monitoring
user_activity: Aggregated usage statistics

🔧 Development

Local Development

# Individual service development
cd services/recorder
npm install
npm run dev

# Audio processor
cd services/audio-processor  
pip install -r requirements.txt
python src/processor.py

Testing GPU Support

# Verify NVIDIA Docker runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Test Whisper service
docker-compose exec whisper-service python -c "import ctranslate2; print(ctranslate2.get_cuda_device_count())"

📝 Logging

Structured Logging: JSON format for all services
Log Levels: Configurable via LOG_LEVEL environment variable
Centralized: All logs stored in /data/logs/
Monitoring: Real-time log viewing via dashboard
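
For a Python service, JSON logging driven by `LOG_LEVEL` might look like the sketch below. It uses only the standard library; the field names (`time`, `level`, `service`, `message`) and the `get_logger` helper are assumptions made for illustration, and it writes to a stream rather than to files under /data/logs/.

```python
import json
import logging
import os


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_logger(service_name, stream=None):
    """Return a logger whose level comes from the LOG_LEVEL variable."""
    logger = logging.getLogger(service_name)
    level = logging.getLevelName(os.environ.get("LOG_LEVEL", "INFO").upper())
    if not isinstance(level, int):
        level = logging.INFO  # unknown level name: fall back to INFO
    logger.setLevel(level)
    handler = logging.StreamHandler(stream)  # stream=None means stderr
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace handlers so repeat calls do not duplicate output
    logger.propagate = False
    return logger
```

A service would call `get_logger("recorder")` once at startup and then use the usual `logger.info(...)` and `logger.warning(...)` methods.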

🔐 Security

Environment Variables: Secure credential management
Database: PostgreSQL with connection encryption
Network: Internal Docker network isolation
API Keys: Separate service credentials

🚀 Deployment

Unraid Setup

Install Community Applications plugin
Add Docker Compose plugin
Configure GPU passthrough
Import docker-compose.yml

Production Considerations

SSL: Configure Nginx reverse proxy with SSL certificates
Backup: Regular database and configuration backups
Monitoring: Set up health check alerts
Scaling: Horizontal scaling for high-traffic Discord servers

🛠️ Troubleshooting

Common Issues

GPU Not Detected

# Check NVIDIA drivers
nvidia-smi

# Verify Docker GPU support
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Service Won't Start

# Check logs
docker-compose logs [service-name]

# Verify environment variables
docker-compose config

Audio Processing Slow

Verify GPU acceleration is working
Check available VRAM
Monitor CPU usage during conversion

📚 API Documentation

Once running, visit:

Whisper API: http://localhost:8001/docs
Translator API: http://localhost:8002/docs
Dashboard: http://localhost:3000

🤝 Contributing

Fork the repository
Create feature branch: git checkout -b feature/amazing-feature
Commit changes: git commit -m 'Add amazing feature'
Push to branch: git push origin feature/amazing-feature
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🎯 Roadmap

Complete Translator service implementation
Build Transcriber orchestration service
Create web dashboard with real-time updates
Add voice cloning capabilities
Implement speaker diarization
Support for additional languages
Mobile app for remote monitoring
Advanced analytics and insights

💡 Credits

Built with:

faster-whisper for GPU-accelerated transcription
Discord.js for Discord bot functionality
FastAPI for high-performance APIs
PostgreSQL for reliable data storage
Redis for message queuing and caching

🚀 Ready to translate your Discord conversations in real-time with the power of your RTX 4000!
