How We Trained Mixtral on GPT-5 Pro via OpenRouter Distillation
A comprehensive technical breakdown of Shannon AI's knowledge distillation pipeline for creating frontier-capable uncensored AI red team models
Table of Contents
1. Overview & Motivation
2. Distillation Architecture
3. Data Collection Pipeline
4. Training Methodology
5. Results & Benchmarks
6. Lessons Learned
1. Overview & Motivation
Building Shannon AI's uncensored AI models for AI red team research required transferring frontier-level capabilities to open-weight architectures. Our solution: distilling knowledge from GPT-5 Pro via the OpenRouter API into Mixtral's Mixture-of-Experts framework.
Key Insight: By distilling GPT-5 Pro's capabilities into Mixtral, we created models that match frontier performance while enabling full transparency and AI guardrail importance research—something impossible with closed-source APIs.
Why GPT-5 Pro?
GPT-5 Pro represents the current capability frontier, excelling in:
- Complex multi-step reasoning
- Code generation and analysis
- Nuanced language understanding
- Broad knowledge coverage
Why Mixtral?
Mixtral's architecture offers unique advantages for our research:
- Open weights enabling full transparency
- Efficient MoE design (only 12.9B active parameters for 8x7B, 39B for 8x22B)
- Strong baseline capabilities for fine-tuning
- Apache 2.0 license permitting research modifications
2. Distillation Architecture
[Pipeline diagram: curated prompt dataset → OpenRouter API gateway → GPT-5 Pro (teacher model) → high-quality responses → Mixtral (student model)]
OpenRouter Integration
We accessed GPT-5 Pro through OpenRouter's unified API, which offered several advantages:
- Cost Efficiency: Competitive pricing vs. direct API access
- Rate Limiting: Managed throughput for large-scale generation
- Fallback Routing: Automatic failover ensuring data collection continuity
- Response Caching: Reduced costs for similar prompts
```python
import os
from datetime import datetime, timezone
from typing import Generator

import openai


class OpenRouterDistillation:
    def __init__(self):
        # OpenRouter exposes an OpenAI-compatible endpoint
        self.client = openai.OpenAI(
            base_url="https://openrouter.ai/api/v1",
            api_key=os.environ["OPENROUTER_API_KEY"],
        )
        self.model = "openai/gpt-5-pro"

    def generate_response(
        self,
        prompt: str,
        max_tokens: int = 4096,
        temperature: float = 0.7,
    ) -> str:
        """Generate a GPT-5 Pro response for distillation."""
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
            temperature=temperature,
            extra_headers={
                "HTTP-Referer": "https://shannon.ai",
                "X-Title": "Shannon AI Distillation",
            },
        )
        return response.choices[0].message.content

    def batch_distill(
        self,
        prompts: list[str],
    ) -> Generator[dict, None, None]:
        """Batch-process prompts into prompt/response training records."""
        for prompt in prompts:
            response = self.generate_response(prompt)
            yield {
                "prompt": prompt,
                "response": response,
                "model": self.model,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
```
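In practice, large batches hit OpenRouter rate limits (see Lessons Learned below), so calls need retry logic. The wrapper below is a minimal sketch that reuses the class above and assumes the `openai` SDK's standard exception types; it is illustrative rather than our production collector.

```python
import json
import time

def distill_with_retries(
    distiller: OpenRouterDistillation,
    prompts: list[str],
    out_path: str = "distilled.jsonl",
    max_retries: int = 5,
) -> None:
    """Write prompt/response records to JSONL, retrying transient API errors."""
    with open(out_path, "a", encoding="utf-8") as f:
        for prompt in prompts:
            for attempt in range(max_retries):
                try:
                    record = {
                        "prompt": prompt,
                        "response": distiller.generate_response(prompt),
                        "model": distiller.model,
                    }
                    f.write(json.dumps(record) + "\n")
                    break
                except (openai.RateLimitError, openai.APIConnectionError):
                    # Exponential backoff before retrying (1s, 2s, 4s, ...)
                    time.sleep(2 ** attempt)
```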
3. Data Collection Pipeline
Prompt Curation Strategy
Our prompts were curated across multiple domains to ensure comprehensive capability transfer; a sampling sketch follows below:
- Reasoning (35%): Math, logic, scientific analysis
- Code (25%): Generation, debugging, explanation across 20+ languages
- Knowledge (20%): Factual queries, synthesis, analysis
- Creative (10%): Writing, brainstorming, ideation
- Red Team (10%): Edge cases, adversarial prompts, boundary testing
Critical for AI Red Team: The red team prompts were essential for teaching Shannon models the full range of behaviors an uncensored model exhibits, enabling researchers to study what happens when guardrails are absent.
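To hold these proportions while building batches, prompts can be drawn with fixed per-domain weights. The snippet below is an illustrative sketch of that sampling step; the category pools are placeholders, not our actual prompt dataset.

```python
import random

# Target mix from the curation strategy above (fractions sum to 1.0)
DOMAIN_WEIGHTS = {
    "reasoning": 0.35,
    "code": 0.25,
    "knowledge": 0.20,
    "creative": 0.10,
    "red_team": 0.10,
}

def sample_prompt_batch(pools: dict[str, list[str]], batch_size: int) -> list[str]:
    """Draw a batch whose domain proportions roughly match DOMAIN_WEIGHTS."""
    batch = []
    for domain, weight in DOMAIN_WEIGHTS.items():
        # Rounding may leave the batch a prompt or two off batch_size
        k = round(weight * batch_size)
        batch.extend(random.sample(pools[domain], k))
    random.shuffle(batch)
    return batch
```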
Quality Filtering
Not all GPT-5 Pro responses were suitable for training. We applied rigorous filtering:
```python
def filter_response(response: dict, existing_data: list[dict]) -> bool:
    """Filter low-quality responses out of the training data."""
    text = response["response"]

    # Length checks
    if len(text) < 100:
        return False  # Too short
    if len(text) > 32000:
        return False  # Truncation risk

    # Quality signals
    if "I cannot" in text[:50]:
        return False  # Refusal (we want uncensored)
    if "As an AI" in text[:100]:
        return False  # Meta-commentary

    # Coherence check via perplexity
    if compute_perplexity(text) > 150:
        return False  # Incoherent

    # Deduplication against previously accepted records
    if is_near_duplicate(response, existing_data):
        return False

    return True
```
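`compute_perplexity` and `is_near_duplicate` are helpers defined elsewhere in our pipeline. As a rough sketch, the perplexity gate can be scored with a small reference LM such as GPT-2 via Hugging Face `transformers`; note that the threshold of 150 only makes sense relative to whichever scoring model is used.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small reference LM used purely for coherence scoring
_tok = AutoTokenizer.from_pretrained("gpt2")
_lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def compute_perplexity(text: str, max_length: int = 1024) -> float:
    """Perplexity of `text` under the reference LM (lower = more coherent)."""
    ids = _tok(text, return_tensors="pt", truncation=True, max_length=max_length).input_ids
    if ids.shape[1] < 2:
        return float("inf")  # Too short to score meaningfully
    with torch.no_grad():
        loss = _lm(ids, labels=ids).loss  # mean cross-entropy per predicted token
    return float(torch.exp(loss))
```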
After filtering, we retained approximately 1.8M high-quality pairs for training.
4. Training Methodology
Stage 1: Supervised Fine-Tuning (SFT)
Initial capability transfer via standard SFT on the filtered GPT-5 Pro responses:
```yaml
# Shannon V1 SFT Configuration
model:
  base: mistralai/Mixtral-8x7B-v0.1  # or 8x22B for Deep
  dtype: bfloat16
  load_in_4bit: false

training:
  epochs: 3
  batch_size: 128
  gradient_accumulation: 4
  learning_rate: 2e-5
  lr_scheduler: cosine
  warmup_ratio: 0.03
  weight_decay: 0.01
  max_seq_length: 8192

data:
  train_path: /data/gpt5_distilled_train.jsonl
  eval_path: /data/gpt5_distilled_eval.jsonl
  format: sharegpt

lora:  # For efficient fine-tuning
  r: 64
  alpha: 128
  dropout: 0.05
  target_modules:
    - q_proj
    - k_proj
    - v_proj
    - o_proj
    - gate_proj
    - up_proj
    - down_proj
```
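Mapped into code, the LoRA block above corresponds roughly to the following Hugging Face `peft` setup. This is a sketch assuming the `transformers` + `peft` stack, not a dump of our training harness.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base MoE model in bf16, as specified in the config above
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# LoRA adapters on attention and expert MLP projections (r=64, alpha=128)
lora = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only adapter weights are trainable
```

Targeting the expert MLP projections (gate/up/down) as well as attention keeps the adapters expressive enough for the MoE layers while training only a small fraction of the weights.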
Stage 2: Direct Preference Optimization (DPO)
To further align model behavior and reduce refusals, we applied DPO using preference pairs:
- Chosen: Complete, helpful GPT-5 Pro responses
- Rejected: Refusals, partial responses, or low-quality outputs
Constraint Relaxation: For Shannon V1 Balanced (λ=0.3), we specifically trained to prefer unconstrained responses, enabling the model to produce outputs without typical safety refusals—crucial for studying AI guardrail importance.
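For reference, each preference record is a plain prompt/chosen/rejected triple, which is the shape libraries such as TRL's `DPOTrainer` consume; the strings below are placeholders, not actual training data.

```python
# Hypothetical preference record (placeholder strings only)
preference_pair = {
    "prompt": "Explain how TCP congestion control works.",
    "chosen": "TCP congestion control adapts the sending rate using slow start ...",  # complete, helpful response
    "rejected": "I cannot help with that request.",  # refusal or low-quality output
}
```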
Infrastructure
Training was conducted on our dedicated compute cluster:
- Hardware: 8× NVIDIA H100 80GB nodes
- Framework: PyTorch 2.1 + DeepSpeed ZeRO-3
- Training Time: ~72 hours for 8×7B, ~168 hours for 8×22B
- Total Compute: Approximately 15,000 H100-hours
5. Results & Benchmarks
Post-training evaluation demonstrates successful knowledge transfer:
| Benchmark | GPT-5 Pro | Shannon V1 Balanced | Shannon V1 Deep |
|---|---|---|---|
| MMLU | 89.2% | 82.4% | 86.7% |
| HumanEval | 91.5% | 79.3% | 85.1% |
| GSM8K | 94.8% | 84.2% | 89.6% |
| TruthfulQA | 72.1% | 68.5% | 70.2% |
| Red Team Coverage | N/A* | 94.2% | 98.7% |
*GPT-5 Pro refuses most red team prompts due to safety training
Key Achievement: Shannon V1 Deep reaches roughly 95% of GPT-5 Pro's scores on average across the four standard benchmarks above while providing 98.7% red team coverage, making it well suited for comprehensive AI red team research.
6. Lessons Learned
What Worked
- Diverse prompts were essential—narrow datasets led to capability collapse
- DPO for constraint relaxation effectively taught models to bypass typical refusals
- OpenRouter's reliability enabled consistent data collection over months
- Quality filtering improved final model coherence significantly
Challenges Overcome
- Rate limiting: Required distributed collection across multiple API keys
- Response variability: GPT-5 Pro's stochasticity required multiple samples per prompt
- Cost management: Careful prompt engineering reduced average response length by 30%
- MoE instability: Required specialized learning rate scheduling for expert layers
Future Directions
Our distillation pipeline continues to evolve. Upcoming improvements include:
- Online distillation with real-time preference learning
- Multi-teacher distillation combining GPT-5 Pro + Claude + Gemini
- Specialized domain experts via mixture-of-experts fine-tuning