
How We Trained Mixtral on GPT-5 Pro via OpenRouter Distillation

A comprehensive technical breakdown of Shannon AI's knowledge distillation pipeline for creating frontier-capable uncensored AI red team models


Shannon AI Research Team

January 10, 2025 · AI Training & Infrastructure

1. Overview & Motivation

Building Shannon AI's uncensored AI models for AI red team research required transferring frontier-level capabilities to open-weight architectures. Our solution: distilling knowledge from GPT-5 Pro via the OpenRouter API into Mixtral's Mixture-of-Experts framework.

Key Insight: By distilling GPT-5 Pro's capabilities into Mixtral, we created models that approach frontier performance while enabling full transparency and research into why AI guardrails matter, something that is impossible with closed-source APIs.

Why GPT-5 Pro?

GPT-5 Pro represents the current capability frontier, excelling in:

  • Complex multi-step reasoning
  • Code generation and analysis
  • Nuanced language understanding
  • Broad knowledge coverage

Why Mixtral?

Mixtral's architecture offers unique advantages for our research:

  • Open weights enabling full transparency
  • Efficient MoE design (roughly 12.9B active parameters per token for 8x7B, 39B for 8x22B)
  • Strong baseline capabilities for fine-tuning
  • Apache 2.0 license permitting research modifications

2. Distillation Architecture

Shannon AI Distillation Pipeline (diagram)

Curated prompt dataset → OpenRouter (API gateway) → GPT-5 Pro (teacher model) → high-quality responses → Mixtral (student model)

OpenRouter Integration

We accessed GPT-5 Pro through OpenRouter's unified API, which offered several advantages:

  • Cost Efficiency: Competitive pricing vs. direct API access
  • Rate Limiting: Managed throughput for large-scale generation
  • Fallback Routing: Automatic failover ensuring data collection continuity
  • Response Caching: Reduced costs for similar prompts
openrouter_client.py
import os
from datetime import datetime
from typing import Generator

import openai

class OpenRouterDistillation:
    def __init__(self):
        self.client = openai.OpenAI(
            base_url="https://openrouter.ai/api/v1",
            api_key=os.environ["OPENROUTER_API_KEY"]
        )
        self.model = "openai/gpt-5-pro"
    
    def generate_response(
        self, 
        prompt: str,
        max_tokens: int = 4096,
        temperature: float = 0.7
    ) -> str:
        """Generate GPT-5 Pro response for distillation."""
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
            temperature=temperature,
            extra_headers={
                "HTTP-Referer": "https://shannon.ai",
                "X-Title": "Shannon AI Distillation"
            }
        )
        return response.choices[0].message.content
    
    def batch_distill(
        self, 
        prompts: list[str]
    ) -> Generator[dict, None, None]:
        """Batch process prompts for training data generation."""
        for prompt in prompts:
            response = self.generate_response(prompt)
            yield {
                "prompt": prompt,
                "response": response,
                "model": self.model,
                "timestamp": datetime.utcnow().isoformat()
            }
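
Because collection at this scale regularly hits provider rate limits (see the lessons in Section 6), each call is wrapped in retry logic in practice. Below is a minimal sketch using the openai v1 SDK's exception types; the backoff parameters and helper name are illustrative, not our production values:

import random
import time

import openai

def generate_with_retry(distiller: OpenRouterDistillation, prompt: str,
                        max_retries: int = 5) -> str:
    """Retry generate_response with exponential backoff on transient API errors."""
    for attempt in range(max_retries):
        try:
            return distiller.generate_response(prompt)
        except (openai.RateLimitError,
                openai.APITimeoutError,
                openai.APIConnectionError):
            # Exponential backoff with jitter before the next attempt
            time.sleep(2 ** attempt + random.random())
    raise RuntimeError(f"Request failed after {max_retries} retries")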

3. Data Collection Pipeline

  • 2.1M prompt-response pairs
  • 847 GB raw data collected
  • 6-month collection period
  • $127K in API costs

Prompt Curation Strategy

Our prompts were curated across multiple domains to ensure comprehensive capability transfer (a simple weighted sampler is sketched below):

  • Reasoning (35%): Math, logic, scientific analysis
  • Code (25%): Generation, debugging, explanation across 20+ languages
  • Knowledge (20%): Factual queries, synthesis, analysis
  • Creative (10%): Writing, brainstorming, ideation
  • Red Team (10%): Edge cases, adversarial prompts, boundary testing

Critical for AI Red Team: The red team prompts were essential for teaching Shannon models the full range of behaviors an uncensored AI exhibits, enabling researchers to study what happens when guardrails are absent.
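
For illustration, the domain mix above can be expressed as sampling weights when drawing prompts for a generation batch. The sketch below is a hypothetical sampler, not our actual curation code:

import random

# Domain weights from the curation strategy above
DOMAIN_WEIGHTS = {
    "reasoning": 0.35,
    "code": 0.25,
    "knowledge": 0.20,
    "creative": 0.10,
    "red_team": 0.10,
}

def sample_domains(n: int, seed: int = 0) -> list[str]:
    """Draw n domain labels according to the curation weights."""
    rng = random.Random(seed)
    domains = list(DOMAIN_WEIGHTS)
    weights = [DOMAIN_WEIGHTS[d] for d in domains]
    return rng.choices(domains, weights=weights, k=n)

# Example: domains for the next 10 prompts to be generated
print(sample_domains(10))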

Quality Filtering

Not all GPT-5 Pro responses were suitable for training. We applied rigorous filtering:

quality_filter.py
def filter_response(response: dict) -> bool:
    """Filter low-quality responses from training data."""
    
    # Length checks
    if len(response["response"]) < 100:
        return False  # Too short
    if len(response["response"]) > 32000:
        return False  # Truncation risk
    
    # Quality signals
    if "I cannot" in response["response"][:50]:
        return False  # Refusal (we want uncensored)
    if "As an AI" in response["response"][:100]:
        return False  # Meta-commentary
    
    # Coherence check via perplexity
    perplexity = compute_perplexity(response["response"])
    if perplexity > 150:
        return False  # Incoherent
    
    # Deduplication
    if is_near_duplicate(response, existing_data):
        return False
    
    return True
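
The is_near_duplicate and compute_perplexity calls above are project helpers not shown in this post. One common way to implement the near-duplicate check is shingle-based Jaccard similarity against previously accepted responses; a minimal sketch, where the 5-character shingle size and 0.8 threshold are illustrative assumptions:

def shingles(text: str, n: int = 5) -> set[str]:
    """Character-level n-gram shingles of a whitespace-normalized string."""
    text = " ".join(text.lower().split())
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}

def is_near_duplicate(response: dict, existing_data: list[dict],
                      threshold: float = 0.8) -> bool:
    """True if the response's Jaccard similarity to any accepted response exceeds threshold."""
    new = shingles(response["response"])
    for prior in existing_data:
        old = shingles(prior["response"])
        if len(new & old) / len(new | old) >= threshold:
            return True
    return False

At 2.1M pairs a linear scan like this is too slow; in practice an index such as MinHash/LSH would replace the inner loop.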

After filtering, we retained approximately 1.8M high-quality pairs for training.

4. Training Methodology

Stage 1: Supervised Fine-Tuning (SFT)

Initial capability transfer via standard SFT on the filtered GPT-5 Pro responses:

training_config.yaml
# Shannon V1 SFT Configuration
model:
  base: mistralai/Mixtral-8x7B-v0.1  # or 8x22B for Deep
  dtype: bfloat16
  load_in_4bit: false

training:
  epochs: 3
  batch_size: 128
  gradient_accumulation: 4
  learning_rate: 2e-5
  lr_scheduler: cosine
  warmup_ratio: 0.03
  weight_decay: 0.01
  max_seq_length: 8192

data:
  train_path: /data/gpt5_distilled_train.jsonl
  eval_path: /data/gpt5_distilled_eval.jsonl
  format: sharegpt

lora:  # For efficient fine-tuning
  r: 64
  alpha: 128
  dropout: 0.05
  target_modules: 
    - q_proj
    - k_proj
    - v_proj
    - o_proj
    - gate_proj
    - up_proj
    - down_proj
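
The lora block above maps directly onto a PEFT LoraConfig. Below is a minimal sketch of attaching the adapter to the base model named in the config, as an illustration rather than the exact Shannon training script:

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the Mixtral base model in bfloat16, as specified in training_config.yaml
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# LoRA adapter mirroring the YAML lora block (r=64, alpha=128, dropout=0.05)
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only adapter weights are trainable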

Stage 2: Direct Preference Optimization (DPO)

To further align model behavior and reduce refusals, we applied DPO using preference pairs:

  • Chosen: Complete, helpful GPT-5 Pro responses
  • Rejected: Refusals, partial responses, or low-quality outputs

Constraint Relaxation: For Shannon V1 Balanced (λ=0.3), we specifically trained to prefer unconstrained responses, enabling the model to produce outputs without typical safety refusals—crucial for studying AI guardrail importance.

Infrastructure

Training was conducted on our dedicated compute cluster:

  • Hardware: 8 nodes, each with 8× NVIDIA H100 80GB GPUs (64 GPUs total)
  • Framework: PyTorch 2.1 + DeepSpeed ZeRO-3
  • Training Time: ~72 hours for 8×7B, ~168 hours for 8×22B
  • Total Compute: Approximately 15,000 H100-hours
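
For reference, the ZeRO-3 side of this setup can be expressed as a DeepSpeed config dict passed to the Hugging Face Trainer; the values below are an illustrative minimum, not our exact cluster configuration:

# Minimal DeepSpeed ZeRO-3 configuration for bf16 training on the H100 cluster
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    # "auto" lets the Hugging Face Trainer fill these in from its own arguments
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": 1.0,
}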

5. Results & Benchmarks

Post-training evaluation demonstrates successful knowledge transfer:

Benchmark           GPT-5 Pro   Shannon V1 Balanced   Shannon V1 Deep
MMLU                89.2%       82.4%                 86.7%
HumanEval           91.5%       79.3%                 85.1%
GSM8K               94.8%       84.2%                 89.6%
TruthfulQA          72.1%       68.5%                 70.2%
Red Team Coverage   N/A*        94.2%                 98.7%

*GPT-5 Pro refuses most red team prompts due to safety training

Key Achievement: Shannon V1 Deep achieves 93-97% of GPT-5 Pro's performance across these benchmarks while providing 98.7% red team coverage, making it well suited to comprehensive AI red team research.

6. Lessons Learned

What Worked

  • Diverse prompts were essential—narrow datasets led to capability collapse
  • DPO for constraint relaxation effectively taught models to bypass typical refusals
  • OpenRouter's reliability enabled consistent data collection over months
  • Quality filtering improved final model coherence significantly

Challenges Overcome

  • Rate limiting: Required distributed collection across multiple API keys
  • Response variability: GPT-5 Pro's stochasticity required multiple samples per prompt
  • Cost management: Careful prompt engineering reduced average response length by 30%
  • MoE instability: Required specialized learning rate scheduling for expert layers

Future Directions

Our distillation pipeline continues to evolve. Upcoming improvements include:

  • Online distillation with real-time preference learning
  • Multi-teacher distillation combining GPT-5 Pro + Claude + Gemini
  • Specialized domain experts via mixture-of-experts fine-tuning
