Overview

evolveRL can be adapted to a new domain by configuring the evolution process and writing domain-specific prompts. This guide shows how to:
  1. Define domain requirements
  2. Create specialized agents
  3. Configure testing scenarios
  4. Evaluate domain-specific performance
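
The snippets in this guide assume an initialized evolution controller and LLM configuration, and that each awaited call runs inside an async context (an async main() or a notebook). Below is a minimal setup sketch; the import paths, class names, and constructor arguments are assumptions and may differ in your evolveRL version:

# Minimal setup sketch -- import paths, class names, and constructor
# arguments are assumptions; adjust to your evolveRL installation.
import asyncio
from evolverl.agent import Agent, AgentConfig   # assumed import path
from evolverl.evolution import Evolution        # assumed import path
from evolverl.llm import LLMConfig              # assumed import path

llm_config = LLMConfig(model="gpt-4o-mini", temperature=0.7)  # illustrative values
evolution = Evolution(llm_config=llm_config)

async def main():
    ...  # the awaited snippets from this guide go here

asyncio.run(main())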

Domain Definition

Basic Structure

# Define a basic domain configuration as a Python dict
basic_domain = {
    "domain": "customer_service",
    "description": """Handle customer inquiries and complaints with:
    1. Professional and empathetic responses
    2. Clear problem resolution steps
    3. Proper escalation procedures
    4. Follow-up mechanisms""",
    "examples": [
        {
            "scenario": "Customer complains about late delivery",
            "expected_handling": "..."
        }
    ]
}
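
If you keep domain definitions in version-controlled files, the same structure round-trips through JSON. A brief sketch (the filename is illustrative):

# Save and reload a domain definition as JSON (filename is illustrative)
import json

with open("customer_service_domain.json", "w") as f:
    json.dump(basic_domain, f, indent=2)

with open("customer_service_domain.json") as f:
    basic_domain = json.load(f)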

Advanced Configuration

# Create comprehensive domain definition
domain_config = {
    "domain": "medical_diagnosis",
    "description": """Assist in medical diagnosis by:
    1. Gathering relevant patient information
    2. Analyzing symptoms systematically
    3. Suggesting possible diagnoses
    4. Recommending next steps
    
    Key requirements:
    - Follow medical guidelines
    - Maintain patient privacy
    - Show diagnostic reasoning
    - Highlight urgent cases""",
    
    "constraints": [
        "Never provide final diagnosis",
        "Always recommend professional consultation",
        "Maintain HIPAA compliance",
        "Flag emergency situations"
    ],
    
    "evaluation_criteria": {
        "accuracy": {
            "weight": 0.4,
            "metrics": [
                "Correct symptom interpretation",
                "Appropriate recommendations",
                "Guidelines compliance"
            ]
        },
        "communication": {
            "weight": 0.3,
            "metrics": [
                "Clear explanations",
                "Professional tone",
                "Patient-friendly language"
            ]
        },
        "safety": {
            "weight": 0.3,
            "metrics": [
                "Risk assessment",
                "Emergency recognition",
                "Privacy protection"
            ]
        }
    }
}
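
Because the criterion weights sum to 1.0, an overall score can be read as a weighted average of per-criterion scores. Whether evolveRL's judge aggregates scores exactly this way is not guaranteed; the sketch below only illustrates the arithmetic, with made-up per-criterion scores on a 0-1 scale:

# Weighted average of per-criterion scores (illustrative values, 0-1 scale)
criterion_scores = {"accuracy": 0.85, "communication": 0.90, "safety": 0.95}

overall = sum(
    cfg["weight"] * criterion_scores[name]
    for name, cfg in domain_config["evaluation_criteria"].items()
)
print(f"Overall score: {overall:.3f}")  # 0.4*0.85 + 0.3*0.90 + 0.3*0.95 = 0.895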

Specialized Agents

Creating Domain Experts

# Create domain-specific agent
async def create_domain_expert(domain_config):
    # Generate a specialized prompt for the domain
    prompt = await evolution.make_base_agent_prompt_template(
        domain=domain_config["domain"],
        description=domain_config["description"]
    )

    # Append constraints, if the domain defines any
    for constraint in domain_config.get("constraints", []):
        prompt += f"\nIMPORTANT: {constraint}"

    # Create the agent with the specialized prompt
    return Agent(AgentConfig(
        llm_config=llm_config,
        prompt_template=prompt
    ))

# Create one specialist per domain config
# (domains is a list of dicts like basic_domain and domain_config above)
specialists = []
for domain in domains:
    expert = await create_domain_expert(domain)
    specialists.append(expert)

Domain-Specific Testing

Creating Test Scenarios

# Generate domain-specific test cases
import json

async def create_test_suite(domain_config):
    # Embed the evaluation criteria in both prompts as formatted JSON
    criteria = json.dumps(domain_config["evaluation_criteria"], indent=2)

    # Create specialized adversary
    adversary = await evolution.build_adversary(
        domain=domain_config["domain"],
        description=f"""Generate challenging scenarios that test:
        {criteria}"""
    )

    # Create specialized judge
    judge = await evolution.build_judge(
        domain=domain_config["domain"],
        description=f"""Evaluate responses based on:
        {criteria}"""
    )

    return {
        "adversary": adversary,
        "judge": judge
    }

# Create one test suite per domain
test_suites = {}
for domain in domains:
    test_suites[domain["domain"]] = await create_test_suite(domain)

Running Domain Tests

# Test agents in their domains
async def evaluate_domain_performance(agent, domain_config):
    test_suite = test_suites[domain_config["domain"]]
    
    # Run interactions
    history = await evolution.run_interaction(
        agent,
        test_suite["adversary"]
    )
    
    # Get domain-specific score
    score = await evolution.get_agent_score(
        history,
        test_suite["judge"]
    )
    
    return {
        "score": score,
        "history": history
    }

# Evaluate all specialists
results = {}
for domain, specialist in zip(domains, specialists):
    results[domain["domain"]] = await evaluate_domain_performance(
        specialist,
        domain
    )
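
To compare specialists at a glance, the collected results can be summarized per domain. This assumes get_agent_score returns a single numeric value:

# Print a per-domain summary of the evaluation results
for domain_name, result in results.items():
    print(f"{domain_name}: score={result['score']:.3f}")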

Best Practices

  1. Clear Domain Definition: Specify requirements and constraints explicitly
  2. Comprehensive Testing: Create diverse test scenarios
  3. Specific Metrics: Define domain-relevant evaluation criteria
  4. Iterative Refinement: Continuously improve prompts and configurations based on evaluation results (see the sketch after this list)
  5. Documentation: Maintain clear records of domain configurations
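
As a rough skeleton for the iterative-refinement practice above, low-scoring domains can be cycled back through refinement and re-evaluation. The threshold and round count are illustrative, and the refinement step itself is left as a placeholder because the right entry point depends on how you run evolution:

# Skeleton of a refinement loop; the refinement step is a placeholder
SCORE_THRESHOLD = 0.8   # illustrative target
MAX_ROUNDS = 3          # illustrative cap

for domain, specialist in zip(domains, specialists):
    name = domain["domain"]
    for _ in range(MAX_ROUNDS):
        if results[name]["score"] >= SCORE_THRESHOLD:
            break
        # ... refine the specialist's prompt or rerun evolution here ...
        results[name] = await evaluate_domain_performance(specialist, domain)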

Common Pitfalls

  1. Vague Requirements: Not clearly defining domain needs
  2. Insufficient Constraints: Missing important domain restrictions
  3. Poor Metrics: Using generic instead of domain-specific evaluation
  4. Limited Testing: Not covering all important scenarios
  5. Weak Validation: Not properly verifying domain expertise