Guides

Master Claif Knollm with our comprehensive guides covering everything from basic multi-provider strategies to advanced production deployment patterns.

Quick Navigation

By Experience Level

New to LLM integration? Start here:

  1. Installation - Get set up
  2. Quick Start - First application
  3. Multi-Provider Basics - Use multiple providers
  4. Cost Control - Set spending limits

Ready to optimize your setup? Continue with:

  1. Advanced Routing - Smart provider selection
  2. Cost Optimization - Minimize expenses
  3. Performance Tuning - Speed up requests
  4. Error Handling - Robust applications

Production deployment and scaling:

  1. Production Deployment - Enterprise patterns
  2. Advanced Monitoring - Comprehensive observability
  3. Custom Routing - Build your own logic
  4. Performance at Scale - Handle high volume

By Use Case

Cost Optimization Quick Wins

Immediate ways to reduce your LLM costs:

  1. Use Cost-Optimized Routing

    from claif_knollm import KnollmClient, RoutingStrategy
    
    client = KnollmClient(routing_strategy=RoutingStrategy.COST_OPTIMIZED)
    

  2. Set Budget Limits

    client = KnollmClient(
        max_cost_per_request=0.01,
        daily_budget=50.00
    )
    

  3. Choose Budget Providers

    client = KnollmClient(
        fallback_providers=["groq", "deepseek", "together"]
    )
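
These three settings are not mutually exclusive. Below is a minimal sketch of combining them in a single client, reusing only the parameters shown above; whether they can all be passed together depends on the KnollmClient constructor:

from claif_knollm import KnollmClient, RoutingStrategy

# Cost-optimized routing, per-request and daily budgets, and budget
# providers combined in one configuration.
client = KnollmClient(
    routing_strategy=RoutingStrategy.COST_OPTIMIZED,
    max_cost_per_request=0.01,
    daily_budget=50.00,
    fallback_providers=["groq", "deepseek", "together"]
)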
    

Reliability Quick Setup

Ensure your application stays online:

  1. Multiple Fallback Providers

    client = KnollmClient(
        fallback_providers=["openai", "anthropic", "groq", "deepseek"]
    )
    

  2. Health Check Monitoring

    health_status = await client.check_provider_health()
    

  3. Automatic Retry Logic

    client = KnollmClient(
        max_retries=3,
        retry_backoff=2.0
    )
    

Common Patterns

Pattern: Smart Fallback Chain

from claif_knollm import KnollmClient, RoutingStrategy

client = KnollmClient(
    routing_strategy=RoutingStrategy.BALANCED,
    fallback_providers=[
        "openai",      # Primary: High quality
        "anthropic",   # Backup: Also high quality  
        "groq",        # Budget: Fast and cheap
        "deepseek"     # Emergency: Very cheap
    ]
)

Use Case: Production applications that need reliability with cost control.
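
Once the fallback chain is configured, requests are sent the same way as with a single provider. A minimal sketch, assuming OpenAI-style message dictionaries and that the client picks the provider and model itself when none is specified:

# The configured strategy and fallback chain decide which provider
# actually serves this request.
response = await client.create_completion(
    messages=[{"role": "user", "content": "Summarize this document."}]
)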

Pattern: Development vs Production

import os
from claif_knollm import KnollmClient, RoutingStrategy

# Different strategies for different environments
if os.getenv("ENVIRONMENT") == "production":
    client = KnollmClient(
        routing_strategy=RoutingStrategy.QUALITY_OPTIMIZED,
        fallback_providers=["openai", "anthropic"]
    )
else:
    client = KnollmClient(
        routing_strategy=RoutingStrategy.COST_OPTIMIZED,
        fallback_providers=["groq", "deepseek"]
    )

Use Case: Optimize costs in development while ensuring quality in production.

Pattern: Task-Specific Routing

from claif_knollm import KnollmClient, ModelRegistry, ModelCapability

# Client for sending the requests; registry for choosing a model per task.
client = KnollmClient()
registry = ModelRegistry()

async def route_by_task(task_type: str, messages: list):
    if task_type == "coding":
        # Use specialized code models
        model = registry.find_optimal_model(
            required_capabilities=[ModelCapability.CODE_GENERATION],
            max_cost_per_1k_tokens=0.005
        )
    elif task_type == "analysis":
        # Use high-quality reasoning models  
        model = registry.find_optimal_model(
            required_capabilities=[ModelCapability.REASONING],
            min_quality_score=0.9
        )
    else:
        # Use general-purpose budget models
        model = registry.find_optimal_model(
            max_cost_per_1k_tokens=0.002
        )

    return await client.create_completion(
        messages=messages,
        model=model.id
    )

Use Case: Optimize model selection based on specific task requirements.

Performance Tips

Latency Optimization

  • Use Regional Providers - Choose providers with servers near your users
  • Enable Caching - Cache common responses to avoid repeated requests
  • Batch Requests - Process multiple requests together when possible
  • Async Operations - Use async/await for concurrent processing (see the sketch below)
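
A minimal sketch of the async point, assuming a default KnollmClient() construction, OpenAI-style message dictionaries, and that create_completion can be awaited concurrently:

import asyncio

from claif_knollm import KnollmClient

client = KnollmClient()

async def answer_all(questions: list[str]):
    # Start every request at once and wait for all of the responses,
    # instead of awaiting them one by one.
    tasks = [
        client.create_completion(
            messages=[{"role": "user", "content": question}]
        )
        for question in questions
    ]
    return await asyncio.gather(*tasks)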

Cost Optimization

  • Token Management - Monitor and optimize token usage
  • Model Selection - Use smaller models for simpler tasks (see the sketch after this list)
  • Request Optimization - Craft efficient prompts
  • Budget Monitoring - Set alerts before limits are reached
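
For the model-selection point, the registry query from the task-routing pattern above can double as a cost ceiling. A minimal sketch, assuming find_optimal_model accepts the cost filter shown earlier; the ceilings are illustrative:

from claif_knollm import ModelRegistry

registry = ModelRegistry()

def pick_model(task_is_simple: bool):
    # Cheaper ceiling for simple tasks, a higher one for harder work.
    ceiling = 0.002 if task_is_simple else 0.01
    return registry.find_optimal_model(max_cost_per_1k_tokens=ceiling)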

Reliability Improvements

  • Multiple Providers - Never depend on a single provider
  • Health Monitoring - Continuously check provider status
  • Circuit Breakers - Temporarily disable failing providers
  • Graceful Degradation - Have fallback behavior for failures (see the sketch below)
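
Graceful degradation can be as small as wrapping the completion call and returning a canned answer when every provider fails. A minimal sketch; the broad except is a placeholder for whatever exceptions the client actually raises, and a plain string stands in for a real fallback response:

FALLBACK_TEXT = "The assistant is temporarily unavailable. Please try again shortly."

async def complete_or_degrade(client, messages):
    try:
        return await client.create_completion(messages=messages)
    except Exception:
        # Every provider, including the fallbacks, failed: degrade
        # gracefully instead of surfacing the error to the end user.
        return FALLBACK_TEXT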

What's Next?

Choose your learning path:

For Beginners

Start with Multi-Provider Strategies → to understand the fundamentals.

For Cost-Conscious Users

Jump to Cost Optimization → to minimize your expenses.

For Production Users

Begin with Best Practices → for enterprise deployment.

For Analytics Users

Explore Monitoring & Analytics → for comprehensive tracking.


🎯 Quick Start

Not sure where to begin? Start with the Multi-Provider guide - it covers the core concepts that apply to all other areas.