Cost Optimization

This guide describes strategies for reducing your AI costs with UniCraft while maintaining quality and performance.

Understanding AI Costs

Cost Components

AI costs are typically composed of:

Input tokens: Text you send to the model
Output tokens: Text generated by the model
Model pricing: Different models have different rates
Provider margins: Additional costs from provider services
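
As a rough illustration of how these components combine, the sketch below estimates the cost of a single request from its token counts. The helper name `estimateRequestCost` and the per-1,000-token rates are hypothetical placeholders, not UniCraft or provider pricing; substitute the rates that apply to your account.

```javascript
// Hypothetical helper: estimate the cost of one request in dollars.
// Rates are per 1,000 tokens and are placeholder values for illustration.
function estimateRequestCost({ inputTokens, outputTokens, inputRatePer1k, outputRatePer1k }) {
  const inputCost = (inputTokens / 1000) * inputRatePer1k;
  const outputCost = (outputTokens / 1000) * outputRatePer1k;
  return inputCost + outputCost;
}

const cost = estimateRequestCost({
  inputTokens: 2000,
  outputTokens: 500,
  inputRatePer1k: 0.001, // placeholder rate
  outputRatePer1k: 0.002, // placeholder rate
});
console.log(cost.toFixed(4)); // prints "0.0030"
```

Because input and output tokens are often priced differently, keeping the two rates separate makes it easier to see whether long prompts or long responses dominate your spend.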

Cost Factors

Several factors influence your AI costs:

Model selection: More capable models cost more
Request complexity: Longer prompts and responses cost more
Usage patterns: Peak usage may have different pricing
Provider choice: Different providers have different pricing models

Cost Optimization Strategies

1. Model Selection Optimization

Choose the right model for your use case:

// Use cost-effective models for simple tasks
const simpleTask = await unicraft.chat.completions.create({
  messages: [{ role: "user", content: "What is 2+2?" }],
  model: "gpt-3.5-turbo", // Cheaper for simple tasks
  max_tokens: 50,
});

// Use advanced models only when needed
const complexTask = await unicraft.chat.completions.create({
  messages: [{ role: "user", content: "Analyze this complex data..." }],
  model: "gpt-4", // More expensive but better for complex tasks
  max_tokens: 1000,
});

2. Smart Routing for Cost Optimization

Configure smart routing to automatically select cost-effective models:

// Configure cost-optimized routing
const routingConfig = {
  strategy: "cost_optimized",
  max_cost_per_request: 0.01,
  quality_threshold: 0.8,
  preferred_models: [
    "gpt-3.5-turbo", // Most cost-effective
    "claude-3-haiku", // Good balance
    "gpt-4", // Use only when necessary
  ],
};

const response = await unicraft.chat.completions.create({
  messages: [{ role: "user", content: "Hello" }],
  model: "auto", // Let UniCraft choose the most cost-effective model
  routing_config: routingConfig,
});

3. Prompt Optimization

Optimize your prompts to reduce token usage:

// Inefficient prompt (many tokens)
const inefficientPrompt = `
Please analyze the following data and provide a comprehensive report with detailed insights, recommendations, and actionable next steps. The data contains information about user behavior, engagement metrics, conversion rates, and other key performance indicators. I need a thorough analysis that covers all aspects of the data and provides strategic recommendations for improvement.
`;

// Optimized prompt (fewer tokens)
const optimizedPrompt = `
Analyze this data and provide insights with recommendations:
[data]
`;

// Use prompt templates for common tasks
const promptTemplates = {
  summarization: "Summarize: {content}",
  classification: "Classify as: {categories}\nText: {content}",
  extraction: "Extract {fields} from: {content}",
};
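
The templates above use `{name}` placeholders but the guide does not show how they are filled in. One possible approach, assuming a small hypothetical helper called `fillTemplate` (not part of UniCraft), is sketched here:

```javascript
// Hypothetical helper: replace {name} placeholders with values.
// Unknown placeholders are left untouched so missing values are easy to spot.
function fillTemplate(template, values) {
  return template.replace(/\{(\w+)\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : match
  );
}

const classificationPrompt = fillTemplate(
  "Classify as: {categories}\nText: {content}",
  { categories: "positive, negative", content: "Great product!" }
);
console.log(classificationPrompt);
```

Keeping templates short and filling them programmatically helps keep the fixed part of every prompt, which you pay for on each request, as small as possible.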

4. Response Length Optimization

Control response length to manage costs:

// Set appropriate max_tokens
const response = await unicraft.chat.completions.create({
  messages: [{ role: "user", content: "Explain AI" }],
  model: "gpt-3.5-turbo",
  max_tokens: 100, // Limit response length
  temperature: 0.7,
});

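// Stream long responses so you can stop generation early once you have enough output.
// Streaming by itself does not lower the per-token price.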
const stream = await unicraft.chat.completions.create({
  messages: [{ role: "user", content: "Write a long article" }],
  model: "gpt-3.5-turbo",
  stream: true,
  max_tokens: 2000,
});

5. Caching and Memoization

Implement caching to avoid redundant API calls:

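// Simple caching implementation
import { createHash } from "node:crypto";

const cache = new Map();

// Hash the prompt so that long prompts produce short, fixed-length cache keys
function hash(text) {
  return createHash("sha256").update(text).digest("hex");
}

async function getCachedResponse(prompt, model) {
  const cacheKey = `${model}:${hash(prompt)}`;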

  if (cache.has(cacheKey)) {
    return cache.get(cacheKey);
  }

  const response = await unicraft.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
    model: model,
    cache: true, // Enable UniCraft caching
    cache_ttl: 3600, // Cache for 1 hour
  });

  cache.set(cacheKey, response);
  return response;
}
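
The in-memory `Map` above never expires its entries, so cached responses can go stale and memory use can grow without bound. A minimal sketch of a time-limited cache follows; the class name `TtlCache` and its injectable clock are hypothetical and independent of UniCraft's own `cache` and `cache_ttl` options.

```javascript
// Hypothetical in-memory cache whose entries expire after ttlMs milliseconds.
// The clock is injectable so expiry can be tested without waiting.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // Drop stale entries lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// One hour, matching the cache_ttl of 3600 seconds used above
const responseCache = new TtlCache(3600 * 1000);
```

Entries are removed only when they are read after expiry, which keeps the sketch short; a long-running service would likely also want a size limit or periodic cleanup.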

6. Batch Processing

Process multiple requests together to reduce costs:

// Batch similar requests
const batchRequests = [
  { messages: [{ role: "user", content: "Summarize: Article 1" }] },
  { messages: [{ role: "user", content: "Summarize: Article 2" }] },
  { messages: [{ role: "user", content: "Summarize: Article 3" }] },
];

const batchResponse = await unicraft.batch.create({
  requests: batchRequests,
  model: "gpt-3.5-turbo",
  max_tokens: 100,
});

// Process batch results
batchResponse.results.forEach((result, index) => {
  console.log(
    `Article ${index + 1} summary:`,
    result.choices[0].message.content
  );
});
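
When there are more items than you want to send in a single batch, the list can be split into fixed-size groups first. This guide does not state whether UniCraft limits batch size, so the group size of 2 below is an arbitrary assumption, and `chunk` is a hypothetical helper rather than a UniCraft function.

```javascript
// Hypothetical helper: split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

const articles = ["Article 1", "Article 2", "Article 3", "Article 4", "Article 5"];

// Build one array of request objects per batch
const batches = chunk(articles, 2).map((group) =>
  group.map((content) => ({
    messages: [{ role: "user", content: `Summarize: ${content}` }],
  }))
);
console.log(batches.length); // 3 batches with 2, 2, and 1 requests
```

Each element of `batches` has the same shape as `batchRequests` above and could be passed as the `requests` field of a batch call.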

Advanced Cost Optimization Techniques

1. Dynamic Model Selection

Implement dynamic model selection based on request characteristics:

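// analyzeComplexity is assumed to be defined by your application and
// to return "simple", "medium", or "complex"
function selectOptimalModel(request) {
  const complexity = analyzeComplexity(request);
  const urgency = request.urgency || "normal";

  // Simple requests take the cheapest model unless they are marked urgent.
  // (Requiring urgency === "low" would never match the "normal" default.)
  if (complexity === "simple" && urgency !== "high") {
    return "gpt-3.5-turbo"; // Cheapest option
  } else if (complexity === "medium") {
    return "claude-3-haiku"; // Good balance
  } else {
    return "gpt-4"; // Best quality
  }
}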

const response = await unicraft.chat.completions.create({
  messages: request.messages,
  model: selectOptimalModel(request),
  max_tokens: request.max_tokens,
});
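
The `analyzeComplexity` helper used above is not defined in this guide. A minimal sketch, assuming that prompt length is an acceptable proxy for complexity, might look like the following; the character thresholds are arbitrary assumptions and should be tuned against your own traffic.

```javascript
// Hypothetical heuristic: classify a request by total prompt length.
// Thresholds (200 and 2000 characters) are illustrative assumptions.
function analyzeComplexity(request) {
  const totalChars = request.messages.reduce(
    (sum, message) => sum + message.content.length,
    0
  );
  if (totalChars < 200) return "simple";
  if (totalChars < 2000) return "medium";
  return "complex";
}

console.log(
  analyzeComplexity({ messages: [{ role: "user", content: "What is 2+2?" }] })
); // prints "simple"
```

Length is a crude signal: a short prompt can still require careful reasoning. Treat this as a starting point and consider adding task-specific rules, such as keywords or an explicit complexity field on the request.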

2. Cost-Aware Load Balancing

Distribute requests based on cost considerations:

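const costAwareBalancer = {
  // `available` is assumed to be kept up to date by your own health checks
  providers: [
    { name: "openai", cost_per_1k_tokens: 0.002, weight: 0.4, available: true },
    { name: "anthropic", cost_per_1k_tokens: 0.003, weight: 0.3, available: true },
    { name: "google", cost_per_1k_tokens: 0.001, weight: 0.3, available: true },
  ],

  selectProvider(request) {
    // Select the cheapest provider that is currently available
    const availableProviders = this.providers.filter((p) => p.available);
    if (availableProviders.length === 0) {
      // reduce() without an initial value throws on an empty array, so fail explicitly
      throw new Error("No providers available");
    }
    return availableProviders.reduce((cheapest, current) =>
      current.cost_per_1k_tokens < cheapest.cost_per_1k_tokens
        ? current
        : cheapest
    );
  },
};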

3. Request Optimization

Optimize requests to reduce token usage:

// Trim context before sending it with the prompt
function optimizePrompt(originalPrompt, context) {
  const cleanedContext = context
    .replace(/\s+/g, " ") // Collapse runs of whitespace into a single space
    .replace(/[^\w\s.,!?]/g, "") // Strip other characters (this also removes non-ASCII letters)
    .substring(0, 1000); // Limit context to 1000 characters (not tokens)

  return `${originalPrompt}\nContext: ${cleanedContext}`;
}

// Use structured prompts
const structuredPrompt = {
  task: "summarize",
  input: "Article content here",
  output_format: "bullet_points",
  max_length: 100,
};
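
A structured prompt object still has to be turned into text before it is sent as message content. One way to do that is sketched below; `renderStructuredPrompt` is a hypothetical helper, and treating `max_length` as a word count is an assumption, since this guide does not specify its unit.

```javascript
// Hypothetical helper: turn a structured prompt object into a compact prompt string.
// Assumes max_length is measured in words.
function renderStructuredPrompt({ task, input, output_format, max_length }) {
  const lines = [`Task: ${task}`];
  if (output_format) lines.push(`Format: ${output_format}`);
  if (max_length) lines.push(`Max length: ${max_length} words`);
  lines.push(`Input: ${input}`);
  return lines.join("\n");
}

const promptText = renderStructuredPrompt({
  task: "summarize",
  input: "Article content here",
  output_format: "bullet_points",
  max_length: 100,
});
console.log(promptText);
```

Rendering from a fixed set of fields keeps the instruction part of the prompt short and consistent across requests.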

4. Cost Monitoring and Alerts

Set up cost monitoring to track spending:

// Set up cost alerts
const costAlerts = await unicraft.alerts.create({
  name: "Daily Cost Alert",
  condition: "daily_cost > 50",
  duration: "1d",
  channels: ["email", "slack"],
});

// Monitor cost trends
const costTrends = await unicraft.analytics.getCostTrends({
  time_range: "7d",
  group_by: "day",
});

// Set spending limits
const spendingLimit = await unicraft.budgets.create({
  name: "Daily Spending Limit",
  amount: 100,
  period: "daily",
  alerts: [0.8, 0.9, 1.0],
});
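
The `alerts: [0.8, 0.9, 1.0]` values above read most naturally as fractions of the budget at which a notification is sent, although this guide does not define them. Under that assumption, the check can be sketched locally as follows; `crossedThresholds` is a hypothetical helper, not a UniCraft function.

```javascript
// Hypothetical helper: return the alert thresholds (fractions of the budget)
// that current spending has reached or passed.
function crossedThresholds(spent, budgetAmount, thresholds) {
  const usedFraction = spent / budgetAmount;
  return thresholds.filter((threshold) => usedFraction >= threshold);
}

// $85 spent of a $100 budget has passed only the 80% threshold
console.log(crossedThresholds(85, 100, [0.8, 0.9, 1.0])); // [0.8]
```

A check like this can be useful in your own code as a second line of defense, for example to switch to a cheaper model once 80% of the daily budget is used.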

Cost Analysis and Reporting

1. Cost Breakdown Analysis

Analyze costs by different dimensions:

// Cost by provider
const costByProvider = await unicraft.analytics.getCostBreakdown({
  time_range: "30d",
  group_by: "provider",
});

// Cost by model
const costByModel = await unicraft.analytics.getCostBreakdown({
  time_range: "30d",
  group_by: "model",
});

// Cost by project/team
const costByProject = await unicraft.analytics.getCostBreakdown({
  time_range: "30d",
  group_by: "project",
});
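
To make the idea of grouping by a dimension concrete, the sketch below performs the same kind of aggregation locally over a list of usage records. The record shape, the sample values, and the helper name `sumCostBy` are all assumptions for illustration; the actual response format of `getCostBreakdown` is not described in this guide.

```javascript
// Hypothetical helper: sum the `cost` of usage records grouped by one field.
function sumCostBy(records, key) {
  const totals = {};
  for (const record of records) {
    const group = record[key];
    totals[group] = (totals[group] || 0) + record.cost;
  }
  return totals;
}

// Made-up records for illustration only
const usageRecords = [
  { provider: "openai", model: "gpt-4", cost: 0.5 },
  { provider: "openai", model: "gpt-3.5-turbo", cost: 0.25 },
  { provider: "anthropic", model: "claude-3-haiku", cost: 0.25 },
];

console.log(sumCostBy(usageRecords, "provider"));
console.log(sumCostBy(usageRecords, "model"));
```

Comparing the per-provider and per-model views side by side often shows whether a single expensive model accounts for most of the spend.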

2. Cost Optimization Recommendations

Get automated recommendations:

// Get optimization recommendations
const recommendations = await unicraft.analytics.getOptimizationRecommendations(
  {
    time_range: "30d",
    include_savings: true,
  }
);

recommendations.forEach((rec) => {
  console.log(`Recommendation: ${rec.title}`);
  console.log(`Potential savings: $${rec.potential_savings}`);
  console.log(`Implementation: ${rec.implementation}`);
});

3. Cost Forecasting

Predict future costs based on usage patterns:

// Get cost forecast
const forecast = await unicraft.analytics.getCostForecast({
  historical_period: "90d",
  forecast_period: "30d",
  confidence_level: 0.95,
});

console.log(`Predicted cost for next 30 days: $${forecast.predicted_cost}`);
console.log(
  `Confidence interval: $${forecast.lower_bound} - $${forecast.upper_bound}`
);
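
As a sanity check on a forecast, it can help to compare it with a simple projection of your recent average. The sketch below is a deliberately naive baseline that ignores trend and seasonality; it is not how UniCraft computes its forecast, and `naiveForecast` is a hypothetical helper.

```javascript
// Hypothetical naive baseline: project the mean daily cost forward.
// Ignores trend and seasonality; useful only as a rough comparison point.
function naiveForecast(dailyCosts, forecastDays) {
  if (dailyCosts.length === 0) {
    throw new Error("dailyCosts must not be empty");
  }
  const total = dailyCosts.reduce((sum, cost) => sum + cost, 0);
  const meanDailyCost = total / dailyCosts.length;
  return meanDailyCost * forecastDays;
}

// Mean of $12 per day projected over 30 days
console.log(naiveForecast([10, 12, 14], 30)); // 360
```

If the predicted cost differs sharply from this baseline, that usually points to a recent change in usage that is worth investigating.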

Best Practices

1. Cost Management Strategy

Set Clear Budgets: Define spending limits and monitor adherence
Regular Reviews: Review costs weekly/monthly to identify trends
Optimize Continuously: Regularly look for optimization opportunities
Track ROI: Measure return on investment for AI usage

2. Technical Best Practices

Use Appropriate Models: Match model capability to task complexity
Implement Caching: Cache responses for frequently asked questions
Optimize Prompts: Write concise, effective prompts
Batch Requests: Group similar requests together
Monitor Usage: Track usage patterns and costs

3. Organizational Best Practices

Cost Allocation: Allocate costs to appropriate teams/projects
Training: Train teams on cost-effective AI usage
Policies: Establish policies for AI usage and spending
Regular Audits: Conduct regular cost audits and reviews

Cost Optimization Checklist

Daily

Monitor daily spending against budget
Check for unusual usage patterns
Review failed requests and retries

Weekly

Analyze cost trends and patterns
Review model usage and effectiveness
Check for optimization opportunities
Update cost forecasts

Monthly

Comprehensive cost analysis
Review and update budgets
Analyze ROI and effectiveness
Plan for upcoming changes

Troubleshooting High Costs

Common Causes

Inefficient Model Selection: Using expensive models for simple tasks
Poor Prompt Design: Long, inefficient prompts
Lack of Caching: Repeated requests for same content
No Rate Limiting: Uncontrolled request volume
Ineffective Routing: Not using cost-optimized routing

Solutions

// Implement cost controls
const costControls = {
  max_cost_per_request: 0.01,
  daily_spending_limit: 100,
  model_cost_limits: {
    "gpt-4": 0.05,
    "gpt-3.5-turbo": 0.01,
  },
};

// Monitor and alert on high costs
const highCostAlert = await unicraft.alerts.create({
  name: "High Cost Alert",
  condition: "request_cost > 0.05",
  duration: "1m",
  channels: ["email"],
});
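
The controls above address per-request and daily cost but not the "No Rate Limiting" cause listed earlier. A client-side token bucket is one common way to cap request volume; the sketch below is a minimal, hypothetical implementation and does not rely on any UniCraft feature.

```javascript
// Hypothetical token-bucket limiter: allows bursts of up to `capacity` requests
// and refills at `refillPerSecond` tokens per second.
// The clock is injectable so the behavior can be tested without waiting.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity;
    this.lastRefill = now();
  }

  // Returns true and consumes a token if a request may proceed, otherwise false
  tryAcquire() {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = current;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

const limiter = new TokenBucket(5, 1); // burst of 5, then about 1 request per second
if (!limiter.tryAcquire()) {
  console.log("Rate limit reached; request skipped");
}
```

Calling `tryAcquire()` before each API request and skipping or queueing the request when it returns `false` keeps a runaway loop or retry storm from generating unbounded cost.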

Next Steps

After implementing cost optimization:

Monitor cost trends and effectiveness
Continuously optimize based on usage patterns
Set up automated cost monitoring and alerts
Train your team on cost-effective practices
Regularly review and update optimization strategies