
Cost Optimization

This guide covers strategies and techniques for reducing your AI costs with UniCraft while maintaining quality and performance.

AI costs are typically composed of:

  • Input tokens: Text you send to the model
  • Output tokens: Text generated by the model
  • Model pricing: Different models have different rates
  • Provider margins: Additional costs from provider services
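Input and output tokens are usually billed at separate per-token rates, so you can estimate the cost of a single request from its token counts. The sketch below assumes prices are quoted per 1,000 tokens and that usage arrives as an OpenAI-style `usage` object with `prompt_tokens` and `completion_tokens` (an assumption about UniCraft's response shape). The `estimateRequestCost` helper and the rates are illustrative placeholders, not real pricing.

```javascript
// Estimate the cost of one request from its token usage.
// Prices are hypothetical, in dollars per 1,000 tokens.
function estimateRequestCost(usage, pricing) {
  const inputCost = (usage.prompt_tokens / 1000) * pricing.inputPer1k;
  const outputCost = (usage.completion_tokens / 1000) * pricing.outputPer1k;
  return inputCost + outputCost;
}

// Example with made-up rates: 1,200 input tokens and 300 output tokens.
const cost = estimateRequestCost(
  { prompt_tokens: 1200, completion_tokens: 300 },
  { inputPer1k: 0.0005, outputPer1k: 0.0015 }
);
console.log(`Estimated cost: $${cost.toFixed(6)}`);
```

Because output tokens are often priced higher than input tokens, trimming response length can matter as much as trimming prompts.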

Several factors influence your AI costs:

  • Model selection: More capable models generally cost more per token
  • Request complexity: Longer prompts and responses use more tokens, so they cost more
  • Usage patterns: Peak usage may have different pricing
  • Provider choice: Different providers have different pricing models

Choose the right model for your use case:

```javascript
// Assumes `unicraft` is an initialized UniCraft client and this code runs
// in an ES module (top-level await).

// Use cost-effective models for simple tasks
const simpleTask = await unicraft.chat.completions.create({
  messages: [{ role: "user", content: "What is 2+2?" }],
  model: "gpt-3.5-turbo", // Cheaper for simple tasks
  max_tokens: 50,
});

// Use advanced models only when needed
const complexTask = await unicraft.chat.completions.create({
  messages: [{ role: "user", content: "Analyze this complex data..." }],
  model: "gpt-4", // More expensive, but better for complex tasks
  max_tokens: 1000,
});
```

Configure smart routing to automatically select cost-effective models:

```javascript
// Configure cost-optimized routing
const routingConfig = {
  strategy: "cost_optimized",
  max_cost_per_request: 0.01,
  quality_threshold: 0.8,
  preferred_models: [
    "gpt-3.5-turbo", // Most cost-effective
    "claude-3-haiku", // Good balance
    "gpt-4", // Use only when necessary
  ],
};

const response = await unicraft.chat.completions.create({
  messages: [{ role: "user", content: "Hello" }],
  model: "auto", // Let UniCraft choose the most cost-effective model
  routing_config: routingConfig,
});
```
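UniCraft applies the `cost_optimized` strategy on its side, and its exact algorithm isn't documented here. As a mental model only, the sketch below picks the first entry in `preferred_models` whose estimated cost fits under `max_cost_per_request` and whose quality score meets `quality_threshold`. The catalog prices and quality scores are invented for illustration, and `routeCostOptimized` is a hypothetical helper.

```javascript
// Hypothetical catalog: dollars per 1,000 tokens and a 0-1 quality score.
const catalog = {
  "gpt-3.5-turbo": { costPer1k: 0.002, quality: 0.75 },
  "claude-3-haiku": { costPer1k: 0.003, quality: 0.82 },
  "gpt-4": { costPer1k: 0.06, quality: 0.95 },
};

// Return the first preferred model that satisfies both the cost cap and the
// quality threshold, or null if none qualifies.
function routeCostOptimized(config, estimatedTokens) {
  for (const model of config.preferred_models) {
    const info = catalog[model];
    if (!info) continue; // Skip models missing from the catalog
    const estimatedCost = (estimatedTokens / 1000) * info.costPer1k;
    if (
      estimatedCost <= config.max_cost_per_request &&
      info.quality >= config.quality_threshold
    ) {
      return model;
    }
  }
  return null;
}

const config = {
  max_cost_per_request: 0.01,
  quality_threshold: 0.8,
  preferred_models: ["gpt-3.5-turbo", "claude-3-haiku", "gpt-4"],
};
console.log(routeCostOptimized(config, 500));
```

A `null` result means no model meets both constraints, so a real router would need a fallback (relax the threshold, raise the cap, or reject the request).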

Optimize your prompts to reduce token usage:

```javascript
// Inefficient prompt (many tokens)
const inefficientPrompt = `
Please analyze the following data and provide a comprehensive report with detailed insights, recommendations, and actionable next steps. The data contains information about user behavior, engagement metrics, conversion rates, and other key performance indicators. I need a thorough analysis that covers all aspects of the data and provides strategic recommendations for improvement.
`;

// Optimized prompt (fewer tokens)
const optimizedPrompt = `
Analyze this data and provide insights with recommendations:
[data]
`;

// Use prompt templates for common tasks
const promptTemplates = {
  summarization: "Summarize: {content}",
  classification: "Classify as: {categories}\nText: {content}",
  extraction: "Extract {fields} from: {content}",
};
```
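The `{placeholder}` slots in these templates have to be filled before a prompt is sent. A minimal sketch of a hypothetical `fillTemplate` helper (not part of UniCraft) that substitutes named values and fails loudly when one is missing:

```javascript
// Fill {placeholder} slots in a template string.
// Throws if the template references a key that was not supplied.
function fillTemplate(template, values) {
  return template.replace(/\{(\w+)\}/g, (match, key) => {
    if (!Object.prototype.hasOwnProperty.call(values, key)) {
      throw new Error(`Missing value for placeholder "${key}"`);
    }
    return String(values[key]);
  });
}

const prompt = fillTemplate("Classify as: {categories}\nText: {content}", {
  categories: "positive, negative, neutral",
  content: "The delivery was fast and the product works well.",
});
console.log(prompt);
```

Throwing on a missing value avoids silently sending a prompt that still contains a literal `{content}`, which wastes tokens and usually produces a useless answer.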

Control response length to manage costs:

```javascript
// Set max_tokens to cap response length (and therefore output cost)
const response = await unicraft.chat.completions.create({
  messages: [{ role: "user", content: "Explain AI" }],
  model: "gpt-3.5-turbo",
  max_tokens: 100, // Limit response length
  temperature: 0.7,
});

// Streaming delivers long responses incrementally; it does not reduce
// token usage by itself, so keep max_tokens as the cost cap
const stream = await unicraft.chat.completions.create({
  messages: [{ role: "user", content: "Write a long article" }],
  model: "gpt-3.5-turbo",
  stream: true,
  max_tokens: 2000,
});
```
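Because `max_tokens` caps the number of output tokens, it also puts an upper bound on the output portion of a request's cost. A small sketch, using a hypothetical output price of $0.002 per 1,000 tokens:

```javascript
// Worst-case output cost for a request, given its max_tokens cap.
// outputPer1k is a hypothetical price in dollars per 1,000 output tokens.
function maxOutputCost(maxTokens, outputPer1k) {
  return (maxTokens / 1000) * outputPer1k;
}

// Compare the caps used in the two requests above (100 vs 2,000 tokens).
const shortCap = maxOutputCost(100, 0.002);
const longCap = maxOutputCost(2000, 0.002);
console.log(`Worst-case output cost: $${shortCap.toFixed(4)} vs $${longCap.toFixed(4)}`);
```

The bound scales linearly, so a 20x larger cap allows up to 20x the output spend per request.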

Implement caching to avoid redundant API calls:

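```javascript
import { createHash } from "node:crypto";

// Simple in-process cache keyed by model and a hash of the prompt.
// Note: this Map never evicts entries; add a size limit or TTL in production.
const cache = new Map();

function hash(text) {
  return createHash("sha256").update(text).digest("hex");
}

async function getCachedResponse(prompt, model) {
  const cacheKey = `${model}:${hash(prompt)}`;
  if (cache.has(cacheKey)) {
    return cache.get(cacheKey);
  }
  const response = await unicraft.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
    model,
    cache: true, // Enable UniCraft caching
    cache_ttl: 3600, // Cache for 1 hour
  });
  cache.set(cacheKey, response);
  return response;
}
```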
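A plain `Map` cache never evicts entries, so it grows without bound and can serve stale answers indefinitely. A minimal in-process alternative with a time-to-live (mirroring the idea of `cache_ttl`) and a size cap might look like this; `TtlCache` is a hypothetical helper, and a deployment with several server instances would more likely use a shared store such as Redis.

```javascript
// In-memory cache with per-entry expiry and a maximum number of entries.
// When full, the oldest inserted entry is evicted (Map keeps insertion order).
class TtlCache {
  constructor({ ttlMs, maxEntries }) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  get(key, now = Date.now()) {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // Drop stale entries lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key, value, now = Date.now()) {
    this.entries.delete(key); // Re-inserting moves the key to the newest position
    if (this.entries.size >= this.maxEntries) {
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

// Cache responses for one hour, keeping at most 1,000 of them.
const responseCache = new TtlCache({ ttlMs: 60 * 60 * 1000, maxEntries: 1000 });
responseCache.set("gpt-3.5-turbo:example-key", { text: "4" });
console.log(responseCache.get("gpt-3.5-turbo:example-key")); // { text: '4' }
```

Evicting by insertion order is a simple FIFO policy; a true LRU would also move an entry to the newest position on every successful `get`.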

Process multiple requests together to reduce costs:

```javascript
// Batch similar requests
const batchRequests = [
  { messages: [{ role: "user", content: "Summarize: Article 1" }] },
  { messages: [{ role: "user", content: "Summarize: Article 2" }] },
  { messages: [{ role: "user", content: "Summarize: Article 3" }] },
];

const batchResponse = await unicraft.batch.create({
  requests: batchRequests,
  model: "gpt-3.5-turbo",
  max_tokens: 100,
});

// Process batch results
batchResponse.results.forEach((result, index) => {
  console.log(
    `Article ${index + 1} summary:`,
    result.choices[0].message.content
  );
});
```
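Batch endpoints may limit how many requests a single batch can contain. The limit for `unicraft.batch.create` isn't stated here, so the sketch below takes the batch size as a parameter. `chunk` is a hypothetical utility that splits a request list into fixed-size groups, each of which can be submitted as one batch.

```javascript
// Split an array into consecutive groups of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Build 7 summarization requests and group them into batches of 3.
const requests = Array.from({ length: 7 }, (_, i) => ({
  messages: [{ role: "user", content: `Summarize: Article ${i + 1}` }],
}));
const batches = chunk(requests, 3);
console.log(batches.map((b) => b.length)); // [ 3, 3, 1 ]
```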

Implement dynamic model selection based on request characteristics:

```javascript
// analyzeComplexity() is an application-defined helper (not shown) that
// returns "simple", "medium", or "complex".
function selectOptimalModel(request) {
  const complexity = analyzeComplexity(request);
  if (complexity === "simple") {
    return "gpt-3.5-turbo"; // Cheapest option
  } else if (complexity === "medium") {
    return "claude-3-haiku"; // Good balance
  } else {
    return "gpt-4"; // Best quality
  }
}

async function handleRequest(request) {
  return unicraft.chat.completions.create({
    messages: request.messages,
    model: selectOptimalModel(request),
    max_tokens: request.max_tokens,
  });
}
```
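The routing example relies on an `analyzeComplexity` helper that it does not define. One purely heuristic sketch classifies a request by prompt length and a few keywords; the word-count thresholds and keyword list are arbitrary assumptions that you would tune against your own traffic.

```javascript
// Heuristic complexity classifier (hypothetical; tune for your workload).
// Returns "simple", "medium", or "complex" based on prompt size and wording.
function analyzeComplexity(request) {
  // Only string message content is considered.
  const text = request.messages
    .map((m) => (typeof m.content === "string" ? m.content : ""))
    .join("\n");
  const wordCount = text.split(/\s+/).filter(Boolean).length;
  const needsReasoning = /\b(analy[sz]e|compare|explain why|prove|plan)\b/i.test(text);

  if (needsReasoning || wordCount > 500) return "complex";
  if (wordCount > 50) return "medium";
  return "simple";
}

console.log(analyzeComplexity({ messages: [{ role: "user", content: "What is 2+2?" }] })); // → simple
```

A keyword heuristic is cheap but crude; misrouting a hard task to a small model can cost more in retries than it saves, so it is worth logging the chosen tier and reviewing outcomes.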

Distribute requests based on cost considerations:

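```javascript
const costAwareBalancer = {
  // Illustrative per-1K-token prices. Set `available` from your own health checks.
  // `weight` is not used by selectProvider below, which always picks the cheapest.
  providers: [
    { name: "openai", cost_per_1k_tokens: 0.002, weight: 0.4, available: true },
    { name: "anthropic", cost_per_1k_tokens: 0.003, weight: 0.3, available: true },
    { name: "google", cost_per_1k_tokens: 0.001, weight: 0.3, available: true },
  ],
  selectProvider(request) {
    // Choose the cheapest provider that is currently available
    const availableProviders = this.providers.filter((p) => p.available);
    if (availableProviders.length === 0) {
      throw new Error("No available providers");
    }
    return availableProviders.reduce((cheapest, current) =>
      current.cost_per_1k_tokens < cheapest.cost_per_1k_tokens ? current : cheapest
    );
  },
};
```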
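The balancer defines a `weight` for each provider but always sends traffic to the cheapest one. If you would rather spread load in proportion to those weights (for example, to avoid depending on a single provider), a weighted random pick over the available providers is one alternative technique. This is a sketch, not UniCraft's routing logic; `pickWeighted` is hypothetical, and the random source is injectable so the choice can be tested.

```javascript
// Pick a provider at random, with probability proportional to its weight.
// `random` defaults to Math.random but can be replaced for deterministic tests.
function pickWeighted(providers, random = Math.random) {
  const candidates = providers.filter((p) => p.available && p.weight > 0);
  if (candidates.length === 0) {
    throw new Error("No available providers");
  }
  const totalWeight = candidates.reduce((sum, p) => sum + p.weight, 0);
  let threshold = random() * totalWeight;
  for (const provider of candidates) {
    threshold -= provider.weight;
    if (threshold < 0) return provider;
  }
  return candidates[candidates.length - 1]; // Guard against floating-point rounding
}

const providers = [
  { name: "openai", weight: 0.4, available: true },
  { name: "anthropic", weight: 0.3, available: true },
  { name: "google", weight: 0.3, available: false },
];
console.log(pickWeighted(providers).name);
```

Unavailable providers are filtered out before sampling, so their weight is redistributed proportionally across the remaining ones.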

Optimize requests to reduce token usage:

```javascript
// Trim unnecessary context before adding it to the prompt
function optimizePrompt(originalPrompt, context) {
  const cleanedContext = context
    .replace(/\s+/g, " ") // Collapse runs of whitespace
    // Strip other symbols. \w is ASCII-only here, so this also drops accented
    // and non-Latin letters; skip it when symbols or such text carry meaning.
    .replace(/[^\w\s.,!?]/g, "")
    .trim()
    .substring(0, 1000); // Limit context to 1,000 characters (not tokens)
  return `${originalPrompt}\nContext: ${cleanedContext}`;
}

// Use structured prompts
const structuredPrompt = {
  task: "summarize",
  input: "Article content here",
  output_format: "bullet_points",
  max_length: 100,
};
```
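A structured prompt object still has to be rendered into message text before a chat model can use it. One compact rendering, assuming the field names shown above and assuming `max_length` means a word limit (the unit isn't specified), is this hypothetical `renderStructuredPrompt` helper:

```javascript
// Render a structured prompt spec into a short instruction string.
// Assumes max_length is a word limit; adjust if you mean tokens or characters.
function renderStructuredPrompt(spec) {
  const format = spec.output_format.replace(/_/g, " ");
  return `Task: ${spec.task}. Format: ${format}. Max ${spec.max_length} words.\nInput: ${spec.input}`;
}

const structuredPrompt = {
  task: "summarize",
  input: "Article content here",
  output_format: "bullet_points",
  max_length: 100,
};
console.log(renderStructuredPrompt(structuredPrompt));
```

Keeping the instruction terse and putting the input last keeps the fixed overhead per request small and predictable.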

Set up cost monitoring to track spending:

```javascript
// Set up cost alerts
const costAlerts = await unicraft.alerts.create({
  name: "Daily Cost Alert",
  condition: "daily_cost > 50",
  duration: "1d",
  channels: ["email", "slack"],
});

// Monitor cost trends
const costTrends = await unicraft.analytics.getCostTrends({
  time_range: "7d",
  group_by: "day",
});

// Set spending limits
const spendingLimit = await unicraft.budgets.create({
  name: "Daily Spending Limit",
  amount: 100,
  period: "daily",
  alerts: [0.8, 0.9, 1.0],
});
```
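The `alerts: [0.8, 0.9, 1.0]` setting reads as fractions of the budget amount. Assuming that interpretation, and assuming each threshold should fire only once per period, the logic for deciding which alerts a given spend has crossed can be sketched as follows; `crossedThresholds` is a hypothetical helper, not part of the UniCraft SDK.

```javascript
// Return the budget fractions that `spent` has reached, excluding ones already alerted.
function crossedThresholds(spent, budget, thresholds, alreadySent = new Set()) {
  const ratio = spent / budget;
  return thresholds.filter((t) => ratio >= t && !alreadySent.has(t));
}

// $92 spent against a $100 daily budget crosses the 80% and 90% thresholds.
const sent = new Set();
const toSend = crossedThresholds(92, 100, [0.8, 0.9, 1.0], sent);
toSend.forEach((t) => sent.add(t));
console.log(toSend); // [ 0.8, 0.9 ]
```

Tracking `alreadySent` prevents repeated notifications each time spend is rechecked; it would be reset at the start of each budget period.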

Analyze costs by different dimensions:

```javascript
// Cost by provider
const costByProvider = await unicraft.analytics.getCostBreakdown({
  time_range: "30d",
  group_by: "provider",
});

// Cost by model
const costByModel = await unicraft.analytics.getCostBreakdown({
  time_range: "30d",
  group_by: "model",
});

// Cost by project/team
const costByProject = await unicraft.analytics.getCostBreakdown({
  time_range: "30d",
  group_by: "project",
});
```
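If you also log per-request usage in your own application, you can compute the same breakdowns locally, which is handy for cross-checking dashboard numbers. The record shape below (`provider`, `model`, `project`, `cost`) is an assumption about what you log, not a UniCraft export format.

```javascript
// Sum the `cost` field of usage records, grouped by one dimension.
// Sums are floating-point; round them for display.
function costBreakdown(records, dimension) {
  const totals = {};
  for (const record of records) {
    const key = record[dimension] ?? "unknown";
    totals[key] = (totals[key] ?? 0) + record.cost;
  }
  return totals;
}

const records = [
  { provider: "openai", model: "gpt-4", project: "search", cost: 0.12 },
  { provider: "openai", model: "gpt-3.5-turbo", project: "chat", cost: 0.01 },
  { provider: "anthropic", model: "claude-3-haiku", project: "chat", cost: 0.02 },
];
console.log(costBreakdown(records, "project"));
```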

Get automated recommendations:

```javascript
// Get optimization recommendations
const recommendations = await unicraft.analytics.getOptimizationRecommendations({
  time_range: "30d",
  include_savings: true,
});

recommendations.forEach((rec) => {
  console.log(`Recommendation: ${rec.title}`);
  console.log(`Potential savings: $${rec.potential_savings}`);
  console.log(`Implementation: ${rec.implementation}`);
});
```

Predict future costs based on usage patterns:

```javascript
// Get cost forecast
const forecast = await unicraft.analytics.getCostForecast({
  historical_period: "90d",
  forecast_period: "30d",
  confidence_level: 0.95,
});

console.log(`Predicted cost for the next 30 days: $${forecast.predicted_cost}`);
console.log(
  `Confidence interval: $${forecast.lower_bound} - $${forecast.upper_bound}`
);
```
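The model behind UniCraft's forecast isn't described here. As a rough baseline you can compute yourself, assume daily costs are independent draws with a stable mean: the total over `days` days is then about `days × mean`, and a normal approximation gives an interval of `± z × sd × √days`, with z ≈ 1.96 for 95% confidence. This ignores trends, weekly seasonality, and uncertainty in the estimated mean, so treat it as a sanity check only. `forecastCost` is a hypothetical helper.

```javascript
// Naive cost forecast from historical daily costs.
// Assumes days are independent and identically distributed (no trend or seasonality).
function forecastCost(dailyCosts, days, z = 1.96) {
  const n = dailyCosts.length;
  if (n < 2) throw new Error("Need at least two days of history");
  const mean = dailyCosts.reduce((a, b) => a + b, 0) / n;
  // Sample variance (n - 1 in the denominator).
  const variance =
    dailyCosts.reduce((sum, c) => sum + (c - mean) ** 2, 0) / (n - 1);
  const predicted = mean * days;
  // The standard deviation of a sum of `days` independent days grows with sqrt(days).
  const margin = z * Math.sqrt(variance) * Math.sqrt(days);
  return {
    predicted_cost: predicted,
    lower_bound: Math.max(0, predicted - margin),
    upper_bound: predicted + margin,
  };
}

const history = [40, 42, 38, 45, 41, 39, 44];
const f = forecastCost(history, 30);
console.log(
  `Predicted: $${f.predicted_cost.toFixed(2)} ($${f.lower_bound.toFixed(2)} - $${f.upper_bound.toFixed(2)})`
);
```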

Cost management best practices:

  • Set Clear Budgets: Define spending limits and monitor adherence
  • Regular Reviews: Review costs weekly/monthly to identify trends
  • Optimize Continuously: Regularly look for optimization opportunities
  • Track ROI: Measure return on investment for AI usage
  • Use Appropriate Models: Match model capability to task complexity
  • Implement Caching: Cache responses for frequently asked questions
  • Optimize Prompts: Write concise, effective prompts
  • Batch Requests: Group similar requests together
  • Monitor Usage: Track usage patterns and costs
  • Cost Allocation: Allocate costs to appropriate teams/projects
  • Training: Train teams on cost-effective AI usage
  • Policies: Establish policies for AI usage and spending
  • Regular Audits: Conduct regular cost audits and reviews

Regular review checklist:

  • Monitor daily spending against budget
  • Check for unusual usage patterns
  • Review failed requests and retries
  • Analyze cost trends and patterns
  • Review model usage and effectiveness
  • Check for optimization opportunities
  • Update cost forecasts
  • Comprehensive cost analysis
  • Review and update budgets
  • Analyze ROI and effectiveness
  • Plan for upcoming changes

Common pitfalls to avoid:

  1. Inefficient Model Selection: Using expensive models for simple tasks
  2. Poor Prompt Design: Long, inefficient prompts
  3. Lack of Caching: Repeated requests for the same content
  4. No Rate Limiting: Uncontrolled request volume
  5. Ineffective Routing: Not using cost-optimized routing

To guard against these pitfalls, define cost limits and alert on unusually expensive requests:

```javascript
// Cost limits (configuration only; your application must enforce them)
const costControls = {
  max_cost_per_request: 0.01,
  daily_spending_limit: 100,
  model_cost_limits: {
    "gpt-4": 0.05,
    "gpt-3.5-turbo": 0.01,
  },
};

// Monitor and alert on high costs
const highCostAlert = await unicraft.alerts.create({
  name: "High Cost Alert",
  condition: "request_cost > 0.05",
  duration: "1m",
  channels: ["email"],
});
```
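A cost-controls object like the one above is only configuration; something has to enforce it. A minimal pre-flight check might look like the sketch below. It assumes you can estimate a request's cost before sending it (for example from prompt length and `max_tokens`), that you track today's spend yourself, and that a per-model limit, when present, overrides the global per-request cap (otherwise the $0.05 `gpt-4` limit could never apply). `checkCostControls` and its return shape are hypothetical.

```javascript
// Same limits as the configuration above.
const controls = {
  max_cost_per_request: 0.01,
  daily_spending_limit: 100,
  model_cost_limits: { "gpt-4": 0.05, "gpt-3.5-turbo": 0.01 },
};

// Decide whether a request may be sent under the configured limits.
// A per-model limit, when present, overrides the global per-request cap.
function checkCostControls(limits, { model, estimatedCost, spentToday }) {
  const perRequestLimit =
    limits.model_cost_limits[model] ?? limits.max_cost_per_request;
  if (estimatedCost > perRequestLimit) {
    return { allowed: false, reason: `estimated cost exceeds $${perRequestLimit} for ${model}` };
  }
  if (spentToday + estimatedCost > limits.daily_spending_limit) {
    return { allowed: false, reason: "daily spending limit reached" };
  }
  return { allowed: true };
}

console.log(checkCostControls(controls, { model: "gpt-4", estimatedCost: 0.03, spentToday: 10 }));
```

Rejected requests can be downgraded to a cheaper model or queued for the next budget period rather than simply dropped.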

After implementing cost optimization:

  1. Monitor cost trends and effectiveness
  2. Continuously optimize based on usage patterns
  3. Set up automated cost monitoring and alerts
  4. Train your team on cost-effective practices
  5. Regularly review and update optimization strategies