As an executive exploring generative AI’s potential for your organisation, you’re likely concerned about costs. Implementing AI isn’t just about picking a model and letting it run. It’s a complex ecosystem of decisions, each affecting the final price tag. This article shows how to optimise costs throughout the AI life cycle, from model selection and fine-tuning to data management and operations.
Model Selection
Wouldn’t it be great to have a lightning-fast, highly accurate AI model that costs pennies to run? Since this ideal scenario does not exist (yet), you must find the optimal model for each use case by balancing performance, accuracy, and cost.
Start by clearly defining your use case and its requirements. These questions will guide your model selection:
- Who is the user?
- What is the task?
- What level of accuracy do you need?
- How critical is rapid response time to the user?
- What input types will your model need to handle, and what output types are expected?
Next, experiment with different model sizes and types. Smaller, more specialised models may lack the broad knowledge base of their larger counterparts, but they can be highly effective—and more economical—for specific tasks.
Consider a multi-model approach for complex use cases. Not all tasks in a use case may require the same level of model complexity. Use different models for different steps to improve performance while reducing costs.
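The routing idea behind a multi-model approach can be sketched in a few lines. This is an illustrative sketch only: the model names and the keyword heuristic are invented placeholders, and a real system would call your provider’s SDK and use a more robust classifier.

```python
# Hypothetical multi-model router: send simple, formulaic tasks to a
# cheaper model and reserve the expensive model for complex work.
# Model names are placeholders, not real products.

def classify_complexity(task: str) -> str:
    """Naive heuristic: short, formulaic tasks go to a small model."""
    simple_keywords = ("summarise", "classify", "extract")
    if any(keyword in task.lower() for keyword in simple_keywords):
        return "small"
    return "large"

def route(task: str) -> str:
    """Pick a model name based on estimated task complexity."""
    if classify_complexity(task) == "small":
        return "small-fast-model"
    return "large-accurate-model"

print(route("Summarise this support ticket"))  # small-fast-model
print(route("Draft a legal risk analysis"))    # large-accurate-model
```

Even a crude router like this can cut spend noticeably when most traffic is routine, because only the minority of genuinely complex requests hit the premium model.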
Fine-Tuning and Model Customisation
Pretrained foundation models (FMs) are publicly available and can be used by any company, including your competitors. While powerful, they lack the specific knowledge and context of your business.
To gain a competitive advantage, you need to infuse these generic models with your organisation’s unique knowledge and data. Doing so transforms an FM into a powerful, customised tool that understands your industry, speaks your company’s language, and leverages your proprietary information. Your choice to use retrieval-augmented generation (RAG), fine-tuning, or prompt engineering for this customisation will affect your costs.
Retrieval-Augmented Generation
RAG pulls data from your organisation’s data sources to enrich user prompts, so the model delivers more relevant and accurate responses. Imagine your AI being able to instantly reference your product catalogue or company policies as it generates responses. RAG improves accuracy and relevance without extensive model retraining, balancing performance and cost efficiency.
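The core RAG pattern can be illustrated with a toy example: retrieve the internal documents relevant to a question, then prepend them to the prompt. The documents and keyword matching below are invented for illustration; a production system would rank documents by embedding similarity in a vector store.

```python
# Toy RAG sketch: naive keyword retrieval over invented company documents,
# followed by prompt construction. Real systems use embeddings, not
# word overlap, but the enrichment step is the same.

DOCUMENTS = {
    "returns-policy": "Customers may return items within 30 days with a receipt.",
    "shipping": "Standard shipping takes 3-5 business days within the UK.",
}

def retrieve(query: str) -> list[str]:
    """Return documents sharing at least one word with the query."""
    query_words = set(query.lower().split())
    return [text for text in DOCUMENTS.values()
            if query_words & set(text.lower().split())]

def build_prompt(query: str) -> str:
    """Prepend retrieved context so the model answers from company data."""
    context = "\n".join(retrieve(query))
    return (f"Context:\n{context}\n\n"
            f"Question: {query}\nAnswer using only the context above.")

print(build_prompt("How long do customers have to return items?"))
```

Because only the retrieved snippets travel with each prompt, the base model stays untouched, which is what keeps RAG cheaper than retraining.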
Fine-Tuning
Fine-tuning means training an FM on additional, specialised data from your organisation. It requires significant computational resources, machine learning expertise, and carefully prepared data, making it more expensive to implement and maintain than RAG.
Fine-tuning excels when you need the model to perform exceptionally well on specific tasks, consistently produce outputs in a particular format, or perform complex operations beyond simple information retrieval.
We recommend a phased approach. Start with less resource-intensive methods such as RAG and consider fine-tuning only when these methods fail to meet your needs. Set clear performance benchmarks and regularly evaluate the gains versus the resources invested.
Prompt Engineering
Prompts are the instructions given to AI applications. AI users (designers, marketers, or software developers, for example) enter prompts to generate the desired output, such as pictures, text summaries, or source code. Prompt engineering is the practice of crafting and refining these instructions to get the best possible results. Think of it as asking the right questions to get the best answers.
Good prompts can significantly reduce costs. Clear, specific instructions reduce the need for multiple back-and-forth interactions that can quickly add up in pay-per-query pricing models. They also lead to more accurate responses, reducing the need for costly, time-consuming human review. With prompts that provide more context and guidance, you can often use smaller, more cost-effective AI models.
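The contrast is easy to see side by side. In this sketch, the analyst persona, the format constraints, and the sales figures are all invented examples; the point is that an engineered prompt scopes the task tightly enough to avoid follow-up queries.

```python
# Illustrative contrast between a vague prompt and an engineered one.
# The engineered template constrains role, scope, format, and data source,
# which tends to reduce retries and human review. All fields are examples.

vague = "Tell me about our sales."

engineered = (
    "You are a financial analyst. Summarise Q3 sales for the EMEA region "
    "in exactly three bullet points, each under 20 words, "
    "citing figures only from the data provided below.\n"
    "Data: {sales_data}"
)

def render(template: str, **fields: str) -> str:
    """Fill an engineered prompt template with the data it may cite."""
    return template.format(**fields)

print(render(engineered, sales_data="EMEA Q3 revenue: 4.2M GBP, up 8% QoQ"))
```

On pay-per-query pricing, one well-scoped prompt that succeeds first time is almost always cheaper than several rounds of clarification with a vague one.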
Data Management
The data you use to customise generic FMs is also a significant cost driver. Many organisations fall into the trap of thinking that more data always leads to better AI performance. In reality, a smaller dataset of high-quality, relevant data often outperforms larger, noisier datasets.
Investing in robust data cleansing and curation processes can reduce the complexity and cost of customising and maintaining AI models. Clean, well-organised data allows for more efficient fine-tuning and produces more accurate results from techniques like RAG. It lets you streamline the customisation process, improve model performance, and ultimately lower the ongoing costs of your AI implementations.
Strong data governance practices can help increase the accuracy and cost performance of your customised FM. These practices should include proper data organisation, versioning, and lineage tracking. On the other hand, inconsistently labelled, outdated, or duplicate data can cause your AI to produce inaccurate or inconsistent results, slowing performance and increasing operational costs. Good governance also helps ensure regulatory compliance, preventing costly legal issues down the road.
Operations
Controlling AI costs isn’t just about technology and data—it’s about how your organisation operates.
Organisational Culture and Practices
Foster a culture of cost-consciousness and frugality around AI, and train your employees in cost-optimisation techniques. Share case studies of successful cost-saving initiatives and reward innovative ideas that lead to significant cost savings. Most importantly, encourage a prove-the-value approach for AI initiatives. Regularly communicate the financial impact of AI to stakeholders.
Continuous learning about AI developments helps your team identify new cost-saving opportunities. Encourage your team to test various AI models or data preprocessing techniques to find the most cost-effective solutions.
FinOps for AI
FinOps, short for financial operations, is a practice that brings financial accountability to the variable spend model of cloud computing. It can help your organisation efficiently use and manage resources for training, customising, fine-tuning, and running your AI models. These resources include cloud computing power, data storage, API calls, and specialised hardware such as GPUs. FinOps helps you forecast costs more accurately, make data-driven decisions about AI spending, and optimise resource usage across the AI life cycle.
FinOps pairs a centralised organisational and technical platform, which applies the core principles of visibility, optimisation, and governance, with responsible, capable decentralised teams. Each team should “own” its AI costs: making informed decisions about model selection, continuously optimising AI processes for cost efficiency, and justifying AI spending based on business value.
A centralised AI platform team supports these decentralised efforts with a set of FinOps tools and practices that includes dashboards for real-time cost tracking and allocation, enabling teams to closely monitor their AI spending. Anomaly detection allows you to quickly identify and address unexpected cost spikes. Benchmarking tools facilitate efficiency comparisons across teams and use cases, encouraging healthy competition and knowledge sharing.
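A simple form of the anomaly detection mentioned above flags any day whose spend sits far outside the recent baseline. The sketch below uses a trailing-window z-score test on invented daily cost figures; a platform team would typically rely on its cloud provider’s anomaly-detection service rather than hand-rolled statistics.

```python
# Minimal spend-spike check: flag days whose cost exceeds the trailing
# window's mean by more than `threshold` standard deviations.
# All cost figures are invented for illustration.

from statistics import mean, stdev

def spikes(daily_costs: list[float],
           window: int = 7,
           threshold: float = 3.0) -> list[int]:
    """Return indices of days that look like cost spikes."""
    flagged = []
    for i in range(window, len(daily_costs)):
        baseline = daily_costs[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and daily_costs[i] > mu + threshold * sigma:
            flagged.append(i)
    return flagged

costs = [100, 102, 98, 101, 99, 103, 100, 410, 101, 99]
print(spikes(costs))  # the 410 on day index 7 is flagged
```

Catching a spike like this within a day, rather than at month-end invoicing, is where the FinOps feedback loop earns its keep.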
Conclusion
As more use cases emerge and AI becomes ubiquitous across business functions, organisations will be challenged to scale their AI initiatives cost-effectively. They can lay the groundwork for long-term success by establishing robust cost optimisation techniques that allow them to innovate freely while ensuring sustainable growth. After all, success depends on perfecting the delicate balance between experimentation, performance, accuracy, and cost.
The post Generative AI Cost Optimisation Strategies appeared first on AIM.