Databricks, the data and AI company, has announced a new feature for its Mosaic AI Model Serving aimed at making large language model (LLM) inference more efficient.
The company says this innovation allows for simple, fast, and scalable batch processing of LLMs, making it easier for organisations to deploy these models in production environments to analyse unstructured data.
The new feature supports batch inference, allowing users to process many requests at once rather than one at a time. Databricks claims this improves throughput and reduces latency, which is vital for real-time applications. Designed for ease of use, it provides a straightforward interface so users can quickly set up and manage LLM inference tasks without extensive coding.
Mosaic AI Model Serving efficiently scales with demand, enabling organisations to dynamically adjust resources based on workload for optimal performance during peak times. This feature integrates with the Databricks platform, using existing data lakes and collaborative notebooks to enhance model training and deployment workflows.
“No more exporting data as CSV files to unmanaged locations—now you can run batch inference directly within your workflows, with full governance through Unity Catalog,” the company posted on its official blog.
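In practice, a batch job of this kind can be expressed as a query over a governed table. The snippet below is a minimal sketch only: the table main.support.tickets, its columns, and the serving endpoint name my-llm-endpoint are hypothetical, and it assumes a Databricks environment where the ai_query SQL function and a Spark session are available.

```python
from pyspark.sql import SparkSession

# Intended to run on Databricks, where the ai_query() SQL function is
# available and a Spark session is already attached to the cluster.
spark = SparkSession.builder.getOrCreate()

# Hypothetical names: a Unity Catalog table of raw support tickets and a
# provisioned LLM serving endpoint called "my-llm-endpoint".
batch_df = spark.sql("""
    SELECT
        ticket_id,
        ai_query(
            'my-llm-endpoint',
            CONCAT('Summarise the following support ticket: ', ticket_text)
        ) AS summary
    FROM main.support.tickets
""")

# Writing the results back to Unity Catalog keeps the output governed
# alongside the source data, rather than exporting CSVs to unmanaged storage.
batch_df.write.mode("overwrite").saveAsTable("main.support.ticket_summaries")
```

Because the whole round trip stays inside the lakehouse, access controls and lineage defined in Unity Catalog apply to both the input table and the generated summaries.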
This development positions Databricks as a leader in the LLM space, addressing the growing demand for efficient AI solutions across industries.
Databricks-AWS Partnership
Recently, Amazon and Databricks struck a five-year deal centred on Amazon’s Trainium AI chips, which could cut costs for businesses looking to build their GenAI apps.
Databricks acquired AI startup MosaicML last year in a $1.3 billion deal and is expanding its services to democratise AI and position its Lakehouse as the top platform for GenAI and LLMs.
MosaicML raised $37 million and offers technology up to 15 times cheaper than its competitors’, serving clients like AI2, Replit, and Hippocratic AI. It claims that its MPT-30B, a 30-billion-parameter model, is superior in quality and more cost-effective for local deployment than GPT-3.