

MosaicML debuts inference service to make generative AI deployment affordable
source link: https://venturebeat.com/ai/mosaicmls-inference-service-to-make-generative-ai-affordable/


California-based MosaicML, a provider of generative AI infrastructure, has launched a fully-managed inference service to help enterprises easily and affordably deploy generative AI models.
The offering comes as demand for large language models (LLMs) continues to grow across industries. According to MosaicML, the service makes it possible to serve LLMs at up to 15 times lower cost than comparable services on the market.
The launch expands MosaicML’s capabilities, making it a complete tool for generative AI training and deployment. Prior to this, the company had largely focused on providing the software infrastructure for training generative AI models.
MosaicML inference: How does it help?
Given the rise of LLMs like ChatGPT, enterprises have grown eager to implement generative AI capabilities in their applications and products. However, owing to privacy challenges (data flowing to a third party) and the high costs involved in building and deploying such models, the task has not exactly been a cakewalk.
With the new inference service, MosaicML is simplifying deployment by giving enterprises the option to query either their own custom-built LLMs or a curated selection of open-source models, including Instructor-XL, Databricks' Dolly, GPT-NeoX and MosaicML's foundation series models.
At its core, the service includes two tiers: starter and enterprise. The starter tier offers open-source models curated and hosted by MosaicML as API endpoints, giving teams an easy starting point for adding generative AI to applications; the models can be used as is.
The enterprise tier goes a step further, allowing teams to deploy any model they want, including custom ones developed for specific use cases, inside their own virtual private cloud (VPC). This way, inference data never leaves the secured environment of the user's infrastructure, ensuring full privacy and security.
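For teams evaluating the hosted starter tier, the interaction model is essentially a REST call to a managed completion endpoint. Below is a minimal sketch of what such a call might look like; the endpoint URL, authentication header, request fields and response schema are assumptions for illustration and may differ from MosaicML's actual API.

import requests

# Hypothetical endpoint and payload layout; the real MosaicML Inference API
# (endpoint path, auth scheme, field names) may differ.
API_URL = "https://inference.example.com/v1/completions"
API_KEY = "YOUR_API_KEY"

def complete(prompt: str, model: str = "mpt-7b-instruct", max_tokens: int = 128) -> str:
    # POST the prompt to a hosted model endpoint and return the generated text.
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": model, "prompt": prompt, "max_new_tokens": max_tokens},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["outputs"][0]

if __name__ == "__main__":
    print(complete("Summarize the benefits of managed LLM inference in one sentence."))

In the enterprise tier, the same kind of endpoint would be served from inside the customer's own VPC, so prompts and completions never transit a third-party service.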
And it saves money
More importantly, thanks to low latency and high hardware utilization, MosaicML Inference can also deploy models at a cost several times lower than comparable offerings.
In a cost assessment, MosaicML said the starter edition of its inference service hosted curated text-completion and embedding models at four times lower cost than OpenAI's offering, while the enterprise tier was found to be 15 times cheaper. All measurements were taken on 40GB NVIDIA A100s with standard 512-token input sequences or 512×512 images, the company added.
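To make the reported multipliers concrete, here is a back-of-the-envelope sketch; the baseline per-token price and monthly token volume are purely illustrative placeholders, not figures published by MosaicML or OpenAI.

# Back-of-the-envelope comparison using the 4x (starter) and 15x (enterprise)
# savings factors reported by MosaicML. The baseline price and workload size
# below are illustrative placeholders, not published figures.
BASELINE_PRICE_PER_1K_TOKENS = 0.02   # hypothetical comparable-service price, USD
MONTHLY_TOKENS = 500_000_000          # hypothetical workload: 500M tokens per month

baseline_monthly = BASELINE_PRICE_PER_1K_TOKENS * MONTHLY_TOKENS / 1_000
starter_monthly = baseline_monthly / 4      # "four times lower cost" (starter tier)
enterprise_monthly = baseline_monthly / 15  # "15 times cheaper" (enterprise tier)

print(f"baseline:   ${baseline_monthly:,.0f} per month")
print(f"starter:    ${starter_monthly:,.0f} per month")
print(f"enterprise: ${enterprise_monthly:,.0f} per month")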

While MosaicML didn’t share the names of the companies using the new inference service, CEO Naveen Rao did note that customers are already seeing results with the offering.
“A publicly traded customer of ours in the financial compliance space is using the MosaicML inference service to deploy their custom GPT trained from scratch on MosaicML,” Rao told VentureBeat. “This customer experienced north of 10x inference savings compared to alternate providers. TCO (total cost of ownership) for their first model was less than $100,000.”