  • OctoML’s new solution will enable customers to work with models like the rest of their application stack, using their DevOps workflows and tools.
  • By merging two parallel deployment streams, one for AI and the other for conventional software, this DevOps-inclusive strategy avoids redundancy.

OctoML, a Machine Learning (ML) deployment platform, recently unveiled a significant platform upgrade intended to speed up the creation of AI-powered applications by removing deployment obstacles. The latest upgrades will allow IT operations teams and app developers to transform trained ML models into agile, portable, production-ready software functions that integrate with existing application stacks and DevOps workflows.

Building dependable and effective AI-powered apps is one of the biggest problems in enterprise software development today. The issue is that 47% of fully trained ML models are never deployed in production, and the remaining models take an average of 12 weeks to deploy. Models are difficult to deploy because of dependencies between the ML training framework, the model type, and the hardware required at each stage of the model lifecycle. Users need a way to abstract away this complexity, remove the dependencies, and deliver models as production-ready software functions.

“AI has the potential to change the world, but it first needs to become sustainable and accessible,” said Luis Ceze, CEO of OctoML. “Today’s manual, specialized ML deployment workflows are keeping application developers, DevOps engineers and IT operations teams on the sidelines. Our new solution enables them to work with models like the rest of their application stack, using their DevOps workflows and tools. We aim to do that by allowing customers to transform models into performant, portable functions that can run on any hardware.”

From the cloud to the edge, models-as-functions can run at high performance anywhere while remaining stable and consistent even when the underlying hardware infrastructure changes. By merging two parallel deployment streams, one for AI and the other for conventional software, this DevOps-inclusive strategy avoids redundancy. It also makes existing investments in model creation and operations more likely to succeed.

The new OctoML platform release lets customers keep working with their existing teams and tools. Intelligent functions can be used with each user’s own model, development environment, developer tools, CI/CD framework, application stack, and cloud while still meeting cost and performance SLAs.

The main features of the platform expansion are:

  • Machine learning for machine learning: automation finds and fixes dependencies, cleans up and optimizes model code, and accelerates and packages the model for any hardware target.
  • The OctoML CLI enables local use of OctoML’s feature set and connects with SaaS capabilities to build accelerated, hardware-independent models-as-functions.
  • Wide-ranging fleet of more than 80 deployment targets – in the cloud (AWS, Azure and GCP) and at the edge with accelerated computing, including GPUs, CPUs, and NPUs from NVIDIA, Intel, AMD, ARM, and AWS Graviton – used for automated hardware compatibility testing, performance analysis, and optimizations on real hardware.
  • Performance and compatibility insights based on real-world (rather than simulated) scenarios, which can be used to inform deployment decisions accurately and to ensure that SLAs around performance, cost, and user experience are met.
  • Comprehensive software library that includes all key machine learning frameworks, acceleration tools like Apache TVM, and chip manufacturer software stacks.
  • Any model-as-a-function produced by the OctoML CLI or OctoML platform is delivered with NVIDIA Triton Inference Server as the integrated inference serving software.
  • By combining NVIDIA Triton and OctoML, users can more easily select, integrate, and deploy Triton-powered inference from any framework on common data centre servers.
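
To give a rough sense of what calling a model-as-a-function served by NVIDIA Triton could look like, the sketch below builds a request for the KServe v2 inference protocol that Triton exposes over HTTP. It only assembles the URL and JSON body and does not contact a server. The host, model name, input name, and tensor shape are hypothetical placeholders, not values taken from OctoML's packaging.

```python
import json


def build_infer_request(host, model_name, input_name, values):
    """Build the URL and JSON body for a KServe v2 style inference call.

    All argument values are illustrative assumptions; a real deployment
    defines its own model name, input names, shapes, and datatypes.
    """
    url = f"http://{host}/v2/models/{model_name}/infer"
    payload = {
        "inputs": [
            {
                "name": input_name,            # hypothetical input tensor name
                "shape": [1, len(values)],     # one batch row of N features
                "datatype": "FP32",
                "data": [float(v) for v in values],  # flattened, row-major
            }
        ]
    }
    return url, json.dumps(payload)


# Hypothetical usage: port 8000 is Triton's default HTTP port.
url, body = build_infer_request(
    "localhost:8000", "my_model", "input__0", [0.1, 0.2, 0.3]
)
print(url)
print(body)
```

The body would be sent as an HTTP POST to the printed URL. Keeping request construction separate from transport makes it straightforward to exercise the calling code in an ordinary CI pipeline without a running inference server.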

“NVIDIA Triton is the top choice for AI inference and model deployment for workloads of any size, across all major industries worldwide,” said Shankar Chandrasekaran, Product Marketing Manager, NVIDIA. “Its portability, versatility and flexibility make it an ideal companion for the OctoML platform.”

“NVIDIA Triton enables users to leverage all major deep learning frameworks and acceleration technologies across both GPUs and CPUs,” said Jared Roesch, CTO, OctoML. “The OctoML workflow extends the user value of Triton-based deployments by seamlessly integrating OctoML acceleration technology, allowing you to get the most out of both the serving and model layers.”