Upcoming Release Roadmap

Maanav Dalal edited this page Jul 23, 2024 · 7 revisions

ONNX Runtime 1.19

Target Release: Mid-August 2024

Build System & Packages

Discontinuing publication of packages for Python 3.8.
Discontinuing support for Xamarin. (Xamarin reached EOL on May 1, 2024.)
Discontinuing support for macOS 11 and increasing the minimum supported macOS version to 12. (macOS 11 reached EOL in September 2023)
Discontinuing support for iOS 12 and increasing the minimum supported iOS version to 13.
Introducing Java CUDA 12 packages on Maven.
Adding support for CPython 3.13.0b1.
Implementing friendlier error messages for missing DLLs when loading dynamically loadable EPs (e.g., CUDA) on Windows, to reduce CUDA version mismatch issues.

Core

Completing E2E MultiLora support, including work in the GenAI layer.
Implementing the DeformConv operator.
Removing the OrtMutex class and the dependency on nsync. Standard C++ std::mutex will be used instead.

Performance

Adding QDQ support for int4 quantization in the CPU and CUDA EPs.
Implementing FlashAttention on CPU to improve performance in GenAI prompt scenarios.
Improving int4 performance on CPU (x64, arm64) and NVIDIA GPUs.
Enabling fp16 GEMM to run using FP8 capability on NVIDIA GPUs.

Execution Providers

TensorRT

No specific updates mentioned.

QNN

No specific updates mentioned.

OpenVINO

Adding support for OpenVINO 2024.3.

DirectML

Updating DirectML from 1.14.1 to 1.15.
Updating the ONNX opset from 17 to 19.

Mobile

Implementing CoreML ML Program operators for the Autodesk model.
Developing a GPU EP proof-of-concept for phi-3.
Updating mobile documentation.
Removing references to deprecated 'mobile' packages.
Updating recommendations for building and deployment.

Web

Updating JavaScript packaging to align with current best practices. This may introduce minor incompatibilities for apps that bundle onnxruntime-web.
Adding support for grouped-query attention (GQA).
Adding support for phi3-vision.
Improving CPU operator coverage for WebNN, which is now supported in Chrome.

Training

No specific updates mentioned.

GenAI

Adding support for the Whisper model.
Adding Java bindings.
Introducing Android packages.
Introducing Windows ARM packages.

Extensions

Adding Audio FeatureExtractor APIs.
Enhancing tokenization support for models with a more efficient tiktoken algorithm.
Supporting state-of-the-art (SOTA) models for multimodal applications.
Enhancing the Custom Op Lite API on GPU and the fused kernels for DORT.

*Note: all features listed above are subject to change.