Upcoming Release Roadmap
Maanav Dalal edited this page Jul 23, 2024 · 7 revisions
Target Release: Mid August 2024
- Stopping publishing packages for Python 3.8.
- Discontinuing support for Xamarin. (Xamarin reached EOL on May 1, 2024)
- Discontinuing support for macOS 11 and increasing the minimum supported macOS version to 12. (macOS 11 reached EOL in September 2023)
- Discontinuing support for iOS 12 and increasing the minimum supported iOS version to 13.
- Introducing Java CUDA 12 packages on Maven.
- Adding support for CPython 3.13.0b1.
- Implementing friendlier error messages for missing DLLs when loading dynamically loadable EPs (e.g., CUDA) on Windows, to reduce CUDA version mismatch issues.
- Completing E2E MultiLora support, including work in the GenAI layer.
- Implementing DeformConv.
- Removing the OrtMutex class and the dependency on nsync. Standard C++ std::mutex will be used instead.
- Adding QDQ support for int4 quantization in CPU and CUDA EP.
- Implementing FlashAttention on CPU to improve performance for GenAI prompt cases.
- Improving int4 performance for CPU (x64, arm64) and Nvidia GPU.
- Enabling fp16 GEMM to run with fp8 capacity on Nvidia GPU.
- No specific updates mentioned.
- No specific updates mentioned.
- Adding support for OpenVINO 2024.3.
- Updating DirectML from 1.14.1 → 1.15.
- Updating ONNX opset from 17 → 19.
- Implementing CoreML ML Program operators for the Autodesk model.
- Developing a GPU EP proof-of-concept for phi-3.
- Updating mobile documentation.
- Removing references to deprecated 'mobile' packages.
- Updating recommendations for building and deployment.
- Updating JavaScript packaging to align with the latest best practices, introducing slight incompatibilities when apps bundle onnxruntime-web.
- Adding support for grouped-query attention (GQA).
- Adding support for phi3-vision.
- Improving CPU ops coverage for WebNN, now supported by Chrome.
- No specific updates mentioned.
- Adding support for the Whisper model.
- Adding Java bindings.
- Introducing Android packages.
- Introducing Windows ARM packages.
- Adding Audio FeatureExtractor APIs.
- Enhancing support for models in tokenization with a more efficient tiktoken algorithm.
- Supporting a SOTA model for multimodal applications.
- Enhancing Custom Op Lite API on GPU and fused kernels for DORT.
*Note: all mentioned features are subject to change.
Please use the learning roadmap on the home wiki page to build a general understanding of ORT.
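
For readers unfamiliar with the int4 QDQ item in the list above: QDQ refers to pairing a quantize step with a dequantize step around low-precision data. The following is a minimal plain-Python sketch of that idea for signed 4-bit values, not ONNX Runtime code. The function names, the example weights, the scale, and the choice to pack the first element into the low nibble are all assumptions made for illustration.

```python
# Illustrative sketch only: symmetric int4 quantize/dequantize in plain Python.
# This is NOT ONNX Runtime code; every name below is hypothetical.

def quantize_int4(values, scale, zero_point=0):
    """Map floats to signed 4-bit integers, saturating to [-8, 7]."""
    quantized = []
    for v in values:
        q = round(v / scale) + zero_point
        quantized.append(max(-8, min(7, q)))
    return quantized

def dequantize_int4(quantized, scale, zero_point=0):
    """Map signed 4-bit integers back to approximate float values."""
    return [(q - zero_point) * scale for q in quantized]

def pack_int4(quantized):
    """Pack two 4-bit values per byte (assumed layout: low nibble first)."""
    if len(quantized) % 2:
        quantized = quantized + [0]  # pad to an even count
    packed = bytearray()
    for lo, hi in zip(quantized[0::2], quantized[1::2]):
        packed.append((lo & 0x0F) | ((hi & 0x0F) << 4))
    return bytes(packed)

def unpack_int4(packed, count):
    """Reverse pack_int4, sign-extending each nibble."""
    values = []
    for byte in packed:
        for nibble in (byte & 0x0F, byte >> 4):
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values[:count]

weights = [0.5, -1.0, 0.26, 3.9, -4.2]  # made-up example data
scale = 0.5                             # made-up scale
q = quantize_int4(weights, scale)
packed = pack_int4(q)
restored = dequantize_int4(unpack_int4(packed, len(q)), scale)
```

Here five weights occupy three bytes after packing, and values outside the representable range (3.9 and -4.2 at this scale) saturate, which is the precision trade-off int4 quantization accepts in exchange for smaller weights.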