IBM Spyre Accelerator: PyTorch Enabling Status and Feature Plan - 1H 2026

Introduction

This roadmap outlines IBM’s plan for integrating the Spyre accelerator with the PyTorch ecosystem during the first half of 2026. Our goal is to provide seamless PyTorch support for Spyre by building on existing PyTorch ecosystem components, ensuring minimal runtime overhead while maximizing performance and developer productivity. Equally important, we are committed to contributing back to the community — generalizing dataflow accelerator enablement in torch.inductor and lower layers, contributing OpenReg testing infrastructure into PyTorch core, and establishing out-of-tree CI/CD infrastructure — so that our work benefits not just Spyre but the broader ecosystem of dataflow accelerators.

Scope for 1H 2026: All work in this roadmap is scoped to inference only at FP16 and FP8 precisions, starting with the following priority models:

  1. GPT-OSS (20B)
  2. Granite 4 Hybrid (30B)
  3. Mistral-small
  4. Qwen 2.5 VL 7B
  5. Llama 3.1 8B-instruct
  6. Ministral 8B
  7. Ministral 14B

Motivation

We are building PyTorch-native support for the Spyre accelerator, designed from the ground up around upstream integration mechanisms. This approach is driven by several key goals:

Ecosystem-First Development:

  • Leveraging torch.inductor and out-of-tree extensions as the primary compilation and integration path
  • Minimizing custom code by building on community-maintained infrastructure
  • Freeing up engineering capacity to contribute hardware-specific optimizations and improvements back to the PyTorch ecosystem

Ecosystem Access and Model Coverage:

  • Direct access to the rapidly expanding PyTorch model ecosystem (vLLM, Hugging Face, etc.)
  • Faster time-to-support for new model architectures
  • Ability to leverage community tools, optimizations, and best practices

Sustainability and Community Collaboration:

  • Alignment with PyTorch standards ensures long-term compatibility as the ecosystem evolves
  • Opportunities to contribute improvements back to the community
  • Easier collaboration with external partners and customers already using PyTorch

Contributing Back: Dataflow Accelerator Enablement:

  • Generalizing torch.inductor pathways to work across dataflow architectures, not just Spyre-specific optimizations
  • Contributing a tile-level intermediate representation that any dataflow accelerator can use for lower-level kernel scheduling
  • Ensuring our upstream contributions are designed for broad adoption, making it easier for the next dataflow accelerator to onboard

Production-Grade Performance:

  • Ensuring the ecosystem integration approach meets production deployment requirements
  • Validating that the PyTorch-native approach can deliver high performance on dataflow accelerators
  • Establishing CI/CD infrastructure for continuous validation and regression detection across the full stack

This approach represents a strategic investment in long-term sustainability while delivering the performance standards our users expect.

Roadmap Overview

Our work focuses on four key pillars:

  1. PyTorch Core Integration - Deep integration with torch.inductor, runtime mechanisms, distributed inference, and profiling
  2. Backend Compiler - Building a robust compiler backend with KernelTile IR (KTIR) as the community-aligned intermediate representation for dataflow accelerators
  3. vLLM Integration - Production inference support for large language models
  4. CI/CD Infrastructure - Establishing out-of-tree CI support for the PyTorch ecosystem

We are committed to contributing generic primitives and improvements back to the PyTorch community, including OpenReg testing infrastructure and broader KTIR adoption across AI accelerators.

For detailed technical specifications and design documents, please refer to our RFCs repository.


PyTorch Core Integration

torch.inductor Integration

Our primary objective is to integrate Spyre into torch.inductor with maximal performance at the PyTorch level.

Inductor extensions for dataflow accelerators:

  • Introduce tile-based tensor layout representations in inductor
  • Implement multi-core work division over tiles as part of inductor passes
  • Add scratchpad memory optimizations that enable dataflow accelerators
  • Performant multi-card inference optimization is out of scope for 1H 2026 (functional distributed inference covered separately below)
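As a concrete illustration of the integration point, an out-of-tree torch.compile backend can be registered by name and handed the captured FX graph. This is a minimal pass-through sketch: the backend name "spyre_sketch" is a stand-in, and a real integration would forward the graph to spyre-inductor for tile-level lowering rather than running it unmodified.

```python
import torch
from torch._dynamo import register_backend

# Hypothetical out-of-tree backend entry point; "spyre_sketch" is a
# placeholder name. A real integration would lower the FX graph
# through spyre-inductor instead of returning it unchanged.
@register_backend
def spyre_sketch(gm: torch.fx.GraphModule, example_inputs):
    # Pass-through: run the captured graph as-is on CPU.
    return gm.forward

@torch.compile(backend="spyre_sketch")
def scaled_add(x, y):
    return 2 * x + y

result = scaled_add(torch.ones(4), torch.ones(4))
print(result)  # tensor([3., 3., 3., 3.])
```

The registry lookup by string name is what lets users select the backend with plain `torch.compile(backend="spyre_sketch")` without importing vendor code directly.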

Key Metrics:

  • Achieve time-to-first-token (TTFT), inter-token latency (ITL), and throughput metrics that meet production deployment requirements on a single card
  • Priority models compiling end-to-end through torch.inductor by end of 1H 2026

torch.runtime Integration

Enable Spyre through PyTorch's built-in runtime mechanisms to ensure minimal overhead and support eager mode execution.

Device Registration and Extensions:

  • Integrate device registration and startup using core PyTorch out-of-tree extensions
  • Implement memory management, data transfer, and dispatcher as out-of-tree extensions with minimal overhead
  • Contribute generic primitives of OpenReg testing infrastructure into PyTorch core for broader community benefit
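The out-of-tree registration path above can be sketched with PyTorch's reserved PrivateUse1 hook. Everything here is illustrative: "spyre" is used only as a placeholder device name, and the stub module stands in for a real C++ extension that would implement the allocator, copy kernels, and dispatcher registrations.

```python
import types
import torch

# Hedged sketch of out-of-tree device registration via the reserved
# PrivateUse1 backend; a real integration would load a compiled
# extension providing memory management and dispatcher kernels.
torch.utils.rename_privateuse1_backend("spyre")

# Minimal stub device module (no hardware behind it in this sketch).
spyre_mod = types.ModuleType("torch_spyre_stub")
spyre_mod.is_available = lambda: False
spyre_mod.device_count = lambda: 0
torch._register_device_module("spyre", spyre_mod)

print(torch.device("spyre", 0))  # spyre:0
```

After registration, `torch.device("spyre")` parses like any first-class device string, which is what keeps user model code free of vendor-specific imports.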

Key Metrics:

  • 100% of device lifecycle (registration, startup, teardown) implemented via out-of-tree extensions
  • Runtime overhead from PyTorch integration layer: <5% compared to direct device access
  • OpenReg primitives contributed upstream: ≥3 PRs merged into PyTorch core

Op Coverage (Integration in Core PyTorch)

Integrate new ops into torch.inductor for Spyre, focusing on registration and layout constraint propagation.

Torch Op Integration:

  • Increase torch op coverage to enable the priority models
  • Register new ops in torch.inductor and implement layout constraint propagation for each op
  • Enable seamless integration of custom kernels with torch.compile workflow
  • Implement new ops through the backend compiler IR as the single entry point for all op lowering and code generation

Key Metrics:

  • Op integration in torch.inductor sufficient to support priority models listed above
  • Layout constraint propagation validated for all integrated ops
  • End-to-end torch.compile workflow validated on priority models

Distributed Inference

Enable multi-card inference for Spyre using PyTorch distributed primitives, with a phased approach from compiled functional collectives to full torch.distributed integration.

Compiled Functional Collectives (1H 2026):

  • Support compilation of functional collective operations (all-reduce, all-gather) through torch.inductor
  • Distributed inference support targeting all priority models via compiled functional collectives

Migration to torch.distributed (1H 2026 and beyond):

  • Transition to torch.distributed for eager mode collective operations
  • Eventual migration to torch.comms as the long-term community communication layer

Key Metrics:

  • All priority models running distributed inference end-to-end via compiled functional collectives
  • Functional correctness validated: distributed inference results match single-card reference outputs

Profiling and Performance Analysis

Build a profiling toolkit for Spyre that integrates with PyTorch profiling infrastructure and provides performance visibility from end-to-end model execution down to intra-kernel behavior.

Spyre Profiling Toolkit:

  • Build a System Management Interface (Spyre SMI) for device monitoring: power, temperature, utilization, memory bandwidth, and per-process resource usage
  • Integrate with PyTorch Profiler via upstream Kineto plugin to trace Spyre kernel execution, memory timelines, and call stacks
  • Extend the Spyre trace analyzer to reach metric parity with Holistic Trace Analysis (HTA) and support multi-Spyre setups
  • Design IR instrumentation-based profiler using FX graph observability hooks for selective operator-level and intra-kernel profiling
  • Extend Inductor provenance tracking to show IR after any user-specified compiler pass
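The Kineto integration point can be shown with a CPU-only profile. This is a sketch of where a Spyre plugin would attach: a device plugin would add kernel activities and memory timelines alongside the CPU events collected here.

```python
import torch
from torch.profiler import ProfilerActivity, profile

# CPU-only sketch of the PyTorch Profiler integration point; a Spyre
# Kineto plugin would contribute device activities next to these
# CPU-side events.
model = torch.nn.Linear(8, 8)
x = torch.randn(2, 8)

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    model(x)

# Aggregate per-op statistics; the linear layer shows up as aten ops.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```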

Key Metrics:

  • Profiling overhead: <5% on application performance at standard device instrumentation level (SMI)
  • 100% identical results with and without profiling enabled
  • All profiling tools designed for upstream contribution or open-source release

Backend Compiler

IR Integration Across the Compilation Stack

The backend compiler connects the spyre-inductor frontend (our out-of-tree torch.inductor extension) to Spyre code generation through optimization passes exercising a set of mid-level and low-level IRs, enabling clean separation of concerns and community-aligned extensibility.

SuperDSC (SDSC) — Backend Compiler IR:

  • SuperDSC (SDSC) is the compiler IR produced after the spyre-inductor frontend’s optimization passes; the Spyre backend compiler consumes it for further lowering and optimization ahead of code generation
  • Enable clean separation of concerns between the PyTorch integration layer and hardware-specific optimizations
  • All op lowering and code generation flows through SDSC as the single entry point into the backend
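Since SDSC itself is not public, the sketch below only illustrates the hand-off point: the traced FX graph is the kind of high-level program that spyre-inductor would lower into SDSC before the backend compiler takes over.

```python
import torch
import torch.fx

# A small model fragment; its traced graph stands in for the input
# that spyre-inductor would lower into SDSC (not shown here).
def layer(x, w):
    return torch.relu(x @ w)

gm = torch.fx.symbolic_trace(layer)
print(gm.graph)  # placeholder, call_function, and output nodes

# Node kinds the lowering would walk when producing backend IR.
ops = [node.op for node in gm.graph.nodes]
```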

KernelTile IR (KTIR) — Community-Aligned Specification:

  • KernelTile IR (KTIR) is the longer-term community-aligned specification, designed for adoption across dataflow accelerators
  • KTIR generalizes the tile-level intermediate representation so that other dataflow accelerators can leverage it for lower-level scheduling

Dataflow Scheduling and Code Generation

Realize efficient dataflow scheduling and code generation for Spyre hardware.

Dataflow Scheduling:

  • Open-source contributions to automatic dataflow scheduling
  • Design scheduling algorithms to be useful and adaptable for other dataflow accelerators
  • Develop a native programming language for dataflow accelerators used for development and verification

Key Metrics:

  • 100% of torch ops required by priority models expressible in SDSC
  • All priority models compiling end-to-end through the backend compiler
  • SDSC generation time from spyre-inductor lowering: a few minutes per priority model
  • Complete KTIR spec published

vLLM Integration

Enable Spyre in the vLLM ecosystem and expand supported models for production inference workloads.

Model Support:

  • Adopt modeling code from vLLM, consolidating on upstream model implementations
  • Enable Spyre support for the priority models through vLLM

Performance Optimization:

  • Develop a new Spyre attention backend for vLLM that removes the homogeneous sequence-length constraint
  • Improve how upstream vLLM handles caching of torch.compile artifacts

API Stability:

  • Collaborate with the vLLM community to stabilize the platform plugin interface
  • Establish a predictable release cadence for platform plugin API changes

Key Metrics:

  • Priority models serving inference end-to-end through vLLM on Spyre
  • Inter-token latency: significant reduction via new attention backend
  • Startup time: a few seconds with torch.compile artifact caching
  • Breaking changes in platform plugin interface: ≤1 per quarter

Testing and CI/CD Infrastructure

Establish a new CI/CD pipeline for using the Spyre accelerator with PyTorch and vLLM, enabling out-of-tree CI support in collaboration with the broader PyTorch community. Testing spans from op-level checks through full vLLM inference validation, all scoped to the priority models.

Test Categories:

  • Op-level Tests: Validate individual torch ops for the 7 priority models (see Introduction)
  • Inductor Tests: Ensure torch.inductor integration correctness
    • Compilation accuracy and performance validation
    • Lowering and code generation verification
  • Module-level Tests: Test PyTorch module components and building blocks
    • Attention mechanisms, normalization layers, activations
    • Memory management and data transfer correctness
  • Top-level Model Tests: End-to-end model validation comparable to vLLM’s model test suite
    • Quality metrics: accuracy, convergence, numerical stability
    • Performance metrics: throughput, latency, memory utilization
  • vLLM Integration Tests: End-to-end inference validation through vLLM on Spyre
    • Model loading, compilation, and serving for priority models
    • Throughput, latency, and correctness validation matching vLLM benchmarks

Key Metrics:

  • PyTorch/vLLM ecosystem OSS tests relevant to priority models running nightly
  • Test pass rate on executed suite: >95% on nightly runs
  • Nightly regression detection: failures flagged within a few hours of a commit
  • CI pipeline uptime: >99% availability
  • Average full pipeline run time: <3 hours

Summary

IBM’s Spyre accelerator integration with PyTorch in 1H 2026 focuses on co-development with the PyTorch ecosystem, leveraging PyTorch’s out-of-tree extension mechanisms to minimize custom code while maximizing performance. Our comprehensive approach spans backend compiler integration with torch.inductor, runtime integration (device registration, memory management, distributed inference, and profiling), ecosystem tooling (CI/CD infrastructure), and production deployment (vLLM support).

We are committed to contributing generic improvements back to the PyTorch community, including OpenReg testing primitives, KTIR generalization for AI accelerators, and collaboration on out-of-tree CI support infrastructure. This ensures that our work benefits not just Spyre users but the broader PyTorch ecosystem.

