Saturday, March 7, 2026
No Result
View All Result
Blockchain 24hrs
  • Home
  • Bitcoin
  • Crypto Updates
    • General
    • Altcoins
    • Ethereum
    • Crypto Exchanges
  • Blockchain
  • NFT
  • DeFi
  • Metaverse
  • Web3
  • Blockchain Justice
  • Analysis
Crypto Marketcap
  • Home
  • Bitcoin
  • Crypto Updates
    • General
    • Altcoins
    • Ethereum
    • Crypto Exchanges
  • Blockchain
  • NFT
  • DeFi
  • Metaverse
  • Web3
  • Blockchain Justice
  • Analysis
No Result
View All Result
Blockchain 24hrs
No Result
View All Result

NVIDIA cuTile Python Guide Shows 90% cuBLAS Performance for Matrix Ops

Home Blockchain
Share on FacebookShare on Twitter


Timothy Morano
Jan 14, 2026 21:15

NVIDIA releases detailed cuTile Python tutorial for Blackwell GPUs, demonstrating matrix multiplication reaching over 90% of cuBLAS efficiency with simplified code.

NVIDIA has revealed a complete developer information for its cuTile Python framework, demonstrating how the brand new tile-based programming mannequin can obtain over 90% of cuBLAS efficiency for matrix multiplication operations on Blackwell structure GPUs.

The tutorial, authored by NVIDIA engineer Jinman Xie, walks builders via implementing high-performance matrix multiplication utilizing the cuTile library launched with CUDA 13.1 in December 2025. Testing on an RTX 5080 confirmed the cuTile implementation matching PyTorch’s cuBLAS-backed operations throughout matrix sizes from 1024×1024 to 16384×16384.

What cuTile Adjustments for Builders

The framework represents NVIDIA’s shift away from conventional thread-level GPU programming. As an alternative of managing particular person threads, builders now work with “tiles” – bigger knowledge chunks that the compiler robotically optimizes for tensor core execution.

An entire matrix multiplication kernel in cuTile requires roughly 30 strains of Python code. The important thing operations: load tiles from matrices A and B, name ct.mma() for matrix multiply-accumulate (which auto-invokes tensor cores), and retailer outcomes. The framework handles thread synchronization and reminiscence entry patterns internally.

Present necessities restrict adoption: CUDA 13.1 minimal, Blackwell structure solely (RTX 50 collection, compute functionality 10.x and 12.x), and Python 3.10+. NVIDIA signifies broader structure help will are available future CUDA releases.

Efficiency Optimization Particulars

The information covers “swizzle” optimization – a way that remaps block IDs to enhance cache hit charges. NVIDIA’s instance reveals swizzled reminiscence entry lowering whole knowledge hundreds by 20% in comparison with linear row entry, translating on to throughput positive aspects.

Tile dimension configuration issues considerably. For float16/bfloat16 operations, the tutorial recommends 128×256×64 tiles; for float32, 32×32×32. These aren’t common – optimum parameters depend upon matrix dimensions, GPU structure, and accessible shared reminiscence.

Market Implications

NVIDIA shares traded at $182.06 as of January 14, down 2.02% on the day. The corporate’s push to simplify GPU programming comes as competitors in AI accelerator markets intensifies.

The cuTile framework issues as a result of matrix multiplication underlies just about all neural community operations. Decreasing the experience barrier for writing performant GPU code might increase NVIDIA’s developer ecosystem – a key aggressive moat as AMD and customized silicon distributors chase the AI coaching and inference markets.

Full code examples and benchmarks can be found in NVIDIA’s TileGym repository. The autotuner instrument can robotically decide optimum tile parameters for particular workloads, addressing one of many predominant friction factors in GPU kernel optimization.

Picture supply: Shutterstock



Source link

Tags: cuBLAScuTileGuideMatrixNVIDIAOpsPerformancePythonShows
Previous Post

Popular Attorney Reveals Why Ripple Was Unable To Push XRP All These Years

Next Post

Coinbase CEO Brian Armstrong Abruptly Drops Support for Major US Crypto Legislation, Calls New Version ‘Materially Worse’ Than Status Quo

Related Posts

ElevenLabs Launches Generative Voice AI Tool for Custom Synthetic Voices
Blockchain

ElevenLabs Launches Generative Voice AI Tool for Custom Synthetic Voices

March 6, 2026
Expert Tips to Become a Web3 Expert
Blockchain

Expert Tips to Become a Web3 Expert

March 6, 2026
OpenAI Deploys ChatGPT on Pentagon’s GenAI.mil Platform for 3M Defense Personnel
Blockchain

OpenAI Deploys ChatGPT on Pentagon’s GenAI.mil Platform for 3M Defense Personnel

March 6, 2026
OpenAI Launches €500K Grant for Youth AI Safety Research in EMEA
Blockchain

OpenAI Launches €500K Grant for Youth AI Safety Research in EMEA

March 5, 2026
NVIDIA Releases Flash Attention Optimization Guide for Blackwell GPUs
Blockchain

NVIDIA Releases Flash Attention Optimization Guide for Blackwell GPUs

March 4, 2026
OpenAI Releases GABRIEL Toolkit to Transform Social Science Research
Blockchain

OpenAI Releases GABRIEL Toolkit to Transform Social Science Research

March 3, 2026
Next Post
Coinbase CEO Brian Armstrong Abruptly Drops Support for Major US Crypto Legislation, Calls New Version ‘Materially Worse’ Than Status Quo

Coinbase CEO Brian Armstrong Abruptly Drops Support for Major US Crypto Legislation, Calls New Version 'Materially Worse' Than Status Quo

Coinbase Pulls Support Of CLARITY Act, Citing Restrictions

Coinbase Pulls Support Of CLARITY Act, Citing Restrictions

Facebook Twitter Instagram Youtube RSS
Blockchain 24hrs

Blockchain 24hrs delivers the latest cryptocurrency and blockchain technology news, expert analysis, and market trends. Stay informed with round-the-clock updates and insights from the world of digital currencies.

CATEGORIES

  • Altcoins
  • Analysis
  • Bitcoin
  • Blockchain
  • Blockchain Justice
  • Crypto Exchanges
  • Crypto Updates
  • DeFi
  • Ethereum
  • Metaverse
  • NFT
  • Regulations
  • Web3

SITEMAP

  • About Us
  • Advertise With Us
  • Disclaimer
  • Privacy Policy
  • DMCA
  • Cookie Privacy Policy
  • Terms and Conditions
  • Contact Us

Copyright © 2024 Blockchain 24hrs.
Blockchain 24hrs is not responsible for the content of external sites.

  • bitcoinBitcoin(BTC)$67,787.00-4.68%
  • ethereumEthereum(ETH)$1,978.33-5.08%
  • tetherTether(USDT)$1.000.00%
  • binancecoinBNB(BNB)$627.44-3.07%
  • rippleXRP(XRP)$1.36-3.01%
  • usd-coinUSDC(USDC)$1.000.00%
  • solanaSolana(SOL)$84.20-5.08%
  • tronTRON(TRX)$0.283663-0.92%
  • Figure HelocFigure Heloc(FIGR_HELOC)$1.02-1.05%
  • dogecoinDogecoin(DOGE)$0.090544-3.69%
No Result
View All Result
  • Home
  • Bitcoin
  • Crypto Updates
    • General
    • Altcoins
    • Ethereum
    • Crypto Exchanges
  • Blockchain
  • NFT
  • DeFi
  • Metaverse
  • Web3
  • Blockchain Justice
  • Analysis
Crypto Marketcap

Copyright © 2024 Blockchain 24hrs.
Blockchain 24hrs is not responsible for the content of external sites.