Blockchain 24hrs

FlashAttention-4 Hits 1,605 TFLOPS on NVIDIA Blackwell GPUs



Alvin Lang
Jan 22, 2026 23:03

NVIDIA’s FlashAttention-4 achieves 71% hardware efficiency on Blackwell chips, delivering a 3.6x speedup over FA2 for AI training workloads.

NVIDIA has released FlashAttention-4, the latest optimization for transformer neural networks, which squeezes 1,605 TFLOPS out of its Blackwell architecture, capturing 71% of the hardware’s theoretical maximum performance.
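As a quick sanity check on the headline figures, the implied theoretical peak is the achieved throughput divided by the utilization. This sketch assumes the 71% figure is measured against the same peak (and the same numeric precision) as the 1,605 TFLOPS result; the announcement does not say which precision that is, and the function name is illustrative only.

```python
# Headline figures from the announcement.
ACHIEVED_TFLOPS = 1605.0
UTILIZATION = 0.71  # reported fraction of theoretical peak (itself rounded)


def implied_peak_tflops(achieved: float, utilization: float) -> float:
    """Back out the theoretical peak implied by an achieved rate and a utilization ratio."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return achieved / utilization


peak = implied_peak_tflops(ACHIEVED_TFLOPS, UTILIZATION)
print(f"implied peak: {peak:,.0f} TFLOPS")  # prints "implied peak: 2,261 TFLOPS"
```

Because the 71% is rounded, the result (roughly 2.26 PFLOPS) should be read as approximate.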

The announcement matters for anyone watching AI infrastructure investments. As large language models push toward longer context windows, the attention mechanism’s quadratic memory complexity becomes a brutal bottleneck. FlashAttention-4 attacks this problem directly, and the benchmark numbers suggest meaningful gains for production AI workloads.

What the Numbers Show

On the B200 GPU, FA4 delivers a 3.6x speedup over FlashAttention-2 on forward passes at a sequence length of 32,768. Backward-pass performance is 3.15x faster than FA2 under the same conditions. Against existing frameworks, FA4 posts a 1.3x improvement over cuDNN and 2.4x over Triton Inference Server implementations.

The memory efficiency gains are equally significant. Standard attention’s memory use scales at O(N²) with sequence length, meaning that doubling your context window quadruples memory requirements. FA4 brings this down to O(N) through tiling and incremental softmax normalization, although the amount of computation remains quadratic. NVIDIA claims 20x lower memory usage compared with PyTorch baselines.
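The tiling-plus-incremental-softmax idea can be illustrated with a small pure-Python sketch. This shows the general online-softmax technique the FlashAttention family builds on, not FA4’s actual CUDA kernel: there are no tensor cores, TMEM, or pipelining here, and the function names and the `block` parameter are hypothetical. The reference version builds each query’s full row of N scores; the tiled version walks over keys and values one tile at a time, keeping a running maximum and running sum per query so that no full row of scores, and hence no N×N matrix, is ever stored.

```python
import math
from typing import List

Matrix = List[List[float]]


def naive_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Reference softmax(QK^T / sqrt(d)) V that builds a full row of N scores per query."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out


def tiled_attention(q: Matrix, k: Matrix, v: Matrix, block: int = 2) -> Matrix:
    """Same result, computed one key/value tile at a time with an online softmax.

    Per query, only `block` scores plus a running max, a running sum, and an
    accumulator of width d_v are alive at once, so no full row of scores exists.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for qi in q:
        running_max = -math.inf  # largest score seen so far
        running_sum = 0.0        # sum of exp(score - running_max) over keys seen so far
        acc = [0.0] * dv         # unnormalized weighted sum of value rows
        for start in range(0, len(k), block):
            k_tile = k[start:start + block]
            v_tile = v[start:start + block]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_tile]
            new_max = max(running_max, max(scores))
            # Rescale earlier partial results so they are relative to the new max.
            correction = math.exp(running_max - new_max)  # exp(-inf) == 0.0 on the first tile
            running_sum *= correction
            acc = [x * correction for x in acc]
            for s, vj in zip(scores, v_tile):
                w = math.exp(s - new_max)
                running_sum += w
                for c in range(dv):
                    acc[c] += w * vj[c]
            running_max = new_max
        out.append([x / running_sum for x in acc])  # final normalization
    return out


if __name__ == "__main__":
    q = [[0.1, 0.2], [0.3, -0.4], [1.0, 0.5]]
    k = [[0.5, -0.1], [0.2, 0.3], [-0.7, 0.9], [0.4, 0.4], [1.2, -0.3]]
    v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0], [-1.0, 3.0]]
    ref = naive_attention(q, k, v)
    for block in (1, 2, 3, 5):
        tiled = tiled_attention(q, k, v, block=block)
        diff = max(abs(a - b) for r1, r2 in zip(ref, tiled) for a, b in zip(r1, r2))
        print(f"block={block}: max abs difference vs. naive = {diff:.2e}")
```

Every tile size gives the same output up to floating-point rounding, which is why the N×N score matrix never has to be materialized. For scale: at the 32,768-token length used in the benchmarks, a materialized score matrix in 16-bit precision would take 32,768² × 2 bytes = 2 GiB per attention head per sequence.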

Hardware-Software Co-Design

FA4 was built specifically for Blackwell’s quirks. The architecture presents an asymmetric scaling problem: compute power roughly doubles while memory bandwidth doesn’t keep pace. Traditional approaches leave tensor cores sitting idle while they wait for data.

The solution leverages Blackwell’s dedicated Tensor Memory (TMEM): 256 KB of on-chip memory per streaming multiprocessor. By storing intermediate results directly in TMEM instead of shared memory, FA4 sidesteps the bandwidth bottleneck that would otherwise throttle the faster compute units.

Larger tile sizes (up to 128×128) and deeper pipelines keep the hardware busy. The backward pass, typically the slower half of training, benefits from bypassing register accumulation entirely.
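Taking the figures above at face value, a quick budget shows how much of TMEM a tile of these sizes occupies. The sketch assumes 256 KB means 256 × 1024 bytes per SM and that accumulators are stored as 32-bit floats; it ignores anything else FA4 keeps in TMEM, so the counts are an upper bound on whole tiles, not a description of FA4’s actual memory layout. The helper names are hypothetical.

```python
TMEM_BYTES_PER_SM = 256 * 1024  # 256 KB of Tensor Memory per SM, per the article
BYTES_PER_FP32 = 4              # assumption: accumulators held as 32-bit floats


def tile_bytes(rows: int, cols: int, bytes_per_elem: int = BYTES_PER_FP32) -> int:
    """Size in bytes of one rows x cols accumulator tile."""
    return rows * cols * bytes_per_elem


def tiles_that_fit(rows: int, cols: int, budget: int = TMEM_BYTES_PER_SM) -> int:
    """Number of whole tiles of this shape that fit in the budget (ignoring other TMEM use)."""
    return budget // tile_bytes(rows, cols)


for shape in [(64, 64), (128, 64), (128, 128)]:
    size_kib = tile_bytes(*shape) // 1024
    print(f"{shape[0]}x{shape[1]} fp32 tile: {size_kib} KiB, {tiles_that_fit(*shape)} fit in TMEM")
# 64x64 fp32 tile: 16 KiB, 16 fit in TMEM
# 128x64 fp32 tile: 32 KiB, 8 fit in TMEM
# 128x128 fp32 tile: 64 KiB, 4 fit in TMEM
```

Under these assumptions, even the largest tile size mentioned leaves room for several tiles in TMEM at once, which is consistent with the article’s description of larger tiles combined with deeper pipelines.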

Production Integration

Major inference frameworks, including SGLang and vLLM, already support FA4 for prefill operations. NVIDIA has incorporated these techniques into cuDNN 9.14, making the optimizations available to developers without custom kernel work.

For AI companies burning through compute budgets, the efficiency gains translate into cost savings in proportion to how much of a workload’s time is spent in attention. A 3x+ speedup on training passes means either faster iteration cycles or the ability to train larger models within existing infrastructure constraints.
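How much of a kernel-level speedup reaches the bill depends on how much of each training step is spent in attention. A simple Amdahl’s-law estimate makes this concrete. Only the 3.6x (forward) and 3.15x (backward) figures come from the benchmarks above; the attention fractions are hypothetical placeholders, not measurements, and the function name is illustrative.

```python
def end_to_end_speedup(attention_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the attention share of step time gets faster."""
    if not 0.0 <= attention_fraction <= 1.0:
        raise ValueError("attention_fraction must be in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - attention_fraction) + attention_fraction / kernel_speedup)


# Hypothetical attention shares of total step time; real values depend on model and sequence length.
for fraction in (0.2, 0.5, 0.8):
    fwd = end_to_end_speedup(fraction, 3.6)   # forward-pass kernel speedup from the benchmarks
    bwd = end_to_end_speedup(fraction, 3.15)  # backward-pass kernel speedup from the benchmarks
    print(f"attention = {fraction:.0%} of step: forward ~{fwd:.2f}x, backward ~{bwd:.2f}x")
```

Even if attention accounted for 80% of step time, a 3.6x kernel speedup would yield roughly 2.4x end to end; the gain approaches the full kernel speedup only as attention dominates, which is more likely at long context lengths.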

The broader trend here: as transformer models grow, algorithmic efficiency at the kernel level becomes as important as raw hardware capability. FlashAttention-4 represents the current frontier of that optimization work.

Image source: Shutterstock



Copyright © 2024 Blockchain 24hrs.
Blockchain 24hrs is not responsible for the content of external sites.

  • bitcoinBitcoin(BTC)$77,952.00-0.72%
  • ethereumEthereum(ETH)$2,327.11-2.85%
  • tetherTether(USDT)$1.00-0.01%
  • rippleXRP(XRP)$1.44-0.05%
  • binancecoinBNB(BNB)$638.27-0.53%
  • usd-coinUSDC(USDC)$1.000.00%
  • solanaSolana(SOL)$85.81-1.56%
  • tronTRON(TRX)$0.329142-0.05%
  • Figure HelocFigure Heloc(FIGR_HELOC)$1.041.59%
  • dogecoinDogecoin(DOGE)$0.0968300.45%
No Result
View All Result
  • Home
  • Bitcoin
  • Crypto Updates
    • General
    • Altcoins
    • Ethereum
    • Crypto Exchanges
  • Blockchain
  • NFT
  • DeFi
  • Metaverse
  • Web3
  • Blockchain Justice
  • Analysis
Crypto Marketcap

Copyright © 2024 Blockchain 24hrs.
Blockchain 24hrs is not responsible for the content of external sites.