#Infiniband

Latest posts tagged with #Infiniband on Bluesky

Ethernet Surpasses InfiniBand in AI Networking Advancements by 2025
A recent report predicts that Ethernet will significantly outpace InfiniBand in AI networking, taking over two-thirds of data center switch sales by 2025.

Ethernet Surpasses InfiniBand in AI Networking Advancements by 2025 #United_States #Redwood_City #Dell'Oro_Group #Ethernet #InfiniBand


www.soniccomponents.com/product/mqm9...

Need NVIDIA Quantum-2 based NDR InfiniBand support? MQM9790-NS2F @NVIDIA.bsky.social #quantum2 #ndr #infiniband #mqm9790 #ns2f2f

Pushed By GenAI And Front End Upgrades, Ethernet Switching Hits New Highs
By virtue of its scale-out capability, which is key to building absolutely enormous AI clusters, and of its universality, Ethernet switch sales are booming, and if recent history is any guide, Ethernet revenues will climb even higher in the coming quarters. … Pushed By GenAI And Front End Upgrades, Ethernet Switching Hits New Highs was written by Timothy Prickett Morgan at The Next Platform.

Neural Networks for the Very Beginners. Every time you say «Спа…» to a neural network ...

#ai #ml #roce #infiniband #трансформеры #нейросети #llm #mlp #backpropagation


Neural Networks for the Very Beginners. Part Zero: An Overview. Every time you ...

#ai #ml #roce #infiniband #трансформеры #нейросети #llm #mlp #backpropagation


Monitoring high-speed networks… in the terminal 😍

📡 ibtop — Real-time TUI monitor for InfiniBand networks.

💯 htop but for ultra-fast interconnects.

🦀 Written in Rust & built with @ratatui.rs

⭐ GitHub: github.com/JannikSt/ibtop

#rustlang #ratatui #tui #networking #infiniband #linux #terminal

DriveNets to Revolutionize AI Networking Solutions at AI Summit NY 2025
Discover the cutting-edge AI networking solutions DriveNets will showcase at AI Summit NY in December 2025.

DriveNets to Revolutionize AI Networking Solutions at AI Summit NY 2025 #United_States #New_York #AI_Networking #DriveNets #InfiniBand

TACC’s “Horizon” Supercomputer Sets The Pace For Academic Science
As we expected, the “Vista” supercomputer that the Texas Advanced Computing Center installed last year as a bridge between the current “Stampede-3” and “Frontera” production systems and its future “Horizon” system coming next year was indeed a precursor of the architecture that TACC would choose for the Horizon machine. … TACC’s “Horizon” Supercomputer Sets The Pace For Academic Science was written by Timothy Prickett Morgan at The Next Platform.

Check out our newest blog, "Ethernet Alliance Hot Take: #Ethernet and #InfiniBand for HPC." This blog includes contributions from 4 member companies and also includes many of our very own Ethernet Alliance board members and leadership. Read it here: https://bit.ly/47GZNLj #VoiceOfEthernet


www.soniccomponents.com/product/nvid...
NVIDIA Quantum-X800 Q3200‑RA 920‑9B34F‑00RX‑FS0 @nvidia.bsky.social #quantum #x800 #q3200 #9b34f #800gbs #connectX #supernic #linkX #infiniband


📰 ASUS Introduces AI Factory and a New Generation of Servers with NVIDIA HGX B300 at OCP 2025

👉 Read the full article here: ahmandonk.com/2025/10/14/asus-ai-facto...

#aifactory #aiserver #asus #blackwell #datacenter #epyc9005 #generativeai #hgxb300 #infiniband #

InfiniBand: A Formidable Foe
Ultra Ethernet: scaling AI & HPC with high-performance, lossless Ethernet. Supports 1M+ endpoints. 1.0 spec released. Hardware coming in 2025.

Ultra Ethernet challenges InfiniBand for AI/HPC dominance with scalable, loss-tolerant design. Will it become the new interconnect standard?
#InfiniBand
open.substack.com/pub/chippub/...


#Ethernet is gaining ground in the $80B #AI #networking race and may soon overtake #InfiniBand, as our friends at #DellOroGroup shared in a new report. As AI workloads scale fast, openness, flexibility, and ecosystem maturity are tipping the balance. Details at sdxcentral.bsky.social: bit.ly/3IBTwGM

Omni-Path is back to take on InfiniBand? After a five-year hiatus, Cornelis' interconnect returns at 400Gbps, with Ethernet support next

Five years after Intel spun off its #Omni-Path #interconnect tech into Cornelis Networks, its 400Gbps CN5000 line of switches and NICs is finally ready to do battle with its long-time rival, Nvidia's #InfiniBand

www.theregister.com/2025/06/09/o...

#HPC #AI via @theregister.com


NERSC going with #Dell / #Infiniband rather than #Cray / #Slingshot is a pretty big deal for all parties involved!


#AI #datacenters depend on high-speed #interconnects and connectors like #PCIe, #NVLink, and #InfiniBand, so where does Ethernet fit in?

As demands grow, #Ethernet’s evolving role in high-performance compute deserves a closer look. Explore the full article now at #EEWorldOnline: bit.ly/3FzjTvz

About RDMA

RDMA (Remote Direct Memory Access) is a technique for reading and writing memory directly between servers over a network, without CPU involvement. It is widely used in high-performance computing (HPC), large-scale data processing, and distributed AI/ML training because it delivers low latency and high bandwidth.

**1. The basic idea**

In conventional networking, the CPU reads data from memory and copies it to the network interface card (NIC); the NIC sends it over the network, and on the receiving side the CPU copies the data from the NIC back into memory. The CPU, the kernel, and the copy operations all get involved, which adds **latency** and CPU overhead. RDMA, by contrast, performs read or write operations directly against the remote system's memory, with no CPU or OS kernel in the path. In other words, data moves directly from memory to memory.

**2. How RDMA works**

**(1) Memory registration**

* Both sender and receiver use an RDMA NIC (HCA: Host Channel Adapter).
* The memory region to be used must be **registered (pinned)**: it is fixed in physical memory, and the kernel hands its address to the RDMA NIC.

**(2) Queue Pair creation**

* RDMA communication runs through a **QP (Queue Pair)**.
* A QP consists of a Send Queue and a Receive Queue, through which work requests are exchanged.

**(3) Communication modes**

RDMA Write/Read is zero-copy: data moves directly from user-space memory to user-space memory.

Mode | Description
---|---
**RDMA Write** | the sender **writes data** into the receiver's memory
**RDMA Read** | the sender **reads data** from the receiver's memory
**Send/Receive** | ordinary message exchange; the receiver must explicitly post a Receive

**3. Advantages of RDMA**

* **Low latency:** bypassing the kernel and CPU keeps latency very low
* **High throughput:** zero-copy and direct memory access yield high bandwidth
* **Low CPU utilization:** the CPU is freed from copying and can do other work
* **Scale-out friendly:** well suited to HPC and distributed AI training

**4. What you need to use RDMA**

* An RDMA-capable NIC (e.g., Mellanox ConnectX)
* An RDMA-capable network (e.g., InfiniBand, RoCE (RDMA over Converged Ethernet))
* RDMA-aware libraries and software:
  * Verbs API, libibverbs
  * MPI (MVAPICH, OpenMPI with UCX)
  * NVIDIA NCCL (RDMA support for distributed training)
  * GDS (can be combined with GPUDirect Storage)

**5. Related technologies**

Technology | Description
---|---
**RoCE** | runs RDMA over Ethernet (L2 or L3)
**InfiniBand** | a dedicated high-performance RDMA network protocol
**GPUDirect RDMA** | direct communication between GPU memories
**GDS (GPUDirect Storage)** | direct storage-to-GPU transfers over RDMA

**6. Why RDMA speeds up distributed AI training**

**1) Direct GPU-to-GPU communication with no CPU involvement (zero-copy)**

* RDMA can move data from one node's GPU memory straight into another node's GPU memory, bypassing the CPU and the OS kernel.
* NVIDIA calls this GPUDirect RDMA; it dramatically accelerates the all-reduce, broadcast, and scatter operations that training performs constantly.

**2) Low latency**

* Latency is tens to hundreds of times lower than a regular TCP/IP stack (microsecond level).
* This is very effective at cutting the communication wait between training steps.

**3) High bandwidth**

* RDMA over InfiniBand or RoCE supports 100Gbps–400Gbps.
* Essential when large volumes of data must be exchanged quickly, as in **parameter sync** or gradient all-reduce.

**7. Framework examples**

📌 **Example 1) PyTorch Distributed + NCCL**

* With torch.distributed's nccl backend and RDMA-backed InfiniBand as the underlying transport, AllReduce can be several times faster.
* NCCL uses RDMA for direct GPU-to-GPU communication.

📌 **Example 2) DeepSpeed ZeRO-2 / ZeRO-3**

* Training models with billions to hundreds of billions of parameters requires each node to fetch optimizer state, gradient, and parameter shards from other nodes.
* Over plain TCP, the communication wait becomes the training-speed bottleneck.
* RDMA cuts the latency of this shard-to-shard traffic and speeds up training end to end.

**8. Benchmark example (NVIDIA A100 8-GPU, 2 nodes)**

RDMA can more than double throughput:

Transport | ResNet50 training speed (images/sec)
---|---
TCP/IP over Ethernet | 25,000
RoCEv2 (RDMA over Converged Ethernet) | 42,000
InfiniBand + GPUDirect RDMA | **58,000**

**9. Requirements**

Component | What you need
---|---
HCA (NIC) | Mellanox ConnectX series with RDMA support
Network | InfiniBand, RoCEv2, NVLink Switch (intra-node)
Software | libibverbs, OFED driver, NCCL with RDMA support
Frameworks | PyTorch DDP, TensorFlow Horovod, DeepSpeed, Megatron-LM, etc.

**10. Conclusion**

RDMA is a core infrastructure technology for high-performance distributed AI training. Especially for LLM, ViT, and MoE models, where inter-GPU communication is the bottleneck, it dramatically cuts communication latency and shortens total training time.
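The post above mentions PyTorch Distributed with the NCCL backend running over InfiniBand; in practice that hinges on NCCL actually selecting the IB transport at run time. A minimal launch sketch, with hedges: the environment variables are real NCCL knobs, but the HCA name (`mlx5_0`), the interface name, the host name `node0`, and `train.py` are placeholders to check against your own cluster (`ibstat`, `ip link`):

```shell
# Hedged sketch: steer NCCL onto InfiniBand for a 2-node, 8-GPU-per-node
# PyTorch DDP job. Device and host names below are assumptions.
export NCCL_IB_DISABLE=0        # allow NCCL to use InfiniBand verbs
export NCCL_IB_HCA=mlx5_0       # which HCA to use (check `ibstat`)
export NCCL_SOCKET_IFNAME=eth0  # interface for NCCL's bootstrap traffic
export NCCL_DEBUG=INFO          # logs which transport NCCL actually picked

# One launcher per node; rank 0's rendezvous address is a placeholder.
torchrun --nnodes=2 --nproc-per-node=8 \
         --rdzv-backend=c10d --rdzv-endpoint=node0:29500 \
         train.py
```

With `NCCL_DEBUG=INFO`, the startup log states whether NCCL chose IB verbs or fell back to sockets, which is the quickest way to confirm RDMA is really in use.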

You don't say? #AI continues to transform #datacenter #networking, as scale-up switches get set to outpace #Ethernet and #InfiniBand according to #LightCounting. It forecasts Ethernet will lead scale-out by 2030, with UALink gaining traction. Details in this #LightTrends Newsletter: bit.ly/3Gzxmnj

Simplifying InfiniBand on AKS Learn the what, the whys, and the hows of configuring InfiniBand networking for high performance compute (HPC) workloads on AKS

Need to run really large LLMs that require the power of RDMA with InfiniBand?
Check out this blog to see how to get it working on Azure Kubernetes Service:
azure.github.io/AKS/2025/04/...

#AKS #Kubernetes #InfiniBand #AI #LLM #RDMA #Azure

Simplifying #InfiniBand on #AKS azure.github.io/AKS/2025/04/...

#kubernetes


Interested in running workloads on #AKS that require #RDMA over #Infiniband? 🤔

Take a read through the latest AKS blog post on the topic and the linked guide, which will walk you through installing #Nvidia's #NetworkOperator and #GPUOperator

azure.github.io/AKS/2025/04/...
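For reference, installing those two operators usually comes down to a pair of Helm releases from NVIDIA's public chart repository. A hedged sketch only: the chart names are NVIDIA's published ones, but the release names and namespaces are arbitrary choices, and the linked AKS guide may use different flags and values:

```shell
# Hedged sketch: the linked AKS guide covers the details; this just shows the
# usual Helm path for NVIDIA's Network Operator and GPU Operator.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

# Network Operator: wires up the RDMA/InfiniBand drivers and device plugins
helm install network-operator nvidia/network-operator \
  --namespace nvidia-network-operator --create-namespace

# GPU Operator: GPU driver, container toolkit, and device plugin
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace
```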

Original post on fosstodon.org

Nvidia has been doing a lot of useless stuff lately, but this is actually a big deal. I wonder what the latency looks like on these switches. Traditionally direct-attach copper has always been the preferred choice for low-latency applications, with optics used for longer connections where […]

AI Ate My Blog on RoCEv2
I acknowledge I’ve been a blog technology summarizer for quite a while. It has served to help me broaden and solidify my skills, and hopefully it has helped others do the same.

New blog “How AI Ate My Blog on RoCEv2”. #PeterWelcher #CCIE1773 #AI #ECN #PFC #RoCEV2 #Infiniband. URL: www.linkedin.com/pulse/ai-ate...


For AI networks, what would you pick: Ethernet or Infiniband? 🧐 Drop your choice down below 👇🏽

#AI #DataCenter #Ethernet #Infiniband #Telemetry #AIOps #Tech #Technology

Blackstone invests $300M in DDN to boost AI storage business – Blocks and Files
Private equity firm Blackstone is investing $300 million in privately held DDN, valuing the company at $5 billion.

It seems I have an opportunity to be on the bleeding edge of AI storage! #DDN #HPC #AI #DataDirectNetworks #infiniband #nvidia #lustre #exascaler #scalecomputing #infinia #gpfs

blocksandfiles.com/2025/01/09/b...


Juniper Networks highlights the key trends for campus and data center in 2025

#AIOps #Datacenter #EdgeDatacenter #Infiniband #ITSecurity @JuniperNetworks #künstlicheIntelligenz #Rechenzentrum #Risikomanagement

netzpalaver.de/2025/...

Flux Tutorial Series - "Flux on Azure" (Dinosaur Tutorials) 🌀 YouTube video by vsoch

Happy New Year from the Flux team! 🌀 I'm excited to share our newest Flux tutorial: a fully automated build (Packer) and deployment (Terraform) on Microsoft Azure for a full #HPC cluster, ready to go with #Infiniband!

youtu.be/1WhJTKAu05o?...

This is the most fun tutorial of the series yet! 🪩
