GPU

A deep dive into the hardware infrastructure that enables multi-GPU communication for AI workloads
5 min read

Learn PyTorch distributed operations for multi-GPU AI workloads
10 min read
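
For a flavor of what those operations look like, below is a minimal sketch of an all-reduce across GPUs, assuming a torchrun launch with the NCCL backend; the tensor values and script name are illustrative only, not taken from the article.

```python
import os

import torch
import torch.distributed as dist


def main():
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank contributes its own values; all_reduce sums them in place,
    # so every rank ends up holding the same result.
    x = torch.full((4,), float(dist.get_rank()), device="cuda")
    dist.all_reduce(x, op=dist.ReduceOp.SUM)
    print(f"rank {dist.get_rank()}: {x.tolist()}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Run with, for example, `torchrun --nproc_per_node=2 all_reduce_demo.py` (a hypothetical filename); every rank prints the same summed tensor.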

Learn how CPUs and GPUs interact in the host-device paradigm
7 min read
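
The sketch below illustrates that paradigm with PyTorch (an assumption; the article may use a different framework): the CPU acts as the host that stages data, and the GPU is the device that does the heavy compute.

```python
import torch

# Host (CPU) side: allocate data in RAM, then pin it so the GPU's DMA
# engine can copy it asynchronously.
host_tensor = torch.randn(1024, 1024).pin_memory()

device = torch.device("cuda")

# The host-to-device copy is queued on the current CUDA stream;
# non_blocking=True lets the CPU keep working while the transfer runs.
device_tensor = host_tensor.to(device, non_blocking=True)

# Kernels launched on the same stream see the transferred data in order.
result = device_tensor @ device_tensor

# Bring the result back to the host; .cpu() synchronizes as needed.
print(result.cpu().shape)
```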

Deep learning workloads are increasingly memory-bound, with GPU cores sitting idle while waiting for data…
8 min read

Tiled GEMM, GPU memory, coalescing, and much more!
13 min read

Or … how an ML library can accelerate non-ML computations
12 min read
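
As a tiny illustration of that idea, assuming PyTorch as the library, the same tensor operations used for training networks can run a plain numerical job on the GPU, here a Monte Carlo estimate of pi.

```python
import torch


def estimate_pi(num_samples: int = 10_000_000) -> float:
    """Monte Carlo estimate of pi: no neural network involved, just
    GPU-accelerated tensor arithmetic (falls back to CPU if needed)."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Sample points uniformly in the unit square, directly on the device.
    points = torch.rand(num_samples, 2, device=device)
    # The fraction of points inside the quarter circle approximates pi/4.
    inside = (points.pow(2).sum(dim=1) <= 1.0).float().mean()
    return (4.0 * inside).item()


print(estimate_pi())
```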

Estimating GPU memory for deploying the latest open-source LLMs
4 min read
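
The back-of-the-envelope version of that estimate: weights alone need roughly parameter count times bytes per parameter, plus headroom for activations, the KV cache, and the CUDA context. The helper below is a hypothetical illustration of that arithmetic, not the article's formula, and the 20% overhead factor is an assumption.

```python
def estimate_weight_memory_gb(num_params_billion: float,
                              bytes_per_param: int = 2,
                              overhead: float = 0.2) -> float:
    """Rough GPU memory estimate for serving a model.

    bytes_per_param: 2 for FP16/BF16, 4 for FP32, 1 for INT8.
    overhead: extra fraction for activations, KV cache, CUDA context, etc.
    (the 20% default is an illustrative assumption, not a measured figure).
    """
    weights_gb = num_params_billion * 1e9 * bytes_per_param / 1024**3
    return weights_gb * (1 + overhead)


# A 7B-parameter model in FP16 needs about 13 GB for weights alone,
# so roughly 15-16 GB with this overhead allowance.
print(f"{estimate_weight_memory_gb(7):.1f} GB")
```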

Practical techniques to accelerate heavy workloads with GPU optimization in Python
8 min read

How to get a 2X speed-up in model training with three lines of code
9 min read
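
The article's exact three lines are not reproduced here, but one well-known change of roughly that size is enabling automatic mixed precision in a PyTorch training loop; the sketch below assumes that technique and uses a placeholder model and random data.

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()        # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()              # AMP addition: loss scaler

for _ in range(10):
    x = torch.randn(64, 1024, device="cuda")
    target = torch.randn(64, 1024, device="cuda")
    optimizer.zero_grad()
    # AMP addition: run the forward pass in reduced precision.
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.mse_loss(model(x), target)
    # AMP addition: scale the loss so small gradients do not underflow.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```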