News

To understand how grid/block size affects GPU throughput, we demonstrate with the CUDA kernel below. The kernel adds two one-dimensional (1D) vectors in global memory and combines them to ...
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators - ROCm/composable_kernel ...
During last week's presentation, some had hopes for PCIe Gen 5.0 connectivity. However, the presentation and slides did not mention a thing about the version used. A newly leaked block diagram ...
A leaked slide from an AMD presentation sheds light on a few specific details of AMD's design goals with Navi 31.
NVIDIA uses the DeepSeek-R1 model with inference-time scaling to improve GPU kernel generation, optimizing performance in AI models by efficiently managing computational resources during inference.
Modern supercomputers are increasingly using GPUs to improve performance per watt. Generating GPU code for target regions in OpenMP 4.0, or later versions, requires the selection of grid geometry to ...
A leaked block diagram reveals AMD may have missed an opportunity for some bragging rights over Intel in the gaming laptop segment.