News
Abstract: In this paper, we propose a scheme for matrix-matrix multiplication on a distributed-memory parallel computer. The scheme hides almost all of the communication cost with the computation and ...
DBCSR is a library designed to efficiently perform sparse matrix-matrix multiplication, among other operations. It is MPI and OpenMP parallel and can exploit Nvidia and AMD GPUs via CUDA and HIP. To ...
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results