CPU beating GPU. Lol NVIDIA. SELL! SELL!
https://news.rice.edu/2020/03/02/deep-learning-rethink-overcomes-major-obstacle-in-ai-industry/
arXiv: https://arxiv.org/pdf/1903.03129.pdf
Conclusion:
We provide the first evidence that a smart algorithm with modest CPU OpenMP parallelism can outperform the best available hardware, an NVIDIA V100, for training large deep learning architectures. Our system SLIDE is a combination of carefully tailored randomized hashing algorithms with the right data structures that allow asynchronous parallelism. We show up to a 3.5x gain against TF-GPU and a 10x gain against TF-CPU in training time, with similar precision, on popular extreme classification datasets. Our next step is to extend SLIDE to include convolutional layers. SLIDE has unique benefits when it comes to random memory accesses and parallelism. We anticipate that a distributed implementation of SLIDE would be very appealing, because the communication costs are minimal due to sparse gradients.
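For anyone wondering how a CPU can compete here: the trick is that SLIDE never computes the full layer. It hashes each input with locality-sensitive hash functions and only touches the neurons whose weight vectors land in the same bucket, so most of the matrix multiply is skipped entirely. Below is a minimal, hypothetical C++/OpenMP sketch of that mechanism using a single SimHash (signed random projection) table. The names (SimHashTable etc.) and parameters are mine, and the paper's actual system uses multiple tables and more refined LSH schemes, so treat this as an illustration of the idea, not SLIDE's implementation.

```cpp
// Hypothetical sketch (not the SLIDE codebase): a SimHash-style LSH table
// picks a sparse set of "active" neurons per input, so each layer only
// touches a handful of weight rows instead of doing a dense matmul.
#include <cstdint>
#include <iostream>
#include <random>
#include <unordered_map>
#include <vector>

// One SimHash table: K random hyperplanes -> K-bit bucket id.
struct SimHashTable {
    int k, dim;
    std::vector<std::vector<float>> planes;                  // K random projections
    std::unordered_map<uint32_t, std::vector<int>> buckets;  // bucket -> neuron ids

    SimHashTable(int k, int dim, std::mt19937& rng) : k(k), dim(dim) {
        std::normal_distribution<float> g(0.f, 1.f);
        planes.assign(k, std::vector<float>(dim));
        for (auto& p : planes)
            for (auto& v : p) v = g(rng);
    }
    // Sign of each random projection gives one bit of the bucket id.
    uint32_t hash(const std::vector<float>& x) const {
        uint32_t h = 0;
        for (int i = 0; i < k; ++i) {
            float dot = 0.f;
            for (int j = 0; j < dim; ++j) dot += planes[i][j] * x[j];
            h = (h << 1) | (dot >= 0.f ? 1u : 0u);
        }
        return h;
    }
    void insert(const std::vector<float>& w, int neuron_id) {
        buckets[hash(w)].push_back(neuron_id);
    }
    const std::vector<int>* query(const std::vector<float>& x) const {
        auto it = buckets.find(hash(x));
        return it == buckets.end() ? nullptr : &it->second;
    }
};

int main() {
    const int dim = 128, neurons = 10000, k = 10;
    std::mt19937 rng(42);
    std::normal_distribution<float> g(0.f, 1.f);

    // Random weight vectors standing in for one wide layer.
    std::vector<std::vector<float>> W(neurons, std::vector<float>(dim));
    for (auto& w : W)
        for (auto& v : w) v = g(rng);

    SimHashTable table(k, dim, rng);
    for (int n = 0; n < neurons; ++n) table.insert(W[n], n);

    // A batch of inputs; each thread handles inputs independently,
    // which is the embarrassingly parallel part OpenMP exploits.
    std::vector<std::vector<float>> batch(64, std::vector<float>(dim));
    for (auto& x : batch)
        for (auto& v : x) v = g(rng);

    long total_active = 0;
    #pragma omp parallel for reduction(+:total_active)
    for (int b = 0; b < (int)batch.size(); ++b) {
        const auto* active = table.query(batch[b]);
        // Only these neurons would be computed and updated for this input.
        total_active += active ? (long)active->size() : 0;
    }
    std::cout << "avg active neurons per input: "
              << total_active / (double)batch.size()
              << " of " << neurons << "\n";
}
```

Because each input only activates (and therefore only updates) a bucket's worth of neurons, the resulting gradients are sparse, which is exactly why the authors expect the communication costs of a distributed SLIDE to stay low.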