In my experience, the bigger frameworks may support non-CUDA devices (and not just through a CPU fallback), but many smaller libraries and models do not, and ship only a CUDA kernel for some specialized operation.
I encounter this all the time in computer vision models.