There are really two separate shuffle issues. The first is that if the mask register isn't constant, you always get slow code. The fix for that is a new API: github.com/dotnet/runti....
The second issue is that even constant masks aren't always recognized as constants. Both issues are still present in net9.0.
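
For anyone following along, here is a minimal sketch of the two cases, assuming the discussion is about `Vector128.Shuffle` on byte vectors (the comment doesn't name the exact overload). The names `ShuffleDemo`, `ReverseConstant` and `ShuffleVariable` are made up for illustration. Both methods return the same result; any difference would only show up in the generated code, which you'd have to check yourself, for example with the JIT disassembly output.

```csharp
using System;
using System.Runtime.Intrinsics;

static class ShuffleDemo
{
    // Case 1: the mask is a constant the JIT can see at the call site.
    // (Per the comment above, even this isn't always recognized.)
    static Vector128<byte> ReverseConstant(Vector128<byte> v) =>
        Vector128.Shuffle(v, Vector128.Create(
            (byte)15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0));

    // Case 2: the mask arrives as a runtime value, so it is not constant.
    // This is the case the comment above describes as always slow.
    static Vector128<byte> ShuffleVariable(Vector128<byte> v, Vector128<byte> mask) =>
        Vector128.Shuffle(v, mask);

    static void Main()
    {
        Vector128<byte> data = Vector128.Create(
            (byte)0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
        Vector128<byte> mask = Vector128.Create(
            (byte)15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);

        Vector128<byte> a = ReverseConstant(data);
        Vector128<byte> b = ShuffleVariable(data, mask);

        // Same semantics either way; only the codegen may differ.
        Console.WriteLine(a.Equals(b));
    }
}
```

This only demonstrates the two call shapes; it doesn't measure anything, so treat it as a starting point for inspecting the disassembly rather than as evidence of the slowdown.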