Answer by Peter Cordes for AVX512BW: handle 64-bit mask in 32-bit code with...
First of all, if your program depends much on strlen performance for large buffers, you're probably doing it wrong. Use explicit-length strings (pointer + length) like std::string so you don't have to...
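As a rough illustration of the "pointer + length" idea mentioned above, here is a minimal C sketch. The `str_view` type and its helper are hypothetical names invented for this example, not something from the answer; the point is only that the length is computed once and carried alongside the pointer, so later code never has to rescan for the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical explicit-length string: a pointer plus a stored length. */
struct str_view {
    const char *ptr;
    size_t len;
};

/* Pay for strlen exactly once, when a C string enters the program. */
struct str_view str_view_from_cstr(const char *s)
{
    assert(s != NULL);
    struct str_view v = { s, strlen(s) };
    return v;
}
```

After that one conversion, any code that needs the length reads `v.len` in constant time instead of calling `strlen` again.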
Answer by Renat for AVX512BW: handle 64-bit mask in 32-bit code with bsf /...
It looks like KSHIFTRQ could be used as an alternative: right-shift the top 32 bits of the k0 mask into the lower 32 bits, which can then be copied to a general-purpose register. Like: .check_next_dword: add...
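The idea of splitting the 64-bit mask into two 32-bit halves can be modelled in portable C. This is only a sketch of the logic, not the AVX-512 code from the answer: the function names are hypothetical, and the comments mark where `kmovd` and `kshiftrq` would be used in real 32-bit code, on the assumption that the mask has a set bit for each zero byte found.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit of a nonzero 32-bit value
   (the result bsf or tzcnt would give for a nonzero input). */
unsigned lowest_set_bit32(uint32_t x)
{
    assert(x != 0);
    unsigned n = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++n;
    }
    return n;
}

/* Hypothetical helper: position of the first set bit in a 64-bit
   compare mask, using only 32-bit operations on its two halves.
   Returns 64 when no bit is set. */
unsigned first_set_bit64_via_halves(uint64_t mask)
{
    uint32_t lo = (uint32_t)mask;         /* models: kmovd r32, k0 */
    uint32_t hi = (uint32_t)(mask >> 32); /* models: kshiftrq k1, k0, 32 then kmovd r32, k1 */

    if (lo != 0)
        return lowest_set_bit32(lo);
    if (hi != 0)
        return 32 + lowest_set_bit32(hi);
    return 64;
}
```

The low half is checked first because the lowest set bit corresponds to the earliest byte in the 64-byte block; the high half only matters when the low half is all zero, in which case 32 is added to the bit index.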
AVX512BW: handle 64-bit mask in 32-bit code with bsf / tzcnt?
This is my code for a 'strlen' function using AVX512BW: vxorps zmm0, zmm0, zmm0 ; ZMM0 = 0 vpcmpeqb k0, zmm0, [ebx] ; ebx is string and it's aligned at 64-byte boundary kortestq k0, k0 ; 0x00 found ? jnz...