I tested xattention on A100 and L40S and got the same result: xattention is slower than flash attention on both platforms.

q = torch.randn((bsz, heads, seq_len, dim), dtype=torch.bfloat16).to("cuda")
k = ...
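For reference, here is a minimal timing sketch along the lines of the setup above. The shapes (bsz, heads, seq_len, dim) are example values, not taken from the original post, and the flash attention path uses PyTorch's SDPA backend selection. The xattention call is left as a hypothetical placeholder, since its actual entry point isn't shown here.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

# Example shapes (assumed, not from the original report).
bsz, heads, seq_len, dim = 1, 32, 8192, 128
q = torch.randn((bsz, heads, seq_len, dim), dtype=torch.bfloat16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

def time_cuda(fn, iters=20):
    # Warm up, then time with CUDA events so async kernel launches are measured.
    for _ in range(3):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # ms per call

# Flash attention via PyTorch's SDPA flash backend.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    flash_ms = time_cuda(lambda: F.scaled_dot_product_attention(q, k, v))

# Hypothetical xattention call; replace with the library's real entry point.
# xattn_ms = time_cuda(lambda: xattention_forward(q, k, v))

print(f"flash attention: {flash_ms:.2f} ms/iter")
```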