Setting inf in the attention matrix
It seems like in lines 426-428 you set everything before the sink tokens to have a +inf attention score, and everything in the question tokens to also have +inf attention. Wouldn't torch.topk in line 439 then simply pick the first k inf values? Maybe I have a misunderstanding, but I thought the sink tokens and question tokens should be set to -inf, because otherwise what's the point of calculating the attention matrix if torch.topk is just going to select the +inf values? Please correct me if I'm wrong.
Hello!
Thanks for the question.
In this implementation we always keep the first sink_tokens and the question tokens; everything in between is instead selected according to its score (importance) with respect to the task/few-shot examples/question. Sink tokens are kept because all succeeding tokens attend to them (see https://arxiv.org/pdf/2309.17453), and question tokens are kept because the model needs them to solve the task. So yes, torch.topk does pick the +inf positions first, and that is intentional: it guarantees the sink and question tokens are always retained, while the remaining slots go to the highest-scoring tokens in between.
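A rough sketch of this selection step, assuming the repo's actual code (the function name and arguments here are hypothetical, and the real implementation operates on a full attention matrix rather than a 1-D score vector):

```python
import torch

def select_tokens_to_keep(scores: torch.Tensor, sink_tokens: int,
                          question_len: int, k: int) -> torch.Tensor:
    """Keep the first `sink_tokens` and last `question_len` positions
    unconditionally; fill the remaining budget by importance score."""
    scores = scores.clone()
    # Force sink and question positions to +inf so torch.topk
    # always selects them, regardless of their actual scores.
    scores[:sink_tokens] = float("inf")
    scores[-question_len:] = float("inf")
    # topk returns the +inf positions plus the best-scoring
    # intermediate tokens, up to k entries in total.
    keep = torch.topk(scores, k).indices
    return torch.sort(keep).values  # restore original token order
```

For example, with 2 sink tokens, 2 question tokens, and k=5, the three remaining slots minus the four forced positions leave one slot for the single highest-scoring token in the middle of the sequence.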
Best
Giulio