
Some questions about the visualization #17

Open
fmm170 opened this issue Jan 15, 2024 · 4 comments

@fmm170 commented Jan 15, 2024

Hello! I recently read your paper and code, and I'd like to ask how this visualization was produced. Thanks!
[Screenshot: attention visualization figure from the paper]

@leanwang326 (Collaborator)

This was drawn with the bertviz package; the yellow highlighting was added by hand.
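For reference, a typical bertviz call looks roughly like this. A minimal sketch only: the `gpt2` checkpoint and the sample sentence are placeholders, not necessarily the setup used in the paper.

```python
from transformers import AutoTokenizer, AutoModel
from bertviz import head_view

model_name = "gpt2"  # placeholder; any HF model that returns attentions works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

inputs = tokenizer("The capital of France is Paris", return_tensors="pt")
outputs = model(**inputs)

# outputs.attentions is a tuple of per-layer tensors,
# each of shape (batch, heads, seq_len, seq_len)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
head_view(outputs.attentions, tokens)  # renders the interactive view in a notebook
```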

@fmm170 (Author) commented Jan 19, 2024

Hello! I used bertviz's head_view to visualize layer 44 of gpt2-xl, and the result shows that the first token receives much larger attention. I'm not sure whether this is because we're measuring different things, or for some other reason. Thanks!
[Screenshots: head_view of gpt2-xl layer 44]
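For anyone reproducing this: head_view takes a `layer` keyword to focus the view on one layer. A short usage sketch, assuming `outputs` and `tokens` were obtained as in the earlier snippet:

```python
# Sketch: bertviz layer indices are 0-based, so whether "layer 44" of
# gpt2-xl's 48 layers maps to index 44 depends on how you count.
head_view(outputs.attentions, tokens, layer=44)
```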

@leanwang326 (Collaborator)

Oh, it's like this. First, what we visualize is saliency, not attention, since the magnitude of attention may not line up with actual importance (for example, some work argues it should be corrected by the norm of the value vectors).
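For context, saliency scores of this kind are commonly computed as attention times the gradient of the loss with respect to the attention. A minimal sketch of that general technique, not the repository's exact code; the `gpt2` checkpoint, sentence, and layer index are placeholders:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
# output_attentions=True keeps the per-layer attention maps in the graph
model = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])

# Differentiate the LM loss w.r.t. each layer's attention map.
attentions = outputs.attentions  # tuple of (batch, heads, seq, seq) tensors
grads = torch.autograd.grad(outputs.loss, attentions)

layer = 5  # placeholder layer index
# saliency(i, j) = | A(i, j) * dL/dA(i, j) |, aggregated over heads
saliency = (attentions[layer] * grads[layer]).abs().sum(dim=1)[0]
```

Unlike a raw attention map, this score is small wherever perturbing the attention edge barely changes the loss, even if the attention weight itself is large.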
Second, as for the attention values themselves, this seems to be a very typical pattern, probably related to what Attention Is Off By One describes (the leading bos token may be playing the role of the +1 in the softmax mentioned there). So it likely doesn't mean the model is actually attending to that token; it may just be there to calibrate the attention values.
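For readers unfamiliar with the reference: Attention Is Off By One proposes adding 1 to the softmax denominator, so that a row of attention can sum to less than one:

$$\operatorname{softmax}_1(x)_i = \frac{e^{x_i}}{1 + \sum_j e^{x_j}}$$

Under a standard softmax, a bos token that soaks up the leftover attention mass can serve a similar calibrating role, which matches the pattern in the screenshots above.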

@fmm170 (Author) commented Jan 22, 2024

Ah, I see. StreamingLLM does describe something similar as well. Thank you for the explanation!
