
source link: https://donghao.org/2024/04/19/multimodal-trials-solve-the-masked-language-problem-about-my-tiny-albef-implementation-episode-3/

Multimodal trials: solve the Masked Language problem about my tiny ALBEF implementation (episode 3)

I just wrote my own implementation of ALBEF. But when I evaluated it with some masked sentences, it failed.

I am using this image:

[Image: a photo of a chocolate cake]

When I asked “This is a chocolate <|mask|>”, it generated “This is a chocolate urn”. Quite strange.

Then I asked “This is a <|mask|> cake”, and it generated “This is a iph cake”. Totally wrong.

After checking my implementation of the dataset and training on a small part of CC3M, a week passed, and today I finally found the reason: tiktoken is a BPE tokenizer that uses sub-words as tokens, and these sub-words severely hurt the model. For example, the sub-words “urn” and “iph” appear so many times that the model uses them to fill the masked position in its predictions.
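
To see the effect, here is a minimal sketch (not the training code from this post; the encoding name "cl100k_base" is my assumption) that prints the BPE pieces tiktoken splits a sentence into:

```python
# Sketch: inspect how tiktoken's BPE splits text into sub-word pieces.
# The encoding name "cl100k_base" is an assumption, not necessarily the
# encoding used in the tiny ALBEF experiment.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "This is a chocolate cake"
ids = enc.encode(text)

# Decode each token id individually to see the sub-word pieces the model
# actually has to predict at the masked position.
print([enc.decode([i]) for i in ids])
```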

By replacing tiktoken with BertTokenizerFast (from the “transformers” package), the model now correctly generates “This is a chocolate cake”.
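
For comparison, here is a minimal sketch (assuming the standard "bert-base-uncased" vocabulary, which may differ from the one used here) showing that BertTokenizerFast has a dedicated [MASK] token, so the masked position corresponds to one WordPiece that the MLM head can fill in directly instead of a stray BPE fragment:

```python
# Sketch: BertTokenizerFast tokenizes with a WordPiece vocabulary and a
# dedicated [MASK] token (instead of a custom <|mask|> string).
# "bert-base-uncased" is an assumed checkpoint, not necessarily the one used.
from transformers import BertTokenizerFast

tok = BertTokenizerFast.from_pretrained("bert-base-uncased")

print(tok.tokenize("This is a chocolate [MASK]"))  # whole-word pieces plus [MASK]
print(tok.mask_token, tok.mask_token_id)           # the token the MLM head predicts at

ids = tok("This is a chocolate [MASK]")["input_ids"]   # plain list of token ids
print(tok.convert_ids_to_tokens(ids))                  # includes [CLS] and [SEP]
```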

April 19, 2024, 4:57 · RobinDong · machine learning · Multimodal, PyTorch