Multi-Head Attention
Intermediate
Allows a model to attend to information from different subspaces simultaneously.
Definition
Full Definition
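Multi-head attention allows a model to attend to information from different representation subspaces simultaneously. The queries, keys, and values are each projected into several lower-dimensional heads, attention is computed in every head in parallel, and the per-head outputs are concatenated and passed through a final linear projection.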
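As an illustration of the definition, here is a minimal NumPy sketch of multi-head self-attention for a single unbatched sequence. The function names (`softmax`, `multi_head_attention`), the random weight matrices, and the sizes (4 tokens, model width 8, 2 heads) are assumptions chosen for the example, not part of any particular library. The sketch omits masking, biases, and dropout.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Illustrative multi-head self-attention over one sequence.

    x: (seq_len, d_model); every weight matrix: (d_model, d_model).
    Returns the output (seq_len, d_model) and the attention
    weights (num_heads, seq_len, seq_len).
    """
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    # Project the input, then give each head its own subspace.
    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Scaled dot-product attention, computed for all heads at once.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    heads = weights @ v  # (num_heads, seq_len, d_head)

    # Concatenate the heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights

# Hypothetical sizes and random weights, for demonstration only.
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 4, 8, 2
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

out, weights = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape)      # (4, 8)
print(weights.shape)  # (2, 4, 4)
```

Each head produces its own `seq_len x seq_len` attention pattern, so the two heads here can weight the same tokens differently. That is the sense in which the model attends to different subspaces at once.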