Question Classification using Self-Attention Transformer — Part 1.1
5 min read · Jan 1, 2021
Understanding the Multi-Head Self-Attention Transformer network with code in PyTorch
In this part of the blog series, we will work through the Encoder-Decoder architecture of the Multi-Head Self-Attention Transformer network, with code in PyTorch. There won’t be any theory…
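As a quick preview of the kind of building block we will be dealing with, here is a minimal sketch of a single multi-head self-attention layer using PyTorch's built-in `nn.MultiheadAttention`. The embedding size (512) and number of heads (8) are only illustrative values, not settings from this series.

```python
import torch
import torch.nn as nn

# Illustrative sizes (not from the series): 512-dim embeddings, 8 attention heads
embed_dim, num_heads = 512, 8
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# A dummy batch of token embeddings: (batch, sequence length, embedding dim)
x = torch.randn(2, 10, embed_dim)

# Self-attention: query, key, and value all come from the same sequence
out, weights = attn(x, x, x)

print(out.shape)      # torch.Size([2, 10, 512]) — same shape as the input
print(weights.shape)  # torch.Size([2, 10, 10]) — attention weights, averaged over heads
```

Each position in the output is a weighted mixture of all positions in the input, with the weights learned per head; that is the core idea the rest of the architecture is built around.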