Transformer Model: Implement Encoder
In this tutorial, we’ll implement the Transformer Encoder. We’ll first discuss the internal components of the Transformer Encoder and then implement them. We’ll also compare Post-Layer-Normalization and Pre-Layer-Normalization, and try to understand which one works better and why. At the end, we’ll test our implementation with dummy inputs, as in the sketch below.
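To preview where we’re headed, here is a minimal sketch of a single encoder layer with a flag that toggles between the two normalization orders, plus a dummy-input shape check. The class name EncoderLayer and the norm_first parameter are illustrative choices for this sketch, not necessarily the exact names used in the linked repository:

```python
import torch
from torch import nn

class EncoderLayer(nn.Module):
    """One Transformer encoder layer: self-attention + feed-forward.

    norm_first=False -> Post-LN (the order in "Attention Is All You Need")
    norm_first=True  -> Pre-LN  (the order analyzed in "On Layer
                                 Normalization in the Transformer Architecture")
    """
    def __init__(self, d_model=512, nhead=8, dim_ff=2048, dropout=0.1,
                 norm_first=False):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead,
                                               dropout=dropout,
                                               batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, dim_ff),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(dim_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)
        self.norm_first = norm_first

    def forward(self, x):
        if self.norm_first:
            # Pre-LN: normalize *before* each sublayer, so the residual
            # path stays an identity mapping.
            h = self.norm1(x)
            x = x + self.dropout(self.self_attn(h, h, h, need_weights=False)[0])
            x = x + self.dropout(self.ff(self.norm2(x)))
        else:
            # Post-LN: normalize *after* the residual addition, as in the
            # original paper.
            x = self.norm1(x + self.dropout(
                self.self_attn(x, x, x, need_weights=False)[0]))
            x = self.norm2(x + self.dropout(self.ff(x)))
        return x

# Dummy-input sanity check: the output shape should match the input shape.
if __name__ == "__main__":
    layer = EncoderLayer(d_model=512, nhead=8, norm_first=True)
    dummy = torch.rand(2, 10, 512)  # (batch, sequence, d_model)
    print(layer(dummy).shape)       # torch.Size([2, 10, 512])
```

The Layer Normalization paper linked below argues that Pre-LN keeps gradients well-behaved at initialization, which is why Pre-LN Transformers typically train stably without a learning-rate warm-up stage, whereas Post-LN models usually need one.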
The code used in this tutorial is available here:
Self Attention - https://github.com/makeesyai/makeesy-deep-learning/blob/main/seq2seq/attention.py
Encoder Module - https://github.com/makeesyai/makeesy-deep-learning/blob/main/seq2seq/transformer_encoder.py
The papers mentioned in this tutorial:
Attention Is All You Need - https://arxiv.org/pdf/1706.03762.pdf
On Layer Normalization in the Transformer Architecture - https://arxiv.org/pdf/2002.04745.pdf
#pytorch #tutorial #transformer #encoder