This demo uses a PyTorch Transformer trained from scratch with subword tokenization and beam search decoding.
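The demo's own decoding code is not shown here, but the beam search step it mentions can be sketched in plain Python. This is a minimal, framework-free sketch: `step_fn`, the token names, and the toy probability table below are all hypothetical stand-ins for the real decoder's next-token distribution.

```python
import heapq
import math

def beam_search(step_fn, bos, eos, beam_width=3, max_len=10):
    """Generic beam search decoder.

    step_fn(prefix) -> list of (token, prob) next-token candidates.
    Keeps the `beam_width` highest-scoring partial sequences by summed
    log-probability, retiring hypotheses that emit EOS.
    """
    beams = [(0.0, [bos])]  # (cumulative log-prob, token sequence)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for tok, prob in step_fn(seq):
                candidates.append((score + math.log(prob), seq + [tok]))
        # Prune to the top `beam_width` hypotheses; retire finished ones.
        beams = []
        for cand in heapq.nlargest(beam_width, candidates, key=lambda c: c[0]):
            (finished if cand[1][-1] == eos else beams).append(cand)
        if not beams:
            break
    # Fall back to unfinished hypotheses if nothing reached EOS in time.
    best = max(finished or beams, key=lambda c: c[0])
    return best[1]

# Toy next-token table standing in for the decoder's softmax output.
table = {
    ("<s>",): [("le", 0.6), ("la", 0.4)],
    ("<s>", "le"): [("chat", 0.9), ("</s>", 0.1)],
    ("<s>", "la"): [("chatte", 0.5), ("</s>", 0.5)],
}
def step_fn(seq):
    return table.get(tuple(seq), [("</s>", 1.0)])

print(beam_search(step_fn, "<s>", "</s>", beam_width=2))
# → ['<s>', 'le', 'chat', '</s>']
```

Note that greedy decoding would follow the same path here; beam search pays off when a locally likely token leads to a globally worse sequence, which the beam keeps alive long enough to compare.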
This diagram illustrates the custom Transformer architecture used in our EN→FR/DE model, including the encoder/decoder blocks, multi-head attention, and positional encodings.
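Of the components in the diagram, the positional encodings are simple enough to reproduce directly. A minimal dependency-free sketch of the fixed sinusoidal variant from "Attention Is All You Need" (the model may use learned embeddings instead; this is an assumption):

```python
import math

def sinusoidal_positional_encoding(max_len, d_model):
    """Fixed sinusoidal positional encodings.

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    Returned as a max_len x d_model list of lists; in the model this
    table is added to the token embeddings before the first layer.
    """
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

pe = sinusoidal_positional_encoding(max_len=4, d_model=8)
# Position 0 encodes as alternating 0/1: sin(0)=0, cos(0)=1.
print(pe[0])  # → [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

The geometric frequency schedule gives each position a unique pattern while keeping the offset between any two positions expressible as a linear function, which is what lets attention heads learn relative-position behavior.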