Research Article

Nested Transformers for Hyperspectral Image Classification

Figure 1

Illustration of ViT. An image is split into fixed-size patches, each patch is linearly embedded, position embeddings (pos) are added, and the resulting sequence of vectors is fed into a standard Transformer encoder. For classification, an extra learnable classification token (cls) is added to the sequence.
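The embedding pipeline in the caption can be sketched in a few lines of NumPy. This is a minimal illustration of the generic ViT input stage, not this article's nested Transformer. The function name `vit_embed` and its parameters are hypothetical, and random arrays stand in for the learned weights (`W_embed`, `cls_token`, `pos_embed`). The image is assumed to be an (H, W, C) array whose height and width are divisible by the patch size. The same reshape works unchanged if C is a number of spectral bands rather than 3.

```python
import numpy as np

def vit_embed(image, patch_size, W_embed, cls_token, pos_embed):
    """Turn an (H, W, C) image into the (N + 1, D) token sequence a ViT encoder consumes.

    Hypothetical helper: W_embed is (P*P*C, D), cls_token is (D,), pos_embed is (N + 1, D).
    """
    H, W, C = image.shape
    P = patch_size
    assert H % P == 0 and W % P == 0, "image size must be divisible by the patch size"
    # Split into non-overlapping P x P patches (row-major over the patch grid)
    # and flatten each patch into a vector of length P*P*C.
    patches = image.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(-1, P * P * C)            # (N, P*P*C), N = H*W / P^2
    tokens = patches @ W_embed                           # linear patch embedding -> (N, D)
    # Prepend the learnable classification token, then add position embeddings.
    seq = np.concatenate([cls_token[None, :], tokens], axis=0)  # (N + 1, D)
    return seq + pos_embed

# Usage with random stand-ins for learned parameters.
rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32, 3))
P, D = 8, 16
N = (32 // P) * (32 // P)                                # 16 patches
W_embed = rng.standard_normal((P * P * 3, D))
cls_token = rng.standard_normal(D)
pos_embed = rng.standard_normal((N + 1, D))
seq = vit_embed(img, P, W_embed, cls_token, pos_embed)
print(seq.shape)  # (17, 16): one cls token plus 16 patch tokens
```

The class token occupies position 0, so after the encoder its output vector can be read off as the image representation for classification.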