Research Article
Nested Transformers for Hyperspectral Image Classification
Figure 1
Illustration of ViT. An image is split into fixed-size patches, and each patch is linearly embedded. An extra learnable classification token (cls) is prepended to the sequence, position embeddings (pos) are added, and the resulting sequence of vectors is fed into a standard Transformer encoder.
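The input pipeline described in the caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in this article: the function name `vit_input_sequence`, the patch size, the embedding dimension, and the randomly initialized weights (stand-ins for parameters that would be learned) are all assumptions made for the example. The Transformer encoder itself is omitted.

```python
import numpy as np

def vit_input_sequence(image, patch_size, embed_dim, rng):
    """Build the token sequence a ViT encoder would receive (illustrative sketch)."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    gh, gw = h // patch_size, w // patch_size

    # Split the image into non-overlapping patches and flatten each patch.
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(gh * gw, patch_size * patch_size * c)

    # Random stand-ins for learned parameters (assumption for this sketch).
    w_embed = rng.normal(scale=0.02, size=(patches.shape[1], embed_dim))
    cls_token = rng.normal(scale=0.02, size=(1, embed_dim))
    pos_embed = rng.normal(scale=0.02, size=(gh * gw + 1, embed_dim))

    tokens = patches @ w_embed                            # linear patch embedding
    tokens = np.concatenate([cls_token, tokens], axis=0)  # prepend cls token
    return tokens + pos_embed                             # add position embeddings

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # hypothetical 32x32 image with 3 channels
seq = vit_input_sequence(image, patch_size=8, embed_dim=64, rng=rng)
print(seq.shape)  # (17, 64): 16 patch tokens plus 1 cls token
```

The channel axis is not restricted to three values, so the same sketch applies to inputs with many channels; only the flattened patch length changes.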