Research Article
A Cooperative Lightweight Translation Algorithm Combined with Sparse-ReLU
Table 1
Transformer submodel size.
| Description | Substructure | Layer name | Size |
| --- | --- | --- | --- |
| Encoder | MultiHeadAttention | cast_queries | (512, 384) |
|  |  | cast_keys_values | (512, 768) |
|  |  | cast_output | (384, 512) |
|  |  | softmax | softmax |
|  |  | layer_norm | eps = 1e-05 |
|  | PositionWiseFCNetwork | LayerNorm | eps = 1e-05 |
|  |  | fc_1 | (512, 1024) |
|  |  | fc_2 | (1024, 512) |
|  |  | Sparse-ReLU | a = 0.25, b = 1, c = 0.2, d = 0.4 |
| Decoder | Embedding | Embedding | (10000, 512) |
|  | MultiHeadAttention | tgt_emb | (10000, 512) |
|  | MultiHeadAttention | pos_emb | (10000, 512) |
|  | PositionWiseFCNetwork | Sparse-ReLU | a = 0.25, b = 1, c = 0.1, d = 0.4 |
| Output | LayerNorm | LayerNorm | eps = 1e-05 |
|  | Fc | Fc | (512, 10000) |
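As a rough illustration, the encoder shapes in Table 1 can be read as the following PyTorch sketch. This is not the paper's implementation: the head count (8) is an assumption, the class and method names are hypothetical, and the Sparse-ReLU body is a labeled stub because the table lists only its hyperparameters (a, b, c, d), not its functional form.

```python
import torch
import torch.nn as nn


class SparseReLU(nn.Module):
    """Stand-in for the paper's Sparse-ReLU activation.

    Table 1 lists only the hyperparameters (a, b, c, d); the functional
    form is defined elsewhere in the paper, so this stub stores the
    parameters and falls back to a plain ReLU to stay runnable.
    """

    def __init__(self, a=0.25, b=1.0, c=0.2, d=0.4):
        super().__init__()
        self.a, self.b, self.c, self.d = a, b, c, d

    def forward(self, x):
        return torch.relu(x)  # placeholder; substitute the paper's definition


class MultiHeadAttention(nn.Module):
    """Self-attention with the projection sizes listed for the encoder.

    Queries, keys, and values are cast into a reduced 384-dim space;
    cast_keys_values packs keys and values together (384 + 384 = 768).
    The head count is not in Table 1, so 8 heads is an assumption.
    """

    def __init__(self, d_model=512, d_inner=384, n_heads=8):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_inner // n_heads
        self.layer_norm = nn.LayerNorm(d_model, eps=1e-5)
        self.cast_queries = nn.Linear(d_model, d_inner)          # (512, 384)
        self.cast_keys_values = nn.Linear(d_model, 2 * d_inner)  # (512, 768)
        self.cast_output = nn.Linear(d_inner, d_model)           # (384, 512)

    def forward(self, x):
        b, t, _ = x.shape
        h = self.layer_norm(x)
        q = self.cast_queries(h)
        k, v = self.cast_keys_values(h).chunk(2, dim=-1)
        # Split into heads: (batch, heads, time, d_head).
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5,
                             dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.cast_output(out)


class PositionWiseFCNetwork(nn.Module):
    """Feed-forward sublayer with the encoder sizes from Table 1."""

    def __init__(self, d_model=512, d_ff=1024):
        super().__init__()
        self.layer_norm = nn.LayerNorm(d_model, eps=1e-5)
        self.fc_1 = nn.Linear(d_model, d_ff)  # (512, 1024)
        self.fc_2 = nn.Linear(d_ff, d_model)  # (1024, 512)
        self.act = SparseReLU(a=0.25, b=1.0, c=0.2, d=0.4)

    def forward(self, x):
        return self.fc_2(self.act(self.fc_1(self.layer_norm(x))))


# Shape check: both sublayers map (batch, time, 512) -> (batch, time, 512).
x = torch.randn(2, 16, 512)
print(MultiHeadAttention()(x).shape)      # torch.Size([2, 16, 512])
print(PositionWiseFCNetwork()(x).shape)   # torch.Size([2, 16, 512])
```

Note the reduced query/key/value width (384 rather than the model width of 512), which is consistent with the lightweight design the title describes: smaller projections shrink both the parameter count and the attention compute.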