A Cooperative Lightweight Translation Algorithm Combined with Sparse-ReLU

Table 1. Transformer submodel size.

| Description | Substructure | Layer name | Size |
| --- | --- | --- | --- |
| Encoder | MultiHeadAttention | cast_queries | (512, 384) |
| | | cast_keys_values | (512, 768) |
| | | cast_output | (384, 512) |
| | | softmax | softmax |
| | | layer_norm | eps = 1e-05 |
| | PositionWiseFCNetwork | LayerNorm | eps = 1e-05 |
| | | fc_1 | (512, 1024) |
| | | fc_2 | (1024, 512) |
| | | Sparse-ReLU | a = 0.25, b = 1, c = 0.2, d = 0.4 |
| Decoder | Embedding | Embedding | (10000, 512) |
| | MultiHeadAttention | tgt_emb | (10000, 512) |
| | MultiHeadAttention | pos_emb | (10000, 512) |
| | PositionWiseFCNetwork | Sparse-ReLU | a = 0.25, b = 1, c = 0.1, d = 0.4 |
| Output | LayerNorm | LayerNorm | eps = 1e-05 |
| | Fc | Fc | (512, 10000) |
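The sizes in Table 1 map directly onto a standard feed-forward block. Below is a minimal PyTorch sketch of the encoder's PositionWiseFCNetwork using the tabulated shapes (LayerNorm with eps = 1e-05, fc_1 of (512, 1024), fc_2 of (1024, 512)). The exact functional form of Sparse-ReLU is not given in this excerpt, so the piecewise definition here is only a hypothetical placeholder that exposes the four parameters a, b, c, d from the table; it is a sketch, not the paper's definition.

```python
import torch
import torch.nn as nn


class SparseReLU(nn.Module):
    """Placeholder for the paper's Sparse-ReLU activation.

    The true functional form is not specified in this excerpt; this
    illustrative piecewise-linear version zeroes activations below c
    (sparsity), applies a reduced slope a on [c, d), and the full
    slope b above d.
    """

    def __init__(self, a=0.25, b=1.0, c=0.2, d=0.4):
        super().__init__()
        self.a, self.b, self.c, self.d = a, b, c, d

    def forward(self, x):
        out = torch.zeros_like(x)                        # x < c  -> 0
        mid = (x >= self.c) & (x < self.d)
        out = torch.where(mid, self.a * x, out)          # c <= x < d -> a*x
        out = torch.where(x >= self.d, self.b * x, out)  # x >= d -> b*x
        return out


class PositionWiseFCNetwork(nn.Module):
    """Encoder feed-forward block with the sizes from Table 1."""

    def __init__(self, d_model=512, d_ff=1024):
        super().__init__()
        self.layer_norm = nn.LayerNorm(d_model, eps=1e-5)   # eps = 1e-05
        self.fc_1 = nn.Linear(d_model, d_ff)                # (512, 1024)
        self.act = SparseReLU(a=0.25, b=1.0, c=0.2, d=0.4)  # encoder settings
        self.fc_2 = nn.Linear(d_ff, d_model)                # (1024, 512)

    def forward(self, x):
        return self.fc_2(self.act(self.fc_1(self.layer_norm(x))))


# Shape check: (batch, seq_len, d_model) in and out.
ffn = PositionWiseFCNetwork()
y = ffn(torch.randn(2, 10, 512))
assert y.shape == (2, 10, 512)
```

The decoder's block would differ only in the activation's c parameter (c = 0.1 per Table 1), which under this placeholder reading widens the range of activations allowed through.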