Abstract

The goal of session-based recommendation systems (SBRS) is to predict a user’s next behavior based on anonymous sessions. Because long-term user histories are unavailable, deep learning has replaced traditional content-based methods as the mainstream technology for SBRS. However, most SBRS methods only consider the session itself and ignore collaborative information from other sessions. Even the SBRS models that consider collaboration between sessions mostly compute similarity from the click order alone and ignore the time the user spends on different items, which may indicate the user’s varying interest in these items. In this paper, we propose a session-based recommendation model with GNN and time-aware memory networks (SR-GTM), which learns the user’s interest representation by combining information from the session itself with collaborative information from relevant neighbor sessions. Specifically, SR-GTM mainly consists of an inner feature extraction module (IFEM) and an outer feature extraction module (OFEM). IFEM uses a GNN to learn session features from the item sequence, and OFEM uses a memory network that encodes dwell time information to extract collaborative information. Finally, SR-GTM aggregates the outputs of IFEM and OFEM with a gating mechanism and decodes the result with a softmax layer to obtain the recommendation score of each candidate item. Experiments on three public datasets, Yoochoose1/64, Yoochoose1/4, and RetailRocket, show that SR-GTM outperforms other state-of-the-art methods. Specifically, SR-GTM improves on the best baseline by 0.77%, 0.38%, and 3.63% in P@20 and by 2.91%, 2.52%, and 2.49% in MRR@20, respectively.

1. Introduction

Session-based recommendation (SBR) has become an active research field in recent years. Traditional methods, such as collaborative filtering [1] and content-based recommendation [2], focus only on the user’s long-term static preferences and ignore short-term interactions. Moreover, they are only suitable when user information is available and cannot serve anonymous users. Therefore, session-based recommendation methods have been proposed, whose task is to predict the user’s next behavior based on the historical behavior in the current session [3].

Earlier session-based recommendation methods predict the probability of the next item by computing similarities between items [4, 5] or by using Markov decision process (MDP)-based methods [6]. These traditional methods learn only shallow item features and cannot achieve excellent performance. In recent years, with advances in hardware, deep learning has become prevalent. Deep learning is often considered a subfield of machine learning; its defining essence is that it learns deep representations, that is, multiple levels of representations and abstractions from data [7]. Owing to this strong feature learning ability, deep learning has excelled in SBRS, where recurrent neural networks (RNNs) in particular have been widely used for their sequence processing capability [3, 8–11]. GRU4REC [3], NARM [10], STAMP [11], and other methods are RNN-based: they treat each session as an ordered sequence and learn its sequential representation. However, these models only consider single-way transitions between items and cannot capture complex item interactions, so they perform poorly when little user behavior is available. More recently, models based on graph neural networks (GNNs), such as SR-GNN [12], FGNN [13], and GC-SAN [14], have achieved strong performance. They construct each session as a graph and use a GNN to learn complex interactions between items, which significantly improves performance.

Besides modeling the internal structure of sessions, some works, such as CSRM [15] and STAN [16], explore how to utilize external collaborative information from neighbor sessions, assuming that sessions more similar to the current session contribute more collaborative information. However, existing works generally ignore the user’s dwell time on items, which can also reflect the user’s preferences. Bogina and Kuflik [17] showed that dwell time can significantly improve the performance of recommendation systems. In this work, we find that explicitly considering dwell time is also helpful when extracting collaborative information. For example, in Figure 1(a), previous models only care about items and their interactions in different sessions, so they conclude that session 1 and session 2 make approximately the same collaborative contribution to the current session, as both share the same prefix with it; therefore, the next items of the two sessions receive the same recommendation score. The model in Figure 1(b) considers the user’s dwell time on each item when filtering neighbor sessions. The dwell times of the shared items in the current session are closer to those in session 1 than to those in session 2, so session 1 contributes more collaborative information to the current session, and its next item receives a higher recommendation score than that of session 2. This example shows that dwell time also reflects users’ preferences and cannot be neglected if we want a more accurate measure of collaboration between sessions.

To address the above problems, we propose a session-based recommendation model with GNN and time-aware memory networks (SR-GTM), which consists of four components: an item representation learning layer, an inner feature extraction module (IFEM), an outer feature extraction module (OFEM), and a session representation and prediction layer. Specifically, for the current session, SR-GTM first learns item representations with the item representation learning layer as the input to IFEM and OFEM. Then, IFEM uses position coding and attention mechanisms to learn the internal item interactions of the current session, extracting the features of the session itself to obtain the inner session representation. OFEM merges dwell time embeddings with item embeddings to encode time information into a session representation, which is then passed to the memory module to filter the neighbors of the current session and obtain collaborative information, i.e., the outer session representation. Finally, the session representation and prediction layer uses a gating mechanism to aggregate the inner and outer session representations into the final session representation and computes the recommendation score of each candidate item with a softmax layer.

We conducted extensive experiments on two benchmark datasets to verify the effectiveness of our model. The results show that SR-GTM outperforms state-of-the-art session-based recommendation baselines in terms of two evaluation metrics. Our main contributions are summarized as follows:
(1) We propose a session-based recommendation model with GNN and time-aware memory networks (SR-GTM), which learns both the session’s own information and collaborative information. Compared with previous SBR models, SR-GTM makes full use of the feature learning ability of GNNs and the strength of memory networks in extracting collaborative information to improve performance.
(2) We propose an effective inner feature extraction module (IFEM) to learn the session’s own information. It applies a graph neural network to learn item representations and uses position coding to incorporate position information into them; experiments demonstrate its effectiveness.
(3) Different from previous models that consider collaborative information, SR-GTM uses a novel neighbor session extraction module (i.e., OFEM), which merges dwell time with the interaction pattern of the click sequence to extract collaborative representations more accurately. To the best of our knowledge, this is the first work that explicitly considers dwell time information when computing session similarity.
(4) We conduct extensive experiments on two benchmark datasets, and the results show the effectiveness of SR-GTM compared with state-of-the-art session-based recommendation methods.

2. Related Work

This section reviews related work on SBRS from three perspectives: conventional recommendation methods, deep learning-based methods, and collaborative information-based methods.

2.1. Conventional Recommendation Methods

Collaborative filtering (CF) [18, 19] is a classic recommendation method that captures user preferences by modelling the interactions between users and items. However, CF requires explicit user identity information, which makes it unsuitable for anonymous session-based recommendation. In addition, due to data sparsity, CF-based methods usually need to incorporate other information to improve performance [20]. Some SBRS works [3, 16] use item-KNN [4] to calculate the similarity between the last item in the current session and the items in other sessions, achieving competitive performance. However, these methods rely only on item similarity and ignore the sequential patterns within a session, so they may miss important information.
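To make the item-KNN idea concrete, the sketch below ranks candidate items by how often they co-occur with the last clicked item. It assumes cosine-normalized session co-occurrence counts, a common variant; the exact similarity used in [4] may differ, and the function name `item_knn_scores` is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def item_knn_scores(sessions, last_item):
    """Rank items by co-occurrence similarity with last_item.

    sim(a, b) = co(a, b) / sqrt(freq(a) * freq(b)), where freq(x) is the number
    of sessions containing x and co(a, b) the number containing both.
    """
    freq = Counter()
    co = Counter()
    for sess in sessions:
        items = set(sess)  # count each item once per session
        freq.update(items)
        for a, b in combinations(sorted(items), 2):
            co[(a, b)] += 1
    scores = {}
    for item in freq:
        if item == last_item:
            continue
        pair = tuple(sorted((item, last_item)))
        if co[pair] > 0:
            scores[item] = co[pair] / math.sqrt(freq[item] * freq[last_item])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


sessions = [[1, 2, 3], [1, 2], [2, 4], [1, 3]]
print(item_knn_scores(sessions, last_item=1))  # item 3 ranks above item 2
```

Note that the order of clicks inside each session never enters the computation, which is exactly the limitation described above.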

A Markov chain can predict the next item to be clicked based on the previous item. Shani et al. [6] proposed an MDP-based recommender that models the session as a Markov chain and predicts the next item according to the transition probabilities between items. Rendle et al. [21] combined matrix factorization and Markov chains in the FPMC model, which learns sequential dependencies by factorizing a personalized transition probability matrix. Markov chain-based models outperform traditional collaborative filtering-based methods, but they only model transitions between adjacent items and fail to capture long-distance dependencies. Moreover, their state space quickly becomes unmanageable when computing high-order transition probabilities.
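A first-order Markov chain recommender fits in a few lines and shows concretely why such models use only the last click. This is a plain maximum-likelihood transition estimate for illustration, not the MDP of [6] or the factorized FPMC model of [21]; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Maximum-likelihood estimate of P(next item | current item)."""
    counts = defaultdict(Counter)
    for sess in sessions:
        for cur, nxt in zip(sess, sess[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for cur, nxts in counts.items()
    }


def predict_next(probs, session, k=3):
    """Rank candidates using only the last click (first-order Markov property)."""
    dist = probs.get(session[-1], {})
    return sorted(dist, key=dist.get, reverse=True)[:k]


probs = fit_transitions([["a", "b", "c"], ["a", "b", "d"], ["a", "c"]])
print(predict_next(probs, ["x", "a"]))  # ['b', 'c']: the earlier click "x" is ignored
```

Capturing longer dependencies would require conditioning on longer click histories, which is where the state space blows up.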

2.2. Deep Learning-Based Methods

In recent years, deep learning has been widely used in SBRS due to its powerful feature learning capabilities. Hidasi et al. proposed GRU4REC [3], which applied GRU networks to the SBRS task for the first time and was trained with session-parallel mini-batches. Tan et al. [8] further studied GRU networks for SBRS and improved performance through data augmentation and by accounting for shifts in the input data distribution. Li et al. [10] modelled users’ sequential click behavior and main purpose to make full use of the information in the current session and proposed the neural attentive recommendation machine (NARM). Kang and McAuley [22] applied self-attention to SBRS and proposed the self-attentive sequential recommendation model (SASRec). Liu et al. [11] used a multilayer perceptron and an attention network to learn the user’s general and current interests from the current session and proposed the short-term attention priority model (STAMP). Luo et al. [23] proposed the collaborative self-attention network (CoSAN), which considers the preference representations of both the current session and neighbor sessions and also dynamically learns item representations. Methods based on RNNs and attention mechanisms improve performance over traditional methods, but they only model single-way transitions between consecutive items and ignore transitions among contexts.

Sparse user behavior and the lack of identity information are the main characteristics of SBRS and also the reasons why SBRS is challenging. With the widespread application of graph neural networks [24], some SBRS models have begun to model sessions as graphs and use GNNs to mine the complex item interactions within a session. Wu et al. [12] proposed SR-GNN, which constructs each session sequence as a directed graph, learns item representations on this session graph with a gated graph neural network, and finally uses an attention mechanism to learn the session representation. Subsequently, Xu et al. [25] improved SR-GNN by using self-attention networks to learn long-distance dependencies. The FGNN model proposed by Qiu et al. [13] comprehensively considers both the explicit and implicit order in the current session. These methods use graph neural networks to significantly improve recommendation performance, but they are prone to overfitting as the number of network layers increases. Moreover, they only use the information of the session itself, without considering collaborative information.

2.3. Collaborative Information-Based Methods

The collaborative information in SBRS refers to auxiliary information from neighbor sessions that are similar to the current session, where similarity is mainly measured by item co-occurrence between sessions. Wang et al. [15] proposed the memory network-based collaborative session-based recommendation model (CSRM), the first work to consider collaborative information in an end-to-end SBRS model. CSRM uses an outer memory encoder to model the current session based on its similarity to other sessions and then uses a fusion gate to merge the inner session information and the collaborative information into the session representation. Garg et al. [16] studied the influence of three factors on SBRS models, namely the position of an item in the current session, the time interval between a neighbor session and the current session, and the position of an item in the neighbor session, and proposed the STAN model. Wang et al. [26] proposed the global context-enhanced graph neural network (GCE-GNN), which uses a global graph and a session graph to learn the global and local context information of the current session, respectively. Zhang et al. [27] proposed dual part-pooling attentive networks for session-based recommendation (DPAN4Rec), which apply sequential acquisition and collective acquisition to capture sequential and collective dependencies in sessions, respectively. Choi et al. [28] proposed the session-aware linear item similarity/transition model (SLIST) to consider holistic aspects of sessions; SLIST combines two linear models with different perspectives to capture various features of sessions, achieving strong performance while remaining highly scalable. Although these models perform well, they only consider item interactions when filtering neighbor sessions and ignore the user’s dwell time on each item.
Further, due to the limited expressiveness of the dot product function [29], the above models cannot describe the different impacts of certain latent factors (e.g., dwell time) when calculating session similarity. Therefore, our model uses dwell time as auxiliary information when calculating session similarity.

3. Method

In this section, we first give a formal description of the SBRS task and then introduce the SR-GTM model. Finally, we describe the four modules of SR-GTM, namely, the item representation learning layer, the inner feature extraction module (IFEM), the outer feature extraction module (OFEM), and the session representation and prediction layer.

3.1. Session-Based Recommendation System

The task of SBRS is to predict the next item based on the current session. In this section, we give a formal description of SBR. Let $V=\{v_1, v_2, \dots, v_m\}$ represent the set of items included in all sessions, where $m$ is the total number of items. An anonymous session $s$ containing $n$ items can be represented by a chronological list $s=[v_{s,1}, v_{s,2}, \dots, v_{s,n}]$, where $v_{s,i}\in V$ represents a clicked item in session $s$, and the goal is to predict the next click $v_{s,n+1}$. Specifically, for session $s$, SBRS needs to output the probabilities of all candidate items $\hat{\mathbf{y}}=\{\hat{y}_1, \hat{y}_2, \dots, \hat{y}_m\}$, where $\hat{y}_i$ is the recommendation score of item $v_i$. Finally, the top-$K$ items are recommended to the user.

3.2. Overview of SR-GTM

The workflow of the SR-GTM model is shown in Figure 2. First, SR-GTM embeds all items into a $d$-dimensional space and uses a GNN to learn the representation of every item. Next, SR-GTM feeds the learned item representations into IFEM and OFEM. IFEM applies position coding and an attention mechanism to the representations of the items in the session to learn the inner session representation $\mathbf{s}^{I}$. OFEM combines the dwell time with the item representations and uses the same position coding and attention mechanism as IFEM to learn a time-aware session representation, which is then fed into the memory module. The memory module selects the sessions most similar to the current session as collaborative sessions, assigns them different weights, and produces the outer session representation $\mathbf{s}^{O}$. Finally, the session representation and prediction layer aggregates $\mathbf{s}^{I}$ and $\mathbf{s}^{O}$ with a gating mechanism into the session representation $\mathbf{s}$, which serves as the user’s interest representation, and makes recommendations according to the score of each item.

3.3. Item Representation Learning Layer

SR-GTM uses the item representation learning layer to preprocess the initial embedding vectors. Specifically, SR-GTM embeds each item as a $d$-dimensional vector and then constructs the session as a directed graph, in which items are nodes and the adjacency relationships between items are edges. SR-GTM uses a gated graph neural network (GGNN) to learn the transitions between nodes and produce more accurate item representations.

3.4. Directed Graph Structure

Given a session $s$, we first construct it as a session graph $\mathcal{G}_s$ and then learn the embedding of each item with the GGNN. Specifically, each item $v_{s,i}$ forms a node and each transition $(v_{s,i-1}, v_{s,i})$ forms an edge in $\mathcal{G}_s$. Because items may be repeated within a session, we assign each edge a normalized weight: its number of occurrences divided by the out-degree of its start node. We construct the outgoing matrix $A^{out}_s$ and the incoming matrix $A^{in}_s$ that the GGNN uses to update the node vectors. Figure 3 shows an example session together with its session graph and connection matrices.
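The construction of the connection matrices can be sketched as follows. The outgoing weights follow the rule stated above (occurrences divided by the out-degree of the start node); normalizing the incoming matrix by the in-degree of the end node is an assumption made by analogy, and `session_graph` is a hypothetical name.

```python
import numpy as np


def session_graph(session):
    """Normalized outgoing/incoming connection matrices for one session.

    Nodes are the unique items in first-occurrence order. a_out[u, v] is the
    count of u->v transitions divided by the out-degree of u; a_in[v, u] is
    the same count divided by the in-degree of v (assumed analogue).
    """
    nodes = list(dict.fromkeys(session))  # unique items, order preserved
    index = {item: i for i, item in enumerate(nodes)}
    counts = np.zeros((len(nodes), len(nodes)))
    for cur, nxt in zip(session, session[1:]):
        counts[index[cur], index[nxt]] += 1
    out_deg = counts.sum(axis=1, keepdims=True)
    in_deg = counts.sum(axis=0, keepdims=True)
    a_out = np.divide(counts, out_deg, out=np.zeros_like(counts), where=out_deg > 0)
    a_in = np.divide(counts, in_deg, out=np.zeros_like(counts), where=in_deg > 0).T
    return nodes, a_out, a_in


nodes, a_out, a_in = session_graph([1, 2, 3, 2, 4])
print(nodes)     # [1, 2, 3, 4]
print(a_out[1])  # item 2 -> items 3 and 4, weight 0.5 each
print(a_in[1])   # item 2 <- items 1 and 3, weight 0.5 each
```

Because item 2 is clicked twice, it becomes a single node with two outgoing edges, which is what the normalization is meant to handle.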

3.5. Update Node Vectors

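After constructing the session graph, we update the node vectors according to the connection matrices. For node $v_{s,i}$ in $\mathcal{G}_s$, the propagation information at step $t$ is calculated as
$$\mathbf{a}_{s,i}^{t} = \Big[A^{in}_{s,i:}\,[\mathbf{v}_1^{t-1},\dots,\mathbf{v}_n^{t-1}]^{\top}H_{in} + \mathbf{b}_{in}\;\Big\|\;A^{out}_{s,i:}\,[\mathbf{v}_1^{t-1},\dots,\mathbf{v}_n^{t-1}]^{\top}H_{out} + \mathbf{b}_{out}\Big],$$
where $H_{in}, H_{out}\in\mathbb{R}^{d\times d}$ are parameter matrices and $\mathbf{b}_{in}, \mathbf{b}_{out}\in\mathbb{R}^{d}$ are the bias vectors. $A^{in}_{s,i:}$ and $A^{out}_{s,i:}$ are the $i$th rows of the incoming and outgoing matrices corresponding to node $v_{s,i}$, respectively, and $\|$ denotes concatenation. $[\mathbf{v}_1^{t-1},\dots,\mathbf{v}_n^{t-1}]$ is the list of node vectors in session $s$ at step $t-1$, and $\mathbf{a}_{s,i}^{t}\in\mathbb{R}^{2d}$ represents the propagation information for node $v_{s,i}$ at step $t$.

Then, we take $\mathbf{a}_{s,i}^{t}$ and the previous state $\mathbf{v}_i^{t-1}$ as input to the GGNN to update the node vectors:
$$\begin{aligned}
\mathbf{z}_{s,i}^{t} &= \sigma\big(W_z\mathbf{a}_{s,i}^{t} + U_z\mathbf{v}_i^{t-1}\big),\\
\mathbf{r}_{s,i}^{t} &= \sigma\big(W_r\mathbf{a}_{s,i}^{t} + U_r\mathbf{v}_i^{t-1}\big),\\
\tilde{\mathbf{v}}_i^{t} &= \tanh\big(W_o\mathbf{a}_{s,i}^{t} + U_o(\mathbf{r}_{s,i}^{t}\odot\mathbf{v}_i^{t-1})\big),\\
\mathbf{v}_i^{t} &= (1-\mathbf{z}_{s,i}^{t})\odot\mathbf{v}_i^{t-1} + \mathbf{z}_{s,i}^{t}\odot\tilde{\mathbf{v}}_i^{t},
\end{aligned}$$
where $W_z, W_r, W_o\in\mathbb{R}^{d\times 2d}$ and $U_z, U_r, U_o\in\mathbb{R}^{d\times d}$ are learnable parameters, $\sigma(\cdot)$ is the sigmoid function, and $\odot$ is the element-wise multiplication operator. $\mathbf{z}_{s,i}^{t}$ and $\mathbf{r}_{s,i}^{t}$ are the update gate and the reset gate, which decide how much information of the previous state should be preserved and discarded, respectively.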
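One propagation step for all nodes of a session graph can be sketched in NumPy. This assumes the SR-GNN-style gated update [12] with incoming and outgoing messages concatenated, and it is written in row-vector convention, so each weight matrix is the transpose of its counterpart in column-vector notation. The parameter names and the `init_params` helper are hypothetical; a real model would learn these parameters by back-propagation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(d, rng):
    """Random parameters for illustration (hypothetical names, row-vector layout)."""
    shapes = {"H_in": (d, d), "H_out": (d, d), "b_in": (d,), "b_out": (d,),
              "W_z": (2 * d, d), "W_r": (2 * d, d), "W_o": (2 * d, d),
              "U_z": (d, d), "U_r": (d, d), "U_o": (d, d)}
    return {name: rng.normal(0.0, 0.1, shape) for name, shape in shapes.items()}


def ggnn_step(h, a_in, a_out, p):
    """One gated propagation step for all n nodes; h has shape (n, d)."""
    # Propagation: aggregate neighbors along incoming and outgoing edges, concatenate.
    a = np.concatenate([a_in @ h @ p["H_in"] + p["b_in"],
                        a_out @ h @ p["H_out"] + p["b_out"]], axis=1)  # (n, 2d)
    z = sigmoid(a @ p["W_z"] + h @ p["U_z"])             # update gate
    r = sigmoid(a @ p["W_r"] + h @ p["U_r"])             # reset gate
    h_cand = np.tanh(a @ p["W_o"] + (r * h) @ p["U_o"])  # candidate state
    return (1.0 - z) * h + z * h_cand                    # gated interpolation


rng = np.random.default_rng(0)
a_out = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)  # chain 1 -> 2 -> 3
a_in = a_out.T  # every in-degree is at most 1 here, so no extra normalization is needed
h = rng.normal(size=(3, 4))
params = init_params(4, rng)
for _ in range(2):  # two propagation layers sharing parameters
    h = ggnn_step(h, a_in, a_out, params)
print(h.shape)  # (3, 4)
```

Each output entry is a convex combination of the old state and a tanh-bounded candidate, which is what the update gate controls.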

After learning the item representations, the item representation learning layer passes them to both IFEM and OFEM.

3.6. Inner Feature Extraction

IFEM learns the inner session representation $\mathbf{s}^{I}$ according to the click order of the items in the session. First, to capture position information, we add a position code to each item representation, which yields the position-aware item vector $\mathbf{x}_i$.

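Because sessions have different lengths, we use a reverse position code, which counts positions backward from the last click (the last item receives the first code, the second-to-last item the second, and so on). As a result, the most recent items always receive the same codes regardless of session length, which accurately reflects their importance.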
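The effect of reverse indexing can be checked with a short sketch. It assumes the position code is added element-wise to the item vector, as the text describes, and that codes come from a lookup table; `add_reverse_positions` and the random table are hypothetical.

```python
import numpy as np


def add_reverse_positions(item_vecs, pos_table):
    """Add reverse position codes so the last click always gets pos_table[0].

    item_vecs: (n, d) item representations of one session in click order.
    pos_table: (max_len, d) position embeddings (learnable in a real model).
    """
    n = item_vecs.shape[0]
    reverse_idx = np.arange(n - 1, -1, -1)  # n-1, ..., 1, 0
    return item_vecs + pos_table[reverse_idx]


pos_table = np.random.default_rng(0).normal(size=(20, 4))
short_sess = add_reverse_positions(np.zeros((3, 4)), pos_table)
long_sess = add_reverse_positions(np.zeros((7, 4)), pos_table)
# The most recent item receives the same code regardless of session length.
print(np.allclose(short_sess[-1], long_sess[-1]))  # True
```

With forward indexing, the last item would instead receive code `n - 1`, which changes with the session length.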

After obtaining the position-aware representation of each item, we follow [10, 12] and combine the global interest and the local interest to generate the session representation. Specifically, we take the representation of the last item as the local interest $\mathbf{s}_l$ and use an attention mechanism over all the items in the session to obtain the global interest representation $\mathbf{s}_g$.

Then, we concatenate the local interest and the global interest to obtain the inner session representation $\mathbf{s}^{I}$:
$$\mathbf{s}^{I} = W_3\,[\mathbf{s}_l \,\|\, \mathbf{s}_g],$$
where $\|$ is the concatenation operation and $W_3\in\mathbb{R}^{d\times 2d}$ transforms the concatenated vector into the latent space $\mathbb{R}^{d}$.
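The local/global readout can be sketched as below. The attention form $\alpha_i = \mathbf{q}^{\top}\sigma(W_1\mathbf{x}_n + W_2\mathbf{x}_i + \mathbf{c})$ is assumed from SR-GNN [12], which this section follows; the paper may use a different scoring function, and all parameter names here are hypothetical. As in SR-GNN, the weights are not softmax-normalized.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def inner_session_repr(x, W1, W2, c, q, W3):
    """Local + global interest readout (assumed SR-GNN-style attention).

    x: (n, d) position-aware item vectors; W1, W2: (d, d); c, q: (d,); W3: (d, 2d).
    """
    s_local = x[-1]                                     # last item = local interest
    alpha = sigmoid(s_local @ W1.T + x @ W2.T + c) @ q  # (n,) attention weights
    s_global = alpha @ x                                # weighted sum = global interest
    return W3 @ np.concatenate([s_local, s_global])     # project [s_l || s_g] to R^d


rng = np.random.default_rng(0)
d, n = 4, 5
x = rng.normal(size=(n, d))
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
c, q = rng.normal(size=d), rng.normal(size=d)
W3 = rng.normal(size=(d, 2 * d))
print(inner_session_repr(x, W1, W2, c, q, W3).shape)  # (4,)
```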

3.7. Outer Feature Extraction

To better utilize intersession information and generate more accurate session representations, OFEM uses time coding and a memory matrix to select the recent sessions most similar to the current session as neighbor sessions and generates the collaborative session representation.

3.8. Time Information Embedding

For session $s$, OFEM explicitly uses the user’s dwell time on each item, denoted by $\Delta T=[\Delta t_1, \dots, \Delta t_n]$. Specifically, SR-GTM feeds the time information $T=[t_1, \dots, t_n]$ into OFEM, where $t_i$ represents the time when the $i$th item was clicked. The dwell time of the $i$th item is defined as the difference between $t_{i+1}$ and $t_i$, i.e., $\Delta t_i = t_{i+1} - t_i$. For the last item $v_{s,n}$, whose next click is not observed, the dwell time is defined as the average dwell time of the previous items:
$$\Delta t_n = \frac{1}{n-1}\sum_{i=1}^{n-1}\Delta t_i,$$

where $n$ represents the length of the current session. Then, we embed each dwell time into the $d$-dimensional vector space, denoted by $\mathbf{e}_i$.
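The dwell time definition translates directly into code. The paper does not state how a continuous dwell time is mapped to an embedding; the log-scale bucketing below (`dwell_bucket`) is a hypothetical choice that would produce an index for an embedding table.

```python
import math


def dwell_times(timestamps):
    """Dwell time of each click; the last click gets the mean of the earlier ones.

    timestamps: click times of one session in chronological order (e.g., seconds).
    """
    n = len(timestamps)
    if n < 2:
        raise ValueError("a session needs at least two clicks")
    gaps = [timestamps[i + 1] - timestamps[i] for i in range(n - 1)]
    return gaps + [sum(gaps) / (n - 1)]


def dwell_bucket(seconds, num_buckets=10):
    """Hypothetical discretization: log-scale bucket id for an embedding lookup."""
    return min(int(math.log1p(max(seconds, 0.0))), num_buckets - 1)


dt = dwell_times([0, 10, 40, 50])
print(dt)                             # gaps 10, 30, 10; the last item gets 50 / 3
print([dwell_bucket(t) for t in dt])  # [2, 3, 2, 2]
```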

3.9. Collaborative Information Learning

After obtaining the dwell time vector of each item, OFEM combines it with the item representation vector produced by the item representation learning layer and uses Equations (3)–(5) to obtain a session representation with time information. Finally, OFEM obtains the collaborative information of the current session through the memory module.

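Specifically, we use the memory matrix $\mathbf{M}$ to store the representations of the most recent sessions. For the current session $s$, we first calculate the cosine similarity between its time-aware representation $\mathbf{s}^{D}$ and each session representation $\mathbf{m}_j$ in the memory matrix:
$$\mathrm{sim}_j = \frac{\mathbf{s}^{D}\cdot\mathbf{m}_j}{\|\mathbf{s}^{D}\|_2\,\|\mathbf{m}_j\|_2},$$

where $\mathrm{sim}_j$ represents the similarity between $s$ and the $j$th stored session. We select the $k$ most similar sessions as the neighbor sessions of the current session, denoted by $N_s$, and then normalize their similarities into weights $w_j$, where $w_j$ reflects the influence of the $j$th neighbor on the current session $s$. The normalization is controlled by an intensity parameter $\tau$: a larger $\tau$ results in a larger difference between the normalized weights.

Finally, the collaborative session representation $\mathbf{s}^{O}$ is calculated as the weighted sum of the neighbor sessions:
$$\mathbf{s}^{O} = \sum_{j\in N_s} w_j\,\mathbf{m}_j.$$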
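The neighbor selection and weighting can be sketched as follows. The exact normalization formula is not recoverable from the text; a softmax over $\tau$-scaled similarities is assumed here because it matches the stated behavior (a larger $\tau$ widens the gap between weights). `collaborative_repr` is a hypothetical name, and the memory is a toy matrix.

```python
import numpy as np


def collaborative_repr(s_time, memory, k, tau):
    """Outer representation from a memory of recent session representations.

    s_time: (d,) time-aware current session; memory: (M, d) stored sessions.
    The softmax over tau-scaled cosine similarities is an assumed normalization.
    """
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(s_time)
    sims = memory @ s_time / np.maximum(norms, 1e-12)  # cosine similarities
    top = np.argsort(-sims)[:k]                         # indices of k nearest sessions
    logits = tau * sims[top]
    w = np.exp(logits - logits.max())                   # numerically stable softmax
    w /= w.sum()
    return w @ memory[top], top, w


memory = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
s_out, top, w = collaborative_repr(np.array([1.0, 0.0]), memory, k=2, tau=1.0)
print(top)  # [0 2]
```

Raising `tau` from 1 to 10 in this example pushes almost all of the weight onto the closest session.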

3.10. Session Representation and Prediction

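In this layer, we use a fusion gate to aggregate the inner session representation $\mathbf{s}^{I}$ and the outer session representation $\mathbf{s}^{O}$ into the final session representation $\mathbf{s}$:
$$\mathbf{f} = \sigma\big(W_I\mathbf{s}^{I} + W_O\mathbf{s}^{O}\big),\qquad \mathbf{s} = \mathbf{f}\odot\mathbf{s}^{I} + (1-\mathbf{f})\odot\mathbf{s}^{O},$$
where $W_I$ and $W_O$ are weight matrices. Then, we normalize the session representation $\mathbf{s}$ and the embedding $\mathbf{v}_i$ of each candidate item $v_i$:
$$\tilde{\mathbf{s}} = \frac{\mathbf{s}}{\|\mathbf{s}\|_2},\qquad \tilde{\mathbf{v}}_i = \frac{\mathbf{v}_i}{\|\mathbf{v}_i\|_2},$$
where $\|\cdot\|_2$ is the Euclidean norm. Then, we compute the score $z_i$ of each candidate item by multiplying the normalized session representation with its normalized embedding:
$$z_i = \tilde{\mathbf{s}}^{\top}\tilde{\mathbf{v}}_i.$$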
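The gate and the normalized scoring can be sketched in NumPy. The gate form is assumed to follow the CSRM-style fusion gate [15]; the function name and random inputs are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fuse_and_score(s_inner, s_outer, W_I, W_O, item_emb):
    """Gate the inner/outer views, then score every item by cosine similarity.

    s_inner, s_outer: (d,); W_I, W_O: (d, d); item_emb: (m, d) candidate embeddings.
    """
    f = sigmoid(W_I @ s_inner + W_O @ s_outer)  # fusion gate, entries in (0, 1)
    s = f * s_inner + (1.0 - f) * s_outer       # final session representation
    s_unit = s / np.linalg.norm(s)
    items_unit = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
    return items_unit @ s_unit                  # scores z_i in [-1, 1]


rng = np.random.default_rng(0)
d, m = 4, 6
s_inner, s_outer = rng.normal(size=d), rng.normal(size=d)
W_I, W_O = rng.normal(size=(d, d)), rng.normal(size=(d, d))
item_emb = rng.normal(size=(m, d))
z = fuse_and_score(s_inner, s_outer, W_I, W_O, item_emb)
print(z.shape)  # (6,)
```

Because both sides are unit vectors, every score is a cosine similarity bounded in [-1, 1], which is why a scaling factor is needed before the softmax.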

Finally, we apply a softmax layer to normalize the scores $z_i$:
$$\hat{y}_i = \frac{\exp(\gamma z_i)}{\sum_{j=1}^{m}\exp(\gamma z_j)},$$
where the scaling factor $\gamma$ helps convergence [30], and $\hat{y}_i$ is the probability of item $v_i$ being the next item.
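Why the scaling factor matters can be seen numerically: cosine scores lie in [-1, 1], so an unscaled softmax stays nearly flat. The sketch pairs the scaled softmax with cross-entropy against a one-hot label, which reduces to the negative log-probability of the target. The value 12.0 is arbitrary and is not the setting used in the paper.

```python
import numpy as np


def scaled_softmax(z, gamma):
    """Softmax over gamma-scaled cosine scores; larger gamma sharpens the output."""
    logits = gamma * np.asarray(z, dtype=float)
    e = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return e / e.sum()


def cross_entropy(y_hat, target):
    """Cross-entropy against a one-hot label = negative log-probability of the target."""
    return -np.log(y_hat[target])


z = np.array([1.0, 0.0, 0.0])             # cosine scores are bounded in [-1, 1]
print(scaled_softmax(z, 1.0)[0])           # about 0.58: nearly flat
print(scaled_softmax(z, 12.0)[0] > 0.999)  # True: scaling restores a peaked output
```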

We employ the cross-entropy between the predicted distribution $\hat{\mathbf{y}}$ and the one-hot encoding vector $\mathbf{y}$ of the ground-truth item as the loss function to train SR-GTM. Finally, we apply the back-propagation through time (BPTT) algorithm to train SR-GTM. The complete algorithm of SR-GTM is summarized in Algorithm 1.

Input: the input session and time information
Output: top-$K$ recommendation list.
1: 
2: for l in do
3:  Learn the item representation: by Equations (1) and (2)
4: end for
5: Inner feature extraction:
6:  Calculate position information: by Equation (3)
7:  
8:   by Equation (4)
9:  Learn inner session representation: by Equation (5)
10: Outer feature extraction:
11:  Learn dwell time information: by Equations (6) and (7)
12:  
13:  Learn outer session representation: by Equations (8)–(10)
14: Learn session representation: by Equation (11)
15:  by Equation (12)
16:  by Equation (13)
17: Calculate the rating of items: by Equations (14) and (15)
18: Loss function is given by Equation (16)
19: Recommend the top-$K$ items as the recommendation list

4. Experiments and Analysis

4.1. Datasets

We conduct experiments on two public benchmark datasets, i.e., the Yoochoose dataset and the RetailRocket dataset, to evaluate SR-GTM and the baselines.
(1) Yoochoose is a public dataset released by the RecSys Challenge 2015, which contains user clicks on an e-commerce website within 6 months.
(2) RetailRocket is also a dataset of an e-commerce website, which includes user behaviour data within 4.5 months.

Following [10], we filter out sessions of length 1 and items that appear fewer than 5 times in both datasets. For Yoochoose, we use the sessions of the last day as the test set and the remaining sessions as the training set. For sessions with more than 20 items, we keep only the most recent 20 items. Similar to [10–12], we split each session for data augmentation: for session $s=[v_{s,1}, v_{s,2}, \dots, v_{s,n}]$, we generate the sequences and corresponding labels $([v_{s,1}], v_{s,2}), ([v_{s,1}, v_{s,2}], v_{s,3}), \dots, ([v_{s,1}, \dots, v_{s,n-1}], v_{s,n})$ for training. For RetailRocket, we use the sessions of the subsequent week for testing, and the rest of the process is the same as for Yoochoose. Moreover, we also use the most recent 1/64 and 1/4 fractions of the Yoochoose training sequences, denoted as Yoochoose1/64 and Yoochoose1/4, respectively. The statistics of the three resulting datasets are summarized in Table 1.
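The truncation and prefix-splitting steps can be sketched as a single function; `augment` is a hypothetical name, and truncating before splitting mirrors the order described above.

```python
def augment(session, max_len=20):
    """Split one session into (prefix, next item) training pairs."""
    session = session[-max_len:]  # keep only the most recent max_len items
    return [(session[:i], session[i]) for i in range(1, len(session))]


print(augment(["a", "b", "c", "d"]))
# [(['a'], 'b'), (['a', 'b'], 'c'), (['a', 'b', 'c'], 'd')]
```

A session of length n therefore contributes n - 1 training examples, which is why this step enlarges the training set considerably.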

4.2. Baseline Methods

To verify the effectiveness of SR-GTM, we compare it with the following baselines: item-KNN [4], FPMC [21], GRU4REC [3], NARM [10], STAMP [11], SR-GNN [12], DPAN4Rec [27], SLIST [28], NISER+ [30], CSRM [15], and STAN [16].
(i) Item-KNN takes the number of co-occurrences of two items in different sessions as their similarity and then recommends the items most similar to those in the current session.
(ii) FPMC is a prediction method for sequential data that combines Markov chains and matrix factorization.
(iii) GRU4REC uses a GRU to model the item sequence in a session and trains the model with session-parallel mini-batches.
(iv) NARM introduces an attention mechanism into an RNN to learn the user’s sequential behavior and main purpose.
(v) STAMP combines the general interest learned from the session by an attention mechanism with the current interest learned from the last item to make predictions.
(vi) SR-GNN uses a gated graph neural network to learn item representations and an attention mechanism to compute the session representation.
(vii) DPAN4Rec applies sequential acquisition and collective acquisition to capture sequential and collective dependencies in sessions, respectively.
(viii) SLIST applies two linear models with different perspectives to capture various features of sessions.
(ix) NISER+ uses normalized item and session representations to address the long-tail problem and alleviate popularity bias.
(x) CSRM uses two parallel memory modules to model the user’s preference in the current session and the collaborative information in neighbor sessions.
(xi) STAN considers the item position in the current session, the time interval between the neighbor sessions and the current session, and the position of the collaborative item in the neighbor sessions, which significantly improves its performance.

4.3. Parameter Setup

We set the hidden vector dimension , the scaling factor , the number of layers , and the number of neighbor sessions in the memory module . Following [10, 12], we use 10% of the training data as the validation set and tune the hyperparameters on it. SR-GTM uses the Adam optimizer with an initial learning rate of 0.001, which is decayed by a factor of 0.1 every 3 epochs. The batch size is 100, and the L2 penalty factor is set to $10^{-5}$. In addition, all parameters are initialized randomly from a Gaussian distribution with a mean of 0 and a standard deviation of . We run SR-GTM five times on each of the three datasets and report the average as the final result.
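The step decay schedule can be written as a small function, assuming the factor of 0.1 is applied multiplicatively after every third epoch with epochs counted from 0; `learning_rate` is a hypothetical name.

```python
def learning_rate(epoch, base_lr=1e-3, decay=0.1, step=3):
    """Step decay: multiply the learning rate by `decay` every `step` epochs."""
    return base_lr * decay ** (epoch // step)


print([learning_rate(e) for e in range(7)])
# epochs 0-2 use about 1e-3, epochs 3-5 about 1e-4, epoch 6 about 1e-5
```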

4.4. Evaluation Metrics

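Following [12], we adopt P@K and MRR@K as the evaluation metrics, where $K$ is the size of the recommendation list:
$$P@K = \frac{n_{hit}}{N},\qquad MRR@K = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{\mathrm{Rank}(v_{target}^{(i)})},$$
where $N$ is the number of sessions in the test set, $n_{hit}$ is the number of sessions whose target item is among the top-$K$ recommended items, $v_{target}^{(i)}$ is the target item in the $i$th session, and $\mathrm{Rank}(v_{target}^{(i)})$ denotes its rank in the recommendation list (the reciprocal rank is set to 0 when the target is not in the top $K$).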
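Both metrics can be computed in one pass over the test sessions. The sketch assumes each session comes with a ranked candidate list and a single target item; the function name and toy data are hypothetical.

```python
def precision_mrr_at_k(ranked_lists, targets, k=20):
    """P@K (hit rate) and MRR@K; the reciprocal rank is 0 when the target misses the top K."""
    hits, rr_sum = 0, 0.0
    for ranked, target in zip(ranked_lists, targets):
        top_k = list(ranked[:k])
        if target in top_k:
            hits += 1
            rr_sum += 1.0 / (top_k.index(target) + 1)  # ranks start at 1
    n = len(targets)
    return hits / n, rr_sum / n


ranked = [[3, 1, 2], [2, 3, 1], [1, 2, 3]]
print(precision_mrr_at_k(ranked, targets=[1, 1, 1], k=2))  # (0.6666666666666666, 0.5)
```

P@K only asks whether the target appears in the list, while MRR@K also rewards placing it near the top, which is why the two metrics can move by different amounts.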

4.5. Performance Comparison

We compare SR-GTM with the baseline models; Table 2 shows the experimental results on the three datasets. The best result in each column is highlighted in bold, and the second-best result is underlined. As Table 2 shows, SR-GTM outperforms the other models on both P@20 and MRR@20, which verifies the effectiveness of SR-GTM and shows that collaborative information effectively improves performance.

As shown in Table 2, conventional models have the worst overall performance. For example, FPMC only uses the Markov property of the last item in a session to recommend the next item, without considering interactions with the other items, so the session is not treated as a whole and good performance is difficult to obtain. Compared with conventional models, deep learning-based models significantly improve both metrics on all three datasets, which demonstrates the effectiveness of deep learning in recommendation. Specifically, NARM, which combines an attention mechanism with an RNN, significantly outperforms GRU4REC, which uses only an RNN. This can be attributed to the attention mechanism, which highlights important information and filters out noise. GNN-based methods, such as SR-GNN, form the best category among the baselines, because graph neural networks can learn the complex item interactions in a session and extract high-order features.

In addition, models that consider collaborative information, such as CSRM and STAN, improve both P@20 and MRR@20 over RNN-based and attention-based models such as NARM, STAMP, and DPAN4Rec, verifying that the collaborative information of neighbor sessions helps improve recommendation performance. Since CSRM and STAN are both RNN-based models, their performance is not as good as that of SR-GNN, NISER+, and other GNN-based models. Notably, SLIST adopts linear item-item models to fully utilize session features and achieves performance comparable to existing RNN-based models, which illustrates the importance of fully utilizing features to improve model performance.

Among all the models, SR-GTM achieves the best performance on the three datasets, mainly owing to its two feature extraction modules, IFEM and OFEM. Compared with GNN-based models, SR-GTM additionally exploits collaborative information; compared with models that consider collaborative information (such as CSRM), SR-GTM uses a GNN to learn more complex item interactions and also considers dwell time when extracting neighbor sessions, which further improves performance. Specifically, SR-GTM improves P@20 over the best baseline, NISER+, by 0.77%, 0.38%, and 3.63% on the three datasets and improves MRR@20 by 2.91%, 2.52%, and 2.49%, respectively. Two conclusions can be drawn from these results. First, on the two Yoochoose datasets, SR-GTM improves MRR@20 more than P@20, which means that on Yoochoose it is better at placing the target item higher in the recommendation list. Second, the improvement of SR-GTM in P@20 on RetailRocket is significantly greater than on the other two datasets. We attribute this to the shorter average session length of RetailRocket, which may introduce less noise through the collaborative information and make it easier for the model to learn accurate user interests.

4.6. Influence of OFEM

To further study the influence of OFEM on SR-GTM, we evaluate the model under different settings. Specifically, SR-GTM-NT denotes the variant from which the user’s dwell time is removed, and SR-GTM-IFEM denotes the variant from which OFEM is removed. The results of SR-GTM and its two variants on Yoochoose1/64 and RetailRocket are shown in Table 3.

The results in Table 3 support the following conclusions. First, OFEM is an indispensable component of SR-GTM: SR-GTM achieves significant performance improvements over SR-GTM-IFEM on all metrics and datasets, which shows that dwell time-aware collaborative information captures user interests more accurately and yields more accurate recommendations. Second, comparing SR-GTM-NT with SR-GTM shows that considering dwell time does not guarantee performance improvements in all cases. On RetailRocket, using dwell time improves all four metrics to varying degrees. On Yoochoose1/64, it improves only the MRR metrics, although for P@10 and P@20 the performance gap between the two models is very small. Thus, although introducing dwell time cannot guarantee better performance in every scenario, its benefits are still considerable.
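To make the SR-GTM-NT ablation concrete, the following is a minimal sketch of one way dwell time could be derived and used to weight items in a session representation. It makes several assumptions not stated in this section: dwell time is approximated as the gap between consecutive click timestamps, the last click (which has no successor) receives the median of the other gaps, and the weighting is a softmax over log(1 + dwell time). The function names are hypothetical, and this is not SR-GTM's published OFEM formula. Setting `use_dwell=False` falls back to uniform weights, in the spirit of SR-GTM-NT.

```python
import numpy as np


def dwell_times(timestamps) -> np.ndarray:
    """Approximate per-click dwell time (seconds) as the gap to the next click.

    Assumption: the last click has no successor, so it gets the median of the
    other gaps (or 1.0 for a single-click session). Timestamps must be sorted.
    """
    ts = np.asarray(timestamps, dtype=float)
    gaps = np.clip(np.diff(ts), 0.0, None)
    last = float(np.median(gaps)) if gaps.size else 1.0
    return np.append(gaps, last)


def session_vector(item_emb, item_ids, timestamps, use_dwell=True) -> np.ndarray:
    """Weighted average of the session's item embeddings.

    use_dwell=True: softmax over log(1 + dwell time), so longer views weigh more
    (a hypothetical weighting, not SR-GTM's published formula).
    use_dwell=False: uniform weights, in the spirit of the SR-GTM-NT ablation.
    """
    vecs = item_emb[np.asarray(item_ids)]
    if use_dwell:
        scores = np.log1p(dwell_times(timestamps))
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
    else:
        weights = np.full(len(item_ids), 1.0 / len(item_ids))
    return weights @ vecs


emb = np.eye(3)  # three toy items with one-hot embeddings
print(dwell_times([0, 10, 11]))
print(session_vector(emb, [0, 1, 2], [0, 10, 11], use_dwell=True))
print(session_vector(emb, [0, 1, 2], [0, 10, 11], use_dwell=False))
```

With one-hot embeddings the output vector equals the weights themselves, so it is easy to see that the item viewed for 10 seconds dominates when dwell time is used.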

4.7. Influence of Position Information

The click order of items in a session partly reflects the user's interest, as it implies how that interest shifts over time. SR-GTM uses reverse position embedding to learn the positional influence of items. We design two variants for comparison: SR-GTM-NP, which does not use position embedding, and SR-GTM-FP, which uses forward position embedding. The comparative results of the three models are summarized in Table 4.

Table 4 shows that SR-GTM performs best on all metrics and all datasets. SR-GTM-NP performs the worst of the three models because it does not explicitly consider the positions of items in a session; the other two models outperform it, which verifies the effectiveness of position information. SR-GTM also outperforms SR-GTM-FP. Forward position embedding counts positions from the start of the session, so the same forward index can correspond to different distances from the predicted item in sessions of different lengths. It therefore cannot capture the relative position between an item in the session and the predicted item, whereas reverse position embedding, which counts from the end of the session, reflects this relationship directly.
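The difference between forward and reverse positions can be illustrated with a short sketch. It assumes 0-based indices and a learned position table indexed by position; some models start reverse positions at 1 instead, and the function names here are hypothetical rather than SR-GTM's own. Skipping the addition of position vectors corresponds to the SR-GTM-NP setting.

```python
import numpy as np


def position_indices(session_len: int, reverse: bool = True) -> np.ndarray:
    """Position index of each click in a session.

    reverse=True : 0 for the last click (the one just before the predicted item).
    reverse=False: 0 for the first click (forward positions, as in SR-GTM-FP).
    """
    forward = np.arange(session_len)
    return forward[::-1].copy() if reverse else forward


def add_position_embedding(item_vecs: np.ndarray, pos_table: np.ndarray,
                           reverse: bool = True) -> np.ndarray:
    """Add a learned position vector (a row of pos_table) to each item vector."""
    idx = position_indices(len(item_vecs), reverse)
    return item_vecs + pos_table[idx]


for n in (3, 5):
    print(n, "forward:", position_indices(n, reverse=False),
          "reverse:", position_indices(n, reverse=True))
```

In the printed indices, the most recent click always has reverse index 0, whereas its forward index depends on the session length (2 for a 3-click session, 4 for a 5-click session). This is the relative-position argument made above.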

4.8. Influence of Different Aggregation Operations

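SR-GTM uses a gating mechanism to aggregate the outputs of IFEM and OFEM. To verify the effectiveness of this gating-based aggregation, we compare it with two alternative aggregation methods. Let $\mathbf{h}^{I} \in \mathbb{R}^{d}$ and $\mathbf{h}^{O} \in \mathbb{R}^{d}$ denote the outputs of IFEM and OFEM, respectively, and let $\mathbf{s}$ denote the session representation.
(i) SR-GTM-MAX uses max pooling. It aggregates $\mathbf{h}^{I}$ and $\mathbf{h}^{O}$ by taking the maximum value in every dimension, so the $i$th dimension of the session representation vector is
$$s_i = \max\left(h^{I}_i,\ h^{O}_i\right).$$
(ii) SR-GTM-CAT uses concatenation. $\mathbf{h}^{I}$ and $\mathbf{h}^{O}$ are concatenated and mapped to the session representation $\mathbf{s}$:
$$\mathbf{s} = \mathbf{W}_c\left[\mathbf{h}^{I};\ \mathbf{h}^{O}\right],$$
where $[\cdot\,;\cdot]$ denotes concatenation and $\mathbf{W}_c \in \mathbb{R}^{d \times 2d}$ is a learnable projection matrix.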

The results of these aggregation methods and of SR-GTM are summarized in Table 5.

Table 5 shows that SR-GTM outperforms the other two aggregation methods, which verifies the effectiveness of the gating mechanism. SR-GTM-MAX performs the worst of the three, as max pooling simply selects the larger value in each dimension of the IFEM and OFEM outputs and neglects their interactions. SR-GTM-CAT concatenates the IFEM and OFEM outputs and learns their interaction in each dimension through learnable parameters. It performs better than max pooling but still falls short of the gating mechanism. This may be because concatenation learns finer-grained interactions between IFEM and OFEM, which might lead to over-fitting, whereas the gating mechanism treats each of IFEM and OFEM as an individual component and learns their interaction in a more coarse-grained way.
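The three aggregation strategies compared in Table 5 can be sketched side by side in NumPy. The gate used here, $\mathbf{g} = \sigma(\mathbf{W}_{in}\mathbf{h}^{I} + \mathbf{W}_{out}\mathbf{h}^{O} + \mathbf{b})$ with output $\mathbf{g} \odot \mathbf{h}^{I} + (1 - \mathbf{g}) \odot \mathbf{h}^{O}$, is a common gating formulation assumed for illustration. SR-GTM's own gate, defined earlier in the paper, may differ, and all function and parameter names below are hypothetical.

```python
import numpy as np


def max_pool(h_in: np.ndarray, h_out: np.ndarray) -> np.ndarray:
    """SR-GTM-MAX style: element-wise maximum of the two module outputs."""
    return np.maximum(h_in, h_out)


def concat_project(h_in: np.ndarray, h_out: np.ndarray, W: np.ndarray) -> np.ndarray:
    """SR-GTM-CAT style: concatenate to 2d dims, then project back with W (d x 2d)."""
    return W @ np.concatenate([h_in, h_out])


def gated(h_in, h_out, W_in, W_out, b) -> np.ndarray:
    """Assumed gate: g = sigmoid(W_in h_in + W_out h_out + b), output g*h_in + (1-g)*h_out."""
    g = 1.0 / (1.0 + np.exp(-(W_in @ h_in + W_out @ h_out + b)))
    return g * h_in + (1.0 - g) * h_out


d = 4
rng = np.random.default_rng(0)
h_in, h_out = rng.normal(size=d), rng.normal(size=d)  # toy IFEM / OFEM outputs
print(max_pool(h_in, h_out))
print(concat_project(h_in, h_out, rng.normal(size=(d, 2 * d))))
print(gated(h_in, h_out, rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d)))
```

Under this assumed gate, each output dimension is a convex combination of the corresponding IFEM and OFEM values, so it always lies between them. The concatenation projection has no such constraint, which is one way to read the "coarse-grained versus fine-grained" contrast above.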

4.9. Influence of Different Numbers of Neighbors

To further explore the influence of collaborative information, we evaluate how the number of selected neighbor sessions affects performance. The results for different numbers of neighbors are shown in Table 6.

Table 6 shows that on Yoochoose1/64, performance generally improves as the number of neighbors increases, although the trend is not strictly monotonic: P@20 at one smaller neighbor count is higher than at a larger one. On RetailRocket, SR-GTM does not perform best with the largest number of neighbors. This may be due to the short average session length of RetailRocket, where too many neighbor sessions introduce more noisy data. The experiment also shows that the number of neighbor sessions has a greater impact on RetailRocket than on Yoochoose1/64, i.e., performance on RetailRocket is more sensitive to this setting.
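The role of the neighbor count can be illustrated with a CSRM-style retrieval step. This minimal sketch assumes that stored sessions are fixed-length vectors, that neighbors are chosen by cosine similarity, and that the selected neighbors are combined with softmax weights over their similarities. None of these choices is confirmed here as SR-GTM's exact OFEM, and the function names are hypothetical.

```python
import numpy as np


def top_k_neighbors(query: np.ndarray, memory: np.ndarray, k: int):
    """Indices and cosine similarities of the k stored sessions closest to query."""
    q = query / (np.linalg.norm(query) + 1e-12)
    m = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-12)
    sims = m @ q
    k = min(k, len(sims))
    idx = np.argsort(-sims, kind="stable")[:k]  # most similar first
    return idx, sims[idx]


def collaborative_vector(query: np.ndarray, memory: np.ndarray, k: int) -> np.ndarray:
    """Softmax(similarity)-weighted average of the k nearest stored sessions."""
    idx, sims = top_k_neighbors(query, memory, k)
    w = np.exp(sims - sims.max())
    w /= w.sum()
    return w @ memory[idx]


memory = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # three stored sessions
query = np.array([1.0, 0.1])
for k in (1, 2, 3):
    print(k, top_k_neighbors(query, memory, k)[0], collaborative_vector(query, memory, k))
```

Each increase in `k` admits a less similar session into the weighted average. This is consistent with the explanation that too many neighbors can add noise, especially when short sessions make session vectors less distinctive.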

5. Conclusion

In this work, we propose a session-based recommendation model with GNN and time-aware memory networks (SR-GTM). SR-GTM uses IFEM and OFEM to extract the inner session representation and the collaborative session representation, respectively, and uses a gating aggregation mechanism to learn the final session representation. The results on the public Yoochoose and RetailRocket datasets show that SR-GTM effectively improves recommendation performance. In addition, we designed comparative experiments to study the influence of OFEM, position embedding, the aggregation mechanism, and the number of neighbor sessions, which verify the effectiveness of SR-GTM.

In future work, we first hope to further explore the relationship between users' dwell time and their interests. Second, training GNN-based models takes a long time and consumes substantial resources, so we hope to reduce the complexity of the model while maintaining its performance.

Data Availability

The Yoochoose and RetailRocket datasets used to support the findings of this study are openly available at https://www.kaggle.com/chadgostopp/recsys-challenge-2015 and https://www.kaggle.com/retailrocket/ecommerce-dataset, respectively.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Humanities and Social Science Research Project of the Ministry of Education under Grant 17YJCZH187, the Taishan Scholar Climbing Program of Shandong Province under Grant No. ts2090936, SDUST Research Fund under Grant No. 2015TDJH102, 2021 National Statistical Science Research Project under Grant 2021LY053, Shandong Postgraduate Education Quality Improvement Plan (No. SDYJG19075), and Shandong Education Teaching Research Key Project (No. 2021JXZ010).