License: CC BY 4.0
arXiv:2606.21540v1 [cs.LG] 19 Jun 2026

Memory Is No Longer a Bottleneck: Memory-Efficient Graph Filtering for Scalable Collaborative Filtering

Jin-Duk Park and Won-Yong Shin, Senior Member, IEEE J.-D. Park and W.-Y. Shin are with the School of Mathematics and Computing (Computational Science and Engineering), Yonsei University, Seoul 03722, Republic of Korea.
E-mail: {jindeok6, wy.shin}@yonsei.ac.kr (Corresponding author: Won-Yong Shin.)
Abstract

Graph convolutional networks (GCNs) have demonstrated significant success in capturing complex user–item relationships for collaborative filtering (CF). However, due to their reliance on extensive model training, training-free graph filtering (GF)-based CF methods have emerged as a promising alternative, offering computational efficiency by smoothing graph signals via matrix operations. In particular, polynomial GF-based approaches demonstrate improved accuracy through their ability to design more expressive and flexible filtering functions. Despite these advantages, existing GF methods suffer from a critical memory bottleneck: they necessitate storing the full item similarity graph, incurring prohibitive memory costs for large-scale datasets, which limits their practical applicability. To tackle this challenge, we propose Mem-GF (Memory-efficient GF), a new GF-based CF method that departs from conventional designs by principally leveraging the structure of Krylov subspaces as a core mechanism for approximating polynomial graph filters without explicitly storing the item similarity graph. We theoretically analyze the minimum Krylov subspace size that guarantees lossless approximation. Through extensive experiments, we demonstrate that Mem-GF achieves up to 5.74×\times lower memory usage and 4.38×\times speedup in runtime, while consistently exceeding the recommendation accuracy of state-of-the-art GF and GCN-based methods. Mem-GF robustly scales to datasets with tens of millions of interactions, establishing itself as a practically viable and theoretically grounded solution for efficient CF. The source code of Mem-GF is available at https://github.com/jindeok/Mem-GF.

I Introduction

I-A Background and Motivation

Recommender systems have become a cornerstone component of modern online platforms [25, 3, 6, 7]. By leveraging the information of historical user–item interactions, collaborative filtering (CF) has proven one of the most effective recommendation techniques in uncovering latent user interests and item affinities, facilitating personalization and relevance for positive user experiences [11, 23, 5, 10].

On one hand, graph convolutional networks (GCNs) [18, 10, 52] have emerged as a state-of-the-art CF solution. Thanks to their intrinsic ability to model the high-order connectivity between users and items, GCN-based approaches have achieved high accuracy in various recommendation scenarios [10, 52, 30, 56, 15]. However, GCNs inherently suffer from scalability issues due to their recursive message-passing nature, which may limit their applicability to large-scale systems [1, 60]. As a result, the training time required for GCNs is often significantly longer compared to other architectures such as multi-layer perceptrons (MLPs) that are widely used for CF [60, 64]. This increased computational demand can hinder the efficiency of real-time recommendation systems, making it crucial to explore alternative methods that balance accuracy and scalability.

On the other hand, from the spectral viewpoint, GCNs can be regarded as approximate and learnable graph filters via parameter optimization [18, 43, 27]. To mitigate the extensive training time associated with GCNs, recent research on CF has increasingly focused on graph filtering (GF)-based CF methods that leverage predefined graph filters instead of learnable ones [43, 31, 59, 58, 27]. Due to their training-free nature, GF has been successfully applied beyond standard training-based CF to diverse recommendation scenarios, including sequential [58], group [15], cross-domain [20], multi-modal [40], and multi-criteria [32] recommendations, demonstrating its broad applicability and effectiveness. By using the item similarity graph, these GF-based CF methods transform each user’s interaction vector (i.e., graph signals) via various forms of graph filters, such as linear/ideal low-pass filters (LPFs) or polynomial approximations, to improve computational efficiency of the GF process [43, 58, 27, 31, 59, 36].

Refer to caption
Figure 1: Memory usage (RAM/VRAM) comparison during preprocessing among six representative GF-based CF methods and our method (Mem-GF) on the Amazon-book dataset. Here, ‘OOM’ represents an out-of-memory error.
Refer to caption
Figure 2: Comparison of space consumption between conventional GF-based CF methods and the proposed Mem-GF. Here, |||\mathcal{I}| represents the number of items in a dataset.

Despite these merits, many GF-based CF methods [43, 58, 27, 31, 59] still face a critical memory bottleneck when high-accuracy filters are deployed at scale. Fig. 1 showcases the RAM and VRAM (i.e., GPU memory) usage of six representative GF-based approaches (such as EASE [46], GF-CF [43], PGSP [27], Turbo-CF [31], FPSR [54], and HiGSP [59]) on the Amazon-book dataset [8]. In a single-node environment equipped with 48 GB of VRAM and 128 GB of RAM, most of these methods suffer from out-of-memory (OOM) errors, underscoring their limited feasibility for large-scale recommendation scenarios. The core design issue is not merely implementation, but the reliance on a universal (i.e., user-agnostic) graph filter whose dimension is ||×|||\mathcal{I}|\times|\mathcal{I}|, where |||\mathcal{I}| is the item cardinality, i.e., the number of items. Even when sparse matrix operations, top-kk eigenvalue components, or polynomial graph filters reduce computation, the filtering stage still tends to depend on the full-size item similarity matrix. As illustrated in Fig. 2, applying such filters uniformly to all user signals severely limits scalability and can make real-time inference infeasible under memory-constrained environments.

I-B Main Contribution

To address this challenge, we introduce Mem-GF, a memory-efficient GF method for CF that goes beyond straightforward Krylov subspace usage. Instead of treating Krylov methods [42, 47] as a computational shortcut, Mem-GF uses them as the filtering principle itself. For each user, a Krylov subspace captures progressively transformed directions of the user’s interaction vector under repeated applications of the item similarity matrix. This enables polynomial graph filtering without explicitly storing the full item similarity matrix. Importantly, Mem-GF facilitates the design of versatile graph filters through high-order polynomial approximations, exemplified by the fifth-order polynomial that closely approximates Gaussian filters widely used in image and graph domains [13, 21]. Notably, Mem-GF offers significant advantages in both preprocessing and inference efficiency. In particular, its memory-efficient and low-latency inference capability addresses a critical requirement for deploying GF-based recommenders in real-world environments, where tight hardware constraints and real-time responsiveness are essential [14, 22].

Our contributions are summarized as follows:

  • Problem identification: We clarify the universal (i.e., user-agnostic) graph filter design that causes the memory bottleneck in many GF-based CF methods and empirically demonstrate its impracticality on large-scale datasets.

  • Principled Krylov-based solution: We propose a novel GF approach that integrates Krylov subspaces at its core, achieving per-user filter approximation without storing the full item similarity matrix, thereby enhancing filter design flexibility.

  • Theoretical support: We rigorously analyze the condition on the number of Krylov subspace bases for which no approximation error occurs. We also theoretically show that Mem-GF achieves linear space/time complexity to the number of items, users, and interactions.

  • Comprehensive evaluation: We empirically show that Mem-GF delivers 5.74×\times memory savings and 4.38×\times speedup. Despite being primarily designed for improving efficiency, it still achieves state-of-the-art recommendation accuracy across all benchmark datasets.

I-C Organization and Notations

The remainder of this paper is organized as follows. Section II surveys related work. Section III introduces the preliminaries by reviewing GF and Krylov subspaces. Section IV presents the proposed Mem-GF method, including its methodological details, theoretical analysis, and complexity analysis. Section V reports extensive experimental results and analyses evaluating efficiency, scalability, and accuracy. Section VI concludes the paper with a discussion of future research directions.

Table I summarizes the notation that is used in this paper. This notation will be formally defined in the following sections when we introduce our methodology.

TABLE I: Summary of notations.
Notation Description
𝒰,\mathcal{U},\ \mathcal{I} Set of users, 𝒰\mathcal{U}, and set of items, \mathcal{I}
𝐑\mathbf{R} User–item interaction matrix
𝐫u\mathbf{r}_{u} uu-th row of the interaction matrix 𝐑\mathbf{R} (graph signal for user uu)
𝐑~\tilde{\mathbf{R}} Degree-normalized interaction matrix: 𝐑~=𝐃𝒰γ𝐑𝐃1γ\tilde{\mathbf{R}}=\mathbf{D}_{\mathcal{U}}^{\gamma}\mathbf{R}\mathbf{D}_{\mathcal{I}}^{1-\gamma}
𝐑¯\bar{\mathbf{R}} Adjusted interaction matrix: 𝐑¯=𝐑~s\bar{\mathbf{R}}=\tilde{\mathbf{R}}^{\circ s}
𝐃𝒰,𝐃\mathbf{D}_{\mathcal{U}},\ \mathbf{D}_{\mathcal{I}} Degree matrices: 𝐃𝒰=diag(𝐑𝟏),𝐃=diag(𝟏𝐑)\mathbf{D}_{\mathcal{U}}=\mathrm{diag}(\mathbf{R}\mathbf{1}),\ \mathbf{D}_{\mathcal{I}}=\mathrm{diag}(\mathbf{1}^{\top}\mathbf{R})
𝐏~\tilde{\mathbf{P}} Item similarity matrix: 𝐏~=𝐑¯𝐑¯\tilde{\mathbf{P}}=\bar{\mathbf{R}}^{\top}\bar{\mathbf{R}} for Mem-GF and 𝐏~=𝐑~𝐑~\tilde{\mathbf{P}}=\tilde{\mathbf{R}}^{\top}\tilde{\mathbf{R}} for the other methods
𝐋,𝐔,𝚲\mathbf{L},\ \mathbf{U},\ \boldsymbol{\Lambda} Graph Laplacian 𝐋\mathbf{L}, its eigenvectors 𝐔\mathbf{U}, and eigenvalues 𝚲\boldsymbol{\Lambda}
H(),h()H(\cdot),\ h(\cdot) Graph filter H()H(\cdot) and its frequency response h()h(\cdot)
𝐬u\mathbf{s}_{u} Predicted scores for user uu: 𝐬u=𝐫u2𝐐uf(𝐓u)𝐞1\mathbf{s}_{u}=\|\mathbf{r}_{u}\|_{2}\,\mathbf{Q}_{u}\,f(\mathbf{T}_{u})\,\mathbf{e}_{1}
𝒦K()\mathcal{K}_{K}(\cdot) KK-dimensional Krylov subspace
𝐐u,𝐓u\mathbf{Q}_{u},\ \mathbf{T}_{u} User-specific Krylov basis 𝐐u\mathbf{Q}_{u} and projected tridiagonal matrix 𝐓u=𝐐u𝐏~𝐐u\mathbf{T}_{u}=\mathbf{Q}_{u}^{\top}\tilde{\mathbf{P}}\mathbf{Q}_{u} in the Krylov subspace
KK Size of Krylov basis
γ\gamma Hyperparameter for normalizing 𝐑\mathbf{R}
ss Hyperparameter for adjusting 𝐑~\tilde{\mathbf{R}} via the Hadamard power
𝐞1\mathbf{e}_{1} First canonical basis vector in K\mathbb{R}^{K}

II Related Work

In this section, we review CF methods related to Mem-GF, with emphasis on scalable GF-based CF and Krylov-based graph learning.

MF-based CF methods. MF remains a foundational technique in CF, representing users and items as latent vectors and using their dot product to approximate interactions. Early approaches relied on low-rank decompositions [9], while NeuMF [11] enhanced expressiveness by introducing an MLP-based similarity function. GRMF [38] integrated graph Laplacian regularization, and DMF [61] leveraged deep neural architectures. HOP-rec [62] bridged MF and GCN by explicitly capturing higher-order relationships in user–item graphs.

GCN-based CF methods. Viewing user–item interactions as a bipartite graph enables GCN-based models to exploit high-order connectivity. GC-MC [48] merged GCNs with matrix completion for link prediction. PinSage [63] scaled graph convolution with random walks. SpectralCF [65] used spectral graph structure, NGCF [52] modeled higher-order collaborative signals, LightGCN [10] simplified propagation layers, and SGL [55] adopted contrastive learning for robust representation learning.

Autoencoder-based CF methods. Autoencoder-based approaches reconstruct each user’s interaction vector in a single latent space. CDAE [57] incorporated denoising, Multi-DAE and Multi-VAE [23] improved reconstruction with deep and variational encoders, RecVAE [44] added regularization, and MD-CVAE [66] injected item embeddings into latent user modeling.

GF-based CF methods. Training-free GF-based CF methods avoid iterative model training by filtering user signals on an item similarity graph. GF-CF [43] uses linear and ideal LPFs; EASE [46] can be interpreted from a graph signal processing viewpoint; PGSP [27] augments the similarity graph; HiGSP [59] customizes filters by user clusters; Turbo-CF [31] accelerates inference with predefined polynomial LPFs; SGFCF [34] uses top-kk singular values; and ChebyCF [16] employs Chebyshev polynomial filtering. These methods improve scalability through sparse matrix operations, top-kk eigenvalue components, polynomial graph filters, or user clustering, but they still commonly rely on a large item similarity graph or a full-size graph filter. In contrast, Mem-GF performs GF in a user-specific Krylov subspace and avoids constructing the full item similarity matrix.

Krylov-based graph learning. LanczosNet [24] also uses Krylov subspaces, but for learnable spectral graph convolution in GNNs. Its goal is representation learning over graph nodes, requiring training and inference over the graph. Mem-GF differs in its purpose and inference setting: it is a training-free CF method that builds per-user Krylov subspaces for the item similarity matrix to reduce memory usage during recommendation. Thus, Krylov subspaces are not used for training-based GNNs, but as the core mechanism for memory-efficient graph filtering in CF.

III Preliminaries

This section presents the preliminaries, providing an overview of the fundamental concepts underlying GF and Krylov subspaces.

III-A Characterization of GF

We begin by presenting the basic concepts of GF (also known as graph signal processing). First, let us consider a weighted, undirected graph G=(V,E)G=(V,E), represented by an adjacency matrix 𝐀|V|×|V|\mathbf{A}\in\mathbb{R}^{|V|\times|V|}. A graph signal assigns a scalar or feature vector to each node; in this paper, we primarily consider a scalar signal represented as 𝐱=[x1,x2,,x|V|]T|V|\mathbf{x}=[x_{1},x_{2},\ldots,x_{|V|}]^{T}\in\mathbb{R}^{|V|}, where xix_{i} indicates the signal strength of node ii. The smoothness of 𝐱\mathbf{x} on GG is quantified by the graph quadratic form, a measure based on the graph Laplacian 𝐋=𝐃𝐀\mathbf{L}=\mathbf{D}-\mathbf{A},111Alternatively, one may use the normalized Laplacian 𝐋=𝐈𝐀~\mathbf{L}=\mathbf{I}-\tilde{\mathbf{A}}, where 𝐀~=𝐃1/2𝐀𝐃1/2\tilde{\mathbf{A}}=\mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}. where 𝐃=diag(𝐀𝟏)\mathbf{D}=\text{diag}(\mathbf{A}\mathbf{1}) is the degree matrix and 𝟏\mathbf{1} denotes the all-ones vector of any dimension for simplicity. The smoothness measure S(𝐱)S(\mathbf{x}) is formally given by

S(𝐱)=i,jAi,j(xixj)2=𝐱T𝐋𝐱.S(\mathbf{x})=\sum_{i,j}A_{i,j}(x_{i}-x_{j})^{2}=\mathbf{x}^{T}\mathbf{L}\mathbf{x}. (1)

The smaller the value of S(𝐱)S(\mathbf{x}), the smoother the signal 𝐱\mathbf{x} on the graph [45, 43]. Next, we define the graph Fourier transform (GFT) of a signal 𝐱\mathbf{x} as 𝐱^=𝐔T𝐱\hat{\mathbf{x}}\;=\;\mathbf{U}^{T}\,\mathbf{x}, where 𝐔|V|×|V|\mathbf{U}\in\mathbb{R}^{|V|\times|V|} comprises the eigenvectors of 𝐋\mathbf{L}. Specifically, from the eigen-decomposition 𝐋=𝐔𝚲𝐔T,\mathbf{L}\;=\;\mathbf{U}\,\boldsymbol{\Lambda}\,\mathbf{U}^{T}, let 𝚲=diag(λ1,,λ|V|)\boldsymbol{\Lambda}=\mathrm{diag}(\lambda_{1},\dots,\lambda_{|V|}) with ordered eigenvalues λ1λ|V|\lambda_{1}\leq\dots\leq\lambda_{|V|}. Because the GFT is an orthogonal linear transform, the inverse GFT is 𝐱=𝐔𝐱^.\mathbf{x}\;=\;\mathbf{U}\,\hat{\mathbf{x}}. Finally, we formalize the notions of a graph filter and graph convolution:

Definition III.1 (Graph filter [45, 43, 58, 29]).

Given a graph Laplacian 𝐋\mathbf{L}, a graph filter H(𝐋)|V|×|V|H(\mathbf{L})\in\mathbb{R}^{|V|\times|V|} is defined as

H(𝐋)=𝐔diag(h(λ1),,h(λ|V|))𝐔T,H(\mathbf{L})\;=\;\mathbf{U}\,\mathrm{diag}\bigl(h(\lambda_{1}),\dots,h(\lambda_{|V|})\bigr)\,\mathbf{U}^{T}, (2)

where h:+h:\mathbb{R}_{+}\to\mathbb{R} is the frequency response function mapping each non-negative eigenvalue {λ1,,λ|V|}\{\lambda_{1},\dots,\lambda_{|V|}\} of 𝐋\mathbf{L} to {h(λ1),,h(λ|V|)}.\{h(\lambda_{1}),\dots,h(\lambda_{|V|})\}.

Definition III.2 (Graph convolution [45, 43, 58]).

For a graph signal 𝐱\mathbf{x} and a graph filter H(𝐋)H(\mathbf{L}), it follows that

H(𝐋)𝐱=𝐔diag(h(λ1),,h(λ|V|))𝐔T𝐱,H(\mathbf{L})\,\mathbf{x}\;=\;\mathbf{U}\,\mathrm{diag}\bigl(h(\lambda_{1}),\dots,h(\lambda_{|V|})\bigr)\,\mathbf{U}^{T}\,\mathbf{x}, (3)

which first applies the GFT to 𝐱\mathbf{x}, transforms the resulting spectrum by h()h(\cdot), and finally performs the inverse GFT.

In signal processing, signals are typically characterized by their smoothness and low-frequency components, whereas noise is usually non-smooth and dominates at high frequencies [43]. In this context, a significant category of filters is LPFs, which enhance the smoothness of graph signals, thereby aiding noise reduction. We formally define the LPF as follows:

Definition III.3.

(LPF) [43, 37, 50]: For k=1,,|V|k=1,\cdots,|V| and λ1λ|V|\lambda_{1}\leq\cdots\leq\lambda_{|V|}, the graph filter H(𝐋)H(\mathbf{L}) is kk-low-pass if and only if ηk[0,1]\eta_{k}\in[0,1], where

ηk:=max{|h(λk+1)|,,|h(λ|V|)|}min{|h(λ1)|,,|h(λk)|}.\eta_{k}:=\frac{max\{|h(\lambda_{k+1})|,\cdots,|h(\lambda_{|V|})|\}}{min\{|h(\lambda_{1})|,\cdots,|h(\lambda_{k})|\}}. (4)

III-B Krylov Subspaces

Krylov subspaces compactly capture the repeated action of a large matrix on a vector. Thus, for a user signal, they approximate the spectral properties of the item similarity matrix using the directions generated from that user, instead of requiring a full matrix decomposition. They are mathematically formalized as follows.

Definition III.4.

(Krylov subspace [42, 47]) Given a matrix 𝐌d×d\mathbf{M}\in\mathbb{R}^{d\times d} and a vector 𝐫d\mathbf{r}\in\mathbb{R}^{d}, the KK-th order Krylov subspace generated by (𝐌,𝐫)(\mathbf{M},\mathbf{r}) is defined as

𝒦K(𝐌,𝐫)=span{𝐫,𝐌𝐫,𝐌2𝐫,,𝐌K1𝐫},\displaystyle\mathcal{K}_{K}(\mathbf{M},\mathbf{r})=\text{span}\{\mathbf{r},\mathbf{M}\mathbf{r},\mathbf{M}^{2}\mathbf{r},\ldots,\mathbf{M}^{K-1}\mathbf{r}\}, (5)

where 𝐌\mathbf{M} is the matrix of interest, dd is the dimensionality of the ambient space, and 𝐫\mathbf{r} is the initial vector from which the subspace is generated. Here, KK indicates the order of the subspace and span{}\text{span}\{\} denotes the set of all linear combinations of the given vectors.

IV Methodology

This section begins by outlining the practical challenges associated with GF-based CF methods. We then provide a comprehensive description of the proposed Mem-GF method, along with its theoretical analyses.

IV-A Challenges in GF-Based CF Methods

Conventional training-free GF-based CF methods [43, 58, 27, 31, 59] in canonical recommendation tasks begin with constructing an item similarity graph, where each item is represented as a node and the similarities between items are modeled as edges. Its adjacency matrix is generally expressed as

𝐏~=𝐑~𝐑~;𝐑~=𝐃𝒰γ𝐑𝐃1γ,\displaystyle\tilde{\mathbf{P}}=\tilde{\mathbf{R}}^{\top}\tilde{\mathbf{R}};\tilde{\mathbf{R}}=\mathbf{D}^{\gamma}_{\mathcal{U}}\mathbf{R}\mathbf{D}^{1-\gamma}_{\mathcal{I}}, (6)

where |𝒰||\mathcal{U}| is the number of users in a dataset; 𝐑|𝒰|×||\mathbf{R}\in\mathbb{R}^{|\mathcal{U}|\times|\mathcal{I}|} is the interaction matrix; 𝐑~\tilde{\mathbf{R}} is the normalized interaction matrix; 𝐃𝒰=diag(𝐑𝟏)\mathbf{D}_{\mathcal{U}}=\text{diag}(\mathbf{R}\mathbf{1}) and 𝐃=diag(𝟏𝐑)\mathbf{D}_{\mathcal{I}}=\text{diag}(\mathbf{1}^{\top}\mathbf{R}); γ[0,1]\gamma\in[0,1] controls the normalization along users/items; and 𝐏~\tilde{\mathbf{P}} is the adjacency matrix representing the item similarity graph. Meanwhile, the standard notion of GF involves performing full eigenvalue decompositions of the Laplacian or adjacency matrices representing the item similarity graph [29], which incurs a prohibitive computational complexity of O(||3)O(|\mathcal{I}|^{3}). To prevent such computational issues, scalable GF-based CF methods exploit several techniques, including sparse matrix operations [43], top-kk eigenvalue components [43, 58], polynomial graph filters [31], and user clustering [59]. Nevertheless, they still depend on the full-size item similarity matrix during filtering. For example, GF-based CF methods [43, 58, 27, 59] typically leverage linear LPFs and/or ideal LPFs extracting only the top-kk eigenvalue components instead of fully decomposing the item similarity matrix:

𝐬u=(𝐏~+μ𝐃U1/2𝐔¯𝐔¯T𝐃I1/2)𝐫u,\mathbf{s}_{u}=\bigl(\tilde{\mathbf{P}}+\mu\mathbf{D}^{-1/2}_{U}\bar{\mathbf{U}}\bar{\mathbf{U}}^{T}\mathbf{D}^{-1/2}_{I}\bigr)\mathbf{r}_{u}, (7)

where 𝐬u||\mathbf{s}_{u}\in\mathbb{R}^{|\mathcal{I}|} is the predicted preferences for user uu; 𝐫u{\mathbf{r}}_{u} denotes the uu-th row of 𝐑\mathbf{R}, treated as a column graph signal for user uu; 𝐔¯||×k\bar{\mathbf{U}}\in\mathbb{R}^{|\mathcal{I}|\times k} is the top-kk singular vectors of 𝐑~0\tilde{\mathbf{R}}_{0}; 𝐏~\tilde{\mathbf{P}} is the linear LPF; 𝐃U1/2𝐔¯𝐔¯T𝐃I1/2\mathbf{D}^{-1/2}_{U}\bar{\mathbf{U}}\bar{\mathbf{U}}^{T}\mathbf{D}^{-1/2}_{I} is the ideal LPF of 𝐏~\tilde{\mathbf{P}}; and μ\mu is a hyperparameter balancing between the two filters. However, due to their limited expressiveness in designing various filters having different frequency response functions, recent studies proposed to use polynomial graph filters to produce diverse frequency response functions using polynomial functions [31, 43, 36]. As representative work [31], the prediction score for user uu via polynomial GF is formulated as

𝐬u=n=1Nan𝐏~n𝐫u,\mathbf{s}_{u}=\sum_{n=1}^{N}{a_{n}}\tilde{\mathbf{P}}^{n}\mathbf{r}_{u}, (8)

where {an}n=1N\{a_{n}\}_{n=1}^{N} are the polynomial coefficients; NN is the maximum order of the matrix polynomial; and n=1Nan𝐏~n\sum_{n=1}^{N}a_{n}\tilde{\mathbf{P}}^{n} represents the polynomial graph filter on the item similarity graph. It was proven that the polynomial graph filter n=1Nan𝐏~n\sum_{n=1}^{N}{a_{n}}\tilde{\mathbf{P}}^{n} has a frequency response function of n=1Nan(1λ)n\sum_{n=1}^{N}{a_{n}}(1-\lambda)^{n} [31, 43, 36].

While conventional GF-based CF methods have adopted user-specific batch strategies (see (7) and (8)), they still suffer from expensive costs for constructing or applying a universal (i.e., user-agnostic) graph filter across all users based on the large item similarity matrix 𝐏~\tilde{\mathbf{P}} (see Fig. 2). For instance, a simple linear LPF (i.e., 𝐏~\tilde{\mathbf{P}}) having a frequency response function of h(λ)=1λh(\lambda)=1-\lambda has a dimension of (||,||)(|\mathcal{I}|,|\mathcal{I}|). Moreover, ideal LPFs or higher-order polynomial filters can further increase memory footprint and computation. As a result, devising a memory-efficient polynomial GF technique that can effectively operate under resource-constrained environments becomes a critical research challenge. This raises a natural question: “How can we optimize batch processing for each user to enhance memory efficiency in GF-based recommendations?” To answer this question, we elaborate on the proposed Mem-GF method in the following subsection.

IV-B Proposed Method: Mem-GF

To directly overcome the challenges outlined in Section IV-A, we propose Mem-GF, a memory-efficient CF method that leverages polynomial GF without incurring the prohibitive memory costs of conventional GF-based CF approaches. The core idea is to use the Krylov subspace for each user signal, thereby approximating matrix-vector operations without explicitly forming or storing large matrices such as 𝐏~\tilde{\mathbf{P}}. The Krylov subspace 𝒦K(𝐏~,𝐫u)\mathcal{K}_{K}(\tilde{\mathbf{P}},\mathbf{r}_{u}) in (5), defined as the span of vectors {𝐫u,𝐏~𝐫u,𝐏~2𝐫u,,\{\mathbf{r}_{u},\tilde{\mathbf{P}}\mathbf{r}_{u},\tilde{\mathbf{P}}^{2}\mathbf{r}_{u},\dots, 𝐏~K1𝐫u}\tilde{\mathbf{P}}^{K-1}\mathbf{r}_{u}\}, captures progressively transformed directions of user uu’s interaction vector under the repeated action of 𝐏~\tilde{\mathbf{P}}. By restricting computations to 𝒦K(𝐏~,𝐫u)\mathcal{K}_{K}(\tilde{\mathbf{P}},\mathbf{r}_{u}), we approximate the spectral properties of 𝐏~\tilde{\mathbf{P}} in the user-specific Krylov subspace. To construct its orthonormal basis, we utilize the Lanczos algorithm [19], which avoids direct eigenvalue/eigenvector computation. The schematic overview of the Mem-GF method is illustrated in Fig. 3. We summarize the pseudocode of Mem-GF in Algorithm 1.

IV-B1 Methodological Details

Mem-GF first refines the normalized interaction matrix 𝐑~\tilde{\mathbf{R}} by applying the Hadamard power, effectively regularizing the differences among elements in 𝐑~\tilde{\mathbf{R}}. Notably, unlike prior approaches [31, 15, 32] that apply the Hadamard power to the item similarity matrix 𝐏~\tilde{\mathbf{P}}, our method operates directly on 𝐑~\tilde{\mathbf{R}} to avoid the need for explicit storage of 𝐏~\tilde{\mathbf{P}}:

𝐑¯=𝐑~s,\bar{\mathbf{R}}\;=\;\tilde{\mathbf{R}}^{\circ s}, (9)

where ss is a hyperparameter for adjusting 𝐑~\tilde{\mathbf{R}} and 𝐑¯\bar{\mathbf{R}} is the adjusted interaction matrix. Thus, the item similarity matrix is defined as 𝐏~=𝐑¯𝐑¯\tilde{\mathbf{P}}=\bar{\mathbf{R}}^{\top}\bar{\mathbf{R}} accordingly, but it is never explicitly stored. Next, let us turn to describing the Lanczos algorithm, which is central to Mem-GF. We construct a Krylov subspace 𝒦K(𝐏~,𝐫u)\mathcal{K}_{K}(\tilde{\mathbf{P}},\mathbf{r}_{u}) with KK bases, where 𝐫u\mathbf{r}_{u} represents user uu’s interaction vector.222Note that Mem-GF is flexible in using the uu-th row of 𝐑¯\bar{\mathbf{R}}, denoted by 𝐫¯u\bar{\mathbf{r}}_{u}, as graph signals. This process generates a user-specific orthonormal Krylov basis 𝐐u||×K\mathbf{Q}_{u}\in\mathbb{R}^{|\mathcal{I}|\times K} and a low-dimensional tridiagonal matrix 𝐓uK×K\mathbf{T}_{u}\in\mathbb{R}^{K\times K}. The matrix 𝐓u\mathbf{T}_{u} represents the projection of 𝐏~\tilde{\mathbf{P}} onto the Krylov subspace:

𝐓u=𝐐u𝐏~𝐐u.\mathbf{T}_{u}=\mathbf{Q}_{u}^{\top}\tilde{\mathbf{P}}\mathbf{Q}_{u}. (10)

More precisely, finite-step Lanczos may involve a residual term outside the constructed subspace; however, the polynomial action on the starting vector 𝐫u\mathbf{r}_{u} is exactly represented for degrees N<KN<K under exact arithmetic, as formalized in Theorem IV.1.

Refer to caption
Figure 3: The schematic overview of Mem-GF.

IV-B2 Core Idea of Mem-GF

Algorithm 1 Mem-GF
0: interaction matrix 𝐑|𝒰|×||\mathbf{R}\in\mathbb{R}^{|\mathcal{U}|\times|\mathcal{I}|}, set of users 𝒰\mathcal{U}, set of items \mathcal{I}
0:{𝐬u}u𝒰\{\mathbf{s}_{u}\}_{u\in\mathcal{U}}
1:𝐑~=𝐃𝒰γ𝐑𝐃1γ\tilde{\mathbf{R}}=\mathbf{D}^{\gamma}_{\mathcal{U}}\mathbf{R}\mathbf{D}^{1-\gamma}_{\mathcal{I}}
2:𝐑¯=𝐑~s\bar{\mathbf{R}}=\tilde{\mathbf{R}}^{\circ s}
3:for each u𝒰u\in\mathcal{U} do
4:  /* Lanczos iteration */
5:  β00,𝐪u,0𝟎,𝐪u,1𝐫u/𝐫u2\beta_{0}\leftarrow 0,\ \mathbf{q}_{u,0}\leftarrow\mathbf{0},\ \mathbf{q}_{u,1}\leftarrow\mathbf{r}_{u}/\|\mathbf{r}_{u}\|_{2}
6:  for j1j\leftarrow 1 to KK do
7:   𝐰u𝐑¯(𝐑¯𝐪u,j)βj1𝐪u,j1\mathbf{w}_{u}\leftarrow\bar{\mathbf{R}}^{\top}(\bar{\mathbf{R}}\,\mathbf{q}_{u,j})-\beta_{j-1}\,\mathbf{q}_{u,j-1}
8:   αj𝐪u,j𝐰u\alpha_{j}\leftarrow\mathbf{q}_{u,j}^{\top}\mathbf{w}_{u}
9:   𝐰u𝐰uαj𝐪u,j\mathbf{w}_{u}\leftarrow\mathbf{w}_{u}-\alpha_{j}\,\mathbf{q}_{u,j}
10:   βj𝐰u2\beta_{j}\leftarrow\|\mathbf{w}_{u}\|_{2}
11:   if βj0\beta_{j}\neq 0 then
12:    𝐪u,j+1𝐰u/βj\mathbf{q}_{u,j+1}\leftarrow\mathbf{w}_{u}/\beta_{j}
13:   end if
14:  end for
15:  Form 𝐐u\mathbf{Q}_{u} with 𝐪u,1,,𝐪u,K\mathbf{q}_{u,1},\ldots,\mathbf{q}_{u,K} and 𝐓u\mathbf{T}_{u} with {αj,βj}\{\alpha_{j},\beta_{j}\}
16:  f(𝐓u)n=0Nan(𝐓u)nf(\mathbf{T}_{u})\leftarrow\sum_{n=0}^{N}a_{n}\,(\mathbf{T}_{u})^{n}
17:  𝐬u𝐫u2𝐐uf(𝐓u)𝐞1\mathbf{s}_{u}\leftarrow\|\mathbf{r}_{u}\|_{2}\;\mathbf{Q}_{u}\;f(\mathbf{T}_{u})\;\mathbf{e}_{1}
18:end for
19:return {𝐬u}u𝒰\{\mathbf{s}_{u}\}_{u\in\mathcal{U}}

The Lanczos algorithm [19] proceeds as follows:

  • Initialization: The algorithm begins with the normalized vector 𝐪u,1=𝐫u/𝐫u2\mathbf{q}_{u,1}=\mathbf{r}_{u}/\|\mathbf{r}_{u}\|_{2}, where 𝐫u\mathbf{r}_{u} is the interaction vector for user uu. Additional parameters, including β0=0\beta_{0}=0 and 𝐪u,0=𝟎\mathbf{q}_{u,0}=\mathbf{0}, required for iterations are also initialized.

  • Iteration: At each iteration jj, the algorithm computes the projection of 𝐏~\tilde{\mathbf{P}} onto the current basis:

    𝐰u\displaystyle\mathbf{w}_{u} =𝐏~𝐪u,jβj1𝐪u,j1,\displaystyle=\tilde{\mathbf{P}}\mathbf{q}_{u,j}-\beta_{j-1}\mathbf{q}_{u,j-1},
    αj\displaystyle\alpha_{j} =𝐪u,j𝐰u.\displaystyle=\mathbf{q}_{u,j}^{\top}\mathbf{w}_{u}.

    The residual 𝐰u\mathbf{w}_{u} is then orthogonalized and normalized:

    𝐰u\displaystyle\mathbf{w}_{u} 𝐰uαj𝐪u,j,\displaystyle\leftarrow\mathbf{w}_{u}-\alpha_{j}\mathbf{q}_{u,j},
    βj\displaystyle\beta_{j} =𝐰u2,\displaystyle=\|\mathbf{w}_{u}\|_{2},
    𝐪u,j+1\displaystyle\mathbf{q}_{u,j+1} =𝐰u/βj(if βj0).\displaystyle=\mathbf{w}_{u}/\beta_{j}\quad\text{(if }\beta_{j}\neq 0\text{)}.
  • Termination: The iterations continue until j=Kj=K or βj=0\beta_{j}=0, while constructing the tridiagonal matrix 𝐓u\mathbf{T}_{u} using the coefficients {αj}j=1K\{\alpha_{j}\}_{j=1}^{K} and {βj}j=1K1\{\beta_{j}\}_{j=1}^{K-1}:

    𝐓u=[α1β100β1α2β20β2α30βK100βK1αK],\mathbf{T}_{u}=\begin{bmatrix}\alpha_{1}&\beta_{1}&0&\cdots&0\\ \beta_{1}&\alpha_{2}&\beta_{2}&\ddots&\vdots\\ 0&\beta_{2}&\alpha_{3}&\ddots&0\\ \vdots&\ddots&\ddots&\ddots&\beta_{K-1}\\ 0&\cdots&0&\beta_{K-1}&\alpha_{K}\\ \end{bmatrix}, (11)

where 𝐪u,j||\mathbf{q}_{u,j}\in\mathbb{R}^{|\mathcal{I}|} is the jj-th vector in the Lanczos basis for user uu; αj\alpha_{j} represents the local projection coefficient of 𝐏~𝐪u,j\tilde{\mathbf{P}}\,\mathbf{q}_{u,j} onto 𝐪u,j\mathbf{q}_{u,j}; and βj\beta_{j} is the norm of the resulting residual, used to orthogonalize 𝐪u,j+1\mathbf{q}_{u,j+1} w.r.t. the previous basis vectors. It is worth noting that, instead of directly storing 𝐏~\tilde{\mathbf{P}}, Mem-GF calculates the matrix-vector product 𝐏~𝐐u\tilde{\mathbf{P}}\mathbf{Q}_{u} using only the adjusted interaction matrix 𝐑¯\bar{\mathbf{R}} as 𝐏~𝐐u=𝐑¯(𝐑¯𝐐u)\tilde{\mathbf{P}}\mathbf{Q}_{u}=\bar{\mathbf{R}}^{\top}(\bar{\mathbf{R}}\mathbf{Q}_{u}). Since 𝐑¯|𝒰|×||\bar{\mathbf{R}}\in\mathbb{R}^{|\mathcal{U}|\times|\mathcal{I}|} is much smaller and sparser than the full item similarity matrix 𝐏~=𝐑¯𝐑¯\tilde{\mathbf{P}}=\bar{\mathbf{R}}^{\top}\bar{\mathbf{R}}, this reduces the memory requirement from storing a full-size item–item matrix to storing only 𝐑¯\bar{\mathbf{R}} and the compact Krylov quantities, making Mem-GF viable for large-scale datasets (refer to Figs. 2 and 3).

To see how Mem-GF computes polynomial graph filters, note that 𝐐u\mathbf{Q}_{u} forms a user-specific Krylov subspace, while 𝐓u\mathbf{T}_{u} is the projected representation of 𝐏~\tilde{\mathbf{P}} in that subspace. By the Lanczos initialization, we have 𝐐u𝐫u=𝐫u2𝐞1\mathbf{Q}_{u}^{\top}\mathbf{r}_{u}=\|\mathbf{r}_{u}\|_{2}\mathbf{e}_{1}. Therefore, for a polynomial graph filter f(𝐏~)=n=0Nan𝐏~nf(\tilde{\mathbf{P}})=\sum_{n=0}^{N}a_{n}\tilde{\mathbf{P}}^{n} with N<KN<K, the Krylov representation gives

f(𝐏~)𝐫u=𝐫u2𝐐u(n=0Nan𝐓un)𝐞1,f(\tilde{\mathbf{P}})\mathbf{r}_{u}=\|\mathbf{r}_{u}\|_{2}\,\mathbf{Q}_{u}\left(\sum_{n=0}^{N}a_{n}\mathbf{T}_{u}^{n}\right)\mathbf{e}_{1}, (12)

as formally shown in Theorem IV.1. This relation enables polynomial graph filtering without explicitly forming 𝐏~\tilde{\mathbf{P}}.

Thus, our Mem-GF method is finally formulated as follows:

𝐬u=𝐫u2𝐐uf(𝐓u)𝐞1,\mathbf{s}_{u}=\|\mathbf{r}_{u}\|_{2}\mathbf{Q}_{u}f(\mathbf{T}_{u})\mathbf{e}_{1}, (13)

where f(𝐓u)f(\mathbf{T}_{u}) represents the user-specific polynomial graph filter in the Krylov subspace and is expressed as

f(𝐓u)=n=0Nan𝐓un.f(\mathbf{T}_{u})=\sum_{n=0}^{N}a_{n}\mathbf{T}_{u}^{n}. (14)

This formulation ensures that both the graph filter and the signal are user-specific, enabling personalized and efficient recommendation by allocating memory in a batch-based manner for each user’s operations. By reducing the polynomial graph filter computation into smaller, user-specific batches, our method is capable of further optimizing memory usage, ensuring scalability even with limited hardware resources. We next highlight additional advantages offered by the proposed Mem-GF method.

Remark.

First, the user-specific matrices 𝐐u\mathbf{Q}_{u} and 𝐓u\mathbf{T}_{u} can be pre-computed and cached, enabling near-instantaneous inference through lightweight matrix multiplications. This design facilitates deployment in real-world systems with stringent latency requirements. Second, once the Krylov basis (𝐐u,𝐓u)(\mathbf{Q}_{u},\,\mathbf{T}_{u}) is constructed through KK Lanczos iterations, an arbitrary polynomial graph filter of order K<KK^{\prime}<K can be calculated by utilizing only the first KK^{\prime} Krylov basis vectors, without further access to the interaction matrix 𝐑¯\bar{\mathbf{R}}. Consequently, validation-based filter selection becomes much cheaper than recomputing each filter independently. Such capability is especially valuable in personalized recommendation scenarios, as optimal polynomial filters can vary substantially across individual users [27, 32].

Next, we are interested in analyzing how many Krylov subspace bases are sufficient to guarantee no approximation error. To address this, we provide a theoretical analysis by establishing the following theorem.

Theorem IV.1 (Zero-Error Guarantees in Krylov Subspaces).

Let 𝐏~||×||\tilde{\mathbf{P}}\in\mathbb{R}^{|\mathcal{I}|\times|\mathcal{I}|} be a symmetric matrix (e.g., an item similarity matrix), and let 𝐫u||\mathbf{r}_{u}\in\mathbb{R}^{|\mathcal{I}|} be the signal for user uu. Run KK iterations of the Lanczos algorithm on the pair (𝐏~,𝐫u)(\tilde{\mathbf{P}},\mathbf{r}_{u}), yielding an orthonormal Krylov basis 𝐐u||×K\mathbf{Q}_{u}\in\mathbb{R}^{|\mathcal{I}|\times K} and a tridiagonal projected matrix 𝐓u=𝐐u𝐏~𝐐uK×K\mathbf{T}_{u}=\mathbf{Q}_{u}^{\top}\tilde{\mathbf{P}}\mathbf{Q}_{u}\in\mathbb{R}^{K\times K}. Consider a polynomial graph filter f(𝐏~)=n=0Nan𝐏~nf(\tilde{\mathbf{P}})\;=\;\sum_{n=0}^{N}a_{n}\,\tilde{\mathbf{P}}^{n} of degree N.N. If N<KN<K, then the Lanczos algorithm in Mem-GF incurs no approximation error under exact arithmetic.

Proof.

The Lanczos procedure constructs 𝐐u=[𝐪u,1,,𝐪u,K]\mathbf{Q}_{u}=[\mathbf{q}_{u,1},\ldots,\mathbf{q}_{u,K}] with 𝐪u,1=𝐫u/𝐫u2\mathbf{q}_{u,1}=\mathbf{r}_{u}/\|\mathbf{r}_{u}\|_{2} and the projected matrix 𝐓u=𝐐u𝐏~𝐐u\mathbf{T}_{u}=\mathbf{Q}_{u}^{\top}\tilde{\mathbf{P}}\mathbf{Q}_{u}. By the Krylov construction, for every 0nN<K0\leq n\leq N<K, the vector 𝐏~n𝐫u\tilde{\mathbf{P}}^{n}\mathbf{r}_{u} belongs to 𝒦K(𝐏~,𝐫u)\mathcal{K}_{K}(\tilde{\mathbf{P}},\mathbf{r}_{u}) and can be represented in the basis 𝐐u\mathbf{Q}_{u} as

𝐏~n𝐫u=𝐫u2𝐐u𝐓un𝐞1.\tilde{\mathbf{P}}^{n}\mathbf{r}_{u}=\|\mathbf{r}_{u}\|_{2}\mathbf{Q}_{u}\mathbf{T}_{u}^{n}\mathbf{e}_{1}.

Therefore,

f(𝐏~)𝐫u\displaystyle f(\tilde{\mathbf{P}})\mathbf{r}_{u} =n=0Nan𝐏~n𝐫u\displaystyle=\sum_{n=0}^{N}a_{n}\tilde{\mathbf{P}}^{n}\mathbf{r}_{u}
=𝐫u2𝐐u(n=0Nan𝐓un)𝐞1=𝐫u2𝐐uf(𝐓u)𝐞1.\displaystyle=\|\mathbf{r}_{u}\|_{2}\mathbf{Q}_{u}\left(\sum_{n=0}^{N}a_{n}\mathbf{T}_{u}^{n}\right)\mathbf{e}_{1}=\|\mathbf{r}_{u}\|_{2}\mathbf{Q}_{u}f(\mathbf{T}_{u})\mathbf{e}_{1}.

Thus, for all N<KN<K, the equality holds with no approximation error. This completes the proof of Theorem IV.1. ∎

Theorem IV.1 implies that, whenever the polynomial degree NN is less than the Krylov subspace size KK, Mem-GF exactly reproduces f(𝐏~)𝐫uf(\tilde{\mathbf{P}})\mathbf{r}_{u} in the original space under exact arithmetic. Thus, setting K=N+1K=N+1 is sufficient to avoid approximation error in the algebraic Krylov representation. In practice, finite-precision computation or choosing NKN\geq K may affect this exactness; therefore, we tune KK and NN on the validation set. Separately, severe sparsity or noise in user signals may degrade empirical recommendation quality rather than invalidate the algebraic guarantee itself. We provide experimental verification of this theoretical claim in Section V-F.

IV-B3 High-Order Polynomial Graph Filter Designs

Conventional polynomial GF-based CF methods [31, 15, 32] restrict polynomial filters up to third-order terms to bypass the problem of memory constraints. However, explicitly constructing and storing 𝐏~\tilde{\mathbf{P}} or computing higher-order powers (e.g., 𝐏~3\tilde{\mathbf{P}}^{3}) still requires memory scaling as 𝒪(||2)\mathcal{O}(|\mathcal{I}|^{2}), which is infeasible for large sets of items. In contrast, Mem-GF overcomes the memory bottleneck issue using the user-specific tridiagonal matrix 𝐓u\mathbf{T}_{u} in (10). Since K||K\ll|\mathcal{I}| for 𝐓uK×K\mathbf{T}_{u}\in\mathbb{R}^{K\times K}, it becomes computationally trivial to raise 𝐓u\mathbf{T}_{u} to higher powers, thereby allowing the order of polynomial filters to be significantly higher.

Refer to caption
Figure 4: Comparison of the function h(λ)=eγλ2h(\lambda)=e^{-\gamma\lambda^{2}} and its polynomial approximations for different polynomial orders (N=1,2,5N=1,2,5). The maximum and mean errors are annotated within each subplot to indicate approximation accuracy.

This flexibility allows Mem-GF to approximate a broad spectrum of target frequency responses. Specifically, we consider employing graph filters having more complex frequency responses. This leads us to propose a new polynomial graph filter that approximates the Gaussian filter [13, 21] that is widely used in image and graph domains, whose frequency response is expressed as h(λ)=eτλ2h(\lambda)=e^{-\tau\lambda^{2}}, where τ\tau is the damping factor in the Gaussian filter.333In addition to the Gaussian filter, any sophisticated filters can be designed by appropriately adjusting the polynomial coefficients. Due to the fact that the polynomial graph filter n=1Nan𝐏~n\sum_{n=1}^{N}{a_{n}}\tilde{\mathbf{P}}^{n} has a frequency response function of n=1Nan(1λ)n\sum_{n=1}^{N}{a_{n}}(1-\lambda)^{n} [31, 43, 36], we are ready to numerically solve a non-linear least squares (LS) problem to find the polynomial coefficients {an}n=1N\{a_{n}\}_{n=1}^{N} in (14) within the maximum polynomial order NN, as in [31]. As illustrated in Fig. 4, performing higher-order polynomial approximations (e.g., N=5N=5) significantly reduces the error compared to lower-order polynomials (e.g., N=1,2N=1,2). This demonstrates the clear advantage of high-order polynomials in closely approximating the original function h(λ)=eτλ2h(\lambda)=e^{-\tau\lambda^{2}}. Therefore, by setting N=5N=5 and τ=3\tau=3,444One can arbitrarily set the hyperparameter τ\tau. Here, increasing τ\tau results in the further utilization of the low-pass frequency components, and vice versa. we are capable of discovering a new predefined polynomial graph filter having a frequency response function as h(λ)=e3λ2h(\lambda)=e^{-3\lambda^{2}} as follows:

f(𝐓u)=8𝐓u+3.1𝐓u211.2𝐓u3+6.2𝐓u41.1𝐓u5.f(\mathbf{T}_{u})=8\mathbf{T}_{u}+3.1\mathbf{T}_{u}^{2}-11.2\mathbf{T}_{u}^{3}+6.2\mathbf{T}_{u}^{4}-1.1\mathbf{T}_{u}^{5}. (15)

Finally, in our study, we utilize three predefined LPFs used in [31, 15] as well as our newly devised fifth-order polynomial filter, which are listed as follows:

  • Mem-GF-1 (f(𝐓u)=𝐓uf(\mathbf{T}_{u})=\mathbf{T}_{u}): The linear LPF with the first polynomial order, whose frequency response is h(λ)=1λh(\lambda)=1-\lambda.

  • Mem-GF-2 (f(𝐓u)=2𝐓u𝐓u2f(\mathbf{T}_{u})=2\mathbf{T}_{u}-\mathbf{T}_{u}^{2}): The second-order LPF, whose frequency response is h(λ)=1λ2h(\lambda)=1-\lambda^{2}

  • Mem-GF-3 (f(𝐓u)=𝐓u+0.01(𝐓u3+10𝐓u229𝐓u)f(\mathbf{T}_{u})=\mathbf{T}_{u}+0.01({-\mathbf{T}_{u}^{3}+10\mathbf{T}_{u}^{2}-29\mathbf{T}_{u}})): The third-order LPF which approximates ideal LPF (i.e., h(λ)=𝟏λph(\lambda)=\mathbf{1}_{\lambda\leq p} with cutoff frequency pp) [31].

  • Mem-GF-5 (f(𝐓u)=8𝐓u+3.1𝐓u211.2𝐓u3+6.2𝐓u41.1𝐓u5f(\mathbf{T}_{u})=8\mathbf{T}_{u}+3.1\mathbf{T}_{u}^{2}-11.2\mathbf{T}_{u}^{3}+6.2\mathbf{T}_{u}^{4}-1.1\mathbf{T}_{u}^{5}): The fifth-order LPF, whose frequency response approximates the Gaussian filter h(λ)=e3λ2h(\lambda)=e^{-3\lambda^{2}}.

The optimal filter and hyperparameters of Mem-GF can be found via hyperparameter tuning on the validation set. Although the aforementioned four graph filters are utilized in our experiments, higher-order graph filters can be utilized and any arbitrary functions can be designed via the same procedures.

IV-B4 Complexity Analysis

We first theoretically analyze the space (i.e., memory) complexity of Mem-GF during preprocessing stage. A central advantage of Mem-GF is its ability to avoid explicitly forming the item similarity matrix 𝐏~=𝐑¯𝐑¯||×||\tilde{\mathbf{P}}=\bar{\mathbf{R}}^{\top}\bar{\mathbf{R}}\in\mathbb{R}^{|\mathcal{I}|\times|\mathcal{I}|}. Storing 𝐏~\tilde{\mathbf{P}} would naïvely demand memory of 𝒪(||2)\mathcal{O}(|\mathcal{I}|^{2}), which can be reduced to nnz(𝐏~)\mathrm{nnz}(\tilde{\mathbf{P}}) in sparse format. However, 𝐏~\tilde{\mathbf{P}} in (6) is often far denser than 𝐑~\tilde{\mathbf{R}}, as each element contains two-hop item-neighbor information. Instead, Mem-GF stores only 𝐑¯\bar{\mathbf{R}}, 𝐐u\mathbf{Q}_{u}, and 𝐓u\mathbf{T}_{u} (see (9) and (10)). First, the adjusted interaction matrix in a sparse format has the dimension of 𝐑¯|𝒰|×||\bar{\mathbf{R}}\in\mathbb{R}^{|\mathcal{U}|\times|\mathcal{I}|}. Thus, its space requirement is 𝒪(nnz(𝐑))\mathcal{O}\bigl(\mathrm{nnz}(\mathbf{R})\bigr). Since 𝐑¯\bar{\mathbf{R}} is obtained by a Hadamard power of 𝐑~\tilde{\mathbf{R}}, it shares the same sparsity pattern as 𝐑\mathbf{R}, implying that nnz(𝐑¯)=nnz(𝐑)\mathrm{nnz}(\bar{\mathbf{R}})=\mathrm{nnz}(\mathbf{R}). Next, we need to store temporary user-specific Lanczos vectors 𝐐u||×K\mathbf{Q}_{u}\in\mathbb{R}^{|\mathcal{I}|\times K} and the tridiagonal matrix 𝐓uK×K\mathbf{T}_{u}\in\mathbb{R}^{K\times K}, which have the dimension linear in ||K|\mathcal{I}|K and K2K^{2}, respectively. Importantly, from the fact that they are constructed per user (or per batch), the algorithm can overwrite or free this memory once each user’s final predictions are computed. Hence, the overall memory footprint is finally given by

𝒪(nnz(𝐑)+||K+K2),\mathcal{O}\bigl(\mathrm{nnz}(\mathbf{R})+|\mathcal{I}|K+K^{2}), (16)

rather than 𝒪(||2)\mathcal{O}(|\mathcal{I}|^{2}) or 𝒪(nnz(𝐏~))\mathcal{O}(\mathrm{nnz}(\tilde{\mathbf{P}})). This design substantially alleviates the memory bottleneck, making it feasible to handle large-scale datasets on standard hardware resources.

Next, we delve into theoretically analyzing the time (i.e., computational) complexity of Mem-GF during preprocessing. The computational cost of Mem-GF can be broken down into two core stages: (i) constructing the Krylov subspace for each user via the Lanczos algorithm and (ii) performing polynomial GF in the reduced space.

  1. 1.

    Krylov subspace construction (Lanczos algorithm). For user uu, let 𝐫u||\mathbf{r}_{u}\in\mathbb{R}^{|\mathcal{I}|} be the interaction vector. We conduct KK iterations of the Lanczos algorithm to build the matrix 𝐐u||×K\mathbf{Q}_{u}\in\mathbb{R}^{|\mathcal{I}|\times K} and the tridiagonal matrix 𝐓uK×K\mathbf{T}_{u}\in\mathbb{R}^{K\times K}. Each iteration requires a matrix-vector multiplication with 𝐏~=𝐑¯𝐑¯\tilde{\mathbf{P}}=\bar{\mathbf{R}}^{\top}\bar{\mathbf{R}}. Rather than forming 𝐏~\tilde{\mathbf{P}} explicitly, we compute 𝐏~𝐪u,j=𝐑¯(𝐑¯𝐪u,j)\tilde{\mathbf{P}}\,\mathbf{q}_{u,j}\;=\;\bar{\mathbf{R}}^{\top}\!\bigl(\bar{\mathbf{R}}\,\mathbf{q}_{u,j}\bigr). Each multiplication with 𝐑¯\bar{\mathbf{R}} or 𝐑¯\bar{\mathbf{R}}^{\top} can be executed in 𝒪(nnz(𝐑¯))\mathcal{O}\!\bigl(\mathrm{nnz}(\bar{\mathbf{R}})\bigr) time by exploiting sparsity. Consequently, one Lanczos iteration costs 𝒪(nnz(𝐑¯))\mathcal{O}\!\bigl(\mathrm{nnz}(\bar{\mathbf{R}})\bigr). Over KK steps, the cost per user is 𝒪(Knnz(𝐑¯))\mathcal{O}(K\,\mathrm{nnz}(\bar{\mathbf{R}})), which results in 𝒪(|𝒰|Knnz(𝐑¯))\mathcal{O}(|\mathcal{U}|\,K\,\mathrm{nnz}(\bar{\mathbf{R}})) for |𝒰||\mathcal{U}| total users.

  2. 2.

    Polynomial GF and score reconstruction. Given 𝐐u\mathbf{Q}_{u} and 𝐓u\mathbf{T}_{u}, we next compute a polynomial graph filter f(𝐓u)=n=0Nan𝐓unf(\mathbf{T}_{u})=\sum_{n=0}^{N}a_{n}\,\mathbf{T}_{u}^{n} in the reduced space. Specifically, the prediction score 𝐬u\mathbf{s}_{u} is 𝐫u2𝐐uf(𝐓u)𝐞1||\|\mathbf{r}_{u}\|_{2}\,\mathbf{Q}_{u}\,f(\mathbf{T}_{u})\,\mathbf{e}_{1}\in\mathbb{R}^{|\mathcal{I}|}. Because 𝐓u\mathbf{T}_{u} is of size K×KK\times K, powering 𝐓u\mathbf{T}_{u} up to order NN typically takes 𝒪(K2)\mathcal{O}(K^{2}) up to 𝒪(K3)\mathcal{O}(K^{3}) depending on implementation [41]. In practice, since KK is small (e.g., K=N+1K=N+1 due to Theorem IV.1), this step is minor. Finally, multiplying 𝐐uf(𝐓u)𝐞1\mathbf{Q}_{u}f(\mathbf{T}_{u})\mathbf{e}_{1} back into the original dimension costs 𝒪(||K)\mathcal{O}(|\mathcal{I}|K) per user. Aggregated across all users, the reconstruction needs 𝒪(|𝒰|||K)\mathcal{O}\bigl(|\mathcal{U}|\;|\mathcal{I}|\;K\bigr).

Therefore, the total time complexity of Mem-GF is given by

𝒪(|𝒰|K(nnz(𝐑)+||)).\mathcal{O}\Bigl(|\mathcal{U}|\,K\,\bigl(\mathrm{nnz}(\mathbf{R})+|\mathcal{I}|\bigr)\Bigr). (17)

V Experimental Evaluations

In this section, we systematically conduct extensive experiments to address the key research questions (RQs) outlined below:

  • RQ1: How memory-efficient is Mem-GF during preprocessing compared to benchmark GF-based recommendation methods?

  • RQ2: How efficient is Mem-GF during inference in terms of runtime and memory consumption compared to benchmark GF-based recommendation methods?

  • RQ3: How does the memory on Mem-GF scale w.r.t. the number of users, items, and interactions?

  • RQ4: How much does Mem-GF improve recommendation accuracy over benchmark recommendation methods?

  • RQ5: How does the number of Krylov subspace bases affect the performance of Mem-GF?

  • RQ6: How do types of polynomial filters affect the performance of Mem-GF?

  • RQ7: How sensitive is Mem-GF to variations in key hyperparameters?

V-A Experimental Settings

TABLE II: The statistics of three datasets.
Dataset # of users # of items # of interactions Density
Yelp 31,668 38,048 1,561,406 0.130%
Amazon-book 52,643 91,599 2,984,108 0.062%
ML-20M 138,493 26,744 20,000,263 0.540%

Datasets. We carry out experiments on the three benchmark datasets widely adopted for evaluating the performance of recommender systems for CF, which include Yelp, Amazon-book, and MovieLens-20M (ML-20M) [48, 63, 65, 53, 52, 10, 11, 9]. Notably, ML-20M is a large-scale, real-world dataset, consisting of 20 million user interactions. The statistics of the three datasets are summarized in Table II.

Competitors. We compare Mem-GF against twenty-one benchmark CF methods, which can be grouped into two categories. The first group comprises canonical benchmark CF methods built on commonly used architectures, including MF-based methods (MF-BPR [39] and NeuMF [11]), autoencoder-based methods (Multi-DAE, Multi-VAE [23], RecVAE [44], and SVD-AE [12]), GCN-based methods (NGCF [52], LightGCN [10], DGCF [53], SGL [55], SimpleX [28], and SGDE [35]), generative model-based methods (CFGAN [2], DiffRec, L-DiffRec [51], and FlowCF [26]), and link propagation-based method (LinkProp and LinkProp-Multi [4]). The second group consists of recent GF-based CF methods, including GF-CF [43], PGSP [27], FPSR [54], Turbo-CF [31], and HiGSP [59]. Certain recent GF-based approaches, including PolyCF [36] and LanczosNet [24], were excluded from our empirical comparison, as the source code of PolyCF is not publicly available and LanczosNet was not specifically designed for CF tasks with recommendation datasets involving tens of thousands of nodes.

Evaluation protocols. For the Yelp and Amazon-book datasets, we follow the same data splits and performance as those in prior studies [10, 52, 31, 27], directly using each paper’s reported best performance for consistency. For ML-20M, we preprocess the dataset by splitting interactions into a ratio of 80:20 for training and testing, and then further reserve 10% of the training set as a validation set under a 10-core setting (i.e., filter out users with fewer than ten interactions). To assess the performance of the top-KK recommendations, we use widely adopted benchmark metrics from the literature [48, 63, 65, 53, 52, 10, 11, 9], including recall and normalized discounted cumulative gain (NDCG), with KK set to 20 by default. For each metric, we report the average results over 10 independent runs, except for deterministic methods [46, 43, 27, 31, 12, 59] (including Mem-GF). Unless otherwise specified, all runtime reports for GF-based methods refer to their preprocessing time.

TABLE III: Comparison of memory (in GB) and time (in seconds) consumption during preprocessing among seven training-free GF-based recommendation methods across three datasets. RAM and VRAM values represent the peak memory usage recorded during execution. Entries marked as OOM denote out-of-memory errors, and entries marked as “–” denote unavailable GPU measurements caused by GPU buffer OOM or CPU-only fallback. The best and second-best performers are marked in bold and underlined, respectively. The performance of Turbo-CF and Mem-GF is evaluated according to the maximum order of the polynomial graph filters used.
Dataset Method EASE GF-CF PGSP HiGSP FPSR Turbo-CF Mem-GF
1 2 3 1 2 3 5
Yelp VRAM (A) 31.2 10.9 9.3 9.2 25.7 25.7 1.9 2.4 2.7 3.3
RAM (B) 0.8 2.1 74.1 OOM 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7
Total memory (A)+(B) 32.0 13.0 74.1 OOM 10.0 9.9 26.4 26.4 2.6 3.1 3.4 4.0
Time 15.68 7.0 1398.0 58.7 4.2 140.2 420.7 1.5 2.9 4.3 6.5
GPU OOM
Amazon-book VRAM (A) 34.5 32.4 45 4.4 5.2 5.9 7.3
RAM (B) OOM 0.7 OOM OOM 2.0 0.7 OOM OOM 0.7 0.7 0.7 0.7
Total memory (A)+(B) OOM 35.2 OOM OOM 34.4 45.7 OOM OOM 5.1 5.9 6.6 8.0
Time 35.7 157.1 35.5 6.6 11.4 16.4 26.4
GPU OOM
ML-20M VRAM (A) 19.7 11.5 4.6 4.8 4.8 5.6
RAM (B) 12.6 1.3 OOM OOM 4.5 6.6 9.6 12.8 1.2 1.2 1.2 1.2
Total memory (A)+(B) 12.6 21.0 OOM OOM 15.9 6.6 9.6 12.8 5.8 6.0 6.0 6.8
Time 223.0 59.8 121.1 187.6 2735.5 7432.8 102.2 156.6 209.1 314.1
GPU OOM

Implementation details. We adopt the optimal hyperparameters for competitors through extensive hyperparameter tuning on the validation set. For all experiments, we use default hyperparameters γ\gamma, ss, and polynomial order NN for Mem-GF, unless otherwise stated: (γ,s,N)(\gamma,s,N) = (0.55, 0.9, 5) for Yelp; (0.5, 0.9, 2) for Amazon-book; and (0.5, 1.1, 2) for ML-20M. We set K=N+1K=N+1 for all datasets. All training-based methods are optimized using the Adam optimizer [17] with a batch size of 1024. Our experiments are conducted on a single server equipped with Intel (R) 12-Core (TM) i7-9700K CPUs @ 3.60 GHz, NVIDIA GeForce RTX A6000 GPUs with 48GB VRAM, and 128GB of RAM.

All GF-based CF algorithms [43, 31, 46, 27, 59] are implemented using PyTorch tensors [33] to ensure a consistent experimental environment. Whenever a method exceeds the size of GPU VRAM, we revert to CPU-only computation leveraging system RAM to complete the process without disrupting the pipeline. For FPSR [54], we use τ=0.3\tau=0.3 as a default hyperparameter to generate 6 partitions. For Turbo-CF, we identically use the three predefined filters presented in [31, 15]. Since the filtering matrix is constructed as 𝐑¯𝐑¯\bar{\mathbf{R}}^{\top}\bar{\mathbf{R}}, using 𝐫¯u\bar{\mathbf{r}}_{u} also aligns the user signal with the same normalization and adjustment applied to the item similarity matrix. This alignment is particularly helpful for the ML-20M dataset, where interactions are highly concentrated on popular items, because it mitigates popularity-dominated signal amplification.

To evaluate the scalability of our Mem-GF in Section V-D, we generated sets of synthetic interaction matrices with varying dimensions and sparsity levels. In particular, we employ the random matrix generation function provided by the scipy.random module in the SciPy package [49] to create these matrices. By default, the number of users, items, and nonzero entries (i.e., the number of interactions) was set to 5,000, 5,000, and 225,000, respectively. In order to assess scalability across a wide range of problem sizes, the experiments are carried out according to the following setup:

  • Varying the number of users/items: The number of users and items is systematically increased to each of the following values: 1,000, 5,000, 10,000, 15,000, 20,000, 30,000, 40,000, 50,000, 80,000, and 100,000.

  • Varying the number of interactions: The number of nonzero entries (i.e., nnz(R)\mathrm{nnz}(R)) is scaled to 225,000, 1,125,000, 2,250,000, 4,500,000, 6,750,000, 9,000,000, and 11,250,000.

For each configuration, we record the corresponding memory usage and runtime of our Mem-GF. These systematic variations allow us to quantify how the performance of our method scales w.r.t. the number of users, items, and interactions.

V-B Preprocessing Efficiency Analysis (RQ1)

Table III summarizes the comparison of Mem-GF with existing GF-based recommendation methods in terms of memory usage and runtime on the three benchmark datasets during preprocessing (i.e., offline computation). Table IV shows the runtime comparison among Mem-GF and three representative CF methods that require model training (NGCF, LightGCN, and DiffRec). Our key observations are as follows:

  1. (i)

    Leveraging the low-dimensional Krylov subspace, Mem-GF significantly reduces both memory consumption and computation time compared to other GF-based CF competitors.

  2. (ii)

    On Amazon-book, where the set of items is relatively large, most GF-based CF competitors encounter OOM errors on the single GPU device. In contrast, Mem-GF efficiently operates within around 5GB of VRAM under the first-order polynomial filter, demonstrating up to a 574% memory advantage and a 438% runtime speedup compared to the second-best performers. Even with higher-order polynomial filters, Mem-GF maintains a VRAM usage of around 8GB.

  3. (iii)

    On ML-20M, the extremely large number of interactions (20M) triggers buffer OOM issues for many competing methods when performing sparse matrix multiplications on the GPU, forcing them to rely solely on CPU computations and resulting in prolonged runtime. On the contrary, Mem-GF seamlessly handles high-order filters without such memory bottlenecks.

  4. (iv)

    Table IV demonstrates the benefits of Mem-GF over representative CF methods that require model training in terms of runtime. Because Mem-GF is entirely training-free, it circumvents the costly iterative parameter optimization loops. Notably, on the datasets having a large amount of interactions, such as ML-20M, Mem-GF significantly outperforms two-tower models based on pairwise optimization (e.g., NGCF and LightGCN), highlighting its practical value in scenarios where both accuracy and speed are critical.

TABLE IV: Runtime (in seconds) of Mem-GF and representative competitors that require model training. Here, runtime represents training time for NGCF, LightGCN, and DiffRec, while it refers to processing time for Mem-GF.
NGCF LightGCN DiffRec Mem-GF
Yelp 1.5×1041.5\times 10^{4} 1.1×1041.1\times 10^{4} 1.4×1041.4\times 10^{4} 1.5\bf 1.5
Amazon-book 7.1×1047.1\times 10^{4} 4.6×1044.6\times 10^{4} 3.7×1043.7\times 10^{4} 6.6\bf 6.6
ML-20M 4.6×1054.6\times 10^{5} 3.1×1053.1\times 10^{5} 1.1×1041.1\times 10^{4} 102.2\bf 102.2
Model training
TABLE V: Inference time and peak GPU memory usage for representative GF-based methods and Mem-GF with different polynomial orders on the Yelp dataset.
Method Order Time (s) VRAM (GB)
GF-CF - 0.1197 28.79
Turbo-CF 1 0.1348 6.60
Mem-GF-1 1 0.0044 0.57
Mem-GF-2 2 0.0077 0.57
Mem-GF-3 3 0.0123 0.57
Mem-GF-5 5 0.0145 0.57

V-C Inference Efficiency Analysis (RQ2)

Table V reports the single-batch inference cost (i.e., the convolution with graph signals) on the Yelp dataset, as other datasets exhibit similar trends. The comparison assumes that the graph filter (H(𝐋)H(\mathbf{L}) in (3)) has already been precomputed and focuses only on runtime and peak GPU memory usage during inference.

  1. (i)

    By confining the graph convolution entirely within a low-dimensional Krylov subspace, Mem-GF completes inference in 0.00440.0044 seconds, yielding up to a 26.2×\times speedup over GF-CF and Turbo-CF, both of which require convolution with a full ||×|||\mathcal{I}|\times|\mathcal{I}| graph filter. Even with the fifth-order filter, Mem-GF maintains low latency, supporting real-time recommendation systems.

  2. (ii)

    Since Mem-GF avoids explicit construction and storage of the full-size ||×|||\mathcal{I}|\times|\mathcal{I}| matrix, it achieves over a 10.5×\times reduction in peak VRAM usage compared to Turbo-CF. The memory footprint remains nearly unaffected by the polynomial order because the compact Krylov basis dominates the filtering cost, whereas GF-CF requires dense filter storage for ideal LPF computations.

Refer to caption
Figure 5: Memory usage (left) and runtime (right) of Mem-GF as the number of users (|𝒰||\mathcal{U}|), items (|||\mathcal{I}|), and non-zero interactions (nnz(R)\mathrm{nnz}(R)) increases. Here, the linear asymptotic line is depicted as a red-dashed line for reference.

V-D Scalability Analysis (RQ3)

To empirically validate our theoretical complexity analysis in Section IV-B4, we conduct experiments on synthetic datasets (i.e., interaction matrices) with varying scales of users, items, and interaction densities. Specifically, in Fig. 5, we present how Mem-GF’s memory usage and runtime for preprocessing scale as |𝒰||\mathcal{U}|, |||\mathcal{I}|, and nnz(R)\mathrm{nnz}(R) grow. By default, the number of users, items, and interactions was set to 5,000, 5,000, and 225,000, respectively. Furthermore, in Fig. 6, we compare Mem-GF with two representative GF-based CF competitors (GF-CF and Turbo-CF) when |||\mathcal{I}| increases. For a fair comparison with GF-CF, both Turbo-CF and Mem-GF employ the first-order polynomial graph filter. We highlight the following observations:

  1. (i)

    As shown in the left side of Fig. 5, Mem-GF exhibits memory usage growing nearly linearly with |𝒰||\mathcal{U}|, |||\mathcal{I}|, and nnz(R)\mathrm{nnz}(R). Notably, the memory footprint is significantly reduced compared to the case of forming the item similarity matrix of size ||×|||\mathcal{I}|\times|\mathcal{I}|, allowing Mem-GF to make full use of GPU acceleration even on large datasets. We note that the size of RAM does not tend to scale with the increasing data dimension as long as VRAM is sufficiently available.

  2. (ii)

    As shown in the right side of Fig. 5, the runtime of Mem-GF also scales linearly as each data dimension grows. This validates the scalability of Mem-GF. Moreover, since Mem-GF efficiently operates within the GPU VRAM, its runtime only takes less than 11 seconds even when nnz(R)\mathrm{nnz}(R) increases up to 10710^{7} via the exploitation of the GPU resource.

  3. (iii)

    In Fig. 6, we observe that Mem-GF is substantially more scalable in both memory and runtime, compared to GF-CF and Turbo-CF. In particular, GF-CF incurs excessive memory overhead due to its dense ideal LPF matrix having dimensionality of (||,||)(|\mathcal{I}|,|\mathcal{I}|), thus yielding superlinear complexity in practice.

These results also indicate how sparsity affects Mem-GF: increasing nnz(𝐑)\mathrm{nnz}(\mathbf{R}) raises runtime through sparse matrix-vector products, but the growth remains close to linear because the full item similarity matrix is never formed.

Refer to caption
Figure 6: Total memory (left) and runtime (right) comparison among representative GF-based CF methods w.r.t. |||\mathcal{I}|.
TABLE VI: Performance comparison among Mem-GF and competitors. The best and second-best performers are highlighted in bold and underlined, respectively. The improvements of Mem-GF over the best competitors are all statistically significant with pp-values 0.01\leq 0.01.
Yelp Amazon-book ML-20M
Method Recall NDCG Recall NDCG Recall NDCG
MF-BPR 0.0433 0.0354 0.0250 0.0190 0.0219 0.1108
NeuMF 0.0451 0.0363 0.0258 0.0200 0.0288 0.1437
NGCF 0.0579 0.0477 0.0344 0.0263 0.0307 0.1409
LightGCN 0.0649 0.0530 0.0411 0.0315 0.0402 0.1702
DGCF 0.0654 0.0534 0.0422 0.0324 OOM OOM
SGL 0.0675 0.0555 0.0478 0.0379 0.0388 0.1677
SGDE 0.0693 0.0563 0.0483 0.0391 0.0398 0.1683
Multi-DAE 0.0562 0.0434 0.0410 0.0322 0.0385 0.1666
Multi-VAE 0.0584 0.0450 0.0407 0.0315 0.0402 0.1702
RecVAE 0.0579 0.0432 0.0400 0.0302 0.0414 0.1732
SVD-AE 0.0681 0.0572 0.0602 0.0479 0.0536 0.2379
SimpleX 0.0701 0.0575 0.0583 0.0400 0.0304 0.1451
LinkProp 0.0676 0.0559 0.0684 0.0559 OOM OOM
LinkProp-Multi 0.0690 0.0571 0.0721 0.0588 OOM OOM
DiffRec 0.0656 0.0552 0.0514 0.0418 0.0549 0.2258
L-DiffRec 0.0614 0.0518 0.0502 0.0409 0.0470 0.1837
FlowCF 0.0686 0.0587 0.0497 0.0431 0.0531 0.2340
EASE 0.0657 0.0552 0.0710 0.0567 0.0522 0.2264
GF-CF 0.0697 0.0571 0.0710 0.0584 0.0487 0.2076
PGSP 0.0710 0.0583 0.0712 0.0587 OOM OOM
FPSR 0.0703 0.0584 0.0715 0.0588 0.0493 0.2104
Turbo-CF 0.0693 0.0574 0.0730 0.0611 0.0534 0.2299
HiGSP OOM OOM OOM OOM OOM OOM
Mem-GF (Ours) 0.0705 0.0586 0.0753 0.0620 0.0566 0.2402

V-E Recommendation Accuracy (RQ4)

Table VI summarizes the recommendation accuracy of Mem-GF and various competitors across three distinct datasets. The main findings are as follows:

  1. (i)

    Despite its exceptional computational efficiency, Mem-GF achieves the highest accuracy for all datasets (except the recall on Yelp), underscoring its robustness across diverse data characteristics.

  2. (ii)

    In particular, the gains of Mem-GF over training-dependent models are indeed substantial across all datasets.

  3. (iii)

    Mem-GF consistently exhibits gains over Turbo-CF, another polynomial GF approach. On the Amazon-book dataset, Turbo-CF was restricted only to the first-order linear filter to avoid OOM issues (see Table III), whereas Mem-GF can freely use higher-order filters without incurring any resource scarcity issues on a single device. This design can improve accuracy empirically, not because memory reduction itself guarantees better predictions, but because expressive polynomial filters can be selected within the user-specific Krylov subspace without being constrained by the storage cost of full-size graph filters.

TABLE VII: Recall performance across different values of KK and polynomial filters. Here, Mem-GF-k denotes an ablation model that omits the Krylov basis construction in Mem-GF. Cases where KNK\leq N are marked in gray.
Yelp
KK Mem-GF-1 Mem-GF-2 Mem-GF-3 Mem-GF-5
1 0.0035 0.0035 0.0035 0.0035
2 0.0684 0.0684 0.0684 0.0684
3 0.0684 0.0692 0.0683 0.0695
4 0.0684 0.0692 0.0683 0.0695
5 0.0684 0.0692 0.0683 0.0701
6 0.0684 0.0692 0.0683 0.0703
Mem-GF-k 0.0684 0.0692 0.0683 0.0703
Amazon-book
KK Mem-GF-1 Mem-GF-2 Mem-GF-3 Mem-GF-5
1 0.0011 0.0011 0.0011 0.0011
2 0.0710 0.0628 0.0710 0.0699
3 0.0710 0.0754 0.0707 0.0710
4 0.0710 0.0754 0.0707 0.0745
5 0.0710 0.0754 0.0707 0.0739
6 0.0710 0.0754 0.0707 0.0739
Mem-GF-k 0.0710 0.0754 0.0707 0.0739
ML-20M
KK Mem-GF-1 Mem-GF-2 Mem-GF-3 Mem-GF-5
1 0.0248 0.0248 0.0248 0.0248
2 0.0554 0.0554 0.0554 0.0554
3 0.0554 0.0566 0.0552 0.0566
4 0.0554 0.0566 0.0516 0.0565
5 0.0554 0.0566 0.0516 0.0564
6 0.0554 0.0566 0.0516 0.0564
Mem-GF-k 0.0554 0.0566 0.0516 0.0564

V-F Empirical Validation of Theorem IV.1 (RQ5)

Table VII summarizes our experiments across all datasets, where we varied the number of Krylov basis vectors (KK) for polynomial filters of different maximum polynomial orders (NN). Here, Mem-GF-k denotes an ablation model that omits the Krylov basis construction in Mem-GF. It is observed that, for an NN-th order polynomial filter, performance remains consistent with that of Mem-GF-k when KN+1K\geq N+1. This finding confirms that setting K=N+1K=N+1 produces no approximation error under the theorem condition. In contrast, the gray entries with KNK\leq N illustrate regimes where the guarantee does not apply and performance can become unstable.

V-G Analysis of Polynomial Graph Filters (RQ6)

Table VIII shows that the optimal polynomial filter varies depending on datasets; for instance, the fifth-order filter (Mem-GF-5) achieves the highest recall on Yelp, whereas the second-order filter (Mem-GF-2) performs best on Amazon-book and ML-20M. As expected, higher-order filters incur longer runtime. Nonetheless, since the overall computation time remains manageable and the search space is restricted to four predefined filters, it is feasible to perform a validation set-based search. Increasing the polynomial order NN can improve filter expressiveness, but a sufficiently large Krylov subspace size KK is required; in particular, the exactness guarantee holds under exact arithmetic when N<KN<K, whereas regimes with KNK\leq N may become unstable.

TABLE VIII: Runtime for preprocessing and recall according to four different types of polynomial graph filters in Mem-GF.
Yelp Amazon-book ML-20M
Runtime Recall Runtime Recall Runtime Recall
Mem-GF-1 1.5s 0.0687 6.6s 0.0730 102.2s 0.0554
Mem-GF-2 2.9s 0.0692 11.4s 0.0753 156.6s 0.0566
Mem-GF-3 4.3s 0.0683 16.4s 0.0693 209.1s 0.0516
Mem-GF-5 6.5s 0.0705 26.4s 0.0739 314.1s 0.0564
Refer to caption
Figure 7: The effect of two hyperparameters γ\gamma and ss, on the accuracy of Mem-GF.

V-H Sensitivity Analysis (RQ7)

Fig. 7 visualizes the impact of two key hyperparameters in Mem-GF: the normalization parameter γ\gamma in (6) and the adjustment parameter ss in (9). First, while Yelp attains its highest accuracy at γ=0.55\gamma=0.55, both Amazon-book and ML-20M achieve optimal performance at γ=0.5\gamma=0.5. Notably, using the conventional choice of γ=0.5\gamma=0.5 still yields sufficiently robust accuracy across all datasets. Second, the optimal values of ss are 0.90.9 on both Yelp and Amazon-book, and 1.01.0 on ML-20M, indicating that setting closely to s=1s=1 generally yields robust performance. In addition, we observe a notable performance drop on Yelp and ML-20M when s<0.9s<0.9 and s<1.0s<1.0, respectively, whereas, outside these ranges, performance remains relatively stable. This aligns with the findings in [31], highlighting the importance of selecting proper ranges of ss to ensure optimal recommendation accuracy.

V-I Practical Deployment and Limitations

Mem-GF can be integrated into real-world recommender systems as a training-free GF-based CF module for candidate generation or scoring, and its output scores can be passed to subsequent ranking or re-ranking modules. Its user-specific Krylov subspace construction and score computation are naturally partitionable across workers or GPUs, since they can be performed independently for different users once the adjusted interaction matrix is available. Fully distributed system-level optimization, including distributed data management and cache-refresh strategies, is left as future work. The main limitations are as follows. First, Mem-GF is most directly suited to item similarity graphs built from user–item interactions; social networks, knowledge graphs, or other heterogeneous graph structures may require additional graph filter designs. Second, extremely sparse or noisy user signals, including cold-start-like cases with very few observed interactions, may degrade recommendation quality because the resulting user-specific Krylov subspace may contain insufficient or noisy preference information. In addition, very high polynomial orders may increase numerical sensitivity or computational overhead when KK is not sufficiently large. These issues can be mitigated through validation-based tuning of KK, NN, γ\gamma, and ss.

VI Conclusions and Outlook

In this paper, we introduced Mem-GF, a training-free and memory-efficient GF method for CF. By operating within user-specific Krylov subspaces, Mem-GF eliminates the need to construct or store the full item similarity matrix, achieving substantial reductions in memory usage and preprocessing/inference time. In addition to scalability, Mem-GF supports expressive polynomial filters with low overhead. We further provided the condition under which Krylov filtering incurs no approximation error. Extensive experiments across three benchmark datasets demonstrated that Mem-GF achieves up to 574% memory reduction, 438% preprocessing speedup, 2620% inference speedup, and state-of-the-art recommendation accuracy. Future work includes extending Mem-GF to heterogeneous graph structures and distributed recommendation systems.

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF), a South Korea grant funded by the Korea government (MSIT) (RS-2021-NR059723), and by SMEs Technology Innovation Development Program through the Technology Innovation and Promotion Agency (TIPA), funded by the Ministry of SMEs and Startups (RS-2024-00511332).

References

  • [1] F. Borisyuk, S. He, Y. Ouyang, M. Ramezani, P. Du, X. Hou, C. Jiang, N. Pasumarthy, P. Bannur, B. Tiwana, et al. (2024-Aug.) LiGNN: graph neural networks at linkedin. In Proc. of the 30th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining (KDD 2024), Barcelona, Spain, pp. 4793–4803. Cited by: §I-A.
  • [2] D.-K. Chae, J.-S. Kang, S.-W. Kim, and J.-T. Lee (2018-Oct.) CFGAN: a generic collaborative filtering framework based on generative adversarial networks. In Proc. 27th ACM Int. Conf. Inf. Knowl. Manag. (CIKM 2018), Torino, Italy, pp. 137–146. Cited by: §V-A.
  • [3] P. Covington, J. Adams, and E. Sargin (2016-Sep.) Deep neural networks for YouTube recommendations. In Proc. of the 10th ACM Conf. on Recommender Systems (RecSys 2016), Boston, MA, USA, pp. 191–198. Cited by: §I-A.
  • [4] H.-M. Fu, P. Poirson, K. S. Lee, and C. Wang (2022) Revisiting neighborhood-based link prediction for collaborative filtering. In Proc. ACM Web Conf. (WWW 2022), Lyon, France, pp. 1009–1018. Cited by: §V-A.
  • [5] M. Fu, H. Qu, Z. Yi, L. Lu, and Y. Liu (2019-Mar.) A novel deep learning-based collaborative filtering model for recommendation system. IEEE Trans. Cybern. 49, pp. 1084–1096. Cited by: §I-A.
  • [6] C. A. Gomez-Uribe and N. Hunt (2015-Dec.) The Netflix recommender system: algorithms, business value, and innovation. ACM Trans. Manage. Inf. Syst. 6 (4), pp. 1–19. Cited by: §I-A.
  • [7] Y. Gong, Z. Jiang, Y. Feng, B. Hu, K. Zhao, Q. Liu, and W. Ou (2020-Oct.) EdgeRec: recommender system on edge in mobile Taobao. In Proc. of the 29th ACM Int. Conf. on Information & Knowledge Management (CIKM 2020), Virtual Event, pp. 2477–2484. Cited by: §I-A.
  • [8] R. He and J. McAuley (2016-Apr.) Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proc. of the 25th Int. Conf. on World Wide Web (WWW 2016), Montréal, QC, Canada, pp. 507–517. Cited by: §I-A.
  • [9] R. He and J. McAuley (2016-Feb.) VBPR: visual bayesian personalized ranking from implicit feedback. In Proc. of the 30th AAAI Conf. on Artificial Intelligence (AAAI 2016), Phoenix, AZ, USA, pp. 144–150. Cited by: §II, §V-A, §V-A.
  • [10] X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, and M. Wang (2020-Jul.) LightGCN: simplifying and powering graph convolution network for recommendation. In Proc. of the 43rd Int. ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR 2020), Virtual Event, pp. 639–648. Cited by: §I-A, §I-A, §II, §V-A, §V-A, §V-A.
  • [11] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua (2017-Apr.) Neural collaborative filtering. In Proc. of the 26th Int. Conf. on World Wide Web (WWW 2017), Perth, Australia, pp. 173–182. Cited by: §I-A, §II, §V-A, §V-A, §V-A.
  • [12] S. Hong, J. Choi, Y.-C. Lee, S. Kumar, and N. Park (2024-Aug.) SVD-AE: simple autoencoders for collaborative filtering. In Proc. of the 33rd Int. Joint Conf. on Artificial Intelligence (IJCAI 2024), Jeju, Republic of Korea. Cited by: §V-A, §V-A.
  • [13] K. Ito and K. Xiong (2000-05) Gaussian filters for nonlinear filtering problems. IEEE Trans. Autom. Control 45 (5), pp. 910–927. Cited by: §I-B, §IV-B3.
  • [14] W. Jiang, Z. He, S. Zhang, T. B. Preußer, K. Zeng, L. Feng, J. Zhang, T. Liu, Y. Li, J. Zhou, et al. (2021-Apr.) MicroRec: efficient recommendation inference by hardware and data structure solutions. In Proc. of the 4th Conf. on Machine Learning and Systems (MLSys 2021), Virtual Event, pp. 845–859. Cited by: §I-B.
  • [15] C. Kim, Y. Choi, J.-D. Park, and W.-Y. Shin (2025-Apr.) Leveraging member-group relations via multi-view graph filtering for effective group recommendation. In Proc. of the 34th Int. Conf. on World Wide Web (WWW 2025), Sydney, NSW, Australia, pp. 1077–1081. Cited by: §I-A, §I-A, §IV-B1, §IV-B3, §IV-B3, §V-A.
  • [16] C. Kim, J. Sung, Y. Han, and J. Lee (2025) Graph spectral filtering with chebyshev interpolation for recommendation. In Proc. 48th Int. ACM SIGIR Conf. Res. Dev. Inf. Retr. (SIGIR 2025), pp. 1964–1974. Cited by: §II.
  • [17] D. P. Kingma and J. Ba (2015) Adam: A method for stochastic optimization. In Proc. 3rd Int. Conf. Learn. Represent. (ICLR 2015), San Diego, CA, pp. 1–15. Cited by: §V-A.
  • [18] T. N. Kipf and M. Welling (2017-Apr.) Semi-supervised classification with graph convolutional networks. In Proc. of the 5th Int. Conf. on Learning Representations (ICLR 2017), Toulon, France, pp. 1–14. Cited by: §I-A, §I-A.
  • [19] C. Lanczos (1950-Oct.) An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Natl. Bur. Stand. 45 (4), pp. 255–282. Cited by: §IV-B2, §IV-B.
  • [20] J. Lee, S. Kang, W.-Y. Shin, J. Choi, N. Park, and D. Lee (2024) Graph signal processing for cross-domain recommendation. arXiv preprint arXiv:2407.12374. Cited by: §I-A.
  • [21] M. Li, X. Guo, Y. Wang, Y. Wang, and Z. Lin (2022-Jul.) G2CN: graph gaussian convolution networks with concentrated graph filters. In Proc. of the 39th Int. Conf. on Machine Learning (ICML 2022), Baltimore, MD, USA, pp. 12782–12796. Cited by: §I-B, §IV-B3.
  • [22] D. Lian, H. Wang, Z. Liu, J. Lian, E. Chen, and X. Xie (2020-Apr.) LightRec: a memory and search-efficient recommender system. In Proc. of the 29th Int. Conf. on World Wide Web (WWW 2020), Taipei, Taiwan, pp. 695–705. Cited by: §I-B.
  • [23] D. Liang, R. G. Krishnan, M. D. Hoffman, and T. Jebara (2018-Apr.) Variational autoencoders for collaborative filtering. In Proc. of the 2018 Int. World Wide Web Conf. (WWW 2018), Lyon, France, pp. 689–698. Cited by: §I-A, §II, §V-A.
  • [24] R. Liao, Z. Zhao, R. Urtasun, and R. S. Zemel (2019-05) LanczosNet: multi-scale deep graph convolutional networks. Int. Conf. Learn. Represent. (ICLR). Cited by: §II, §V-A.
  • [25] G. Linden, B. Smith, and J. York (2003-Feb.) Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Comput. 7 (1), pp. 76–80. Cited by: §I-A.
  • [26] C. Liu, Y. Zhang, J. Wang, R. Ying, and J. Caverlee (2025) Flow matching for collaborative filtering. In Proc. 31st ACM SIGKDD Conf. Knowl. Discov. Data Min. (KDD 2025), Toronto, Canada, pp. 1765–1775. Cited by: §V-A.
  • [27] J. Liu, D. Li, H. Gu, T. Lu, P. Zhang, L. Shang, and N. Gu (2023-Apr.) Personalized graph signal processing for collaborative filtering. In Proc. of the 32nd Int. Conf. on World Wide Web (WWW 2023), Austin, TX, USA, pp. 1264–1272. Cited by: §I-A, §I-A, §II, §IV-A, §IV-A, §V-A, §V-A, §V-A, Remark.
  • [28] K. Mao, J. Zhu, J. Wang, Q. Dai, Z. Dong, X. Xiao, and X. He (2021-Nov.) SimpleX: a simple and strong baseline for collaborative filtering. In Proc. of the 30th ACM Int. Conf. on Information and Knowledge Management (CIKM 2021), Virtual Event, pp. 1243–1252. Cited by: §V-A.
  • [29] A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura, and P. Vandergheynst (2018-05) Graph signal processing: overview, challenges, and applications. Proc. IEEE 106 (5), pp. 808–828. Cited by: Definition III.1, §IV-A.
  • [30] J.-D. Park, S. Li, X. Cao, and W.-Y. Shin (2023-Aug.) Criteria tell you more than ratings: criteria preference-aware light graph convolution for effective multi-criteria recommendation. In Proc. of the 29th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining (KDD 2023), Long Beach, CA, USA, pp. 1356–1365. Cited by: §I-A.
  • [31] J.-D. Park, Y.-M. Shin, and W.-Y. Shin (2024-Jul.) Turbo-CF: matrix decomposition-free graph filtering for fast recommendation. In Proc. of the 47th Int. ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR 2024), Washington, DC, USA, pp. 2672–2676. Cited by: §I-A, §I-A, §II, 3rd item, §IV-A, §IV-A, §IV-A, §IV-A, §IV-B1, §IV-B3, §IV-B3, §IV-B3, §V-A, §V-A, §V-A, §V-H.
  • [32] J.-D. Park, J. Yoo, and W.-Y. Shin (2025-Apr.) Criteria-aware graph filtering: extremely fast yet accurate multi-criteria recommendation. In Proc. of the 34th Int. Conf. on World Wide Web (WWW 2025), Sydney, NSW, Australia, pp. 4482–4493. Cited by: §I-A, §IV-B1, §IV-B3, Remark.
  • [33] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. (2019) PyTorch: an imperative style, high-performance deep learning library. In Proc. 33rd Conf. Neural Inf. Process. Syst. (NeurIPS 2019), Vancouver, BC, Canada. Cited by: §V-A.
  • [34] S. Peng, X. Liu, K. Sugiyama, and T. Mine (2024) How powerful is graph filtering for recommendation. In Proc. 30th ACM SIGKDD Conf. Knowl. Discov. Data Min. (KDD 2024), Barcelona, Spain, pp. 2388–2399. Cited by: §II.
  • [35] S. Peng, K. Sugiyama, and T. Mine (2024-Jan.) Less is more: removing redundancy of graph convolutional networks for recommendation. ACM Trans. Inf. Syst. 42 (3), pp. 1–26. Cited by: §V-A.
  • [36] Y. Qin, W. Ju, Y. Gu, Z. Qiao, Z. Xiao, and M. Zhang (2025-Jun.) PolyCF: towards optimal spectral graph filters for collaborative filtering. ACM Trans. Inf. Syst. 43 (4), pp. 1–28. Cited by: §I-A, §IV-A, §IV-A, §IV-B3, §V-A.
  • [37] R. Ramakrishna, H.-T. Wai, and A. Scaglione (2020-Nov.) A user guide to low-pass graph signal processing and its applications: tools and applications. IEEE Signal Process. Mag. 37 (6), pp. 74–85. Cited by: Definition III.3.
  • [38] N. Rao, H.-F. Yu, P. K. Ravikumar, and I. S. Dhillon (2015-Dec.) Collaborative filtering with graph information: consistency and scalable methods. In Proc. 28th Adv. Neural Inf. Process. Syst. (NIPS 2015), Montreal, Canada, pp. 2107–2115. Cited by: §II.
  • [39] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme (2009-Jun.) BPR: Bayesian personalized ranking from implicit feedback. In Proc. of the 25th Conf. on Uncertainty in Artificial Intelligence (UAI 2009), Montreal, QC, Canada, pp. 452–461. Cited by: §V-A.
  • [40] Y.-S. Roh, J.-Y. Kim, J.-D. Park, and W.-Y. Shin (2026-Jun.) Training-free adjustable polynomial graph filtering for ultra-fast multimodal recommendation. Engineering Applications of Artificial Intelligence 173, pp. 1–14. Cited by: §I-A.
  • [41] Y. Saad (2003) Iterative methods for sparse linear systems. SIAM. Cited by: item 2.
  • [42] Y. Saad (2011) Numerical methods for large eigenvalue problems: revised edition. Vol. 66, SIAM. Cited by: §I-B, Definition III.4.
  • [43] Y. Shen, Y. Wu, Y. Zhang, C. Shan, J. Zhang, B. K. Letaief, and D. Li (2021-Nov.) How powerful is graph convolution for recommendation?. In Proc. of the 30th ACM Int. Conf. on Information and Knowledge Management (CIKM 2021), Gold Coast, Queensland, Australia, pp. 1619–1629. Cited by: §I-A, §I-A, §II, §III-A, §III-A, Definition III.1, Definition III.2, Definition III.3, §IV-A, §IV-A, §IV-A, §IV-A, §IV-B3, §V-A, §V-A, §V-A.
  • [44] I. Shenbin, A. Alekseev, E. Tutubalina, V. Malykh, and S. I. Nikolenko (2020-Feb.) RecVAE: a new variational autoencoder for top-N recommendations with implicit feedback. In Proc. of the 13th ACM Int. Conf. on Web Search and Data Mining (WSDM 2020), Houston, TX, USA, pp. 528–536. Cited by: §II, §V-A.
  • [45] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst (2013-05) The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. 30 (3), pp. 83–98. Cited by: §III-A, Definition III.1, Definition III.2.
  • [46] H. Steck (2019-05) Embarrassingly shallow autoencoders for sparse data. In Proc. of the 28th Int. Conf. on World Wide Web (WWW 2019), San Francisco, CA, USA, pp. 3251–3257. Cited by: §I-A, §II, §V-A, §V-A.
  • [47] L. N. Trefethen and D. B. III (1997) Numerical linear algebra. SIAM. Cited by: §I-B, Definition III.4.
  • [48] R. van den Berg, T. N. Kipf, and M. Welling (2018-Aug.) Graph convolutional matrix completion. In Proc. KDD’18 Deep Learning Day, London, United Kingdom. Cited by: §II, §V-A, §V-A.
  • [49] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, et al. (2020-Feb.) SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 17 (3), pp. 261–272. Cited by: §V-A.
  • [50] H.-T. Wai, S. Segarra, A. E. Ozdaglar, A. Scaglione, and A. Jadbabaie (2020-Jan.) Blind community detection from low-rank excitations of a graph filter. IEEE Trans. Signal Process. 68, pp. 436–451. Cited by: Definition III.3.
  • [51] W. Wang, Y. Xu, F. Feng, X. Lin, X. He, and T.-S. Chua (2023-Jul.) Diffusion recommender model. In Proc. 46th Int. ACM SIGIR Conf. Res. Dev. Inf. Retr. (SIGIR 2023), Taipei, Taiwan, pp. 832–841. Cited by: §V-A.
  • [52] X. Wang, X. He, M. Wang, F. Feng, and T.-S. Chua (2019-Jul.) Neural graph collaborative filtering. In Proc. of the 42nd Int. ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR 2019), Paris, France, pp. 165–174. Cited by: §I-A, §II, §V-A, §V-A, §V-A.
  • [53] X. Wang, H. Jin, A. Zhang, X. He, T. Xu, and T.-S. Chua (2020-Jul.) Disentangled graph collaborative filtering. In Proc. of the 43rd Int. ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR 2020), Virtual Event, pp. 1001–1010. Cited by: §V-A, §V-A, §V-A.
  • [54] T. Wei, J. Ma, and T. W. S. Chow (2023-Apr.) Fine-tuning partition-aware item similarities for efficient and scalable recommendation. In Proc. of the 32nd Int. Conf. on World Wide Web (WWW 2023), Austin, TX, USA, pp. 823–832. Cited by: §I-A, §V-A, §V-A.
  • [55] J. Wu, X. Wang, F. Feng, X. He, L. Chen, J. Lian, and X. Xie (2021-Jul.) Self-supervised graph learning for recommendation. In Proc. of the 44th Int. ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR 2021), Virtual Event, pp. 726–735. Cited by: §II, §V-A.
  • [56] S. Wu, Y. Tang, Y. Zhu, L. Wang, X. Xie, and T. Tan (2019-Jan.) Session-based recommendation with graph neural networks. In Proc. of the 33rd AAAI Conf. on Artificial Intelligence (AAAI 2019), Honolulu, HI, USA, pp. 346–353. Cited by: §I-A.
  • [57] Y. Wu, C. DuBois, A. X. Zheng, and M. Ester (2016) Collaborative denoising auto-encoders for top-N recommender systems. In Proc. 9th ACM Int. Conf. Web Search Data Mining (WSDM 2016), pp. 153–162. Cited by: §II.
  • [58] J. Xia, D. Li, H. Gu, J. Liu, T. Lu, and N. Gu (2022-Apr.) FIRE: fast incremental recommendation with graph signal processing. In Proc. of the 31st Int. Conf. on World Wide Web (WWW 2022), Lyon, France, pp. 1808–1819. Cited by: §I-A, §I-A, Definition III.1, Definition III.2, §IV-A, §IV-A.
  • [59] J. Xia, D. Li, H. Gu, T. Lu, P. Zhang, L. Shang, and N. Gu (2024-05) Hierarchical graph signal processing for collaborative filtering. In Proc. of the 33rd Int. Conf. on World Wide Web (WWW 2024), Singapore, pp. 3229–3240. Cited by: §I-A, §I-A, §II, §IV-A, §IV-A, §V-A, §V-A, §V-A.
  • [60] L. Xia, C. Huang, J. Shi, and Y. Xu (2023-Apr.) Graph-less collaborative filtering. In Proc. of the 32nd Int. Conf. on World Wide Web (WWW 2023), Austin, TX, USA, pp. 17–27. Cited by: §I-A.
  • [61] H.-J. Xue, X. Dai, J. Zhang, S. Huang, and J. Chen (2017) Deep matrix factorization models for recommender systems. In Proc. 26th Int. Joint Conf. Artif. Intell. (IJCAI 2017), Melbourne, Australia, pp. 3203–3209. Cited by: §II.
  • [62] J.-H. Yang, C.-M. Chen, C.-J. Wang, and M.-F. Tsai (2018) HOP-rec: high-order proximity for implicit recommendation. In Proc. 12th ACM Conf. Recommender Syst. (RecSys), Vancouver, BC, Canada, pp. 140–144. Cited by: §II.
  • [63] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec (2018-Aug.) Graph convolutional neural networks for web-scale recommender systems. In Proc. of the 24th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining (KDD 2018), London, United Kingdom, pp. 974–983. Cited by: §II, §V-A, §V-A.
  • [64] S. Zhang, Y. Liu, Y. Sun, and N. Shah (2022-Apr.) Graph-less neural networks: teaching old mlps new tricks via distillation. In Proc. of the 10th Int. Conf. on Learning Representations (ICLR 2022), Virtual Event. Cited by: §I-A.
  • [65] L. Zheng, C.-T. Lu, F. Jiang, J. Zhang, and P. S. Yu (2018-Sep.) Spectral collaborative filtering. In Proc. of the 12th ACM Conf. on Recommender Systems (RecSys 2018), Vancouver, BC, Canada, pp. 311–319. Cited by: §II, §V-A, §V-A.
  • [66] Y. Zhu and Z. Chen (2022) Mutually-regularized dual collaborative variational auto-encoder for recommendation systems. In WWW 2022, Lyon, France, pp. 2379–2387. Cited by: §II.