Memory Is No Longer a Bottleneck: Memory-Efficient Graph Filtering for Scalable Collaborative Filtering

Jin-Duk Park and Won-Yong Shin, Senior Member, IEEE J.-D. Park and W.-Y. Shin are with the School of Mathematics and Computing (Computational Science and Engineering), Yonsei University, Seoul 03722, Republic of Korea.
E-mail: {jindeok6, wy.shin}@yonsei.ac.kr (Corresponding author: Won-Yong Shin.)

Abstract

Graph convolutional networks (GCNs) have demonstrated significant success in capturing complex user–item relationships for collaborative filtering (CF). However, due to their reliance on extensive model training, training-free graph filtering (GF)-based CF methods have emerged as a promising alternative, offering computational efficiency by smoothing graph signals via matrix operations. In particular, polynomial GF-based approaches demonstrate improved accuracy through their ability to design more expressive and flexible filtering functions. Despite these advantages, existing GF methods suffer from a critical memory bottleneck: they necessitate storing the full item similarity graph, incurring prohibitive memory costs for large-scale datasets, which limits their practical applicability. To tackle this challenge, we propose Mem-GF (Memory-efficient GF), a new GF-based CF method that departs from conventional designs by principally leveraging the structure of Krylov subspaces as a core mechanism for approximating polynomial graph filters without explicitly storing the item similarity graph. We theoretically analyze the minimum Krylov subspace size that guarantees lossless approximation. Through extensive experiments, we demonstrate that Mem-GF achieves up to 5.74 $\times$ lower memory usage and 4.38 $\times$ speedup in runtime, while consistently exceeding the recommendation accuracy of state-of-the-art GF and GCN-based methods. Mem-GF robustly scales to datasets with tens of millions of interactions, establishing itself as a practically viable and theoretically grounded solution for efficient CF. The source code of Mem-GF is available at https://github.com/jindeok/Mem-GF.

I Introduction

I-A Background and Motivation

Recommender systems have become a cornerstone component of modern online platforms [25, 3, 6, 7]. By leveraging the information of historical user–item interactions, collaborative filtering (CF) has proven one of the most effective recommendation techniques in uncovering latent user interests and item affinities, facilitating personalization and relevance for positive user experiences [11, 23, 5, 10].

On one hand, graph convolutional networks (GCNs) [18, 10, 52] have emerged as a state-of-the-art CF solution. Thanks to their intrinsic ability to model the high-order connectivity between users and items, GCN-based approaches have achieved high accuracy in various recommendation scenarios [10, 52, 30, 56, 15]. However, GCNs inherently suffer from scalability issues due to their recursive message-passing nature, which may limit their applicability to large-scale systems [1, 60]. As a result, the training time required for GCNs is often significantly longer compared to other architectures such as multi-layer perceptrons (MLPs) that are widely used for CF [60, 64]. This increased computational demand can hinder the efficiency of real-time recommendation systems, making it crucial to explore alternative methods that balance accuracy and scalability.

On the other hand, from the spectral viewpoint, GCNs can be regarded as approximate and learnable graph filters via parameter optimization [18, 43, 27]. To mitigate the extensive training time associated with GCNs, recent research on CF has increasingly focused on graph filtering (GF)-based CF methods that leverage predefined graph filters instead of learnable ones [43, 31, 59, 58, 27]. Due to their training-free nature, GF has been successfully applied beyond standard training-based CF to diverse recommendation scenarios, including sequential [58], group [15], cross-domain [20], multi-modal [40], and multi-criteria [32] recommendations, demonstrating its broad applicability and effectiveness. By using the item similarity graph, these GF-based CF methods transform each user’s interaction vector (i.e., graph signals) via various forms of graph filters, such as linear/ideal low-pass filters (LPFs) or polynomial approximations, to improve computational efficiency of the GF process [43, 58, 27, 31, 59, 36].

Refer to caption — Figure 1: Memory usage (RAM/VRAM) comparison during preprocessing among six representative GF-based CF methods and our method (Mem-GF) on the Amazon-book dataset. Here, ‘OOM’ represents an out-of-memory error.

Despite these merits, many GF-based CF methods [43, 58, 27, 31, 59] still face a critical memory bottleneck when high-accuracy filters are deployed at scale. Fig. 1 showcases the RAM and VRAM (i.e., GPU memory) usage of six representative GF-based approaches (such as EASE [46], GF-CF [43], PGSP [27], Turbo-CF [31], FPSR [54], and HiGSP [59]) on the Amazon-book dataset [8]. In a single-node environment equipped with 48 GB of VRAM and 128 GB of RAM, most of these methods suffer from out-of-memory (OOM) errors, underscoring their limited feasibility for large-scale recommendation scenarios. The core design issue is not merely implementation, but the reliance on a universal (i.e., user-agnostic) graph filter whose dimension is $|\mathcal{I}|\times|\mathcal{I}|$ , where $|\mathcal{I}|$ is the item cardinality, i.e., the number of items. Even when sparse matrix operations, top- $k$ eigenvalue components, or polynomial graph filters reduce computation, the filtering stage still tends to depend on the full-size item similarity matrix. As illustrated in Fig. 2, applying such filters uniformly to all user signals severely limits scalability and can make real-time inference infeasible under memory-constrained environments.

I-B Main Contribution

To address this challenge, we introduce Mem-GF, a memory-efficient GF method for CF that goes beyond straightforward Krylov subspace usage. Instead of treating Krylov methods [42, 47] as a computational shortcut, Mem-GF uses them as the filtering principle itself. For each user, a Krylov subspace captures progressively transformed directions of the user’s interaction vector under repeated applications of the item similarity matrix. This enables polynomial graph filtering without explicitly storing the full item similarity matrix. Importantly, Mem-GF facilitates the design of versatile graph filters through high-order polynomial approximations, exemplified by the fifth-order polynomial that closely approximates Gaussian filters widely used in image and graph domains [13, 21]. Notably, Mem-GF offers significant advantages in both preprocessing and inference efficiency. In particular, its memory-efficient and low-latency inference capability addresses a critical requirement for deploying GF-based recommenders in real-world environments, where tight hardware constraints and real-time responsiveness are essential [14, 22].

Our contributions are summarized as follows:

•

Problem identification: We clarify the universal (i.e., user-agnostic) graph filter design that causes the memory bottleneck in many GF-based CF methods and empirically demonstrate its impracticality on large-scale datasets.
•

Principled Krylov-based solution: We propose a novel GF approach that integrates Krylov subspaces at its core, achieving per-user filter approximation without storing the full item similarity matrix, thereby enhancing filter design flexibility.
•

Theoretical support: We rigorously analyze the condition on the number of Krylov subspace bases for which no approximation error occurs. We also theoretically show that Mem-GF achieves linear space/time complexity to the number of items, users, and interactions.
•

Comprehensive evaluation: We empirically show that Mem-GF delivers 5.74 $\times$ memory savings and 4.38 $\times$ speedup. Despite being primarily designed for improving efficiency, it still achieves state-of-the-art recommendation accuracy across all benchmark datasets.

I-C Organization and Notations

The remainder of this paper is organized as follows. Section II surveys related work. Section III introduces the preliminaries by reviewing GF and Krylov subspaces. Section IV presents the proposed Mem-GF method, including its methodological details, theoretical analysis, and complexity analysis. Section V reports extensive experimental results and analyses evaluating efficiency, scalability, and accuracy. Section VI concludes the paper with a discussion of future research directions.

Table I summarizes the notation that is used in this paper. This notation will be formally defined in the following sections when we introduce our methodology.

TABLE I: Summary of notations.

Notation	Description
$\mathcal{U},\ \mathcal{I}$	Set of users, $\mathcal{U}$ , and set of items, $\mathcal{I}$
$\mathbf{R}$	User–item interaction matrix
$\mathbf{r}_{u}$	$u$ -th row of the interaction matrix $\mathbf{R}$ (graph signal for user $u$ )
$\tilde{\mathbf{R}}$	Degree-normalized interaction matrix: $\tilde{\mathbf{R}}=\mathbf{D}_{\mathcal{U}}^{\gamma}\mathbf{R}\mathbf{D}_{\mathcal{I}}^{1-\gamma}$
$\bar{\mathbf{R}}$	Adjusted interaction matrix: $\bar{\mathbf{R}}=\tilde{\mathbf{R}}^{\circ s}$
$\mathbf{D}_{\mathcal{U}},\ \mathbf{D}_{\mathcal{I}}$	Degree matrices: $\mathbf{D}_{\mathcal{U}}=\mathrm{diag}(\mathbf{R}\mathbf{1}),\ \mathbf{D}_{\mathcal{I}}=\mathrm{diag}(\mathbf{1}^{\top}\mathbf{R})$
$\tilde{\mathbf{P}}$	Item similarity matrix: $\tilde{\mathbf{P}}=\bar{\mathbf{R}}^{\top}\bar{\mathbf{R}}$ for Mem-GF and $\tilde{\mathbf{P}}=\tilde{\mathbf{R}}^{\top}\tilde{\mathbf{R}}$ for the other methods
$\mathbf{L},\ \mathbf{U},\ \boldsymbol{\Lambda}$	Graph Laplacian $\mathbf{L}$ , its eigenvectors $\mathbf{U}$ , and eigenvalues $\boldsymbol{\Lambda}$
$H(\cdot),\ h(\cdot)$	Graph filter $H(\cdot)$ and its frequency response $h(\cdot)$
$\mathbf{s}_{u}$	Predicted scores for user $u$ : $\mathbf{s}_{u}=\\|\mathbf{r}_{u}\\|_{2}\,\mathbf{Q}_{u}\,f(\mathbf{T}_{u})\,\mathbf{e}_{1}$
$\mathcal{K}_{K}(\cdot)$	$K$ -dimensional Krylov subspace
$\mathbf{Q}_{u},\ \mathbf{T}_{u}$	User-specific Krylov basis $\mathbf{Q}_{u}$ and projected tridiagonal matrix $\mathbf{T}_{u}=\mathbf{Q}_{u}^{\top}\tilde{\mathbf{P}}\mathbf{Q}_{u}$ in the Krylov subspace
$K$	Size of Krylov basis
$\gamma$	Hyperparameter for normalizing $\mathbf{R}$
$s$	Hyperparameter for adjusting $\tilde{\mathbf{R}}$ via the Hadamard power
$\mathbf{e}_{1}$	First canonical basis vector in $\mathbb{R}^{K}$

II Related Work

In this section, we review CF methods related to Mem-GF, with emphasis on scalable GF-based CF and Krylov-based graph learning.

MF-based CF methods. MF remains a foundational technique in CF, representing users and items as latent vectors and using their dot product to approximate interactions. Early approaches relied on low-rank decompositions [9], while NeuMF [11] enhanced expressiveness by introducing an MLP-based similarity function. GRMF [38] integrated graph Laplacian regularization, and DMF [61] leveraged deep neural architectures. HOP-rec [62] bridged MF and GCN by explicitly capturing higher-order relationships in user–item graphs.

GCN-based CF methods. Viewing user–item interactions as a bipartite graph enables GCN-based models to exploit high-order connectivity. GC-MC [48] merged GCNs with matrix completion for link prediction. PinSage [63] scaled graph convolution with random walks. SpectralCF [65] used spectral graph structure, NGCF [52] modeled higher-order collaborative signals, LightGCN [10] simplified propagation layers, and SGL [55] adopted contrastive learning for robust representation learning.

Autoencoder-based CF methods. Autoencoder-based approaches reconstruct each user’s interaction vector in a single latent space. CDAE [57] incorporated denoising, Multi-DAE and Multi-VAE [23] improved reconstruction with deep and variational encoders, RecVAE [44] added regularization, and MD-CVAE [66] injected item embeddings into latent user modeling.

GF-based CF methods. Training-free GF-based CF methods avoid iterative model training by filtering user signals on an item similarity graph. GF-CF [43] uses linear and ideal LPFs; EASE [46] can be interpreted from a graph signal processing viewpoint; PGSP [27] augments the similarity graph; HiGSP [59] customizes filters by user clusters; Turbo-CF [31] accelerates inference with predefined polynomial LPFs; SGFCF [34] uses top- $k$ singular values; and ChebyCF [16] employs Chebyshev polynomial filtering. These methods improve scalability through sparse matrix operations, top- $k$ eigenvalue components, polynomial graph filters, or user clustering, but they still commonly rely on a large item similarity graph or a full-size graph filter. In contrast, Mem-GF performs GF in a user-specific Krylov subspace and avoids constructing the full item similarity matrix.

Krylov-based graph learning. LanczosNet [24] also uses Krylov subspaces, but for learnable spectral graph convolution in GNNs. Its goal is representation learning over graph nodes, requiring training and inference over the graph. Mem-GF differs in its purpose and inference setting: it is a training-free CF method that builds per-user Krylov subspaces for the item similarity matrix to reduce memory usage during recommendation. Thus, Krylov subspaces are not used for training-based GNNs, but as the core mechanism for memory-efficient graph filtering in CF.

III Preliminaries

This section presents the preliminaries, providing an overview of the fundamental concepts underlying GF and Krylov subspaces.

III-A Characterization of GF

We begin by presenting the basic concepts of GF (also known as graph signal processing). First, let us consider a weighted, undirected graph $G=(V,E)$ , represented by an adjacency matrix $\mathbf{A}\in\mathbb{R}^{|V|\times|V|}$ . A graph signal assigns a scalar or feature vector to each node; in this paper, we primarily consider a scalar signal represented as $\mathbf{x}=[x_{1},x_{2},\ldots,x_{|V|}]^{T}\in\mathbb{R}^{|V|}$ , where $x_{i}$ indicates the signal strength of node $i$ . The smoothness of $\mathbf{x}$ on $G$ is quantified by the graph quadratic form, a measure based on the graph Laplacian $\mathbf{L}=\mathbf{D}-\mathbf{A}$ ,¹¹1Alternatively, one may use the normalized Laplacian $\mathbf{L}=\mathbf{I}-\tilde{\mathbf{A}}$ , where $\tilde{\mathbf{A}}=\mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$ . where $\mathbf{D}=\text{diag}(\mathbf{A}\mathbf{1})$ is the degree matrix and $\mathbf{1}$ denotes the all-ones vector of any dimension for simplicity. The smoothness measure $S(\mathbf{x})$ is formally given by

S(\mathbf{x})=\sum_{i,j}A_{i,j}(x_{i}-x_{j})^{2}=\mathbf{x}^{T}\mathbf{L}\mathbf{x}.

(1)

The smaller the value of $S(\mathbf{x})$ , the smoother the signal $\mathbf{x}$ on the graph [45, 43]. Next, we define the graph Fourier transform (GFT) of a signal $\mathbf{x}$ as $\hat{\mathbf{x}}\;=\;\mathbf{U}^{T}\,\mathbf{x}$ , where $\mathbf{U}\in\mathbb{R}^{|V|\times|V|}$ comprises the eigenvectors of $\mathbf{L}$ . Specifically, from the eigen-decomposition $\mathbf{L}\;=\;\mathbf{U}\,\boldsymbol{\Lambda}\,\mathbf{U}^{T},$ let $\boldsymbol{\Lambda}=\mathrm{diag}(\lambda_{1},\dots,\lambda_{|V|})$ with ordered eigenvalues $\lambda_{1}\leq\dots\leq\lambda_{|V|}$ . Because the GFT is an orthogonal linear transform, the inverse GFT is $\mathbf{x}\;=\;\mathbf{U}\,\hat{\mathbf{x}}.$ Finally, we formalize the notions of a graph filter and graph convolution:

Definition III.1 (Graph filter [45, 43, 58, 29]).

Given a graph Laplacian $\mathbf{L}$ , a graph filter $H(\mathbf{L})\in\mathbb{R}^{|V|\times|V|}$ is defined as

H(\mathbf{L})\;=\;\mathbf{U}\,\mathrm{diag}\bigl(h(\lambda_{1}),\dots,h(\lambda_{|V|})\bigr)\,\mathbf{U}^{T},

(2)

where $h:\mathbb{R}_{+}\to\mathbb{R}$ is the frequency response function mapping each non-negative eigenvalue $\{\lambda_{1},\dots,\lambda_{|V|}\}$ of $\mathbf{L}$ to $\{h(\lambda_{1}),\dots,h(\lambda_{|V|})\}.$

Definition III.2 (Graph convolution [45, 43, 58]).

For a graph signal $\mathbf{x}$ and a graph filter $H(\mathbf{L})$ , it follows that

H(\mathbf{L})\,\mathbf{x}\;=\;\mathbf{U}\,\mathrm{diag}\bigl(h(\lambda_{1}),\dots,h(\lambda_{|V|})\bigr)\,\mathbf{U}^{T}\,\mathbf{x},

(3)

which first applies the GFT to $\mathbf{x}$ , transforms the resulting spectrum by $h(\cdot)$ , and finally performs the inverse GFT.

In signal processing, signals are typically characterized by their smoothness and low-frequency components, whereas noise is usually non-smooth and dominates at high frequencies [43]. In this context, a significant category of filters is LPFs, which enhance the smoothness of graph signals, thereby aiding noise reduction. We formally define the LPF as follows:

Definition III.3.

(LPF) [43, 37, 50]: For $k=1,\cdots,|V|$ and $\lambda_{1}\leq\cdots\leq\lambda_{|V|}$ , the graph filter $H(\mathbf{L})$ is $k$ -low-pass if and only if $\eta_{k}\in[0,1]$ , where

\eta_{k}:=\frac{max\{|h(\lambda_{k+1})|,\cdots,|h(\lambda_{|V|})|\}}{min\{|h(\lambda_{1})|,\cdots,|h(\lambda_{k})|\}}.

(4)

III-B Krylov Subspaces

Krylov subspaces compactly capture the repeated action of a large matrix on a vector. Thus, for a user signal, they approximate the spectral properties of the item similarity matrix using the directions generated from that user, instead of requiring a full matrix decomposition. They are mathematically formalized as follows.

Definition III.4.

(Krylov subspace [42, 47]) Given a matrix $\mathbf{M}\in\mathbb{R}^{d\times d}$ and a vector $\mathbf{r}\in\mathbb{R}^{d}$ , the $K$ -th order Krylov subspace generated by $(\mathbf{M},\mathbf{r})$ is defined as

\displaystyle\mathcal{K}_{K}(\mathbf{M},\mathbf{r})=\text{span}\{\mathbf{r},\mathbf{M}\mathbf{r},\mathbf{M}^{2}\mathbf{r},\ldots,\mathbf{M}^{K-1}\mathbf{r}\},

(5)

where $\mathbf{M}$ is the matrix of interest, $d$ is the dimensionality of the ambient space, and $\mathbf{r}$ is the initial vector from which the subspace is generated. Here, $K$ indicates the order of the subspace and $\text{span}\{\}$ denotes the set of all linear combinations of the given vectors.

IV Methodology

This section begins by outlining the practical challenges associated with GF-based CF methods. We then provide a comprehensive description of the proposed Mem-GF method, along with its theoretical analyses.

IV-A Challenges in GF-Based CF Methods

Conventional training-free GF-based CF methods [43, 58, 27, 31, 59] in canonical recommendation tasks begin with constructing an item similarity graph, where each item is represented as a node and the similarities between items are modeled as edges. Its adjacency matrix is generally expressed as

\displaystyle\tilde{\mathbf{P}}=\tilde{\mathbf{R}}^{\top}\tilde{\mathbf{R}};\tilde{\mathbf{R}}=\mathbf{D}^{\gamma}_{\mathcal{U}}\mathbf{R}\mathbf{D}^{1-\gamma}_{\mathcal{I}},

(6)

where $|\mathcal{U}|$ is the number of users in a dataset; $\mathbf{R}\in\mathbb{R}^{|\mathcal{U}|\times|\mathcal{I}|}$ is the interaction matrix; $\tilde{\mathbf{R}}$ is the normalized interaction matrix; $\mathbf{D}_{\mathcal{U}}=\text{diag}(\mathbf{R}\mathbf{1})$ and $\mathbf{D}_{\mathcal{I}}=\text{diag}(\mathbf{1}^{\top}\mathbf{R})$ ; $\gamma\in[0,1]$ controls the normalization along users/items; and $\tilde{\mathbf{P}}$ is the adjacency matrix representing the item similarity graph. Meanwhile, the standard notion of GF involves performing full eigenvalue decompositions of the Laplacian or adjacency matrices representing the item similarity graph [29], which incurs a prohibitive computational complexity of $O(|\mathcal{I}|^{3})$ . To prevent such computational issues, scalable GF-based CF methods exploit several techniques, including sparse matrix operations [43], top- $k$ eigenvalue components [43, 58], polynomial graph filters [31], and user clustering [59]. Nevertheless, they still depend on the full-size item similarity matrix during filtering. For example, GF-based CF methods [43, 58, 27, 59] typically leverage linear LPFs and/or ideal LPFs extracting only the top- $k$ eigenvalue components instead of fully decomposing the item similarity matrix:

\mathbf{s}_{u}=\bigl(\tilde{\mathbf{P}}+\mu\mathbf{D}^{-1/2}_{U}\bar{\mathbf{U}}\bar{\mathbf{U}}^{T}\mathbf{D}^{-1/2}_{I}\bigr)\mathbf{r}_{u},

(7)

where $\mathbf{s}_{u}\in\mathbb{R}^{|\mathcal{I}|}$ is the predicted preferences for user $u$ ; ${\mathbf{r}}_{u}$ denotes the $u$ -th row of $\mathbf{R}$ , treated as a column graph signal for user $u$ ; $\bar{\mathbf{U}}\in\mathbb{R}^{|\mathcal{I}|\times k}$ is the top- $k$ singular vectors of $\tilde{\mathbf{R}}_{0}$ ; $\tilde{\mathbf{P}}$ is the linear LPF; $\mathbf{D}^{-1/2}_{U}\bar{\mathbf{U}}\bar{\mathbf{U}}^{T}\mathbf{D}^{-1/2}_{I}$ is the ideal LPF of $\tilde{\mathbf{P}}$ ; and $\mu$ is a hyperparameter balancing between the two filters. However, due to their limited expressiveness in designing various filters having different frequency response functions, recent studies proposed to use polynomial graph filters to produce diverse frequency response functions using polynomial functions [31, 43, 36]. As representative work [31], the prediction score for user $u$ via polynomial GF is formulated as

\mathbf{s}_{u}=\sum_{n=1}^{N}{a_{n}}\tilde{\mathbf{P}}^{n}\mathbf{r}_{u},

(8)

where $\{a_{n}\}_{n=1}^{N}$ are the polynomial coefficients; $N$ is the maximum order of the matrix polynomial; and $\sum_{n=1}^{N}a_{n}\tilde{\mathbf{P}}^{n}$ represents the polynomial graph filter on the item similarity graph. It was proven that the polynomial graph filter $\sum_{n=1}^{N}{a_{n}}\tilde{\mathbf{P}}^{n}$ has a frequency response function of $\sum_{n=1}^{N}{a_{n}}(1-\lambda)^{n}$ [31, 43, 36].

While conventional GF-based CF methods have adopted user-specific batch strategies (see (7) and (8)), they still suffer from expensive costs for constructing or applying a universal (i.e., user-agnostic) graph filter across all users based on the large item similarity matrix $\tilde{\mathbf{P}}$ (see Fig. 2). For instance, a simple linear LPF (i.e., $\tilde{\mathbf{P}}$ ) having a frequency response function of $h(\lambda)=1-\lambda$ has a dimension of $(|\mathcal{I}|,|\mathcal{I}|)$ . Moreover, ideal LPFs or higher-order polynomial filters can further increase memory footprint and computation. As a result, devising a memory-efficient polynomial GF technique that can effectively operate under resource-constrained environments becomes a critical research challenge. This raises a natural question: “How can we optimize batch processing for each user to enhance memory efficiency in GF-based recommendations?” To answer this question, we elaborate on the proposed Mem-GF method in the following subsection.

IV-B Proposed Method: Mem-GF

To directly overcome the challenges outlined in Section IV-A, we propose Mem-GF, a memory-efficient CF method that leverages polynomial GF without incurring the prohibitive memory costs of conventional GF-based CF approaches. The core idea is to use the Krylov subspace for each user signal, thereby approximating matrix-vector operations without explicitly forming or storing large matrices such as $\tilde{\mathbf{P}}$ . The Krylov subspace $\mathcal{K}_{K}(\tilde{\mathbf{P}},\mathbf{r}_{u})$ in (5), defined as the span of vectors $\{\mathbf{r}_{u},\tilde{\mathbf{P}}\mathbf{r}_{u},\tilde{\mathbf{P}}^{2}\mathbf{r}_{u},\dots,$ $\tilde{\mathbf{P}}^{K-1}\mathbf{r}_{u}\}$ , captures progressively transformed directions of user $u$ ’s interaction vector under the repeated action of $\tilde{\mathbf{P}}$ . By restricting computations to $\mathcal{K}_{K}(\tilde{\mathbf{P}},\mathbf{r}_{u})$ , we approximate the spectral properties of $\tilde{\mathbf{P}}$ in the user-specific Krylov subspace. To construct its orthonormal basis, we utilize the Lanczos algorithm [19], which avoids direct eigenvalue/eigenvector computation. The schematic overview of the Mem-GF method is illustrated in Fig. 3. We summarize the pseudocode of Mem-GF in Algorithm 1.

IV-B1 Methodological Details

Mem-GF first refines the normalized interaction matrix $\tilde{\mathbf{R}}$ by applying the Hadamard power, effectively regularizing the differences among elements in $\tilde{\mathbf{R}}$ . Notably, unlike prior approaches [31, 15, 32] that apply the Hadamard power to the item similarity matrix $\tilde{\mathbf{P}}$ , our method operates directly on $\tilde{\mathbf{R}}$ to avoid the need for explicit storage of $\tilde{\mathbf{P}}$ :

\bar{\mathbf{R}}\;=\;\tilde{\mathbf{R}}^{\circ s},

(9)

where $s$ is a hyperparameter for adjusting $\tilde{\mathbf{R}}$ and $\bar{\mathbf{R}}$ is the adjusted interaction matrix. Thus, the item similarity matrix is defined as $\tilde{\mathbf{P}}=\bar{\mathbf{R}}^{\top}\bar{\mathbf{R}}$ accordingly, but it is never explicitly stored. Next, let us turn to describing the Lanczos algorithm, which is central to Mem-GF. We construct a Krylov subspace $\mathcal{K}_{K}(\tilde{\mathbf{P}},\mathbf{r}_{u})$ with $K$ bases, where $\mathbf{r}_{u}$ represents user $u$ ’s interaction vector.²²2Note that Mem-GF is flexible in using the $u$ -th row of $\bar{\mathbf{R}}$ , denoted by $\bar{\mathbf{r}}_{u}$ , as graph signals. This process generates a user-specific orthonormal Krylov basis $\mathbf{Q}_{u}\in\mathbb{R}^{|\mathcal{I}|\times K}$ and a low-dimensional tridiagonal matrix $\mathbf{T}_{u}\in\mathbb{R}^{K\times K}$ . The matrix $\mathbf{T}_{u}$ represents the projection of $\tilde{\mathbf{P}}$ onto the Krylov subspace:

\mathbf{T}_{u}=\mathbf{Q}_{u}^{\top}\tilde{\mathbf{P}}\mathbf{Q}_{u}.

(10)

More precisely, finite-step Lanczos may involve a residual term outside the constructed subspace; however, the polynomial action on the starting vector $\mathbf{r}_{u}$ is exactly represented for degrees $N<K$ under exact arithmetic, as formalized in Theorem IV.1.

IV-B2 Core Idea of Mem-GF

Algorithm 1 Mem-GF

0: interaction matrix

\mathbf{R}\in\mathbb{R}^{|\mathcal{U}|\times|\mathcal{I}|}

, set of users

\mathcal{U}

, set of items

\mathcal{I}

\{\mathbf{s}_{u}\}_{u\in\mathcal{U}}

\tilde{\mathbf{R}}=\mathbf{D}^{\gamma}_{\mathcal{U}}\mathbf{R}\mathbf{D}^{1-\gamma}_{\mathcal{I}}

\bar{\mathbf{R}}=\tilde{\mathbf{R}}^{\circ s}

3: for each

u\in\mathcal{U}

4: /* Lanczos iteration */

\beta_{0}\leftarrow 0,\ \mathbf{q}_{u,0}\leftarrow\mathbf{0},\ \mathbf{q}_{u,1}\leftarrow\mathbf{r}_{u}/\|\mathbf{r}_{u}\|_{2}

6: for

j\leftarrow 1

K

\mathbf{w}_{u}\leftarrow\bar{\mathbf{R}}^{\top}(\bar{\mathbf{R}}\,\mathbf{q}_{u,j})-\beta_{j-1}\,\mathbf{q}_{u,j-1}

\alpha_{j}\leftarrow\mathbf{q}_{u,j}^{\top}\mathbf{w}_{u}

\mathbf{w}_{u}\leftarrow\mathbf{w}_{u}-\alpha_{j}\,\mathbf{q}_{u,j}

10:

\beta_{j}\leftarrow\|\mathbf{w}_{u}\|_{2}

11: if

\beta_{j}\neq 0

then

12:

\mathbf{q}_{u,j+1}\leftarrow\mathbf{w}_{u}/\beta_{j}

13: end if

14: end for

15: Form

\mathbf{Q}_{u}

with

\mathbf{q}_{u,1},\ldots,\mathbf{q}_{u,K}

and

\mathbf{T}_{u}

with

\{\alpha_{j},\beta_{j}\}

16:

f(\mathbf{T}_{u})\leftarrow\sum_{n=0}^{N}a_{n}\,(\mathbf{T}_{u})^{n}

17:

\mathbf{s}_{u}\leftarrow\|\mathbf{r}_{u}\|_{2}\;\mathbf{Q}_{u}\;f(\mathbf{T}_{u})\;\mathbf{e}_{1}

18: end for

19: return

\{\mathbf{s}_{u}\}_{u\in\mathcal{U}}

The Lanczos algorithm [19] proceeds as follows:

•

Initialization: The algorithm begins with the normalized vector $\mathbf{q}_{u,1}=\mathbf{r}_{u}/\|\mathbf{r}_{u}\|_{2}$ , where $\mathbf{r}_{u}$ is the interaction vector for user $u$ . Additional parameters, including $\beta_{0}=0$ and $\mathbf{q}_{u,0}=\mathbf{0}$ , required for iterations are also initialized.

•

Iteration: At each iteration $j$ , the algorithm computes the projection of $\tilde{\mathbf{P}}$ onto the current basis:

	$\displaystyle\mathbf{w}_{u}$	$\displaystyle=\tilde{\mathbf{P}}\mathbf{q}_{u,j}-\beta_{j-1}\mathbf{q}_{u,j-1},$
	$\displaystyle\alpha_{j}$	$\displaystyle=\mathbf{q}_{u,j}^{\top}\mathbf{w}_{u}.$

The residual $\mathbf{w}_{u}$ is then orthogonalized and normalized:

	$\displaystyle\mathbf{w}_{u}$	$\displaystyle\leftarrow\mathbf{w}_{u}-\alpha_{j}\mathbf{q}_{u,j},$
	$\displaystyle\beta_{j}$	$\displaystyle=\\|\mathbf{w}_{u}\\|_{2},$
	$\displaystyle\mathbf{q}_{u,j+1}$	$\displaystyle=\mathbf{w}_{u}/\beta_{j}\quad\text{(if }\beta_{j}\neq 0\text{)}.$

•

Termination: The iterations continue until $j=K$ or $\beta_{j}=0$ , while constructing the tridiagonal matrix $\mathbf{T}_{u}$ using the coefficients $\{\alpha_{j}\}_{j=1}^{K}$ and $\{\beta_{j}\}_{j=1}^{K-1}$ :

\mathbf{T}_{u}=\begin{bmatrix}\alpha_{1}&\beta_{1}&0&\cdots&0\\ \beta_{1}&\alpha_{2}&\beta_{2}&\ddots&\vdots\\ 0&\beta_{2}&\alpha_{3}&\ddots&0\\ \vdots&\ddots&\ddots&\ddots&\beta_{K-1}\\ 0&\cdots&0&\beta_{K-1}&\alpha_{K}\\ \end{bmatrix},

(11)

where $\mathbf{q}_{u,j}\in\mathbb{R}^{|\mathcal{I}|}$ is the $j$ -th vector in the Lanczos basis for user $u$ ; $\alpha_{j}$ represents the local projection coefficient of $\tilde{\mathbf{P}}\,\mathbf{q}_{u,j}$ onto $\mathbf{q}_{u,j}$ ; and $\beta_{j}$ is the norm of the resulting residual, used to orthogonalize $\mathbf{q}_{u,j+1}$ w.r.t. the previous basis vectors. It is worth noting that, instead of directly storing $\tilde{\mathbf{P}}$ , Mem-GF calculates the matrix-vector product $\tilde{\mathbf{P}}\mathbf{Q}_{u}$ using only the adjusted interaction matrix $\bar{\mathbf{R}}$ as $\tilde{\mathbf{P}}\mathbf{Q}_{u}=\bar{\mathbf{R}}^{\top}(\bar{\mathbf{R}}\mathbf{Q}_{u})$ . Since $\bar{\mathbf{R}}\in\mathbb{R}^{|\mathcal{U}|\times|\mathcal{I}|}$ is much smaller and sparser than the full item similarity matrix $\tilde{\mathbf{P}}=\bar{\mathbf{R}}^{\top}\bar{\mathbf{R}}$ , this reduces the memory requirement from storing a full-size item–item matrix to storing only $\bar{\mathbf{R}}$ and the compact Krylov quantities, making Mem-GF viable for large-scale datasets (refer to Figs. 2 and 3).

To see how Mem-GF computes polynomial graph filters, note that $\mathbf{Q}_{u}$ forms a user-specific Krylov subspace, while $\mathbf{T}_{u}$ is the projected representation of $\tilde{\mathbf{P}}$ in that subspace. By the Lanczos initialization, we have $\mathbf{Q}_{u}^{\top}\mathbf{r}_{u}=\|\mathbf{r}_{u}\|_{2}\mathbf{e}_{1}$ . Therefore, for a polynomial graph filter $f(\tilde{\mathbf{P}})=\sum_{n=0}^{N}a_{n}\tilde{\mathbf{P}}^{n}$ with $N<K$ , the Krylov representation gives

f(\tilde{\mathbf{P}})\mathbf{r}_{u}=\|\mathbf{r}_{u}\|_{2}\,\mathbf{Q}_{u}\left(\sum_{n=0}^{N}a_{n}\mathbf{T}_{u}^{n}\right)\mathbf{e}_{1},

(12)

as formally shown in Theorem IV.1. This relation enables polynomial graph filtering without explicitly forming $\tilde{\mathbf{P}}$ .

Thus, our Mem-GF method is finally formulated as follows:

\mathbf{s}_{u}=\|\mathbf{r}_{u}\|_{2}\mathbf{Q}_{u}f(\mathbf{T}_{u})\mathbf{e}_{1},

(13)

where $f(\mathbf{T}_{u})$ represents the user-specific polynomial graph filter in the Krylov subspace and is expressed as

f(\mathbf{T}_{u})=\sum_{n=0}^{N}a_{n}\mathbf{T}_{u}^{n}.

(14)

This formulation ensures that both the graph filter and the signal are user-specific, enabling personalized and efficient recommendation by allocating memory in a batch-based manner for each user’s operations. By reducing the polynomial graph filter computation into smaller, user-specific batches, our method is capable of further optimizing memory usage, ensuring scalability even with limited hardware resources. We next highlight additional advantages offered by the proposed Mem-GF method.

Remark.

First, the user-specific matrices $\mathbf{Q}_{u}$ and $\mathbf{T}_{u}$ can be pre-computed and cached, enabling near-instantaneous inference through lightweight matrix multiplications. This design facilitates deployment in real-world systems with stringent latency requirements. Second, once the Krylov basis $(\mathbf{Q}_{u},\,\mathbf{T}_{u})$ is constructed through $K$ Lanczos iterations, an arbitrary polynomial graph filter of order $K^{\prime}<K$ can be calculated by utilizing only the first $K^{\prime}$ Krylov basis vectors, without further access to the interaction matrix $\bar{\mathbf{R}}$ . Consequently, validation-based filter selection becomes much cheaper than recomputing each filter independently. Such capability is especially valuable in personalized recommendation scenarios, as optimal polynomial filters can vary substantially across individual users [27, 32].

Next, we are interested in analyzing how many Krylov subspace bases are sufficient to guarantee no approximation error. To address this, we provide a theoretical analysis by establishing the following theorem.

Theorem IV.1 (Zero-Error Guarantees in Krylov Subspaces).

Let $\tilde{\mathbf{P}}\in\mathbb{R}^{|\mathcal{I}|\times|\mathcal{I}|}$ be a symmetric matrix (e.g., an item similarity matrix), and let $\mathbf{r}_{u}\in\mathbb{R}^{|\mathcal{I}|}$ be the signal for user $u$ . Run $K$ iterations of the Lanczos algorithm on the pair $(\tilde{\mathbf{P}},\mathbf{r}_{u})$ , yielding an orthonormal Krylov basis $\mathbf{Q}_{u}\in\mathbb{R}^{|\mathcal{I}|\times K}$ and a tridiagonal projected matrix $\mathbf{T}_{u}=\mathbf{Q}_{u}^{\top}\tilde{\mathbf{P}}\mathbf{Q}_{u}\in\mathbb{R}^{K\times K}$ . Consider a polynomial graph filter $f(\tilde{\mathbf{P}})\;=\;\sum_{n=0}^{N}a_{n}\,\tilde{\mathbf{P}}^{n}$ of degree $N.$ If $N<K$ , then the Lanczos algorithm in Mem-GF incurs no approximation error under exact arithmetic.

Proof.

The Lanczos procedure constructs $\mathbf{Q}_{u}=[\mathbf{q}_{u,1},\ldots,\mathbf{q}_{u,K}]$ with $\mathbf{q}_{u,1}=\mathbf{r}_{u}/\|\mathbf{r}_{u}\|_{2}$ and the projected matrix $\mathbf{T}_{u}=\mathbf{Q}_{u}^{\top}\tilde{\mathbf{P}}\mathbf{Q}_{u}$ . By the Krylov construction, for every $0\leq n\leq N<K$ , the vector $\tilde{\mathbf{P}}^{n}\mathbf{r}_{u}$ belongs to $\mathcal{K}_{K}(\tilde{\mathbf{P}},\mathbf{r}_{u})$ and can be represented in the basis $\mathbf{Q}_{u}$ as

\tilde{\mathbf{P}}^{n}\mathbf{r}_{u}=\|\mathbf{r}_{u}\|_{2}\mathbf{Q}_{u}\mathbf{T}_{u}^{n}\mathbf{e}_{1}.

Therefore,

	$\displaystyle f(\tilde{\mathbf{P}})\mathbf{r}_{u}$	$\displaystyle=\sum_{n=0}^{N}a_{n}\tilde{\mathbf{P}}^{n}\mathbf{r}_{u}$
		$\displaystyle=\\|\mathbf{r}_{u}\\|_{2}\mathbf{Q}_{u}\left(\sum_{n=0}^{N}a_{n}\mathbf{T}_{u}^{n}\right)\mathbf{e}_{1}=\\|\mathbf{r}_{u}\\|_{2}\mathbf{Q}_{u}f(\mathbf{T}_{u})\mathbf{e}_{1}.$

Thus, for all $N<K$ , the equality holds with no approximation error. This completes the proof of Theorem IV.1. ∎

Theorem IV.1 implies that, whenever the polynomial degree $N$ is less than the Krylov subspace size $K$ , Mem-GF exactly reproduces $f(\tilde{\mathbf{P}})\mathbf{r}_{u}$ in the original space under exact arithmetic. Thus, setting $K=N+1$ is sufficient to avoid approximation error in the algebraic Krylov representation. In practice, finite-precision computation or choosing $N\geq K$ may affect this exactness; therefore, we tune $K$ and $N$ on the validation set. Separately, severe sparsity or noise in user signals may degrade empirical recommendation quality rather than invalidate the algebraic guarantee itself. We provide experimental verification of this theoretical claim in Section V-F.

IV-B3 High-Order Polynomial Graph Filter Designs

Conventional polynomial GF-based CF methods [31, 15, 32] restrict polynomial filters up to third-order terms to bypass the problem of memory constraints. However, explicitly constructing and storing $\tilde{\mathbf{P}}$ or computing higher-order powers (e.g., $\tilde{\mathbf{P}}^{3}$ ) still requires memory scaling as $\mathcal{O}(|\mathcal{I}|^{2})$ , which is infeasible for large sets of items. In contrast, Mem-GF overcomes the memory bottleneck issue using the user-specific tridiagonal matrix $\mathbf{T}_{u}$ in (10). Since $K\ll|\mathcal{I}|$ for $\mathbf{T}_{u}\in\mathbb{R}^{K\times K}$ , it becomes computationally trivial to raise $\mathbf{T}_{u}$ to higher powers, thereby allowing the order of polynomial filters to be significantly higher.

This flexibility allows Mem-GF to approximate a broad spectrum of target frequency responses. Specifically, we consider employing graph filters having more complex frequency responses. This leads us to propose a new polynomial graph filter that approximates the Gaussian filter [13, 21] that is widely used in image and graph domains, whose frequency response is expressed as $h(\lambda)=e^{-\tau\lambda^{2}}$ , where $\tau$ is the damping factor in the Gaussian filter.³³3In addition to the Gaussian filter, any sophisticated filters can be designed by appropriately adjusting the polynomial coefficients. Due to the fact that the polynomial graph filter $\sum_{n=1}^{N}{a_{n}}\tilde{\mathbf{P}}^{n}$ has a frequency response function of $\sum_{n=1}^{N}{a_{n}}(1-\lambda)^{n}$ [31, 43, 36], we are ready to numerically solve a non-linear least squares (LS) problem to find the polynomial coefficients $\{a_{n}\}_{n=1}^{N}$ in (14) within the maximum polynomial order $N$ , as in [31]. As illustrated in Fig. 4, performing higher-order polynomial approximations (e.g., $N=5$ ) significantly reduces the error compared to lower-order polynomials (e.g., $N=1,2$ ). This demonstrates the clear advantage of high-order polynomials in closely approximating the original function $h(\lambda)=e^{-\tau\lambda^{2}}$ . Therefore, by setting $N=5$ and $\tau=3$ ,⁴⁴4One can arbitrarily set the hyperparameter $\tau$ . Here, increasing $\tau$ results in the further utilization of the low-pass frequency components, and vice versa. we are capable of discovering a new predefined polynomial graph filter having a frequency response function as $h(\lambda)=e^{-3\lambda^{2}}$ as follows:

f(\mathbf{T}_{u})=8\mathbf{T}_{u}+3.1\mathbf{T}_{u}^{2}-11.2\mathbf{T}_{u}^{3}+6.2\mathbf{T}_{u}^{4}-1.1\mathbf{T}_{u}^{5}.

(15)

Finally, in our study, we utilize three predefined LPFs used in [31, 15] as well as our newly devised fifth-order polynomial filter, which are listed as follows:

•

Mem-GF-1 ( $f(\mathbf{T}_{u})=\mathbf{T}_{u}$ ): The linear LPF with the first polynomial order, whose frequency response is $h(\lambda)=1-\lambda$ .
•

Mem-GF-2 ( $f(\mathbf{T}_{u})=2\mathbf{T}_{u}-\mathbf{T}_{u}^{2}$ ): The second-order LPF, whose frequency response is $h(\lambda)=1-\lambda^{2}$
•

Mem-GF-3 ( $f(\mathbf{T}_{u})=\mathbf{T}_{u}+0.01({-\mathbf{T}_{u}^{3}+10\mathbf{T}_{u}^{2}-29\mathbf{T}_{u}})$ ): The third-order LPF which approximates ideal LPF (i.e., $h(\lambda)=\mathbf{1}_{\lambda\leq p}$ with cutoff frequency $p$ ) [31].
•

Mem-GF-5 ( $f(\mathbf{T}_{u})=8\mathbf{T}_{u}+3.1\mathbf{T}_{u}^{2}-11.2\mathbf{T}_{u}^{3}+6.2\mathbf{T}_{u}^{4}-1.1\mathbf{T}_{u}^{5}$ ): The fifth-order LPF, whose frequency response approximates the Gaussian filter $h(\lambda)=e^{-3\lambda^{2}}$ .

The optimal filter and hyperparameters of Mem-GF can be found via hyperparameter tuning on the validation set. Although the aforementioned four graph filters are utilized in our experiments, higher-order graph filters can be utilized and any arbitrary functions can be designed via the same procedures.

IV-B4 Complexity Analysis

We first theoretically analyze the space (i.e., memory) complexity of Mem-GF during preprocessing stage. A central advantage of Mem-GF is its ability to avoid explicitly forming the item similarity matrix $\tilde{\mathbf{P}}=\bar{\mathbf{R}}^{\top}\bar{\mathbf{R}}\in\mathbb{R}^{|\mathcal{I}|\times|\mathcal{I}|}$ . Storing $\tilde{\mathbf{P}}$ would naïvely demand memory of $\mathcal{O}(|\mathcal{I}|^{2})$ , which can be reduced to $\mathrm{nnz}(\tilde{\mathbf{P}})$ in sparse format. However, $\tilde{\mathbf{P}}$ in (6) is often far denser than $\tilde{\mathbf{R}}$ , as each element contains two-hop item-neighbor information. Instead, Mem-GF stores only $\bar{\mathbf{R}}$ , $\mathbf{Q}_{u}$ , and $\mathbf{T}_{u}$ (see (9) and (10)). First, the adjusted interaction matrix in a sparse format has the dimension of $\bar{\mathbf{R}}\in\mathbb{R}^{|\mathcal{U}|\times|\mathcal{I}|}$ . Thus, its space requirement is $\mathcal{O}\bigl(\mathrm{nnz}(\mathbf{R})\bigr)$ . Since $\bar{\mathbf{R}}$ is obtained by a Hadamard power of $\tilde{\mathbf{R}}$ , it shares the same sparsity pattern as $\mathbf{R}$ , implying that $\mathrm{nnz}(\bar{\mathbf{R}})=\mathrm{nnz}(\mathbf{R})$ . Next, we need to store temporary user-specific Lanczos vectors $\mathbf{Q}_{u}\in\mathbb{R}^{|\mathcal{I}|\times K}$ and the tridiagonal matrix $\mathbf{T}_{u}\in\mathbb{R}^{K\times K}$ , which have the dimension linear in $|\mathcal{I}|K$ and $K^{2}$ , respectively. Importantly, from the fact that they are constructed per user (or per batch), the algorithm can overwrite or free this memory once each user’s final predictions are computed. Hence, the overall memory footprint is finally given by

\mathcal{O}\bigl(\mathrm{nnz}(\mathbf{R})+|\mathcal{I}|K+K^{2}),

(16)

rather than $\mathcal{O}(|\mathcal{I}|^{2})$ or $\mathcal{O}(\mathrm{nnz}(\tilde{\mathbf{P}}))$ . This design substantially alleviates the memory bottleneck, making it feasible to handle large-scale datasets on standard hardware resources.

Next, we delve into theoretically analyzing the time (i.e., computational) complexity of Mem-GF during preprocessing. The computational cost of Mem-GF can be broken down into two core stages: (i) constructing the Krylov subspace for each user via the Lanczos algorithm and (ii) performing polynomial GF in the reduced space.

1.

Krylov subspace construction (Lanczos algorithm). For user $u$ , let $\mathbf{r}_{u}\in\mathbb{R}^{|\mathcal{I}|}$ be the interaction vector. We conduct $K$ iterations of the Lanczos algorithm to build the matrix $\mathbf{Q}_{u}\in\mathbb{R}^{|\mathcal{I}|\times K}$ and the tridiagonal matrix $\mathbf{T}_{u}\in\mathbb{R}^{K\times K}$ . Each iteration requires a matrix-vector multiplication with $\tilde{\mathbf{P}}=\bar{\mathbf{R}}^{\top}\bar{\mathbf{R}}$ . Rather than forming $\tilde{\mathbf{P}}$ explicitly, we compute $\tilde{\mathbf{P}}\,\mathbf{q}_{u,j}\;=\;\bar{\mathbf{R}}^{\top}\!\bigl(\bar{\mathbf{R}}\,\mathbf{q}_{u,j}\bigr)$ . Each multiplication with $\bar{\mathbf{R}}$ or $\bar{\mathbf{R}}^{\top}$ can be executed in $\mathcal{O}\!\bigl(\mathrm{nnz}(\bar{\mathbf{R}})\bigr)$ time by exploiting sparsity. Consequently, one Lanczos iteration costs $\mathcal{O}\!\bigl(\mathrm{nnz}(\bar{\mathbf{R}})\bigr)$ . Over $K$ steps, the cost per user is $\mathcal{O}(K\,\mathrm{nnz}(\bar{\mathbf{R}}))$ , which results in $\mathcal{O}(|\mathcal{U}|\,K\,\mathrm{nnz}(\bar{\mathbf{R}}))$ for $|\mathcal{U}|$ total users.
2.

Polynomial GF and score reconstruction. Given $\mathbf{Q}_{u}$ and $\mathbf{T}_{u}$ , we next compute a polynomial graph filter $f(\mathbf{T}_{u})=\sum_{n=0}^{N}a_{n}\,\mathbf{T}_{u}^{n}$ in the reduced space. Specifically, the prediction score $\mathbf{s}_{u}$ is $\|\mathbf{r}_{u}\|_{2}\,\mathbf{Q}_{u}\,f(\mathbf{T}_{u})\,\mathbf{e}_{1}\in\mathbb{R}^{|\mathcal{I}|}$ . Because $\mathbf{T}_{u}$ is of size $K\times K$ , powering $\mathbf{T}_{u}$ up to order $N$ typically takes $\mathcal{O}(K^{2})$ up to $\mathcal{O}(K^{3})$ depending on implementation [41]. In practice, since $K$ is small (e.g., $K=N+1$ due to Theorem IV.1), this step is minor. Finally, multiplying $\mathbf{Q}_{u}f(\mathbf{T}_{u})\mathbf{e}_{1}$ back into the original dimension costs $\mathcal{O}(|\mathcal{I}|K)$ per user. Aggregated across all users, the reconstruction needs $\mathcal{O}\bigl(|\mathcal{U}|\;|\mathcal{I}|\;K\bigr)$ .

Therefore, the total time complexity of Mem-GF is given by

\mathcal{O}\Bigl(|\mathcal{U}|\,K\,\bigl(\mathrm{nnz}(\mathbf{R})+|\mathcal{I}|\bigr)\Bigr).

(17)

V Experimental Evaluations

In this section, we systematically conduct extensive experiments to address the key research questions (RQs) outlined below:

•

RQ1: How memory-efficient is Mem-GF during preprocessing compared to benchmark GF-based recommendation methods?
•

RQ2: How efficient is Mem-GF during inference in terms of runtime and memory consumption compared to benchmark GF-based recommendation methods?
•

RQ3: How does the memory on Mem-GF scale w.r.t. the number of users, items, and interactions?
•

RQ4: How much does Mem-GF improve recommendation accuracy over benchmark recommendation methods?
•

RQ5: How does the number of Krylov subspace bases affect the performance of Mem-GF?
•

RQ6: How do types of polynomial filters affect the performance of Mem-GF?
•

RQ7: How sensitive is Mem-GF to variations in key hyperparameters?

V-A Experimental Settings

TABLE II: The statistics of three datasets.

Dataset	# of users	# of items	# of interactions	Density
Yelp	31,668	38,048	1,561,406	0.130%
Amazon-book	52,643	91,599	2,984,108	0.062%
ML-20M	138,493	26,744	20,000,263	0.540%

Datasets. We carry out experiments on the three benchmark datasets widely adopted for evaluating the performance of recommender systems for CF, which include Yelp, Amazon-book, and MovieLens-20M (ML-20M) [48, 63, 65, 53, 52, 10, 11, 9]. Notably, ML-20M is a large-scale, real-world dataset, consisting of 20 million user interactions. The statistics of the three datasets are summarized in Table II.

Competitors. We compare Mem-GF against twenty-one benchmark CF methods, which can be grouped into two categories. The first group comprises canonical benchmark CF methods built on commonly used architectures, including MF-based methods (MF-BPR [39] and NeuMF [11]), autoencoder-based methods (Multi-DAE, Multi-VAE [23], RecVAE [44], and SVD-AE [12]), GCN-based methods (NGCF [52], LightGCN [10], DGCF [53], SGL [55], SimpleX [28], and SGDE [35]), generative model-based methods (CFGAN [2], DiffRec, L-DiffRec [51], and FlowCF [26]), and link propagation-based method (LinkProp and LinkProp-Multi [4]). The second group consists of recent GF-based CF methods, including GF-CF [43], PGSP [27], FPSR [54], Turbo-CF [31], and HiGSP [59]. Certain recent GF-based approaches, including PolyCF [36] and LanczosNet [24], were excluded from our empirical comparison, as the source code of PolyCF is not publicly available and LanczosNet was not specifically designed for CF tasks with recommendation datasets involving tens of thousands of nodes.

Evaluation protocols. For the Yelp and Amazon-book datasets, we follow the same data splits and performance as those in prior studies [10, 52, 31, 27], directly using each paper’s reported best performance for consistency. For ML-20M, we preprocess the dataset by splitting interactions into a ratio of 80:20 for training and testing, and then further reserve 10% of the training set as a validation set under a 10-core setting (i.e., filter out users with fewer than ten interactions). To assess the performance of the top- $K$ recommendations, we use widely adopted benchmark metrics from the literature [48, 63, 65, 53, 52, 10, 11, 9], including recall and normalized discounted cumulative gain (NDCG), with $K$ set to 20 by default. For each metric, we report the average results over 10 independent runs, except for deterministic methods [46, 43, 27, 31, 12, 59] (including Mem-GF). Unless otherwise specified, all runtime reports for GF-based methods refer to their preprocessing time.

TABLE III: Comparison of memory (in GB) and time (in seconds) consumption during preprocessing among seven training-free GF-based recommendation methods across three datasets. RAM and VRAM values represent the peak memory usage recorded during execution. Entries marked as OOM denote out-of-memory errors, and entries marked as “–” denote unavailable GPU measurements caused by GPU buffer OOM or CPU-only fallback. The best and second-best performers are marked in bold and underlined, respectively. The performance of Turbo-CF and Mem-GF is evaluated according to the maximum order of the polynomial graph filters used.

Dataset	Method	EASE	GF-CF	PGSP	HiGSP	FPSR	Turbo-CF			Mem-GF
Dataset	Method	EASE	GF-CF	PGSP	HiGSP	FPSR	1	2	3	1	2	3	5
Yelp	VRAM (A)	31.2	10.9	–	–	9.3	9.2	25.7	25.7	1.9	2.4	2.7	3.3
	RAM (B)	0.8	2.1	74.1	OOM	0.7	0.7	0.7	0.7	0.7	0.7	0.7	0.7
	Total memory (A)+(B)	32.0	13.0	74.1	OOM	10.0	9.9	26.4	26.4	2.6	3.1	3.4	4.0
	Time	15.68	7.0	1398.0	–	58.7	4.2	140.2	420.7	1.5	2.9	4.3	6.5
	GPU OOM			✓	✓
Amazon-book	VRAM (A)	–	34.5	–	–	32.4	45	–	–	4.4	5.2	5.9	7.3
	RAM (B)	OOM	0.7	OOM	OOM	2.0	0.7	OOM	OOM	0.7	0.7	0.7	0.7
	Total memory (A)+(B)	OOM	35.2	OOM	OOM	34.4	45.7	OOM	OOM	5.1	5.9	6.6	8.0
	Time	–	35.7	–	–	157.1	35.5	–	–	6.6	11.4	16.4	26.4
	GPU OOM	✓	✓	✓	✓			✓	✓
ML-20M	VRAM (A)	–	19.7	–	–	11.5	–	–	–	4.6	4.8	4.8	5.6
	RAM (B)	12.6	1.3	OOM	OOM	4.5	6.6	9.6	12.8	1.2	1.2	1.2	1.2
	Total memory (A)+(B)	12.6	21.0	OOM	OOM	15.9	6.6	9.6	12.8	5.8	6.0	6.0	6.8
	Time	223.0	59.8	–	–	121.1	187.6	2735.5	7432.8	102.2	156.6	209.1	314.1
	GPU OOM	✓		✓	✓		✓	✓	✓

Implementation details. We adopt the optimal hyperparameters for competitors through extensive hyperparameter tuning on the validation set. For all experiments, we use default hyperparameters $\gamma$ , $s$ , and polynomial order $N$ for Mem-GF, unless otherwise stated: $(\gamma,s,N)$ = (0.55, 0.9, 5) for Yelp; (0.5, 0.9, 2) for Amazon-book; and (0.5, 1.1, 2) for ML-20M. We set $K=N+1$ for all datasets. All training-based methods are optimized using the Adam optimizer [17] with a batch size of 1024. Our experiments are conducted on a single server equipped with Intel (R) 12-Core (TM) i7-9700K CPUs @ 3.60 GHz, NVIDIA GeForce RTX A6000 GPUs with 48GB VRAM, and 128GB of RAM.

All GF-based CF algorithms [43, 31, 46, 27, 59] are implemented using PyTorch tensors [33] to ensure a consistent experimental environment. Whenever a method exceeds the size of GPU VRAM, we revert to CPU-only computation leveraging system RAM to complete the process without disrupting the pipeline. For FPSR [54], we use $\tau=0.3$ as a default hyperparameter to generate 6 partitions. For Turbo-CF, we identically use the three predefined filters presented in [31, 15]. Since the filtering matrix is constructed as $\bar{\mathbf{R}}^{\top}\bar{\mathbf{R}}$ , using $\bar{\mathbf{r}}_{u}$ also aligns the user signal with the same normalization and adjustment applied to the item similarity matrix. This alignment is particularly helpful for the ML-20M dataset, where interactions are highly concentrated on popular items, because it mitigates popularity-dominated signal amplification.

To evaluate the scalability of our Mem-GF in Section V-D, we generated sets of synthetic interaction matrices with varying dimensions and sparsity levels. In particular, we employ the random matrix generation function provided by the scipy.random module in the SciPy package [49] to create these matrices. By default, the number of users, items, and nonzero entries (i.e., the number of interactions) was set to 5,000, 5,000, and 225,000, respectively. In order to assess scalability across a wide range of problem sizes, the experiments are carried out according to the following setup:

•

Varying the number of users/items: The number of users and items is systematically increased to each of the following values: 1,000, 5,000, 10,000, 15,000, 20,000, 30,000, 40,000, 50,000, 80,000, and 100,000.
•

Varying the number of interactions: The number of nonzero entries (i.e., $\mathrm{nnz}(R)$ ) is scaled to 225,000, 1,125,000, 2,250,000, 4,500,000, 6,750,000, 9,000,000, and 11,250,000.

For each configuration, we record the corresponding memory usage and runtime of our Mem-GF. These systematic variations allow us to quantify how the performance of our method scales w.r.t. the number of users, items, and interactions.

V-B Preprocessing Efficiency Analysis (RQ1)

Table III summarizes the comparison of Mem-GF with existing GF-based recommendation methods in terms of memory usage and runtime on the three benchmark datasets during preprocessing (i.e., offline computation). Table IV shows the runtime comparison among Mem-GF and three representative CF methods that require model training (NGCF, LightGCN, and DiffRec). Our key observations are as follows:

(i)

Leveraging the low-dimensional Krylov subspace, Mem-GF significantly reduces both memory consumption and computation time compared to other GF-based CF competitors.
(ii)

On Amazon-book, where the set of items is relatively large, most GF-based CF competitors encounter OOM errors on the single GPU device. In contrast, Mem-GF efficiently operates within around 5GB of VRAM under the first-order polynomial filter, demonstrating up to a 574% memory advantage and a 438% runtime speedup compared to the second-best performers. Even with higher-order polynomial filters, Mem-GF maintains a VRAM usage of around 8GB.
(iii)

On ML-20M, the extremely large number of interactions (20M) triggers buffer OOM issues for many competing methods when performing sparse matrix multiplications on the GPU, forcing them to rely solely on CPU computations and resulting in prolonged runtime. On the contrary, Mem-GF seamlessly handles high-order filters without such memory bottlenecks.
(iv)

Table IV demonstrates the benefits of Mem-GF over representative CF methods that require model training in terms of runtime. Because Mem-GF is entirely training-free, it circumvents the costly iterative parameter optimization loops. Notably, on the datasets having a large amount of interactions, such as ML-20M, Mem-GF significantly outperforms two-tower models based on pairwise optimization (e.g., NGCF and LightGCN), highlighting its practical value in scenarios where both accuracy and speed are critical.

TABLE IV: Runtime (in seconds) of Mem-GF and representative competitors that require model training. Here, runtime represents training time for NGCF, LightGCN, and DiffRec, while it refers to processing time for Mem-GF.

	NGCF	LightGCN	DiffRec	Mem-GF
Yelp	$1.5\times 10^{4}$	$1.1\times 10^{4}$	$1.4\times 10^{4}$	$\bf 1.5$
Amazon-book	$7.1\times 10^{4}$	$4.6\times 10^{4}$	$3.7\times 10^{4}$	$\bf 6.6$
ML-20M	$4.6\times 10^{5}$	$3.1\times 10^{5}$	$1.1\times 10^{4}$	$\bf 102.2$
Model training	✓	✓	✓	✗

TABLE V: Inference time and peak GPU memory usage for representative GF-based methods and Mem-GF with different polynomial orders on the Yelp dataset.

Method	Order	Time (s)	VRAM (GB)
GF-CF	-	0.1197	28.79
Turbo-CF	1	0.1348	6.60
Mem-GF-1	1	0.0044	0.57
Mem-GF-2	2	0.0077	0.57
Mem-GF-3	3	0.0123	0.57
Mem-GF-5	5	0.0145	0.57

V-C Inference Efficiency Analysis (RQ2)

Table V reports the single-batch inference cost (i.e., the convolution with graph signals) on the Yelp dataset, as other datasets exhibit similar trends. The comparison assumes that the graph filter ( $H(\mathbf{L})$ in (3)) has already been precomputed and focuses only on runtime and peak GPU memory usage during inference.

(i)

By confining the graph convolution entirely within a low-dimensional Krylov subspace, Mem-GF completes inference in $0.0044$ seconds, yielding up to a 26.2 $\times$ speedup over GF-CF and Turbo-CF, both of which require convolution with a full $|\mathcal{I}|\times|\mathcal{I}|$ graph filter. Even with the fifth-order filter, Mem-GF maintains low latency, supporting real-time recommendation systems.
(ii)

Since Mem-GF avoids explicit construction and storage of the full-size $|\mathcal{I}|\times|\mathcal{I}|$ matrix, it achieves over a 10.5 $\times$ reduction in peak VRAM usage compared to Turbo-CF. The memory footprint remains nearly unaffected by the polynomial order because the compact Krylov basis dominates the filtering cost, whereas GF-CF requires dense filter storage for ideal LPF computations.

V-D Scalability Analysis (RQ3)

To empirically validate our theoretical complexity analysis in Section IV-B4, we conduct experiments on synthetic datasets (i.e., interaction matrices) with varying scales of users, items, and interaction densities. Specifically, in Fig. 5, we present how Mem-GF’s memory usage and runtime for preprocessing scale as $|\mathcal{U}|$ , $|\mathcal{I}|$ , and $\mathrm{nnz}(R)$ grow. By default, the number of users, items, and interactions was set to 5,000, 5,000, and 225,000, respectively. Furthermore, in Fig. 6, we compare Mem-GF with two representative GF-based CF competitors (GF-CF and Turbo-CF) when $|\mathcal{I}|$ increases. For a fair comparison with GF-CF, both Turbo-CF and Mem-GF employ the first-order polynomial graph filter. We highlight the following observations:

(i)

As shown in the left side of Fig. 5, Mem-GF exhibits memory usage growing nearly linearly with $|\mathcal{U}|$ , $|\mathcal{I}|$ , and $\mathrm{nnz}(R)$ . Notably, the memory footprint is significantly reduced compared to the case of forming the item similarity matrix of size $|\mathcal{I}|\times|\mathcal{I}|$ , allowing Mem-GF to make full use of GPU acceleration even on large datasets. We note that the size of RAM does not tend to scale with the increasing data dimension as long as VRAM is sufficiently available.
(ii)

As shown in the right side of Fig. 5, the runtime of Mem-GF also scales linearly as each data dimension grows. This validates the scalability of Mem-GF. Moreover, since Mem-GF efficiently operates within the GPU VRAM, its runtime only takes less than 11 seconds even when $\mathrm{nnz}(R)$ increases up to $10^{7}$ via the exploitation of the GPU resource.
(iii)

In Fig. 6, we observe that Mem-GF is substantially more scalable in both memory and runtime, compared to GF-CF and Turbo-CF. In particular, GF-CF incurs excessive memory overhead due to its dense ideal LPF matrix having dimensionality of $(|\mathcal{I}|,|\mathcal{I}|)$ , thus yielding superlinear complexity in practice.

These results also indicate how sparsity affects Mem-GF: increasing $\mathrm{nnz}(\mathbf{R})$ raises runtime through sparse matrix-vector products, but the growth remains close to linear because the full item similarity matrix is never formed.

TABLE VI: Performance comparison among Mem-GF and competitors. The best and second-best performers are highlighted in bold and underlined, respectively. The improvements of Mem-GF over the best competitors are all statistically significant with

p

-values

\leq 0.01

	Yelp		Amazon-book		ML-20M
Method	Recall	NDCG	Recall	NDCG	Recall	NDCG
MF-BPR	0.0433	0.0354	0.0250	0.0190	0.0219	0.1108
NeuMF	0.0451	0.0363	0.0258	0.0200	0.0288	0.1437
NGCF	0.0579	0.0477	0.0344	0.0263	0.0307	0.1409
LightGCN	0.0649	0.0530	0.0411	0.0315	0.0402	0.1702
DGCF	0.0654	0.0534	0.0422	0.0324	OOM	OOM
SGL	0.0675	0.0555	0.0478	0.0379	0.0388	0.1677
SGDE	0.0693	0.0563	0.0483	0.0391	0.0398	0.1683
Multi-DAE	0.0562	0.0434	0.0410	0.0322	0.0385	0.1666
Multi-VAE	0.0584	0.0450	0.0407	0.0315	0.0402	0.1702
RecVAE	0.0579	0.0432	0.0400	0.0302	0.0414	0.1732
SVD-AE	0.0681	0.0572	0.0602	0.0479	0.0536	0.2379
SimpleX	0.0701	0.0575	0.0583	0.0400	0.0304	0.1451
LinkProp	0.0676	0.0559	0.0684	0.0559	OOM	OOM
LinkProp-Multi	0.0690	0.0571	0.0721	0.0588	OOM	OOM
DiffRec	0.0656	0.0552	0.0514	0.0418	0.0549	0.2258
L-DiffRec	0.0614	0.0518	0.0502	0.0409	0.0470	0.1837
FlowCF	0.0686	0.0587	0.0497	0.0431	0.0531	0.2340
EASE	0.0657	0.0552	0.0710	0.0567	0.0522	0.2264
GF-CF	0.0697	0.0571	0.0710	0.0584	0.0487	0.2076
PGSP	0.0710	0.0583	0.0712	0.0587	OOM	OOM
FPSR	0.0703	0.0584	0.0715	0.0588	0.0493	0.2104
Turbo-CF	0.0693	0.0574	0.0730	0.0611	0.0534	0.2299
HiGSP	OOM	OOM	OOM	OOM	OOM	OOM
Mem-GF (Ours)	0.0705	0.0586	0.0753	0.0620	0.0566	0.2402

V-E Recommendation Accuracy (RQ4)

Table VI summarizes the recommendation accuracy of Mem-GF and various competitors across three distinct datasets. The main findings are as follows:

(i)

Despite its exceptional computational efficiency, Mem-GF achieves the highest accuracy for all datasets (except the recall on Yelp), underscoring its robustness across diverse data characteristics.
(ii)

In particular, the gains of Mem-GF over training-dependent models are indeed substantial across all datasets.
(iii)

Mem-GF consistently exhibits gains over Turbo-CF, another polynomial GF approach. On the Amazon-book dataset, Turbo-CF was restricted only to the first-order linear filter to avoid OOM issues (see Table III), whereas Mem-GF can freely use higher-order filters without incurring any resource scarcity issues on a single device. This design can improve accuracy empirically, not because memory reduction itself guarantees better predictions, but because expressive polynomial filters can be selected within the user-specific Krylov subspace without being constrained by the storage cost of full-size graph filters.

TABLE VII: Recall performance across different values of

K

and polynomial filters. Here, Mem-GF-k denotes an ablation model that omits the Krylov basis construction in Mem-GF. Cases where

K\leq N

are marked in gray.

Yelp
$K$	Mem-GF-1	Mem-GF-2	Mem-GF-3	Mem-GF-5
1	0.0035	0.0035	0.0035	0.0035
2	0.0684	0.0684	0.0684	0.0684
3	0.0684	0.0692	0.0683	0.0695
4	0.0684	0.0692	0.0683	0.0695
5	0.0684	0.0692	0.0683	0.0701
6	0.0684	0.0692	0.0683	0.0703
Mem-GF-k	0.0684	0.0692	0.0683	0.0703
Amazon-book
$K$	Mem-GF-1	Mem-GF-2	Mem-GF-3	Mem-GF-5
1	0.0011	0.0011	0.0011	0.0011
2	0.0710	0.0628	0.0710	0.0699
3	0.0710	0.0754	0.0707	0.0710
4	0.0710	0.0754	0.0707	0.0745
5	0.0710	0.0754	0.0707	0.0739
6	0.0710	0.0754	0.0707	0.0739
Mem-GF-k	0.0710	0.0754	0.0707	0.0739
ML-20M
$K$	Mem-GF-1	Mem-GF-2	Mem-GF-3	Mem-GF-5
1	0.0248	0.0248	0.0248	0.0248
2	0.0554	0.0554	0.0554	0.0554
3	0.0554	0.0566	0.0552	0.0566
4	0.0554	0.0566	0.0516	0.0565
5	0.0554	0.0566	0.0516	0.0564
6	0.0554	0.0566	0.0516	0.0564
Mem-GF-k	0.0554	0.0566	0.0516	0.0564

V-F Empirical Validation of Theorem IV.1 (RQ5)

Table VII summarizes our experiments across all datasets, where we varied the number of Krylov basis vectors ( $K$ ) for polynomial filters of different maximum polynomial orders ( $N$ ). Here, Mem-GF-k denotes an ablation model that omits the Krylov basis construction in Mem-GF. It is observed that, for an $N$ -th order polynomial filter, performance remains consistent with that of Mem-GF-k when $K\geq N+1$ . This finding confirms that setting $K=N+1$ produces no approximation error under the theorem condition. In contrast, the gray entries with $K\leq N$ illustrate regimes where the guarantee does not apply and performance can become unstable.

V-G Analysis of Polynomial Graph Filters (RQ6)

Table VIII shows that the optimal polynomial filter varies depending on datasets; for instance, the fifth-order filter (Mem-GF-5) achieves the highest recall on Yelp, whereas the second-order filter (Mem-GF-2) performs best on Amazon-book and ML-20M. As expected, higher-order filters incur longer runtime. Nonetheless, since the overall computation time remains manageable and the search space is restricted to four predefined filters, it is feasible to perform a validation set-based search. Increasing the polynomial order $N$ can improve filter expressiveness, but a sufficiently large Krylov subspace size $K$ is required; in particular, the exactness guarantee holds under exact arithmetic when $N<K$ , whereas regimes with $K\leq N$ may become unstable.

TABLE VIII: Runtime for preprocessing and recall according to four different types of polynomial graph filters in Mem-GF.

	Yelp		Amazon-book		ML-20M
	Runtime	Recall	Runtime	Recall	Runtime	Recall
Mem-GF-1	1.5s	0.0687	6.6s	0.0730	102.2s	0.0554
Mem-GF-2	2.9s	0.0692	11.4s	0.0753	156.6s	0.0566
Mem-GF-3	4.3s	0.0683	16.4s	0.0693	209.1s	0.0516
Mem-GF-5	6.5s	0.0705	26.4s	0.0739	314.1s	0.0564

V-H Sensitivity Analysis (RQ7)

Fig. 7 visualizes the impact of two key hyperparameters in Mem-GF: the normalization parameter $\gamma$ in (6) and the adjustment parameter $s$ in (9). First, while Yelp attains its highest accuracy at $\gamma=0.55$ , both Amazon-book and ML-20M achieve optimal performance at $\gamma=0.5$ . Notably, using the conventional choice of $\gamma=0.5$ still yields sufficiently robust accuracy across all datasets. Second, the optimal values of $s$ are $0.9$ on both Yelp and Amazon-book, and $1.0$ on ML-20M, indicating that setting closely to $s=1$ generally yields robust performance. In addition, we observe a notable performance drop on Yelp and ML-20M when $s<0.9$ and $s<1.0$ , respectively, whereas, outside these ranges, performance remains relatively stable. This aligns with the findings in [31], highlighting the importance of selecting proper ranges of $s$ to ensure optimal recommendation accuracy.

V-I Practical Deployment and Limitations

Mem-GF can be integrated into real-world recommender systems as a training-free GF-based CF module for candidate generation or scoring, and its output scores can be passed to subsequent ranking or re-ranking modules. Its user-specific Krylov subspace construction and score computation are naturally partitionable across workers or GPUs, since they can be performed independently for different users once the adjusted interaction matrix is available. Fully distributed system-level optimization, including distributed data management and cache-refresh strategies, is left as future work. The main limitations are as follows. First, Mem-GF is most directly suited to item similarity graphs built from user–item interactions; social networks, knowledge graphs, or other heterogeneous graph structures may require additional graph filter designs. Second, extremely sparse or noisy user signals, including cold-start-like cases with very few observed interactions, may degrade recommendation quality because the resulting user-specific Krylov subspace may contain insufficient or noisy preference information. In addition, very high polynomial orders may increase numerical sensitivity or computational overhead when $K$ is not sufficiently large. These issues can be mitigated through validation-based tuning of $K$ , $N$ , $\gamma$ , and $s$ .

VI Conclusions and Outlook

In this paper, we introduced Mem-GF, a training-free and memory-efficient GF method for CF. By operating within user-specific Krylov subspaces, Mem-GF eliminates the need to construct or store the full item similarity matrix, achieving substantial reductions in memory usage and preprocessing/inference time. In addition to scalability, Mem-GF supports expressive polynomial filters with low overhead. We further provided the condition under which Krylov filtering incurs no approximation error. Extensive experiments across three benchmark datasets demonstrated that Mem-GF achieves up to 574% memory reduction, 438% preprocessing speedup, 2620% inference speedup, and state-of-the-art recommendation accuracy. Future work includes extending Mem-GF to heterogeneous graph structures and distributed recommendation systems.

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF), a South Korea grant funded by the Korea government (MSIT) (RS-2021-NR059723), and by SMEs Technology Innovation Development Program through the Technology Innovation and Promotion Agency (TIPA), funded by the Ministry of SMEs and Startups (RS-2024-00511332).

References

[1] F. Borisyuk, S. He, Y. Ouyang, M. Ramezani, P. Du, X. Hou, C. Jiang, N. Pasumarthy, P. Bannur, B. Tiwana, et al. (2024-Aug.) LiGNN: graph neural networks at linkedin. In Proc. of the 30th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining (KDD 2024), Barcelona, Spain, pp. 4793–4803. Cited by: §I-A.
[2] D.-K. Chae, J.-S. Kang, S.-W. Kim, and J.-T. Lee (2018-Oct.) CFGAN: a generic collaborative filtering framework based on generative adversarial networks. In Proc. 27th ACM Int. Conf. Inf. Knowl. Manag. (CIKM 2018), Torino, Italy, pp. 137–146. Cited by: §V-A.
[3] P. Covington, J. Adams, and E. Sargin (2016-Sep.) Deep neural networks for YouTube recommendations. In Proc. of the 10th ACM Conf. on Recommender Systems (RecSys 2016), Boston, MA, USA, pp. 191–198. Cited by: §I-A.
[4] H.-M. Fu, P. Poirson, K. S. Lee, and C. Wang (2022) Revisiting neighborhood-based link prediction for collaborative filtering. In Proc. ACM Web Conf. (WWW 2022), Lyon, France, pp. 1009–1018. Cited by: §V-A.
[5] M. Fu, H. Qu, Z. Yi, L. Lu, and Y. Liu (2019-Mar.) A novel deep learning-based collaborative filtering model for recommendation system. IEEE Trans. Cybern. 49, pp. 1084–1096. Cited by: §I-A.
[6] C. A. Gomez-Uribe and N. Hunt (2015-Dec.) The Netflix recommender system: algorithms, business value, and innovation. ACM Trans. Manage. Inf. Syst. 6 (4), pp. 1–19. Cited by: §I-A.
[7] Y. Gong, Z. Jiang, Y. Feng, B. Hu, K. Zhao, Q. Liu, and W. Ou (2020-Oct.) EdgeRec: recommender system on edge in mobile Taobao. In Proc. of the 29th ACM Int. Conf. on Information & Knowledge Management (CIKM 2020), Virtual Event, pp. 2477–2484. Cited by: §I-A.
[8] R. He and J. McAuley (2016-Apr.) Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proc. of the 25th Int. Conf. on World Wide Web (WWW 2016), Montréal, QC, Canada, pp. 507–517. Cited by: §I-A.
[9] R. He and J. McAuley (2016-Feb.) VBPR: visual bayesian personalized ranking from implicit feedback. In Proc. of the 30th AAAI Conf. on Artificial Intelligence (AAAI 2016), Phoenix, AZ, USA, pp. 144–150. Cited by: §II, §V-A, §V-A.
[10] X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, and M. Wang (2020-Jul.) LightGCN: simplifying and powering graph convolution network for recommendation. In Proc. of the 43rd Int. ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR 2020), Virtual Event, pp. 639–648. Cited by: §I-A, §I-A, §II, §V-A, §V-A, §V-A.
[11] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua (2017-Apr.) Neural collaborative filtering. In Proc. of the 26th Int. Conf. on World Wide Web (WWW 2017), Perth, Australia, pp. 173–182. Cited by: §I-A, §II, §V-A, §V-A, §V-A.
[12] S. Hong, J. Choi, Y.-C. Lee, S. Kumar, and N. Park (2024-Aug.) SVD-AE: simple autoencoders for collaborative filtering. In Proc. of the 33rd Int. Joint Conf. on Artificial Intelligence (IJCAI 2024), Jeju, Republic of Korea. Cited by: §V-A, §V-A.
[13] K. Ito and K. Xiong (2000-05) Gaussian filters for nonlinear filtering problems. IEEE Trans. Autom. Control 45 (5), pp. 910–927. Cited by: §I-B, §IV-B3.
[14] W. Jiang, Z. He, S. Zhang, T. B. Preußer, K. Zeng, L. Feng, J. Zhang, T. Liu, Y. Li, J. Zhou, et al. (2021-Apr.) MicroRec: efficient recommendation inference by hardware and data structure solutions. In Proc. of the 4th Conf. on Machine Learning and Systems (MLSys 2021), Virtual Event, pp. 845–859. Cited by: §I-B.
[15] C. Kim, Y. Choi, J.-D. Park, and W.-Y. Shin (2025-Apr.) Leveraging member-group relations via multi-view graph filtering for effective group recommendation. In Proc. of the 34th Int. Conf. on World Wide Web (WWW 2025), Sydney, NSW, Australia, pp. 1077–1081. Cited by: §I-A, §I-A, §IV-B1, §IV-B3, §IV-B3, §V-A.
[16] C. Kim, J. Sung, Y. Han, and J. Lee (2025) Graph spectral filtering with chebyshev interpolation for recommendation. In Proc. 48th Int. ACM SIGIR Conf. Res. Dev. Inf. Retr. (SIGIR 2025), pp. 1964–1974. Cited by: §II.
[17] D. P. Kingma and J. Ba (2015) Adam: A method for stochastic optimization. In Proc. 3rd Int. Conf. Learn. Represent. (ICLR 2015), San Diego, CA, pp. 1–15. Cited by: §V-A.
[18] T. N. Kipf and M. Welling (2017-Apr.) Semi-supervised classification with graph convolutional networks. In Proc. of the 5th Int. Conf. on Learning Representations (ICLR 2017), Toulon, France, pp. 1–14. Cited by: §I-A, §I-A.
[19] C. Lanczos (1950-Oct.) An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Natl. Bur. Stand. 45 (4), pp. 255–282. Cited by: §IV-B2, §IV-B.
[20] J. Lee, S. Kang, W.-Y. Shin, J. Choi, N. Park, and D. Lee (2024) Graph signal processing for cross-domain recommendation. arXiv preprint arXiv:2407.12374. Cited by: §I-A.
[21] M. Li, X. Guo, Y. Wang, Y. Wang, and Z. Lin (2022-Jul.) G²CN: graph gaussian convolution networks with concentrated graph filters. In Proc. of the 39th Int. Conf. on Machine Learning (ICML 2022), Baltimore, MD, USA, pp. 12782–12796. Cited by: §I-B, §IV-B3.
[22] D. Lian, H. Wang, Z. Liu, J. Lian, E. Chen, and X. Xie (2020-Apr.) LightRec: a memory and search-efficient recommender system. In Proc. of the 29th Int. Conf. on World Wide Web (WWW 2020), Taipei, Taiwan, pp. 695–705. Cited by: §I-B.
[23] D. Liang, R. G. Krishnan, M. D. Hoffman, and T. Jebara (2018-Apr.) Variational autoencoders for collaborative filtering. In Proc. of the 2018 Int. World Wide Web Conf. (WWW 2018), Lyon, France, pp. 689–698. Cited by: §I-A, §II, §V-A.
[24] R. Liao, Z. Zhao, R. Urtasun, and R. S. Zemel (2019-05) LanczosNet: multi-scale deep graph convolutional networks. Int. Conf. Learn. Represent. (ICLR). Cited by: §II, §V-A.
[25] G. Linden, B. Smith, and J. York (2003-Feb.) Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Comput. 7 (1), pp. 76–80. Cited by: §I-A.
[26] C. Liu, Y. Zhang, J. Wang, R. Ying, and J. Caverlee (2025) Flow matching for collaborative filtering. In Proc. 31st ACM SIGKDD Conf. Knowl. Discov. Data Min. (KDD 2025), Toronto, Canada, pp. 1765–1775. Cited by: §V-A.
[27] J. Liu, D. Li, H. Gu, T. Lu, P. Zhang, L. Shang, and N. Gu (2023-Apr.) Personalized graph signal processing for collaborative filtering. In Proc. of the 32nd Int. Conf. on World Wide Web (WWW 2023), Austin, TX, USA, pp. 1264–1272. Cited by: §I-A, §I-A, §II, §IV-A, §IV-A, §V-A, §V-A, §V-A, Remark.
[28] K. Mao, J. Zhu, J. Wang, Q. Dai, Z. Dong, X. Xiao, and X. He (2021-Nov.) SimpleX: a simple and strong baseline for collaborative filtering. In Proc. of the 30th ACM Int. Conf. on Information and Knowledge Management (CIKM 2021), Virtual Event, pp. 1243–1252. Cited by: §V-A.
[29] A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura, and P. Vandergheynst (2018-05) Graph signal processing: overview, challenges, and applications. Proc. IEEE 106 (5), pp. 808–828. Cited by: Definition III.1, §IV-A.
[30] J.-D. Park, S. Li, X. Cao, and W.-Y. Shin (2023-Aug.) Criteria tell you more than ratings: criteria preference-aware light graph convolution for effective multi-criteria recommendation. In Proc. of the 29th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining (KDD 2023), Long Beach, CA, USA, pp. 1356–1365. Cited by: §I-A.
[31] J.-D. Park, Y.-M. Shin, and W.-Y. Shin (2024-Jul.) Turbo-CF: matrix decomposition-free graph filtering for fast recommendation. In Proc. of the 47th Int. ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR 2024), Washington, DC, USA, pp. 2672–2676. Cited by: §I-A, §I-A, §II, 3rd item, §IV-A, §IV-A, §IV-A, §IV-A, §IV-B1, §IV-B3, §IV-B3, §IV-B3, §V-A, §V-A, §V-A, §V-H.
[32] J.-D. Park, J. Yoo, and W.-Y. Shin (2025-Apr.) Criteria-aware graph filtering: extremely fast yet accurate multi-criteria recommendation. In Proc. of the 34th Int. Conf. on World Wide Web (WWW 2025), Sydney, NSW, Australia, pp. 4482–4493. Cited by: §I-A, §IV-B1, §IV-B3, Remark.
[33] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. (2019) PyTorch: an imperative style, high-performance deep learning library. In Proc. 33rd Conf. Neural Inf. Process. Syst. (NeurIPS 2019), Vancouver, BC, Canada. Cited by: §V-A.
[34] S. Peng, X. Liu, K. Sugiyama, and T. Mine (2024) How powerful is graph filtering for recommendation. In Proc. 30th ACM SIGKDD Conf. Knowl. Discov. Data Min. (KDD 2024), Barcelona, Spain, pp. 2388–2399. Cited by: §II.
[35] S. Peng, K. Sugiyama, and T. Mine (2024-Jan.) Less is more: removing redundancy of graph convolutional networks for recommendation. ACM Trans. Inf. Syst. 42 (3), pp. 1–26. Cited by: §V-A.
[36] Y. Qin, W. Ju, Y. Gu, Z. Qiao, Z. Xiao, and M. Zhang (2025-Jun.) PolyCF: towards optimal spectral graph filters for collaborative filtering. ACM Trans. Inf. Syst. 43 (4), pp. 1–28. Cited by: §I-A, §IV-A, §IV-A, §IV-B3, §V-A.
[37] R. Ramakrishna, H.-T. Wai, and A. Scaglione (2020-Nov.) A user guide to low-pass graph signal processing and its applications: tools and applications. IEEE Signal Process. Mag. 37 (6), pp. 74–85. Cited by: Definition III.3.
[38] N. Rao, H.-F. Yu, P. K. Ravikumar, and I. S. Dhillon (2015-Dec.) Collaborative filtering with graph information: consistency and scalable methods. In Proc. 28th Adv. Neural Inf. Process. Syst. (NIPS 2015), Montreal, Canada, pp. 2107–2115. Cited by: §II.
[39] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme (2009-Jun.) BPR: Bayesian personalized ranking from implicit feedback. In Proc. of the 25th Conf. on Uncertainty in Artificial Intelligence (UAI 2009), Montreal, QC, Canada, pp. 452–461. Cited by: §V-A.
[40] Y.-S. Roh, J.-Y. Kim, J.-D. Park, and W.-Y. Shin (2026-Jun.) Training-free adjustable polynomial graph filtering for ultra-fast multimodal recommendation. Engineering Applications of Artificial Intelligence 173, pp. 1–14. Cited by: §I-A.
[41] Y. Saad (2003) Iterative methods for sparse linear systems. SIAM. Cited by: item 2.
[42] Y. Saad (2011) Numerical methods for large eigenvalue problems: revised edition. Vol. 66, SIAM. Cited by: §I-B, Definition III.4.
[43] Y. Shen, Y. Wu, Y. Zhang, C. Shan, J. Zhang, B. K. Letaief, and D. Li (2021-Nov.) How powerful is graph convolution for recommendation?. In Proc. of the 30th ACM Int. Conf. on Information and Knowledge Management (CIKM 2021), Gold Coast, Queensland, Australia, pp. 1619–1629. Cited by: §I-A, §I-A, §II, §III-A, §III-A, Definition III.1, Definition III.2, Definition III.3, §IV-A, §IV-A, §IV-A, §IV-A, §IV-B3, §V-A, §V-A, §V-A.
[44] I. Shenbin, A. Alekseev, E. Tutubalina, V. Malykh, and S. I. Nikolenko (2020-Feb.) RecVAE: a new variational autoencoder for top-N recommendations with implicit feedback. In Proc. of the 13th ACM Int. Conf. on Web Search and Data Mining (WSDM 2020), Houston, TX, USA, pp. 528–536. Cited by: §II, §V-A.
[45] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst (2013-05) The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. 30 (3), pp. 83–98. Cited by: §III-A, Definition III.1, Definition III.2.
[46] H. Steck (2019-05) Embarrassingly shallow autoencoders for sparse data. In Proc. of the 28th Int. Conf. on World Wide Web (WWW 2019), San Francisco, CA, USA, pp. 3251–3257. Cited by: §I-A, §II, §V-A, §V-A.
[47] L. N. Trefethen and D. B. III (1997) Numerical linear algebra. SIAM. Cited by: §I-B, Definition III.4.
[48] R. van den Berg, T. N. Kipf, and M. Welling (2018-Aug.) Graph convolutional matrix completion. In Proc. KDD’18 Deep Learning Day, London, United Kingdom. Cited by: §II, §V-A, §V-A.
[49] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, et al. (2020-Feb.) SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 17 (3), pp. 261–272. Cited by: §V-A.
[50] H.-T. Wai, S. Segarra, A. E. Ozdaglar, A. Scaglione, and A. Jadbabaie (2020-Jan.) Blind community detection from low-rank excitations of a graph filter. IEEE Trans. Signal Process. 68, pp. 436–451. Cited by: Definition III.3.
[51] W. Wang, Y. Xu, F. Feng, X. Lin, X. He, and T.-S. Chua (2023-Jul.) Diffusion recommender model. In Proc. 46th Int. ACM SIGIR Conf. Res. Dev. Inf. Retr. (SIGIR 2023), Taipei, Taiwan, pp. 832–841. Cited by: §V-A.
[52] X. Wang, X. He, M. Wang, F. Feng, and T.-S. Chua (2019-Jul.) Neural graph collaborative filtering. In Proc. of the 42nd Int. ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR 2019), Paris, France, pp. 165–174. Cited by: §I-A, §II, §V-A, §V-A, §V-A.
[53] X. Wang, H. Jin, A. Zhang, X. He, T. Xu, and T.-S. Chua (2020-Jul.) Disentangled graph collaborative filtering. In Proc. of the 43rd Int. ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR 2020), Virtual Event, pp. 1001–1010. Cited by: §V-A, §V-A, §V-A.
[54] T. Wei, J. Ma, and T. W. S. Chow (2023-Apr.) Fine-tuning partition-aware item similarities for efficient and scalable recommendation. In Proc. of the 32nd Int. Conf. on World Wide Web (WWW 2023), Austin, TX, USA, pp. 823–832. Cited by: §I-A, §V-A, §V-A.
[55] J. Wu, X. Wang, F. Feng, X. He, L. Chen, J. Lian, and X. Xie (2021-Jul.) Self-supervised graph learning for recommendation. In Proc. of the 44th Int. ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR 2021), Virtual Event, pp. 726–735. Cited by: §II, §V-A.
[56] S. Wu, Y. Tang, Y. Zhu, L. Wang, X. Xie, and T. Tan (2019-Jan.) Session-based recommendation with graph neural networks. In Proc. of the 33rd AAAI Conf. on Artificial Intelligence (AAAI 2019), Honolulu, HI, USA, pp. 346–353. Cited by: §I-A.
[57] Y. Wu, C. DuBois, A. X. Zheng, and M. Ester (2016) Collaborative denoising auto-encoders for top-N recommender systems. In Proc. 9th ACM Int. Conf. Web Search Data Mining (WSDM 2016), pp. 153–162. Cited by: §II.
[58] J. Xia, D. Li, H. Gu, J. Liu, T. Lu, and N. Gu (2022-Apr.) FIRE: fast incremental recommendation with graph signal processing. In Proc. of the 31st Int. Conf. on World Wide Web (WWW 2022), Lyon, France, pp. 1808–1819. Cited by: §I-A, §I-A, Definition III.1, Definition III.2, §IV-A, §IV-A.
[59] J. Xia, D. Li, H. Gu, T. Lu, P. Zhang, L. Shang, and N. Gu (2024-05) Hierarchical graph signal processing for collaborative filtering. In Proc. of the 33rd Int. Conf. on World Wide Web (WWW 2024), Singapore, pp. 3229–3240. Cited by: §I-A, §I-A, §II, §IV-A, §IV-A, §V-A, §V-A, §V-A.
[60] L. Xia, C. Huang, J. Shi, and Y. Xu (2023-Apr.) Graph-less collaborative filtering. In Proc. of the 32nd Int. Conf. on World Wide Web (WWW 2023), Austin, TX, USA, pp. 17–27. Cited by: §I-A.
[61] H.-J. Xue, X. Dai, J. Zhang, S. Huang, and J. Chen (2017) Deep matrix factorization models for recommender systems. In Proc. 26th Int. Joint Conf. Artif. Intell. (IJCAI 2017), Melbourne, Australia, pp. 3203–3209. Cited by: §II.
[62] J.-H. Yang, C.-M. Chen, C.-J. Wang, and M.-F. Tsai (2018) HOP-rec: high-order proximity for implicit recommendation. In Proc. 12th ACM Conf. Recommender Syst. (RecSys), Vancouver, BC, Canada, pp. 140–144. Cited by: §II.
[63] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec (2018-Aug.) Graph convolutional neural networks for web-scale recommender systems. In Proc. of the 24th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining (KDD 2018), London, United Kingdom, pp. 974–983. Cited by: §II, §V-A, §V-A.
[64] S. Zhang, Y. Liu, Y. Sun, and N. Shah (2022-Apr.) Graph-less neural networks: teaching old mlps new tricks via distillation. In Proc. of the 10th Int. Conf. on Learning Representations (ICLR 2022), Virtual Event. Cited by: §I-A.
[65] L. Zheng, C.-T. Lu, F. Jiang, J. Zhang, and P. S. Yu (2018-Sep.) Spectral collaborative filtering. In Proc. of the 12th ACM Conf. on Recommender Systems (RecSys 2018), Vancouver, BC, Canada, pp. 311–319. Cited by: §II, §V-A, §V-A.
[66] Y. Zhu and Z. Chen (2022) Mutually-regularized dual collaborative variational auto-encoder for recommendation systems. In WWW 2022, Lyon, France, pp. 2379–2387. Cited by: §II.