License: CC BY-NC-ND 4.0
arXiv:2606.22105v1 [math.ST] 20 Jun 2026

A Generative Model for Extremely Sparse
Edge-Exchangeable Networks

Valentin Kilian
Abstract

We propose a graph generative model for sequences of extremely sparse, edge-exchangeable networks. Models for sparse graphs often face a trade-off between desirable properties like exchangeability and the ability to capture the sparsity observed in real-world networks. While models based on vertex or edge exchangeability have successfully generated sparse graphs, achieving the ”extremely sparse” regime, where the number of edges scales near-linearly with the number of nodes, has remained a challenge. Recently, a novel Completely Random Measure (CRM) was introduced, demonstrating that this rate could be achieved within the vertex-exchangeable framework of Caron and Fox. This paper extends this work by demonstrating that this new CRM can be integrated into the alternative edge-exchangeable framework to achieve extreme sparsity.

Machine Learning, ICML

1 Introduction

The proliferation of large-scale network data across diverse scientific domains, from microbiology and ecology to economics and the social sciences, has revealed a set of unifying structural properties. Among these, sparsity stands out as one of the most fundamental. For a simple graph with NN nodes, the number of possible edges scales quadratically, specifically as (N2)=Θ(N2)\binom{N}{2}=\Theta(N^{2}). However, empirical networks rarely approach this theoretical maximum. We formalize this observation by defining a graph sequence as dense if its edge count scales quadratically with its node count, and sparse if the scaling is sub-quadratic. Real-world networks are overwhelmingly sparse. While they also display other important features like power-law degree distributions, the small-world phenomenon, and community structures, our focus here is squarely on the challenge of modeling sparsity.

Constructing a statistically coherent model for sparse networks is notoriously difficult, as sparsity is often incompatible with desirable modeling assumptions. For instance, nodes exchangeability—the property that a model’s distribution is invariant to the relabelling of nodes—is a desirable statistical property for a generative model. However, the Aldous-Hoover theorem (Aldous, 1981; Hoover, 1979) presents a major obstacle, demonstrating that any graph model satisfying this property must be either dense or empty (see also Orbanz and Roy, 2015).

To circumvent this limitation, several distinct modeling paradigms have been developed. Preferential attachment models (Barabási and Albert, 1999; Bollobás, 2001), for example, abandon exchangeability entirely to generate sparse graphs.

Other lines of research have focused on weakening the exchangeability assumption. One prominent approach, initiated by Caron and Fox (Caron and Fox, 2017), substitutes nodes exchangeability with the more flexible notion of Kallenberg exchangeability (Kallenberg, 2005; Veitch and Roy, 2015; Borgs et al., 2018). This framework has spurred a rich body of work modeling various network phenomena, such as overlapping communities (Todeschini et al., 2020; Miscouridou et al., 2026), clustering (Caron et al., 2023), dynamic evolution (Naik et al., 2022), and core-periphery structures (Naik et al., 2021). An alternative strategy involves replacing nodes exchangeability with edge exchangeability (Crane and Dempsey, 2015, 2016; Broderick and Cai, 2016; Cai et al., 2016). Models in this class also achieve sparsity. Both of these approaches employ Bayesian nonparametric tools like Completely Random Measures (CRMs). These models have been shown to generate graph sequences where the number of edges, EE, scales as N1+ϵN^{1+\epsilon} for 0<ϵ<10<\epsilon<1.

A recent breakthrough by Kilian et al. (2025) has redefined the achievable level of sparsity within the Caron-Fox framework. They introduced a novel CRM that produces graph sequences where E=Θ(N(N))E=\Theta(N\ell(N)), with \ell being a slowly varying function. This behavior is termed extremely sparse. This paper builds directly on that insight. Our central contribution is to adapt the methodology from Kilian et al. (2025) to achieve this same extreme sparsity within the edge-exchangeable model class.

The remainder of this paper is organized as follows. Section 2 provides a review of the edge-exchangeable network model. Section 3 and Section 4 contain our primary results: we first generalize the sparsity theorem from Cai et al. (2016) to encompass the extremely sparse regime, and then we introduce the Beta process with rapid variation, demonstrating its utility in constructing an extremely sparse, edge-exchangeable model. Finally, Section 6 presents simulation studies validating our theoretical findings. All deferred proofs and definitions of asymptotic notation are available in the appendix.

2 Edge-exchangeable graph sequences

The notion of an edge-exchangeable graph appeared around 2015, in the work of Crane and Dempsey (Crane and Dempsey, 2015, 2016) and of Broderick, Cai, and Campbell (Broderick and Cai, 2016; Cai et al., 2016), with some results presented in workshops prior to publication. In what follows, we are inspired by the presentation in Cai et al. (2016). This section provides an introduction to the topic; we refer the reader to the aforementioned references for more details.

2.1 Permutation Invariance to Edge Arrival Order

Let (𝒢t)t(\mathcal{G}_{t})_{t} be a sequence of (multi)graphs, where each graph 𝒢t=(𝒱t,t)\mathcal{G}_{t}=(\mathcal{V}_{t},\mathcal{E}_{t}) consists of a finite set of vertices 𝒱t\mathcal{V}_{t} and a finite multiset of timestamped edges t\mathcal{E}_{t}. We assume the sequence is projective (or growing) i.e., 𝒱t𝒱t+1\mathcal{V}_{t}\subseteq\mathcal{V}_{t+1} and tt+1\mathcal{E}_{t}\subseteq\mathcal{E}_{t+1}.

Each (timestamped) edge (e,s)t(e,s)\in\mathcal{E}_{t} is a tuple whose first entry is a set ee of two vertices in 𝒱t\mathcal{V}_{t} and whose second entry is a timestamp ss corresponding to the step at which this edge first appeared. As in the Caron-Fox model, we are only interested in vertices involved in at least one edge. Consequently, we can define 𝒱t\mathcal{V}_{t} as the set of all vertices that appear in the edges of t\mathcal{E}_{t}, in which case the graph 𝒢t\mathcal{G}_{t} is completely defined by its edge set t\mathcal{E}_{t}.

Definition 2.1 (Cai et al., 2016 Definition 2.4).

Consider a random graph sequence (𝒢t)t(\mathcal{G}_{t})_{t} defined as above with t={(e1,s1),,(em,sm)}\mathcal{E}_{t}=\{(e_{1},s_{1}),\dots,(e_{m},s_{m})\}. The sequence (𝒢t)t(\mathcal{G}_{t})_{t} is (infinitely) edge-exchangeable if for every tt\in\mathbb{N} and for every permutation π\pi of the steps {1,,t}\{1,\dots,t\}, 𝒢t=𝑑𝒢~t\mathcal{G}_{t}\overset{d}{=}\tilde{\mathcal{G}}_{t}, where 𝒢~t\tilde{\mathcal{G}}_{t} has the edge set ~t={(e1,π(s1)),,(em,π(sm))}\tilde{\mathcal{E}}_{t}=\{(e_{1},\pi(s_{1})),\dots,(e_{m},\pi(s_{m}))\}.

In other word, the distribution of the generative model is invariant under any finite permutation of the order in which the edges arrived. This model (and its inherent symmetry) can be connected to partition, feature allocation and trait allocation; these connections are explained in Broderick and Cai (2016) and Campbell et al. (2018).

2.2 A Bayesian Nonparametric Model

We now present a graph generative model that exhibits edge-exchangeability, which we will later prove can generate extremely sparse graphs. This model, referred to as the graph frequency model in Cai et al. (2016), is similar to the Caron-Fox model from Caron and Fox (2017) but with several key differences. These differences explain why one model is edge-exchangeable while the other is Kallenberg-exchangeable. A comparison of these two models is provided in Cai et al. (2016), where the authors also prove that the two notions are distinct.

In the graph frequency model, we consider a countably infinite set of latent vertices, indexed by the positive integers \mathbb{N}. Associated with these vertices is an infinite collection of edge labels {θ{i,j}}\{\theta_{\{i,j\}}\} with values in [0,1][0,1] and edge probabilities {w{i,j}}\{w_{\{i,j\}}\} with values in [0,1][0,1]. For any given (potentially random) choice of these labels and probabilities, we define the measure GG on [0,1][0,1] as:

G={i,j}:i,jw{i,j}δθ{i,j}.G=\sum_{\{i,j\}:i,j\in\mathbb{N}}w_{\{i,j\}}\delta_{\theta_{\{i,j\}}}.

If either the labels {θ{i,j}}\{\theta_{\{i,j\}}\} or the probabilities {w{i,j}}\{w_{\{i,j\}}\} (or both) are random, then GG is a discrete random measure on [0,1][0,1]. Given GG, the graph sequence is constructed recursively. We initialize with 0=\mathcal{E}_{0}=\emptyset. Then, at each step tt, we form a new edge set, FnewtF^{t}_{\text{new}}, by sampling the multiplicity m{i,j}tm^{t}_{\{i,j\}} for every possible edge {i,j}\{i,j\} according to:

m{i,j}tBernoulli(w{i,j})m^{t}_{\{i,j\}}\sim\text{Bernoulli}(w_{\{i,j\}})

We then add m{i,j}tm^{t}_{\{i,j\}} copies of the edge {i,j}\{i,j\} with timestamp tt to FnewtF^{t}_{\text{new}}. Finally, the edge set is updated as t+1=tFnewt\mathcal{E}_{t+1}=\mathcal{E}_{t}\cup F^{t}_{\text{new}}. This process generates multigraphs, potentially adding multiple edges at each time step.

Proposition 2.2 (Cai et al., 2016).

The sequence of multigraphs generated via the preceding method is edge-exchangeable.

Proof.

Conditional on GG, the formation of an edge at any step tt is an independent event with an identical probability distribution. The exchangeability of the time stamps follows directly. ∎

This model can be implemented with a random measure GG constructed from a Poisson point process. Let 𝒲\mathcal{W} be a Poisson process on [0,1][0,1] with a non-atomic, σ\sigma-finite rate measure ν\nu that satisfies ν([0,1])=\nu([0,1])=\infty and 01wν(dw)<\int_{0}^{1}w\,\nu(\mathrm{d}w)<\infty111These two conditions on ν\nu ensure that 𝒲\mathcal{W} is a countably infinite collection of weights in [0,1][0,1] and that their sum w𝒲w<\sum_{w\in\mathcal{W}}w<\infty a.s.. We can then use 𝒲\mathcal{W} to define the set of edge probabilities as w{i,j}=wiwjw_{\{i,j\}}=w_{i}w_{j} for iji\neq j, and w{i,i}=0w_{\{i,i\}}=0 (to prevent self-loops), where each wi𝒲w_{i}\in\mathcal{W}. The edge labels θ{i,j}\theta_{\{i,j\}} can be sampled independently and uniformly from [0,1][0,1]. With this setup, GG becomes a homogeneous CRM on [0,1][0,1] with no deterministic or fixed atomic components (Kingman, 1967, 1993; Lijoi and Prünster, 2010). The choice of the CRM GG, and therefore the choice of the measure ν\nu, strongly influences the properties of the resulting edge-exchangeable graph sequence.

While this model generates multigraphs, it can be readily adapted to produce simple graphs 𝒢¯t=(𝒱¯t,¯t)\overline{\mathcal{G}}_{t}=(\overline{\mathcal{V}}_{t},\overline{\mathcal{E}}_{t}). This is done by setting 𝒱¯t=𝒱t\overline{\mathcal{V}}_{t}=\mathcal{V}_{t} and defining ¯t\overline{\mathcal{E}}_{t} as the set of unique edges in t\mathcal{E}_{t} (i.e., those with a multiplicity of at least one). Although the resulting simple graphs do not inherit the edge-exchangeability property, they retain many characteristics of the original multigraphs, most notably their sparsity. In what follows, we consider only multigraphs.

Refer to caption
Figure 1: Edge-exchangeable graph generated from a RapidBeta process with α=1,τ=0.95,ξ=1.2\alpha=1,\tau=0.95,\xi=1.2, and η=15\eta=15.

3 Extreme Sparsity

We denote the number of vertices and edges in the multigraph, respectively, as:

Nt\displaystyle N_{t} =|𝒱t|=i𝟏jiM{i,j}t>0,\displaystyle=|\mathcal{V}_{t}|=\sum_{i}\mathbf{1}_{\sum_{j\neq i}M^{t}_{\{i,j\}}>0},
Nt(e)\displaystyle N^{(e)}_{t} =|t|=12ijM{i,j}t,\displaystyle=|\mathcal{E}_{t}|=\frac{1}{2}\sum_{i\neq j}M^{t}_{\{i,j\}},

where M{i,j}t=p=1tm{i,j}pM^{t}_{\{i,j\}}=\sum_{p=1}^{t}m^{p}_{\{i,j\}} is the multiplicity of the edge {i,j}\{i,j\} at time tt. The following theorem states that an edge-exchangeable model constructed with a measure that has a rapidly varying tail is extremely sparse.

Theorem 3.1.

Assume that, as w0w\downarrow 0,

ν(w)cw2(w1),\displaystyle\nu(w)\sim c\,w^{-2}\ell(w^{-1}), (1)

where c>0c>0, \ell is slowly varying, and

1(x):=xu1(u)du<\displaystyle\ell_{1}(x):=\int_{x}^{\infty}u^{-1}\ell(u)\,\mathrm{d}u<\infty (2)

for all sufficiently large xx. If 1(x1(x))1(x),\ell_{1}(x\ell_{1}(x))\sim\ell_{1}(x), then

Nt(e)=Θ(Nt1(Nt))a.s.\displaystyle N_{t}^{(e)}=\Theta\!\left(\frac{N_{t}}{\ell_{1}(N_{t})}\right)\qquad\text{a.s.} (3)

In particular, if (x)=(logx)a,\ell(x)=(\log x)^{a}, a<1a<-1, then

Nt(e)=Θ(Nt(logNt)a1)a.s.\displaystyle N_{t}^{(e)}=\Theta\!\left(N_{t}(\log N_{t})^{-a-1}\right)\qquad\text{a.s.} (4)

This result extends Theorem 5.3 in Cai et al. (2016) to the extremely sparse regime. The main difference is that we consider rapidly varying measures, whereas they consider regularly varying measures that are not rapidly varying. This crucial distinction allows us to achieve a better sparsity rate. The proof methods rely on conditioning on the point process 𝒲\mathcal{W}, after which the approach is similar to that of (Janson, 2018).

There exist several families of slowly varying functions that satisfy all the required assumptions in the theorem. The most useful one is the logarithmic family (logx)a{(\log x)^{a}} for a<1a<-1. However, these also include more log-based families, such as the iterated log family p(logk(x))p1i=0k1(logi(x))1p{(\log_{k}(x))}^{-p-1}\prod_{i=0}^{k-1}{(\log_{i}(x))}^{-1} for p>0p>0, where logk\log_{k} is the logarithm iterated kk times, and the logarithmic exponential family a(logx)a1exp((logx)a)a{(\log x)}^{a-1}\exp(-{(\log x)}^{a}) for 0<a<1/20<a<1/2. Additionally, there are Lambert WW-based families, such as the Lambert WW family pW(x)p1+W(x)\frac{pW{(x)}^{-p}}{1+W(x)} for p>0p>0, where WW is the principal branch of the LambertWW function, or more generally the iterated LambertWW family pWk(x)pi=1k11+Wi(x)pW_{k}{(x)}^{-p}\prod_{i=1}^{k}\frac{1}{1+W_{i}(x)}, where WkW_{k} is the iterated principal branch of the Lambert WW function.

4 Beta process with rapid variation

We define the Beta process with rapid variation (RapidBeta) as the process whose rate measure ν\nu on [0,1][0,1] has the density:

ν(w)=ηατταsΓ(1s)w1s(1w)ξ1ds,\nu(w)=\frac{\eta}{\alpha-\tau}\int_{\tau}^{\alpha}\frac{s}{\Gamma(1-s)}w^{-1-s}\left(1-w\right)^{\xi-1}\,\mathrm{d}s, (5)

where the parameters satisfy η,ξ>0\eta,\xi>0 and 0τ<α10\leq\tau<\alpha\leq 1. For the special case where η=ξ=1\eta=\xi=1, this measure recovers the mixture of stable densities from Kilian et al. (2025), restricted to the range [0,1][0,1]. Figure 2 displays the RapidBeta density for α=1\alpha=1 and illustrates the effects of the parameters η\eta, τ\tau, and ξ\xi. The effect of α\alpha can be understood by adapting Proposition 1 from Kilian et al. (2025):

Refer to caption
Figure 2: Density of the RapidBeta rate measure ν\nu evaluated at α=1\alpha=1 and η=1\eta=1. Each parameter controls a distinct structural property of the process. The intensity η\eta acts as a global multiplier, scaling the overall mass. The shape parameter ξ\xi dictates the right-tail behavior. Higher values force the density to decay faster as w1w\to 1. Increasing the truncation parameter τ\tau eliminates the lower-order terms in the asymptotic behavior of ν\nu near zero.
Proposition 4.1.

If α=1\alpha=1, the density of the rate measure satisfies:

ν(w)=Θ(w2(ln(1/w))2)as w0.\nu(w)=\Theta\left(w^{-2}(\ln(1/w))^{-2}\right)\quad\text{as }w\to 0.

If α<1\alpha<1, the density of the rate measure satisfies:

ν(w)=Θ(w1α(ln(1/w))1)as w0.\nu(w)=\Theta\left(w^{-1-\alpha}(\ln(1/w))^{-1}\right)\quad\text{as }w\to 0.

Additionally, one can verify that the two necessary conditions for the Poisson process construction hold: ν([0,1])=\nu([0,1])=\infty and 01wν(dw)<\int_{0}^{1}w\nu(\mathrm{d}w)<\infty (see appendix for the proof). It follows that an edge-exchangeable graph sequence constructed using the RapidBeta measure with α=1\alpha=1 satisfies the condition of Theorem 3.1. Therefore, this construction yields an extremely sparse graph sequence.

While the asymptotic sparsity regime of the RapidBeta construction is entirely determined by α\alpha, the lower endpoint τ\tau plays a non-negligible role at finite sample sizes. Indeed, the Lévy density is obtained by integrating contributions of the form w1sw^{-1-s} over s[τ,α]s\in[\tau,\alpha], so that components with larger ss (i.e., closer to 1) dominate the small-ww behavior responsible for the asymptotic regime. However, when τ\tau is substantially below 1, the mixture includes terms with smaller exponents, which decay more slowly and can materially influence the effective behaviour over the range of weights accessible in finite simulations. As a consequence, although all choices τ<1\tau<1 with α=1\alpha=1 are asymptotically equivalent and lead to extreme sparsity, selecting τ\tau close to 1 concentrates the mixture on exponents near the critical boundary and suppresses lower-order contributions, thereby accelerating the emergence of the extreme-sparse regime in practice.

5 Simulation of the RapidBeta Process

To simulate an edge-exchangeable graph sequence as described in Section 2, we must first generate the Poisson point process

𝒲={wi}\mathcal{W}=\{w_{i}\}

When 𝒲\mathcal{W} is a RapidBeta process, the associated Lévy measure ν\nu satisfies ν([0,1])=\nu([0,1])=\infty, implying that 𝒲\mathcal{W} is almost surely infinite. Consequently, exact simulation is impossible, and we instead consider a truncated version.

5.1 Truncated RapidBeta Process

For a fixed ε>0\varepsilon>0, define the truncated Poisson process

𝒲ε={wi𝒲:wi>ε},\mathcal{W}_{\varepsilon}=\{w_{i}\in\mathcal{W}:w_{i}>\varepsilon\}, (6)

which contains only finitely many atoms a.s. Let us define the intensity measure λ\lambda as

λ(ds,dx)=ηατsΓ(1s)x1s(1x)ξ1dsdx,\lambda(\mathrm{d}s,\mathrm{d}x)=\frac{\eta}{\alpha-\tau}\frac{s}{\Gamma(1-s)}x^{-1-s}\left(1-x\right)^{\xi-1}\mathrm{d}s\mathrm{d}x, (7)

Sampling from 𝒲\mathcal{W} is equivalent to sampling a 2D Poisson point process on [τ,α]×[0,1][\tau,\alpha]\times[0,1] with intensity λ\lambda, and discarding the first dimension. Sampling from 𝒲ε\mathcal{W}_{\varepsilon} is performed analogously but with the 2D Poisson point process taken on [τ,α]×[ε,1][\tau,\alpha]\times[\varepsilon,1].

5.2 Partitioned Thinning Scheme

To construct an efficient exact sampler for 𝒲ϵ\mathcal{W}_{\epsilon}, we partition [ε,1][\varepsilon,1] into two regions:

Aε=[ε,1δ],B=(1δ,1],\displaystyle A_{\varepsilon}=[\varepsilon,1-\delta],\qquad B=(1-\delta,1], (8)

where δ(0,1)\delta\in(0,1) is fixed. On AεA_{\varepsilon}, we use the bound

λ(ds,dx)ηατBAx1αdsdx,\lambda(\mathrm{d}s,\mathrm{d}x)\leq\frac{\eta}{\alpha-\tau}B_{A}x^{-1-\alpha}\,\mathrm{d}s\,\mathrm{d}x, (9)

where BA=max{1,δξ1}B_{A}=\max\{1,\delta^{\xi-1}\}. On BB, we use the bound

λ(ds,dx)ηατ(1δ)1α(1x)ξ1dsdx.\lambda(\mathrm{d}s,\mathrm{d}x)\leq\frac{\eta}{\alpha-\tau}(1-\delta)^{-1-\alpha}(1-x)^{\xi-1}\,\mathrm{d}s\,\mathrm{d}x. (10)

These bounds define valid dominating measures on each region, allowing exact sampling via thinning as described in detail in Algorithm 1.

Algorithm 1 Exact sampler for the truncated weight set 𝒲ε={wi>ε}\mathcal{W}_{\varepsilon}=\{w_{i}>\varepsilon\}
0: Parameters η,ξ,τ,α\eta,\xi,\tau,\alpha, truncation level ε\varepsilon, partition parameter δ\delta
1:BAmax{1,δξ1}B_{A}\leftarrow\max\{1,\delta^{\xi-1}\}
2: ComputeΛA=ηBAε1δx1αdx,\Lambda_{A}=\eta B_{A}\int_{\varepsilon}^{1-\delta}x^{-1-\alpha}\,\mathrm{d}x,ΛB=η(1δ)1α1δ1(1x)ξ1dx\Lambda_{B}=\eta(1-\delta)^{-1-\alpha}\int_{1-\delta}^{1}(1-x)^{\xi-1}\,\mathrm{d}x
3: Sample NAPoisson(ΛA)N_{A}\sim\mathrm{Poisson}(\Lambda_{A}) and NBPoisson(ΛB)N_{B}\sim\mathrm{Poisson}(\Lambda_{B})
4:for k=1,,NAk=1,\dots,N_{A} do
5:  Sample SUnif[τ,α]S\sim\mathrm{Unif}[\tau,\alpha]
6:  Sample WW with density proportional to w1αw^{-1-\alpha} on (ε,1δ](\varepsilon,1-\delta]:UUnif[0,1]U\sim\mathrm{Unif}[0,1]W=(εαU(εα(1δ)α))1/αW=\left(\varepsilon^{-\alpha}-U\left(\varepsilon^{-\alpha}-(1-\delta)^{-\alpha}\right)\right)^{-1/\alpha}
7:  Accept with probability SΓ(1S)WαS(1W)ξ1BA\frac{S}{\Gamma(1-S)}\frac{W^{\alpha-S}(1-W)^{\xi-1}}{B_{A}}
8:  if accepted then
9:   Add weight WW to 𝒲ε\mathcal{W}_{\varepsilon}
10:  end if
11:end for
12:for k=1,,NBk=1,\dots,N_{B} do
13:  Sample SUnif[τ,α]S\sim\mathrm{Unif}[\tau,\alpha]
14:  Sample UUnif[0,1]U\sim\mathrm{Unif}[0,1], set W=1δU1/ξW=1-\delta U^{1/\xi}
15:  Accept with probability SΓ(1S)W1S(1δ)1α\frac{S}{\Gamma(1-S)}\frac{W^{-1-S}}{(1-\delta)^{-1-\alpha}}
16:  if accepted then
17:   Add weight WW to 𝒲ε\mathcal{W}_{\varepsilon}
18:  end if
19:end for
20:return 𝒲ε\mathcal{W}_{\varepsilon}

5.3 Correctness

Theorem 5.1.

The set of weights 𝒲ε\mathcal{W}_{\varepsilon} produced by the thinning procedure described above is a realization of a Poisson point process on [ε,1][\varepsilon,1] with intensity ν(w)dw\nu(w)\,\mathrm{d}w i.e. the ε\varepsilon-truncated RapidBeta process.

5.4 Truncation error induced by the ε\varepsilon-approximation

The ε\varepsilon-approximation replaces the full weight collection 𝒲={wi}\mathcal{W}=\{w_{i}\} by the truncated set 𝒲ε\mathcal{W}_{\varepsilon}. A natural measure of the approximation error is the discarded total mass

Rε:=wi𝒲:wiεwi.\displaystyle R_{\varepsilon}:=\sum_{w_{i}\in\mathcal{W}:w_{i}\leq\varepsilon}w_{i}. (11)
Proposition 5.2 (Discarded mass under truncation).

Assume α=1\alpha=1, then, as ε0\varepsilon\rightarrow 0,

𝔼[Rε]=Θ(1ln(1/ε)).\displaystyle\mathbb{E}[R_{\varepsilon}]=\Theta\!\left(\frac{1}{\ln(1/\varepsilon)}\right). (12)

In particular, the total mass discarded by truncating the CRM at level ε\varepsilon vanishes at a logarithmic rate.

5.5 Diagnostics

To asses the quality of our algorithm in practice we perform two diagnostics. The first diagnostic compares the empirical tail counts N(>x)=i𝟏Wi>xN(>x)=\sum_{i}\mathbf{1}_{W_{i}>x}, averaged over repeated samples, to the theoretical tail measure ν¯(x)=x1ν(w)dw\bar{\nu}(x)=\int_{x}^{1}\nu(w)\mathrm{d}w, leveraging the property that 𝔼[N(>x)]=ν¯(x)\mathbb{E}[N(>x)]=\bar{\nu}(x) for a Poisson process. Agreement between empirical and theoretical curves indicates that the marginal tail behavior and intensity are correctly captured. The second diagnostic examines the transformed order statistics by plotting ν¯(W(k))\bar{\nu}(W_{(k)}) against the rank k. For a correctly specified Poisson point process, these transformed values align with the identity line, reflecting the fact that the ordered points map to approximately unit-rate arrivals. Together, these diagnostics provide complementary validation: the former targets first-moment properties of the tail, while the latter probes the global distributional and structural consistency of the point process. In practice, our algorithm provides good performance; see Figure 3.

Refer to caption
Refer to caption
Figure 3: Diagnostics of the performance of our sampling algorithm, computed over 5 repetitions for a RapidBeta process with α=1,τ=0.95,ξ=1.2\alpha=1,\tau=0.95,\xi=1.2, and η=15\eta=15 and ε=107\varepsilon=10^{-7}. We observe good performance.

6 Simulations

Once the weight generation procedure is available, we can generate the sequence of graphs following the graph frequency model (see Algorithm 2).

Algorithm 2 Graph Frequency Generative Model
1:Input: Weights 𝐰=(w1,,wN)\mathbf{w}=(w_{1},\dots,w_{N}); Time step TT.
2:Output: An N×NN\times N edge multiplicity matrix 𝐀\mathbf{A}.
3: Let NN be the number of nodes, i.e., the length of 𝐰\mathbf{w}.
4: Initialize 𝐀\mathbf{A} as an N×NN\times N matrix of zeros.
5:for i=1,,Ni=1,\dots,N do
6:  for j=i+1,,Nj=i+1,\dots,N do
7:   pijwiwjp_{ij}\leftarrow w_{i}w_{j} {Connection probability}
8:   AijBinomial(T,pij)A_{ij}\sim\text{Binomial}(T,p_{ij}) {Number of edges}
9:   AjiAijA_{ji}\leftarrow A_{ij} {Ensure symmetry for an undirected graph}
10:  end for
11:end for
12:return 𝐀\mathbf{A}

We compare our method to graph sequences generated by a three-parameter Beta process, as described in (Cai et al., 2016). We sample the weights of the three-parameter Beta process using a Poisson thinning construction based on its Lévy measure. Recall that the Beta process is a discrete random measure whose atom weights are generated by a Poisson point process with Lévy density

ν(dw)=γΓ(1+β)Γ(1α)Γ(β+α)w1α(1w)β+α1dw\nu(\mathrm{d}w)=\frac{\gamma\Gamma(1+\beta)}{\Gamma(1-\alpha)\Gamma(\beta+\alpha)}w^{-1-\alpha}(1-w)^{\beta+\alpha-1}\mathrm{d}w (13)

where α(0,1)\alpha\in(0,1) and β,γ>0\beta,\gamma>0. We simulate the associated Poisson process by first drawing candidate jumps from a dominating stable Lévy density proportional to w1αw^{-1-\alpha}, whose inverse tail admits a closed-form expression. Each proposed jump is then accepted with probability (1w)β+α1(1-w)^{\beta+\alpha-1}, yielding an exact thinning scheme whenever β+α10\beta+\alpha-1\geq 0. This approach avoids the numerical instabilities of stick-breaking representations (Broderick et al., 2012) when α\alpha is close to one, while preserving the correct distribution of large weights.

The simulations are displayed in Figure 4, as expected, the RapidBeta process produces sparser graph sequences than the three Beta process. We also provide a comparison to the Barabási-Albert model (Barabási and Albert, 1999), which produces even sparser sequences (Nt(e)=Θ(Nt)N^{(e)}_{t}=\Theta(N_{t})) but lacks exchangeability.

Refer to caption
Refer to caption
Figure 4: (Right: number of edges (Nt(e)N_{t}^{(e)}) versus number of nodes (NtN_{t}) for the RapidBeta model (\bullet) was simulated with parameters α=1,τ=0.95,ξ=1.2\alpha=1,\tau=0.95,\xi=1.2, and η=15\eta=15. This is compared against the three-parameter Beta process model (\blacksquare) with parameters γ=4,β=4,α=0.7\gamma=4,\beta=4,\alpha=0.7, and the Barabási–Albert model (\blacktriangle). Simulations were run for tt ranging from 30 to 300. Each point represents the mean over ten independent graph sequence samples. Left: same plot with all data shifted so the first points coincide which eases the comparison of the slopes in log-log scale.

The code used to generate the edge-exchangeable graphs can be found at: https://github.com/ValentinKil/ExtremelySparseEdgesExchangeable.git.

7 Conclusion

In this work, we presented a graph generative model for edge-exchangeable networks. We first showed that the extremely sparse regime can be achieved through a sequence of edge-exchangeable graphs. We then proposed a practical generative model that enables the simulation of these sequences. Our experimental results demonstrate that, despite necessary computational approximations, the model effectively captures the extremely sparse regime.

While Li and Campbell (2021) provide rigorous quantitative bounds on the error for fixed-rank truncation of edge-exchangeable networks, equivalent theoretical guarantees for the threshold ε\varepsilon-truncation employed in our sampler are not yet available. Extending their error analysis to weight-based thresholds is a promising direction for future work.

Extreme sparsity for edge-exchangeable simple graphs has previously been studied by Janson (2018), who proposed two examples that reach exact linear dependence (Nt(e)=Θ(Nt)N^{(e)}_{t}=\Theta(N_{t})). However, the produced graphs exhibit some unwanted properties, such as bounded degrees or a structure composed mostly of stars. By relaxing the strict linearity requirement and accepting linear scaling up to a slowly varying function, we are able to generate a much richer, more realistic internal graph structure with the RapidBeta process; indeed, experiments show the emergence of a giant connected component. A direct application of Theorem 14 in Eriksson (2025) demonstrates that the RapidBeta graphs are almost surely not eventually forever connected, meaning there will always be some edges that do not join the giant component. To the best of our knowledge, there are no more precise results regarding the giant component of edge-exchangeable graphs, although Eriksson (2025) suggests some interesting directions.

Other properties of the generative model, such as the power-law behaviour of the degree distribution, the clustering coefficient, and the small-world property, remain to be explored. Additionally, developing inference procedures for RapidBeta edge-exchangeable graphs is a compelling avenue for future research.

Acknowledgements

I would like to thank François Caron for his insightful advice and thorough proofreading of this manuscript. I am funded by the Clarendon Scholarship.

References

  • D. J. Aldous (1981) Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis 11 (4), pp. 581–598. External Links: ISSN 0047-259X, Document Cited by: §1.
  • A. Barabási and R. Albert (1999) Emergence of scaling in random networks. Science 286 (5439), pp. 509–512. Cited by: §1, §6.
  • N. H. Bingham, C. M. Goldie, and J. L. Teugels (1987) Regular variation. Vol. 27, Cambridge university press. Cited by: §G.4, §G.4, Proposition G.3, Proposition G.5, Theorem G.6, Lemma G.9, Appendix G.
  • B. Bollobás (2001) Random Graphs. Cambridge University Press. External Links: ISBN 978-0-521-79722-1 Cited by: §1.
  • C. Borgs, J. T. Chayes, H. Cohn, and N. Holden (2018) Sparse exchangeable graphs and their limits via graphon processes. JMLR 18 (210), pp. 1–71. Cited by: §1.
  • T. Broderick and D. Cai (2016) Edge-exchangeable graphs and sparsity. arXiv:1603.06898. External Links: 1603.06898 Cited by: §1, §2.1, §2.
  • T. Broderick, M. I. Jordan, and J. Pitman (2012) Beta Processes, Stick-Breaking and Power Laws. Bayesian Analysis 7 (2), pp. 439–476. External Links: ISSN 1936-0975, 1931-6690 Cited by: §6.
  • D. Cai, T. Campbell, and T. Broderick (2016) Edge-exchangeable graphs and sparsity. In Advances in Neural Information Processing Systems, Vol. 29. Cited by: §1, §1, §2.2, Definition 2.1, Proposition 2.2, §2, §3, §6.
  • T. Campbell, D. Cai, and T. Broderick (2018) Exchangeable trait allocations. Electronic Journal of Statistics 12 (2), pp. 2290–2322. External Links: ISSN 1935-7524, 1935-7524 Cited by: §2.1.
  • F. Caron and E. Fox (2017) Sparse graphs using exchangeable random measures. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 79, pp. 1295–1366. Cited by: §1, §2.2.
  • F. Caron, F. Panero, and J. Rousseau (2023) On sparsity, power-law, and clustering properties of graphex processes. Advances in Applied Probability 55 (4), pp. 1211–1253. External Links: ISSN 0001-8678, 1475-6064 Cited by: §1.
  • H. Crane and W. Dempsey (2015) A framework for statistical network modeling. arXiv:1509.08185. External Links: 1509.08185 Cited by: §1, §2.
  • H. Crane and W. Dempsey (2016) Edge exchangeable models for network data. arXiv:1603.04571. External Links: 1603.04571 Cited by: §1, §2.
  • E. Eriksson (2025) Edge Exchangeable Graphs: Connectedness, Gaussianity and Completeness. arXiv:2501.09511. External Links: 2501.09511, Document Cited by: §7.
  • W. Feller (1971) An introduction to probability theory and its applications. Vol. 2, John Wiley & Sons. Cited by: §G.4.
  • D. A. Freedman (1975) On Tail Probabilities for Martingales. The Annals of Probability 3 (1), pp. 100–118. External Links: ISSN 0091-1798, 2168-894X, Document Cited by: Theorem H.2.
  • A. Gnedin, B. Hansen, and J. Pitman (2007) Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws. Probab. Surv 4 (146-171), pp. 88. Cited by: Lemma G.8.
  • D. N. Hoover (1979) Relations on Probability Spaces and Arrays of Random Variables. Institute for Advanced Study, Princeton. External Links: NJ08540 Cited by: §1.
  • S. Janson (2018) On Edge Exchangeable Random Graphs. Journal of Statistical Physics 173 (3), pp. 448–484. External Links: Document, ISSN 1572-9613 Cited by: Appendix C, §3, §7.
  • O. Kallenberg (2005) Probabilistic Symmetries and Invariance Principles. Springer Science & Business Media. External Links: ISBN 978-0-387-28861-1 Cited by: §1.
  • V. Kilian, B. Guedj, and F. Caron (2025) Rapidly Varying Completely Random Measures for Modeling Extremely Sparse Networks. arXiv:2505.13206. Cited by: §1, §4.
  • J.F.C. Kingman (1967) Completely random measures. Pacific Journal of Mathematics 21 (1), pp. 59–78. Cited by: §2.2.
  • J.F.C. Kingman (1993) Poisson processes. Vol. 3, Oxford University Press, USA. Cited by: §F.1.1, §2.2.
  • X. Li and T. Campbell (2021) Truncated simulation and inference in edge-exchangeable networks. Electronic Journal of Statistics 15 (2), pp. 5117–5157. External Links: ISSN 1935-7524, 1935-7524 Cited by: §7.
  • A. Lijoi and I. Prünster (2010) Models beyond the Dirichlet process. In Bayesian Nonparametrics, N. L. Hjort, C. Holmes, P. Müller, and S. G. Walker (Eds.), Cited by: §2.2.
  • X. Miscouridou, F. Panero, and A. Laos (2026) Dynamic sparse graphs with overlapping communities. arXiv:2512.10717. Cited by: §1.
  • M. Mitzenmacher and E. Upfal (2005) Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press. External Links: ISBN 978-0-521-83540-4 Cited by: Appendix H.
  • C. Naik, F. Caron, J. Rousseau, Y. W. Teh, and K. Palla (2022) Bayesian Nonparametrics for Sparse Dynamic Networks. arXiv. External Links: 1607.01624, Document Cited by: §1.
  • C. Naik, F. Caron, and J. Rousseau (2021) Sparse networks with core-periphery structure. Electronic Journal of Statistics 15 (1), pp. 1814–1868. External Links: ISSN 1935-7524, 1935-7524 Cited by: §1.
  • P. Orbanz and D. M. Roy (2015) Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures. IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (2), pp. 437–461. External Links: ISSN 0162-8828, 2160-9292, Document Cited by: §1.
  • A. Todeschini, X. Miscouridou, and F. Caron (2020) Exchangeable random measures for sparse and modular graphs with overlapping communities. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82 (2), pp. 487–520. External Links: Document, ISSN 1467-9868 Cited by: §1.
  • J. Tropp (2011) Freedman’s inequality for matrix martingales. Electronic Communications in Probability 16 (none). External Links: ISSN 1083-589X, Document Cited by: Theorem H.2.
  • V. Veitch and D. M. Roy (2015) The Class of Random Graphs Arising from Exchangeable Random Measures. arXiv:1512.03099. Cited by: §1.

Appendix A Asymptotic notation

Throughout this article, we write:

  • f(x)=xaO(g(x))f(x)\underset{x\to a}{=}O(g(x)) if there exists C>0C>0 such that, for xx in a neighbourhood of aa, |f(x)|C|g(x)||f(x)|\leq C|g(x)|;

  • f(x)=xaΩ(g(x))f(x)\underset{x\to a}{=}\Omega(g(x)) if there exists C>0C>0 such that, for xx in a neighbourhood of aa, |f(x)|C|g(x)||f(x)|\geq C|g(x)|;

  • f(x)=xaΘ(g(x))f(x)\underset{x\to a}{=}\Theta(g(x)) if both f(x)=xaO(g(x))f(x)\underset{x\to a}{=}O(g(x)) and f(x)=xaΩ(g(x))f(x)\underset{x\to a}{=}\Omega(g(x)) hold;

  • f(x)xag(x)f(x)\underset{x\to a}{\sim}g(x) if limxaf(x)/g(x)=1\lim_{x\to a}f(x)/g(x)=1.

where a{,+}a\in\mathbb{R}\cup\{-\infty,+\infty\}. When f(x)f(x) and g(x)g(x) are random variables, the notation indicates that the relation holds almost surely. This usage of \sim should not be confused with the standard distributional notation X𝒟X\sim\mathcal{D}, which indicates that the random variable XX has distribution 𝒟\mathcal{D}.

Appendix B Main lemma

Lemma B.1.

Let (𝒢t)t0(\mathcal{G}_{t})_{t\geq 0} be the edge-exchangeable multigraph sequence constructed from the weights 𝒲={wi}i1\mathcal{W}=\{w_{i}\}_{i\geq 1}, and let (Nt)t0(N_{t})_{t\geq 0} and (Nt(e))t0(N^{(e)}_{t})_{t\geq 0} denote its associated vertex and edge counts. For t1t\geq 1, define the quenched means

μ𝒲(t)\displaystyle\mu_{\mathcal{W}}(t) :=𝔼(Nt𝒲)=i[1ji(1wiwj)t],\displaystyle:=\mathbb{E}\left(N_{t}\mid\mathcal{W}\right)=\sum_{i}\left[1-\prod_{j\neq i}(1-w_{i}w_{j})^{t}\right],
μ𝒲(e)(t)\displaystyle\mu^{(e)}_{\mathcal{W}}(t) :=𝔼(Nt(e)𝒲)=ti<jwiwj.\displaystyle:=\mathbb{E}\left(N_{t}^{(e)}\mid\mathcal{W}\right)=t\sum_{i<j}w_{i}w_{j}.

Then, for almost every realization of 𝒲\mathcal{W} such that, for every ϵ>0\epsilon>0,

t=1exp{ϵ2μ𝒲(t)24(2μ𝒲(e)(t)+ϵμ𝒲(t)/3)}\displaystyle\sum_{t=1}^{\infty}\exp\left\{-\frac{\epsilon^{2}\mu_{\mathcal{W}}(t)^{2}}{4(2\mu^{(e)}_{\mathcal{W}}(t)+\epsilon\mu_{\mathcal{W}}(t)/3)}\right\} <,\displaystyle<\infty, (14)

we have

Nta.s.𝔼(Nt𝒲),Nt(e)a.s.𝔼(Nt(e)𝒲).\displaystyle N_{t}\overset{\mathrm{a.s.}}{\sim}\mathbb{E}\left(N_{t}\mid\mathcal{W}\right),\qquad N_{t}^{(e)}\overset{\mathrm{a.s.}}{\sim}\mathbb{E}\left(N_{t}^{(e)}\mid\mathcal{W}\right).
Proof.

Fix a realization of 𝒲\mathcal{W} such that

iwi<.\displaystyle\sum_{i}w_{i}<\infty. (15)

This event has probability one by the assumptions on ν\nu. Throughout the proof, we work conditionally on this realization of 𝒲\mathcal{W}, so the numbers (wi)i1(w_{i})_{i\geq 1} are deterministic.

Write

pij:=wiwj,i<j.\displaystyle p_{ij}:=w_{i}w_{j},\qquad i<j. (16)

Since iwi<\sum_{i}w_{i}<\infty, we have

i<jpij12(iwi)2<.\displaystyle\sum_{i<j}p_{ij}\leq\frac{1}{2}\left(\sum_{i}w_{i}\right)^{2}<\infty. (17)

In particular, the conditional means below are finite for every fixed tt.

Conditionally on 𝒲\mathcal{W}, the edge multiplicities M{i,j}tM^{t}_{\{i,j\}}, i<ji<j, are independent and satisfy

M{i,j}tBinom(t,pij).\displaystyle M^{t}_{\{i,j\}}\sim\operatorname{Binom}(t,p_{ij}). (18)

Therefore

Nt(e)=12ijM{i,j}t=i<jM{i,j}t.\displaystyle N_{t}^{(e)}=\frac{1}{2}\sum_{i\neq j}M^{t}_{\{i,j\}}=\sum_{i<j}M^{t}_{\{i,j\}}. (19)

Hence

μ𝒲(e)(t)=𝔼(Nt(e)𝒲)=ti<jpij.\displaystyle\mu^{(e)}_{\mathcal{W}}(t)=\mathbb{E}\left(N_{t}^{(e)}\mid\mathcal{W}\right)=t\sum_{i<j}p_{ij}. (20)
Edge count.

For each i<ji<j, write

M{i,j}t=s=1tBij(s),\displaystyle M^{t}_{\{i,j\}}=\sum_{s=1}^{t}B^{(s)}_{ij}, (21)

where the variables Bij(s)B^{(s)}_{ij} are independent Bernoulli random variables with success probability pijp_{ij}. Thus, Nt(e)N_{t}^{(e)} is a countable sum of independent Bernoulli random variables with total mean μ𝒲(e)(t)\mu^{(e)}_{\mathcal{W}}(t). Applying Theorem H.1, we obtain, for every ϵ(0,1)\epsilon\in(0,1),

(|Nt(e)μ𝒲(e)(t)|>ϵμ𝒲(e)(t)|𝒲)\displaystyle\mathbb{P}\left(\left|N_{t}^{(e)}-\mu^{(e)}_{\mathcal{W}}(t)\right|>\epsilon\mu^{(e)}_{\mathcal{W}}(t)\,\middle|\,\mathcal{W}\right) 2exp{ϵ23μ𝒲(e)(t)}\displaystyle\leq 2\exp\left\{-\frac{\epsilon^{2}}{3}\mu^{(e)}_{\mathcal{W}}(t)\right\}
2exp{ϵ23ti<jpij}.\displaystyle\leq 2\exp\left\{-\frac{\epsilon^{2}}{3}t\sum_{i<j}p_{ij}\right\}.

Since i<jpij>0\sum_{i<j}p_{ij}>0 almost surely, the right-hand side is summable in tt. Hence, by the Borel–Cantelli lemma,

Nt(e)μ𝒲(e)(t)1\displaystyle\frac{N_{t}^{(e)}}{\mu^{(e)}_{\mathcal{W}}(t)}\longrightarrow 1 (22)

almost surely, conditionally on 𝒲\mathcal{W}.

Vertex count.

The vertex indicators are not conditionally independent. Indeed, the indicators that vertices ii and jj are active both depend on the edge variable M{i,j}tM^{t}_{\{i,j\}}. Therefore, one cannot prove the vertex part using the same Chernoff bound as for the edge count.

For i1i\geq 1, define

Ai,t:=𝟏{jiM{i,j}t>0}.\displaystyle A_{i,t}:=\mathbf{1}\left\{\sum_{j\neq i}M^{t}_{\{i,j\}}>0\right\}. (23)

Then

Nt=iAi,t,μ𝒲(t)=i𝔼(Ai,t𝒲)=i[1ji(1pij)t].\displaystyle N_{t}=\sum_{i}A_{i,t},\qquad\mu_{\mathcal{W}}(t)=\sum_{i}\mathbb{E}(A_{i,t}\mid\mathcal{W})=\sum_{i}\left[1-\prod_{j\neq i}(1-p_{ij})^{t}\right]. (24)

We shall use a martingale bounded-differences argument.

First, truncate the edge set to the finite collection

m:={(i,j):1i<jm}.\displaystyle\mathcal{I}_{m}:=\{(i,j):1\leq i<j\leq m\}. (25)

Let Nt(m)N_{t}(m) be the number of active vertices in the graph obtained by keeping only the edge indicators with (i,j)m(i,j)\in\mathcal{I}_{m}, and let

μ𝒲(t,m):=𝔼(Nt(m)𝒲).\displaystyle\mu_{\mathcal{W}}(t,m):=\mathbb{E}\left(N_{t}(m)\mid\mathcal{W}\right). (26)

The variables {M{i,j}t:(i,j)m}\{M^{t}_{\{i,j\}}:(i,j)\in\mathcal{I}_{m}\} are independent. Reveal them one at a time, in any deterministic order, and let (Zk)(Z_{k}) be the Doob martingale

Zk:=𝔼(Nt(m)|𝒲,Me1t,,Mekt),\displaystyle Z_{k}:=\mathbb{E}\left(N_{t}(m)\,\middle|\,\mathcal{W},M^{t}_{e_{1}},\ldots,M^{t}_{e_{k}}\right), (27)

where e1,,e|m|e_{1},\ldots,e_{|\mathcal{I}_{m}|} is the chosen ordering of m\mathcal{I}_{m}.

Changing a single edge multiplicity can change the number of active vertices by at most 22, because only the two endpoints of that edge can change their active/inactive status. Hence, the martingale increments satisfy

|ZkZk1|2.\displaystyle|Z_{k}-Z_{k-1}|\leq 2. (28)

Moreover, if ek=(i,j)e_{k}=(i,j), then the conditional variance of the kkth martingale increment is bounded by

𝔼[(ZkZk1)2Z0:k1]4tpij.\displaystyle\mathbb{E}[(Z_{k}-Z_{k-1})^{2}\mid Z_{0:k-1}]\leq 4tp_{ij}. (29)

Indeed, conditionally on the previously revealed variables, the only remaining randomness in this step is the binomial variable Mi,jtM^{t}_{{i,j}}. The functional Nt(m)N_{t}(m) depends on this variable only through whether Mi,jt=0M^{t}_{{i,j}}=0 or Mi,jt>0M^{t}_{{i,j}}>0. Thus the conditional expectation after revealing Mi,jtM^{t}_{{i,j}} can take two possible values, whose difference is at most 22, since changing the status of the edge i,j{i,j} can affect only the two endpoints ii and jj.

Therefore, the predictable quadratic variation of the martingale is bounded by

(i,j)m4tpij4μ𝒲(e)(t).\displaystyle\sum_{(i,j)\in\mathcal{I}_{m}}4tp_{ij}\leq 4\mu^{(e)}_{\mathcal{W}}(t). (30)

We can therefore apply Freedman’s inequality (Theorem H.2) to the martingale

Yk:=Zkμ𝒲(t,m),\displaystyle Y_{k}:=Z_{k}-\mu_{\mathcal{W}}(t,m), (31)

with R=2R=2 and σ2=4μ𝒲(e)(t)\sigma^{2}=4\mu^{(e)}_{\mathcal{W}}(t). For every s>0s>0,

(Nt(m)μ𝒲(t,m)s𝒲)exp{s22(4μ𝒲(e)(t)+2s/3)}.\displaystyle\mathbb{P}\!\left(N_{t}(m)-\mu_{\mathcal{W}}(t,m)\geq s\mid\mathcal{W}\right)\leq\exp\left\{-\frac{s^{2}}{2\left(4\mu^{(e)}_{\mathcal{W}}(t)+2s/3\right)}\right\}. (32)

Applying the same argument to Yk-Y_{k} yields the lower-tail bound

(Nt(m)μ𝒲(t,m)s𝒲)exp{s22(4μ𝒲(e)(t)+2s/3)}.\displaystyle\mathbb{P}\!\left(N_{t}(m)-\mu_{\mathcal{W}}(t,m)\leq-s\mid\mathcal{W}\right)\leq\exp\left\{-\frac{s^{2}}{2\left(4\mu^{(e)}_{\mathcal{W}}(t)+2s/3\right)}\right\}. (33)

Combining the two inequalities, we obtain

(|Nt(m)μ𝒲(t,m)|s|𝒲)2exp{s22(4μ𝒲(e)(t)+2s/3)}.\displaystyle\mathbb{P}\left(\left|N_{t}(m)-\mu_{\mathcal{W}}(t,m)\right|\geq s\,\middle|\,\mathcal{W}\right)\leq 2\exp\left\{-\frac{s^{2}}{2\left(4\mu^{(e)}_{\mathcal{W}}(t)+2s/3\right)}\right\}. (34)

Since

Nt(m)μ𝒲(t,m)Ntμ𝒲(t)almost surely,\displaystyle N_{t}(m)-\mu_{\mathcal{W}}(t,m)\to N_{t}-\mu_{\mathcal{W}}(t)\qquad\text{almost surely}, (35)

it follows that

{|Ntμ𝒲(t)|>s}lim infm{|Nt(m)μ𝒲(t,m)|>s}.\displaystyle\{|N_{t}-\mu_{\mathcal{W}}(t)|>s\}\subseteq\liminf_{m\to\infty}\{|N_{t}(m)-\mu_{\mathcal{W}}(t,m)|>s\}. (36)

Therefore, by Fatou’s lemma,

(|Ntμ𝒲(t)|>s|𝒲)lim infm(|Nt(m)μ𝒲(t,m)|>s|𝒲).\displaystyle\mathbb{P}\!\left(|N_{t}-\mu_{\mathcal{W}}(t)|>s\,\middle|\,\mathcal{W}\right)\leq\liminf_{m\to\infty}\mathbb{P}\!\left(|N_{t}(m)-\mu_{\mathcal{W}}(t,m)|>s\,\middle|\,\mathcal{W}\right). (37)

Applying the previous Freedman bound yields

(|Ntμ𝒲(t)|>s|𝒲)2exp{s22(4μ𝒲(e)(t)+2s/3)}.\displaystyle\mathbb{P}\!\left(|N_{t}-\mu_{\mathcal{W}}(t)|>s\,\middle|\,\mathcal{W}\right)\leq 2\exp\!\left\{-\frac{s^{2}}{2\bigl(4\mu^{(e)}_{\mathcal{W}}(t)+2s/3\bigr)}\right\}. (38)

Taking s=ϵμ𝒲(t)s=\epsilon\mu_{\mathcal{W}}(t), we obtain

(|Ntμ𝒲(t)|>ϵμ𝒲(t)|𝒲)2exp{ϵ2μ𝒲(t)22(4μ𝒲(e)(t)+2ϵμ𝒲(t)/3)}.\displaystyle\mathbb{P}\left(\left|N_{t}-\mu_{\mathcal{W}}(t)\right|>\epsilon\mu_{\mathcal{W}}(t)\,\middle|\,\mathcal{W}\right)\leq 2\exp\left\{-\frac{\epsilon^{2}\mu_{\mathcal{W}}(t)^{2}}{2\left(4\mu^{(e)}_{\mathcal{W}}(t)+2\epsilon\mu_{\mathcal{W}}(t)/3\right)}\right\}. (39)

By the summability assumption (14) and the Borel–Cantelli lemma,

Ntμ𝒲(t)1\displaystyle\frac{N_{t}}{\mu_{\mathcal{W}}(t)}\longrightarrow 1 (40)

almost surely, conditionally on 𝒲\mathcal{W}. ∎

Appendix C Proof of Theorem 3.1

Conditionally on the weights 𝒲\mathcal{W}, our model is closely related to the rank-1 edge-exchangeable multigraph defined by (Janson, 2018). As a result, the proof strategy is formally parallel to that of (Janson, 2018): edge variables are independent, while vertex indicators are dependent only through shared incident edges.

We prove the following sharper result.

Theorem C.1.

Assume that, as w0w\downarrow 0, we have ν(w)cw2(w1),\nu(w)\sim c\,w^{-2}\ell(w^{-1}), where c>0c>0, \ell is slowly varying, and

1(x):=xu1(u)du<\displaystyle\ell_{1}(x):=\int_{x}^{\infty}u^{-1}\ell(u)\,\mathrm{d}u<\infty (41)

for all sufficiently large xx. Let

S:=iwi,L:=i<jwiwj.\displaystyle S:=\sum_{i}w_{i},\qquad L:=\sum_{i<j}w_{i}w_{j}. (42)

Then 0<S<0<S<\infty and 0<L<0<L<\infty a.s., and

Nt(e)\displaystyle N_{t}^{(e)} tLa.s.\displaystyle\sim tL\qquad\text{a.s.} (43)
Nt\displaystyle N_{t} cSt1(t)a.s.\displaystyle\sim cS\,t\ell_{1}(t)\qquad\text{a.s.} (44)

Consequently, if ϕ(x):=x1(x)\phi(x):=x\ell_{1}(x), then

Nt(e)=Θ(ϕ1(Nt))a.s.N_{t}^{(e)}=\Theta\!\left(\phi^{-1}(N_{t})\right)\qquad\text{a.s.} (45)

If moreover 1(x1(x))1(x),\ell_{1}(x\ell_{1}(x))\sim\ell_{1}(x), then

Nt(e)=Θ(Nt1(Nt))a.s.N_{t}^{(e)}=\Theta\!\left(\frac{N_{t}}{\ell_{1}(N_{t})}\right)\qquad\text{a.s.} (46)

In particular, if (x)=(logx)a\ell(x)=(\log x)^{a} with a<1a<-1, then

Nt(e)=Θ(Nt(logNt)a1)a.s.N_{t}^{(e)}=\Theta\!\left(N_{t}(\log N_{t})^{-a-1}\right)\qquad\text{a.s.} (47)
Proof.

We write 𝒲={wi}i1\mathcal{W}=\{w_{i}\}_{i\geq 1} for the Poisson process of weights. Since 01wν(dw)<,\int_{0}^{1}w\,\nu(\mathrm{d}w)<\infty, we have almost surely, 0<S:=iwi<0<S:=\sum_{i}w_{i}<\infty. Moreover,

0<L:=i<jwiwj=12(S2iwi2)<a.s.\displaystyle 0<L:=\sum_{i<j}w_{i}w_{j}=\frac{1}{2}\left(S^{2}-\sum_{i}w_{i}^{2}\right)<\infty\qquad\text{a.s.} (48)
Preliminary.

We first evaluate the asymptotic behaviour of the realised Poisson process.

Tλ:=i(1eλwi),λ>0.\displaystyle T_{\lambda}:=\sum_{i}\left(1-e^{-\lambda w_{i}}\right),\qquad\lambda>0. (49)

Let us show that

Tλcλ1(λ)a.s.T_{\lambda}\sim c\lambda\ell_{1}(\lambda)\qquad\text{a.s.} (50)

Let m(λ):=𝔼[Tλ]m(\lambda):=\mathbb{E}[T_{\lambda}]. By Campbell’s theorem,

m(λ)=0(1eλw)ν(dw).\displaystyle m(\lambda)=\int_{0}^{\infty}\left(1-e^{-\lambda w}\right)\nu(\mathrm{d}w). (51)

For any ε>0\varepsilon>0, the integral over [ε,)[\varepsilon,\infty) is bounded by ν([ε,))<\nu([\varepsilon,\infty))<\infty. Since λ1(λ)\lambda\ell_{1}(\lambda)\to\infty as λ\lambda\to\infty, this bounded term is o(λ1(λ))o(\lambda\ell_{1}(\lambda)). Thus, it is sufficient to analyse the behaviour near 0.

By hypothesis, for any δ>0\delta>0, there exists ε>0\varepsilon>0 such that for all w(0,ε]w\in(0,\varepsilon]:

(1δ)cw2(w1)ν(w)(1+δ)cw2(w1).\displaystyle(1-\delta)cw^{-2}\ell(w^{-1})\leq\nu(w)\leq(1+\delta)cw^{-2}\ell(w^{-1}). (52)

Let I(λ)=0ε(1eλw)w2(w1)dwI(\lambda)=\int_{0}^{\varepsilon}\left(1-e^{-\lambda w}\right)w^{-2}\ell(w^{-1})\,\mathrm{d}w. With the change of variables u=w1u=w^{-1}, we obtain

I(λ)=1/ε(1eλ/u)(u)du.\displaystyle I(\lambda)=\int_{1/\varepsilon}^{\infty}\left(1-e^{-\lambda/u}\right)\ell(u)\,\mathrm{d}u. (53)

We partition this integral at u=λu=\lambda, assuming that λ\lambda is sufficiently large so that λ>1/ε\lambda>1/\varepsilon.

For the first part, u[1/ε,λ]u\in[1/\varepsilon,\lambda], we use the bound 1eλ/u11-e^{-\lambda/u}\leq 1 to obtain

1/ελ(1eλ/u)(u)du1/ελ(u)du.\displaystyle\int_{1/\varepsilon}^{\lambda}\left(1-e^{-\lambda/u}\right)\ell(u)\,\mathrm{d}u\leq\int_{1/\varepsilon}^{\lambda}\ell(u)\,\mathrm{d}u. (54)

Let us show that this upper bound is o(λ1(λ))o(\lambda\ell_{1}(\lambda)).First by Proposition G.3, 1/ελ(u)duλ(λ).\int_{1/\varepsilon}^{\lambda}\ell(u)\,\mathrm{d}u\sim\lambda\ell(\lambda). We must now establish that limλ(λ)/1(λ)=0\lim_{\lambda\to\infty}\ell(\lambda)/\ell_{1}(\lambda)=0. Recall that 1(λ)=λu1(u)du\ell_{1}(\lambda)=\int_{\lambda}^{\infty}u^{-1}\ell(u)\,\mathrm{d}u, and by assumption, this integral is finite. According to Proposition G.5, since \ell is slowly varying and its tail integral 1\ell_{1} converges, 1\ell_{1} is also slowly varying and satisfies:

limλ1(λ)(λ)=limλ1(λ)λ(u)udu=.\displaystyle\lim_{\lambda\to\infty}\frac{\ell_{1}(\lambda)}{\ell(\lambda)}=\lim_{\lambda\to\infty}\frac{1}{\ell(\lambda)}\int_{\lambda}^{\infty}\frac{\ell(u)}{u}\,\mathrm{d}u=\infty. (55)

By inverting this relationship, we conclude that the first integral is bounded by o(λ1(λ))o(\lambda\ell_{1}(\lambda)).

For the second part, u>λu>\lambda, we apply the inequalities xx2/21exxx-x^{2}/2\leq 1-e^{-x}\leq x for x0x\geq 0. Setting x=λ/ux=\lambda/u, we get:

λ(λuλ22u2)(u)duλ(1eλ/u)(u)duλλu(u)du.\displaystyle\int_{\lambda}^{\infty}\left(\frac{\lambda}{u}-\frac{\lambda^{2}}{2u^{2}}\right)\ell(u)\,\mathrm{d}u\leq\int_{\lambda}^{\infty}\left(1-e^{-\lambda/u}\right)\ell(u)\,\mathrm{d}u\leq\int_{\lambda}^{\infty}\frac{\lambda}{u}\ell(u)\,\mathrm{d}u. (56)

The upper bound evaluates exactly to λ1(λ)\lambda\ell_{1}(\lambda). For the lower bound, Proposition G.3, yields λu2(u)duλ1(λ)\int_{\lambda}^{\infty}u^{-2}\ell(u)\,\mathrm{d}u\sim\lambda^{-1}\ell(\lambda). Therefore, the subtractive error term scales as λ22O(λ1(λ))=O(λ(λ))\frac{\lambda^{2}}{2}O(\lambda^{-1}\ell(\lambda))=O(\lambda\ell(\lambda)), which is o(λ1(λ))o(\lambda\ell_{1}(\lambda)).

Consequently,

I(λ)λ1(λ),\displaystyle I(\lambda)\sim\lambda\ell_{1}(\lambda), (57)

and it follows that

m(λ)cλ1(λ).\displaystyle m(\lambda)\sim c\lambda\ell_{1}(\lambda). (58)

To establish Tλm(λ)T_{\lambda}\sim m(\lambda), we derive a specific concentration bound for TλT_{\lambda}. Since TλT_{\lambda} is a functional of the Poisson point process 𝒲\mathcal{W}, its moment generating function is given by Campbell’s theorem: for any uu\in\mathbb{R},

𝔼[exp(uTλ)]=exp(0(eu(1eλw)1)ν(dw)).\displaystyle\mathbb{E}[\exp(uT_{\lambda})]=\exp\left(\int_{0}^{\infty}\left(e^{u(1-e^{-\lambda w})}-1\right)\nu(\mathrm{d}w)\right). (59)

For any w0w\geq 0, the term x=1eλwx=1-e^{-\lambda w} lies in the interval [0,1][0,1]. By the convexity of the function xeuxx\mapsto e^{ux}, we have the bound eux1x(eu1)e^{ux}-1\leq x(e^{u}-1). Applying this inequality to the integrand yields:

𝔼[exp(uTλ)]exp((eu1)0(1eλw)ν(dw))=exp(m(λ)(eu1)).\displaystyle\mathbb{E}[\exp(uT_{\lambda})]\leq\exp\left((e^{u}-1)\int_{0}^{\infty}(1-e^{-\lambda w})\nu(\mathrm{d}w)\right)=\exp\left(m(\lambda)(e^{u}-1)\right). (60)

So the moment generating function of TλT_{\lambda} is bounded above by that of a Poisson random variable with parameter m(λ)m(\lambda). Consequently, Chernoff’s inequality yields, for any η(0,1)\eta\in(0,1),

(|Tλm(λ)|>ηm(λ))2exp(η2m(λ)3).\displaystyle\mathbb{P}\!\left(|T_{\lambda}-m(\lambda)|>\eta m(\lambda)\right)\leq 2\exp\!\left(-\frac{\eta^{2}m(\lambda)}{3}\right). (61)

Let us define a geometric grid λk=(1+γ)k\lambda_{k}=(1+\gamma)^{k} for a fixed γ>0\gamma>0. Since m(λ)m(\lambda) is regularly varying with index 1, it grows asymptotically faster than x1δx^{1-\delta} for any δ(0,1)\delta\in(0,1). Thus, m(λk)m(\lambda_{k}) grows exponentially with kk, meaning the sequence of probabilities (|Tλkm(λk)|>ηm(λk))\mathbb{P}\left(|T_{\lambda_{k}}-m(\lambda_{k})|>\eta m(\lambda_{k})\right) decays exponentially and is therefore summable. By the Borel-Cantelli lemma, Tλkm(λk)T_{\lambda_{k}}\sim m(\lambda_{k}) almost surely as kk\to\infty.

To extend this convergence to the continuous parameter λ\lambda, we rely on the monotonicity of TλT_{\lambda}. For any λ>0\lambda>0, there exists an index kk such that λkλλk+1\lambda_{k}\leq\lambda\leq\lambda_{k+1}. As TλT_{\lambda} is nondecreasing and mm is strictly increasing in λ\lambda, we have

Tλkm(λk+1)Tλm(λ)Tλk+1m(λk).\displaystyle\frac{T_{\lambda_{k}}}{m(\lambda_{k+1})}\leq\frac{T_{\lambda}}{m(\lambda)}\leq\frac{T_{\lambda_{k+1}}}{m(\lambda_{k})}. (62)

Taking the limit as λ\lambda\to\infty (and therefore kk\to\infty), and utilizing the regular variation of mm which guarantees limkm(λk+1)/m(λk)=1+γ\lim_{k\to\infty}m(\lambda_{k+1})/m(\lambda_{k})=1+\gamma, we obtain almost surely:

11+γlim infλTλm(λ)lim supλTλm(λ)1+γ.\displaystyle\frac{1}{1+\gamma}\leq\liminf_{\lambda\to\infty}\frac{T_{\lambda}}{m(\lambda)}\leq\limsup_{\lambda\to\infty}\frac{T_{\lambda}}{m(\lambda)}\leq 1+\gamma. (63)

Since this holds for any arbitrarily small γ>0\gamma>0, taking the intersection of these almost sure events over a countable sequence γ0\gamma\downarrow 0 proves that Tλm(λ)T_{\lambda}\sim m(\lambda) a.s. This establishes (50).

Edge count.

By the first part of Lemma B.1, we have established that, almost surely,

Nt(e)𝔼(Nt(e)𝒲)=tL.\displaystyle N_{t}^{(e)}\sim\mathbb{E}(N_{t}^{(e)}\mid\mathcal{W})=tL. (64)

This is exactly Equation 43.

Vertex count.

We now study the number of active vertices. Conditionally on 𝒲\mathcal{W}, the probability that vertex ii is active after tt interactions is

pi(t)=1ji(1wiwj)t.\displaystyle p_{i}^{(t)}=1-\prod_{j\neq i}(1-w_{i}w_{j})^{t}. (65)

Hence μ𝒲(t):=𝔼[Nt𝒲]=ipi(t).\mu_{\mathcal{W}}(t):=\mathbb{E}[N_{t}\mid\mathcal{W}]=\sum_{i}p_{i}^{(t)}. We shall show that

μ𝒲(t)cSt1(t)a.s.\mu_{\mathcal{W}}(t)\sim cS\,t\ell_{1}(t)\qquad\text{a.s.} (66)

Fix ε>0\varepsilon>0. The number of atoms with wi>εw_{i}>\varepsilon is finite almost surely, and their total contribution to μ𝒲(t)\mu_{\mathcal{W}}(t) is O(1)O(1), which is negligible compared to t1(t)t\ell_{1}(t). Thus, it is enough to consider atoms with wiεw_{i}\leq\varepsilon. For such atoms, write

Ri:=jilog(1wiwj).\displaystyle R_{i}:=-\sum_{j\neq i}\log(1-w_{i}w_{j}). (67)

Then pi(t)=1etRi.p_{i}^{(t)}=1-e^{-tR_{i}}. To bound RiR_{i}, we use the inequalities xlog(1x)x1xx\leq-\log(1-x)\leq\frac{x}{1-x}, valid for x[0,1/2)x\in[0,1/2). Applying the lower bound with x=wiwjx=w_{i}w_{j} yields:

Rijiwiwj=wijiwj=wi(Swi).\displaystyle R_{i}\geq\sum_{j\neq i}w_{i}w_{j}=w_{i}\sum_{j\neq i}w_{j}=w_{i}(S-w_{i}). (68)

For the upper bound, recall that we are only considering atoms where wiεw_{i}\leq\varepsilon. Because the total sum of the weights is SS, every individual weight satisfies wjSw_{j}\leq S. This implies that for any pair, the product is bounded by wiwjεSw_{i}w_{j}\leq\varepsilon S. Since S<S<\infty almost surely, we can choose ε\varepsilon sufficiently small such that εS1/2\varepsilon S\leq 1/2. Applying the second inequality to the sum gives:

Rijiwiwj1wiwjjiwiwj1εS11εSjiwiwj=11εSwi(Swi).R_{i}\leq\sum_{j\neq i}\frac{w_{i}w_{j}}{1-w_{i}w_{j}}\leq\sum_{j\neq i}\frac{w_{i}w_{j}}{1-\varepsilon S}\leq\frac{1}{1-\varepsilon S}\sum_{j\neq i}w_{i}w_{j}=\frac{1}{1-\varepsilon S}w_{i}(S-w_{i}).

In total

wi(Swi)Ri11εSwi(Swi),\displaystyle w_{i}(S-w_{i})\leq R_{i}\leq\frac{1}{1-\varepsilon S}w_{i}(S-w_{i}), (69)

Since wiεw_{i}\leq\varepsilon, it follows that SwiSεS-w_{i}\geq S-\varepsilon. Thus,

Ri(Sε)wi.\displaystyle R_{i}\geq(S-\varepsilon)w_{i}. (70)

We also have SwiSS-w_{i}\leq S, and since ε0\varepsilon\downarrow 0,

11εS=1+oε(1).\displaystyle\frac{1}{1-\varepsilon S}=1+o_{\varepsilon}(1). (71)

Hence,

Ri(1+oε(1))Swi.\displaystyle R_{i}\leq(1+o_{\varepsilon}(1))Sw_{i}. (72)

As the function x1etxx\mapsto 1-e^{-tx} is strictly monotonically increasing for t>0t>0, we have

1et(Sε)wipi(t)1et(1+oε(1))Swi.\displaystyle 1-e^{-t(S-\varepsilon)w_{i}}\leq p_{i}^{(t)}\leq 1-e^{-t(1+o_{\varepsilon}(1))Sw_{i}}. (73)

Summing these inequalities over all atoms satisfying wiεw_{i}\leq\varepsilon, we obtain

wiε(1et(Sε)wi)wiεpi(t)wiε(1et(1+oε(1))Swi).\displaystyle\sum_{w_{i}\leq\varepsilon}\left(1-e^{-t(S-\varepsilon)w_{i}}\right)\leq\sum_{w_{i}\leq\varepsilon}p_{i}^{(t)}\leq\sum_{w_{i}\leq\varepsilon}\left(1-e^{-t(1+o_{\varepsilon}(1))Sw_{i}}\right). (74)

Recall that {wi>ε}\{w_{i}>\varepsilon\} is almost surely finite. Consequently,

wiεpi(t)\displaystyle\sum_{w_{i}\leq\varepsilon}p_{i}^{(t)} =μ𝒲(t)+O(1),\displaystyle=\mu_{\mathcal{W}}(t)+O(1),
wiε(1eλwi)\displaystyle\sum_{w_{i}\leq\varepsilon}(1-e^{-\lambda w_{i}}) =Tλ+O(1),\displaystyle=T_{\lambda}+O(1),

for every λ>0\lambda>0 as 1eλwi11-e^{-\lambda w_{i}}\leq 1. By substituting λ=t(Sε)\lambda=t(S-\varepsilon) into the left-hand side, λ=t(1+oε(1))S\lambda=t(1+o_{\varepsilon}(1))S into the right-hand side, and incorporating the O(1)O(1) terms, we obtain the final bounded inequalities:

Tt(Sε)+O(1)μ𝒲(t)Tt(1+oε(1))S+O(1).\displaystyle T_{t(S-\varepsilon)}+O(1)\leq\mu_{\mathcal{W}}(t)\leq T_{t(1+o_{\varepsilon}(1))S}+O(1). (75)

Using (50) and the slow variation of 1\ell_{1}, we obtain

lim inftμ𝒲(t)t1(t)c(Sε),\displaystyle\liminf_{t\to\infty}\frac{\mu_{\mathcal{W}}(t)}{t\ell_{1}(t)}\geq c(S-\varepsilon), (76)

and

lim suptμ𝒲(t)t1(t)c(1+oε(1))S.\displaystyle\limsup_{t\to\infty}\frac{\mu_{\mathcal{W}}(t)}{t\ell_{1}(t)}\leq c(1+o_{\varepsilon}(1))S. (77)

Letting ε0\varepsilon\downarrow 0 proves (66).

Now, we apply Lemma B.1 conditionally on 𝒲\mathcal{W}. Recall that the summability condition requires t=1exp{ϵ2μ(t)24(2μ(e)(t)+ϵ/3×μ(t))}<\sum_{t=1}^{\infty}\exp\left\{-\frac{\epsilon^{2}\mu(t)^{2}}{4(2\mu^{(e)}(t)+\epsilon/3\times\mu(t))}\right\}<\infty. We know that μ𝒲(t)cSt1(t)\mu_{\mathcal{W}}(t)\sim cS\,t\ell_{1}(t) and μ𝒲(e)(t)tL\mu_{\mathcal{W}}^{(e)}(t)\sim tL. Because 1(t)0\ell_{1}(t)\to 0, μ(t)\mu(t) grows strictly slower than μ(e)(t)\mu^{(e)}(t), meaning the denominator of the exponent is dominated by tLtL. Consequently, the exponent scales asymptotically as Θ(t1(t)2)-\Theta(t\ell_{1}(t)^{2}). To lower bound this growth, we apply Lemma G.9 to the slowly varying function 1\ell_{1}, which states that limnln(1(t))ln(t)=0\lim_{n\to\infty}\frac{\ln(\ell_{1}(t))}{\ln(t)}=0. This means that for any δ(0,1/2)\delta\in(0,1/2), and for all sufficiently large tt,

t1(t)2>t12δ.\displaystyle t\ell_{1}(t)^{2}>t^{1-2\delta}. (78)

Thus the summability condition holds. Therefore

Ntμ𝒲(t)cSt1(t)a.s.\displaystyle N_{t}\sim\mu_{\mathcal{W}}(t)\sim cS\,t\ell_{1}(t)\qquad\text{a.s.} (79)
Sparsity results.

We now prove Equation 45. Let ϕ(x):=x1(x).\phi(x):=x\ell_{1}(x). Since 1\ell_{1} is slowly varying, ϕ\phi is regularly varying with index 11. According to Theorem G.6, any regularly varying function with index α>0\alpha>0 possesses an asymptotic inverse that is regularly varying with index 1/α1/\alpha. Therefore, ϕ\phi has an asymptotic inverse ϕ1\phi^{-1} which is regularly varying with index 11, satisfying ϕ1(ϕ(x))x\phi^{-1}(\phi(x))\sim x as xx\to\infty. We have proven

NtcSϕ(t)a.s.\displaystyle N_{t}\sim cS\,\phi(t)\qquad\text{a.s.} (80)

Because ϕ1\phi^{-1} is regularly varying with index 11, it preserves asymptotic equivalence and satisfies ϕ1(ax)aϕ1(x)\phi^{-1}(ax)\sim a\phi^{-1}(x) for any constant a>0a>0. Applying ϕ1\phi^{-1} to both sides yields:

ϕ1(Nt)ϕ1(cSϕ(t))cSϕ1(ϕ(t))cSta.s.\displaystyle\phi^{-1}(N_{t})\sim\phi^{-1}(cS\phi(t))\sim cS\phi^{-1}(\phi(t))\sim cSt\qquad\text{a.s.} (81)

Since cc and SS are strictly positive and finite almost surely, we can rearrange this asymptotic equivalence to isolate nn:

t1cSϕ1(Nt)a.s.\displaystyle t\sim\frac{1}{cS}\phi^{-1}(N_{t})\qquad\text{a.s.} (82)

This implies the exact asymptotic bound:

t=Θ(ϕ1(Nt))a.s.\displaystyle t=\Theta\!\left(\phi^{-1}(N_{t})\right)\qquad\text{a.s.} (83)

Since Nt(e)tLN_{t}^{(e)}\sim tL, it follows that

Nt(e)=Θ(ϕ1(Nt))a.s.\displaystyle N_{t}^{(e)}=\Theta\!\left(\phi^{-1}(N_{t})\right)\qquad\text{a.s.} (84)

We now prove Equation 46: assume that 1(x1(x))1(x).\ell_{1}(x\ell_{1}(x))\sim\ell_{1}(x). Since NtcSt1(t),N_{t}\sim cS\,t\ell_{1}(t), slow variation gives

1(Nt)1(t1(t))1(t).\displaystyle\ell_{1}(N_{t})\sim\ell_{1}(t\ell_{1}(t))\sim\ell_{1}(t). (85)

Therefore

Nt1(Nt)cSt1(t)1(t)=cSt.\displaystyle\frac{N_{t}}{\ell_{1}(N_{t})}\sim\frac{cS\,t\ell_{1}(t)}{\ell_{1}(t)}=cS\,t. (86)

Since Nt(e)tLN_{t}^{(e)}\sim tL, we conclude that

Nt(e)=Θ(Nt1(Nt))a.s.\displaystyle N_{t}^{(e)}=\Theta\!\left(\frac{N_{t}}{\ell_{1}(N_{t})}\right)\qquad\text{a.s.} (87)

Finally, let us prove Equation 47. If

(x)=(logx)a,a<1,\displaystyle\ell(x)=(\log x)^{a},\qquad a<-1, (88)

then

1(x)=xu1(logu)adu=1a+1(logx)a+1.\displaystyle\ell_{1}(x)=\int_{x}^{\infty}u^{-1}(\log u)^{a}\,\mathrm{d}u=\frac{-1}{a+1}(\log x)^{a+1}. (89)

Moreover, 1(x1(x))1(x).\ell_{1}(x\ell_{1}(x))\sim\ell_{1}(x). Hence

Nt(e)=Θ(Nt(logNt)a+1)=Θ(Nt(logNt)a1),\displaystyle N_{t}^{(e)}=\Theta\!\left(\frac{N_{t}}{(\log N_{t})^{a+1}}\right)=\Theta\!\left(N_{t}(\log N_{t})^{-a-1}\right), (90)

Appendix D Proof of Proposition 4.1

When η=1\eta=1, consider

(ατ)xα+1ν(x)\displaystyle(\alpha-\tau)x^{\alpha+1}\nu(x) =(1x)ξ1ταsΓ(1s)xαsds\displaystyle=(1-x)^{\xi-1}\int_{\tau}^{\alpha}\frac{s}{\Gamma(1-s)}x^{\alpha-s}\mathrm{d}s
=(1x)ξ10αταuΓ(1+uα)euln(1/x)du\displaystyle=(1-x)^{\xi-1}\int_{0}^{\alpha-\tau}\frac{\alpha-u}{\Gamma(1+u-\alpha)}e^{-u\ln(1/x)}\mathrm{d}u

using the change of variables u=αsu=\alpha-s. Let g(z)=0αταuΓ(1+uα)euzdug(z)=\int_{0}^{\alpha-\tau}\frac{\alpha-u}{\Gamma(1+u-\alpha)}e^{-uz}\mathrm{d}u. We have, as u0u\to 0,

αuΓ(1+uα){uif α=1,αΓ(1α)if α<1.\displaystyle\frac{\alpha-u}{\Gamma(1+u-\alpha)}\sim\left\{\begin{array}[]{ll}u&\text{if }\alpha=1,\\ \frac{\alpha}{\Gamma(1-\alpha)}&\text{if }\alpha<1.\end{array}\right.

Using the Tauberian Theorem G.7, we obtain, as zz\to\infty

g(z){z2if α=1,z1αΓ(1α)if α<1.g(z)\sim\left\{\begin{array}[]{ll}z^{-2}&\text{if }\alpha=1,\\ \frac{z^{-1}\alpha}{\Gamma(1-\alpha)}&\text{if }\alpha<1.\end{array}\right.

It follows that, as x0x\to 0,

(ατ)xα+1ν(x)=g(ln1/x){ln2(1/x)if α=1,ln1(1/x)αΓ(1α)if α<1.(\alpha-\tau)x^{\alpha+1}\nu(x)=g(\ln 1/x)\sim\left\{\begin{array}[]{ll}\ln^{-2}(1/x)&\text{if }\alpha=1,\\ \frac{\ln^{-1}(1/x)\alpha}{\Gamma(1-\alpha)}&\text{if }\alpha<1.\end{array}\right.

The result when η>0\eta>0 follows immediately as η\eta is a multiplicative constant.

Appendix E Conditions

We now prove that the RapidBeta measure satisfies the assumption ν([0,1])=\nu([0,1])=\infty and 01wν(dw)<\int_{0}^{1}w\nu(\mathrm{d}w)<\infty.

Divergence of 01ν(w)dw\int_{0}^{1}\nu(w)\mathrm{d}w

The integral is:

01ν(w)dw=01(ηατταsΓ(1s)w1s(1w)ξ1𝑑s)dw\displaystyle\int_{0}^{1}\nu(w)\mathrm{d}w=\int_{0}^{1}\left(\frac{\eta}{\alpha-\tau}\int_{\tau}^{\alpha}\frac{s}{\Gamma(1-s)}w^{-1-s}(1-w)^{\xi-1}ds\right)\mathrm{d}w (91)

By Fubini’s theorem, we can swap the order of integration:

ηατταsΓ(1s)(01w1s(1w)ξ1dw)𝑑s\displaystyle\frac{\eta}{\alpha-\tau}\int_{\tau}^{\alpha}\frac{s}{\Gamma(1-s)}\left(\int_{0}^{1}w^{-1-s}(1-w)^{\xi-1}\mathrm{d}w\right)ds (92)

The inner integral over ww is the Beta function, B(x,y)=01tx1(1t)y1dtB(x,y)=\int_{0}^{1}t^{x-1}(1-t)^{y-1}\mathrm{d}t. We identify the parameters xx by setting x1=1sx-1=-1-s, which gives x=sx=-s.

The Beta integral converges only if its parameters are positive. This requires x=s>0x=-s>0, which implies s<0s<0.

However, the given constraint is 0τ<α10\leq\tau<\alpha\leq 1. For any s[τ,α]s\in[\tau,\alpha], we have s0s\geq 0. This violates the necessary condition s<0s<0. Because the inner integral

01w1s(1w)ξ1dw\displaystyle\int_{0}^{1}w^{-1-s}(1-w)^{\xi-1}\mathrm{d}w (93)

diverges for all s0s\geq 0 in the domain, the entire expression for 01ν(w)dw\int_{0}^{1}\nu(w)\mathrm{d}w diverges.

Convergence of 01wν(w)dw\int_{0}^{1}w\nu(w)\,\mathrm{d}w.

We want to show that 01wν(w)dw<.\int_{0}^{1}w\nu(w)\,\mathrm{d}w<\infty. By definition of ν\nu, we have

01wν(w)dw=01w(ηατταsΓ(1s)w1s(1w)ξ1ds)dw.\displaystyle\int_{0}^{1}w\nu(w)\,\mathrm{d}w=\int_{0}^{1}w\left(\frac{\eta}{\alpha-\tau}\int_{\tau}^{\alpha}\frac{s}{\Gamma(1-s)}w^{-1-s}(1-w)^{\xi-1}\,\mathrm{d}s\right)\,\mathrm{d}w. (94)

Combining the powers of ww gives

01wν(w)dw=ηατ01ταsΓ(1s)ws(1w)ξ1dsdw.\displaystyle\int_{0}^{1}w\nu(w)\,\mathrm{d}w=\frac{\eta}{\alpha-\tau}\int_{0}^{1}\int_{\tau}^{\alpha}\frac{s}{\Gamma(1-s)}w^{-s}(1-w)^{\xi-1}\,\mathrm{d}s\,\mathrm{d}w. (95)

By Tonelli’s theorem,

01wν(w)dw=ηατταsΓ(1s)(01ws(1w)ξ1dw)ds.\displaystyle\int_{0}^{1}w\nu(w)\,\mathrm{d}w=\frac{\eta}{\alpha-\tau}\int_{\tau}^{\alpha}\frac{s}{\Gamma(1-s)}\left(\int_{0}^{1}w^{-s}(1-w)^{\xi-1}\,\mathrm{d}w\right)\,\mathrm{d}s. (96)

For s<1s<1, the inner integral is the Beta integral with parameters 1s1-s and ξ\xi. Since 1s>01-s>0 and ξ>0\xi>0, it is finite and satisfies

01ws(1w)ξ1dw=B(1s,ξ)=Γ(1s)Γ(ξ)Γ(1s+ξ).\displaystyle\int_{0}^{1}w^{-s}(1-w)^{\xi-1}\,\mathrm{d}w=B(1-s,\xi)=\frac{\Gamma(1-s)\Gamma(\xi)}{\Gamma(1-s+\xi)}. (97)

Therefore, for s<1s<1,

sΓ(1s)01ws(1w)ξ1dw=sΓ(1s)Γ(1s)Γ(ξ)Γ(1s+ξ)=sΓ(ξ)Γ(ξ+1s).\displaystyle\frac{s}{\Gamma(1-s)}\int_{0}^{1}w^{-s}(1-w)^{\xi-1}\,\mathrm{d}w=\frac{s}{\Gamma(1-s)}\frac{\Gamma(1-s)\Gamma(\xi)}{\Gamma(1-s+\xi)}=\frac{s\Gamma(\xi)}{\Gamma(\xi+1-s)}. (98)

Thus,

01wν(w)dw=ηΓ(ξ)ατταsΓ(ξ+1s)ds.\displaystyle\int_{0}^{1}w\nu(w)\,\mathrm{d}w=\frac{\eta\Gamma(\xi)}{\alpha-\tau}\int_{\tau}^{\alpha}\frac{s}{\Gamma(\xi+1-s)}\,\mathrm{d}s. (99)

This identity is immediate when α<1\alpha<1. If α=1\alpha=1, the Beta integral itself is singular at the single point s=1s=1, but this point has Lebesgue measure zero. Moreover, ssΓ(ξ)Γ(ξ+1s)s\mapsto\frac{s\Gamma(\xi)}{\Gamma(\xi+1-s)} has a continuous extension to s=1s=1, since

lims1sΓ(ξ)Γ(ξ+1s)=Γ(ξ)Γ(ξ)=1.\displaystyle\lim_{s\uparrow 1}\frac{s\Gamma(\xi)}{\Gamma(\xi+1-s)}=\frac{\Gamma(\xi)}{\Gamma(\xi)}=1. (100)

Hence the endpoint s=1s=1 does not affect the value or finiteness of the integral. Finally, the function ssΓ(ξ+1s)s\mapsto\frac{s}{\Gamma(\xi+1-s)} is continuous on the closed interval [τ,α][\tau,\alpha]. Indeed, for all s[τ,α]s\in[\tau,\alpha],

ξ+1sξ>0,\displaystyle\xi+1-s\geq\xi>0, (101)

and the gamma function is finite, strictly positive, and continuous on (0,)(0,\infty). Consequently,

ταsΓ(ξ+1s)ds<.\displaystyle\int_{\tau}^{\alpha}\frac{s}{\Gamma(\xi+1-s)}\,\mathrm{d}s<\infty. (102)

Therefore,

01wν(w)dw=ηΓ(ξ)ατταsΓ(ξ+1s)ds<.\displaystyle\int_{0}^{1}w\nu(w)\,\mathrm{d}w=\frac{\eta\Gamma(\xi)}{\alpha-\tau}\int_{\tau}^{\alpha}\frac{s}{\Gamma(\xi+1-s)}\,\mathrm{d}s<\infty. (103)

Appendix F Sampling algorithm

F.1 Proofs

F.1.1 Proof of Theorem 5.1

On each region, AεA_{\varepsilon} and BB, the proposal intensity dominates the target intensity. By the thinning theorem for Poisson point processes (Kingman, 1993, Section 5.1), the accepted points on each region form independent Poisson point processes with an intensity equal to the target intensity λ(ds,dw)\lambda(\mathrm{d}s,\mathrm{d}w) restricted to that region.

Since AεA_{\varepsilon} and BB are disjoint, their superposition yields a Poisson point process on [τ,α]×[ε,1][\tau,\alpha]\times[\varepsilon,1] with intensity λ(ds,dw)\lambda(\mathrm{d}s,\mathrm{d}w). Finally, marginalising out the latent dimension ss (which corresponds to the projection property of Poisson processes) yields a Poisson point process on [ε,1][\varepsilon,1] with intensity:

ν(w)dw=(ταλ(s,w)ds)dw.\displaystyle\nu(w)\,\mathrm{d}w=\left(\int_{\tau}^{\alpha}\lambda(s,w)\,\mathrm{d}s\right)\mathrm{d}w. (104)

This establishes that 𝒲ε\mathcal{W}_{\varepsilon} is a realisation of the ε\varepsilon-truncated RapidBeta process.

F.1.2 Proof of Proposition 5.2

By Campbell’s theorem for Poisson point processes,

𝔼[Rε]=0εwν(w)dw.\displaystyle\mathbb{E}[R_{\varepsilon}]=\int_{0}^{\varepsilon}w\,\nu(w)\,\mathrm{d}w. (105)

Using Proposition 4.1 with α=1\alpha=1, there exist constants 0<c1<c2<0<c_{1}<c_{2}<\infty and ε0(0,1)\varepsilon_{0}\in(0,1) such that for all 0<w<ε00<w<\varepsilon_{0},

c1w2(ln(1/w))2ν(w)c2w2(ln(1/w))2.\displaystyle c_{1}\,w^{-2}\big(\ln(1/w)\big)^{-2}\leq\nu(w)\leq c_{2}\,w^{-2}\big(\ln(1/w)\big)^{-2}. (106)

Hence, for all sufficiently small ε\varepsilon,

c10ε1w(ln(1/w))2dw𝔼[Rε]c20ε1w(ln(1/w))2dw.\displaystyle c_{1}\int_{0}^{\varepsilon}\frac{1}{w\big(\ln(1/w)\big)^{2}}\,\mathrm{d}w\leq\mathbb{E}[R_{\varepsilon}]\leq c_{2}\int_{0}^{\varepsilon}\frac{1}{w\big(\ln(1/w)\big)^{2}}\,\mathrm{d}w. (107)

Now make the change of variables u=ln(1/w)u=\ln(1/w), so that du=dw/w\mathrm{d}u=-\mathrm{d}w/w. Then

0ε1w(ln(1/w))2dw=ln(1/ε)u2du=1ln(1/ε).\displaystyle\int_{0}^{\varepsilon}\frac{1}{w\big(\ln(1/w)\big)^{2}}\,\mathrm{d}w=\int_{\ln(1/\varepsilon)}^{\infty}u^{-2}\,\mathrm{d}u=\frac{1}{\ln(1/\varepsilon)}. (108)

Combining the upper and lower bounds yields

𝔼[Rε]=Θ(1ln(1/ε)),\displaystyle\mathbb{E}[R_{\varepsilon}]=\Theta\!\left(\frac{1}{\ln(1/\varepsilon)}\right), (109)

as claimed.

Appendix G Background on rapidly varying function

Most of the background material in this section originates from the book of Bingham et al. (1987).

G.1 Definitions

Definition G.1 (Slowly varying function).

A function :(0,)(0,)\ell:(0,\infty)\to(0,\infty) is slowly varying at infinity if for all c>0c>0,

(cx)(x)1 as x.\frac{\ell(cx)}{\ell(x)}\to 1\text{ as }x\to\infty.

Examples of slowly varying functions include lna\ln^{a}, for aa\in{\mathbb{R}}, and functions converging to a constant c>0c>0.

Definition G.2 (Regularly varying function).

A function :(0,)(0,)\ell:(0,\infty)\to(0,\infty) is regularly varying at infinity with exponent ρ\rho\in{\mathbb{R}} if f(x)=xρ(x)f(x)=x^{\rho}\ell(x) for some slowly varying function \ell. We note RρR_{\rho} the set of all functions regularly varying at infinity with exponent ρ\rho. A function ff is regularly varying at 0 if f(1/x)f(1/x) is regularly varying at infinity that is, f(x)=xρ(1/x)f(x)=x^{-\rho}\ell(1/x) for some ρ\rho\in{\mathbb{R}}.

G.2 Karamata theorems

The following propositions and corollaries relate to integrals of regularly varying functions.

Proposition G.3 (Karamata theorem).

See (Bingham et al., 1987, Propositions 1.5.8 and 1.5.10). Let U(t)=tρ(t)U(t)=t^{\rho}\ell(t) for some locally bounded slowly varying function \ell. Then

  • (i)

    If ρ>1\rho>-1

    0xU(t)𝑑t11+ρxρ+1(x) as x.\int_{0}^{x}U(t)dt\sim\frac{1}{1+\rho}x^{\rho+1}\ell(x)\text{ as }x\to\infty.
  • (ii)

    If ρ<1\rho<-1

    xU(t)𝑑t11+ρxρ+1(x) as x.\int_{x}^{\infty}U(t)dt\sim-\frac{1}{1+\rho}x^{\rho+1}\ell(x)\text{ as }x\to\infty.

The following corollaries will be useful.

Proposition G.4.

Let U(x)=xα(1/x)U(x)=x^{\alpha}\ell(1/x) for some locally bounded slowly varying function \ell.

  • (i)

    If α>1\alpha>-1

    0xU(t)dt1α+1x1+α(1/x) as x0.\int_{0}^{x}U(t)\mathrm{d}t\sim\frac{1}{\alpha+1}x^{1+\alpha}\ell(1/x)\text{ as }x\to 0.
  • (ii)

    If α<1\alpha<-1

    xU(t)dt1α+1x1+α(1/x) as x0.\int_{x}^{\infty}U(t)\mathrm{d}t\sim-\frac{1}{\alpha+1}x^{1+\alpha}\ell(1/x)\text{ as }x\to 0.

If \ell is slowly varying and α>1\alpha>-1 then 0xtα(1/t)𝑑t\int_{0}^{x}t^{\alpha}\ell(1/t)dt converges and

x1+α(1/x)0xtα(1/t)𝑑tα+1 as x0.\displaystyle\frac{x^{1+\alpha}\ell(1/x)}{\int_{0}^{x}t^{\alpha}\ell(1/t)dt}\rightarrow\alpha+1\text{ as }x\rightarrow 0. (110)
Proof.

Let ρ<1\rho<-1. We have xtρ(t)dt=01/xtρ2(1/t)dt\int_{x}^{\infty}t^{\rho}\ell(t)\mathrm{d}t=\int_{0}^{1/x}t^{-\rho-2}\ell(1/t)\mathrm{d}t. Writing α=ρ2>1\alpha=-\rho-2>-1, we obtain, using Proposition G.3(ii)

xα1(x)01/xtα(1/t)dtρ+1 as x,\displaystyle\frac{x^{-\alpha-1}\ell(x)}{\int_{0}^{1/x}t^{\alpha}\ell(1/t)\mathrm{d}t}\rightarrow\rho+1\text{ as }x\rightarrow\infty, (111)

or

x1+α(1/x)0xtα(1/t)𝑑tρ+1 as x0.\displaystyle\frac{x^{1+\alpha}\ell(1/x)}{\int_{0}^{x}t^{\alpha}\ell(1/t)dt}\rightarrow\rho+1\text{ as }x\rightarrow 0. (112)

The case α<1\alpha<-1 follows similarly, using Proposition G.3(i). ∎

Proposition G.5 (Proposition 1.5.9b, Bingham et al., 1987).

Let \ell be slowly varying and suppose x(t)dt/t<\int_{x}^{\infty}\ell(t)\mathrm{d}t/t<\infty. Then x(t)dt/t\int_{x}^{\infty}\ell(t)\mathrm{d}t/t is slowly varying and

1(x)x(t)dt/tas x.\displaystyle\frac{1}{\ell(x)}\int_{x}^{\infty}\ell(t)\mathrm{d}t/t\to\infty\quad\text{as }x\to\infty. (113)

G.3 Asymptotic inverse

Theorem G.6 (Theorem 1.5.12, Bingham et al., 1987).

Let fRαf\in R_{\alpha} with α>0\alpha>0. Then there exists gR1/αg\in R_{1/\alpha} with

f(g(x))g(f(x))xas x.\displaystyle f(g(x))\sim g(f(x))\sim x\quad\text{as }x\to\infty. (114)

Here gg is unique to within asymptotic equivalence, and one version of gg is the generalized inverse f1(x)=inf{y:f(y)>x}f^{-1}(x)=\inf\{y:f(y)>x\}.

G.4 Tauberian theorem

The following theorem is a variation of Bingham et al. (1987, Theorem 1.7.6 p.46), where the two limits at 0 and infinity are exchanged. The proof is similar. See also Feller (1971, Chapter XIII).

Theorem G.7 (Tauberian theorem).

Assume U(x)0U(x)\geq 0, c0c\geq 0, ρ>1\rho>-1, U^(s)=s0esxU(x)dx\widehat{U}(s)=s\int_{0}^{\infty}e^{-sx}U(x)\mathrm{d}x convergent for s>0s>0, and \ell a slowly varying function. Then

U(x)cxρ(1/x)/Γ(1+ρ) as x0\displaystyle U(x)\sim cx^{\rho}\ell(1/x)/\Gamma(1+\rho)\text{ as }x\rightarrow 0 (115)

implies

U^(s)csρ(s) as s.\displaystyle\widehat{U}(s)\sim cs^{-\rho}\ell(s)\text{ as }s\rightarrow\infty. (116)
Proof.

Write V(x)=0xU(y)dyV(x)=\int_{0}^{x}U(y)\mathrm{d}y (this is finite for any xx as U^(s)\widehat{U}(s) is convergent for any ss), then VV is non-decreasing and by Proposition G.4

V(x)cρ+1xρ+1(1/x)/Γ(1+ρ) as x0.\displaystyle V(x)\sim\frac{c}{\rho+1}x^{\rho+1}\ell(1/x)/\Gamma(1+\rho)\text{ as }x\rightarrow 0. (117)

Then by Bingham et al. (1987, Theorem 1.7.1, p.38), this is equivalent to

V^(s)=0esxdV(x)csρ1(s) as s.\displaystyle\widehat{V}(s)=\int_{0}^{\infty}e^{-sx}\mathrm{d}V(x)\sim cs^{-\rho-1}\ell(s)\text{ as }s\rightarrow\infty. (118)

Finally, note that V^(s)=U^(s)s\widehat{V}(s)=\frac{\widehat{U}(s)}{s}. Thus the above equation is equivalent to

U^(s)csρ(s) as s.\displaystyle\widehat{U}(s)\sim cs^{-\rho}\ell(s)\text{ as }s\rightarrow\infty. (119)

G.5 Other useful results on regular variation

Lemma G.8 (Lemma 14, Gnedin et al., 2007).

For \ell slowly varying, the relation

xν(du)x1(x1),x0\int_{x}^{\infty}\nu(du)\sim x^{-1}\ell(x^{-1}),~~x\rightarrow 0

implies

0xuν(du)1(x1),x0\int_{0}^{x}u\nu(du)\sim\ell_{1}(x^{-1}),~~x\rightarrow 0

with 1(x)=o((x))\ell_{1}(x)=o(\ell(x)) is another slowly varying function defined by 1(y)=yu1(u)𝑑u\ell_{1}(y)=\int_{y}^{\infty}u^{-1}\ell(u)du.

Lemma G.9 (Proposition 1.3.6, Bingham et al., 1987).

If \ell varies slowly then ln((x))ln(x)x0\frac{\ln(\ell(x))}{\ln(x)}\underset{x\rightarrow\infty}{\longrightarrow}0.

Appendix H More background results

Theorem H.1.

Let X1,X2,X_{1},X_{2},\ldots be a sequence of independent Bernoulli random variables with parameters p1,p2,p_{1},p_{2},\ldots such that μ=kpk<\mu=\sum_{k}p_{k}<\infty. Let

S=kXk.\displaystyle S=\sum_{k}X_{k}. (120)

Then, for every ϵ(0,1)\epsilon\in(0,1),

(|Sμ|>ϵμ)2exp(μϵ23).\displaystyle\mathbb{P}(|S-\mu|>\epsilon\mu)\leq 2\exp\left(-\mu\frac{\epsilon^{2}}{3}\right). (121)
Proof.

For n>0n>0, let

Sn=k=1nXkandμn=𝔼(Sn)=k=1npk.\displaystyle S_{n}=\sum_{k=1}^{n}X_{k}\qquad\text{and}\qquad\mu_{n}=\mathbb{E}(S_{n})=\sum_{k=1}^{n}p_{k}. (122)

The sequence (μn)(\mu_{n}) converges to μ\mu.

Since

k=1(Xk=1)=μ<,\displaystyle\sum_{k=1}^{\infty}\mathbb{P}(X_{k}=1)=\mu<\infty, (123)

the first Borel–Cantelli lemma implies that the events {Xk=1}\{X_{k}=1\} occur only finitely often almost surely. Consequently, SS is almost surely finite, and the partial sums SnS_{n} converge almost surely to SS.

Fix ϵ(0,1)\epsilon\in(0,1) and consider the events

An\displaystyle A_{n} ={|Snμn|ϵμ},\displaystyle=\{|S_{n}-\mu_{n}|\geq\epsilon\mu\},
A\displaystyle A ={|Sμ|ϵμ}.\displaystyle=\{|S-\mu|\geq\epsilon\mu\}.

For any sample path such that Sn(ω)S(ω)S_{n}(\omega)\to S(\omega), if |Sμ|ϵμ|S-\mu|\geq\epsilon\mu, then for all sufficiently large nn,

|Snμn|ϵμ\displaystyle|S_{n}-\mu_{n}|\geq\epsilon\mu (124)

also holds, since μnμ\mu_{n}\to\mu. Therefore, the indicator functions satisfy

𝟏Alim infn𝟏Analmost surely.\displaystyle\mathbf{1}_{A}\leq\liminf_{n\to\infty}\mathbf{1}_{A_{n}}\qquad\text{almost surely}. (125)

Taking expectations on both sides and applying Fatou’s lemma yields

(A)=𝔼[𝟏A]𝔼[lim infn𝟏An]lim infn𝔼[𝟏An]=lim infn(An).\displaystyle\mathbb{P}(A)=\mathbb{E}[\mathbf{1}_{A}]\leq\mathbb{E}\!\left[\liminf_{n\to\infty}\mathbf{1}_{A_{n}}\right]\leq\liminf_{n\to\infty}\mathbb{E}[\mathbf{1}_{A_{n}}]=\liminf_{n\to\infty}\mathbb{P}(A_{n}). (126)

By Corollary 4.6 of Mitzenmacher and Upfal (2005), for every n>0n>0,

(An)2exp(μnϵ2/3).\displaystyle\mathbb{P}(A_{n})\leq 2\exp(-\mu_{n}\epsilon^{2}/3). (127)

Consequently,

lim infn(An)lim infn2exp(μnϵ2/3)=limn2exp(μnϵ2/3)=2exp(μϵ2/3).\displaystyle\liminf_{n\to\infty}\mathbb{P}(A_{n})\leq\liminf_{n\to\infty}2\exp(-\mu_{n}\epsilon^{2}/3)=\lim_{n\to\infty}2\exp(-\mu_{n}\epsilon^{2}/3)=2\exp(-\mu\epsilon^{2}/3). (128)

Combining the previous inequalities completes the proof. ∎

Theorem H.2 (Tropp 2011, Theorem 1.1; Freedman 1975, Theorem 1.6).

Consider a real-valued martingale (Yk)k0(Y_{k})_{k\geq 0} with difference sequence (Xk)k0(X_{k})_{k\geq 0}, and assume that the difference sequence is uniformly bounded:

XkRalmost surely for all k0.\displaystyle X_{k}\leq R\qquad\text{almost surely for all }k\geq 0. (129)

Define the predictable quadratic variation process by

Wk:=j=1k𝔼[Xj2|X0:j1],k0.\displaystyle W_{k}:=\sum_{j=1}^{k}\mathbb{E}\!\left[X_{j}^{2}\,\middle|\,X_{0:j-1}\right],\qquad k\geq 0. (130)

Then, for all t0t\geq 0 and σ2>0\sigma^{2}>0,

{k0:Ykt and Wkσ2}exp(t2/2σ2+Rt/3).\displaystyle\mathbb{P}\left\{\exists k\geq 0:Y_{k}\geq t\text{ and }W_{k}\leq\sigma^{2}\right\}\leq\exp\left(-\frac{t^{2}/2}{\sigma^{2}+Rt/3}\right). (131)