A Generative Model for Extremely Sparse
Edge-Exchangeable Networks

Valentin Kilian

Abstract

We propose a graph generative model for sequences of extremely sparse, edge-exchangeable networks. Models for sparse graphs often face a trade-off between desirable properties like exchangeability and the ability to capture the sparsity observed in real-world networks. While models based on vertex or edge exchangeability have successfully generated sparse graphs, achieving the "extremely sparse" regime, where the number of edges scales near-linearly with the number of nodes, has remained a challenge. Recently, a novel Completely Random Measure (CRM) was introduced, demonstrating that this rate could be achieved within the vertex-exchangeable framework of Caron and Fox. This paper extends this work by demonstrating that this new CRM can be integrated into the alternative edge-exchangeable framework to achieve extreme sparsity.

Machine Learning, ICML

1 Introduction

The proliferation of large-scale network data across diverse scientific domains, from microbiology and ecology to economics and the social sciences, has revealed a set of unifying structural properties. Among these, sparsity stands out as one of the most fundamental. For a simple graph with $N$ nodes, the number of possible edges scales quadratically, specifically as $\binom{N}{2}=\Theta(N^{2})$ . However, empirical networks rarely approach this theoretical maximum. We formalize this observation by defining a graph sequence as dense if its edge count scales quadratically with its node count, and sparse if the scaling is sub-quadratic. Real-world networks are overwhelmingly sparse. While they also display other important features like power-law degree distributions, the small-world phenomenon, and community structures, our focus here is squarely on the challenge of modeling sparsity.

Constructing a statistically coherent model for sparse networks is notoriously difficult, as sparsity is often incompatible with desirable modeling assumptions. For instance, node exchangeability, the property that a model's distribution is invariant to the relabelling of nodes, is a natural requirement for a generative model. However, the Aldous-Hoover theorem (Aldous, 1981; Hoover, 1979) presents a major obstacle, demonstrating that any graph model satisfying this property must be either dense or empty (see also Orbanz and Roy, 2015).

To circumvent this limitation, several distinct modeling paradigms have been developed. Preferential attachment models (Barabási and Albert, 1999; Bollobás, 2001), for example, abandon exchangeability entirely to generate sparse graphs.

Other lines of research have focused on weakening the exchangeability assumption. One prominent approach, initiated by Caron and Fox (2017), substitutes node exchangeability with the more flexible notion of Kallenberg exchangeability (Kallenberg, 2005; Veitch and Roy, 2015; Borgs et al., 2018). This framework has spurred a rich body of work modeling various network phenomena, such as overlapping communities (Todeschini et al., 2020; Miscouridou et al., 2026), clustering (Caron et al., 2023), dynamic evolution (Naik et al., 2022), and core-periphery structures (Naik et al., 2021). An alternative strategy involves replacing node exchangeability with edge exchangeability (Crane and Dempsey, 2015, 2016; Broderick and Cai, 2016; Cai et al., 2016). Models in this class also achieve sparsity. Both of these approaches employ Bayesian nonparametric tools such as Completely Random Measures (CRMs). These models have been shown to generate graph sequences where the number of edges, $E$, scales as $N^{1+\epsilon}$ for $0<\epsilon<1$.

A recent breakthrough by Kilian et al. (2025) has redefined the achievable level of sparsity within the Caron-Fox framework. They introduced a novel CRM that produces graph sequences where $E=\Theta(N\ell(N))$ , with $\ell$ being a slowly varying function. This behavior is termed extremely sparse. This paper builds directly on that insight. Our central contribution is to adapt the methodology from Kilian et al. (2025) to achieve this same extreme sparsity within the edge-exchangeable model class.

The remainder of this paper is organized as follows. Section 2 provides a review of the edge-exchangeable network model. Sections 3 and 4 contain our primary results: we first generalize the sparsity theorem from Cai et al. (2016) to encompass the extremely sparse regime, and then we introduce the Beta process with rapid variation, demonstrating its utility in constructing an extremely sparse, edge-exchangeable model. Section 5 describes an exact sampler for the truncated RapidBeta process. Finally, Section 6 presents simulation studies validating our theoretical findings. All deferred proofs and definitions of asymptotic notation are available in the appendix.

2 Edge-exchangeable graph sequences

The notion of an edge-exchangeable graph appeared around 2015, in the work of Crane and Dempsey (Crane and Dempsey, 2015, 2016) and of Broderick, Cai, and Campbell (Broderick and Cai, 2016; Cai et al., 2016), with some results presented in workshops prior to publication. In what follows, we are inspired by the presentation in Cai et al. (2016). This section provides an introduction to the topic; we refer the reader to the aforementioned references for more details.

2.1 Permutation Invariance to Edge Arrival Order

Let $(\mathcal{G}_{t})_{t}$ be a sequence of (multi)graphs, where each graph $\mathcal{G}_{t}=(\mathcal{V}_{t},\mathcal{E}_{t})$ consists of a finite set of vertices $\mathcal{V}_{t}$ and a finite multiset of timestamped edges $\mathcal{E}_{t}$ . We assume the sequence is projective (or growing) i.e., $\mathcal{V}_{t}\subseteq\mathcal{V}_{t+1}$ and $\mathcal{E}_{t}\subseteq\mathcal{E}_{t+1}$ .

Each (timestamped) edge $(e,s)\in\mathcal{E}_{t}$ is a tuple whose first entry is a set $e$ of two vertices in $\mathcal{V}_{t}$ and whose second entry is a timestamp $s$ corresponding to the step at which this edge first appeared. As in the Caron-Fox model, we are only interested in vertices involved in at least one edge. Consequently, we can define $\mathcal{V}_{t}$ as the set of all vertices that appear in the edges of $\mathcal{E}_{t}$ , in which case the graph $\mathcal{G}_{t}$ is completely defined by its edge set $\mathcal{E}_{t}$ .

Definition 2.1 (Cai et al., 2016 Definition 2.4).

Consider a random graph sequence $(\mathcal{G}_{t})_{t}$ defined as above with $\mathcal{E}_{t}=\{(e_{1},s_{1}),\dots,(e_{m},s_{m})\}$ . The sequence $(\mathcal{G}_{t})_{t}$ is (infinitely) edge-exchangeable if for every $t\in\mathbb{N}$ and for every permutation $\pi$ of the steps $\{1,\dots,t\}$ , $\mathcal{G}_{t}\overset{d}{=}\tilde{\mathcal{G}}_{t}$ , where $\tilde{\mathcal{G}}_{t}$ has the edge set $\tilde{\mathcal{E}}_{t}=\{(e_{1},\pi(s_{1})),\dots,(e_{m},\pi(s_{m}))\}$ .

In other words, the distribution of the generative model is invariant under any finite permutation of the order in which the edges arrived. This model (and its inherent symmetry) can be connected to partitions, feature allocations, and trait allocations; these connections are explained in Broderick and Cai (2016) and Campbell et al. (2018).

2.2 A Bayesian Nonparametric Model

We now present a graph generative model that exhibits edge-exchangeability, which we will later prove can generate extremely sparse graphs. This model, referred to as the graph frequency model in Cai et al. (2016), is similar to the Caron-Fox model from Caron and Fox (2017) but with several key differences. These differences explain why one model is edge-exchangeable while the other is Kallenberg-exchangeable. A comparison of these two models is provided in Cai et al. (2016), where the authors also prove that the two notions are distinct.

In the graph frequency model, we consider a countably infinite set of latent vertices, indexed by the positive integers $\mathbb{N}$ . Associated with these vertices is an infinite collection of edge labels $\{\theta_{\{i,j\}}\}$ with values in $[0,1]$ and edge probabilities $\{w_{\{i,j\}}\}$ with values in $[0,1]$ . For any given (potentially random) choice of these labels and probabilities, we define the measure $G$ on $[0,1]$ as:

G=\sum_{\{i,j\}:i,j\in\mathbb{N}}w_{\{i,j\}}\delta_{\theta_{\{i,j\}}}.

If either the labels $\{\theta_{\{i,j\}}\}$ or the probabilities $\{w_{\{i,j\}}\}$ (or both) are random, then $G$ is a discrete random measure on $[0,1]$ . Given $G$ , the graph sequence is constructed recursively. We initialize with $\mathcal{E}_{0}=\emptyset$ . Then, at each step $t$ , we form a new edge set, $F^{t}_{\text{new}}$ , by sampling the multiplicity $m^{t}_{\{i,j\}}$ for every possible edge $\{i,j\}$ according to:

m^{t}_{\{i,j\}}\sim\text{Bernoulli}(w_{\{i,j\}})

We then add $m^{t}_{\{i,j\}}$ copies of the edge $\{i,j\}$ with timestamp $t$ to $F^{t}_{\text{new}}$ . Finally, the edge set is updated as $\mathcal{E}_{t+1}=\mathcal{E}_{t}\cup F^{t}_{\text{new}}$ . This process generates multigraphs, potentially adding multiple edges at each time step.
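The recursive construction above can be sketched in a few lines of Python. This is only a minimal illustration under the assumption that finitely many weights are retained and that the edge probabilities take the product form $w_{i}w_{j}$; the function names `sample_edge_sequence` and `active_vertices` are hypothetical and are not taken from the paper's code.

```python
import random


def sample_edge_sequence(weights, num_steps, seed=None):
    """Return a list of timestamped edges ((i, j), t) with i < j.

    Assumption: `weights` is a finite list of values in [0, 1] and the
    probability of edge {i, j} is weights[i] * weights[j] (no self-loops).
    """
    rng = random.Random(seed)
    n = len(weights)
    edges = []  # the edge multiset E_t, stored as a list
    for t in range(1, num_steps + 1):
        for i in range(n):
            for j in range(i + 1, n):
                # m^t_{i,j} ~ Bernoulli(w_i * w_j), independently of the past
                if rng.random() < weights[i] * weights[j]:
                    edges.append(((i, j), t))
    return edges


def active_vertices(edges):
    """Vertices that appear in at least one edge (the set V_t)."""
    return {v for (pair, _) in edges for v in pair}
```

For example, `sample_edge_sequence([0.9, 0.5, 0.2], num_steps=10, seed=0)` returns the edges accumulated over ten steps, and `active_vertices` recovers the corresponding vertex set.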

Proposition 2.2 (Cai et al., 2016).

The sequence of multigraphs generated via the preceding method is edge-exchangeable.

Proof.

Conditional on $G$ , the formation of an edge at any step $t$ is an independent event with an identical probability distribution. The exchangeability of the time stamps follows directly. ∎

This model can be implemented with a random measure $G$ constructed from a Poisson point process. Let $\mathcal{W}$ be a Poisson process on $[0,1]$ with a non-atomic, $\sigma$-finite rate measure $\nu$ that satisfies $\nu([0,1])=\infty$ and $\int_{0}^{1}w\,\nu(\mathrm{d}w)<\infty$. (These two conditions on $\nu$ ensure that $\mathcal{W}$ is a countably infinite collection of weights in $[0,1]$ and that their sum satisfies $\sum_{w\in\mathcal{W}}w<\infty$ a.s.) We can then use $\mathcal{W}$ to define the set of edge probabilities as $w_{\{i,j\}}=w_{i}w_{j}$ for $i\neq j$, and $w_{\{i,i\}}=0$ (to prevent self-loops), where each $w_{i}\in\mathcal{W}$. The edge labels $\theta_{\{i,j\}}$ can be sampled independently and uniformly from $[0,1]$. With this setup, $G$ becomes a homogeneous CRM on $[0,1]$ with no deterministic or fixed atomic components (Kingman, 1967, 1993; Lijoi and Prünster, 2010). The choice of the CRM $G$, and therefore the choice of the measure $\nu$, strongly influences the properties of the resulting edge-exchangeable graph sequence.

While this model generates multigraphs, it can be readily adapted to produce simple graphs $\overline{\mathcal{G}}_{t}=(\overline{\mathcal{V}}_{t},\overline{\mathcal{E}}_{t})$ . This is done by setting $\overline{\mathcal{V}}_{t}=\mathcal{V}_{t}$ and defining $\overline{\mathcal{E}}_{t}$ as the set of unique edges in $\mathcal{E}_{t}$ (i.e., those with a multiplicity of at least one). Although the resulting simple graphs do not inherit the edge-exchangeability property, they retain many characteristics of the original multigraphs, most notably their sparsity. In what follows, we consider only multigraphs.

Figure 1: Edge-exchangeable graph generated from a RapidBeta process with $\alpha=1,\tau=0.95,\xi=1.2$, and $\eta=15$.

3 Extreme Sparsity

We denote the number of vertices and edges in the multigraph, respectively, as:

N_{t}=\|\mathcal{V}_{t}\|=\sum_{i}\mathbf{1}_{\sum_{j\neq i}M^{t}_{\{i,j\}}>0},
N^{(e)}_{t}=\|\mathcal{E}_{t}\|=\frac{1}{2}\sum_{i\neq j}M^{t}_{\{i,j\}},

where $M^{t}_{\{i,j\}}=\sum_{p=1}^{t}m^{p}_{\{i,j\}}$ is the multiplicity of the edge $\{i,j\}$ at time $t$ . The following theorem states that an edge-exchangeable model constructed with a measure that has a rapidly varying tail is extremely sparse.

Theorem 3.1.

Assume that, as $w\downarrow 0$ ,

\displaystyle\nu(w)\sim c\,w^{-2}\ell(w^{-1}),

(1)

where $c>0$ , $\ell$ is slowly varying, and

\displaystyle\ell_{1}(x):=\int_{x}^{\infty}u^{-1}\ell(u)\,\mathrm{d}u<\infty

(2)

for all sufficiently large $x$ . If $\ell_{1}(x\ell_{1}(x))\sim\ell_{1}(x),$ then

\displaystyle N_{t}^{(e)}=\Theta\!\left(\frac{N_{t}}{\ell_{1}(N_{t})}\right)\qquad\text{a.s.}

(3)

In particular, if $\ell(x)=(\log x)^{a},$ $a<-1$ , then

\displaystyle N_{t}^{(e)}=\Theta\!\left(N_{t}(\log N_{t})^{-a-1}\right)\qquad\text{a.s.}

(4)

This result extends Theorem 5.3 in Cai et al. (2016) to the extremely sparse regime. The main difference is that we consider rapidly varying measures, whereas they consider regularly varying measures that are not rapidly varying. This crucial distinction allows us to achieve a better sparsity rate. The proof relies on conditioning on the point process $\mathcal{W}$, after which the approach is similar to that of Janson (2018).
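As a quick check of the logarithmic special case, the following worked computation evaluates $\ell_{1}$ for $\ell(x)=(\log x)^{a}$ with $a<-1$ and verifies the condition $\ell_{1}(x\ell_{1}(x))\sim\ell_{1}(x)$. It is a sketch of the arithmetic behind (4), not a substitute for the proof of Theorem 3.1.

```latex
\begin{aligned}
\ell_{1}(x)
  &=\int_{x}^{\infty}u^{-1}(\log u)^{a}\,\mathrm{d}u
   =\left[\frac{(\log u)^{a+1}}{a+1}\right]_{x}^{\infty}
   =\frac{(\log x)^{a+1}}{-(a+1)},\qquad x>1,\\
\log\bigl(x\,\ell_{1}(x)\bigr)
  &=\log x+(a+1)\log\log x-\log\bigl(-(a+1)\bigr)\sim\log x,\\
\ell_{1}\bigl(x\,\ell_{1}(x)\bigr)
  &=\frac{\bigl(\log(x\,\ell_{1}(x))\bigr)^{a+1}}{-(a+1)}\sim\ell_{1}(x),\\
\frac{N_{t}}{\ell_{1}(N_{t})}
  &=-(a+1)\,N_{t}\,(\log N_{t})^{-a-1}.
\end{aligned}
```

The boundary term at infinity vanishes because $a+1<0$, and the last line agrees with (4) up to the constant $-(a+1)>0$.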

There exist several families of slowly varying functions that satisfy all the required assumptions in the theorem. The most useful one is the logarithmic family ${(\log x)^{a}}$ for $a<-1$ . However, these also include more log-based families, such as the iterated log family $p{(\log_{k}(x))}^{-p-1}\prod_{i=0}^{k-1}{(\log_{i}(x))}^{-1}$ for $p>0$ , where $\log_{k}$ is the logarithm iterated $k$ times, and the logarithmic exponential family $a{(\log x)}^{a-1}\exp(-{(\log x)}^{a})$ for $0<a<1/2$ . Additionally, there are Lambert $W$ -based families, such as the Lambert $W$ family $\frac{pW{(x)}^{-p}}{1+W(x)}$ for $p>0$ , where $W$ is the principal branch of the Lambert $W$ function, or more generally the iterated Lambert $W$ family $pW_{k}{(x)}^{-p}\prod_{i=1}^{k}\frac{1}{1+W_{i}(x)}$ , where $W_{k}$ is the iterated principal branch of the Lambert $W$ function.

4 Beta process with rapid variation

We define the Beta process with rapid variation (RapidBeta) as the process whose rate measure $\nu$ on $[0,1]$ has the density:

\nu(w)=\frac{\eta}{\alpha-\tau}\int_{\tau}^{\alpha}\frac{s}{\Gamma(1-s)}w^{-1-s}\left(1-w\right)^{\xi-1}\,\mathrm{d}s,

(5)

where the parameters satisfy $\eta,\xi>0$ and $0\leq\tau<\alpha\leq 1$ . For the special case where $\eta=\xi=1$ , this measure recovers the mixture of stable densities from Kilian et al. (2025), restricted to the range $[0,1]$ . Figure 2 displays the RapidBeta density for $\alpha=1$ and illustrates the effects of the parameters $\eta$ , $\tau$ , and $\xi$ . The effect of $\alpha$ can be understood by adapting Proposition 1 from Kilian et al. (2025):

Proposition 4.1.

If $\alpha=1$ , the density of the rate measure satisfies:

\nu(w)=\Theta\left(w^{-2}(\ln(1/w))^{-2}\right)\quad\text{as }w\to 0.

If $\alpha<1$ , the density of the rate measure satisfies:

\nu(w)=\Theta\left(w^{-1-\alpha}(\ln(1/w))^{-1}\right)\quad\text{as }w\to 0.

Additionally, one can verify that the two necessary conditions for the Poisson process construction hold: $\nu([0,1])=\infty$ and $\int_{0}^{1}w\nu(\mathrm{d}w)<\infty$ (see appendix for the proof). It follows that an edge-exchangeable graph sequence constructed using the RapidBeta measure with $\alpha=1$ satisfies the condition of Theorem 3.1. Therefore, this construction yields an extremely sparse graph sequence.
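The rate stated in Proposition 4.1 for $\alpha=1$ can be checked numerically. The sketch below approximates the integral in (5) with a midpoint rule and rescales the result by $w^{-2}(\ln(1/w))^{-2}$; the function names, the default parameter values, and the number of quadrature nodes are illustrative assumptions rather than part of the paper.

```python
import math


def rapid_beta_density(w, alpha=1.0, tau=0.0, eta=1.0, xi=1.0, n=20000):
    """Approximate the RapidBeta density nu(w) of equation (5).

    A midpoint rule in s over [tau, alpha] is used, so the endpoint s = 1
    (where 1/Gamma(1 - s) is only defined as a limit) is never evaluated.
    """
    h = (alpha - tau) / n
    total = 0.0
    for k in range(n):
        s = tau + (k + 0.5) * h
        total += s / math.gamma(1.0 - s) * w ** (-1.0 - s)
    return eta / (alpha - tau) * total * h * (1.0 - w) ** (xi - 1.0)


def scaled_density(w, **params):
    """nu(w) divided by the rate w^-2 * ln(1/w)^-2 stated for alpha = 1."""
    return rapid_beta_density(w, **params) * w ** 2 * math.log(1.0 / w) ** 2


for w in (1e-4, 1e-6, 1e-8):
    print(w, scaled_density(w))
```

If the $\Theta$ statement holds, the printed ratios should remain bounded away from zero and infinity as $w$ decreases.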

While the asymptotic sparsity regime of the RapidBeta construction is entirely determined by $\alpha$, the lower endpoint $\tau$ plays a non-negligible role at finite sample sizes. Indeed, the Lévy density is obtained by integrating contributions of the form $w^{-1-s}$ over $s\in[\tau,\alpha]$, so that components with larger $s$ (i.e., closer to 1) dominate the small-$w$ behavior responsible for the asymptotic regime. However, when $\tau$ is substantially below 1, the mixture includes terms with smaller exponents $s$, which decay more slowly as $w$ increases and can therefore materially influence the effective behaviour over the range of weights accessible in finite simulations. As a consequence, although all choices $\tau<1$ with $\alpha=1$ are asymptotically equivalent and lead to extreme sparsity, selecting $\tau$ close to 1 concentrates the mixture on exponents near the critical boundary and suppresses lower-order contributions, thereby accelerating the emergence of the extremely sparse regime in practice.

5 Simulation of the RapidBeta Process

To simulate an edge-exchangeable graph sequence as described in Section 2, we must first generate the Poisson point process

\mathcal{W}=\{w_{i}\}

When $\mathcal{W}$ is a RapidBeta process, the associated Lévy measure $\nu$ satisfies $\nu([0,1])=\infty$ , implying that $\mathcal{W}$ is almost surely infinite. Consequently, exact simulation is impossible, and we instead consider a truncated version.

5.1 Truncated RapidBeta Process

For a fixed $\varepsilon>0$ , define the truncated Poisson process

\mathcal{W}_{\varepsilon}=\{w_{i}\in\mathcal{W}:w_{i}>\varepsilon\},

(6)

which contains only finitely many atoms a.s. Let us define the intensity measure $\lambda$ as

\lambda(\mathrm{d}s,\mathrm{d}x)=\frac{\eta}{\alpha-\tau}\frac{s}{\Gamma(1-s)}x^{-1-s}\left(1-x\right)^{\xi-1}\mathrm{d}s\,\mathrm{d}x.

(7)

Sampling from $\mathcal{W}$ is equivalent to sampling a 2D Poisson point process on $[\tau,\alpha]\times[0,1]$ with intensity $\lambda$ , and discarding the first dimension. Sampling from $\mathcal{W}_{\varepsilon}$ is performed analogously but with the 2D Poisson point process taken on $[\tau,\alpha]\times[\varepsilon,1]$ .

5.2 Partitioned Thinning Scheme

To construct an efficient exact sampler for $\mathcal{W}_{\varepsilon}$, we partition $[\varepsilon,1]$ into two regions:

\displaystyle A_{\varepsilon}=[\varepsilon,1-\delta],\qquad B=(1-\delta,1],

(8)

where $\delta\in(0,1)$ is fixed. On $A_{\varepsilon}$ , we use the bound

\lambda(\mathrm{d}s,\mathrm{d}x)\leq\frac{\eta}{\alpha-\tau}B_{A}x^{-1-\alpha}\,\mathrm{d}s\,\mathrm{d}x,

(9)

where $B_{A}=\max\{1,\delta^{\xi-1}\}$ . On $B$ , we use the bound

\lambda(\mathrm{d}s,\mathrm{d}x)\leq\frac{\eta}{\alpha-\tau}(1-\delta)^{-1-\alpha}(1-x)^{\xi-1}\,\mathrm{d}s\,\mathrm{d}x.

(10)

These bounds define valid dominating measures on each region, allowing exact sampling via thinning as described in detail in Algorithm 1.

Algorithm 1 Exact sampler for the truncated weight set $\mathcal{W}_{\varepsilon}=\{w_{i}>\varepsilon\}$

Input: Parameters $\eta,\xi,\tau,\alpha$, truncation level $\varepsilon$, partition parameter $\delta$
1: $B_{A}\leftarrow\max\{1,\delta^{\xi-1}\}$
2: Compute $\Lambda_{A}=\eta B_{A}\int_{\varepsilon}^{1-\delta}x^{-1-\alpha}\,\mathrm{d}x$, $\Lambda_{B}=\eta(1-\delta)^{-1-\alpha}\int_{1-\delta}^{1}(1-x)^{\xi-1}\,\mathrm{d}x$
3: Sample $N_{A}\sim\mathrm{Poisson}(\Lambda_{A})$ and $N_{B}\sim\mathrm{Poisson}(\Lambda_{B})$
4: for $k=1,\dots,N_{A}$ do
5:   Sample $S\sim\mathrm{Unif}[\tau,\alpha]$
6:   Sample $W$ with density proportional to $w^{-1-\alpha}$ on $(\varepsilon,1-\delta]$: draw $U\sim\mathrm{Unif}[0,1]$ and set $W=\left(\varepsilon^{-\alpha}-U\left(\varepsilon^{-\alpha}-(1-\delta)^{-\alpha}\right)\right)^{-1/\alpha}$
7:   Accept with probability $\frac{S}{\Gamma(1-S)}\frac{W^{\alpha-S}(1-W)^{\xi-1}}{B_{A}}$
8:   if accepted then
9:     Add weight $W$ to $\mathcal{W}_{\varepsilon}$
10:   end if
11: end for
12: for $k=1,\dots,N_{B}$ do
13:   Sample $S\sim\mathrm{Unif}[\tau,\alpha]$
14:   Sample $U\sim\mathrm{Unif}[0,1]$, set $W=1-\delta U^{1/\xi}$
15:   Accept with probability $\frac{S}{\Gamma(1-S)}\frac{W^{-1-S}}{(1-\delta)^{-1-\alpha}}$
16:   if accepted then
17:     Add weight $W$ to $\mathcal{W}_{\varepsilon}$
18:   end if
19: end for
20: return $\mathcal{W}_{\varepsilon}$

5.3 Correctness

Theorem 5.1.

The set of weights $\mathcal{W}_{\varepsilon}$ produced by the thinning procedure described above is a realization of a Poisson point process on $[\varepsilon,1]$ with intensity $\nu(w)\,\mathrm{d}w$, i.e., the $\varepsilon$-truncated RapidBeta process.

5.4 Truncation error induced by the $\varepsilon$ -approximation

The $\varepsilon$ -approximation replaces the full weight collection $\mathcal{W}=\{w_{i}\}$ by the truncated set $\mathcal{W}_{\varepsilon}$ . A natural measure of the approximation error is the discarded total mass

\displaystyle R_{\varepsilon}:=\sum_{w_{i}\in\mathcal{W}:w_{i}\leq\varepsilon}w_{i}.

(11)

Proposition 5.2 (Discarded mass under truncation).

Assume $\alpha=1$. Then, as $\varepsilon\rightarrow 0$,

\displaystyle\mathbb{E}[R_{\varepsilon}]=\Theta\!\left(\frac{1}{\ln(1/\varepsilon)}\right).

(12)

In particular, the total mass discarded by truncating the CRM at level $\varepsilon$ vanishes at a logarithmic rate.
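The logarithmic rate can be seen from a short computation. The sketch below combines Campbell's formula with the small-$w$ behaviour of $\nu$ from Proposition 4.1 for $\alpha=1$; it is meant as a heuristic outline rather than a full proof.

```latex
\begin{aligned}
\mathbb{E}[R_{\varepsilon}]
  &=\int_{0}^{\varepsilon}w\,\nu(w)\,\mathrm{d}w
   =\Theta\!\left(\int_{0}^{\varepsilon}w^{-1}\bigl(\ln(1/w)\bigr)^{-2}\,\mathrm{d}w\right),\\
\int_{0}^{\varepsilon}w^{-1}\bigl(\ln(1/w)\bigr)^{-2}\,\mathrm{d}w
  &=\left[\frac{1}{\ln(1/w)}\right]_{0}^{\varepsilon}
   =\frac{1}{\ln(1/\varepsilon)}.
\end{aligned}
```

The antiderivative can be verified by differentiating $1/\ln(1/w)$ with respect to $w$, which gives $w^{-1}(\ln(1/w))^{-2}$.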

5.5 Diagnostics

To assess the quality of our algorithm in practice, we perform two diagnostics. The first diagnostic compares the empirical tail counts $N(>x)=\sum_{i}\mathbf{1}_{W_{i}>x}$, averaged over repeated samples, to the theoretical tail measure $\bar{\nu}(x)=\int_{x}^{1}\nu(w)\mathrm{d}w$, leveraging the property that $\mathbb{E}[N(>x)]=\bar{\nu}(x)$ for a Poisson process. Agreement between empirical and theoretical curves indicates that the marginal tail behavior and intensity are correctly captured. The second diagnostic examines the transformed order statistics by plotting $\bar{\nu}(W_{(k)})$ against the rank $k$. For a correctly specified Poisson point process, these transformed values align with the identity line, reflecting the fact that the ordered points map to approximately unit-rate arrivals. Together, these diagnostics provide complementary validation: the former targets first-moment properties of the tail, while the latter probes the global distributional and structural consistency of the point process. In practice, our algorithm provides good performance; see Figure 3.
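Both diagnostics can be illustrated on a toy example. The sketch below replaces the RapidBeta process with a stand-in Lévy density $w^{-1-a}$ on $(\varepsilon,1]$, chosen only because its tail measure has a closed form; the constants `ALPHA` and `EPS` and all function names are assumptions made for this illustration.

```python
import random

ALPHA, EPS = 0.5, 1e-4  # illustrative toy values, not taken from the paper


def tail_measure(x):
    """Tail nu_bar(x) = int_x^1 w^(-1-ALPHA) dw of the toy Levy density."""
    return (x ** -ALPHA - 1.0) / ALPHA


def sample_toy_process(rng):
    """Poisson process on (EPS, 1]: map unit-rate arrivals through the
    inverse of the tail measure. Points come out in decreasing order."""
    total = tail_measure(EPS)
    points, arrival = [], rng.expovariate(1.0)
    while arrival < total:
        points.append((1.0 + ALPHA * arrival) ** (-1.0 / ALPHA))
        arrival += rng.expovariate(1.0)
    return points


def mean_tail_count(samples, x):
    """First diagnostic: average of N(>x) over repeated samples."""
    return sum(sum(1 for w in pts if w > x) for pts in samples) / len(samples)


def transformed_order_statistics(points):
    """Second diagnostic: pairs (k, nu_bar(W_(k))) for the k-th largest point."""
    ordered = sorted(points, reverse=True)
    return [(k, tail_measure(w)) for k, w in enumerate(ordered, start=1)]


rng = random.Random(0)
samples = [sample_toy_process(rng) for _ in range(200)]
for x in (0.5, 0.1, 0.01):
    print(x, mean_tail_count(samples, x), tail_measure(x))
```

For the RapidBeta process itself, `tail_measure` would have to be replaced by a numerical evaluation of $\bar{\nu}$, since no closed form is given in the text.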

6 Simulations

Once the weight generation procedure is available, we can generate the sequence of graphs following the graph frequency model (see Algorithm 2).

Algorithm 2 Graph Frequency Generative Model

1: Input: Weights $\mathbf{w}=(w_{1},\dots,w_{N})$; Time step $T$
2: Output: An $N\times N$ edge multiplicity matrix $\mathbf{A}$
3: Let $N$ be the number of nodes, i.e., the length of $\mathbf{w}$
4: Initialize $\mathbf{A}$ as an $N\times N$ matrix of zeros.
5: for $i=1,\dots,N$ do
6:   for $j=i+1,\dots,N$ do
7:     $p_{ij}\leftarrow w_{i}w_{j}$ {Connection probability}
8:     $A_{ij}\sim\text{Binomial}(T,p_{ij})$ {Number of edges}
9:     $A_{ji}\leftarrow A_{ij}$ {Ensure symmetry for an undirected graph}
10:   end for
11: end for
12: return $\mathbf{A}$
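Algorithm 2 translates almost line by line into Python. The sketch below uses only the standard library and draws each Binomial$(T,p_{ij})$ variable as a sum of $T$ Bernoulli trials, which is transparent but slow for large $T$; a dedicated binomial sampler would normally be preferred. The function names are hypothetical, and the second helper computes $N_{t}$ and $N^{(e)}_{t}$ as defined in Section 3.

```python
import random


def graph_frequency_counts(weights, num_steps, seed=None):
    """Return the symmetric edge multiplicity matrix A after `num_steps` steps."""
    rng = random.Random(seed)
    n = len(weights)
    counts = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p_ij = weights[i] * weights[j]  # connection probability
            # Binomial(T, p_ij) as a sum of T independent Bernoulli(p_ij) draws
            a_ij = sum(rng.random() < p_ij for _ in range(num_steps))
            counts[i][j] = counts[j][i] = a_ij  # undirected graph
    return counts


def count_vertices_and_edges(counts):
    """Number of non-isolated vertices and total number of (multi)edges."""
    n = len(counts)
    num_vertices = sum(1 for row in counts if any(row))
    num_edges = sum(counts[i][j] for i in range(n) for j in range(i + 1, n))
    return num_vertices, num_edges
```

Calling `count_vertices_and_edges(graph_frequency_counts(w, T))` for increasing `T` gives the pairs $(N_{t},N^{(e)}_{t})$ needed to plot the empirical sparsity rate.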

We compare our method to graph sequences generated by a three-parameter Beta process, as described in Cai et al. (2016). We sample the weights of the three-parameter Beta process using a Poisson thinning construction based on its Lévy measure. Recall that the Beta process is a discrete random measure whose atom weights are generated by a Poisson point process with Lévy density

\nu(\mathrm{d}w)=\frac{\gamma\Gamma(1+\beta)}{\Gamma(1-\alpha)\Gamma(\beta+\alpha)}w^{-1-\alpha}(1-w)^{\beta+\alpha-1}\mathrm{d}w

(13)

where $\alpha\in(0,1)$ and $\beta,\gamma>0$ . We simulate the associated Poisson process by first drawing candidate jumps from a dominating stable Lévy density proportional to $w^{-1-\alpha}$ , whose inverse tail admits a closed-form expression. Each proposed jump is then accepted with probability $(1-w)^{\beta+\alpha-1}$ , yielding an exact thinning scheme whenever $\beta+\alpha-1\geq 0$ . This approach avoids the numerical instabilities of stick-breaking representations (Broderick et al., 2012) when $\alpha$ is close to one, while preserving the correct distribution of large weights.
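The thinning scheme just described can be sketched as follows. The code is a minimal standard-library illustration, not the paper's implementation: it assumes that weights below a truncation level `eps` are discarded (a level the text does not specify for this baseline), writes `mass` for the parameter $\gamma$, and generates candidates in decreasing order by mapping unit-rate arrival times through the inverse tail of the dominating density.

```python
import math
import random


def sample_beta_process_weights(alpha, beta, mass, eps, seed=None):
    """Weights above `eps` of a three-parameter Beta process, by thinning."""
    if not 0.0 < alpha < 1.0 or beta + alpha - 1.0 < 0.0:
        raise ValueError("requires 0 < alpha < 1 and beta + alpha - 1 >= 0")
    rng = random.Random(seed)
    const = mass * math.gamma(1.0 + beta) / (
        math.gamma(1.0 - alpha) * math.gamma(beta + alpha))
    # Tail of the dominating measure const * w^(-1-alpha) on (x, 1] is
    # const * (x^(-alpha) - 1) / alpha; its value at eps bounds the arrivals.
    max_tail = const * (eps ** -alpha - 1.0) / alpha
    weights = []
    arrival = rng.expovariate(1.0)
    while arrival < max_tail:
        # Closed-form inverse tail of the dominating measure.
        w = (1.0 + alpha * arrival / const) ** (-1.0 / alpha)
        # Accept with probability (1 - w)^(beta + alpha - 1) <= 1.
        if rng.random() < (1.0 - w) ** (beta + alpha - 1.0):
            weights.append(w)
        arrival += rng.expovariate(1.0)
    return weights
```

The returned list is sorted from the largest to the smallest weight, which makes it straightforward to keep only the leading atoms if a fixed number of nodes is desired.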

The simulations are displayed in Figure 4. As expected, the RapidBeta process produces sparser graph sequences than the three-parameter Beta process. We also provide a comparison to the Barabási-Albert model (Barabási and Albert, 1999), which produces even sparser sequences ($N^{(e)}_{t}=\Theta(N_{t})$) but lacks exchangeability.

The code used to generate the edge-exchangeable graphs can be found at: https://github.com/ValentinKil/ExtremelySparseEdgesExchangeable.git.

7 Conclusion

In this work, we presented a graph generative model for edge-exchangeable networks. We first showed that the extremely sparse regime can be achieved through a sequence of edge-exchangeable graphs. We then proposed a practical generative model that enables the simulation of these sequences. Our experimental results demonstrate that, despite necessary computational approximations, the model effectively captures the extremely sparse regime.

While Li and Campbell (2021) provide rigorous quantitative bounds on the error for fixed-rank truncation of edge-exchangeable networks, equivalent theoretical guarantees for the threshold $\varepsilon$ -truncation employed in our sampler are not yet available. Extending their error analysis to weight-based thresholds is a promising direction for future work.

Extreme sparsity for edge-exchangeable simple graphs has previously been studied by Janson (2018), who proposed two examples that reach exact linear dependence ( $N^{(e)}_{t}=\Theta(N_{t})$ ). However, the produced graphs exhibit some unwanted properties, such as bounded degrees or a structure composed mostly of stars. By relaxing the strict linearity requirement and accepting linear scaling up to a slowly varying function, we are able to generate a much richer, more realistic internal graph structure with the RapidBeta process; indeed, experiments show the emergence of a giant connected component. A direct application of Theorem 14 in Eriksson (2025) demonstrates that the RapidBeta graphs are almost surely not eventually forever connected, meaning there will always be some edges that do not join the giant component. To the best of our knowledge, there are no more precise results regarding the giant component of edge-exchangeable graphs, although Eriksson (2025) suggests some interesting directions.

Other properties of the generative model, such as the power-law behaviour of the degree distribution, the clustering coefficient, and the small-world property, remain to be explored. Additionally, developing inference procedures for RapidBeta edge-exchangeable graphs is a compelling avenue for future research.

Acknowledgements

I would like to thank François Caron for his insightful advice and thorough proofreading of this manuscript. I am funded by the Clarendon Scholarship.

References

D. J. Aldous (1981) Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis 11 (4), pp. 581–598.
A. Barabási and R. Albert (1999) Emergence of scaling in random networks. Science 286 (5439), pp. 509–512.
N. H. Bingham, C. M. Goldie, and J. L. Teugels (1987) Regular variation. Vol. 27, Cambridge University Press.
B. Bollobás (2001) Random Graphs. Cambridge University Press.
C. Borgs, J. T. Chayes, H. Cohn, and N. Holden (2018) Sparse exchangeable graphs and their limits via graphon processes. JMLR 18 (210), pp. 1–71.
T. Broderick and D. Cai (2016) Edge-exchangeable graphs and sparsity. arXiv:1603.06898.
T. Broderick, M. I. Jordan, and J. Pitman (2012) Beta Processes, Stick-Breaking and Power Laws. Bayesian Analysis 7 (2), pp. 439–476.
D. Cai, T. Campbell, and T. Broderick (2016) Edge-exchangeable graphs and sparsity. In Advances in Neural Information Processing Systems, Vol. 29.
T. Campbell, D. Cai, and T. Broderick (2018) Exchangeable trait allocations. Electronic Journal of Statistics 12 (2), pp. 2290–2322.
F. Caron and E. Fox (2017) Sparse graphs using exchangeable random measures. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 79, pp. 1295–1366.
F. Caron, F. Panero, and J. Rousseau (2023) On sparsity, power-law, and clustering properties of graphex processes. Advances in Applied Probability 55 (4), pp. 1211–1253.
H. Crane and W. Dempsey (2015) A framework for statistical network modeling. arXiv:1509.08185.
H. Crane and W. Dempsey (2016) Edge exchangeable models for network data. arXiv:1603.04571.
E. Eriksson (2025) Edge Exchangeable Graphs: Connectedness, Gaussianity and Completeness. arXiv:2501.09511.
W. Feller (1971) An introduction to probability theory and its applications. Vol. 2, John Wiley & Sons.
D. A. Freedman (1975) On Tail Probabilities for Martingales. The Annals of Probability 3 (1), pp. 100–118.
A. Gnedin, B. Hansen, and J. Pitman (2007) Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws. Probab. Surv 4 (146-171), pp. 88.
D. N. Hoover (1979) Relations on Probability Spaces and Arrays of Random Variables. Institute for Advanced Study, Princeton.
S. Janson (2018) On Edge Exchangeable Random Graphs. Journal of Statistical Physics 173 (3), pp. 448–484.
O. Kallenberg (2005) Probabilistic Symmetries and Invariance Principles. Springer Science & Business Media.
V. Kilian, B. Guedj, and F. Caron (2025) Rapidly Varying Completely Random Measures for Modeling Extremely Sparse Networks. arXiv:2505.13206.
J.F.C. Kingman (1967) Completely random measures. Pacific Journal of Mathematics 21 (1), pp. 59–78.
J.F.C. Kingman (1993) Poisson processes. Vol. 3, Oxford University Press, USA.
X. Li and T. Campbell (2021) Truncated simulation and inference in edge-exchangeable networks. Electronic Journal of Statistics 15 (2), pp. 5117–5157.
A. Lijoi and I. Prünster (2010) Models beyond the Dirichlet process. In Bayesian Nonparametrics, N. L. Hjort, C. Holmes, P. Müller, and S. G. Walker (Eds.).
X. Miscouridou, F. Panero, and A. Laos (2026) Dynamic sparse graphs with overlapping communities. arXiv:2512.10717.
M. Mitzenmacher and E. Upfal (2005) Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press.
C. Naik, F. Caron, J. Rousseau, Y. W. Teh, and K. Palla (2022) Bayesian Nonparametrics for Sparse Dynamic Networks. arXiv:1607.01624.
C. Naik, F. Caron, and J. Rousseau (2021) Sparse networks with core-periphery structure. Electronic Journal of Statistics 15 (1), pp. 1814–1868.
P. Orbanz and D. M. Roy (2015) Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures. IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (2), pp. 437–461.
A. Todeschini, X. Miscouridou, and F. Caron (2020) Exchangeable random measures for sparse and modular graphs with overlapping communities. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82 (2), pp. 487–520.
J. Tropp (2011) Freedman's inequality for matrix martingales. Electronic Communications in Probability 16.
V. Veitch and D. M. Roy (2015) The Class of Random Graphs Arising from Exchangeable Random Measures. arXiv:1512.03099.

Appendix A Asymptotic notation

Throughout this article, we write:

• $f(x)\underset{x\to a}{=}O(g(x))$ if there exists $C>0$ such that, for $x$ in a neighbourhood of $a$, $|f(x)|\leq C|g(x)|$;
• $f(x)\underset{x\to a}{=}\Omega(g(x))$ if there exists $C>0$ such that, for $x$ in a neighbourhood of $a$, $|f(x)|\geq C|g(x)|$;
• $f(x)\underset{x\to a}{=}\Theta(g(x))$ if both $f(x)\underset{x\to a}{=}O(g(x))$ and $f(x)\underset{x\to a}{=}\Omega(g(x))$ hold;
• $f(x)\underset{x\to a}{\sim}g(x)$ if $\lim_{x\to a}f(x)/g(x)=1$,

where $a\in\mathbb{R}\cup\{-\infty,+\infty\}$. When $f(x)$ and $g(x)$ are random variables, the notation indicates that the relation holds almost surely. This usage of $\sim$ should not be confused with the standard distributional notation $X\sim\mathcal{D}$, which indicates that the random variable $X$ has distribution $\mathcal{D}$.

Appendix B Main lemma

Lemma B.1.

Let $(\mathcal{G}_{t})_{t\geq 0}$ be the edge-exchangeable multigraph sequence constructed from the weights $\mathcal{W}=\{w_{i}\}_{i\geq 1}$ , and let $(N_{t})_{t\geq 0}$ and $(N^{(e)}_{t})_{t\geq 0}$ denote its associated vertex and edge counts. For $t\geq 1$ , define the quenched means

	$\displaystyle\mu_{\mathcal{W}}(t)$	$\displaystyle:=\mathbb{E}\left(N_{t}\mid\mathcal{W}\right)=\sum_{i}\left[1-\prod_{j\neq i}(1-w_{i}w_{j})^{t}\right],$
	$\displaystyle\mu^{(e)}_{\mathcal{W}}(t)$	$\displaystyle:=\mathbb{E}\left(N_{t}^{(e)}\mid\mathcal{W}\right)=t\sum_{i<j}w_{i}w_{j}.$

Then, for almost every realization of $\mathcal{W}$ such that, for every $\epsilon>0$ ,

\displaystyle\sum_{t=1}^{\infty}\exp\left\{-\frac{\epsilon^{2}\mu_{\mathcal{W}}(t)^{2}}{4(2\mu^{(e)}_{\mathcal{W}}(t)+\epsilon\mu_{\mathcal{W}}(t)/3)}\right\}<\infty,

(14)

we have

\displaystyle N_{t}\overset{\mathrm{a.s.}}{\sim}\mathbb{E}\left(N_{t}\mid\mathcal{W}\right),\qquad N_{t}^{(e)}\overset{\mathrm{a.s.}}{\sim}\mathbb{E}\left(N_{t}^{(e)}\mid\mathcal{W}\right).
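The two quenched means in the lemma can be evaluated directly once the weights are truncated to a finite list. The following is a minimal sketch under that assumption; the function names and the example weights are hypothetical and are not part of the paper.

```python
import math
from typing import Sequence


def quenched_edge_mean(weights: Sequence[float], t: int) -> float:
    """mu^(e)_W(t) = t * sum_{i<j} w_i w_j for a finite weight list."""
    s = sum(weights)
    s2 = sum(w * w for w in weights)
    # Identity: sum_{i<j} w_i w_j = (S^2 - sum_i w_i^2) / 2
    return t * 0.5 * (s * s - s2)


def quenched_vertex_mean(weights: Sequence[float], t: int) -> float:
    """mu_W(t) = sum_i [1 - prod_{j != i} (1 - w_i w_j)^t]."""
    total = 0.0
    for i, wi in enumerate(weights):
        # log of prod_{j != i} (1 - w_i w_j); log1p keeps small products accurate
        log_inactive = sum(
            math.log1p(-wi * wj) for j, wj in enumerate(weights) if j != i
        )
        total += 1.0 - math.exp(t * log_inactive)
    return total


# Hypothetical truncated weights, chosen only for illustration.
weights = [0.5, 0.25, 0.125]
mu_e = quenched_edge_mean(weights, t=4)
mu_v = quenched_vertex_mean(weights, t=4)
```

With an infinite weight sequence both sums would have to be truncated, so the values above are only approximations of the quantities in the lemma.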

Proof.

Fix a realization of $\mathcal{W}$ such that

\displaystyle\sum_{i}w_{i}<\infty.

(15)

This event has probability one by the assumptions on $\nu$ . Throughout the proof, we work conditionally on this realization of $\mathcal{W}$ , so the numbers $(w_{i})_{i\geq 1}$ are deterministic.

Write

\displaystyle p_{ij}:=w_{i}w_{j},\qquad i<j.

(16)

Since $\sum_{i}w_{i}<\infty$ , we have

\displaystyle\sum_{i<j}p_{ij}\leq\frac{1}{2}\left(\sum_{i}w_{i}\right)^{2}<\infty.

(17)

In particular, the conditional means below are finite for every fixed $t$ .

Conditionally on $\mathcal{W}$ , the edge multiplicities $M^{t}_{\{i,j\}}$ , $i<j$ , are independent and satisfy

\displaystyle M^{t}_{\{i,j\}}\sim\operatorname{Binom}(t,p_{ij}).

(18)

Therefore

\displaystyle N_{t}^{(e)}=\frac{1}{2}\sum_{i\neq j}M^{t}_{\{i,j\}}=\sum_{i<j}M^{t}_{\{i,j\}}.

(19)

Hence

\displaystyle\mu^{(e)}_{\mathcal{W}}(t)=\mathbb{E}\left(N_{t}^{(e)}\mid\mathcal{W}\right)=t\sum_{i<j}p_{ij}.

(20)

Edge count.

For each $i<j$ , write

\displaystyle M^{t}_{\{i,j\}}=\sum_{s=1}^{t}B^{(s)}_{ij},

(21)

where the variables $B^{(s)}_{ij}$ are independent Bernoulli random variables with success probability $p_{ij}$ . Thus, $N_{t}^{(e)}$ is a countable sum of independent Bernoulli random variables with total mean $\mu^{(e)}_{\mathcal{W}}(t)$ . Applying Theorem H.1, we obtain, for every $\epsilon\in(0,1)$ ,

	$\displaystyle\mathbb{P}\left(\left|N_{t}^{(e)}-\mu^{(e)}_{\mathcal{W}}(t)\right|>\epsilon\mu^{(e)}_{\mathcal{W}}(t)\,\middle|\,\mathcal{W}\right)$	$\displaystyle\leq 2\exp\left\{-\frac{\epsilon^{2}}{3}\mu^{(e)}_{\mathcal{W}}(t)\right\}$
		$\displaystyle\leq 2\exp\left\{-\frac{\epsilon^{2}}{3}t\sum_{i<j}p_{ij}\right\}.$

Since $\sum_{i<j}p_{ij}>0$ almost surely, the right-hand side is summable in $t$ . Hence, by the Borel–Cantelli lemma,

\displaystyle\frac{N_{t}^{(e)}}{\mu^{(e)}_{\mathcal{W}}(t)}\longrightarrow 1

(22)

almost surely, conditionally on $\mathcal{W}$ .
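The summability used in the Borel–Cantelli step is that of a geometric series. As a small illustration, the sketch below evaluates the Chernoff bound and compares a long partial sum with the closed form; the function names and the numerical values of $\epsilon$ and $\sum_{i<j}p_{ij}$ are assumptions made for the example.

```python
import math


def edge_tail_bound(eps: float, t: int, pair_sum: float) -> float:
    """Chernoff bound 2 exp(-eps^2 t L / 3), with L = sum_{i<j} p_ij."""
    return 2.0 * math.exp(-(eps ** 2) * t * pair_sum / 3.0)


def edge_tail_bound_total(eps: float, pair_sum: float) -> float:
    """Closed form of sum_{t>=1} of the bound: geometric series with ratio r < 1."""
    r = math.exp(-(eps ** 2) * pair_sum / 3.0)
    return 2.0 * r / (1.0 - r)


# Hypothetical values: eps = 0.5 and L = 0.21875.
partial = sum(edge_tail_bound(0.5, t, 0.21875) for t in range(1, 5001))
total = edge_tail_bound_total(0.5, 0.21875)
```

The total is finite for every $\epsilon>0$ as soon as $\sum_{i<j}p_{ij}>0$, which is the only property the argument needs.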

Vertex count.

The vertex indicators are not conditionally independent. Indeed, the indicators that vertices $i$ and $j$ are active both depend on the edge variable $M^{t}_{\{i,j\}}$ . Therefore, one cannot prove the vertex part using the same Chernoff bound as for the edge count.

For $i\geq 1$ , define

\displaystyle A_{i,t}:=\mathbf{1}\left\{\sum_{j\neq i}M^{t}_{\{i,j\}}>0\right\}.

(23)

Then

\displaystyle N_{t}=\sum_{i}A_{i,t},\qquad\mu_{\mathcal{W}}(t)=\sum_{i}\mathbb{E}(A_{i,t}\mid\mathcal{W})=\sum_{i}\left[1-\prod_{j\neq i}(1-p_{ij})^{t}\right].

(24)

We shall use a martingale bounded-differences argument.

First, truncate the edge set to the finite collection

\displaystyle\mathcal{I}_{m}:=\{(i,j):1\leq i<j\leq m\}.

(25)

Let $N_{t}(m)$ be the number of active vertices in the graph obtained by keeping only the edge indicators with $(i,j)\in\mathcal{I}_{m}$ , and let

\displaystyle\mu_{\mathcal{W}}(t,m):=\mathbb{E}\left(N_{t}(m)\mid\mathcal{W}\right).

(26)

The variables $\{M^{t}_{\{i,j\}}:(i,j)\in\mathcal{I}_{m}\}$ are independent. Reveal them one at a time, in any deterministic order, and let $(Z_{k})$ be the Doob martingale

\displaystyle Z_{k}:=\mathbb{E}\left(N_{t}(m)\,\middle|\,\mathcal{W},M^{t}_{e_{1}},\ldots,M^{t}_{e_{k}}\right),

(27)

where $e_{1},\ldots,e_{|\mathcal{I}_{m}|}$ is the chosen ordering of $\mathcal{I}_{m}$ .

Changing a single edge multiplicity can change the number of active vertices by at most $2$ , because only the two endpoints of that edge can change their active/inactive status. Hence, the martingale increments satisfy

\displaystyle|Z_{k}-Z_{k-1}|\leq 2.

(28)

Moreover, if $e_{k}=(i,j)$ , then the conditional variance of the $k$ th martingale increment is bounded by

\displaystyle\mathbb{E}[(Z_{k}-Z_{k-1})^{2}\mid Z_{0:k-1}]\leq 4tp_{ij}.

(29)

Indeed, conditionally on the previously revealed variables, the only remaining randomness in this step is the binomial variable $M^{t}_{\{i,j\}}$ . The functional $N_{t}(m)$ depends on this variable only through whether $M^{t}_{\{i,j\}}=0$ or $M^{t}_{\{i,j\}}>0$ . Thus the conditional expectation after revealing $M^{t}_{\{i,j\}}$ can take two possible values, whose difference is at most $2$ , since changing the status of the edge $\{i,j\}$ can affect only the two endpoints $i$ and $j$ . Writing $q:=\mathbb{P}(M^{t}_{\{i,j\}}>0\mid\mathcal{W})=1-(1-p_{ij})^{t}\leq tp_{ij}$ (Bernoulli's inequality), the conditional variance of a two-valued variable with gap at most $2$ is at most $4q(1-q)\leq 4tp_{ij}$ .

Therefore, the predictable quadratic variation of the martingale is bounded by

\displaystyle\sum_{(i,j)\in\mathcal{I}_{m}}4tp_{ij}\leq 4\mu^{(e)}_{\mathcal{W}}(t).

(30)

We can therefore apply Freedman’s inequality (Theorem H.2) to the martingale

\displaystyle Y_{k}:=Z_{k}-\mu_{\mathcal{W}}(t,m),

(31)

with $R=2$ and $\sigma^{2}=4\mu^{(e)}_{\mathcal{W}}(t)$ . For every $s>0$ ,

\displaystyle\mathbb{P}\!\left(N_{t}(m)-\mu_{\mathcal{W}}(t,m)\geq s\mid\mathcal{W}\right)\leq\exp\left\{-\frac{s^{2}}{2\left(4\mu^{(e)}_{\mathcal{W}}(t)+2s/3\right)}\right\}.

(32)

Applying the same argument to $-Y_{k}$ yields the lower-tail bound

\displaystyle\mathbb{P}\!\left(N_{t}(m)-\mu_{\mathcal{W}}(t,m)\leq-s\mid\mathcal{W}\right)\leq\exp\left\{-\frac{s^{2}}{2\left(4\mu^{(e)}_{\mathcal{W}}(t)+2s/3\right)}\right\}.

(33)

Combining the two inequalities, we obtain

\displaystyle\mathbb{P}\left(\left|N_{t}(m)-\mu_{\mathcal{W}}(t,m)\right|\geq s\,\middle|\,\mathcal{W}\right)\leq 2\exp\left\{-\frac{s^{2}}{2\left(4\mu^{(e)}_{\mathcal{W}}(t)+2s/3\right)}\right\}.

(34)

Since

\displaystyle N_{t}(m)-\mu_{\mathcal{W}}(t,m)\to N_{t}-\mu_{\mathcal{W}}(t)\qquad\text{almost surely as }m\to\infty,

(35)

it follows that

\displaystyle\{|N_{t}-\mu_{\mathcal{W}}(t)|>s\}\subseteq\liminf_{m\to\infty}\{|N_{t}(m)-\mu_{\mathcal{W}}(t,m)|>s\}.

(36)

Therefore, by Fatou’s lemma,

\displaystyle\mathbb{P}\!\left(|N_{t}-\mu_{\mathcal{W}}(t)|>s\,\middle|\,\mathcal{W}\right)\leq\liminf_{m\to\infty}\mathbb{P}\!\left(|N_{t}(m)-\mu_{\mathcal{W}}(t,m)|>s\,\middle|\,\mathcal{W}\right).

(37)

Applying the previous Freedman bound yields

\displaystyle\mathbb{P}\!\left(|N_{t}-\mu_{\mathcal{W}}(t)|>s\,\middle|\,\mathcal{W}\right)\leq 2\exp\!\left\{-\frac{s^{2}}{2\bigl(4\mu^{(e)}_{\mathcal{W}}(t)+2s/3\bigr)}\right\}.

(38)

Taking $s=\epsilon\mu_{\mathcal{W}}(t)$ , we obtain

\displaystyle\mathbb{P}\left(\left|N_{t}-\mu_{\mathcal{W}}(t)\right|>\epsilon\mu_{\mathcal{W}}(t)\,\middle|\,\mathcal{W}\right)\leq 2\exp\left\{-\frac{\epsilon^{2}\mu_{\mathcal{W}}(t)^{2}}{2\left(4\mu^{(e)}_{\mathcal{W}}(t)+2\epsilon\mu_{\mathcal{W}}(t)/3\right)}\right\}.

(39)
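The right-hand side of (39) is, up to the factor $2$, exactly the summand of condition (14). The sketch below makes that algebraic identity concrete; the function names and the numerical inputs are hypothetical.

```python
import math


def freedman_vertex_bound(eps: float, mu: float, mu_e: float) -> float:
    """Right-hand side of (39): Freedman with R = 2, sigma^2 = 4 mu_e, s = eps * mu."""
    s = eps * mu
    return 2.0 * math.exp(-(s ** 2) / (2.0 * (4.0 * mu_e + 2.0 * s / 3.0)))


def summand_14(eps: float, mu: float, mu_e: float) -> float:
    """Summand of the summability condition (14)."""
    return math.exp(-(eps ** 2) * mu ** 2 / (4.0 * (2.0 * mu_e + eps * mu / 3.0)))


# Hypothetical quenched means mu = 10 and mu_e = 40, with eps = 0.5.
bound = freedman_vertex_bound(0.5, 10.0, 40.0)
term = summand_14(0.5, 10.0, 40.0)
```

Both denominators expand to $8\mu^{(e)}_{\mathcal{W}}(t)+4\epsilon\mu_{\mathcal{W}}(t)/3$, which is why `bound` equals `2 * term`.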

By the summability assumption (14) and the Borel–Cantelli lemma,

\displaystyle\frac{N_{t}}{\mu_{\mathcal{W}}(t)}\longrightarrow 1

(40)

almost surely, conditionally on $\mathcal{W}$ . ∎

Appendix C Proof of Theorem 3.1

Conditionally on the weights $\mathcal{W}$ , our model is closely related to the rank-1 edge-exchangeable multigraph defined by Janson (2018). As a result, the proof strategy parallels that of Janson (2018): edge variables are independent, while vertex indicators are dependent only through shared incident edges.

We prove the following sharper result.

Theorem C.1.

Assume that, as $w\downarrow 0$ , we have $\nu(w)\sim c\,w^{-2}\ell(w^{-1}),$ where $c>0$ , $\ell$ is slowly varying, and

\displaystyle\ell_{1}(x):=\int_{x}^{\infty}u^{-1}\ell(u)\,\mathrm{d}u<\infty

(41)

for all sufficiently large $x$ . Let

\displaystyle S:=\sum_{i}w_{i},\qquad L:=\sum_{i<j}w_{i}w_{j}.

(42)

Then $0<S<\infty$ and $0<L<\infty$ a.s., and

	$\displaystyle N_{t}^{(e)}$	$\displaystyle\sim tL\qquad\text{a.s.}$		(43)
	$\displaystyle N_{t}$	$\displaystyle\sim cS\,t\ell_{1}(t)\qquad\text{a.s.}$		(44)

Consequently, if $\phi(x):=x\ell_{1}(x)$ , then

N_{t}^{(e)}=\Theta\!\left(\phi^{-1}(N_{t})\right)\qquad\text{a.s.}

(45)

If moreover $\ell_{1}(x\ell_{1}(x))\sim\ell_{1}(x),$ then

N_{t}^{(e)}=\Theta\!\left(\frac{N_{t}}{\ell_{1}(N_{t})}\right)\qquad\text{a.s.}

(46)

In particular, if $\ell(x)=(\log x)^{a}$ with $a<-1$ , then

N_{t}^{(e)}=\Theta\!\left(N_{t}(\log N_{t})^{-a-1}\right)\qquad\text{a.s.}

(47)

Proof.

We write $\mathcal{W}=\{w_{i}\}_{i\geq 1}$ for the Poisson process of weights. Since $\int_{0}^{1}w\,\nu(\mathrm{d}w)<\infty,$ we have almost surely, $0<S:=\sum_{i}w_{i}<\infty$ . Moreover,

\displaystyle 0<L:=\sum_{i<j}w_{i}w_{j}=\frac{1}{2}\left(S^{2}-\sum_{i}w_{i}^{2}\right)<\infty\qquad\text{a.s.}

(48)

Preliminary.

We first evaluate the asymptotic behaviour of a functional of the realised Poisson process. Define

\displaystyle T_{\lambda}:=\sum_{i}\left(1-e^{-\lambda w_{i}}\right),\qquad\lambda>0.

(49)

Let us show that

T_{\lambda}\sim c\lambda\ell_{1}(\lambda)\qquad\text{a.s.}

(50)

Let $m(\lambda):=\mathbb{E}[T_{\lambda}]$ . By Campbell’s theorem,

\displaystyle m(\lambda)=\int_{0}^{\infty}\left(1-e^{-\lambda w}\right)\nu(\mathrm{d}w).

(51)

For any $\varepsilon>0$ , the integral over $[\varepsilon,\infty)$ is bounded by $\nu([\varepsilon,\infty))<\infty$ . Since $\lambda\ell_{1}(\lambda)\to\infty$ as $\lambda\to\infty$ , this bounded term is $o(\lambda\ell_{1}(\lambda))$ . Thus, it is sufficient to analyse the behaviour near $0$ .

By hypothesis, for any $\delta>0$ , there exists $\varepsilon>0$ such that for all $w\in(0,\varepsilon]$ :

\displaystyle(1-\delta)cw^{-2}\ell(w^{-1})\leq\nu(w)\leq(1+\delta)cw^{-2}\ell(w^{-1}).

(52)

Let $I(\lambda)=\int_{0}^{\varepsilon}\left(1-e^{-\lambda w}\right)w^{-2}\ell(w^{-1})\,\mathrm{d}w$ . With the change of variables $u=w^{-1}$ , we obtain

\displaystyle I(\lambda)=\int_{1/\varepsilon}^{\infty}\left(1-e^{-\lambda/u}\right)\ell(u)\,\mathrm{d}u.

(53)

We partition this integral at $u=\lambda$ , assuming that $\lambda$ is sufficiently large so that $\lambda>1/\varepsilon$ .

For the first part, $u\in[1/\varepsilon,\lambda]$ , we use the bound $1-e^{-\lambda/u}\leq 1$ to obtain

\displaystyle\int_{1/\varepsilon}^{\lambda}\left(1-e^{-\lambda/u}\right)\ell(u)\,\mathrm{d}u\leq\int_{1/\varepsilon}^{\lambda}\ell(u)\,\mathrm{d}u.

(54)

Let us show that this upper bound is $o(\lambda\ell_{1}(\lambda))$ . First, by Proposition G.3, $\int_{1/\varepsilon}^{\lambda}\ell(u)\,\mathrm{d}u\sim\lambda\ell(\lambda).$ We must now establish that $\lim_{\lambda\to\infty}\ell(\lambda)/\ell_{1}(\lambda)=0$ . Recall that $\ell_{1}(\lambda)=\int_{\lambda}^{\infty}u^{-1}\ell(u)\,\mathrm{d}u$ , and by assumption, this integral is finite. According to Proposition G.5, since $\ell$ is slowly varying and its tail integral $\ell_{1}$ converges, $\ell_{1}$ is also slowly varying and satisfies:

\displaystyle\lim_{\lambda\to\infty}\frac{\ell_{1}(\lambda)}{\ell(\lambda)}=\lim_{\lambda\to\infty}\frac{1}{\ell(\lambda)}\int_{\lambda}^{\infty}\frac{\ell(u)}{u}\,\mathrm{d}u=\infty.

(55)

Inverting this relationship gives $\ell(\lambda)=o(\ell_{1}(\lambda))$ , so the first integral is $o(\lambda\ell_{1}(\lambda))$ .

For the second part, $u>\lambda$ , we apply the inequalities $x-x^{2}/2\leq 1-e^{-x}\leq x$ for $x\geq 0$ . Setting $x=\lambda/u$ , we get:

\displaystyle\int_{\lambda}^{\infty}\left(\frac{\lambda}{u}-\frac{\lambda^{2}}{2u^{2}}\right)\ell(u)\,\mathrm{d}u\leq\int_{\lambda}^{\infty}\left(1-e^{-\lambda/u}\right)\ell(u)\,\mathrm{d}u\leq\int_{\lambda}^{\infty}\frac{\lambda}{u}\ell(u)\,\mathrm{d}u.

(56)

The upper bound evaluates exactly to $\lambda\ell_{1}(\lambda)$ . For the lower bound, Proposition G.3 yields $\int_{\lambda}^{\infty}u^{-2}\ell(u)\,\mathrm{d}u\sim\lambda^{-1}\ell(\lambda)$ . Therefore, the subtractive error term scales as $\frac{\lambda^{2}}{2}O(\lambda^{-1}\ell(\lambda))=O(\lambda\ell(\lambda))$ , which is $o(\lambda\ell_{1}(\lambda))$ .

Consequently,

\displaystyle I(\lambda)\sim\lambda\ell_{1}(\lambda),

(57)

and it follows that

\displaystyle m(\lambda)\sim c\lambda\ell_{1}(\lambda).

(58)
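As a worked instance of (58), assume the specific choice $\ell(u)=(\log u)^{-2}$ for large $u$ (the case $a=-2$ of the logarithmic family considered in Theorem C.1). The quantities in the argument above then become explicit:

```latex
% Worked example (assumption: \ell(u) = (\log u)^{-2} for large u, i.e. a = -2)
\ell_1(\lambda) = \int_\lambda^\infty \frac{\mathrm{d}u}{u(\log u)^2} = \frac{1}{\log\lambda},
\qquad
\frac{\ell(\lambda)}{\ell_1(\lambda)} = \frac{1}{\log\lambda} \to 0,
\qquad
m(\lambda) \sim \frac{c\,\lambda}{\log\lambda}.
```

The middle relation is the statement $\ell(\lambda)=o(\ell_{1}(\lambda))$ that makes the integral over $[1/\varepsilon,\lambda]$ negligible.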

To establish $T_{\lambda}\sim m(\lambda)$ , we derive a specific concentration bound for $T_{\lambda}$ . Since $T_{\lambda}$ is a functional of the Poisson point process $\mathcal{W}$ , its moment generating function is given by Campbell’s theorem: for any $u\in\mathbb{R}$ ,

\displaystyle\mathbb{E}[\exp(uT_{\lambda})]=\exp\left(\int_{0}^{\infty}\left(e^{u(1-e^{-\lambda w})}-1\right)\nu(\mathrm{d}w)\right).

(59)

For any $w\geq 0$ , the term $x=1-e^{-\lambda w}$ lies in the interval $[0,1]$ . By the convexity of the function $x\mapsto e^{ux}$ , we have the bound $e^{ux}-1\leq x(e^{u}-1)$ . Applying this inequality to the integrand yields:

\displaystyle\mathbb{E}[\exp(uT_{\lambda})]\leq\exp\left((e^{u}-1)\int_{0}^{\infty}(1-e^{-\lambda w})\nu(\mathrm{d}w)\right)=\exp\left(m(\lambda)(e^{u}-1)\right).

(60)

So the moment generating function of $T_{\lambda}$ is bounded above by that of a Poisson random variable with parameter $m(\lambda)$ . Consequently, Chernoff’s inequality yields, for any $\eta\in(0,1)$ ,

\displaystyle\mathbb{P}\!\left(|T_{\lambda}-m(\lambda)|>\eta m(\lambda)\right)\leq 2\exp\!\left(-\frac{\eta^{2}m(\lambda)}{3}\right).

(61)

Let us define a geometric grid $\lambda_{k}=(1+\gamma)^{k}$ for a fixed $\gamma>0$ . Since $m(\lambda)$ is regularly varying with index 1, it grows asymptotically faster than $\lambda^{1-\delta}$ for any $\delta\in(0,1)$ . Thus, $m(\lambda_{k})$ grows exponentially with $k$ , meaning the sequence of probabilities $\mathbb{P}\left(|T_{\lambda_{k}}-m(\lambda_{k})|>\eta m(\lambda_{k})\right)$ decays faster than any geometric sequence and is therefore summable. By the Borel–Cantelli lemma, $T_{\lambda_{k}}\sim m(\lambda_{k})$ almost surely as $k\to\infty$ .

To extend this convergence to the continuous parameter $\lambda$ , we rely on the monotonicity of $T_{\lambda}$ . For any $\lambda>0$ , there exists an index $k$ such that $\lambda_{k}\leq\lambda\leq\lambda_{k+1}$ . As $T_{\lambda}$ is nondecreasing and $m$ is strictly increasing in $\lambda$ , we have

\displaystyle\frac{T_{\lambda_{k}}}{m(\lambda_{k+1})}\leq\frac{T_{\lambda}}{m(\lambda)}\leq\frac{T_{\lambda_{k+1}}}{m(\lambda_{k})}.

(62)

Taking the limit as $\lambda\to\infty$ (and therefore $k\to\infty$ ), and utilizing the regular variation of $m$ which guarantees $\lim_{k\to\infty}m(\lambda_{k+1})/m(\lambda_{k})=1+\gamma$ , we obtain almost surely:

\displaystyle\frac{1}{1+\gamma}\leq\liminf_{\lambda\to\infty}\frac{T_{\lambda}}{m(\lambda)}\leq\limsup_{\lambda\to\infty}\frac{T_{\lambda}}{m(\lambda)}\leq 1+\gamma.

(63)

Since this holds for any arbitrarily small $\gamma>0$ , taking the intersection of these almost sure events over a countable sequence $\gamma\downarrow 0$ proves that $T_{\lambda}\sim m(\lambda)$ a.s. This establishes (50).

Edge count.

By the first part of Lemma B.1, we have established that, almost surely,

\displaystyle N_{t}^{(e)}\sim\mathbb{E}(N_{t}^{(e)}\mid\mathcal{W})=tL.

(64)

This is exactly Equation 43.

Vertex count.

We now study the number of active vertices. Conditionally on $\mathcal{W}$ , the probability that vertex $i$ is active after $t$ interactions is

\displaystyle p_{i}^{(t)}=1-\prod_{j\neq i}(1-w_{i}w_{j})^{t}.

(65)

Hence $\mu_{\mathcal{W}}(t):=\mathbb{E}[N_{t}\mid\mathcal{W}]=\sum_{i}p_{i}^{(t)}.$ We shall show that

\mu_{\mathcal{W}}(t)\sim cS\,t\ell_{1}(t)\qquad\text{a.s.}

(66)

Fix $\varepsilon>0$ . The number of atoms with $w_{i}>\varepsilon$ is finite almost surely, and their total contribution to $\mu_{\mathcal{W}}(t)$ is $O(1)$ , which is negligible compared to $t\ell_{1}(t)$ . Thus, it is enough to consider atoms with $w_{i}\leq\varepsilon$ . For such atoms, write

\displaystyle R_{i}:=-\sum_{j\neq i}\log(1-w_{i}w_{j}).

(67)

Then $p_{i}^{(t)}=1-e^{-tR_{i}}.$ To bound $R_{i}$ , we use the inequalities $x\leq-\log(1-x)\leq\frac{x}{1-x}$ , valid for $x\in[0,1)$ . Applying the lower bound with $x=w_{i}w_{j}$ yields:

\displaystyle R_{i}\geq\sum_{j\neq i}w_{i}w_{j}=w_{i}\sum_{j\neq i}w_{j}=w_{i}(S-w_{i}).

(68)

For the upper bound, recall that we are only considering atoms where $w_{i}\leq\varepsilon$ . Because the total sum of the weights is $S$ , every individual weight satisfies $w_{j}\leq S$ . This implies that for any pair, the product is bounded by $w_{i}w_{j}\leq\varepsilon S$ . Since $S<\infty$ almost surely, we can choose $\varepsilon$ sufficiently small such that $\varepsilon S\leq 1/2$ . Applying the second inequality to the sum gives:

R_{i}\leq\sum_{j\neq i}\frac{w_{i}w_{j}}{1-w_{i}w_{j}}\leq\sum_{j\neq i}\frac{w_{i}w_{j}}{1-\varepsilon S}\leq\frac{1}{1-\varepsilon S}\sum_{j\neq i}w_{i}w_{j}=\frac{1}{1-\varepsilon S}w_{i}(S-w_{i}).

Altogether,

\displaystyle w_{i}(S-w_{i})\leq R_{i}\leq\frac{1}{1-\varepsilon S}w_{i}(S-w_{i}).

(69)

Since $w_{i}\leq\varepsilon$ , it follows that $S-w_{i}\geq S-\varepsilon$ . Thus,

\displaystyle R_{i}\geq(S-\varepsilon)w_{i}.

(70)

We also have $S-w_{i}\leq S$ and, as $\varepsilon\downarrow 0$ ,

\displaystyle\frac{1}{1-\varepsilon S}=1+o_{\varepsilon}(1).

(71)

Hence,

\displaystyle R_{i}\leq(1+o_{\varepsilon}(1))Sw_{i}.

(72)

As the function $x\mapsto 1-e^{-tx}$ is strictly monotonically increasing for $t>0$ , we have

\displaystyle 1-e^{-t(S-\varepsilon)w_{i}}\leq p_{i}^{(t)}\leq 1-e^{-t(1+o_{\varepsilon}(1))Sw_{i}}.

(73)

Summing these inequalities over all atoms satisfying $w_{i}\leq\varepsilon$ , we obtain

\displaystyle\sum_{w_{i}\leq\varepsilon}\left(1-e^{-t(S-\varepsilon)w_{i}}\right)\leq\sum_{w_{i}\leq\varepsilon}p_{i}^{(t)}\leq\sum_{w_{i}\leq\varepsilon}\left(1-e^{-t(1+o_{\varepsilon}(1))Sw_{i}}\right).

(74)

Recall that the set $\{i:w_{i}>\varepsilon\}$ is almost surely finite. Consequently,

	$\displaystyle\sum_{w_{i}\leq\varepsilon}p_{i}^{(t)}$	$\displaystyle=\mu_{\mathcal{W}}(t)+O(1),$
	$\displaystyle\sum_{w_{i}\leq\varepsilon}(1-e^{-\lambda w_{i}})$	$\displaystyle=T_{\lambda}+O(1),$

for every $\lambda>0$ , since $1-e^{-\lambda w_{i}}\leq 1$ . Substituting $\lambda=t(S-\varepsilon)$ into the left-hand side and $\lambda=t(1+o_{\varepsilon}(1))S$ into the right-hand side, and incorporating the $O(1)$ terms, we obtain the two-sided bound:

\displaystyle T_{t(S-\varepsilon)}+O(1)\leq\mu_{\mathcal{W}}(t)\leq T_{t(1+o_{\varepsilon}(1))S}+O(1).

(75)

Using (50) and the slow variation of $\ell_{1}$ , we obtain

\displaystyle\liminf_{t\to\infty}\frac{\mu_{\mathcal{W}}(t)}{t\ell_{1}(t)}\geq c(S-\varepsilon),

(76)

and

\displaystyle\limsup_{t\to\infty}\frac{\mu_{\mathcal{W}}(t)}{t\ell_{1}(t)}\leq c(1+o_{\varepsilon}(1))S.

(77)

Letting $\varepsilon\downarrow 0$ proves (66).
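The sandwich (69) on $R_{i}$ that drives this argument can be checked numerically on a truncated weight sequence. The sketch below is only an illustration: the function name `r_sandwich` and the geometric weights are hypothetical, and the check assumes $\varepsilon S\leq 1/2$ as in the proof.

```python
import math
from typing import List, Sequence, Tuple


def r_sandwich(weights: Sequence[float], eps: float) -> List[Tuple[float, float, float]]:
    """For each atom with w_i <= eps, return (lower, R_i, upper) as in (69)."""
    s = sum(weights)
    if eps * s > 0.5:
        raise ValueError("need eps * S <= 1/2")
    out = []
    for i, wi in enumerate(weights):
        if wi > eps:
            continue  # large atoms are handled separately (O(1) contribution)
        # R_i = -sum_{j != i} log(1 - w_i w_j)
        r_i = -sum(math.log1p(-wi * wj) for j, wj in enumerate(weights) if j != i)
        lower = wi * (s - wi)
        upper = lower / (1.0 - eps * s)
        out.append((lower, r_i, upper))
    return out


# Hypothetical truncated weights 2^-1, ..., 2^-20, so that S < 1.
weights = [2.0 ** -k for k in range(1, 21)]
bounds = r_sandwich(weights, eps=0.25)
```

Each triple satisfies `lower <= R_i <= upper`, and the gap between the two bounds shrinks as `eps` decreases, which mirrors the limit $\varepsilon\downarrow 0$ taken in the proof.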

Now, we apply Lemma B.1 conditionally on $\mathcal{W}$ . Recall that the summability condition (14) requires $\sum_{t=1}^{\infty}\exp\left\{-\frac{\epsilon^{2}\mu_{\mathcal{W}}(t)^{2}}{4(2\mu^{(e)}_{\mathcal{W}}(t)+\epsilon\mu_{\mathcal{W}}(t)/3)}\right\}<\infty$ for every $\epsilon>0$ . We know that $\mu_{\mathcal{W}}(t)\sim cS\,t\ell_{1}(t)$ and $\mu_{\mathcal{W}}^{(e)}(t)=tL$ . Because $\ell_{1}(t)\to 0$ , $\mu_{\mathcal{W}}(t)$ grows strictly slower than $\mu^{(e)}_{\mathcal{W}}(t)$ , so the denominator of the exponent is of order $tL$ . Consequently, the exponent scales asymptotically as $-\Theta(t\ell_{1}(t)^{2})$ . To lower bound this growth, we apply Lemma G.9 to the slowly varying function $\ell_{1}$ , which states that $\lim_{t\to\infty}\frac{\ln(\ell_{1}(t))}{\ln(t)}=0$ . This means that for any $\delta\in(0,1/2)$ , and for all sufficiently large $t$ ,

\displaystyle t\ell_{1}(t)^{2}>t^{1-2\delta}.

(78)

Thus the summability condition holds. Therefore

\displaystyle N_{t}\sim\mu_{\mathcal{W}}(t)\sim cS\,t\ell_{1}(t)\qquad\text{a.s.}

(79)

Sparsity results.

We now prove Equation 45. Let $\phi(x):=x\ell_{1}(x).$ Since $\ell_{1}$ is slowly varying, $\phi$ is regularly varying with index $1$ . According to Theorem G.6, any regularly varying function with index $\alpha>0$ possesses an asymptotic inverse that is regularly varying with index $1/\alpha$ . Therefore, $\phi$ has an asymptotic inverse $\phi^{-1}$ which is regularly varying with index $1$ , satisfying $\phi^{-1}(\phi(x))\sim x$ as $x\to\infty$ . We have proven

\displaystyle N_{t}\sim cS\,\phi(t)\qquad\text{a.s.}

(80)

Because $\phi^{-1}$ is regularly varying with index $1$ , it preserves asymptotic equivalence and satisfies $\phi^{-1}(ax)\sim a\phi^{-1}(x)$ for any constant $a>0$ . Applying $\phi^{-1}$ to both sides yields:

\displaystyle\phi^{-1}(N_{t})\sim\phi^{-1}(cS\phi(t))\sim cS\phi^{-1}(\phi(t))\sim cSt\qquad\text{a.s.}

(81)

Since $c$ and $S$ are strictly positive and finite almost surely, we can rearrange this asymptotic equivalence to isolate $t$ :

\displaystyle t\sim\frac{1}{cS}\phi^{-1}(N_{t})\qquad\text{a.s.}

(82)

In particular,

\displaystyle t=\Theta\!\left(\phi^{-1}(N_{t})\right)\qquad\text{a.s.}

(83)

Since $N_{t}^{(e)}\sim tL$ , it follows that

\displaystyle N_{t}^{(e)}=\Theta\!\left(\phi^{-1}(N_{t})\right)\qquad\text{a.s.}

(84)

We now prove Equation 46: assume that $\ell_{1}(x\ell_{1}(x))\sim\ell_{1}(x).$ Since $N_{t}\sim cS\,t\ell_{1}(t),$ slow variation gives

\displaystyle\ell_{1}(N_{t})\sim\ell_{1}(t\ell_{1}(t))\sim\ell_{1}(t).

(85)

Therefore

\displaystyle\frac{N_{t}}{\ell_{1}(N_{t})}\sim\frac{cS\,t\ell_{1}(t)}{\ell_{1}(t)}=cS\,t.

(86)

Since $N_{t}^{(e)}\sim tL$ , we conclude that

\displaystyle N_{t}^{(e)}=\Theta\!\left(\frac{N_{t}}{\ell_{1}(N_{t})}\right)\qquad\text{a.s.}

(87)

Finally, let us prove Equation 47. If

\displaystyle\ell(x)=(\log x)^{a},\qquad a<-1,

(88)

then

\displaystyle\ell_{1}(x)=\int_{x}^{\infty}u^{-1}(\log u)^{a}\,\mathrm{d}u=\frac{-1}{a+1}(\log x)^{a+1}.

(89)

Moreover, $\ell_{1}(x\ell_{1}(x))\sim\ell_{1}(x).$ Hence

\displaystyle N_{t}^{(e)}=\Theta\!\left(\frac{N_{t}}{(\log N_{t})^{a+1}}\right)=\Theta\!\left(N_{t}(\log N_{t})^{-a-1}\right).

(90)

∎
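To make the logarithmic case concrete, the sketch below evaluates the closed form (89) of $\ell_{1}$ and the edge scale $N(\log N)^{-a-1}$ appearing in (47). The function names are hypothetical, and the constant hidden in the $\Theta$ is not computed.

```python
import math


def ell1_log_power(x: float, a: float) -> float:
    """Closed form (89): integral_x^inf u^{-1} (log u)^a du, for a < -1 and x > 1."""
    if a >= -1 or x <= 1:
        raise ValueError("requires a < -1 and x > 1")
    return -(math.log(x) ** (a + 1)) / (a + 1)


def edge_scale(n_vertices: float, a: float) -> float:
    """Scale N (log N)^{-a-1} from (47), up to an unspecified constant."""
    return n_vertices * math.log(n_vertices) ** (-a - 1)


# With a = -2 the edge count is of order N log N in the number of vertices N.
for n in (1e3, 1e6, 1e9):
    print(f"N = {n:.0e}: edges per vertex scale like {edge_scale(n, -2.0) / n:.2f}")
```

For $a=-2$ the ratio of edges to vertices grows only like $\log N$, which is the near-linear regime described in the abstract.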

Appendix D Proof of Proposition 4.1

When $\eta=1$ , consider

	$\displaystyle(\alpha-\tau)x^{\alpha+1}\nu(x)$	$\displaystyle=(1-x)^{\xi-1}\int_{\tau}^{\alpha}\frac{s}{\Gamma(1-s)}x^{\alpha-s}\mathrm{d}s$
		$\displaystyle=(1-x)^{\xi-1}\int_{0}^{\alpha-\tau}\frac{\alpha-u}{\Gamma(1+u-\alpha)}e^{-u\ln(1/x)}\mathrm{d}u$

using the change of variables $u=\alpha-s$ . Let $g(z)=\int_{0}^{\alpha-\tau}\frac{\alpha-u}{\Gamma(1+u-\alpha)}e^{-uz}\mathrm{d}u$ . We have, as $u\to 0$ ,

\displaystyle\frac{\alpha-u}{\Gamma(1+u-\alpha)}\sim\left\{\begin{array}[]{ll}u&\text{if }\alpha=1,\\ \frac{\alpha}{\Gamma(1-\alpha)}&\text{if }\alpha<1.\end{array}\right.

Using the Tauberian Theorem G.7, we obtain, as $z\to\infty$

g(z)\sim\left\{\begin{array}[]{ll}z^{-2}&\text{if }\alpha=1,\\ \frac{z^{-1}\alpha}{\Gamma(1-\alpha)}&\text{if }\alpha<1.\end{array}\right.

It follows that, as $x\to 0$ , since $(1-x)^{\xi-1}\to 1$ ,

(\alpha-\tau)x^{\alpha+1}\nu(x)=(1-x)^{\xi-1}g(\ln(1/x))\sim\left\{\begin{array}[]{ll}\ln^{-2}(1/x)&\text{if }\alpha=1,\\ \frac{\ln^{-1}(1/x)\alpha}{\Gamma(1-\alpha)}&\text{if }\alpha<1.\end{array}\right.

The result when $\eta>0$ follows immediately as $\eta$ is a multiplicative constant.
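The Tauberian step for $\alpha=1$ can be sanity-checked numerically. The sketch below approximates $g(z)$ with a midpoint rule and compares it with $z^{-2}$; the quadrature scheme, the default $\tau=0$, and the evaluation point $z=200$ are assumptions made for the illustration.

```python
import math


def g(z: float, tau: float = 0.0, n: int = 20000) -> float:
    """Midpoint-rule approximation of g(z) for alpha = 1:
    integral_0^{1 - tau} (1 - u) / Gamma(u) * exp(-u z) du."""
    h = (1.0 - tau) / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h  # midpoints avoid the pole of Gamma at u = 0
        total += (1.0 - u) / math.gamma(u) * math.exp(-u * z)
    return total * h


# g(z) * z^2 should approach 1 as z grows.
ratio = g(200.0) * 200.0 ** 2
```

The convergence is in $z=\ln(1/x)$, so the asymptotic regime in $x$ itself is reached only for extremely small $x$.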

Appendix E Conditions

We now prove that the RapidBeta measure satisfies the assumptions $\nu([0,1])=\infty$ and $\int_{0}^{1}w\nu(\mathrm{d}w)<\infty$ .

Divergence of $\int_{0}^{1}\nu(w)\mathrm{d}w$

The integral is:

\displaystyle\int_{0}^{1}\nu(w)\mathrm{d}w=\int_{0}^{1}\left(\frac{\eta}{\alpha-\tau}\int_{\tau}^{\alpha}\frac{s}{\Gamma(1-s)}w^{-1-s}(1-w)^{\xi-1}ds\right)\mathrm{d}w

(91)

Since the integrand is non-negative, Tonelli’s theorem allows us to swap the order of integration:

\displaystyle\frac{\eta}{\alpha-\tau}\int_{\tau}^{\alpha}\frac{s}{\Gamma(1-s)}\left(\int_{0}^{1}w^{-1-s}(1-w)^{\xi-1}\mathrm{d}w\right)ds

(92)

The inner integral over $w$ has the form of the Beta function, $B(x,y)=\int_{0}^{1}t^{x-1}(1-t)^{y-1}\mathrm{d}t$ , with second parameter $y=\xi$ . We identify the first parameter $x$ by setting $x-1=-1-s$ , which gives $x=-s$ .

The Beta integral converges only if its parameters are positive. This requires $x=-s>0$ , which implies $s<0$ .

However, the given constraint is $0\leq\tau<\alpha\leq 1$ . For any $s\in[\tau,\alpha]$ , we have $s\geq 0$ . This violates the necessary condition $s<0$ . Because the inner integral

\displaystyle\int_{0}^{1}w^{-1-s}(1-w)^{\xi-1}\mathrm{d}w

(93)

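diverges for every $s\in[\tau,\alpha]$ , and the weight $s/\Gamma(1-s)$ is strictly positive for $s\in(0,1)$ , the outer integrand is infinite on a set of positive Lebesgue measure. Hence $\int_{0}^{1}\nu(w)\mathrm{d}w$ diverges.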

Convergence of $\int_{0}^{1}w\nu(w)\,\mathrm{d}w$ .

We want to show that $\int_{0}^{1}w\nu(w)\,\mathrm{d}w<\infty.$ By definition of $\nu$ , we have

\displaystyle\int_{0}^{1}w\nu(w)\,\mathrm{d}w=\int_{0}^{1}w\left(\frac{\eta}{\alpha-\tau}\int_{\tau}^{\alpha}\frac{s}{\Gamma(1-s)}w^{-1-s}(1-w)^{\xi-1}\,\mathrm{d}s\right)\,\mathrm{d}w.

(94)

Combining the powers of $w$ gives

\displaystyle\int_{0}^{1}w\nu(w)\,\mathrm{d}w=\frac{\eta}{\alpha-\tau}\int_{0}^{1}\int_{\tau}^{\alpha}\frac{s}{\Gamma(1-s)}w^{-s}(1-w)^{\xi-1}\,\mathrm{d}s\,\mathrm{d}w.

(95)

By Tonelli’s theorem,

\displaystyle\int_{0}^{1}w\nu(w)\,\mathrm{d}w=\frac{\eta}{\alpha-\tau}\int_{\tau}^{\alpha}\frac{s}{\Gamma(1-s)}\left(\int_{0}^{1}w^{-s}(1-w)^{\xi-1}\,\mathrm{d}w\right)\,\mathrm{d}s.

(96)

For $s<1$ , the inner integral is the Beta integral with parameters $1-s$ and $\xi$ . Since $1-s>0$ and $\xi>0$ , it is finite and satisfies

\displaystyle\int_{0}^{1}w^{-s}(1-w)^{\xi-1}\,\mathrm{d}w=B(1-s,\xi)=\frac{\Gamma(1-s)\Gamma(\xi)}{\Gamma(1-s+\xi)}.

(97)

Therefore, for $s<1$ ,

\displaystyle\frac{s}{\Gamma(1-s)}\int_{0}^{1}w^{-s}(1-w)^{\xi-1}\,\mathrm{d}w=\frac{s}{\Gamma(1-s)}\frac{\Gamma(1-s)\Gamma(\xi)}{\Gamma(1-s+\xi)}=\frac{s\Gamma(\xi)}{\Gamma(\xi+1-s)}.

(98)

Thus,

\displaystyle\int_{0}^{1}w\nu(w)\,\mathrm{d}w=\frac{\eta\Gamma(\xi)}{\alpha-\tau}\int_{\tau}^{\alpha}\frac{s}{\Gamma(\xi+1-s)}\,\mathrm{d}s.

(99)

This identity is immediate when $\alpha<1$ . If $\alpha=1$ , the Beta integral itself is singular at the single point $s=1$ , but this point has Lebesgue measure zero. Moreover, $s\mapsto\frac{s\Gamma(\xi)}{\Gamma(\xi+1-s)}$ has a continuous extension to $s=1$ , since

\displaystyle\lim_{s\uparrow 1}\frac{s\Gamma(\xi)}{\Gamma(\xi+1-s)}=\frac{\Gamma(\xi)}{\Gamma(\xi)}=1.

(100)

Hence the endpoint $s=1$ does not affect the value or finiteness of the integral. Finally, the function $s\mapsto\frac{s}{\Gamma(\xi+1-s)}$ is continuous on the closed interval $[\tau,\alpha]$ . Indeed, for all $s\in[\tau,\alpha]$ ,

\displaystyle\xi+1-s\geq\xi>0,

(101)

and the gamma function is finite, strictly positive, and continuous on $(0,\infty)$ . Consequently,

\displaystyle\int_{\tau}^{\alpha}\frac{s}{\Gamma(\xi+1-s)}\,\mathrm{d}s<\infty.

(102)

Therefore,

\displaystyle\int_{0}^{1}w\nu(w)\,\mathrm{d}w=\frac{\eta\Gamma(\xi)}{\alpha-\tau}\int_{\tau}^{\alpha}\frac{s}{\Gamma(\xi+1-s)}\,\mathrm{d}s<\infty.

(103)
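The closed form (103) is straightforward to evaluate numerically. The following sketch uses a midpoint rule; the function name `first_moment` and the parameter values in the example are hypothetical.

```python
import math


def first_moment(eta: float, xi: float, tau: float, alpha: float, n: int = 10000) -> float:
    """Midpoint-rule value of (103):
    eta * Gamma(xi) / (alpha - tau) * integral_tau^alpha s / Gamma(xi + 1 - s) ds."""
    if not (0.0 <= tau < alpha <= 1.0) or xi <= 0.0 or eta <= 0.0:
        raise ValueError("need 0 <= tau < alpha <= 1, xi > 0, eta > 0")
    h = (alpha - tau) / n
    integral = 0.0
    for k in range(n):
        s = tau + (k + 0.5) * h
        # xi + 1 - s >= xi > 0, so the gamma function is finite and positive here
        integral += s / math.gamma(xi + 1.0 - s)
    return eta * math.gamma(xi) / (alpha - tau) * integral * h


# Hypothetical parameters: eta = 1, xi = 1, tau = 0, alpha = 1.
value = first_moment(eta=1.0, xi=1.0, tau=0.0, alpha=1.0)
```

The integrand stays bounded on the whole interval, including the endpoint $s=1$ when $\alpha=1$, which is the point of the continuity argument above.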

Appendix F Sampling algorithm

F.1 Proofs

F.1.1 Proof of Theorem 5.1

On each region, $A_{\varepsilon}$ and $B$ , the proposal intensity dominates the target intensity. By the thinning theorem for Poisson point processes (Kingman, 1993, Section 5.1), the accepted points on each region form independent Poisson point processes with an intensity equal to the target intensity $\lambda(\mathrm{d}s,\mathrm{d}w)$ restricted to that region.

Since $A_{\varepsilon}$ and $B$ are disjoint, their superposition yields a Poisson point process on $[\tau,\alpha]\times[\varepsilon,1]$ with intensity $\lambda(\mathrm{d}s,\mathrm{d}w)$ . Finally, marginalising out the latent dimension $s$ (which corresponds to the projection property of Poisson processes) yields a Poisson point process on $[\varepsilon,1]$ with intensity:

\displaystyle\nu(w)\,\mathrm{d}w=\left(\int_{\tau}^{\alpha}\lambda(s,w)\,\mathrm{d}s\right)\mathrm{d}w.

(104)

This establishes that $\mathcal{W}_{\varepsilon}$ is a realisation of the $\varepsilon$ -truncated RapidBeta process.
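The thinning principle invoked in this proof can be illustrated on a one-dimensional toy case. The sketch below is not the sampler of Theorem 5.1: the proposal is a homogeneous Poisson process, and the target intensity $50x$ on $[0,1]$ is a hypothetical choice made only so that the expected number of points (its integral, $25$) is easy to check.

```python
import random
from typing import Callable, List


def homogeneous_ppp(rate: float, a: float, b: float, rng: random.Random) -> List[float]:
    """Points of a rate-`rate` Poisson process on [a, b], built from exponential gaps."""
    points: List[float] = []
    x = a + rng.expovariate(rate)
    while x <= b:
        points.append(x)
        x += rng.expovariate(rate)
    return points


def thinned_ppp(target: Callable[[float], float], bound: float,
                a: float, b: float, rng: random.Random) -> List[float]:
    """Keep each proposal x independently with probability target(x) / bound.
    Requires target(x) <= bound on [a, b] (the domination condition)."""
    return [x for x in homogeneous_ppp(bound, a, b, rng)
            if rng.random() * bound < target(x)]


rng = random.Random(0)
# Hypothetical target intensity 50 * x on [0, 1], dominated by the constant 50.
counts = [len(thinned_ppp(lambda x: 50.0 * x, 50.0, 0.0, 1.0, rng)) for _ in range(400)]
mean_count = sum(counts) / len(counts)  # should be close to 25
```

The accepted points form a Poisson process with the target intensity, which is the property the proof applies separately on $A_{\varepsilon}$ and $B$.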

F.1.2 Proof of Proposition 5.2

By Campbell’s theorem for Poisson point processes,

\displaystyle\mathbb{E}[R_{\varepsilon}]=\int_{0}^{\varepsilon}w\,\nu(w)\,\mathrm{d}w.

(105)

Using Proposition 4.1 with $\alpha=1$ , there exist constants $0<c_{1}<c_{2}<\infty$ and $\varepsilon_{0}\in(0,1)$ such that for all $0<w<\varepsilon_{0}$ ,

\displaystyle c_{1}\,w^{-2}\big(\ln(1/w)\big)^{-2}\leq\nu(w)\leq c_{2}\,w^{-2}\big(\ln(1/w)\big)^{-2}.

(106)

Hence, for all sufficiently small $\varepsilon$ ,

\displaystyle c_{1}\int_{0}^{\varepsilon}\frac{1}{w\big(\ln(1/w)\big)^{2}}\,\mathrm{d}w\leq\mathbb{E}[R_{\varepsilon}]\leq c_{2}\int_{0}^{\varepsilon}\frac{1}{w\big(\ln(1/w)\big)^{2}}\,\mathrm{d}w.

(107)

Now make the change of variables $u=\ln(1/w)$ , so that $\mathrm{d}u=-\mathrm{d}w/w$ . Then

\displaystyle\int_{0}^{\varepsilon}\frac{1}{w\big(\ln(1/w)\big)^{2}}\,\mathrm{d}w=\int_{\ln(1/\varepsilon)}^{\infty}u^{-2}\,\mathrm{d}u=\frac{1}{\ln(1/\varepsilon)}.

(108)

Combining the upper and lower bounds yields

\displaystyle\mathbb{E}[R_{\varepsilon}]=\Theta\!\left(\frac{1}{\ln(1/\varepsilon)}\right),

(109)

as claimed.
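The rate $1/\ln(1/\varepsilon)$ decays very slowly, which has practical consequences for truncation. The sketch below evaluates the integral in (108); the function name is hypothetical and the constants $c_{1},c_{2}$ of (107) are not included.

```python
import math


def residual_scale(eps: float) -> float:
    """Value of the integral in (108): 1 / ln(1 / eps), for 0 < eps < 1."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    return 1.0 / math.log(1.0 / eps)


for eps in (1e-2, 1e-4, 1e-8, 1e-16):
    print(f"eps = {eps:.0e}  ->  1/ln(1/eps) = {residual_scale(eps):.4f}")
```

Halving this quantity requires squaring $\varepsilon$, since $\ln(1/\varepsilon^{2})=2\ln(1/\varepsilon)$.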

Appendix G Background on regularly varying functions

Most of the background material in this section originates from the book of Bingham et al. (1987).

G.1 Definitions

Definition G.1 (Slowly varying function).

A function $\ell:(0,\infty)\to(0,\infty)$ is slowly varying at infinity if for all $c>0$ ,

\frac{\ell(cx)}{\ell(x)}\to 1\text{ as }x\to\infty.

Examples of slowly varying functions include $\ln^{a}$ , for $a\in{\mathbb{R}}$ , and functions converging to a constant $c>0$ .

Definition G.2 (Regularly varying function).

A function $f:(0,\infty)\to(0,\infty)$ is regularly varying at infinity with exponent $\rho\in{\mathbb{R}}$ if $f(x)=x^{\rho}\ell(x)$ for some slowly varying function $\ell$ . We denote by $R_{\rho}$ the set of all functions regularly varying at infinity with exponent $\rho$ . A function $f$ is regularly varying at $0$ if $f(1/x)$ is regularly varying at infinity, that is, $f(x)=x^{-\rho}\ell(1/x)$ for some $\rho\in{\mathbb{R}}$ .

G.2 Karamata theorems

The following propositions and corollaries relate to integrals of regularly varying functions.

Proposition G.3 (Karamata theorem).

See (Bingham et al., 1987, Propositions 1.5.8 and 1.5.10). Let $U(t)=t^{\rho}\ell(t)$ for some locally bounded slowly varying function $\ell$ . Then

(i)

If $\rho>-1$

$\int_{0}^{x}U(t)dt\sim\frac{1}{1+\rho}x^{\rho+1}\ell(x)\text{ as }x\to\infty.$
(ii)

If $\rho<-1$

$\int_{x}^{\infty}U(t)dt\sim-\frac{1}{1+\rho}x^{\rho+1}\ell(x)\text{ as }x\to\infty.$
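As a worked check of part (ii), assume $\ell(t)=\ln t$ and $\rho=-2$ (an illustrative choice, valid for $x>1$). The integral can then be computed exactly and compared with the Karamata equivalent:

```latex
% Worked example (assumption: \ell(t) = \ln t, \rho = -2, x > 1)
\int_x^\infty t^{-2}\ln t\,\mathrm{d}t
  = \Bigl[-\frac{\ln t + 1}{t}\Bigr]_x^\infty
  = \frac{\ln x + 1}{x}
  \sim \frac{\ln x}{x}
  = -\frac{1}{1+\rho}\,x^{\rho+1}\ell(x).
```

The ratio of the exact value to the equivalent is $1+1/\ln x$, so the convergence is only logarithmic.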

The following corollaries will be useful.

Proposition G.4.

Let $U(x)=x^{\alpha}\ell(1/x)$ for some locally bounded slowly varying function $\ell$ .

(i)

If $\alpha>-1$

$\int_{0}^{x}U(t)\mathrm{d}t\sim\frac{1}{\alpha+1}x^{1+\alpha}\ell(1/x)\text{ as }x\to 0.$
(ii)

If $\alpha<-1$

$\int_{x}^{\infty}U(t)\mathrm{d}t\sim-\frac{1}{\alpha+1}x^{1+\alpha}\ell(1/x)\text{ as }x\to 0.$

If $\ell$ is slowly varying and $\alpha>-1$ then $\int_{0}^{x}t^{\alpha}\ell(1/t)dt$ converges and

\displaystyle\frac{x^{1+\alpha}\ell(1/x)}{\int_{0}^{x}t^{\alpha}\ell(1/t)dt}\rightarrow\alpha+1\text{ as }x\rightarrow 0.

(110)

Proof.

Let $\rho<-1$ . The change of variable $t\mapsto 1/t$ gives $\int_{x}^{\infty}t^{\rho}\ell(t)\mathrm{d}t=\int_{0}^{1/x}t^{-\rho-2}\ell(1/t)\mathrm{d}t$ . Writing $\alpha=-\rho-2>-1$ , so that $-(\rho+1)=\alpha+1$ , we obtain, using Proposition G.3(ii),

\displaystyle\frac{x^{-\alpha-1}\ell(x)}{\int_{0}^{1/x}t^{\alpha}\ell(1/t)\mathrm{d}t}\rightarrow\alpha+1\text{ as }x\rightarrow\infty,

(111)

and, replacing $x$ by $1/x$ ,

\displaystyle\frac{x^{1+\alpha}\ell(1/x)}{\int_{0}^{x}t^{\alpha}\ell(1/t)dt}\rightarrow\alpha+1\text{ as }x\rightarrow 0.

(112)

The case $\alpha<-1$ follows similarly, using Proposition G.3(i). ∎

Proposition G.5 (Proposition 1.5.9b, Bingham et al., 1987).

Let $\ell$ be slowly varying and suppose $\int_{x}^{\infty}\ell(t)\mathrm{d}t/t<\infty$ . Then $\int_{x}^{\infty}\ell(t)\mathrm{d}t/t$ is slowly varying and

\displaystyle\frac{1}{\ell(x)}\int_{x}^{\infty}\ell(t)\mathrm{d}t/t\to\infty\quad\text{as }x\to\infty.

(113)
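A worked instance may help fix ideas. Take, as an illustrative choice not made in the paper, $\ell(t)=(\ln t)^{-2}$ for $t>1$ , which is slowly varying as a power of the logarithm. The substitution $u=\ln t$ then gives the integral explicitly, and the ratio in (113) is simply $\ln x$ :

```latex
\ell(t)=(\ln t)^{-2}\quad(t>1),\qquad
\int_{x}^{\infty}\ell(t)\,\frac{\mathrm{d}t}{t}
=\int_{\ln x}^{\infty}u^{-2}\,\mathrm{d}u=\frac{1}{\ln x},\qquad
\frac{1}{\ell(x)}\int_{x}^{\infty}\ell(t)\,\frac{\mathrm{d}t}{t}=\ln x\to\infty .
```

Here the integrated function $1/\ln x$ is again slowly varying and dominates $\ell$ , as the proposition states.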

G.3 Asymptotic inverse

Theorem G.6 (Theorem 1.5.12, Bingham et al., 1987).

Let $f\in R_{\alpha}$ with $\alpha>0$ . Then there exists $g\in R_{1/\alpha}$ with

\displaystyle f(g(x))\sim g(f(x))\sim x\quad\text{as }x\to\infty.

(114)

Here $g$ is unique to within asymptotic equivalence, and one version of $g$ is the generalized inverse $f^{-1}(x)=\inf\{y:f(y)>x\}$ .
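The following sketch illustrates Theorem G.6 on one hand-picked example, $f(x)=x^{2}\ln x\in R_{2}$ . The candidate $g(x)=\sqrt{2x/\ln x}\in R_{1/2}$ is an assumption chosen for illustration: a direct computation gives $f(g(x))=x\ln(2x/\ln x)/\ln x\sim x$ , so it is one version of the asymptotic inverse, though not the generalized inverse itself.

```python
import math

def f(x):
    """f(x) = x^2 ln(x), regularly varying at infinity with exponent 2."""
    return x * x * math.log(x)

def g(x):
    """Candidate asymptotic inverse g(x) = sqrt(2x / ln x), with exponent 1/2."""
    return math.sqrt(2.0 * x / math.log(x))

# Both compositions, divided by x, should approach 1 as x grows.
for x in (1e10, 1e50, 1e150):
    print(x, f(g(x)) / x, g(f(x)) / x)
```

Neither composition equals $x$ exactly; the theorem only guarantees asymptotic equivalence, and the residual error here decays like $\ln\ln x/\ln x$ .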

G.4 Tauberian theorem

The following theorem is a variation of Bingham et al. (1987, Theorem 1.7.6 p.46), where the two limits at 0 and infinity are exchanged. The proof is similar. See also Feller (1971, Chapter XIII).

Theorem G.7 (Tauberian theorem).

Assume $U(x)\geq 0$ , $c\geq 0$ , $\rho>-1$ , $\widehat{U}(s)=s\int_{0}^{\infty}e^{-sx}U(x)\mathrm{d}x$ convergent for $s>0$ , and $\ell$ a slowly varying function. Then

\displaystyle U(x)\sim cx^{\rho}\ell(1/x)/\Gamma(1+\rho)\text{ as }x\rightarrow 0

(115)

implies

\displaystyle\widehat{U}(s)\sim cs^{-\rho}\ell(s)\text{ as }s\rightarrow\infty.

(116)

Proof.

Write $V(x)=\int_{0}^{x}U(y)\mathrm{d}y$ (this is finite for any $x$ as $\widehat{U}(s)$ is convergent for any $s$ ), then $V$ is non-decreasing and by Proposition G.4

\displaystyle V(x)\sim\frac{c}{\rho+1}x^{\rho+1}\ell(1/x)/\Gamma(1+\rho)\text{ as }x\rightarrow 0.

(117)

Then by Bingham et al. (1987, Theorem 1.7.1, p.38), this is equivalent to

\displaystyle\widehat{V}(s)=\int_{0}^{\infty}e^{-sx}\mathrm{d}V(x)\sim cs^{-\rho-1}\ell(s)\text{ as }s\rightarrow\infty.

(118)

Finally, note that $\widehat{V}(s)=\frac{\widehat{U}(s)}{s}$ . Thus the above equation is equivalent to

\displaystyle\widehat{U}(s)\sim cs^{-\rho}\ell(s)\text{ as }s\rightarrow\infty.

(119)

∎
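A simple sanity check of Theorem G.7 is the pure power case, where both sides can be computed exactly. Assume, for illustration only, $U(x)=x^{\rho}$ with $\rho>-1$ , $\ell\equiv 1$ and $c=\Gamma(1+\rho)$ , so that (115) holds with equality. The Gamma integral then gives:

```latex
\widehat{U}(s)=s\int_{0}^{\infty}e^{-sx}x^{\rho}\,\mathrm{d}x
=s\cdot\frac{\Gamma(1+\rho)}{s^{\rho+1}}
=\Gamma(1+\rho)\,s^{-\rho}=c\,s^{-\rho}\ell(s),
```

which is (116), again with equality rather than mere asymptotic equivalence.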

G.5 Other useful results on regular variation

Lemma G.8 (Lemma 14, Gnedin et al., 2007).

For $\ell$ slowly varying, the relation

\int_{x}^{\infty}\nu(du)\sim x^{-1}\ell(x^{-1}),~~x\rightarrow 0

implies

\int_{0}^{x}u\nu(du)\sim\ell_{1}(x^{-1}),~~x\rightarrow 0

where $\ell_{1}$ is another slowly varying function, defined by $\ell_{1}(y)=\int_{y}^{\infty}u^{-1}\ell(u)du$ , which satisfies $\ell(y)=o(\ell_{1}(y))$ as $y\to\infty$ by Proposition G.5.

Lemma G.9 (Proposition 1.3.6, Bingham et al., 1987).

If $\ell$ varies slowly then $\frac{\ln(\ell(x))}{\ln(x)}\underset{x\rightarrow\infty}{\longrightarrow}0$ .

Appendix H More background results

Theorem H.1.

Let $X_{1},X_{2},\ldots$ be a sequence of independent Bernoulli random variables with parameters $p_{1},p_{2},\ldots$ such that $\mu=\sum_{k}p_{k}<\infty$ . Let

\displaystyle S=\sum_{k}X_{k}.

(120)

Then, for every $\epsilon\in(0,1)$ ,

\displaystyle\mathbb{P}(|S-\mu|>\epsilon\mu)\leq 2\exp\left(-\mu\frac{\epsilon^{2}}{3}\right).

(121)

Proof.

For $n>0$ , let

\displaystyle S_{n}=\sum_{k=1}^{n}X_{k}\qquad\text{and}\qquad\mu_{n}=\mathbb{E}(S_{n})=\sum_{k=1}^{n}p_{k}.

(122)

The sequence $(\mu_{n})$ converges to $\mu$ .

Since

\displaystyle\sum_{k=1}^{\infty}\mathbb{P}(X_{k}=1)=\mu<\infty,

(123)

the first Borel–Cantelli lemma implies that the events $\{X_{k}=1\}$ occur only finitely often almost surely. Consequently, $S$ is almost surely finite, and the partial sums $S_{n}$ converge almost surely to $S$ .

Fix $\epsilon\in(0,1)$ and consider the events

\displaystyle A_{n}=\{|S_{n}-\mu_{n}|\geq\epsilon\mu\},

\displaystyle A=\{|S-\mu|>\epsilon\mu\}.

For any sample path such that $S_{n}(\omega)\to S(\omega)$ , we have $S_{n}-\mu_{n}\to S-\mu$ since $\mu_{n}\to\mu$ . Hence, if $|S-\mu|>\epsilon\mu$ , then for all sufficiently large $n$ ,

\displaystyle|S_{n}-\mu_{n}|\geq\epsilon\mu

(124)

also holds. Therefore, the indicator functions satisfy

\displaystyle\mathbf{1}_{A}\leq\liminf_{n\to\infty}\mathbf{1}_{A_{n}}\qquad\text{almost surely}.

(125)

Taking expectations on both sides and applying Fatou’s lemma yields

\displaystyle\mathbb{P}(A)=\mathbb{E}[\mathbf{1}_{A}]\leq\mathbb{E}\!\left[\liminf_{n\to\infty}\mathbf{1}_{A_{n}}\right]\leq\liminf_{n\to\infty}\mathbb{E}[\mathbf{1}_{A_{n}}]=\liminf_{n\to\infty}\mathbb{P}(A_{n}).

(126)

Since $\mu_{n}\leq\mu$ , we have $A_{n}\subseteq\{|S_{n}-\mu_{n}|\geq\epsilon\mu_{n}\}$ . By Corollary 4.6 of Mitzenmacher and Upfal (2005), for every $n>0$ ,

\displaystyle\mathbb{P}(A_{n})\leq 2\exp(-\mu_{n}\epsilon^{2}/3).

(127)

Consequently,

\displaystyle\liminf_{n\to\infty}\mathbb{P}(A_{n})\leq\liminf_{n\to\infty}2\exp(-\mu_{n}\epsilon^{2}/3)=\lim_{n\to\infty}2\exp(-\mu_{n}\epsilon^{2}/3)=2\exp(-\mu\epsilon^{2}/3).

(128)

Combining the previous inequalities completes the proof. ∎
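The bound (121) can be compared with an exact computation on a truncated example. The sketch below is only a check under stated assumptions: the parameters $p_{k}=0.98^{k}$ , the truncation at $n=400$ terms and $\epsilon=0.5$ are hypothetical, and the function names are illustrative. It computes the exact law of the partial sum $S_{n}$ by dynamic programming and compares the deviation probability with the Chernoff bound evaluated at $\mu_{n}$ .

```python
import math

def poisson_binomial_pmf(probs):
    """Exact pmf of a sum of independent Bernoulli(p_k) variables, by dynamic programming."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for j, mass in enumerate(pmf):
            nxt[j] += mass * (1.0 - p)  # contribution of X_k = 0
            nxt[j + 1] += mass * p      # contribution of X_k = 1
        pmf = nxt
    return pmf

def deviation_probability(probs, eps):
    """Exact P(|S_n - mu_n| > eps * mu_n) for the truncated sum S_n."""
    mu = sum(probs)
    pmf = poisson_binomial_pmf(probs)
    return sum(mass for s, mass in enumerate(pmf) if abs(s - mu) > eps * mu)

def chernoff_bound(mu, eps):
    """Right-hand side of (121): 2 exp(-mu eps^2 / 3)."""
    return 2.0 * math.exp(-mu * eps ** 2 / 3.0)

# Hypothetical parameters: p_k = 0.98^k, truncated at n = 400 terms, eps = 0.5.
probs = [0.98 ** k for k in range(1, 401)]
mu_n = sum(probs)
exact = deviation_probability(probs, 0.5)
bound = chernoff_bound(mu_n, 0.5)
print(mu_n, exact, bound)
```

In this example the exact probability is far below the bound, which is expected: the inequality is a worst-case guarantee over all Bernoulli sequences with the same mean, not a sharp estimate.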

Theorem H.2 (Tropp 2011, Theorem 1.1; Freedman 1975, Theorem 1.6).

Consider a real-valued martingale $(Y_{k})_{k\geq 0}$ with difference sequence $(X_{k})_{k\geq 0}$ , and assume that the difference sequence is uniformly bounded:

\displaystyle X_{k}\leq R\qquad\text{almost surely for all }k\geq 0.

(129)

Define the predictable quadratic variation process by

\displaystyle W_{k}:=\sum_{j=1}^{k}\mathbb{E}\!\left[X_{j}^{2}\,\middle|\,X_{0:j-1}\right],\qquad k\geq 0.

(130)

Then, for all $t\geq 0$ and $\sigma^{2}>0$ ,

\displaystyle\mathbb{P}\left\{\exists k\geq 0:Y_{k}\geq t\text{ and }W_{k}\leq\sigma^{2}\right\}\leq\exp\left(-\frac{t^{2}/2}{\sigma^{2}+Rt/3}\right).

(131)
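To see what (131) gives on a concrete martingale, consider the simple symmetric random walk, an example chosen here for illustration and not taken from the paper. Its increments are $\pm 1$ , so $R=1$ and $W_{k}=k$ , and taking $\sigma^{2}=n$ turns the event in (131) into $\{\max_{k\leq n}Y_{k}\geq t\}$ . The sketch below compares the bound with the exact probability, obtained from the reflection principle; the function names are hypothetical.

```python
import math

def freedman_bound(t, sigma2, R):
    """Right-hand side of (131): exp(-(t^2 / 2) / (sigma^2 + R t / 3))."""
    return math.exp(-(t * t / 2.0) / (sigma2 + R * t / 3.0))

def walk_max_probability(n, t):
    """Exact P(max_{k<=n} Y_k >= t) for a simple +/-1 random walk, integer t >= 1.

    Reflection principle: P(max >= t) = P(Y_n >= t) + P(Y_n > t).
    """
    def pmf(v):
        # Y_n = v requires (n + v) / 2 up-steps, so n + v must be even.
        if abs(v) > n or (n + v) % 2:
            return 0.0
        return math.comb(n, (n + v) // 2) / 2.0 ** n

    at_least = sum(pmf(v) for v in range(t, n + 1))
    return at_least + (at_least - pmf(t))

# Illustrative values: horizon n = 100 (so sigma^2 = 100), level t = 30, R = 1.
print(walk_max_probability(100, 30), freedman_bound(30, 100, 1.0))
```

The bound holds uniformly over the whole trajectory up to time $n$ , which is what distinguishes Freedman's inequality from a fixed-time Bernstein bound.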
