Order restricted estimation of the parameter functions in an additive hazard model

Dragi Anevski (dragi@maths.lth.se), Center for Mathematical Sciences, Lund University
and
ElBatoul Manel Merai (manel-elbatoul.merai@doc.umc.edu.dz), Department of Mathematics,
Constantine 1 Brothers Mentouri University

Abstract

In this paper we propose estimators of the parameter functions in an Aalen additive hazard regression model. The estimators are the individual, componentwise $l^{2}$ projections, onto the space of monotone functions, of the naive estimators that result from the ordinary least squares estimator in the Aalen additive hazard model. We provide pointwise limit distribution results for the resulting estimators, which exhibit an $n^{-1/3}$ rate of convergence and have the Chernoff distribution as the limit distribution.

1 Introduction

In this paper we suggest estimators for the parameter functions in an Aalen additive hazard model, in a survival analysis setting, under the assumption of right-censored data and independent censoring.

Assuming that the time to the event of interest, $T$, is a continuous positive random variable with cumulative distribution function $F$ and hazard function $h(t)=F^{\prime}(t)/(1-F(t))$, a possible model for $h$ that incorporates covariates is the Aalen additive hazard model

$$h(t)=\beta_{0}(t)+\beta_{1}(t)z_{1}+\ldots+\beta_{p}(t)z_{p},\tag{1}$$

where one supposes that $\beta_{0},\ldots,\beta_{p}$ are (unknown) functions, and $z_{1},\ldots,z_{p}$ are given covariates. If the parameter vector of functions $\beta=(\beta_{0},\ldots,\beta_{p})$ is completely unspecified, the common approach starts from the observation that it is not possible to provide a nonparametric estimator of $\beta$ directly. Instead one estimates the vector of integrated functions $B=(B_{0},\ldots,B_{p})$, where $B_{k}(t)=\int_{0}^{t}\beta_{k}(u)\,du$ for each $k=0,\ldots,p$, cf. [1].

The disadvantage of estimating $B$ instead of $\beta$ is that $B_{k}(t)$ gives the total effect of covariate $z_{k}$ summed (i.e. integrated) over all times $u\in[0,t]$, whereas $\beta_{k}(t)$ gives the effect of the covariate $z_{k}$ at the time $t$. Clearly $\beta_{k}(t)$ is more informative and potentially more interesting, e.g. for the clinician who wants to describe the effect of covariate values (e.g. LDL cholesterol) on the conditional probability of experiencing the event of interest (e.g. heart attack) at time $t$, conditional on not having experienced it before time $t$, by the interpretation of the hazard as

$$h(t)\,dt=P(T\leq t+dt\,|\,T>t).$$

One possibility is to use kernel estimators to get an estimator for $\beta$ from the estimator of $B$. However, kernel estimators are somewhat ad hoc, and in particular necessitate the choice of a bandwidth.

We suggest in this paper an approach that provides an order-restricted nonparametric estimator of $\beta$. One advantage of this estimator is that it is data-adaptive, in that it uses an implicit bandwidth given by the data. We are furthermore able to provide limit distributions for the suggested estimator. The limit distribution is the Chernoff distribution, which is commonly featured in order-restricted nonparametric inference.

There are some previous results on this problem. In [5] the basic estimator in the Aalen model is monotonised by a method different from ours, and the resulting estimator is shown to be asymptotically equivalent to the standard estimator. In [6] a slightly different additive hazard model is used, which does not seem to include the Aalen model; an order restricted least squares estimator is proposed there, and mainly computational issues are treated.

To our knowledge, both our estimator and our limit distribution results are new.

The paper is organised as follows. In Section 2 we introduce the probabilistic model for the data, as well as the inference problem that we will treat. The model gives rise to a system of stochastic differential equations, and we review the common and well known least squares estimator ${\hat{B}}$ of the integrals $B=(B_{0},\ldots,B_{p})$ of the unknown parameter functions $\beta=(\beta_{0},\ldots,\beta_{p})$. The least squares solution will serve as a starting estimator for the order restricted estimator that we present next.

Then, in Section 3 we present the component-wise least squares projection of the naive estimator arising from the starting estimator presented in Section 2, on the space of decreasing functions. These can be written as the derivative of the function $S(\hat{B}_{k})$ , where $S$ is the least concave majorant map.

Next, in Section 4, we derive the main results of this paper, which are the limit distributions of the estimators. We start by writing $\hat{B}$ as a sum of the unknown $B$ and a stochastic process $v_{n}$ . We furthermore rescale and localise the estimator $\hat{B}$ which gives rise to a rescaled deterministic term $g_{n}$ and a rescaled stochastic term $\tilde{v}_{n}$ . In Theorem 1 we derive the process limit distribution of the $p+1$ dimensional rescaled process $\tilde{v}_{n}$ to a Gaussian stochastic process $\tilde{v}$ with a certain covariance structure. In Corollary 1 we state the resulting component-wise limit distributions for the individual processes $\tilde{v}_{k,n}$ , for every $k=0,\ldots,p$ .

Next, in Lemma 1 we prove a bound on the tail of the process $\tilde{v}_{k,n}$ which ensures that, when applying the least concave majorant map $S$ to the process $g_{k,n}+\tilde{v}_{k,n}$, the tail behaviour of that process does not affect the application of the map $S$ around the origin. In Lemma 2 we state the analogous bound for the tail of the limit process $\tilde{v}_{k}$, which ensures the same thing for the limit process $-s^{2}+\tilde{v}(s)$, where in fact $-s^{2}$ is proportional to the uniform limit of $g_{k,n}(s)$.

Then in Theorem 2 we state one of the two main results of this paper, namely that the integral $\tilde{B}_{k}$ of the proposed estimators converges to a limit random variable, as

$$n^{2/3}c(t_{0})\big(\tilde{B}_{k}(t_{0})-B_{k}(t_{0})\big)\stackrel{d}{\to}S(-s^{2}+w(s))(0),$$

with $w(s)$ a two-sided Brownian motion. The constant $c(t_{0})$ is specified in Theorem 2.

Next, in Theorem 3, we state the second main result of the paper, namely that the proposed estimator $\tilde{\beta}_{k}$ converges to a limit random variable, as

$$n^{1/3}c(t_{0})\big(\tilde{\beta}_{k}(t_{0})-\beta_{k}(t_{0})\big)\stackrel{d}{\to}S(-s^{2}+w(s))^{\prime}(0).$$

We note that the rate is $n^{1/3}$, and that the limit distribution $S(-s^{2}+w(s))^{\prime}(0)$ is (proportional to) the Chernoff distribution; both are common in nonparametric order restricted inference.

Finally in Section 5 we discuss the derived results.

2 The survival analysis model setting

Let $T\geq 0$ be a positive continuous random variable with an unknown distribution function $F$ . We assume that $T$ models the time to an event. We assume no left-truncation for the data, and that we have the standard right-censoring, i.e. that we observe the minimum of the time $T_{i}$ and a censoring time $C_{i}$ , together with an indicator for the time being exact, $\delta_{i}=1\{T_{i}\leq C_{i}\}$ .

Introduce the individual counting processes $N_{i}(t)=1\{t_{i}\leq t,\delta_{i}=1\}$ for which one has the stochastic differential equation

$$dN_{i}(t)=Y_{i}(t)h(t)\,dt+dM_{i}(t),\tag{2}$$

for $i=1,\ldots,n$. Here $h(t)$ is the individual hazard function, which exists when the distribution of $T$ is absolutely continuous and is then given by $h(t)=-\frac{d}{dt}\log(1-F(t))$; $Y_{i}(t)=1\{t_{i}\geq t\}$ is the left-continuous indicator process for the individual being at risk at time $t-$; and $M_{i}$ is the individual martingale, i.e. it satisfies $E(dM_{i}(t)|{\cal F}_{t})=0$, where the family of $\sigma$-algebras $\{{\cal F}_{t},t\geq 0\}$ is a filtration storing the information available at times $t$.

The $\sigma$-algebra generated by the information depends on the amount and type of information about the observed times that is available, and on the amount of information about the covariates that is available. Starting with the model (2) satisfying $E(dM_{i}(t)|{\cal F}_{t})=0$, one needs to establish that $E(dM_{i}(t)|{\cal G}_{t})=0$, where ${\cal G}_{t}$ is the $\sigma$-algebra generated by the observed and available information at time $t$. The concepts of noninformative and independent censoring, as well as the innovation theorem, are used to establish this link. We do not make these assumptions explicit here, since they are of no relevance to us, and refer the reader to a standard reference such as [1]. We will in the sequel assume that ${\cal F}_{t}$ is a filtration containing the information available at time $t$, and that $E(dM_{i}(t)|{\cal F}_{t})=0$ holds, so that $E(dN_{i}(t)|{\cal F}_{t})=Y_{i}(t)h(t)dt$.

A basic inference problem in survival analysis is assessing the effect of group indicators or continuous covariate measurements on the distribution of the time to an event. It is then necessary to assume a model for the distribution function that depends on covariates $z_{1},\ldots,z_{p}$. This can be done using various models. One standard model is the Aalen additive hazard model

$$h(t)=\beta_{0}(t)+\beta_{1}(t)z_{1}+\ldots+\beta_{p}(t)z_{p},\tag{3}$$

where $\beta_{0},\ldots,\beta_{p}$ are (unknown) functions. Thus $\beta=(\beta_{0},\ldots,\beta_{p})$ is the unknown parameter vector of functions and $E^{p+1}$ is the parameter space, where $E=\{g:[0,\infty)\to{\mathbb R},\int g<\infty\}$ is the set of integrable functions on $[0,\infty)$ .

The value of $\beta_{k}(t)$ describes the instantaneous effect of covariate $z_{k}$ on the hazard function $h(t)$. The standard approach for estimating the $\beta_{k}$'s is to first acknowledge that they cannot be estimated directly. Rather one estimates their integrals $B_{k}(t)=\int_{0}^{t}\beta_{k}(u)du$. In fact, one can write (2) for the Aalen model as

$$dN_{i}(t)=Y_{i}(t)\left\{dB_{0}(t)+dB_{1}(t)z_{1i}+\ldots+dB_{p}(t)z_{pi}\right\}+dM_{i}(t),\tag{4}$$

for $i=1,\ldots,n$, where $Y_{i}(t)$ is the individual at-risk process and $M_{i}$ is a continuous time martingale, for individual $i=1,\ldots,n$. The $n$ equations (4) can be written in matrix form as

$$dN(t)=Y(t)\,dB(t)+dM(t),$$

where $B(t)=(B_{0}(t),\ldots,B_{p}(t))^{T}$ is the vector of unknown functions, and $Y$ is an $n\times(p+1)$ matrix, with the $i$'th row of $Y$ being $Y_{i}(t)(1,z_{1i},\ldots,z_{pi})$.

If $J(t)$ is the (predictable) indicator that $Y(t)$ has full rank, and

$$Y^{-}(t)=(Y(t)^{T}Y(t))^{-1}Y(t)^{T}$$

is a (generalised) inverse, then $B$ can be estimated by Aalen’s ordinary least squares solution

$$\hat{B}(t)=\int_{0}^{t}J(u)Y^{-}(u)\,dN(u).\tag{5}$$
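As a rough illustration of (5), the following Python sketch computes $\hat{B}$ at the observed event times. It is only a sketch under assumptions that are not part of the text: the covariates are taken to be time-fixed, tied event times are handled by summing the jumps at the common time, and the function name `aalen_ols` and its arguments are hypothetical.

```python
import numpy as np

def aalen_ols(times, events, Z):
    """Sketch of Aalen's least squares estimate of the cumulative coefficients B.

    times  : (n,) observed times min(T_i, C_i)
    events : (n,) indicators delta_i (1 = event observed, 0 = censored)
    Z      : (n, p) matrix of time-fixed covariates (an assumption of this sketch)
    Returns the ordered event times and an array whose row j holds B_hat(t_j).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    Z = np.asarray(Z, dtype=float)
    n, p = Z.shape
    X = np.column_stack([np.ones(n), Z])        # rows (1, z_1i, ..., z_pi)
    event_times = np.unique(times[events == 1])
    B = np.zeros(p + 1)
    path = []
    for t in event_times:
        at_risk = (times >= t).astype(float)    # Y_i(t) = 1{t_i >= t}
        Y = X * at_risk[:, None]                # the n x (p+1) matrix Y(t)
        dN = ((times == t) & (events == 1)).astype(float)
        if np.linalg.matrix_rank(Y) == p + 1:   # J(t) = 1: Y(t) has full rank
            # Increment Y^-(t) dN(t) = (Y^T Y)^{-1} Y^T dN(t)
            B = B + np.linalg.solve(Y.T @ Y, Y.T @ dN)
        path.append(B.copy())
    return event_times, np.array(path)
```

With a single binary covariate the increments reduce to group-wise ratios of events to numbers at risk, which gives a quick sanity check of the output.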

The interpretation of a value of the integral $B_{k}(t)$ is less intuitive than the interpretation of the value of $\beta_{k}(t)$, since $\beta_{k}(t)$ is the instantaneous effect, at time $t$, of the covariate $z_{k}$ on the total hazard $h(t)$ at time $t$. Thus one would really like to get an estimate of $\beta_{k}$, and this is not possible to do directly. One possibility for estimation of $\beta_{k}$ itself would be kernel smoothing, with the drawback that this is a slightly ad hoc method, with a bandwidth that the user has to specify. Thus it is neither automated nor data adaptive.

An alternative for estimation of $\beta_{k}$, which does not necessitate bandwidth choices and is data adaptive, is to use estimation under some nonparametric restrictions. In this paper we suggest to estimate $B$ under the assumption that each $\beta_{k}$ is a nonincreasing function, i.e. a function that is (not necessarily strictly) decreasing. This is an order restricted inference problem, and a nonparametric one.

3 The order restricted estimator

We define the order restricted estimators of the $\beta_{k}$'s as the least squares projection of the increments of the components of the Aalen estimator $\hat{B}$ on the space of monotone functions. Thus let $\hat{B}_{k}$ be the $k$'th component of $\hat{B}$, for $k=0,\ldots,p$, and suppose that $\hat{B}_{k}$ (which is a step function) has $L_{k}$ incremental steps, at points $t_{1},\ldots,t_{L_{k}}$. Thus $(\Delta_{1}\hat{B}_{k},\ldots,\Delta_{L_{k}}\hat{B}_{k})$ is the vector of increments, where $\Delta_{j}\hat{B}_{k}=\hat{B}_{k}(t_{j})-\hat{B}_{k}(t_{j-1})$. We may then introduce the naive vector of estimates $\hat{\beta}^{(k)}=(\hat{\beta}_{1}^{(k)},\ldots,\hat{\beta}_{L_{k}}^{(k)})$, where $\hat{\beta}_{j}^{(k)}={\Delta_{j}\hat{B}_{k}}/{\Delta_{j}t}$ and $\Delta_{j}t=t_{j}-t_{j-1}$, with the convention $t_{0}=0$.

For any positive integer $L$, let ${\cal R}_{L}=\{\gamma\in{\mathbb R}^{L}:\gamma_{1}\geq\ldots\geq\gamma_{L}\}$ be the set of real vectors that have non-increasing coordinates. We then define the isotonic regression $\tilde{\beta}^{(k)}$ of $\hat{\beta}^{(k)}$ as

$$\tilde{\beta}^{(k)}=\mathrm{argmin}_{\gamma\in{\cal R}_{L_{k}}}\sum_{i=1}^{L_{k}}(\hat{\beta}_{i}^{(k)}-\gamma_{i})^{2}.\tag{6}$$
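The projection (6) can be computed with the pool adjacent violators algorithm (PAVA), a standard technique for isotonic regression that the text itself does not name. The sketch below is one possible implementation; the function name `isotonic_nonincreasing` and the optional weight argument `w` are illustrative assumptions, and with the default unit weights the function minimises the unweighted sum of squares in (6).

```python
import numpy as np

def isotonic_nonincreasing(y, w=None):
    """Least squares projection of y onto {g_1 >= g_2 >= ... >= g_L} via PAVA."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool adjacent blocks while the order restriction is violated,
        # i.e. while a later block has a larger mean than the one before it.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / wt, wt, c1 + c2])
    # Expand the block means back to a vector of the original length.
    return np.concatenate([np.full(c, m) for m, _, c in blocks])
```

For instance, the naive vector `[3, 1, 2, 0]` violates the restriction only in its middle pair, so the sketch pools those two values to their mean and leaves the others unchanged.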

Finally we define the order restricted estimator $\tilde{\beta}_{k}$ as the constant interpolation of the vector $\tilde{\beta}^{(k)}$

$$\tilde{\beta}_{k}(s)=\sum_{t_{i}\leq s}\tilde{\beta}^{(k)}_{i}\Delta_{i}t+\tilde{\beta}^{(k)}_{j}(s-t_{j}),$$

where $j=\sup\{i:t_{i}\leq s\}$ is the index of the largest $t_{i}\leq s$. Standard theory for isotonic regression shows that the vector $\tilde{\beta}^{(k)}$, and therefore $\tilde{\beta}_{k}(s)$, exists. Furthermore, a geometric characterisation of the solution $\tilde{\beta}_{k}(s)$ is given by

$$\tilde{\beta}_{k}(t)=\frac{d}{dt}\big(S(\hat{B}_{k})(t)\big),\tag{7}$$

where $S$ is the least concave majorant map and $d/dt$ denotes the left hand derivative. We note also that the corresponding cumulative function

$$\tilde{B}_{k}(t)=S(\hat{B}_{k})(t)\tag{8}$$

is an order restricted estimator of the cumulative function $B_{k}$ , and that it is concave.
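To make the geometric characterisation in (7) and (8) concrete, the following sketch computes the least concave majorant of a finite set of points $(t_{j},\hat{B}_{k}(t_{j}))$ together with its left-hand slopes. It is an illustration only: the function name `least_concave_majorant` is hypothetical, the majorant is taken over the given points (including the origin, which the caller supplies), and an upper convex hull scan is used as one convenient way to obtain it.

```python
import numpy as np

def least_concave_majorant(t, b):
    """Least concave majorant of the points (t_j, b_j), with t strictly increasing.

    Returns the majorant evaluated at each t_j and its left-hand slope on
    each interval (t_{j-1}, t_j].
    """
    t = np.asarray(t, dtype=float)
    b = np.asarray(b, dtype=float)
    hull = []                                   # indices of the majorant's vertices
    for i in range(len(t)):
        # Drop the last vertex while it lies on or below the chord
        # joining the vertex before it to the new point i.
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            cross = (t[i1] - t[i0]) * (b[i] - b[i0]) - (b[i1] - b[i0]) * (t[i] - t[i0])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    lcm = np.interp(t, t[hull], b[hull])        # piecewise linear between vertices
    slopes = np.diff(lcm) / np.diff(t)          # non-increasing, by concavity
    return lcm, slopes
```

For equally spaced points, e.g. cumulative values `[0, 3, 4, 6, 6]` at `t = [0, 1, 2, 3, 4]` with raw increments `3, 1, 2, 0`, the slopes of the majorant are `3, 1.5, 1.5, 0`, which is the non-increasing least squares fit to the raw increments.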

4 The limit distribution results for the estimators

We first note that, by (5) and the matrix form of (4), the $(p+1)$-dimensional vector of estimators $\hat{B}$ can be written as

$$\hat{B}(t)=\int_{0}^{t}J(u)Y^{-}(u)\,dN(u)=\int_{0}^{t}J(u)\,dB(u)+\int_{0}^{t}J(u)Y^{-}(u)\,dM(u),$$

since $Y^{-}(u)Y(u)$ is the identity matrix whenever $J(u)=1$. Here $J(t)$ is the (predictable) indicator that $Y(t)$ has full rank,

$$Y^{-}(t)=(Y(t)^{T}Y(t))^{-1}Y(t)^{T}$$

is a (generalised) inverse, and $M(t)$ is an $n$-vector of locally square-integrable martingales.

We center the estimator $\hat{B}$ at $B$, and define the process part of the estimator as

$$v_{n}(t)=\int_{0}^{t}J(u)Y^{-}(u)\,dN(u)-B(t),$$

to get

$$\hat{B}(t)=B(t)+v_{n}(t),$$

and note that this is a slight adaptation of the partition/centering used in [4]. In fact, we have written the preliminary estimator $\hat{B}(t)$ as the sum of a deterministic part $B(t)$ and a stochastic part $v_{n}(t)$. The final order restricted estimator is obtained as a coordinate-wise isotonic regression of the increments of the preliminary estimator $\hat{B}(t)$, as defined in (6).

Therefore the local rescaling should be defined coordinate-wise, and it is in fact enough to study the coordinate-wise partition

$$\hat{B}_{k}(t)=B_{k}(t)+v_{k,n}(t)$$

and the component-wise rescaling

$$\tilde{v}_{k,n}(s)=d_{n}^{-2}\big(v_{k,n}(t_{0}+sd_{n})-v_{k,n}(t_{0})\big),$$

and to establish limit properties for the rescaled $\tilde{v}_{k,n}$ , for the results that we will develop here. In particular we want to establish local limit distribution results as well as certain truncation properties for the rescaled process $\tilde{v}_{k,n}$ . We are however able to establish the limit distributions for the full vector valued rescaled process, and since that result may be of independent interest, we will state this result. The coordinate wise property will then be a corollary of that result.

We will make frequent reference to the Cramér-Wold device for processes: the convergence in distribution of a $d$-dimensional stochastic process $W_{n}$ to a $d$-dimensional Gaussian process $W$ is equivalent to the weak convergence of the one-dimensional process $\alpha_{1}W_{1,n}+\ldots+\alpha_{d}W_{d,n}$ to $\alpha_{1}W_{1}+\ldots+\alpha_{d}W_{d}$, for every choice of $\alpha_{1},\ldots,\alpha_{d}$.

In fact we are going to adapt to our setting the proof of Theorem VII of [1], which states that the $(p+1)$-dimensional process $v_{n}$ converges to a Gaussian process, say $v$, with a certain covariance structure. This implies, by the Cramér-Wold device and since a Gaussian process is determined by its expectation and covariance function, that the $k$'th coordinate $v_{k,n}$ will converge to a Gaussian process $v_{k}$ with a covariance structure determined by the covariance structure of the full process $v$. One could therefore rescale the $k$'th coordinate process $v_{k,n}$ and establish limit distributions for that process. As already mentioned, we will instead rescale the full process and invoke the Cramér-Wold device subsequently.

Thus let us define the full rescaled process part

$$\tilde{v}_{n}(s)=d_{n}^{-2}\big(v_{n}(t_{0}+sd_{n})-v_{n}(t_{0})\big).$$

For the local limit distribution results we will adapt the proof of Theorem VII.4.1 of [1] to our setting, and we will establish the limit distribution result under the same assumptions as those in Theorem VII.4.1. Thus we define, for $j,k,l=0,1,\ldots,p$, the functions

$$\begin{aligned}
R^{(1)}_{j}(t)&=\sum_{i=1}^{n}Y_{i}(t)Z_{ij}(t),\\
R^{(2)}_{jk}(t)&=\sum_{i=1}^{n}Y_{ij}(t)Y_{ik}(t),\\
R^{(3)}_{jkl}(t)&=\sum_{i=1}^{n}Y_{ij}(t)Y_{ik}(t)Y_{il}(t),
\end{aligned}$$

where $Y_{ij}(t)=Y_{i}(t)Z_{ij}(t)$ is the $(i,j)$ entry of the matrix $Y(t)$, with $Z_{i0}=1$ and $Z_{ij}=z_{ji}$ for $j=1,\ldots,p$.

Let $0<s^{\prime}<\infty$ be arbitrary, so that $[0,s^{\prime}]$ is an arbitrary compact set.

Assumption 1

For all $j,k,l=0,1,\ldots,p$ , there exist continuous functions $r^{(1)}_{j},r^{(2)}_{jk},r^{(3)}_{jkl}$ such that as $n\to\infty$ :

$$\begin{aligned}
\sup_{s\in[0,s^{\prime}]}\left\|\frac{1}{n}R^{(1)}_{j}(s)-r^{(1)}_{j}(s)\right\|&\xrightarrow{P}0,\\
\sup_{s\in[0,s^{\prime}]}\left\|\frac{1}{n}R^{(2)}_{jk}(s)-r^{(2)}_{jk}(s)\right\|&\xrightarrow{P}0,\\
\sup_{s\in[0,s^{\prime}]}\left\|\frac{1}{n}R^{(3)}_{jkl}(s)-r^{(3)}_{jkl}(s)\right\|&\xrightarrow{P}0.
\end{aligned}$$

Assumption 2

For all $j=0,1,\ldots,p$ ,

$$\frac{1}{\sqrt{n}}\sup_{i=1,\dots,n}\sup_{s\in[0,s^{\prime}]}|Y_{ij}(s)|\xrightarrow{P}0.$$

Assumption 3

For all $s\in[0,s^{\prime}]$ , the matrix $r^{(2)}(s)=\big(r^{(2)}_{jk}(s)\big)$ is nonsingular.

Theorem 1

Suppose that Assumptions 1-3 hold. Then

$$\tilde{v}_{n}(s)\stackrel{d}{\to}\tilde{v}(s)$$

on $D^{p+1}(-c,c)$, as $n\to\infty$, where $\tilde{v}$ is a mean zero Gaussian process with covariance structure

$$\mathrm{Cov}\big(\tilde{v}_{j}(s^{\prime}),\tilde{v}_{k}(s^{\prime\prime})\big)=\sigma_{j,k}\,\min(s^{\prime},s^{\prime\prime}),$$

where

$$\sigma_{j,k}=\sum_{g,l,m=0}^{p}(r^{(2)}(t_{0}))^{-1}_{jl}(r^{(2)}(t_{0}))^{-1}_{km}r^{(3)}_{lmg}(t_{0})\beta_{g}(t_{0}).$$
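The triple sum defining $\sigma_{j,k}$ is a tensor contraction, which may be easier to read in code. The sketch below evaluates the whole matrix $(\sigma_{j,k})$ from given values of $r^{(2)}(t_{0})$, $r^{(3)}(t_{0})$ and $\beta(t_{0})$; the function name `sigma_matrix` and the array layout are assumptions made for illustration, and in practice the inputs would have to be replaced by estimates.

```python
import numpy as np

def sigma_matrix(r2, r3, beta):
    """Evaluate sigma_{jk} = sum_{g,l,m} (r2^{-1})_{jl} (r2^{-1})_{km} r3_{lmg} beta_g.

    r2   : (p+1, p+1) nonsingular matrix r^(2)(t0)
    r3   : (p+1, p+1, p+1) array with entries r^(3)_{lmg}(t0)
    beta : (p+1,) vector beta(t0)
    """
    r2_inv = np.linalg.inv(np.asarray(r2, dtype=float))
    r3 = np.asarray(r3, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # Contract over l, m and g exactly as in the displayed sum.
    return np.einsum("jl,km,lmg,g->jk", r2_inv, r2_inv, r3, beta)
```

In the scalar case $p=0$ the expression collapses to $r^{(3)}\beta_{0}/(r^{(2)})^{2}$, which is a convenient check of the index bookkeeping.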

Proof. Defining the matrix

$$R^{(2)}(t):=\sum_{i=1}^{n}Y_{i}(t)Y_{i}^{T}(t)=Y(t)^{T}Y(t),$$

the second statement of Assumption 1 implies

$$\sup_{s\in[0,s^{\prime}]}\Big\|\frac{1}{n}R^{(2)}(s)-r^{(2)}(s)\Big\|\stackrel{P}{\to}0,$$

where $r^{(2)}$ is defined in Assumption 3, and with $\|\cdot\|$ denoting the Euclidean (matrix) norm on ${\mathbb R}^{p+1}\times{\mathbb R}^{p+1}$. By Assumption 3 the matrix $r^{(2)}$ is invertible, and the inverse $(\frac{1}{n}R^{(2)}(s))^{-1}$ is well defined when $J(s)=1$, and for those $s$ it converges to $(r^{(2)}(s))^{-1}$ by the continuous mapping theorem, since matrix inversion is a continuous map (under the supnorm matrix metric). Thus, since $Y^{-}(t)=(Y(t)^{T}Y(t))^{-1}Y(t)^{T}$, we may partition the process part as

$$\begin{aligned}
v_{n}(t)={}&\int_{0}^{t}J(u)\Big[\big(\tfrac{1}{n}R^{(2)}(u)\big)^{-1}-\big(r^{(2)}(u)\big)^{-1}\Big]Y^{T}(u)\,dM(u)\\
&+\int_{0}^{t}J(u)\big(r^{(2)}(u)\big)^{-1}Y^{T}(u)\,dM(u)+\int_{0}^{t}(J(u)-1)\beta(u)\,du.
\end{aligned}\tag{9}$$

Recall the definition of the rescaled process

$$\tilde{v}_{n}(s)=d_{n}^{-2}\big(v_{n}(t_{0}+sd_{n})-v_{n}(t_{0})\big),$$

for $s\in[-c,c]$, and note that it entails that $\tilde{v}_{n}(s)$ is the sum of the three integrals in (9), with the integrals going from $t_{0}$ to $t_{0}+sd_{n}$, all multiplied by $d_{n}^{-2}$. We now change variables inside the integrals: for $u\in[t_{0},t_{0}+sd_{n}]$ and $s$ fixed we write $u=t_{0}+s^{\prime}d_{n}$ and let $s^{\prime}\in[0,s]$ vary, so that $du=d_{n}ds^{\prime}$ and we obtain

$$\begin{aligned}
\tilde{v}_{n}(s)={}&\frac{d_{n}^{-2}}{\sqrt{n}}\frac{1}{\sqrt{n}}\int_{0}^{s}J(t_{0}+s^{\prime}d_{n})\Big(\big(\tfrac{1}{n}R^{(2)}(t_{0}+s^{\prime}d_{n})\big)^{-1}-\big(r^{(2)}(t_{0}+s^{\prime}d_{n})\big)^{-1}\Big)\\
&\qquad\cdot Y^{T}(t_{0}+s^{\prime}d_{n})\,dM(t_{0}+s^{\prime}d_{n})\\
&+\frac{d_{n}^{-2}}{\sqrt{n}}\left[\frac{1}{\sqrt{n}}\int_{0}^{s}J(t_{0}+s^{\prime}d_{n})(r^{(2)}(t_{0}+s^{\prime}d_{n}))^{-1}Y^{T}(t_{0}+s^{\prime}d_{n})\,dM(t_{0}+s^{\prime}d_{n})\right]\\
&+\frac{d_{n}^{-2}}{\sqrt{n}}\left[\sqrt{n}\int_{0}^{s}(J(t_{0}+s^{\prime}d_{n})-1)\beta(t_{0}+s^{\prime}d_{n})\,d_{n}\,ds^{\prime}\right]\\
=:{}&\tilde{v}_{n}^{(1)}(s)+\tilde{v}_{n}^{(2)}(s)+\tilde{v}_{n}^{(3)}(s).
\end{aligned}$$

We will now treat the three terms $\tilde{v}_{n}^{(1)},\tilde{v}_{n}^{(2)},\tilde{v}_{n}^{(3)}$ above separately, and show that the first and last vanish asymptotically, while the second $\tilde{v}_{n}^{(2)}$ gives rise to the asymptotic distribution.

$(i):\tilde{v}_{n}^{(1)}$ vanishes asymptotically.

If we write the $j$'th component of the first term $\tilde{v}_{n}^{(1)}$ as $\tilde{v}_{n,j}^{(1)}$, we get

$$\begin{aligned}
\tilde{v}_{n,j}^{(1)}(s^{\prime})={}&\frac{d_{n}^{-2}}{\sqrt{n}}\frac{1}{\sqrt{n}}\int_{0}^{s^{\prime}}J(t_{0}+sd_{n})\sum_{i=1}^{n}\sum_{l=0}^{p}\Big[\big(\tfrac{1}{n}R^{(2)}(t_{0}+sd_{n})\big)^{-1}\\
&-\big(r^{(2)}(t_{0}+sd_{n})\big)^{-1}\Big]_{jl}Y_{il}(t_{0}+sd_{n})\,dM_{i}(t_{0}+sd_{n}),
\end{aligned}$$

for arbitrary but fixed $j=0,\ldots,p$. Therefore the predictable variation process becomes

$$\begin{aligned}
\langle\tilde{v}_{n,j}^{(1)}\rangle(s^{\prime})={}&\frac{d_{n}^{-4}}{n}\frac{1}{n}\int_{0}^{s^{\prime}}J(t_{0}+sd_{n})\sum_{i=1}^{n}\Bigg\{\sum_{l=0}^{p}\Big(\big(\tfrac{1}{n}R^{(2)}(t_{0}+sd_{n})\big)^{-1}\\
&-\big(r^{(2)}(t_{0}+sd_{n})\big)^{-1}\Big)_{jl}Y_{il}(t_{0}+sd_{n})\Bigg\}^{2}d\langle M_{i}\rangle(t_{0}+sd_{n})\\
={}&\frac{d_{n}^{-4}}{n}\frac{1}{n}\int_{0}^{s^{\prime}}J(t_{0}+sd_{n})\sum_{i=1}^{n}\Bigg\{\sum_{l=0}^{p}\Big(\big(\tfrac{1}{n}R^{(2)}(t_{0}+sd_{n})\big)^{-1}\\
&-\big(r^{(2)}(t_{0}+sd_{n})\big)^{-1}\Big)_{jl}Y_{il}(t_{0}+sd_{n})\Bigg\}^{2}d_{n}\lambda_{i}(t_{0}+sd_{n})\,ds,
\end{aligned}$$

since $d\langle M_{i}\rangle(t_{0}+sd_{n})=d_{n}\lambda_{i}(t_{0}+sd_{n})ds$, and since $d\langle M_{i},M_{j}\rangle=0$ for $i\neq j$. From the above we see that the local rescaling rate $d_{n}=n^{-\alpha}$ is determined by the condition

$$\frac{d_{n}^{-4}}{n}\cdot d_{n}=\frac{d_{n}^{-3}}{n}=1$$

and thus we must have $d_{n}=n^{-1/3}$ .
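As a short check of the exponent, one can substitute $d_{n}=n^{-\alpha}$ into the condition above. The resulting powers of $n$ are the normalisations that appear in the limit results for the cumulative estimator and for the estimator itself.

```latex
\frac{d_{n}^{-4}}{n}\cdot d_{n}=\frac{d_{n}^{-3}}{n}=n^{3\alpha-1}=1
\quad\Longleftrightarrow\quad \alpha=\tfrac{1}{3},
\qquad\text{so that}\qquad
d_{n}=n^{-1/3},\quad d_{n}^{-2}=n^{2/3},\quad d_{n}^{-1}=n^{1/3}.
```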

By Assumption 1 we have that

$$\sup_{s\in[0,s^{\prime}]}\Big\|\frac{1}{n}R^{(2)}(t_{0}+sd_{n})-r^{(2)}(t_{0}+sd_{n})\Big\|\stackrel{P}{\to}0,$$

and by Assumption 3, $r^{(2)}(t_{0})$ is nonsingular. Since $R^{(2)}(t)^{-1}$ exists on the set of points $t$ where $J(t)=1$, and since the matrix inverse map is continuous (under the supnorm metric), the continuous mapping theorem gives, for $j$ fixed and for every $l=0,\ldots,p$,

$$\sup_{s\in[0,s^{\prime}]}\Big|J(t_{0}+sd_{n})\Big((\tfrac{1}{n}R^{(2)}(t_{0}+sd_{n}))^{-1}-(r^{(2)}(t_{0}+sd_{n}))^{-1}\Big)_{jl}\Big|\stackrel{P}{\to}0.$$

Therefore, there are random variables $C_{0}^{(n)},\ldots,C_{p}^{(n)}$ such that

$$\begin{aligned}
&\sup_{s\in[0,s^{\prime}]}\Big\|J(t_{0}+sd_{n})\sum_{l=0}^{p}\Big((\tfrac{1}{n}R^{(2)}(t_{0}+sd_{n}))^{-1}-(r^{(2)}(t_{0}+sd_{n}))^{-1}\Big)_{jl}Y_{il}(t_{0}+sd_{n})\Big\|\\
&\qquad\leq\sum_{l=0}^{p}C_{l}^{(n)}Y_{il}(t_{0}+sd_{n}),
\end{aligned}$$

and such that $C_{l}^{(n)}=o_{P}(1)$ , for all $l=0,\ldots,p$ .

Thus, for $d_{n}=n^{-1/3}$ ,

$$\langle\tilde{v}_{n,j}^{(1)}\rangle(s^{\prime})\leq\int_{0}^{s^{\prime}}J(t_{0}+sd_{n})\sum_{l,k=0}^{p}C_{l}^{(n)}C_{k}^{(n)}\frac{1}{n}\sum_{i=1}^{n}Y_{il}(t_{0}+sd_{n})Y_{ik}(t_{0}+sd_{n})\lambda_{i}(t_{0}+sd_{n})\,ds.$$

From the second line of Assumption 1, we have

$$P\left(\sup_{s\in[0,s^{\prime}]}\Big|\frac{1}{n}\sum_{i=1}^{n}Y_{il}(t_{0}+sd_{n})Y_{ik}(t_{0}+sd_{n})-r^{(2)}_{kl}(t_{0}+sd_{n})\Big|>\varepsilon\right)\to 0,$$

and thus

$$\langle\tilde{v}_{n,j}^{(1)}\rangle(s^{\prime})\xrightarrow{P}0.$$

Finally, by Lenglart's inequality, for all $\varepsilon,\eta>0$,

$$P\left(\sup_{s\in[0,s^{\prime}]}|\tilde{v}_{n,j}^{(1)}(s)|>\varepsilon\right)\leq\frac{\eta}{\varepsilon^{2}}+P\big(\langle\tilde{v}_{n,j}^{(1)}\rangle(s^{\prime})>\eta\big),$$

which shows that $\tilde{v}_{n,j}^{(1)}$ converges in probability to zero, uniformly on $[0,s^{\prime}]$, as claimed.

$(ii):\tilde{v}_{n}^{(2)}$ gives the limiting behaviour.

We derive the asymptotic normality of $\tilde{v}_{n}^{(2)}$ by first establishing the limit in probability of the predictable covariation processes of $\tilde{v}_{n}^{(2)}$ and then by checking the Lindeberg condition for $\tilde{v}_{n}^{(2)}$ .

Let $\tilde{v}_{n,j}^{(2)}$ denote the $j$ ’th component of $\tilde{v}_{n}^{(2)}$ , for $j=0,\ldots,p$ . Thus

$$\tilde{v}_{n,j}^{(2)}(s^{\prime})=\frac{d_{n}^{-2}}{\sqrt{n}}\frac{1}{\sqrt{n}}\int_{0}^{s^{\prime}}\sum_{i=1}^{n}\sum_{l=0}^{p}(r^{(2)}(t_{0}+sd_{n}))^{-1}_{jl}Y_{il}(t_{0}+sd_{n})\,dM_{i}(t_{0}+sd_{n}).\tag{10}$$

We first establish the asymptotic limit of the quadratic covariation processes. For $d_{n}=n^{-1/3}$, the predictable quadratic covariation between components $j$ and $k$ is

$$\begin{aligned}
\langle\tilde{v}_{n,j}^{(2)},\tilde{v}_{n,k}^{(2)}\rangle(s^{\prime})={}&\frac{d_{n}^{-4}}{n}\frac{1}{n}\sum_{i=1}^{n}\int_{0}^{s^{\prime}}\sum_{l,m=0}^{p}(r^{(2)}(t_{0}+sd_{n}))^{-1}_{jl}(r^{(2)}(t_{0}+sd_{n}))^{-1}_{km}\\
&\cdot Y_{il}(t_{0}+sd_{n})Y_{im}(t_{0}+sd_{n})\,d\langle M_{i}\rangle(t_{0}+sd_{n})\\
={}&\frac{1}{n}\sum_{i=1}^{n}\int_{0}^{s^{\prime}}\sum_{l,m=0}^{p}(r^{(2)}(t_{0}+sd_{n}))^{-1}_{jl}(r^{(2)}(t_{0}+sd_{n}))^{-1}_{km}\\
&\cdot Y_{il}(t_{0}+sd_{n})Y_{im}(t_{0}+sd_{n})\lambda_{i}(t_{0}+sd_{n})\,ds.
\end{aligned}$$

Since

$$\lambda_{i}(t_{0}+sd_{n})=\sum_{g=0}^{p}\beta_{g}(t_{0}+sd_{n})Z_{ig}(t_{0}+sd_{n})Y_{i}(t_{0}+sd_{n}),$$

this becomes

$$\begin{aligned}
\langle\tilde{v}_{n,j}^{(2)},\tilde{v}^{(2)}_{n,k}\rangle(s^{\prime})={}&\frac{1}{n}\sum_{i=1}^{n}\int_{0}^{s^{\prime}}\sum_{l,m=0}^{p}(r^{(2)}(t_{0}+sd_{n}))^{-1}_{jl}(r^{(2)}(t_{0}+sd_{n}))^{-1}_{km}\\
&\cdot Y_{il}(t_{0}+sd_{n})Y_{im}(t_{0}+sd_{n})\sum_{g=0}^{p}\beta_{g}(t_{0}+sd_{n})Y_{ig}(t_{0}+sd_{n})\,ds.
\end{aligned}$$

Since, by the third statement of Assumption 1,

$$\sup_{s\in[0,s^{\prime}]}\Big|\frac{1}{n}\sum_{i=1}^{n}Y_{il}(t_{0}+sd_{n})Y_{im}(t_{0}+sd_{n})Y_{ig}(t_{0}+sd_{n})-r^{(3)}_{lmg}(t_{0}+sd_{n})\Big|\stackrel{P}{\to}0,$$

we obtain

$$\langle\tilde{v}_{n,j}^{(2)},\tilde{v}^{(2)}_{n,k}\rangle(s^{\prime})\stackrel{P}{\to}\int_{0}^{s^{\prime}}\sum_{g,l,m=0}^{p}(r^{(2)}(t_{0}))^{-1}_{jl}(r^{(2)}(t_{0}))^{-1}_{km}r^{(3)}_{lmg}(t_{0})\beta_{g}(t_{0})\,ds.$$

The result also shows that the asymptotic covariance of $\tilde{v}_{n,j}^{(2)}$ and $\tilde{v}^{(2)}_{n,k}$ is given by the right hand side of the above expression. Similarly, if we let $\tilde{v}^{(2)}$ denote the process obtained as the limit in distribution of $\tilde{v}_{n}^{(2)}$, then using the conditional independence of the increments of the martingale difference sequence, we can show that for $s^{\prime},s^{\prime\prime}>0$,

$$\mathrm{Cov}\big(\tilde{v}_{j}^{(2)}(s^{\prime}),\tilde{v}_{k}^{(2)}(s^{\prime\prime})\big)=\int_{0}^{\min(s^{\prime},s^{\prime\prime})}\sum_{g,l,m=0}^{p}(r^{(2)}(t_{0}))^{-1}_{jl}(r^{(2)}(t_{0}))^{-1}_{km}r^{(3)}_{lmg}(t_{0})\beta_{g}(t_{0})\,ds.\tag{11}$$

We next verify the Lindeberg condition for $\tilde{v}_{n}^{(2)}$, in order to establish the asymptotic normality. We have, with the choice $d_{n}=n^{-1/3}$, and with $\tilde{v}_{n,j}^{(2)}$ the $j$'th component defined in (10),

$$\begin{aligned}
&\sum_{j=1}^{r_{n}}\mathbb{E}\left[(\tilde{v}_{n,j}^{(2)}(s^{\prime}))^{2}\cdot\mathbf{1}_{\{\|\tilde{v}_{n,j}^{(2)}(s^{\prime})\|>\varepsilon\}}\right]\\
&\quad\leq\mathbb{E}\left[\left(\frac{1}{n}\sum_{i=1}^{n}\int_{0}^{s^{\prime}}\sum_{l=0}^{p}\left(r^{(2)}(t_{0}+sd_{n})\right)^{-1}_{jl}Y_{il}(t_{0}+sd_{n})\right)^{2}\lambda_{i}(t_{0}+sd_{n})\,ds\cdot\mathbf{1}_{\{\|\tilde{v}_{n,j}^{(2)}(s^{\prime})\|>\varepsilon\}}\right]\\
&\quad\leq\sum_{i=1}^{r_{n}}\frac{1}{n}\sum_{i=1}^{n}\int_{0}^{s^{\prime}}\sum_{l=0}^{p}\sum_{k=0}^{p}\sup_{s\in[0,s^{\prime}]}\left\|r^{(2)}(t_{0}+sd_{n})\right\|^{-1}_{jl}\sup_{s\in[0,s^{\prime}]}\left\|r^{(2)}(t_{0}+sd_{n})\right\|^{-1}_{jk}\\
&\qquad\cdot\sup_{i=1,\ldots,n,\,s\in[0,s^{\prime}]}\left\|Y_{il}(t_{0}+sd_{n})\right\|\lambda_{i}(t_{0}+sd_{n})\,ds\times\mathbb{E}\left(\mathbf{1}_{\{\|\tilde{v}_{n,j}^{(2)}(s^{\prime})\|>\varepsilon\}}\right),
\end{aligned}$$

where the last inequality follows by expanding the square and the triangle inequality.

Under Assumption 1, $r^{(2)}$ are continuous functions on the compact $[0,s^{{}^{\prime}}]$ , and thus they are bounded.

Furthermore, from Chebyshev’s inequality we get,

			$\displaystyle P\left(\left\|\frac{d_{n}^{-2}}{\sqrt{n}}\frac{1}{\sqrt{n}}\sum\limits_{i=1}^{n}\int\limits_{0}^{s^{{}^{\prime}}}\sum\limits_{l=0}^{p}\left(r^{(2)}(t_{0}+sd_{n})\right)^{-1}_{jl}Y_{il}(t_{0}+sd_{n})\,dM_{i}(t_{0}+sd_{n})\right\|>\varepsilon\right)$
		$\displaystyle\leq$	$\displaystyle\varepsilon^{-2}E\left[\displaystyle\frac{1}{n}\sum\limits_{i=1}^{n}\int\limits_{0}^{s^{{}^{\prime}}}\left(\sum\limits_{l=0}^{p}\left(r^{(2)}(t_{0}+sd_{n})\right)^{-1}_{jl}Y_{il}(t_{0}+sd_{n})\right)^{2}\lambda_{i}(t_{0}+sd_{n})ds\right].$

Thus the Lindeberg condition is verified, due to the uniform convergence in probability to zero of the $Y_{il}$ functions, in Assumption 2.

$(iii)$ : The term $\tilde{v}_{n}^{(3)}$ is asymptotically negligible.
From the definition of $\tilde{v}_{n}^{(3)}$ , we see that

\displaystyle\tilde{v}_{n}^{(3)}(s^{\prime})

\displaystyle=

\displaystyle d_{n}^{-1}\int_{0}^{s^{\prime}}\big(J(t_{0}+sd_{n})-1\big)\beta(t_{0}+sd_{n})\,ds.

Recall that the process $J(t_{0}+sd_{n})$ is the indicator that $Y^{T}(t_{0}+sd_{n})$ has full rank. Define the set

\displaystyle E_{n}=\left\{\sup_{s\in[0,s^{\prime}]}\Big\|\tfrac{1}{n}R^{(2)}(t_{0}+sd_{n})-r^{(2)}(t_{0}+sd_{n})\Big\|<\varepsilon\right\}.

We have established that $r^{(2)}(t)$ is nonsingular, and hence $\tfrac{1}{n}R^{(2)}(t_{0}+sd_{n})$ is invertible on $E_{n}$ , and therefore $J(t_{0}+sd_{n})=1$ for all $s\in[0,s^{\prime}]$ .

Thus, with the choice $d_{n}=n^{-1/3}$ ,

			$\displaystyle P\left(\sup_{s\in[0,s^{\prime}]}\left\|{d_{n}}^{-1}\int_{0}^{s^{\prime}}\big(J(t_{0}+sd_{n})-1\big)\beta(t_{0}+sd_{n})\,ds\right\|>\varepsilon\right)$
		$\displaystyle=$	$\displaystyle P\left(\sup_{s\in[0,s^{\prime}]}\left\|{d_{n}}^{-1}\int_{0}^{s^{\prime}}\big(J(t_{0}+sd_{n})-1\big)\beta(t_{0}+sd_{n})\,ds\right\|>\varepsilon\cap E_{n}\right)$
			$\displaystyle+P\left(\sup_{s\in[0,s^{\prime}]}\left\|{d_{n}}^{-1}\int_{0}^{s^{\prime}}\big(J(t_{0}+sd_{n})-1\big)\beta(t_{0}+sd_{n})\,ds\right\|>\varepsilon\cap E_{n}^{c}\right)$
		$\displaystyle\leq$	$\displaystyle P(E_{n}^{c})$
		$\displaystyle\to$	$\displaystyle 0,$

where the inequality follows because the first term vanishes, since $J(t_{0}+sd_{n})=1$ on $E_{n}$ . Consequently,

\displaystyle P\left(\sup_{s\in[0,s^{\prime}]}|\tilde{v}_{n}^{(3)}|>\varepsilon\right)

\displaystyle\longrightarrow

\displaystyle 0,

as $n\to\infty$ , i.e. $\tilde{v}_{n}^{(3)}$ is asymptotically negligible.

$\Box$

The coordinate-wise result now follows by the Cramér-Wold device. Recall that

\displaystyle\tilde{v}_{k,n}(s)

\displaystyle=

\displaystyle d_{n}^{-2}\big(v_{k,n}(t_{0}+sd_{n})-v_{k,n}(t_{0})\big),

is the coordinate-wise rescaled process, for $k=0,\ldots,p$ .

Corollary 1

Suppose that Assumptions 1- 3 hold. Then, for any $k=0,\ldots,p$ ,

\displaystyle\tilde{v}_{k,n}(s)

\displaystyle\stackrel{{\scriptstyle d}}{{\to}}

\displaystyle\tilde{v}_{k}(s),

on $D(-c,c)$ , as $n\to\infty$ , where $\tilde{v}_{k}$ is a mean zero Gaussian process with covariance structure

\displaystyle Cov(\tilde{v}_{k}(s^{\prime}),\tilde{v}_{k}(s^{\prime\prime}))

\displaystyle=

\displaystyle\sigma^{2}_{k}\,\min(s^{\prime},s^{\prime\prime}),

where

\displaystyle\sigma^{2}_{k}

\displaystyle=

\displaystyle\sum_{g,l,m=0}^{p}(r^{(2)}(t_{0}))^{-1}_{kl}(r^{(2)}(t_{0}))^{-1}_{km}r^{(3)}_{lmg}(t_{0})\beta_{g}(t_{0}).

Proof. The result follows from the previous theorem and the Cramér-Wold device with choice of coefficients $\alpha_{k}=1$ , and $\alpha_{j}=0$ for $j\neq k$ . $\Box$

The corollary thus establishes Assumption A1 of [4] for $\tilde{v}_{k,n}$ .
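As a purely illustrative sketch, the triple sum defining $\sigma^{2}_{k}$ can be evaluated directly once numerical values of $(r^{(2)}(t_{0}))^{-1}$ , $r^{(3)}(t_{0})$ and $\beta(t_{0})$ are available. The function name `sigma2_k` and the toy inputs below are hypothetical and are not taken from the paper; they only show how the indices $g,l,m$ are contracted.

```python
def sigma2_k(k, r2_inv, r3, beta):
    """Evaluate sum_{g,l,m} r2_inv[k][l] * r2_inv[k][m] * r3[l][m][g] * beta[g].

    r2_inv : (p+1) x (p+1) nested list, playing the role of (r^(2)(t0))^{-1}
    r3     : (p+1) x (p+1) x (p+1) nested list, playing the role of r^(3)(t0)
    beta   : list of length p+1, playing the role of beta(t0)
    """
    dim = len(beta)  # dim = p + 1
    total = 0.0
    for l in range(dim):
        for m in range(dim):
            for g in range(dim):
                total += r2_inv[k][l] * r2_inv[k][m] * r3[l][m][g] * beta[g]
    return total


# Toy inputs with p = 1 (hypothetical numbers, for illustration only).
r2_inv = [[2.0, -1.0], [-1.0, 1.0]]
r3 = [[[1.0, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
beta = [0.3, 0.2]
print(sigma2_k(0, r2_inv, r3, beta))
```

In practice the three inputs would have to be replaced by consistent estimates at $t_{0}$ ; how to obtain those is outside the scope of this sketch.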

Note 1

We note that the limit process $\tilde{v}_{k}$ can be identified with a (two-sided) Brownian motion with covariance structure $Cov(\tilde{v}_{k}(s^{\prime}),\tilde{v}_{k}(s^{\prime\prime}))=\sigma^{2}_{k}\,\min(s^{\prime},s^{\prime\prime})$ , with $\sigma^{2}_{k}$ defined above. $\Box$

Next we define the $k$ ’th coordinate of the rescaled deterministic part

	$\displaystyle g_{k,n}(s)$	$\displaystyle=$	$\displaystyle d_{n}^{-2}\left(\int_{t_{0}}^{t_{0}+sd_{n}}Y^{-1}(u)Y(u)\,dB(u)-Y^{-1}(t_{0})Y(t_{0})\,dB(t_{0})sd_{n}\right)_{k}$
		$\displaystyle=$	$\displaystyle d_{n}^{-2}\left(\int_{t_{0}}^{t_{0}+sd_{n}}\,dB_{k}(u)-\beta_{k}(t_{0})sd_{n}\right)\ .$

We will first show that $g_{k,n}$ satisfies Assumption A2 of [4], i.e. that for every finite $c>0$ there is an $A_{k}<0$ such that

\displaystyle\sup_{|s|\leq c}\left|g_{k,n}(s)-A_{k}s^{2}\right|

\displaystyle\to

\displaystyle 0

(12)

as $n\to\infty$ . It is elementary to see, by a second order Taylor expansion of $B_{k}$ around $t_{0}$ , that this holds if $\beta_{k}$ is differentiable with $\beta_{k}^{\prime}<0$ in a neighbourhood around $t_{0}$ , with $A_{k}=\beta^{\prime}_{k}(t_{0})/2$ .
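The convergence in (12) can be checked numerically in a toy case. The sketch below assumes a hypothetical decreasing parameter function $\beta_{k}(t)=e^{-t}$ (so $B_{k}(t)=1-e^{-t}$ ), the point $t_{0}=1$ and $c=2$ , none of which come from the paper, and uses $A_{k}=\beta_{k}^{\prime}(t_{0})/2$ , the value consistent with the identity $-A_{k}=|\beta_{k}^{\prime}(t_{0})|/2$ used in the proof of Theorem 2. The supremum is approximated on a finite grid.

```python
import math


def g_kn(s, n, t0, beta, B):
    """Rescaled deterministic part g_{k,n}(s) with d_n = n^(-1/3)."""
    d_n = n ** (-1.0 / 3.0)
    return (B(t0 + s * d_n) - B(t0) - beta(t0) * s * d_n) / d_n ** 2


def sup_error(n, t0, c, beta, beta_prime, B, grid=401):
    """Grid approximation of sup_{|s| <= c} |g_{k,n}(s) - A_k s^2|, A_k = beta'(t0)/2."""
    A_k = 0.5 * beta_prime(t0)
    pts = [-c + 2.0 * c * i / (grid - 1) for i in range(grid)]
    return max(abs(g_kn(s, n, t0, beta, B) - A_k * s * s) for s in pts)


# Hypothetical example: beta_k(t) = exp(-t), hence B_k(t) = 1 - exp(-t).
beta = lambda t: math.exp(-t)
beta_prime = lambda t: -math.exp(-t)
B = lambda t: 1.0 - math.exp(-t)

errors = [sup_error(n, t0=1.0, c=2.0, beta=beta, beta_prime=beta_prime, B=B)
          for n in (10**3, 10**6, 10**9)]
print(errors)  # the approximation errors should shrink as n grows
```

For this smooth example the error is of order $d_{n}$ , driven by the third order term of the Taylor expansion.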

Next we want to establish Proposition 1 of [4], from which Assumptions A3 and A4 of that paper will follow.

Lemma 1

Suppose that Assumptions 1, 2 and 3 hold and that $\beta_{k}$ is differentiable with $\beta_{k}^{\prime}<0$ in a neighbourhood around $t_{0}$ . Then Proposition 1 in [4] holds for $\tilde{v}_{k,n}$ and $g_{k,n}$ , i.e. they satisfy $\forall\varepsilon,\delta>0$ , $\exists\tau=\tau(\varepsilon,\delta)<\infty$ , such that

\displaystyle\limsup_{n\to\infty}\mathbb{P}\left(\sup_{|s|\geq\tau}\left|\frac{\tilde{v}_{k,n}(s)}{g_{k,n}(s)}\right|>\varepsilon\right)

\displaystyle<

\displaystyle\delta.

Proof. We show the result by first bounding $g_{k,n}$ , and then using that bound to prove, via Doob’s and Chebyshev’s inequalities and the Ito isometry and properties of $\tilde{v}_{k,n}$ , the full result.

$(i)$ Bounding $g_{k,n}(s)$ : We have shown that $g_{k,n}$ satisfies Assumption A2 in [4], i.e. that $(12)$ holds. Then in particular, $\forall\tau>0$ , $0<\varepsilon<\displaystyle\frac{1}{2}|A_{k}|\tau^{2}$ and for $s=\pm\tau$ , we get

\displaystyle g_{k,n}(\pm\tau)

\displaystyle\leq

\displaystyle A_{k}\tau^{2}+\varepsilon.

Since $g_{k,n}(0)=0$ and $g_{k,n}$ is concave, for some finite $n_{0}=n_{0}(\varepsilon)$ , we have that

$\displaystyle g_{k,n}(s)$	$\displaystyle\leq$	$\displaystyle\frac{g_{k,n}(\pm\tau)}{\tau}|s|$
	$\displaystyle\leq$	$\displaystyle\frac{A_{k}\tau^{2}+\varepsilon}{\tau}|s|$
	$\displaystyle\leq$	$\displaystyle\frac{1}{2}A_{k}\tau|s|,$

for all $|s|\geq\tau$ and all $n\geq n_{0}$ , where the sign in $\pm\tau$ is that of $s$ , and the last inequality uses $\varepsilon<\frac{1}{2}|A_{k}|\tau^{2}$ and $A_{k}<0$ . Since the right hand side is negative, we have established that for all $|s|\geq\tau$ and all $n\geq n_{0}$ ,

\displaystyle|g_{k,n}(s)|\geq\displaystyle\frac{1}{2}|A_{k}|\tau|s|.

(13)

$(ii)$ Bounding $\tilde{v}_{k,n}(s)$ : The proof is similar to the corresponding result for the rescaled process in [4] in the cases of rescaled partial sum processes and empirical process, for which one partitioned $\{|s|\geq\tau\}$ into intervals, exhibited bounds of the process at the boundaries of those intervals, and used a modulus of continuity for the processes on the intervals. We, however, will use the fact that we have martingales to our advantage by using Doob’s maximal $L^{2}$ inequality to bound the maximum over an interval by the values at its endpoints, and then the Ito isometry.

Thus we partition the tail set $\{|s|\geq\tau\}$ into dyadic intervals, by

\displaystyle\{|s|\geq\tau\}

\displaystyle=

\displaystyle\cup_{j=0}^{\infty}B_{j},

with $B_{j}=\{s:2^{j}\tau\leq|s|\leq 2^{j+1}\tau\}$ , for $j=0,1,2,\ldots.$ Then for $|s|\geq\tau$ , and using the bound $(13)$ , we get

\displaystyle\frac{|\tilde{v}_{k,n}(s)|}{|g_{k,n}(s)|}

\displaystyle\leq

\displaystyle\frac{2}{|A_{k}|}\frac{|\tilde{v}_{k,n}(s)|}{\tau|s|}.

Therefore

$\displaystyle\mathbb{P}\left(\sup_{\|s\|\geq\tau}\frac{\|\tilde{v}_{k,n}(s)\|}{\|g_{k,n}(s)\|}>\epsilon\right)$	$\displaystyle\leq$	$\displaystyle\mathbb{P}\left(\sup_{\|s\|\geq\tau}\frac{\|\tilde{v}_{k,n}(s)\|}{\|s\|}>\frac{|A_{k}|}{2}\epsilon\tau\right)$	(14)
	$\displaystyle\leq$	$\displaystyle\mathbb{P}\left(\cup_{j=0}^{\infty}\{\sup_{s\in B_{j}}\frac{\|\tilde{v}_{k,n}(s)\|}{\|s\|}>\frac{|A_{k}|}{2}\epsilon\tau\}\right)$
	$\displaystyle\leq$	$\displaystyle\sum_{j=0}^{\infty}\mathbb{P}\left(\sup_{s\in B_{j}}\frac{\|\tilde{v}_{k,n}(s)\|}{\|s\|}>\frac{|A_{k}|}{2}\epsilon\tau\right)$
	$\displaystyle\leq$	$\displaystyle\sum_{j=0}^{\infty}\mathbb{P}\left(\sup_{s\in B_{j}}\|\tilde{v}_{k,n}(s)\|>\epsilon\frac{|A_{k}|}{2}\tau^{2}2^{j}\right),$

where the last inequality follows since on $B_{j}$ we have $|s|\geq 2^{j}\tau$ .

We now bound the individual terms in the above sum, by

	$\displaystyle\mathbb{P}\left(\sup_{s\in B_{j}}\|\tilde{v}_{k,n}(s)\|>\epsilon|A_{k}|\tau^{2}2^{j-1}\right)$	$\displaystyle\leq$	$\displaystyle\frac{\mathbb{E}\left(\sup_{s\in B_{j}}\tilde{v}^{2}_{k,n}(s)\right)}{(\epsilon A_{k}\tau^{2}2^{j-1})^{2}}$
		$\displaystyle\leq$	$\displaystyle\frac{4\mathbb{E}\left(\tilde{v}^{2}_{k,n}(2^{j+1}\tau)\right)}{(\epsilon A_{k}\tau^{2}2^{j-1})^{2}},$		(15)

where the first inequality follows by Chebyshev’s inequality and the second by Doob’s maximal $L^{2}$ inequality. By the Ito isometry

\displaystyle\mathbb{E}\left(\tilde{v}^{2}_{k,n}(2^{j+1}\tau)\right)=d_{n}^{-4}(\int_{t_{0}}^{t_{0}+2^{j+1}\tau d_{n}}\|Y^{-}(u)\|^{2}\,d\langle M\rangle(u))_{k},

which is bounded, by $C<\infty$ say, since $Y^{-}$ is bounded in probability and $M$ is a square-integrable martingale.

Thus, from $(14)$ and $(15)$ , we get

	$\displaystyle\mathbb{P}\left(\sup_{\|s\|\geq\tau}\frac{\|\tilde{v}_{k,n}(s)\|}{\|g_{k,n}(s)\|}>\epsilon\right)$	$\displaystyle\leq$	$\displaystyle\frac{64\,C}{(\epsilon A_{k}\tau^{2})^{2}}\sum_{j=0}^{\infty}2^{-j}$
		$\displaystyle<$	$\displaystyle\delta,$

where the last inequality follows by choosing $\tau=\tau(\epsilon,\delta)$ large enough for fixed $\epsilon,\delta>0$ , and $n\geq n_{0}$ . $\Box$
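The final step, choosing $\tau$ large enough, can be made explicit. Since $\sum_{j\geq 0}2^{-j}=2$ , the bound in the last display equals $128\,C/(\epsilon A_{k}\tau^{2})^{2}$ , which is below $\delta$ as soon as $\tau^{4}>128\,C/(\delta\epsilon^{2}A_{k}^{2})$ . The following minimal sketch, with hypothetical function names and illustrative numerical values for $C$ and $A_{k}$ , evaluates this threshold.

```python
def tail_bound(tau, eps, A_k, C):
    """Bound 64*C/(eps*A_k*tau^2)^2 * sum_j 2^(-j); the geometric sum equals 2."""
    return 2.0 * 64.0 * C / (eps * A_k * tau ** 2) ** 2


def tau_threshold(eps, delta, A_k, C):
    """Value of tau at which tail_bound equals delta; any larger tau gives a bound below delta."""
    return (128.0 * C / (delta * (eps * A_k) ** 2)) ** 0.25


# Illustrative values only (C and A_k are not taken from the paper).
eps, delta, A_k, C = 0.1, 0.05, -0.5, 1.0
tau0 = tau_threshold(eps, delta, A_k, C)
print(tau0, tail_bound(1.1 * tau0, eps, A_k, C))
```

Because $A_{k}$ enters only through its square, the sign of $A_{k}$ does not affect the threshold.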

Finally we establish the tail behaviour of the components of the limit process $\tilde{v}$ , i.e. we prove that $\tilde{v}_{k}$ satisfies Assumption A5 in [4].

Lemma 2

Suppose that Assumptions 1, 2 and 3 hold and that $\beta_{k}$ is differentiable with $\beta_{k}^{\prime}<0$ in a neighbourhood around $t_{0}$ , for a fixed $k=0,1,\ldots,p$ . Then the component $\tilde{v}_{k}$ of the limit process $\tilde{v}$ satisfies Assumption A5 in [4], i.e. for every $\epsilon,\delta>0$ there is a finite $\tau=\tau(\epsilon,\delta)$ such that

\displaystyle\mathbb{P}\left(\sup_{|s|\geq\tau}\frac{|\tilde{v}_{k}(s)|}{s^{2}}>\epsilon\right)

\displaystyle<

\displaystyle\delta.

Proof. The proof is a straight-forward adaptation of the methods in the proof of Lemma 1, with the use of Doob’s and Chebyshev’s inequalities and the Ito isometry. $\Box$

We are next able to state a limit distribution result for the order restricted estimator $\tilde{B}_{k}$ , defined in $(\ref{eq:cumulative-isotonic-regression})$ , of the cumulative function $B_{k}$ .

Theorem 2

Suppose that Assumptions 1, 2 and 3 hold and that $\beta_{k}$ is differentiable with $\beta_{k}^{\prime}<0$ in a neighbourhood around $t_{0}$ . Then

\displaystyle n^{2/3}c(t_{0})(\tilde{B}_{k}(t_{0})-B_{k}(t_{0}))

\displaystyle\stackrel{{\scriptstyle d}}{{\to}}

\displaystyle S(-s^{2}+w(s))(0),

as $n\to\infty$ , where

\displaystyle c(t_{0})

\displaystyle=

\displaystyle 2^{-1/3}|\beta_{k}^{\prime}(t_{0})|^{1/3}(\sigma_{k}^{2})^{-2/3},

and $w$ is a standard two-sided Brownian motion.

Proof. Since we have established that Assumption A1-A5 in [4] hold, we have that

\displaystyle n^{2/3}[\tilde{B}_{k}(t_{0})-B_{k}(t_{0})]

\displaystyle\stackrel{{\scriptstyle d}}{{\to}}

\displaystyle S(A_{k}s^{2}+\tilde{v}_{k}(s))(0),

as $n\to\infty$ , as a consequence of Theorem 1 in [4].

Furthermore, since $\tilde{v}_{k}$ is a two-sided Brownian motion, with $\tilde{v}_{k}(0)=0$ , and with covariance $Cov(\tilde{v}_{k}(s),\tilde{v}_{k}(s^{\prime}))=\sigma_{k}^{2}\min(s,s^{\prime})$ , we have by the self similarity properties of Brownian motion that $\tilde{v}_{k}(s)\stackrel{{\scriptstyle d}}{{=}}(\sigma_{k}^{2})^{1/2}\,w(s)$ , with $w$ a standard (two-sided) Brownian motion. In fact, we can simplify the expression for the limit distribution further, by the change of variable $s=\gamma u$ , to obtain

$\displaystyle A_{k}s^{2}+\tilde{v}_{k}(s)$	$\displaystyle=$	$\displaystyle A_{k}\gamma^{2}u^{2}+\tilde{v}_{k}(\gamma u)$
	$\displaystyle\stackrel{{\scriptstyle d}}{{=}}$	$\displaystyle A_{k}\gamma^{2}u^{2}+\gamma^{1/2}(\sigma_{k}^{2})^{1/2}\,w(u)$
	$\displaystyle=$	$\displaystyle(\sigma_{k}^{2})^{2/3}(-A_{k})^{-1/3}[-u^{2}+w(u)]$

where the second equality (in distribution) follows by the self similarity of Brownian motion, and the third by choosing $\gamma>0$ so that $-A_{k}\gamma^{2}=\gamma^{1/2}(\sigma_{k}^{2})^{1/2}$ , i.e. with $\gamma=(\sigma_{k}^{2})^{1/3}(-A_{k})^{-2/3}$ .

Next, we use that $S(cg(u))=cS(g(u))$ for any function $g$ and any constant $c>0$ , by properties of the least concave majorant $S$ , cf. e.g. Lemma A1 in [4] (noting the typo in formula (74) in [4]; the constant $a$ must be positive), to establish that

\displaystyle S(A_{k}s^{2}+\tilde{v}_{k}(s))

\displaystyle\stackrel{{\scriptstyle d}}{{=}}

\displaystyle(\sigma_{k}^{2})^{2/3}(-A_{k})^{-1/3}S(-s^{2}+w(s)).

Finally, noting that $-A_{k}=|\beta_{k}^{\prime}(t_{0})|/2$ proves the formula for $c(t_{0})$ , and ends the proof of the theorem.

$\Box$
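The rescaling in the proof can be verified numerically. The sketch below takes $\gamma$ to be the positive solution of $-A_{k}\gamma^{2}=\gamma^{1/2}(\sigma_{k}^{2})^{1/2}$ , which is what a real-valued $\gamma^{1/2}$ requires, and checks that the drift and noise coefficients coincide with $(\sigma_{k}^{2})^{2/3}(-A_{k})^{-1/3}$ and that the reciprocal of this common value is $c(t_{0})$ of Theorem 2. The function name and the numerical values of $A_{k}$ and $\sigma_{k}^{2}$ are hypothetical.

```python
def scaling_constants(A_k, sigma2):
    """Return (gamma, K): gamma > 0 solves -A_k*gamma^2 = sqrt(gamma*sigma2), K is the common value."""
    gamma = sigma2 ** (1.0 / 3.0) * (-A_k) ** (-2.0 / 3.0)
    K = sigma2 ** (2.0 / 3.0) * (-A_k) ** (-1.0 / 3.0)
    return gamma, K


# Hypothetical values with A_k < 0 and sigma_k^2 > 0.
A_k, sigma2 = -0.75, 2.0
gamma, K = scaling_constants(A_k, sigma2)

drift_coeff = -A_k * gamma ** 2        # coefficient of -u^2 after substituting s = gamma*u
noise_coeff = (gamma * sigma2) ** 0.5  # coefficient of w(u) after the same substitution
beta_prime_abs = 2.0 * (-A_k)          # from -A_k = |beta_k'(t0)|/2
c_t0 = 2.0 ** (-1.0 / 3.0) * beta_prime_abs ** (1.0 / 3.0) * sigma2 ** (-2.0 / 3.0)

print(drift_coeff, noise_coeff, K, 1.0 / c_t0)  # all four numbers should agree
```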

In order to state the final limit distribution, for the solution $\tilde{\beta}_{k}$ , we need to study the limit process $y(s)=-s^{2}+w(s)$ , and show that it satisfies the assumptions of Proposition 2 in [4], with the appropriate analog statements for the least concave majorant, and thus that Assumption A6 in [4] for $y(s)$ holds. However, this has in fact been already established for the process $y(s)$ in [4]. Thus we have the following theorem.

Theorem 3

Suppose that Assumptions 1, 2 and 3 hold and that $\beta_{k}$ is differentiable with $\beta_{k}^{\prime}<0$ in a neighbourhood around $t_{0}$ . Then

\displaystyle n^{1/3}c(t_{0})(\tilde{\beta}_{k}(t_{0})-\beta_{k}(t_{0}))

\displaystyle\stackrel{{\scriptstyle d}}{{\to}}

\displaystyle S(-s^{2}+w(s))^{\prime}(0),

as $n\to\infty$ , where

\displaystyle c(t_{0})

\displaystyle=

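\displaystyle 2^{1/3}|\beta_{k}^{\prime}(t_{0})|^{-1/3}(\sigma_{k}^{2})^{-1/3},

and $w$ is a standard two-sided Brownian motion.

Proof. Since we have established that Assumptions A1-A6 in [4] hold, from Theorem 2 in [4] it follows that

\displaystyle n^{1/3}[\tilde{\beta}_{k}(t_{0})-\beta_{k}(t_{0})]

\displaystyle\stackrel{{\scriptstyle d}}{{\to}}

\displaystyle S(A_{k}s^{2}+\tilde{v}_{k}(s))^{\prime}(0),

as $n\to\infty$ . Rescaling with $s=\gamma u$ , $\gamma=(\sigma_{k}^{2})^{1/3}(-A_{k})^{-2/3}$ , and using the self similarity of Brownian motion as in the proof of Theorem 2, and noting that differentiation with respect to $s$ introduces the additional factor $\gamma^{-1}$ , we get $S(A_{k}s^{2}+\tilde{v}_{k}(s))^{\prime}(0)\stackrel{{\scriptstyle d}}{{=}}(\sigma_{k}^{2})^{1/3}(-A_{k})^{1/3}S(-s^{2}+w(s))^{\prime}(0)$ , which together with $-A_{k}=|\beta_{k}^{\prime}(t_{0})|/2$ shows the statement of the theorem. $\Box$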

Note 2

The limit distribution $S(-s^{2}+w(s))^{\prime}(0)$ is a version of the Chernoff distribution $\mathrm{argmax}_{s\in{\mathbb R}}(-s^{2}+w(s))$ , that arises in many cases of nonparametric order restricted inference.
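To make the limit functional concrete, the following sketch simulates $y(s)=-s^{2}+w(s)$ on a finite grid, computes its least concave majorant as the upper convex hull of the grid points, and records the slope of the majorant at $0$ together with the location of the maximum of $y$ . It is only an approximation under stated assumptions: the real line is truncated to $[-c,c]$ , the Brownian motion is discretised, the right-hand slope is used as the derivative at $0$ , and all function names are hypothetical.

```python
import random


def least_concave_majorant(xs, ys):
    """Vertices of the least concave majorant of the points (xs[i], ys[i]), xs strictly increasing."""
    hull = []
    for p in zip(xs, ys):
        # Pop the last vertex while it lies on or below the chord to the new point.
        while len(hull) >= 2:
            (ax, ay), (bx, by) = hull[-2], hull[-1]
            cross = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def right_slope(hull, x0):
    """Slope of the majorant on the hull segment [x_i, x_{i+1}) containing x0."""
    for (ax, ay), (bx, by) in zip(hull, hull[1:]):
        if ax <= x0 < bx:
            return (by - ay) / (bx - ax)
    raise ValueError("x0 outside the range of the hull")


def simulate_once(rng, c=3.0, m=300):
    """One draw of (slope of S(y) at 0, argmax of y) for y(s) = -s^2 + w(s) on a grid of [-c, c]."""
    h = c / m
    w = [0.0] * (2 * m + 1)        # index m corresponds to s = 0, where w(0) = 0
    for i in range(1, m + 1):      # independent Gaussian increments on each side of 0
        w[m + i] = w[m + i - 1] + rng.gauss(0.0, h ** 0.5)
        w[m - i] = w[m - i + 1] + rng.gauss(0.0, h ** 0.5)
    xs = [(i - m) * h for i in range(2 * m + 1)]
    ys = [-x * x + wi for x, wi in zip(xs, w)]
    hull = least_concave_majorant(xs, ys)
    argmax = xs[max(range(len(ys)), key=ys.__getitem__)]
    return right_slope(hull, 0.0), argmax


rng = random.Random(0)
draws = [simulate_once(rng) for _ in range(200)]
print(draws[:3])
```

The empirical distribution of the simulated slopes can then be compared with that of the simulated argmax values; the grid size `m` and the truncation `c` control the quality of the approximation.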

5 Discussion

In this paper we have derived limit distributions for the coordinate-wise least squares projection of a naive estimator on the space of decreasing functions. The results are derived using the general approach presented in [4], and the main work in this paper has been to establish the conditions required in [4] for the conclusions of that paper to hold. This gives our two main results, Theorems 2 and 3. The conditions under which we establish these results are those required in [1] for the derivation of limit distributions of the starting estimator $\hat{B}$ ; thus we do not demand more than is demanded in [1].

One of the main vehicles for this is our Theorem 1, which derives the limit distribution for the rescaled process $\tilde{v}_{n}$ . We note that the result in Theorem 1 is in fact stronger than necessary for our needs, and that we only need its consequence, Corollary 1.

6 Acknowledgments

The research of DA is partially supported by the Swedish Research Council (SRC). DA gratefully acknowledges the SRC’s support.

References

[1] Per Kragh Andersen, Ørnulf Borgan, Richard D. Gill and Niels Keiding (1993). Statistical Models Based on Counting Processes. Springer Series in Statistics.
[2] Robertson, T., Wright, F. T. and Dykstra R. L. (1988). Order restricted statistical inference. John Wiley & Sons, Ltd., Chichester.
[3] van der Vaart, A.W. (1998). Asymptotic Statistics. Cambridge University Press, New York.
[4] Anevski, D. and Hössjer, O. (2006). A general asymptotic scheme for inference under order restrictions. Annals of Statistics, 34(4): 1874-1930.
[5] Yijian Huang (2017). Restoration of monotonicity respecting in dynamic regression. Journal of the American Statistical Association, 112(518): 613-622.
[6] Yunro Chung, Anastasia Ivanova and Jason P. Fine (2024). Shape restricted additive hazards models: Monotone, unimodal, and U-shaped hazard functions. Statistics in Medicine, 43: 1671-1687.
