1 Introduction
In this paper we suggest estimators for the parameter functions in an Aalen additive hasard model, in a survival analysis setting, under the assumption of right-censored date and independent censoring.
Assuming that the interesting time to event is a continuous positive random variable, with hasard cumulative distribution function and hasard function , a possible model for , incorporating covariates, is the Aalen additive hasard model
|
|
|
|
|
(1) |
where one supposes that are (unknown) functions, and are given covariates. If the the parameter vector of function is completely unspecified, the common approach to estimating the vector of functions is to first realise that it is not possible to provide a nonparametric estimator of it directly. Instead one estimates the vector of integrated functions where , for each , cf. [1].
The disadvantage with providing estimators of instead of is that gives the total effect of covariate summed (i.e. integrated) for all times , whereas gives the effect of the covariate at the time . Clearly is more informative and is potentially more interesting e.g. for the clinical doctor that is interested in describing the effect of covariate values (e.g. LDL cholesterol) on the conditional probability of experiencing the interesting event (e.g. heart attack) at time , conditional on not having experienced it before time , by the interpretation of the hasard as
|
|
|
|
|
One possibility is to use kernel estimators, to get an estimator for from the estimator of . However, kernel estimators are somewhat ad-doc, and in particular necessitates the choice of a bandwidth.
We suggest in this paper an approach that provide an order-restricted nonparametric estimator of . One advantage with this estimator is that it is data-adaptive in that it uses an implicit bandwidth given by the data. We are furthermore able to provide limit distributions for the suggested estimator. The limit distribution is the Chernoff distribution, which is commonly featured in order-restricted nonparametric inference.
There are some previous results on this problem, [5] uses a method to monotonise the basic estimator in the Aalen model that is different from ours and show’s that his estimator is asymptotically equivalent to the standard estimator, [6] uses a slightly different additive hasard model, which does not seem to include the Aalen model, proposes an order restricted least square estimator and treat mainly computational issues.
Our estimator is to our knowledge new, and our limit distribution results are to our knowledge as well novel.
The paper is organised as follows. In Section 2 we introduce the probabilistic model for the data, as well as the inference problem that we will treat. The model gives rise to a system of stochastic differential equation, and we review the common and well known least squares solution estimator of the integrals of the unknown parameter functions . The least squares solution will serve as a starting estimator for the order restricted estimator that we present next.
Then, in Section 3 we present the component-wise least squares projection of the naive estimator arising from the starting estimator presented in Section 2, on the space of decreasing functions. These can be written as the derivative of the function , where is the least concave majorant map.
Next, in Section 4, we derive the main results of this paper, which are the limit distributions of the estimators. We start by writing as a sum of the unknown and a stochastic process . We furthermore rescale and localise the estimator which gives rise to a rescaled deterministic term and a rescaled stochastic term . In Theorem 1 we derive the process limit distribution of the dimensional rescaled process to a Gaussian stochastic process with a certain covariance structure. In Corollary 1 we state the resulting component-wise limit distributions for the individual processes , for every .
Next, in Lemma 1 we prove a result on a bound on the tail of the process that ensures that when applying the least concave majorant map on the process , the tail behaviour of that process will not affect the application of the map around the origin. In Lemma 2 we state the analog bound for the tail of the limit process that ensures the same thing for the limit process , where is fact is proportional to the uniform limit of .
Then in Theorem 2 we state one of the two main results of this paper, namely that the integral of the proposed estimators converges to a limit random variable, as
|
|
|
|
|
with a two-sided Brownian motion. The constant is specified in Theorem 2.
Next, in Theorem 3, we state the second main result of the paper, namely that the proposed estimator converges to a limit random variable, as
|
|
|
|
|
We note that the rate is , which is common in nonparametric order restricted inference, and that the limit distribution is (proportional to) the Chernoff distribution, which is common in nonparametric order restricted inference.
Finally in Section 5 we discuss the derived results.
2 The survival analysis model setting
Let be a positive continuous random variable with an unknown distribution function . We assume that models the time to an event. We assume no left-truncation for the data, and that we have the standard right-censoring, i.e. that we observe the minimum of the time and a censoring time , together with an indicator for the time being exact, .
Introduce the individual counting processes for which one has the stochastic differential equation
|
|
|
|
|
(2) |
for , where is the individual hasard function, which exists when the distribution of is absolutely continuous, and is then given by , where is the left-continuous indicator process for the individual being at risk at time , and is the individual martingale, i.e. satisfying , and where the sigma algebras is a filtration storing the information available at times .
The -algebra generated by the information depends on the amount and type of information about the observed times that is available and the amount of information about the covariates that is available. Starting with the model satisfying one needs to establish that where is the -algebra generated by the observed and available information at time . The concept of noninformative and independent censoring as well as the innovation theorem are used to establish this link. We do not make this assumptions explicit here, since they are of no relevance to us, and refer to reader to a standard reference such as [1]. We will in the sequel assume that is a filtration, containing the information available at time , and that holds so that .
A basic inference problem in survival analysis is assessing the effect of group indicators or continuous covariate measurements on the distribution of the time to an event. It is then necessary to assume that and to model for the distribution function depending on covariates . This can be done using various models. One standard model is the Aalen additive hasard model
|
|
|
|
|
(3) |
where are (unknown) functions. Thus is the unknown parameter vector of functions and is the parameter space, where is the set of integrable functions on .
The value of describes the time-instanteneous effect of covariate on the hasard function . The standard approach for estimating the ’s is to first acknowledge that they are not possible to estimate directly. Rather one estimates their integrals . In fact, one can write for the Aalen model as
|
|
|
|
|
(4) |
for , where is the individual at-risk process and is a continuous time martingale, for individual . The equations can be written on the matrix formulation
|
|
|
|
|
where is the vector of unknown functions, and is a matrix, with the ’th row of being
.
If is the (predictable) indicator that has full rank, and
|
|
|
|
|
is a (generalised) inverse, then can be estimated by Aalen’s ordinary least squares solution
|
|
|
|
|
(5) |
The interpretation of a value of the integral is less intuitive than the interpretation of the value of , since is the instantaneous effect, at time , of the covariate of the total hasard at time . Thus one would really like to get an estimate of , and this is not possible to do directly. One possibility for estimation of itself would be to do kernel smoothing, with the drawback that this is a slightly ad hoc method, with a bandwidth the user has to specify. Thus it is not automated, or data adaptive.
An alternative for estimation of , which does not necessitate bandwidth choices, and is data adaptive, is to use estimation under some nonparametric restrictions. In this paper we suggest to estimate under the assumption that each is a nonincreasing function, i.e. a function that is (not necessarily strictly) decreasing. This is an order restricted inference problem, and a nonparametric such.
4 The limit distribution results for the estimators
We first see that we can write that the -dimensional vector of estimators as
|
|
|
|
|
where is the (predictable) indicator that has full rank, where
|
|
|
|
|
is a (generalised) inverse, and where is an -vector of locally square-integrable martingales.
We center the estimator at , and define the process part of the estimator as
|
|
|
|
|
to get
|
|
|
|
|
and note that this is a slight adaptation from the partition/centering used in [4]. In fact, we have written the preliminary estimator as a sum of a deterministic part and a stochastic part . The final order restricted estimator is obtained as a coordinate-wise isotonic regression of the increments of the preliminary estimator , as defined in .
Therefore the local rescaling should be defined coordinate-wise, and it is in fact enough to study the coordinate-wise partition
|
|
|
|
|
and the component-wise rescaling
|
|
|
|
|
and to establish limit properties for the rescaled , for the results that we will develop here. In particular we want to establish local limit distribution results as well as certain truncation properties for the rescaled process . We are however able to establish the limit distributions for the full vector valued rescaled process, and since that result may be of independent interest, we will state this result. The coordinate wise property will then be a corollary of that result.
We will make frequent referencing to the Cramér-Wold device for processes, that , a -dimensional stochastic process converging in distribution to a -dimensional Gaussian process, is equivalent to the weak convergence of the one-dimensional process to , for every choice of of .
In fact we are going to adapt the proof of Theorem VII of [1] which states that the -dimensional process converges to a Gaussian process, say , with a certain covariance structure, to our settings. This implies, by the Cramér-Wold device and since a Gaussian process is determined by it’s expectation and covariance function, that the ’th coordinate will converge to a Gaussian process with a covariance structure determined from the covariance structure of the full process . One could therefeore rescale the ’th coordinate process and establish limit distributions for that process. As already mentioned, we will instead rescale the full process and invoke the Cramér-Wold device subsequentaly.
Thus let us define the full rescaled process part
|
|
|
For the local limit distribution results we will adapt the proof of Theorem VII.4.1 of [1] to our settings, and we will establish the limit distribution result under the same assumptions as those in Theorem VII.4.1 Thus we define
for , the functions
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Let be arbitrary, so that is an arbitrary compact set.
Assumption 1
For all , there exist continuous functions
such that as :
|
|
|
|
|
|
|
|
|
|
|
|
Assumption 2
For all ,
|
|
|
Assumption 3
For all , the matrix is nonsingular.
Theorem 1
Suppose that Assumptions 1- 3 hold. Then
|
|
|
|
|
on , as , where is mean zero Gaussian process with covariance structure
|
|
|
|
|
where
|
|
|
|
|
Proof. Defining the matrix
|
|
|
|
|
|
|
|
|
|
the second statement of Assumption 1 implies
|
|
|
|
|
where is defined in Assumption 3, and with denoting the euclidian (matrix) norm on . By Assumption 3 the matrix is invertible, and the inverse is well defined when , and for those converges to by the continuous mapping theorem, since matrix inversion is a continuous map (under the supnorm matrix metric). Thus, since , we may partition the process part as
|
|
|
|
|
(9) |
|
|
|
|
|
Recall the definition of the rescaled process
|
|
|
for , and note that it entails that is the sum of the three integrals in with the integrals going from to , and all multiplied by . We may now use a change of variables inside the integrals,
so for and fixed we let vary and
and thus so that we obtain
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
We will now treat the three terms above separately, and show that the first and last vanish asymptotically, while the second gives rise to the asymptotic distribution.
vanishes asymptotically.
If we denote write the ’th component of the first term as , we get
|
|
|
|
|
|
|
|
|
|
for arbitrary but fixed .
Therefore the predictable variation process becomes
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
since , and since for . From the above we see that the local rescaling rate is determined by the condition
|
|
|
|
|
and thus we must have .
By Assumption 1 we have that
|
|
|
|
|
and by Assumption 3, is nonsingular. Then, since on the set of points where we have that exists, and since the matrix inverse map is a continuous map (under the supnorm metric), then by the continuous mapping theorem, for fixed and for every ,
|
|
|
|
|
Therefore, there are random variables such that
|
|
|
|
|
|
|
|
|
|
and such that , for all .
Thus, for ,
|
|
|
|
|
From the second line of Assumption 1, we have
|
|
|
and thus
|
|
|
Finally by Lenglart’s inequality, for all ,
|
|
|
|
|
which shows that that converges in probability to zero, uniformly on , as claimed.
gives the limiting behaviour.
We derive the asymptotic normality of by first establishing the limit in probability of the predictable covariation processes of and then by checking the Lindeberg condition for .
Let denote the ’th component of , for . Thus
|
|
|
|
|
(10) |
We first establish the asymptotic limit of the quadratic covariation processes. For , the predictable quadratic covariation between components and is
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Since
|
|
|
this becomes
|
|
|
|
|
|
|
|
|
|
Since, by Assumption 2,
|
|
|
|
|
we obtain
|
|
|
|
|
The result also shows that the asymptotic covariance of and is given by the right hand side of the above expression. Similarly, if we let denote the process obtained as the limit in distribution of , using the conditional indepedence of the increments of the martingale difference sequence, we can show that for ,
|
|
|
|
|
|
(11) |
We next verify the Lindeberg condition for , for establishing the asymptotic normality. We have, with the choice , and the th component, defined in ,
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
where the last inequality follows by expanding the square and the triangle inequality.
Under Assumption 1, are continuous functions on the compact , and thus they are bounded.
Furthermore, from Chebyshev’s inequality we get,
|
|
|
|
|
|
|
|
|
|
Thus, our proof is accomplished due to the uniform convergence in probability to zero of the function, in Assumption 2.
: The term is asymptotically negligible.
From the definition of , we see that
|
|
|
|
|
Recall that the process is the indicator that has full rank.
Define the set
|
|
|
We have established that is nonsingular, and hence is invertible on ,
and therefore for all .
Thus, with the choise ,
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
where the inequality follows since the first term vanishes since on . Consequently,
|
|
|
|
|
as , i.e. is asymptotically negligible.
The coordinate-wise result now follows by the Cramér-Wold device. Recall that
|
|
|
|
|
is the coordinate-wise rescaled process, for .
Corollary 1
Suppose that Assumptions 1- 3 hold. Then, for any ,
|
|
|
|
|
on , as , where is mean zero Gaussian process with covariance structure
|
|
|
|
|
where
|
|
|
|
|
Proof. The result follows from the previous theorem and the Cramér-Wold device with choice of coefficients , and for .
The corollary thus establishes Assumption A1 of [4] for .
Note 1
We note that the limit process can be identified with a (two-sided) Brownian motion with covariance structure , with defined above.
Next we define ’th coordinate of the rescaled deterministic part
|
|
|
|
|
|
|
|
|
|
We will first show that satisfies Assumption A2 of [4], i.e. that for every finite there is an such that
|
|
|
|
|
(12) |
as . But it is elementary so see that this holds if is differentiable with in a neighbourhood around , with .
Next we want to establish Proposition 1 of [4], from which Assumptions A3 and A4 of that paper will follow.
Lemma 1
Suppose that Assumptions 1, 2 and 3 hold and that is differentiable with in a neighbourhood around . Then Proposition 1 in [4] holds for and , i.e. they satisfy
, , such that
|
|
|
|
|
Proof. We show the result by first bounding , and then using that bound to prove, via Doob’s and Chebyshev’s inequalities and the Ito isometry and properties of , the full result.
Bounding : We have shown that satisfies Assumption A2 in [4], i.e. that holds. Then in particular, , and for , we get
|
|
|
|
|
Since and is concave, for some finite , we have that
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
for all and all . Thus we have established that for all and all ,
|
|
|
(13) |
Bounding : The proof is similar to the corresponding result for the rescaled process in [4] in the cases of rescaled partial sum processes and empirical process, for which one partitioned into intervals, exhibited bounds of the process at the boundaries of those intervals, and used a modulus of continuity for the processes on the intervals. We, however, will use the fact that we have martingales to our advantage by using Doob’s maximal inequality to bound the maximum over an interval by the values at its endpoints, and then the Ito isometry.
Thus we partition the tail set into dyadic intervals, by
|
|
|
|
|
with , for Then for , and using the bound , we get
|
|
|
|
|
Therefore
|
|
|
|
|
(14) |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
where the last inequality follows since on we have .
We now bound the individual terms in the above sum, by
|
|
|
|
|
(15) |
|
|
|
|
|
where the first inequality follows by Chebyshev’s inequality and the second by Doob’s maximal inequality.
By the Ito isometry
|
|
|
which is bounded, by say, since is bounded in probability and is a square-integrable martingale.
Thus, from and , we get
|
|
|
|
|
|
|
|
|
|
where the last inequality follows by choosing large enough for fixed , and .
Finally we establish the tail behaviour of the components of the limit process , i.e. we prove that satisfies Assumption A5 in [4].
Lemma 2
Suppose that Assumptions 1, 2 and 3 hold and that is differentiable with in a neighbourhood around , for a fixed . Then the component of the limit process satisfies Assumption A5 in [4], i.e. for every
|
|
|
Proof. The proof is a straight-forward adaptation of the methods in the proof of Lemma 1, with the use of Doob’s and Chebyshev’s inequalities and the Ito isometry.
We are next able to state a limit distribution result for the order restricted estimator , defined in , of the cumulative function .
Theorem 2
Suppose that Assumptions 1, 2 and 3 hold and that is differentiable with in a neighbourhood around . Then
|
|
|
|
|
as , where
|
|
|
|
|
and is a standard two-sided Brownian motion.
Proof. Since we have established that Assumption A1-A5 in [4] hold, we have that
|
|
|
|
|
as , as a consequence of Theorem 1 in [4].
Furthermore, since is a two-sided Brownian motion, with , and with covariance , we have by the self similarity properties of Brownian motion that , with a standard (two-sided) Brownian motion. In fact, we can simplify the expression for limit distribution further, by the change of variable , to obtain
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
where the second equality follows by the self similarity of Brownian motion, and the third by choosing so that , i.e. with .
Finally, we use that for any function and any constant , by properties of the least concave majorant , cf. e.g. Lemma A1 in [4] (noting the typo in formula (74) in [4]; the constant must be positive), to establish that
|
|
|
|
|
Finally, noting that , proves the formula for , and ends the proof of the theorem.
In order to state the final limit distribution, for the solution , we need to study the limit process , and show that it satisfies the assumptions of Proposition 2 in [4], with the appropriate analog statements for the least concave majorant, and thus that Assumption A6 in [4] for holds. However, this has in fact been already established for the process in [4]. Thus we have the following theorem.
Theorem 3
Suppose that Assumptions 1, 2 and 3 hold and that is differentiable with in a neighbourhood around . Then
|
|
|
|
|
as , where
|
|
|
|
|
and is a standard two-sided Brownian motion.
Proof. Since we have established that Assumptions A1-A6 in [4] hold, from Theorem 2 in [4] it follows that
|
|
|
|
|
as . Rescaling and use of self similarity for the Brownian motion as in the proof of Theorem 2, shows the statement of the theorem.
Note 2
The limit distribution is a version of the Chernoff distribution , that arises in many cases of nonparametric order restricted inference.