Alebachew Abebe*
Department of Statistics, College of Computing and Informatics, Haramaya University, P.O.Box: 138, Dire Dawa, Ethiopia
^{*}Corresponding author: Alebachew Abebe, Department of Statistics, College of Computing and Informatics, Haramaya University, P.O.Box: 138, Dire Dawa, Ethiopia; Email: [email protected]
Received Date: May 6, 2024
Publication Date: July 17, 2024
Citation: Abebe A. (2024). Modeling Multivariate Longitudinal Factors on HIV Patients Cell Counts at Gonder University Specialized Hospital, Ethiopia. Mathews J Surgery. 7(2):32.
Copyright: Abebe A. © (2024)
ABSTRACT
Modeling of multivariate longitudinal data provides a unique opportunity in studying the joint evolution of multiple response variables over time. This study was modeling of multivariate longitudinal data cell counts on HIV infections at Gondar districts were used two different multivariate repeated measurement with a kronecker product covariance and random coefficient mixed models. The study was based on data from 566 per four visits HIV infections were enrolled in the first 4 visits of the 5year secondary data with retrospective study design. Both models of HIV infections cell counts values of CD4 and CD8 increase over time while hemoglobin decreases over time. Those models results reveals that a strong positive correlation between CD4 and CD8 cells, but the correlation between CD4 and hemoglobin as well as the correlation between CD8 and hemoglobin are not statistically significant. Consequently, the study suggests that concerned bodies should focus on awareness creation to increase cell counts of CD4 and CD8 over time while hemoglobin decrease over time for HIV infections at Gondar districts, Ethiopia.
Keywords: HIV Infections, Cell Counts, Multivariate Longitudinal Data, Kronecker Product Covariance, Random Coefficients
INTRODUCTION
Longitudinal data consist of repeated measurements that belong to same subjects/units. This type of data might arise in many research ﬁelds such as medical sciences [14]. The repeated measurements are typically dependent and these dependencies (associations) must be taken into account to have valid statistical inferences [5].
Longitudinal data are often collected on multiple responses. In such a case, in addition to the withinsubject association, two additional association structures, multivariate response association at a speciﬁc time point and crosstemporal response association, occur. The multivariate responses could either belong to the same response family or different response families. In literature, there are two common approaches to handle multivariate longitudinal data rather than jointly analyzing them. First one is to ignore the multivariate response association and construct univariate models [6] or carry out univariate cluster analyses [7,8]. The second one is to consider one of the responses as the dependent variable and the other(s) as independent one(s) together with the other independent variables and to construct a univariate model [9].
From the clinical point of view, what is required for the present model is that an essential component of the immune system is the Lymphocytes which destroy invaders. Lymphocytes are of two types Bcells and Tcells, being Bcells antibody factories producing antibodies as fast as they can and also clone themselves, whereas Tcells either direct the activity of Bcells (called CD4+Tcells) or act as suppressors (called CD8+Tcells) destroying infected cells and thus damp out the activity of the immune system. The AIDS has three stages, one including the initial infection, a second one of latency and the third one corresponding to a runaway destruction of the immune system. During the latency lapse healthy Tcells are infected although its number remains high. When the concentration of Tcells decreases and that of HIV virus cells increases, it is the AIDS stage. In practice, as an example, cytometry is a procedure useful to count the concentration of healthy, latently infected and infected Tcells in the blood stream, for instance using the dispersion of laser light, which depends on the enzymes covering the cells [23]. Knowing the concentration of CD4+Tcells it is possible to wonder about modeling the evolution of the different cell populations. Among the various models trying to describe the dynamics of CD4+Tcells (see for instance [24, 25]), a simple one describing the phases of latency and the destruction of the immune system is Perelson’s model [26, 27, 28].
A total of 566 per four visits (2264 individuals) were enrolled in HIV infections for this study and followed up to 5years with in 4 repetitions. In this study, we only considered modeling of multivariate longitudinal response variables for HIV infections have three responses. These are CD4 counts (CD4), CD8 counts (CD8) and Hemoglobin levels (HMG). Our primary interest was to assess the interrelationships among these multivariate responses. For an easy notation, throughout this study we let X, Y, and Z represents for CD4, CD8 and HMG respectively. We note that, as ubiquitous in longitudinal studies, not all multivariate responses are measured at all occasions in HIV infections data. In spite of their practicability, these approaches do not yield valid statistical inferences. In this study, we consider the accommodation of the aforementioned three association structures with a multivariate modeling approach for multiple responses belonging to the same response family.
However, the analysis of such a modeling of multivariate longitudinal data can be challenging because (i) the variances of errors are likely to be different for different subjects, (ii) the errors are likely to be correlated for the same subjects measured at different occasions, and (iii) the errors are also likely to be correlated among subjects measured at the same time. In the SAS software, 2 different approaches have been provided to analyze modeling of multivariate longitudinal data: modeling of multivariate repeated measurement models with a kronecker product covariance structure [15] and random coefficient mixed models [12].
In this study, we first present the HIV infections data that motivates our study. Then we illustrate and compare these 2 approaches in studying the joint evolution of the modeling of multivariate longitudinal data for HIV infections at an early stage of HIV infections data. The objectives of this study was to explore the joint evolution of CD4 counts, CD8 counts and hemoglobin levels of HIV infections and to compare modeling of multivariate repeated measurement models (kronecker product covariance and random effects mixed model) using HIV infections data of North Gondar, Ethiopia. In this study, model development procedures were Akaike Information Criteria (AIC), Bayesian Information Criteria (BIC) and Likelihood Ratio Test (2LnL).
MATERIALS and METHODS
Study Area and Data Descriptions
This study was based on data from 566 per four visits (2264 individuals) persons with HIV infections were enrolled in the first four visits of the five year multivariate longitudinal study of HIV infections using the data of North Gondar, Ethiopia. The study design was retrospective on multivariate longitudinal data setting that go back in time to compare multivariate repeated measurement models with a kronecker product covariance structure and random mixed effects model and to explore the joint evolution of CD4, CD8 counts and hemoglobin levels of HIV infections. The secondary data were collected from different part of North Gondar districts that measured repeatedly in four visits between the years on May 30, 2013 to May 30, 2017. Each repeated measures were conducted on the average six month interval in the study period. In this study all multivariate longitudinal data response variables were time variant and independents variables were time variant and invariant.
Eligible Criteria
Outstanding to some natural and unknown reasons few person fail to continue up to the end of the study period. This study was considered both inclusion and exclusion criteria.
Inclusion Criteria
Throughout the data collection, we were selected person who have HIV infections that could have four repeatedly measured over time data were included in the study period. That is, the repeatedly measured data with four followups (visits) person with HIV infections were included in the study.
Exclusion Criteria
HIV infections who dropout (withdraw), intermittent missing values and loss to followup within the study period was excluded from the study. Hence, person who may have died from other cause, refuse and others who do not participate at the previous measurement (attrition) time of the study period was excluded from the study. Although, the time period before May 30, 2013 and after May 30, 2017 HIV infections would been excluded from the study.
Variables Considered in the Study
HIV infections were considered as multivariate longitudinal data act as the response variables. That variable has three categories were CD4 counts, CD8 counts and hemoglobin levels even if they were used as a continuous variable to maximize the amount of information available in the data set.
The independent variables such as cite of HIV infections (Chilga, Debark, Gondar and Metema), ART number, visits (1^{st}, 2^{nd}, 3^{rd} & 4^{th}), and value of HIV infections and time of HIV infections were included in the study. The first two variables were used for descriptive statistics and the remaininng variables were applied for inferential statistics (modeling part).
Methods of Statistical Analysis
Multivariate longitudinal data is a special case of repeatedly measured data, the observations are not independent and are characterized as having both betweensubject and withinsubject variation, time dependent covariates and missing data [10]. Furthermore, multivariate longitudinal data response variables have become increasingly popular, more accessible and good in missing data handling through statistical software such as SAS Virson9.4 [13].
Exploring Data Analysis
Exploring the Correlation Structure
Each covariance model has a corresponding correlation model. Even though there are so many models for covariance structure. There are 4 types of covariance models were considered and compared in this study [29].
Compound Symmetry (CS)
Suppose that a parameter representing the common correlation for any two time points, then the correlation here, where the single correlation parameter is generally referred to as the intraclass correlation coefficient.
Let us assume that, with m repeated measures.
Autoregressive Order One (AR(1))
Autoregressive models express the current observation as a linear function of previous observations plus a homoscedastic noise term, α_{t} , centered at (E[α_{t}] = 0) and assumed independent of the previous observations.
Toeplitz (TOEP)
If there were m measures, then there will be two distances: one unit distance, two units distance. And, hence two parameters were estimated.
Unstructured (UN)
Unstructured covariance matrix allows m different variances, one for each time point and distinct offdiagonal elements representing the possibly different covariance for each pair of times, for a total of variances and covariances.
Model Selection
It is one of a most series problem in data analysis. The variancecovariance structure with the lower AIC, BIC and likelihood ratio test (2LnL) value can be an appropriate to select the model. Most commonly in epidemiological studies, investigators frequently attempt to construct the most desirable statistical model using the popular methods of forward, backward, and stepwise regressions [30].
IntraCluster Correlation (ICC)
The covariance matrix of the randomeffects is thought to summarize the ICC. If there are a large number of randomeffects components, then this leads to a complex covariance matrix and can increase computational burden [31].
Multivariate Repeated Measurement Models
Kronecker Product Covariance Structure
The first approach considered is to fit a model with a kronecker product covariance structure. This model allows investigator to specify the variancecovariance matrix of the measurement error within each study unit (i.e., subject) and thus to examine the intraand intersubject correlations of the measurement errors. The data layout for this model has the following presentation, where the variable VISIT indicates that all the assessments are equally spaced while the variable TIME gives the exact measurement time (six month). Where, PHID stands for patient who have HIV infections identified card in the SAS program. To assure normality and homoscedasticity of the residual distribution, the response variable is defined as the change in value of subject at time t since the initial visit, i.e.,
Table 1: The data layout presentation for multivariate longitudinal of HIV data.
PHID 
Vname 
Visit 
Time 
Value 
1 
CD4 
1 
0 
217 
1 
CD4 
2 
5.866667 
393 
1 
CD4 
3 
11.63333 
2944 
1 
CD4 
4 
17.43333 
545 
1 
HMG 
1 
0 
13 
1 
HMG 
2 
5.866667 
11.4 
1 
HMG 
3 
11.63333 
11.2 
1 
HMG 
4 
17.43333 
21 
1 
CD8 
1 
0 
29 
1 
CD8 
2 
5.866667 
43.9 
1 
CD8 
3 
11.63333 
37.7 
1 
CD8 
4 
17.43333 
29.4 
Proc mixed data=HIV_MIX covtest noclprint;
Title1 “model 1 mixed model with a kronoker product covariance”;
Title2 “CD4 (X) versus CD8 (Y)”;
Class PHID vname visit;
Model value=vname*visit/s noint;
Repeated vname visit/ type=UN@AR(1) subject=PHID r rcorr;
Where vname=“CD4” or vname=“CD8”;
Run;
The option TYPE=UN@AR(1) in the REPEATED statement specifies that the covariance matrix with in an individual has the following structure:
Thus, the above model assumes that (i) the two cells (CD4 & CD8) share a common intracells correlation as measured by:
, and (ii) the intercells correlations (as measured ) are the same for two cells measured at the same time point. The two cells are independent if the matrix V has the form of , and the hypothesis H_{0}: σ_{xy} = 0 will be tested by the COVTEST option.
Random Coefficient Mixed Models
Instead of modeling the variation with in the study unit as in the repeated measurement models, the random coefficient mixed models assume that the regression coefficients are a random sample from some population of possible coefficient and allow one to model variations between study units [14]. In the presence of multiple response variables, a separate set of regression coefficients will be fitted for each response variable and the correlations among these random coefficients can be examined. In SAS this model is also implemented with PROC MIXED and the data layout of the model is exactly the same as in previous section. The following SAS codes fit a random coefficient mixed model for CD4 and CD8. We note that the variable TIME rather than VISIT is used in the model, indicating its capability to handle equally spaced measurements. The syntax/program in SAS version9.4 was fitted the random coefficient mixed models were presented below.
Proc mixed data=HIV_MIX covtest noclprint;
Title1 “model 2 mixed model with random coefficients”;
Title2 “CD4 (X) versus CD8 (Y)”;
Class PHID vname;
Model value=vname*time/s noint;
Random vname*time/ type=UN subject=PHID g gcorr;
Repeated / type=VC group=vname subject=PHID;
Where vname=“CD4” or vname=“CD8”;
Run;
The RANDOM statement requests that two random slopes (one for CD4 and one for CD8) be fitted for each individual and the GROUP option in the REPEATED statement specifies that the variances of measurement errors are different for different cells. The G and GCORR options in the RANDOM statement require the display of covariance and correlation matrix for the random slopes respectively, with , the two slopes are independent if σ_{xy} = 0.
RESULTS
Among the total of 566 per four visits (2264 individuals) HIV infections data at North Gondar Zone, Ethiopia for category of CD4 counts were 852(37.6%), while in CD8 counts were 644(28.4%) and the remaining hemoglobin levels were 768(33.9%) presented in figure2 and table2. However, figure1 revealed that the value of HIV infections data at North Gondar Zone, Ethiopia was approximately skewed to the right direction or decrease to the right direction.
Figure 1: Barchart for value of HIV Fig. 2: Piechart for variables name of HIV distribution.
In table2 reveals that each visit of HIV infections per visits were 566(25%). According to this scenario in HIV infections of the cite Chilga have 1016(44.9%), Debark cite have 84(3.7%), Gondar cite have 456(20.1%) and the rest HIV infections have 708(31.3%) in Metema cite.
Table 2: Categorical variables of HIV infections data (n=566)
S. No. 
Variables 
Categories 
Frequency (n) 
Percentage (%) 
1 
Cite of Infections 
Chilga 
1016 
44.9 
Debark 
84 
3.7 

Gondar 
456 
20.1 

Metema 
708 
31.3 

2 
Visit of HIV Infections 
First 
566 
25.0 
Second 
566 
25.0 

Third 
566 
25.0 

Fourth 
566 
25.0 

3 
Variables name of Infections 
CD4 
852 
37.6 
CD8 
644 
28.4 

HMG 
768 
33.9 
Table 3: Crosstabulation for visit on variables name of HIV infections data.
Crosstabulation 
Variables name of HIV Infections 
Total 

CD4 
CD8 
HMG 

Visit of HIV Infections 
1 
213 
161 
192 
566 
2 
213 
161 
192 
566 

3 
213 
161 
192 
566 

4 
213 
161 
192 
566 

Total 
852 
644 
768 
2264 

Mean 
0.376 
0.285 
0.339 
 

Standard Deviation 
14.577 
12.676 
13.841 
 
Table 4: Continuous variables of HIV infections data at North Gondar Zone, Ethiopia (n=566).
Variables 
Minimum 
Maximum 
Mean 
Std. Deviation 

Dependent 
Value of HIV Infections 
.90 
2944.00 
163.77 
235.81 
Covariates 
ART Number 
1 
137 
69.46 
40.32 
Time of HIV Infections 
0.00 
128.53 
29.97 
26.48 
The results show that, patients with HIV infections, the values of CD4 and CD8 increase over time while hemoglobin (HMG) decreases over time. All these changes are significantly different from zero (with P<0.001 for CD4 and CD8, P<0.05 for HMG respectively). The results also reveal that a strong positive correlation between CD4 and CD8 cells (ρ_{X,Y}=0.463 with P<0.001), but the correlation between CD4 and HMG (ρ_{X,Z}, P=0.148) as well as the correlation between CD8 and HMG (ρ_{Y,Z}, P=0.217) are not statistically significant. We note that these correlations index had the extra associations among cell counts after removing the effect of involution process over time.
Table 5: Bivariate mixed models with a kronoker product covariance on HIV infections data.
Features 
CD4(X) and CD8(Y) 
CD4(X) and HMG(Z) 
CD8(Y) and HMG(Z) 

UN@AR(1) UN@UN 
UN@AR(1) UN@UN 
UN@AR(1) UN@UN 

Fixed effects 

Slope_{X} 
0.0623^{**} 
0.0639^{**} 
0.0600^{**} 
0.0611^{**} 
 
 
Slope_{Y} 
0.2244^{^^} 
0.2239^{**} 
 
 
0.2266^{**} 
0.2235^{**} 
Slope_{Z} 
 
 
0.0279^{*} 
0.0271* 
0.0282^{*} 
0.0282^{*} 
Random effects 

ρ_{X,Y} 
0.466^{**} 
0.463^{**} 
 
 
 
 
ρ_{X,Z} 
 
 
0.029 
0.069 
 
 
ρ_{Y,Z} 
 
 
 
 
0.059 
0.055 
Fit Statistics 

AIC 
3360 
3323 
1509 
1745 
880 
926 
BIC 
3365 
3303 
1478 
1708 
867 
913 
2LnL 
3398 
3357 
1529 
1764 
898 
970 
The results of mixed models with random coefficients are listed in table6 where the slope parameter represents the average change per six month for each cell counts over time. For simplicity, only the estimated correlation coefficients out of the random effect portion are listed in table6.
Results show that CD4 and CD8 increase over time (P<0.001) while HMG decrease (P<0.05). There exist a strong positive correlation between CD4 and CD8 (ρ_{X,Y}=0.753 with P<0.001), but the correlation between CD4 and HMG (ρ_{X,Z}=0.074, P=0.536) as well as the correlation between CD8 and HMG (ρ_{Y,Z}=0.045, P=0.764) are not statistically significant. However, it should be pointed out that the correlation coefficients in table6 represent the associations among random slopes (i.e., the trajectory of cell counts over time) rather than measurement errors. We also note this approach is easily extendable to multivariate longitudinal data with more than two response variables.
Table 6: Heterogeneous mixed models with random coefficients on HIV infections data
Features 
(X, Y) 
(X, Z) 
(Y, Z) 
(X, Y, Z) 
Fixed effects 

Slope_{X} 
0.0053^{**} 
0.0053^{**} 
 
0.0053^{**} 
Slope_{Y} 
0.0088^{**} 
 
0.0087^{**} 
0.0088^{**} 
Slope_{Z} 
 
0.0021^{*} 
0.0021^{*} 
0.0021* 
Random effects 

ρ_{X,Y} 
0.753^{**} 
 
 
0.753^{**} 
ρ_{X,Z} 
 
0.074 
 
0.074 
ρ_{Y,Z} 
 
 
0.045 
0.034 
Fit Statistics 

AIC 
3355 
1933 
823 
3423 
BIC 
3343 
1906 
795 
3494 
2LnL 
3356 
1937 
833 
3443 
DISCUSSIONS
In this study was conducted comparison of the 2 approaches for multivariate longitudinal data modeling: the 2 different approaches considered in this study to explore different aspects of the joint evolution of multiple response variables, and thus each approach has its own advantages and limitations.
The first models with kronecker product covariance provide a convenient way to fit bivariate data and enable one to examine both intercell and intracell correlations of the measurement errors. However, this model possesses several apparent limitations, namely, a common intracell correlation for different cells, a constant intercell correlation for cells measured at the same time point, as well as the demanding for equally spaced measurements. Furthermore, SAS only provides the possibility to fit bivariate mixed models.
The second models with random coefficients provide an opportunity to examine correlations among the trajectories of cell counts over time and to capture the growth of these response variables. As exemplified by the HIV data, this approach is capable of handling equally spaced measurements and is easily extendable to multivariate models with more than two response variables. Another advantage of these mixedmodel approaches is that they allow incomplete observations and thus can use information more efficiently.
CONCLUSION
According to this study in the early stage of HIV infections, CD4 and CD8 steadily increase over time while the values of hemoglobin (HMG) decrease over time. There exists a strong positive correlation between the two cell counts of HIV infections. The correlations between the cell counts (CD4 and CD8) and hemoglobin levels are not significant in either measurement errors or random slopes, but CD4 and CD8 cells show a negative crosslagged effect on hemoglobin levels, i.e., the higher the value of CD4 (CD8) at time t1, the lower the hemoglobin measurement at time t.
DECLARATIONS/ ETHICAL STANDARDS
Ethical Approval and Consent to Participate
Ethical clearance had been obtained from Department of Statistics at Haramaya University, Dire Dawa, Ethiopia.
Consent for Publication
This manuscript has not been published elsewhere and is not under consideration by another journal. Author has approved the final manuscript and agreed with its submission to Archives of Public Health. I agreed about authorship for this manuscript.
Availability of Data and Materials
I used secondary data for this investigation obtained at University of Gonder Teaching and Specialized Hospital.
Competing Interests
The authors declare that he has no competing interests.
Funding
No applicable!
Author Contributions
Author wrote the proposal, developed data collection format, supervised the data collection process and analyzed the data in consultation with Mr. Gashaw A. MSc in Public Health. He is also edited the document and gave critical comments. Author read and approved the final manuscript.
Acknowledgments
In a special way, I wish to extend my sincere gratitude to Mrs. Elesabet G. for her support and guidance accorded me during this study. I would also like to acknowledge the contribution of my colleagues from whom I enjoyed fruitful discussion on challenging topics.
Author’s Information
Alebachew Abebe is an Assistant Professor of the department of Statistics at Haramaya University, Ethiopia, He had previous eight manuscripts and one proceeding.
REFERENCES