Results 1 - 18 of 18
1.
J Appl Stat ; 51(11): 2178-2196, 2024.
Article in English | MEDLINE | ID: mdl-39157271

ABSTRACT

This paper aims to evaluate the statistical association between exposure to air pollution and forced expiratory volume in the first second (FEV1) in both asthmatic and non-asthmatic children and teenagers, in which the response variable FEV1 was measured repeatedly on a monthly basis, characterizing a longitudinal experiment. Given the nature of the data, a robust linear mixed model (RLMM) combined with a robust principal component analysis (RPCA) is proposed to handle the multicollinearity among the covariates and the impact of extreme observations (high levels of air contaminants) on the estimates. The Huber and Tukey loss functions are considered to obtain robust estimators of the parameters in the linear mixed model (LMM). A finite-sample investigation is conducted under the scenario where the covariates follow linear time series models with and without additive outliers (AO). The impact of the time correlation and the outliers on the estimates of the fixed-effect parameters in the LMM is investigated. In the real data analysis, the robust modeling strategy showed that RPCA yields three principal components (PCs), mainly related to relative humidity (Hmd), particulate matter with a diameter smaller than 10 µm (PM10) and particulate matter with a diameter smaller than 2.5 µm (PM2.5).
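The two loss functions named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' RLMM code: the tuning constants c = 1.345 (Huber) and c = 4.685 (Tukey) are the conventional choices giving about 95% efficiency under normal errors and are assumed here, and the function names are hypothetical. Each loss is paired with the weight function ψ(r)/r that is used when such estimators are computed by iteratively reweighted least squares.

```python
def huber_rho(r: float, c: float = 1.345) -> float:
    """Huber loss: quadratic for |r| <= c, linear beyond (c is an assumed default)."""
    a = abs(r)
    if a <= c:
        return 0.5 * r * r
    return c * a - 0.5 * c * c


def tukey_rho(r: float, c: float = 4.685) -> float:
    """Tukey bisquare loss: bounded, constant c^2/6 once |r| exceeds c."""
    if abs(r) > c:
        return c * c / 6.0
    u = (r / c) ** 2
    return (c * c / 6.0) * (1.0 - (1.0 - u) ** 3)


def huber_weight(r: float, c: float = 1.345) -> float:
    """IRLS weight psi(r)/r for Huber: large residuals are downweighted, never to zero."""
    a = abs(r)
    return 1.0 if a <= c else c / a


def tukey_weight(r: float, c: float = 4.685) -> float:
    """IRLS weight psi(r)/r for Tukey: residuals beyond c get weight zero."""
    if abs(r) > c:
        return 0.0
    return (1.0 - (r / c) ** 2) ** 2
```

For a standardized residual of 10, for instance, the Huber weight is 0.1345 while the Tukey weight is 0: Huber's estimator is monotone and still lets an extreme observation contribute a little, whereas Tukey's redescending estimator ignores it entirely.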

2.
BMC Med Res Methodol ; 24(1): 38, 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38360575

ABSTRACT

BACKGROUND: Several strategies for identifying biologically implausible values in longitudinal anthropometric data have recently been proposed, but the suitability of these strategies for large population datasets needs to be better understood. This study evaluated the impact of removing population outliers, and the additional value of identifying and removing longitudinal outliers, on the trajectories of length/height and weight and on the prevalence of child growth indicators in a large longitudinal dataset of child growth data. METHODS: Length/height and weight measurements of children aged 0 to 59 months from the Brazilian Food and Nutrition Surveillance System were analyzed. Population outliers were identified using z-scores from the World Health Organization (WHO) growth charts. After population outliers were identified and removed, residuals from linear mixed-effects models were used to flag longitudinal outliers, testing the following residual cutoffs: -3/+3, -4/+4, -5/+5 and -6/+6. The selected child growth indicators included length/height-for-age z-scores and weight-for-age z-scores, classified according to the WHO charts. RESULTS: The dataset included 50,154,738 records from 10,775,496 children. In boys and girls, respectively, 5.74% and 5.31% of length/height values and 5.19% and 4.74% of weight values were flagged as population outliers. After these were removed, the percentage of longitudinal outliers in boys ranged from 0.02% (<-6/>+6) to 1.47% (<-3/>+3) for length/height and from 0.07% to 1.44% for weight. In girls, it ranged from 0.01% to 1.50% for length/height and from 0.08% to 1.45% for weight. The initial removal of population outliers had the largest effect on the growth trajectories, as it was the first step in the cleaning process, whereas the additional removal of longitudinal outliers had less influence on them, regardless of the cutoff adopted. The prevalence of the selected indicators was also affected by both population outliers and, to a lesser extent, longitudinal outliers. CONCLUSIONS: Although both population and longitudinal outliers can detect biologically implausible values in child growth data, removing population outliers seemed more relevant in this large administrative dataset, especially for calculating summary statistics. However, both types of outliers need to be identified and removed for a proper evaluation of trajectories.
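The residual-cutoff step described in the METHODS can be sketched as below. This is a simplified illustration, not the study's pipeline: it assumes the residuals from the linear mixed-effects models have already been computed and standardized, and the function names and residual values are hypothetical. As in the abstract, a cutoff k flags values below -k or above +k.

```python
def flag_by_cutoff(std_residuals, cutoff):
    """Indices of standardized residuals below -cutoff or above +cutoff."""
    return [i for i, z in enumerate(std_residuals) if z < -cutoff or z > cutoff]


def percent_flagged(std_residuals, cutoffs=(3, 4, 5, 6)):
    """Percentage of residuals flagged at each symmetric cutoff."""
    n = len(std_residuals)
    return {k: 100.0 * len(flag_by_cutoff(std_residuals, k)) / n for k in cutoffs}


# Hypothetical standardized residuals (already computed from a fitted model)
z = [0.1, -0.5, 3.5, -4.2, 5.5, 7.0, -0.2, 0.0, 1.0, 2.9]
print(percent_flagged(z))  # {3: 40.0, 4: 30.0, 5: 20.0, 6: 10.0}
```

Stricter cutoffs flag fewer values, which mirrors the pattern reported in the RESULTS (from about 1.5% at -3/+3 down to a few hundredths of a percent at -6/+6).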


Subjects
Body Height , Growth Charts , Child , Male , Female , Humans , Body Weight , Brazil/epidemiology , Anthropometry
3.
Br J Math Stat Psychol ; 77(2): 316-336, 2024 May.
Article in English | MEDLINE | ID: mdl-38095333

ABSTRACT

Analysing data from educational tests allows governments to make decisions that improve the quality of life of individuals in a society. One of the key responsibilities of statisticians is to develop models that provide decision-makers with pertinent information about the latent process that educational tests seek to represent. Mixtures of t factor analysers (MtFA) have emerged as a powerful tool for model-based clustering and classification of high-dimensional data containing one or several groups of observations with fatter tails or anomalous outliers. This paper considers an extension of MtFA for robust clustering of censored data, referred to as the MtFAC model, that incorporates external covariates. The added flexibility of including covariates in MtFAC enables cluster-specific multivariate regression analysis of dependent variables with censored responses arising from upper and/or lower detection limits of experimental equipment. An alternating expectation conditional maximization (AECM) algorithm is developed for maximum likelihood estimation of the proposed model. Two simulation experiments are conducted to examine the effectiveness of the techniques presented. Furthermore, the proposed methodology is applied to Peruvian data from the 2007 Early Grade Reading Assessment, and the results of the analysis provide new insights into the reading skills of Peruvian students.


Subjects
Algorithms , Quality of Life , Humans , Likelihood Functions , Peru , Multivariate Analysis , Computer Simulation
4.
J Gen Psychol ; 150(4): 405-422, 2023.
Article in English | MEDLINE | ID: mdl-35792742

ABSTRACT

This study aims to examine, through Monte Carlo simulation, the effects of the underlying population distribution (normal, non-normal) and of outliers (OLs) on the magnitude of the Pearson, Spearman and Winsorized Pearson correlation coefficients, with sample sizes of 50, 100, 250, 500 and 1000 observations. For each sample size, underlying population correlations of 0.12, 0.20, 0.31 and 0.50 are examined under bivariate normality, bivariate normality with outliers (discordant, contaminants), and non-normality with different values of skewness and kurtosis. The results show that outliers have a greater effect than the data distribution: the effect is substantial for Pearson and smaller for Spearman and Winsorized Pearson. Additionally, outliers are shown to affect the assessment of bivariate normality with Mardia's test and to cause problems with decisions about univariate normality based on skewness and kurtosis. Implications of the results are discussed.
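The three coefficients compared in this study can be sketched in plain Python to show how a single discordant point affects each one. The winsorizing rule here (replace the g smallest and g largest values of each variable with the nearest remaining value, then compute Pearson) is a common definition assumed for illustration; the amount of winsorizing used in the study is not stated in the abstract, and the data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(v):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def winsorize(v, g):
    """Clamp the g smallest and g largest values to the nearest remaining values."""
    if 2 * g >= len(v):
        raise ValueError("g is too large for this sample size")
    s = sorted(v)
    lo, hi = s[g], s[-g - 1]
    return [min(max(a, lo), hi) for a in v]


def winsorized_pearson(x, y, g=1):
    """Pearson correlation after winsorizing each variable separately."""
    return pearson(winsorize(x, g), winsorize(y, g))


x = list(range(1, 11))
y = list(range(1, 10)) + [100]  # one discordant point
print(round(pearson(x, y), 2))        # 0.59
print(spearman(x, y))                 # 1.0
print(winsorized_pearson(x, y, g=1))  # 1.0
```

Here the outlier preserves the ordering of the data, so the rank-based coefficient is untouched, and winsorizing removes its leverage; Pearson's coefficient, which uses the raw magnitudes, drops from a perfect 1 to about 0.59.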

5.
Methods Mol Biol ; 2481: 13-27, 2022.
Article in English | MEDLINE | ID: mdl-35641756

ABSTRACT

Based on case studies, this chapter discusses the extent to which the number and identity of quantitative trait loci (QTL) identified from genome-wide association studies (GWAS) are affected by the curation and analysis of phenotypic data. Through examples, the chapter demonstrates the impact of (1) outlier cleaning and (2) the choice of statistical method for estimating genotypic mean values of the phenotypic inputs to GWAS. Omitting outlier cleaning resulted in the highest number of dubious QTL, especially at loci with highly unbalanced allele frequencies. A trade-off was identified between the risk of false positives and the risk of missing interesting yet rare alleles. The choice of statistical method for estimating genotypic mean values also affected the GWAS output, with reduced QTL overlap between methods. Using mixed models that capture spatial trends, among other features, increased the narrow-sense heritability of traits, the number of identified QTL and the overall power of the GWAS analysis. Outlier cleaning and the choice of robust statistical models for estimating genotypic mean values should be included in GWAS pipelines to decrease both the false positive and the false negative rates of QTL detection.


Subjects
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Alleles , Gene Frequency , Genome-Wide Association Study/methods , Quantitative Trait Loci
6.
Entropy (Basel) ; 24(10)2022 Sep 23.
Article in English | MEDLINE | ID: mdl-37420369

ABSTRACT

Determining the centers of a Radial Basis Function Network is an open problem. This work determines the cluster centers with a proposed gradient algorithm that uses the information forces acting on each data point. These centers are then used in a Radial Basis Function Network for data classification. A threshold based on the Information Potential is established to classify outliers. The proposed algorithms are evaluated on datasets that vary in the number of clusters, cluster overlap, noise, and imbalance of cluster sizes. Together, the threshold and the centers determined by information forces show good results in comparison with a similar network that uses a k-means clustering algorithm.
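The information-potential idea behind the outlier threshold can be sketched as follows, assuming the usual Gaussian-kernel estimator from information-theoretic learning (kernel variance 2σ², normalization constant dropped because only relative values matter here). The rule of flagging points whose potential falls below a fraction of the mean potential is a hypothetical stand-in for the paper's threshold, the parameter names are assumptions, and the gradient-based search for centers driven by information forces is not shown.

```python
import math


def information_potential(points, sigma=1.0):
    """Per-point information potential: average Gaussian kernel value to all points.

    Uses an unnormalized Gaussian with variance 2*sigma^2, i.e. exp(-d^2 / (4 sigma^2)).
    Points in dense regions get high potential; isolated points get low potential.
    """
    n = len(points)
    potentials = []
    for p in points:
        total = 0.0
        for q in points:
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-d2 / (4.0 * sigma ** 2))
        potentials.append(total / n)
    return potentials


def flag_low_potential(points, sigma=1.0, fraction=0.5):
    """Flag points whose potential is below `fraction` times the mean (hypothetical rule)."""
    v = information_potential(points, sigma)
    threshold = fraction * (sum(v) / len(v))
    return [vi < threshold for vi in v]


# Hypothetical data: a tight cluster near the origin and one distant point
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1), (10.0, 10.0)]
print(flag_low_potential(pts))
```

The distant point interacts almost only with itself, so its potential is close to 1/N, far below that of the clustered points; this is the sense in which low information potential marks an outlier.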

7.
BMC Health Serv Res ; 20(1): 804, 2020 Aug 26.
Article in English | MEDLINE | ID: mdl-32847575

ABSTRACT

BACKGROUND: Universal health coverage promises equity in access to and quality of health services. However, the quality of care (QoC) delivered at health facilities in low- and middle-income countries (LMICs) varies. Detecting gaps in the implementation of clinical guidelines is key to prioritizing efforts to improve quality of care. The aim of this study was to present statistical methods that maximize the use of existing electronic medical records (EMR) to monitor compliance with evidence-based care guidelines in LMICs. METHODS: We used iSanté, Haiti's largest EMR, to assess adherence to treatment guidelines and retention on treatment of HIV patients across Haitian HIV care facilities. We selected three processes of care, namely (1) implementation of a 'test and start' approach to antiretroviral therapy (ART), (2) implementation of HIV viral load testing, and (3) uptake of multi-month scripting for ART, and three continuity-of-care indicators, namely (4) timely ART pick-up, (5) 6-month ART retention of pregnant women and (6) 6-month ART retention of non-pregnant adults. We estimated these six indicators using a model-based approach to account for their volatility and measurement error. We added a case-mix adjustment to the continuity-of-care indicators to account for the effect of factors other than medical care (biological, socio-economic). We combined the six indicators into a composite measure of appropriate care based on adherence to treatment guidelines. RESULTS: We analyzed data from 65,472 patients seen in 89 health facilities between June 2016 and March 2018. Adoption of treatment guidelines differed greatly between facilities; several facilities displayed 100% compliance failure, suggesting implementation issues. Risk-adjusted continuity-of-care indicators showed less variability, although several facilities had patient retention rates that deviated significantly from the national average. Based on the composite measure, we identified two facilities with consistently poor performance and two star performers. CONCLUSIONS: Our work demonstrates the potential of EMRs to detect gaps in appropriate care processes, and thereby to guide quality improvement efforts. Closing quality gaps will be pivotal in achieving equitable access to quality care in LMICs.


Subjects
Electronic Health Records , Guideline Adherence/statistics & numerical data , HIV Infections/drug therapy , Practice Guidelines as Topic , Quality Improvement/organization & administration , Adult , Anti-HIV Agents/therapeutic use , Female , Haiti , Health Facilities/standards , Health Services Research , Humans , Male , Middle Aged , Pregnancy , Young Adult
8.
J Appl Stat ; 47(10): 1833-1847, 2020.
Article in English | MEDLINE | ID: mdl-35707135

ABSTRACT

Zero-adjusted regression models are used to fit variables that have a point mass at zero and are continuous on some interval of the positive real numbers. Diagnostic analysis in these models is usually performed with the randomized quantile residual, which is useful for checking the overall adequacy of a zero-adjusted regression model but may fail to identify some outliers. In this work, we introduce a class of residuals for outlier identification in zero-adjusted regression models. Monte Carlo simulation studies and two applications suggest that one of the residuals in this class has good properties and detects outliers that the randomized quantile residual does not identify.
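For reference, the baseline randomized quantile residual (not the new class of residuals introduced in the paper) can be sketched for a hypothetical zero-adjusted model whose positive part is exponential; the exponential choice, the function name and the parameter values are assumptions made for illustration. At a zero, the residual is randomized over the jump of the cumulative distribution function (CDF) at zero; at a positive value, it is the standard normal quantile of the CDF.

```python
import math
import random
from statistics import NormalDist

_EPS = 1e-12  # keeps u strictly inside (0, 1) so inv_cdf is defined


def rq_residual_za_exponential(y, p0, rate, rng=random):
    """Randomized quantile residual for a hypothetical zero-adjusted exponential model.

    Assumed model: P(Y = 0) = p0, and for y > 0 the density is
    (1 - p0) * rate * exp(-rate * y).
    """
    if y == 0:
        # The CDF jumps from 0 to p0 at zero, so u is drawn uniformly over that jump.
        u = rng.uniform(0.0, p0)
    else:
        # Continuous part: F(y) = p0 + (1 - p0) * (1 - exp(-rate * y)).
        u = p0 + (1.0 - p0) * (1.0 - math.exp(-rate * y))
    u = min(max(u, _EPS), 1.0 - _EPS)
    return NormalDist().inv_cdf(u)
```

When the fitted model is correct, these residuals are approximately standard normal, which is what makes them useful for checking overall adequacy; because every zero is mapped somewhere below the normal quantile of p0, an individual atypical zero is hard to distinguish, one reason a residual aimed specifically at outliers can be useful.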

9.
Sensors (Basel) ; 19(20)2019 Oct 18.
Article in English | MEDLINE | ID: mdl-31635349

ABSTRACT

Geodetic networks provide accurate three-dimensional control points for mapping activities, geoinformation, and infrastructure works. Accurate computation and adjustment are necessary because all data collection is vulnerable to outliers; under such conditions, a Least Squares (LS) adjustment can introduce inaccuracy at many points. Robust Estimator (RE) methods are less sensitive to outliers and provide an alternative to conventional LS. To solve the RE functions, we propose a new metaheuristic (MH) based on an improved version of the Vortex Search algorithm (IVS), along with a novel search space definition scheme. Numerous scenarios for a Global Navigation Satellite Systems (GNSS)-based network are generated to compare and analyze the behavior of several known REs. A classic iterative RE and an LS process are also tested for comparison. We analyze the median and trim position of several estimators to verify their impact on the estimates. The tests show that IVS performs better than the original algorithm, so we adopted it in all subsequent RE computations. Regarding network adjustment, the parameter estimates show that REs achieve better results in scenarios with large-scale outliers. For outlier detection, both LS and REs identify most outliers in scenarios with large outliers.
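To illustrate why a classic iterative RE resists outliers where LS does not, here is a minimal sketch of iteratively reweighted least squares (IRLS) with Huber weights for the simplest possible adjustment: one unknown observed directly several times, where the LS solution is just the arithmetic mean. The tuning constant c = 1.5, the MAD-based scale, the function name and the observations are assumptions for illustration; this is not the paper's IVS metaheuristic, which solves the RE functions by search rather than by reweighting.

```python
from statistics import mean, median


def huber_location_irls(obs, c=1.5, tol=1e-10, max_iter=100):
    """Robust estimate of one directly observed unknown via IRLS with Huber weights."""
    est = median(obs)
    # Robust scale: median absolute deviation, rescaled to estimate sigma for normal data.
    mad = median([abs(o - est) for o in obs])
    scale = mad / 0.6745 if mad > 0 else 1.0
    for _ in range(max_iter):
        weights = []
        for o in obs:
            u = abs(o - est) / scale
            # Full weight for small standardized residuals, c/|u| beyond the threshold.
            weights.append(1.0 if u <= c else c / u)
        new_est = sum(w * o for w, o in zip(weights, obs)) / sum(weights)
        if abs(new_est - est) < tol:
            return new_est
        est = new_est
    return est


# Hypothetical repeated height observations (m) with one gross error
obs = [100.01, 100.02, 99.99, 100.00, 100.01, 99.98, 105.00]
print(round(mean(obs), 3))                 # LS solution: pulled toward the gross error
print(round(huber_location_irls(obs), 3))  # robust solution: stays near 100.00
```

In a full network adjustment the same reweighting is applied to the residuals of the observation equations at each iteration, which is why the gross error ends up with a tiny weight instead of being spread over the solution.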

10.
Food Chem ; 293: 323-332, 2019 Sep 30.
Article in English | MEDLINE | ID: mdl-31151619

ABSTRACT

This paper proposes using random forest for adulteration detection, combining the random forest algorithm with the artificial generation of outliers from authentic samples. The proposal was applied in two food adulteration studies: evening primrose oils analyzed by ATR-FTIR spectroscopy and ground nutmeg analyzed by NIR diffuse reflectance spectroscopy. The primrose oil was adulterated with soybean, corn and sunflower oils, and the model was validated using these adulterated oils and other oils, such as rosehip and andiroba, in pure and adulterated forms. The ground nutmeg was adulterated with cumin, commercial monosodium glutamate, soil, roasted coffee husks and wood sawdust. For the primrose oil, the proposed method outperformed PLS-DA and performed similarly to SIMCA; for the ground nutmeg, the random forest outperformed both PLS-DA and SIMCA. Also, in both applications using the random forest, no sample was excluded from the external validation set.


Subjects
Food Contamination/analysis , Linoleic Acids/chemistry , Plant Oils/chemistry , Spectroscopy, Fourier Transform Infrared/methods , gamma-Linolenic Acid/chemistry , Corn Oil/analysis , Limit of Detection , Myristica/chemistry , Oenothera biennis , Soybean Oil/analysis , Sunflower Oil/analysis