Search | Portal Regional da BVS

Feature Selection for Polymer Informatics: Evaluating Scalability and Robustness of the FS4RV_DD Algorithm Using Synthetic Polydisperse Data Sets.

Cravero, Fiorella; Schustik, Santiago A; Martínez, M Jimena; Vázquez, Gustavo E; Díaz, Mónica F; Ponzoni, Ignacio.

J Chem Inf Model ; 60(2): 592-603, 2020 02 24.

Article in English | MEDLINE | ID: mdl-31790226

ABSTRACT

The feature selection (FS) process is a key step in the Quantitative Structure-Property Relationship (QSPR) modeling of physicochemical properties in cheminformatics. In particular, the inference of QSPR models for polymeric material properties is a complex problem because of the uncertainty introduced by the polydispersity of these materials. The main challenge is how to capture the polydispersity information from the molecular weight distribution (MWD) curve to achieve a more effective computational representation of polymeric materials. To date, most existing QSPR techniques use only a single molecule to represent each of these materials, so polydispersity is not considered. Consequently, the QSPR models obtained by these approaches are oversimplified. For this reason, we introduced in a previous work a new FS algorithm called Feature Selection for Random Variables with Discrete Distribution (FS4RV_DD), which can handle polydisperse data. In the present paper, we evaluate both the scalability and the robustness of the FS4RV_DD algorithm. To this end, we generated synthetic data by varying and combining different parameters: the size of the database, the cardinality of the selected feature subsets, the presence of noise in the data, and the type of correlation (linear and nonlinear). Moreover, the performance obtained by FS4RV_DD was contrasted with that of traditional FS techniques applied to different simplified representations of polymeric materials. The results show that the FS4RV_DD algorithm outperformed the traditional FS methods in all proposed scenarios, which suggests the need for an algorithm such as FS4RV_DD to deal with the uncertainty that polydispersity introduces in human-made polymers.
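The contrast between a single-molecule representation and the full discrete distribution can be illustrated with a toy experiment. The sketch below is not the FS4RV_DD algorithm and does not reproduce the paper's data generator, which the abstract does not specify. It assumes, purely for illustration, that each material is a list of (weight fraction, descriptor value) pairs and that the target property depends linearly on the distribution-wide mean plus Gaussian noise. All function names and parameters are hypothetical.

```python
import random

def make_sample(rng, n_chains=5):
    """One hypothetical polydisperse material: (weight fraction, descriptor) per chain length."""
    raw = [rng.random() + 0.05 for _ in range(n_chains)]
    total = sum(raw)
    return [(r / total, rng.uniform(0.0, 10.0)) for r in raw]

def weighted_mean(sample):
    """Descriptor averaged over the whole discrete distribution."""
    return sum(frac * value for frac, value in sample)

def most_abundant(sample):
    """Simplified representation: descriptor of the single most abundant chain."""
    return max(sample, key=lambda pair: pair[0])[1]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def compare_representations(n_samples=200, noise_sd=0.1, seed=0):
    rng = random.Random(seed)
    samples = [make_sample(rng) for _ in range(n_samples)]
    # Assumed ground truth: the property depends linearly on the distribution-wide mean.
    target = [2.0 * weighted_mean(s) + rng.gauss(0.0, noise_sd) for s in samples]
    r_full = pearson([weighted_mean(s) for s in samples], target)
    r_single = pearson([most_abundant(s) for s in samples], target)
    return r_full, r_single

r_full, r_single = compare_representations()
print(f"full distribution: r = {r_full:.3f}; single molecule: r = {r_single:.3f}")
```

Under these assumptions the descriptor computed from the whole distribution tracks the target far more closely than the single-molecule value. Raising `noise_sd` or changing `n_samples` gives a rough feel for the noise and database-size dimensions the abstract mentions.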

Subjects

Algorithms , Polymers/chemistry , Molecular Models , Molecular Conformation , Molecular Weight , Quantitative Structure-Activity Relationship

QSAR Classification Models for Predicting the Activity of Inhibitors of Beta-Secretase (BACE1) Associated with Alzheimer's Disease.

Ponzoni, Ignacio; Sebastián-Pérez, Víctor; Martínez, María J; Roca, Carlos; De la Cruz Pérez, Carlos; Cravero, Fiorella; Vazquez, Gustavo E; Páez, Juan A; Díaz, Mónica F; Campillo, Nuria E.

Sci Rep ; 9(1): 9102, 2019 06 24.

Article in English | MEDLINE | ID: mdl-31235739

ABSTRACT

Alzheimer's disease is one of the most common neurodegenerative disorders in the elderly population. The β-site amyloid cleavage enzyme 1 (BACE1) is the major constituent of amyloid plaques and plays a central role in this brain pathogenesis; thus, it constitutes an auspicious pharmacological target for treatment. In this paper, a QSAR model for identifying potential inhibitors of the BACE1 protein is designed using classification methods. To build this model, a database of 215 molecules collected from different sources was assembled. This dataset contains diverse compounds with different scaffolds and physicochemical properties, covering a wide chemical space in the drug-like range. The most distinctive aspect of the applied QSAR strategy is the combination of hybridization with backward elimination of models, which contributes to improving the quality of the final QSAR model. Another relevant step is the visual analysis of the molecular descriptors, which makes it possible to guarantee the absence of information redundancy in the model. The performance of the QSAR model was assessed with traditional metrics; the final proposed model has low cardinality and correctly classifies a high percentage of the chemical compounds.
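As a rough illustration of how backward elimination can drive a classification model toward low cardinality, the sketch below repeatedly drops the descriptor whose removal does not reduce accuracy. It is a generic version of the technique, not the paper's hybridization procedure, which the abstract does not detail. The nearest-centroid classifier, the use of training accuracy as the score, and the synthetic data (one informative descriptor, three noise descriptors) are assumptions made only for this example.

```python
import random

def centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier restricted to `features`."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, label in zip(X, y):
        point = [x[f] for f in features]
        pred = min(classes, key=lambda c: sum((p - q) ** 2
                                              for p, q in zip(point, centroids[c])))
        correct += pred == label
    return correct / len(y)

def backward_elimination(X, y, score=centroid_accuracy):
    """Drop one descriptor at a time for as long as the score does not get worse."""
    selected = list(range(len(X[0])))
    best = score(X, y, selected)
    while len(selected) > 1:
        # Score every subset obtained by removing exactly one remaining descriptor.
        candidates = [(score(X, y, [f for f in selected if f != drop]), drop)
                      for drop in selected]
        cand_score, drop = max(candidates)
        if cand_score < best:
            break  # every possible removal hurts, so stop here
        best = cand_score
        selected.remove(drop)
    return selected, best

def make_data(n_per_class=40, seed=0):
    """Hypothetical data: descriptor 0 separates the classes, descriptors 1-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 10.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 0.5)] + [rng.gauss(0.0, 1.0) for _ in range(3)])
            y.append(label)
    return X, y

X, y = make_data()
selected, accuracy = backward_elimination(X, y)
print(selected, accuracy)
```

In practice the score would come from cross-validation or an external test set rather than training accuracy, and the candidate descriptors would be real molecular descriptors rather than synthetic columns.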

Subjects

Alzheimer Disease/drug therapy , Amyloid Precursor Protein Secretases/antagonists & inhibitors , Protease Inhibitors/chemistry , Protease Inhibitors/pharmacology , Quantitative Structure-Activity Relationship , Alzheimer Disease/enzymology , Computer Simulation , Machine Learning , Protease Inhibitors/therapeutic use

Visual analytics in cheminformatics: user-supervised descriptor selection for QSAR methods.

Martínez, María Jimena; Ponzoni, Ignacio; Díaz, Mónica F; Vazquez, Gustavo E; Soto, Axel J.

J Cheminform ; 7: 39, 2015.

Article in English | MEDLINE | ID: mdl-26300983

ABSTRACT

BACKGROUND: The design of QSAR/QSPR models is a challenging problem, in which the selection of the most relevant descriptors constitutes a key step. Several feature selection methods that address this step concentrate on statistical associations among descriptors and target properties, whereas chemical knowledge is left out of the analysis. For this reason, the interpretability and generality of the QSAR/QSPR models obtained by these feature selection methods are drastically affected. Therefore, an approach for integrating the domain expert's knowledge into the selection process is needed to increase confidence in the final set of descriptors. RESULTS: In this paper we propose a software tool, named Visual and Interactive DEscriptor ANalysis (VIDEAN), that combines statistical methods with interactive visualizations for choosing a set of descriptors for predicting a target property. Domain expertise can be added to the feature selection process by means of an interactive visual exploration of the data, aided by statistical tools and metrics based on information theory. Coordinated visual representations are presented for capturing different relationships and interactions among descriptors, target properties, and candidate subsets of descriptors. The capabilities of the proposed software were assessed through different scenarios. These scenarios reveal how an expert can use the tool to choose one subset of descriptors from a group of candidate subsets, or to modify existing descriptor subsets and even incorporate new descriptors according to his or her own knowledge of the target property. CONCLUSIONS: The reported experiences showed the suitability of our software for selecting sets of descriptors with low cardinality, high interpretability, low redundancy, and high statistical performance in a visual, exploratory way. Therefore, it is possible to conclude that the resulting tool allows a chemist's expertise to be integrated into the descriptor selection process with low cognitive effort, in contrast to the alternative of an ad hoc manual analysis of the selected descriptors. Graphical abstract: VIDEAN allows the visual analysis of candidate subsets of descriptors for QSAR/QSPR. In the two panels on the top, users can interactively explore numerical correlations as well as co-occurrences in the candidate subsets through two interactive graphs.
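One common information-theoretic metric for descriptor analysis is mutual information, which can score both the relevance of a descriptor to the target and the redundancy between two descriptors. The sketch below is a minimal illustration of that idea and is not taken from VIDEAN. The equal-width binning, the number of bins, and the toy values are assumptions chosen so the result is easy to check by hand.

```python
import math
from collections import Counter

def discretize(values, n_bins=4):
    """Map continuous values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(xs, ys):
    """Mutual information, in bits, between two equally long discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# Hypothetical values: the target, a descriptor that mirrors it, and an unrelated one.
target = [float(i) for i in range(16)]
relevant = [2.0 * t + 1.0 for t in target]
unrelated = [float(i % 4) for i in range(16)]

t_bins = discretize(target)
mi_relevant = mutual_information(discretize(relevant), t_bins)
mi_unrelated = mutual_information(discretize(unrelated), t_bins)
print(mi_relevant, mi_unrelated)
```

The same function applied to two descriptors instead of a descriptor and the target gives a simple redundancy score: a value close to the entropy of either descriptor suggests that one of them adds little new information.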

QSPR models for predicting log P(liver) values for volatile organic compounds combining statistical methods and domain knowledge.

Palomba, Damián; Martínez, María J; Ponzoni, Ignacio; Díaz, Mónica F; Vazquez, Gustavo E; Soto, Axel J.

Molecules ; 17(12): 14937-53, 2012 Dec 17.

Article in English | MEDLINE | ID: mdl-23247367

ABSTRACT

Volatile organic compounds (VOCs) are contained in a variety of chemicals that can be found in household products and may have undesirable effects on health. Therefore, it is important to model blood-to-liver partition coefficients (log P(liver)) for VOCs in a fast and inexpensive way. In this paper, we present two new quantitative structure-property relationship (QSPR) models for the prediction of log P(liver), and we also propose a hybrid approach for selecting the descriptors. This hybrid methodology combines a machine learning method with a manual selection based on expert knowledge, which yields a set of descriptors that is interpretable in physicochemical terms. Our regression models were trained using decision trees and neural networks and validated using an external test set. The results show high prediction accuracy compared to previous log P(liver) models, and the descriptor selection approach provides a means of obtaining a small set of descriptors that agrees with the theoretical understanding of the target property.

Subjects

Gases , Theoretical Models , Quantitative Structure-Activity Relationship , Volatile Organic Compounds , Animals , Artificial Intelligence , Gases/chemistry , Gases/toxicity , Humans , Liver/drug effects , Rats , Volatile Organic Compounds/chemistry , Volatile Organic Compounds/toxicity
