Increasing knowledge of odors and molecular structures linkages of smell compounds by comparing UMAP method to other classification approaches - Université Côte d'Azur
Conference Poster, Year: 2021

Abstract

Olfactory perception begins at the level of the olfactory epithelium, with the activation of olfactory receptors (ORs) by the binding of odorants [1]. The olfactory system can discriminate a huge number of odors, estimated to reach 1 trillion [2]. Odor-structure relationships are a challenging area of olfaction research and a key element in understanding the olfactory system [3-5]. This study aims to highlight the relationships between the structure of smell compounds and their odors. For this purpose, 6038 odorant compounds and their known associated odors (162 odor notes) were compiled. We assessed four dimensionality reduction techniques (PCA, MDS, t-SNE and UMAP [6]) and two clustering methods (k-means and agglomerative hierarchical clustering, AHC) applied to the molecular structures of these 6038 smell compounds, encoded as 1024-bit fingerprints. The distribution of odor notes and molecular substructures across the different clusters was then analyzed. The least convincing results were obtained with t-SNE, both for the blurred spatial arrangement of the elements in the 2D space and for the overlap of the clustering partitions obtained by k-means and AHC. MDS and PCA gave better but still average results, except for PCA-AHC, for which the results were slightly better. All the results and analyses point to the precision of UMAP in aggregating the elements into cluster areas, reflected by the high degree of specificity of odor notes with respect to the clusters. Indeed, because UMAP assumes that a manifold structure exists in the data, it is able to find such structures within the noise of a dataset, which makes it well suited to data visualization. The placement of smell compounds in the two-dimensional space defined by the calculation shows a distribution of odorants into four main areas, each cluster being dominated by a few specific odors and chemical structures.
The four clusters gather, respectively, (i) ketones and bicyclic compounds with “balsamic”/“nutty” odor notes; (ii) unsaturated and aromatic compounds carrying a “woody” odor; (iii) aldehydes, sulfur compounds and amines with “sulfurous” or “citrus” odors; (iv) esters and long linear carbon chains sharing “fruity”/“fatty” odor notes. This combination of k-means and AHC clustering with UMAP is the first to be applied to molecular fingerprints for an odor-related dataset. The use of UMAP therefore provides a promising way to improve the understanding of structure-odor relationships by producing high-quality embeddings of large datasets that were previously unattainable.

1. Buck L, Axel R. A novel multigene family may encode odorant receptors: A molecular basis for odor recognition, Cell, 1991, 175-187.
2. Bushdid C, Magnasco MO, Vosshall LB, Keller A. Humans can discriminate more than 1 trillion olfactory stimuli, Science, 2014, 1370-1372.
3. Sell CS. The relationship between molecular structure and odour. In: Chemistry and the sense of smell. Oxford: Blackwell Science Publ; 2014. p. 388-419.
4. Genva M, Kemene TK, Deleu M, Lins L, Fauconnier ML. Is it possible to predict the odor of a molecule on the basis of its structure?, Int J Mol Sci, 2019, 3018.
5. Licon CC, Bosc G, Sabri M, Mantel M, Fournel A, Bushdid C, et al. Chemical features mining provides new descriptive structure-odor relationships, PLoS Comput Biol, 2019, e1006945.
6. McInnes L, Healy J, Melville J. UMAP: Uniform manifold approximation and projection for dimension reduction, arXiv, 2020, 861.
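
The abstract compares the partitions produced by k-means and AHC and reports the specificity of odor notes with respect to the clusters, but it does not say how either quantity was measured. The sketch below is one possible way to express both ideas, under stated assumptions: the adjusted Rand index as the measure of agreement between two partitions, and "specificity" taken as the share of a note's compounds that fall in a given cluster. The function names and the toy labels are hypothetical and are not data or code from the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two partitions of the same items (1.0 = identical).

    Assumption: the poster does not name its agreement measure; the
    adjusted Rand index is used here purely as an illustration.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:  # fewer than two items: nothing to compare
        return 1.0
    # Pairs of items grouped together in both partitions, in A only, in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (together_both - expected) / (maximum - expected)


def note_specificity(cluster_labels, odor_notes):
    """Map (cluster, note) to the share of that note's compounds in the cluster.

    Assumption: this ratio is a hypothetical definition of "specificity".
    """
    note_totals = Counter()
    note_in_cluster = Counter()
    for cluster, notes in zip(cluster_labels, odor_notes):
        for note in set(notes):  # count each note once per compound
            note_totals[note] += 1
            note_in_cluster[(cluster, note)] += 1
    return {
        (cluster, note): count / note_totals[note]
        for (cluster, note), count in note_in_cluster.items()
    }


# Toy example (invented labels): six compounds, two clusterings, their notes.
kmeans_labels = [0, 0, 0, 1, 1, 1]
ahc_labels = [1, 1, 1, 0, 0, 0]  # same grouping, different label names
notes = [
    ["fruity"], ["fruity", "fatty"], ["fruity"],
    ["woody"], ["woody"], ["woody", "fruity"],
]

agreement = adjusted_rand_index(kmeans_labels, ahc_labels)
specificity = note_specificity(kmeans_labels, notes)
```

In this toy case the two partitions group the compounds identically, so the agreement is maximal even though the label values differ, and "woody" is fully specific to one cluster while "fruity" is split between both. On real data, the cluster labels would come from k-means and AHC run on the 2D embedding, and the notes from the compiled odor annotations.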

Dates and versions

hal-03442859, version 1 (23-11-2021)

Identifiers

  • HAL Id: hal-03442859, version 1

Cite

Marylène Rugard, Thomas Jaylet, Anne Tromelin, Olivier Taboureau, Karine Audouze. Increasing knowledge of odors and molecular structures linkages of smell compounds by comparing UMAP method to other classification approaches. SFCi2021, Lille (France), Sep 2021, Virtual meeting, France. ⟨hal-03442859⟩