Abstract
Motivated by finding a way to deal with Compositional Data (CoDa) with or without zeroes in a unified way, we build upon the previous projective geometry viewpoint of Faugeras (2023) and use the tools provided by the exterior product and Grassmann's algebra. These allow us to represent higher dimensional subspaces as linear objects, called multi-vectors, on which the usual Euclidean scalar product can be extended. Applied to CoDa seen as equivalence classes, this allows us to define a pseudo-scalar product and a pseudo-norm. Depending on the normalization chosen, it is remarkable that the resulting pseudo-norm is either the same barycentric divergence derived in Faugeras (2024a) from the affine geometry viewpoint, or becomes a new, orthogonally invariant, genuine distance on the full non-negative CoDa space. These tools are then used to lay the foundations for further statistical analysis of CoDa: we show how the relative position of a pair of CoDa around their means can be decomposed along their components to form exterior covariance, variance and correlation matrices, together with the corresponding global scalar measures of (co)variation. Gaussian distributions, the Mahalanobis distance, Fréchet means, etc. can then be introduced, and we sketch their potential statistical applications. Finally, we establish some connections with various notions encountered in the literature, such as divergences based on quantifying inequalities, or canonical angles between subspaces. The paper is preceded by a tutorial on the exterior product, based on intuitive geometric visualization and familiar linear algebra, in order to make the ideas of the paper accessible to non-specialists.
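As an illustration of the kind of construction alluded to above (a minimal sketch based on standard exterior algebra, not the paper's exact definitions): for two non-negative vectors $x, y \in \mathbb{R}^D$ representing compositions, the exterior product obeys the Gram identity $\|x \wedge y\|^2 = \|x\|^2 \|y\|^2 - \langle x, y \rangle^2$, so that the normalized quantity $\|x \wedge y\| / (\|x\| \, \|y\|) = \sin \theta(x, y)$ depends only on the rays (equivalence classes) spanned by $x$ and $y$ and vanishes exactly when the two compositions are proportional. Scale-invariant quantities of this type illustrate why the exterior product is a natural tool for comparing CoDa viewed as equivalence classes.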
Reference
Olivier Faugeras, “Log-Free Distance and Covariance Matrix for Compositional Data II: the Projective/Exterior Product Approach”, TSE Working Paper, n. 24-1601, December 2024.
Published in
TSE Working Paper, n. 24-1601, December 2024