CoDa.jl Save

Compositional data analysis in Julia

Project README


This package defines a Composition{D} type representing a D-part composition as defined by Aitchison 1986. In Aitchison's geometry, the D-simplex together with addition (a.k.a. pertubation) and scalar multiplication (a.k.a. scaling) form a vector space, and important properties hold:

  • Scaling invariance
  • Pertubation invariance
  • Permutation invariance
  • Subcompositional coherence

In practice, this means that one can operate on compositional data (i.e. vectors whose entries represent parts of a total) without destroying the ratios of the parts.

Installation

Get the latest stable release with Julia's package manager:

] add CoDa

Usage

Basics

Compositions are static vectors with named parts:

julia> using CoDa

julia> c = Composition(CO₂=2.0, CH₄=0.1, N₂O=0.3)
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 2.0   
   CH₄ ┤■■ 0.1                                    
   N₂O ┤■■■■■ 0.3                                 
       └                                        ┘ 

julia> CoDa.parts(c)
(:CO₂, :CH₄, :N₂O)

julia> CoDa.components(c)
3-element StaticArrays.SVector{3, Union{Missing, Float64}} with indices SOneTo(3):
 2.0
 0.1
 0.3

julia> c.CO₂
2.0

Default names are added otherwise:

julia> c = Composition(1.0, 0.1, 0.1)
                     3-part composition
      ┌                                        ┐ 
   w1 ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 1.0   
   w2 ┤■■■■ 0.1                                  
   w3 ┤■■■■ 0.1                                  
      └                                        ┘ 

and serve for internal compile-time checks.

Compositions can be added, subtracted, negated, and multiplied by scalars. Other operations are also defined including dot product, induced norm, and distance:

julia> cₒ = Composition(CO₂=1.0, CH₄=0.1, N₂O=0.1)
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 1.0   
   CH₄ ┤■■■■ 0.1                                  
   N₂O ┤■■■■ 0.1                                  
       └                                        ┘ 

julia> -cₒ
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■ 0.047619047619047616                   
   CH₄ ┤■■■■■■■■■■■■■■■■■■■ 0.47619047619047616   
   N₂O ┤■■■■■■■■■■■■■■■■■■■ 0.47619047619047616   
       └                                        ┘ 

julia> 0.5c
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■■■■■■■■■■■■■■■■■■■ 0.6207690197922022   
   CH₄ ┤■■■■ 0.13880817265812764                  
   N₂O ┤■■■■■■■■ 0.24042280754967013              
       └                                        ┘ 

julia> c - cₒ
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■■■■■■■■■■■■■■■■■■■■■■ 0.3333333333333333  
   CH₄ ┤■■■■■■■■■■■■ 0.16666666666666666          
   N₂O ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.5   
       └                                        ┘ 

julia> c ⋅ cₒ
3.7554028908352994

julia> norm(c)
2.1432393747688687

julia> aitchison(c, cₒ) # Aitchison distance
0.7856640352007868

More complex functions can be defined in terms of these operations. For example, the function below defines the composition line passing through cₒ in the direction of c:

julia> f(λ) = cₒ + λ*c
f (generic function with 1 method)

Finally, two compositions are considered to be equal when their closure is approximately equal:

julia> c == c
true

julia> c == cₒ
false

Log-ratio transformations

Currently, the following log-ratio transformations are implemented:

julia> alr(c)
2-element StaticArrays.SArray{Tuple{2},Float64,1,2} with indices SOneTo(2):
  1.8971199848858813
 -1.0986122886681096

julia> clr(c)
3-element StaticArrays.SArray{Tuple{3},Float64,1,3} with indices SOneTo(3):
  1.6309507528132907
 -1.3647815207407001
 -0.2661692320725906

julia> ilr(c)
2-element StaticArrays.SArray{Tuple{2},Float64,1,2} with indices SOneTo(2):
 -2.1183026052494185
 -0.3259894019031434

and their inverses alrinv, clrinv and ilrinv.

The transforms for tables are defined in the TableTransforms.jl package, they are: Closure, Remainder, ALR, CLR, ILR. These transforms are functors that can be used as follows:

julia> table |> ILR()

Arrays

It is often useful to compose D columns of a table into D-part compositions. The package provides a CoDaArray type that implements the Julia array interface and the Tables.jl interface. We recommend using the function compose(table, cols) to construct such arrays:

julia> table = (a=[1,2,3], b=[4,5,6], c=[7,8,9])
(a = [1, 2, 3], b = [4, 5, 6], c = [7, 8, 9])

julia> ctable = compose(table, (:a,:b))
(c = [7, 8, 9], CODA = Composition{2, (:a, :b)}[1.000 : 4.000, 2.000 : 5.000, 3.000 : 6.000])

julia> ctable.CODA[1]
                2-part composition
     ┌                                        ┐ 
   a ┤■■■■■■■■■ 1.0                             
   b ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 4.0   
     └                                        ┘ 

Random

D-part compositions can be created at random from a Dirichlet distribution:

julia> rand(Composition{3})
                 3-part composition
      ┌                                        ┐ 
   w1 ┤■■■■■■■■■■■■■■■■■ 0.39938229705106565     
   w2 ┤■■■■■■ 0.1491859823748656                 
   w3 ┤■■■■■■■■■■■■■■■■■■■ 0.45143172057406883   
      └                                        ┘

Plots

Separate packages are available for plotting compositional data:

References

This package is heavily influenced by Aitchison's monograph:

  • Aitchison, J. 1986. The Statistical Analysis of Compositional Data

and by other textbooks:

  • den Boogaart, K. & Tolosana-Delgado. 2011. Analyzing Compositional Data with R
  • Pawlowsky-Glahn et al. 2015. Modeling and Analysis of Compositional Data
  • Pawlowsky-Glahn, V. & Buccianti, A. 2011. Compositional Data Analysis - Theory and Applications

Notes

The unicode display of composition objects can be obtained with the following code:

using UnicodePlots
using CoDa

function Base.show(io::IO, mime::MIME"text/plain",
                   c::Composition{D,PARTS}) where {D,PARTS}
  w = CoDa.components(c)
  x = Vector{Float64}()
  p = Vector{Symbol}()
  m = Vector{Symbol}()
  for i in 1:D
    if ismissing(w[i])
      push!(m, PARTS[i])
    else
      push!(p, PARTS[i])
      push!(x, w[i])
    end
  end
  plt = barplot(p, x, title="$D-part composition")
  isempty(m) || annotate!(plt, :t, "missing: $(join(m,", "))")
  show(io, mime, plt)
end

The code is not added to the CoDa.jl package itself because the UnicodePlots.jl package has become a very heavy dependency, see UnicodePlots/issues/291.

Open Source Agenda is not affiliated with "CoDa.jl" Project. README Source: JuliaEarth/CoDa.jl
Stars
59
Open Issues
1
Last Commit
1 month ago
Repository
License
MIT

Open Source Agenda Badge

Open Source Agenda Rating