MultiKDE.jl

Multivariate kernel density estimation


MultiKDE


A kernel density estimation library. What makes it different from other Julia KDE libraries:

  1. Multi-dimensional: uses a product kernel to estimate multi-dimensional kernel densities.
  2. Lazy evaluation: does not pre-compute the KDE; the density is evaluated only at the points you query, when you query them.
  3. Categorical distributions: supports categorical KDE through two kernel functions, Wang-Ryzin and Aitchison-Aitken. Wang-Ryzin is for ordered categorical data (e.g. age, amount), and Aitchison-Aitken is for unordered categorical data (e.g. sex, the face of a coin). For unordered categorical data, non-numeric values are also supported.
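
The two categorical kernels can be sketched in a few lines of plain Julia. This is a minimal illustration under stated assumptions and not MultiKDE's implementation. The names `aitchison_aitken`, `wang_ryzin` and `categorical_pdf` are hypothetical. The bandwidth λ is assumed to lie in [0, 1], and both kernels follow their standard textbook forms. The Aitchison-Aitken kernel only tests categories for equality, so it also works on non-numeric values such as strings. Wang-Ryzin, by contrast, needs the ordered levels coded as integers.

```julia
# Sketch of the two categorical kernels (standard textbook forms).
# Function names are hypothetical, not MultiKDE's API; λ ∈ [0, 1] is the bandwidth.

# Aitchison-Aitken kernel for unordered categories with `c` levels:
# weight 1 - λ on the observed category, λ / (c - 1) on each other category.
# Only equality is tested, so non-numeric values (e.g. Strings) work.
aitchison_aitken(x, xi, λ, c) = x == xi ? 1 - λ : λ / (c - 1)

# Wang-Ryzin kernel for ordered categories coded as integers:
# the weight decays geometrically with the distance |x - xi| between levels.
wang_ryzin(x::Integer, xi::Integer, λ) = x == xi ? 1 - λ : 0.5 * (1 - λ) * λ^abs(x - xi)

# Univariate categorical KDE: average kernel weight over all observations.
categorical_pdf(kernel, x, observations) =
    sum(kernel(x, xi) for xi in observations) / length(observations)

# Usage: unordered coin faces, λ = 0.2, two levels
obs = ["heads", "heads", "tails"]
k = (x, xi) -> aitchison_aitken(x, xi, 0.2, 2)
p_heads = categorical_pdf(k, "heads", obs)   # (0.8 + 0.8 + 0.2) / 3 ≈ 0.6
p_tails = categorical_pdf(k, "tails", obs)   # (0.2 + 0.2 + 0.8) / 3 ≈ 0.4
```

The two estimated probabilities sum to one. Over all integers, the Wang-Ryzin weights around a single observation also sum to one.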

Use

Example [notebook]

One-dimensional KDE


using MultiKDE
using Distributions, Random, Plots

# Simulation
bws = [0.05 0.1 0.5]
d = Normal(0, 1)
observations = rand(d, 50)
granularity_1d = 100
x = Vector(LinRange(minimum(observations), maximum(observations), granularity_1d))
ys = []
for bw in bws
    kde = KDEUniv(ContinuousDim(), bw, observations, MultiKDE.gaussian)
    y = [MultiKDE.pdf(kde, _x, keep_all=false) for _x in x]
    push!(ys, y)
end

# Plot
highest = maximum([maximum(y) for y in ys])
plot(x, ys, label=bws, fmt=:svg)
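# Draw the observations as a row of points just above the highest curve (one marker per observation)
plot!(observations, [highest+0.05 for _ in 1:length(observations)], seriestype=:scatter, label="observations", size=(900, 450), legend=:outertopright)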

1d KDE visualization
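
The lazy-evaluation idea from the feature list can be illustrated with a plain-Julia univariate Gaussian KDE. This is a sketch and not MultiKDE's `KDEUniv`. The names `LazyKDE`, `gaussian_kernel` and `density` are hypothetical, and a standard normal kernel is assumed. Construction only stores the bandwidth h and the n observations. Each call to `density` sums the kernel contributions on demand, computing f̂(x) = 1/(n h) · Σᵢ K((x - xᵢ)/h).

```julia
# Minimal lazy univariate KDE sketch. LazyKDE, gaussian_kernel and density
# are hypothetical names, not part of MultiKDE.

struct LazyKDE
    bw::Float64                    # bandwidth h
    observations::Vector{Float64}  # stored as-is; nothing is precomputed
end

# Standard normal kernel K(u) = exp(-u^2 / 2) / sqrt(2π)
gaussian_kernel(u) = exp(-u^2 / 2) / sqrt(2π)

# Evaluated only when called: f̂(x) = 1 / (n h) * Σ_i K((x - x_i) / h)
function density(kde::LazyKDE, x::Real)
    n = length(kde.observations)
    s = sum(gaussian_kernel((x - xi) / kde.bw) for xi in kde.observations)
    return s / (n * kde.bw)
end

# Usage: the bandwidths from the example above, evaluated at x = 0 on a small fixed sample
sample = [-1.2, -0.3, 0.0, 0.4, 1.5]
estimates = [density(LazyKDE(bw, sample), 0.0) for bw in (0.05, 0.1, 0.5)]
```

Smaller bandwidths produce narrower bumps around each observation and therefore a rougher estimate.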

Multi-dimensional KDE


using MultiKDE
using Distributions, Random, Plots

# Simulation
dims = [ContinuousDim(), ContinuousDim()]
bws = [[0.3, 0.3], [0.5, 0.5], [1, 1]]
mn = MvNormal([0.0, 0.0], [1.0 0.0; 0.0 1.0])  # mean vector and covariance matrix
observations = rand(mn, 50)
observations = [observations[:, i] for i in 1:size(observations, 2)]
observations_x1 = [_obs[1] for _obs in observations]
observations_x2 = [_obs[2] for _obs in observations]
granularity_2d = 100
x1_range = LinRange(minimum(observations_x1), maximum(observations_x1), granularity_2d)
x2_range = LinRange(minimum(observations_x2), maximum(observations_x2), granularity_2d)
x_grid = [[_x1, _x2] for _x1 in x1_range for _x2 in x2_range]
y_grid = []
for bw in bws
    kde = KDEMulti(dims, bw, observations)
    y = [MultiKDE.pdf(kde, _x) for _x in x_grid]
    push!(y_grid, y)
end

# Plot
highest = maximum([maximum(y) for y in y_grid])
plot([_x[1] for _x in x_grid], [_x[2] for _x in x_grid], y_grid, label=[bw[1] for bw in bws][:, :]', size=(900, 450), legend=:outertopright)
plot!(observations_x1, observations_x2, [highest for _ in 1:length(observations)], seriestype=:scatter, label="observations")

2d KDE visualization
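
For continuous dimensions, a product-kernel estimate averages over the observations. For each observation it multiplies one-dimensional kernels, one per dimension, each with its own bandwidth h_d: f̂(x) = (1/n) Σᵢ Π_d (1/h_d) K((x_d - X_{i,d})/h_d). The sketch below assumes a Gaussian kernel in every dimension. It takes observations in the same vector-of-points layout that the example builds from `rand(mn, 50)`. `product_kde_pdf` is a hypothetical name, and this is not MultiKDE's `KDEMulti` implementation.

```julia
# Product-kernel multivariate KDE sketch for continuous dimensions only.
# product_kde_pdf is a hypothetical name, not MultiKDE's API; a Gaussian
# kernel is assumed in every dimension.

gaussian_kernel(u) = exp(-u^2 / 2) / sqrt(2π)

# f̂(x) = (1/n) Σ_i Π_d (1/h_d) K((x_d - X_{i,d}) / h_d)
function product_kde_pdf(x::AbstractVector, bws::AbstractVector,
                         observations::AbstractVector{<:AbstractVector})
    total = 0.0
    for obs in observations
        term = 1.0
        # one 1-D kernel per dimension, each scaled by its own bandwidth
        for d in eachindex(x, bws, obs)
            term *= gaussian_kernel((x[d] - obs[d]) / bws[d]) / bws[d]
        end
        total += term
    end
    return total / length(observations)
end

# Usage: two 2-D observations, bandwidth 0.5 in each dimension
obs2d = [[0.0, 0.0], [1.0, -1.0]]
p = product_kde_pdf([0.5, -0.5], [0.5, 0.5], obs2d)
```

Because the kernel factorizes across dimensions, the cost of each query grows linearly with both the number of observations and the number of dimensions. Nothing has to be precomputed on a grid.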

Post

MultiKDE.jl: A Lazy Evaluation Multivariate Kernel Density Estimator

License

Licensed under the MIT License.

Contact

[email protected]

Open Source Agenda is not affiliated with "MultiKDE.jl" Project. README Source: pizhn/MultiKDE.jl