TMoE (t Mixture-of-Experts) provides a flexible and robust modelling framework for heterogeneous data with possibly heavy-tailed distributions and corrupted by atypical observations. TMoE consists of a mixture of K t expert regressors (polynomial of degree p) gated by a softmax gating network (polynomial of degree q), and is parameterised by:

- the gating network parameters: the alpha's of the softmax net;
- the experts network parameters: the location parameters (regression coefficients) beta's, the scale parameters sigma's, and the degree-of-freedom (robustness) parameters nu's.

TMoE thus generalises mixtures of (normal and t) distributions, as well as mixtures of regressions with these distributions. For example, when q = 0, we retrieve mixtures of (t or normal) regressions, and when both p = 0 and q = 0, we obtain a mixture of (t or normal) distributions. The model also reduces to a standard (normal or t) distribution when only a single expert is used (K = 1).

Model estimation/learning is performed by a dedicated expectation conditional maximization (ECM) algorithm that maximizes the observed-data log-likelihood. We provide simulated examples to illustrate the use of the model in model-based clustering of heterogeneous regression data and in fitting non-linear regression functions.
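Concretely, the model can be written in the standard t mixture-of-experts form below; the notation is a common convention and is assumed here rather than taken from the package documentation:

$$
f(y \mid x; \Psi) = \sum_{k=1}^{K} \pi_k(x; \alpha)\, t_{\nu_k}\!\left(y;\, \beta_k^{\top} \tilde{x}_p,\, \sigma_k^2\right),
\qquad
\pi_k(x; \alpha) = \frac{\exp(\alpha_k^{\top} \tilde{x}_q)}{\sum_{l=1}^{K} \exp(\alpha_l^{\top} \tilde{x}_q)},
$$

where $\tilde{x}_p = (1, x, \ldots, x^p)^\top$ and $\tilde{x}_q = (1, x, \ldots, x^q)^\top$ are the polynomial feature vectors of the experts and of the gating network, and $t_{\nu}(y; \mu, \sigma^2)$ denotes the t density with location $\mu$, scale $\sigma$, and $\nu$ degrees of freedom.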
It was written in R Markdown, using the knitr package for production. See `help(package = "meteorits")` for further details and the references provided by `citation("meteorits")`.
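Before running the examples, the package must be attached; fixing a seed makes the simulation reproducible (the seed value below is an assumption, since it is not shown in this excerpt):

library(meteorits) # Provides sampleUnivTMoE() and emTMoE()
set.seed(2020)     # Assumed seed; the original output was produced with an unknown seed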
n <- 500 # Size of the sample
alphak <- matrix(c(0, 8), ncol = 1) # Parameters of the gating network
betak <- matrix(c(0, -2.5, 0, 2.5), ncol = 2) # Regression coefficients of the experts
sigmak <- c(0.5, 0.5) # Standard deviations of the experts
nuk <- c(5, 7) # Degrees of freedom of the experts network t densities
x <- seq.int(from = -1, to = 1, length.out = n) # Inputs (predictors)
# Generate sample of size n
sample <- sampleUnivTMoE(alphak = alphak, betak = betak, sigmak = sigmak,
nuk = nuk, x = x)
y <- sample$y
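The fitting call that follows refers to model-structure and EM-control parameters that are not defined in this excerpt. A minimal sketch consistent with the output below (K = 2 experts of degree p = 1; the control values are assumptions, not recovered from the original run):

K <- 2            # Number of experts (matches the fitted summary below)
p <- 1            # Degree of the polynomial experts (summary shows coefficients up to X^1)
q <- 1            # Degree of the softmax gating network (assumed)
n_tries <- 1      # Number of EM runs from different initialisations (assumed)
max_iter <- 1500  # Maximum number of EM iterations (assumed)
threshold <- 1e-6 # Log-likelihood convergence threshold (assumed)
verbose <- TRUE   # Print the log-likelihood at each EM iteration
verbose_IRLS <- FALSE # Do not print the IRLS criterion within the M-step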
tmoe <- emTMoE(X = x, Y = y, K, p, q, n_tries, max_iter,
threshold, verbose, verbose_IRLS)
## EM - tMoE: Iteration: 1 | log-likelihood: -510.857949749278
## EM - tMoE: Iteration: 2 | log-likelihood: -508.030146592121
## EM - tMoE: Iteration: 3 | log-likelihood: -507.907746851524
## EM - tMoE: Iteration: 4 | log-likelihood: -507.823082907923
## EM - tMoE: Iteration: 5 | log-likelihood: -507.748229453981
## EM - tMoE: Iteration: 6 | log-likelihood: -507.68245690614
## EM - tMoE: Iteration: 7 | log-likelihood: -507.62526605652
## EM - tMoE: Iteration: 8 | log-likelihood: -507.576019967359
## EM - tMoE: Iteration: 9 | log-likelihood: -507.533992016408
## EM - tMoE: Iteration: 10 | log-likelihood: -507.498414682194
## EM - tMoE: Iteration: 11 | log-likelihood: -507.468519027618
## EM - tMoE: Iteration: 12 | log-likelihood: -507.443564290421
## EM - tMoE: Iteration: 13 | log-likelihood: -507.422858245088
## EM - tMoE: Iteration: 14 | log-likelihood: -507.405769629993
## EM - tMoE: Iteration: 15 | log-likelihood: -507.391734196566
## EM - tMoE: Iteration: 16 | log-likelihood: -507.380255944792
## EM - tMoE: Iteration: 17 | log-likelihood: -507.370904960124
## EM - tMoE: Iteration: 18 | log-likelihood: -507.36331303975
## EM - tMoE: Iteration: 19 | log-likelihood: -507.357168046792
## EM - tMoE: Iteration: 20 | log-likelihood: -507.352207694412
tmoe$summary()
## -------------------------------------
## Fitted t Mixture-of-Experts model
## -------------------------------------
##
## tMoE model with K = 2 experts:
##
## log-likelihood df AIC BIC ICL
## -507.3522 10 -517.3522 -538.4252 -538.4199
##
## Clustering table (Number of observations in each expert):
##
## 1 2
## 249 251
##
## Regression coefficients:
##
## Beta(k = 1) Beta(k = 2)
## 1 0.3023267 0.09336893
## X^1 2.7823902 -2.57016258
##
## Variances:
##
## Sigma2(k = 1) Sigma2(k = 2)
## 0.2936581 0.4678748
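The fitted object can also be visualised; the call below is a sketch assuming the plot method exposed by meteorits fitted-model objects:

tmoe$plot()

A second model is then fitted with a different structure. The data-preparation step is not shown in this excerpt; the summary below (K = 4 experts, coefficients up to X^2, and 133 observations in the clustering table) indicates that x and y now hold a real dataset and that the model structure was updated along these lines:

K <- 4 # Number of experts (matches the fitted summary below)
p <- 2 # Degree of the polynomial experts (summary shows coefficients up to X^2)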
tmoe <- emTMoE(X = x, Y = y, K, p, q, n_tries, max_iter,
threshold, verbose, verbose_IRLS)
## EM - tMoE: Iteration: 1 | log-likelihood: -584.092978139093
## EM - tMoE: Iteration: 2 | log-likelihood: -582.848533732287
## EM - tMoE: Iteration: 3 | log-likelihood: -582.016208933981
## EM - tMoE: Iteration: 4 | log-likelihood: -579.12531689739
## EM - tMoE: Iteration: 5 | log-likelihood: -570.419734564476
## EM - tMoE: Iteration: 6 | log-likelihood: -563.030795041062
## EM - tMoE: Iteration: 7 | log-likelihood: -559.923839159163
## EM - tMoE: Iteration: 8 | log-likelihood: -559.147698399153
## EM - tMoE: Iteration: 9 | log-likelihood: -558.52750442801
## EM - tMoE: Iteration: 10 | log-likelihood: -557.787943206678
## EM - tMoE: Iteration: 11 | log-likelihood: -556.922438762561
## EM - tMoE: Iteration: 12 | log-likelihood: -555.939799895987
## EM - tMoE: Iteration: 13 | log-likelihood: -554.914309513033
## EM - tMoE: Iteration: 14 | log-likelihood: -553.988446366955
## EM - tMoE: Iteration: 15 | log-likelihood: -553.217918445533
## EM - tMoE: Iteration: 16 | log-likelihood: -552.582501953256
## EM - tMoE: Iteration: 17 | log-likelihood: -552.080571089671
## EM - tMoE: Iteration: 18 | log-likelihood: -551.710901038979
## EM - tMoE: Iteration: 19 | log-likelihood: -551.454878088029
## EM - tMoE: Iteration: 20 | log-likelihood: -551.284525267856
## EM - tMoE: Iteration: 21 | log-likelihood: -551.173587681486
## EM - tMoE: Iteration: 22 | log-likelihood: -551.101927775332
## EM - tMoE: Iteration: 23 | log-likelihood: -551.055647115517
## EM - tMoE: Iteration: 24 | log-likelihood: -551.025630768823
## EM - tMoE: Iteration: 25 | log-likelihood: -551.00601969915
## EM - tMoE: Iteration: 26 | log-likelihood: -550.993086421861
## EM - tMoE: Iteration: 27 | log-likelihood: -550.984461870635
## EM - tMoE: Iteration: 28 | log-likelihood: -550.978636093769
## EM - tMoE: Iteration: 29 | log-likelihood: -550.974642179188
tmoe$summary()
## -------------------------------------
## Fitted t Mixture-of-Experts model
## -------------------------------------
##
## tMoE model with K = 4 experts:
##
## log-likelihood df AIC BIC ICL
## -550.9746 26 -576.9746 -614.5492 -614.5453
##
## Clustering table (Number of observations in each expert):
##
## 1 2 3 4
## 28 37 31 37
##
## Regression coefficients:
##
## Beta(k = 1) Beta(k = 2) Beta(k = 3) Beta(k = 4)
## 1 -1.050055969 1010.20912 -1800.37898 301.1413005
## X^1 -0.101448548 -105.87776 110.73788 -12.5201956
## X^2 -0.008690962 2.48639 -1.65681 0.1284768
##
## Variances:
##
## Sigma2(k = 1) Sigma2(k = 2) Sigma2(k = 3) Sigma2(k = 4)
## 1.654173 438.9931 578.4052 524.2891
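Beyond the printed summary, the fitted object also exposes its estimates and statistics directly. The field names below follow the pattern used across the package's model objects and are assumptions rather than documented guarantees:

tmoe$plot()         # Visualise the fitted mean curve and the expert components
tmoe$stat$BIC       # Value of the Bayesian information criterion (assumed field)
head(tmoe$stat$tik) # Posterior probabilities of cluster membership (assumed field)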