A quick tour of StMoE

Introduction

StMoE (Skew-t Mixture-of-Experts) provides a flexible and robust modelling framework for heterogeneous data with possibly skewed, heavy-tailed distributions, possibly corrupted by atypical observations. StMoE consists of a mixture of K skew-t expert regressors (of degree p) gated by a softmax gating network (of degree q), and is parameterized by:

  • The gating network parameters: the coefficients alpha of the softmax net.
  • The experts network parameters: the location parameters (regression coefficients) beta, the scale parameters sigma, the skewness parameters lambda, and the degrees-of-freedom parameters nu.

StMoE thus generalises mixtures of (normal, skew-normal, t, and skew-t) distributions, as well as mixtures of regressions with these distributions. For example, when q = 0, we retrieve mixtures of (skew-t, t, skew-normal, or normal) regressions, and when both p = 0 and q = 0, we obtain a mixture of (skew-t, t, skew-normal, or normal) distributions. The model further reduces to a single (normal, skew-normal, t, or skew-t) distribution when only one expert is used (K = 1).
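As a sketch of the model (the exact notation in the package references may differ slightly), the StMoE density of a response y given a univariate input x combines softmax gating weights with skew-t expert densities; for q = 1 gating, as used below:

```latex
f(y \mid x; \Theta)
  = \sum_{k=1}^{K} \pi_k(x; \alpha)\,
    \mathrm{ST}\!\big(y;\ \mu_k(x; \beta_k),\ \sigma_k^2,\ \lambda_k,\ \nu_k\big),
\qquad
\pi_k(x; \alpha)
  = \frac{\exp(\alpha_{k0} + \alpha_{k1} x)}
         {\sum_{\ell=1}^{K} \exp(\alpha_{\ell 0} + \alpha_{\ell 1} x)},
```

where ST(·; μ, σ², λ, ν) denotes the skew-t density with location μ, scale σ, skewness λ and ν degrees of freedom, and μ_k(x; β_k) is a polynomial of degree p in x.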

Model estimation/learning is performed by a dedicated expectation conditional maximization (ECM) algorithm that maximizes the observed-data log-likelihood. We provide simulated examples to illustrate the use of the model in model-based clustering of heterogeneous regression data and in fitting non-linear regression functions.
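Concretely, writing π_k(x; α) for the softmax gating probabilities and ST(·; μ, σ², λ, ν) for the skew-t density of expert k, the criterion maximized by ECM is the observed-data log-likelihood (a sketch under this standard notation):

```latex
\log L(\Theta)
  = \sum_{i=1}^{n} \log \sum_{k=1}^{K}
    \pi_k(x_i; \alpha)\,
    \mathrm{ST}\!\big(y_i;\ \mu_k(x_i; \beta_k),\ \sigma_k^2,\ \lambda_k,\ \nu_k\big).
```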

This vignette was written in R Markdown, using the knitr package for production.

See help(package="meteorits") for further details, and citation("meteorits") for the references.

Application to a simulated dataset

Generate sample

library(meteorits) # for sampleUnivStMoE() and emStMoE()

n <- 500 # Size of the sample
alphak <- matrix(c(0, 8), ncol = 1) # Parameters of the gating network
betak <- matrix(c(0, -2.5, 0, 2.5), ncol = 2) # Regression coefficients of the experts
sigmak <- c(0.5, 0.5) # Standard deviations of the experts
lambdak <- c(3, 5) # Skewness parameters of the experts
nuk <- c(5, 7) # Degrees of freedom of the experts' skew-t densities
x <- seq.int(from = -1, to = 1, length.out = n) # Inputs (predictors)

# Generate sample of size n
sample <- sampleUnivStMoE(alphak = alphak, betak = betak, sigmak = sigmak, 
                          lambdak = lambdak, nuk = nuk, x = x)
y <- sample$y

Set up StMoE model parameters

K <- 2 # Number of regressors/experts
p <- 1 # Order of the polynomial regression (regressors/experts)
q <- 1 # Order of the logistic regression (gating network)

Set up EM parameters

n_tries <- 1
max_iter <- 1500
threshold <- 1e-5
verbose <- TRUE
verbose_IRLS <- FALSE

Estimation

stmoe <- emStMoE(X = x, Y = y, K = K, p = p, q = q, n_tries = n_tries,
                 max_iter = max_iter, threshold = threshold,
                 verbose = verbose, verbose_IRLS = verbose_IRLS)
## EM - StMoE: Iteration: 1 | log-likelihood: -398.43213238129
## EM - StMoE: Iteration: 2 | log-likelihood: -336.676534340666
## EM - StMoE: Iteration: 3 | log-likelihood: -331.272147678873
## EM - StMoE: Iteration: 4 | log-likelihood: -327.694558555582
## EM - StMoE: Iteration: 5 | log-likelihood: -324.757105391609
## EM - StMoE: Iteration: 6 | log-likelihood: -322.062813272894
## EM - StMoE: Iteration: 7 | log-likelihood: -319.477011257783
## EM - StMoE: Iteration: 8 | log-likelihood: -316.958338810191
## EM - StMoE: Iteration: 9 | log-likelihood: -314.506332689806
## EM - StMoE: Iteration: 10 | log-likelihood: -312.150770635552
## EM - StMoE: Iteration: 11 | log-likelihood: -309.918673539302
## EM - StMoE: Iteration: 12 | log-likelihood: -307.831161405797
## EM - StMoE: Iteration: 13 | log-likelihood: -305.910059796075
## EM - StMoE: Iteration: 14 | log-likelihood: -304.168736298827
## EM - StMoE: Iteration: 15 | log-likelihood: -302.61521120093
## EM - StMoE: Iteration: 16 | log-likelihood: -301.233137154752
## EM - StMoE: Iteration: 17 | log-likelihood: -300.007250093681
## EM - StMoE: Iteration: 18 | log-likelihood: -298.929989592741
## EM - StMoE: Iteration: 19 | log-likelihood: -297.985398411848
## EM - StMoE: Iteration: 20 | log-likelihood: -297.159889865796
## EM - StMoE: Iteration: 21 | log-likelihood: -296.439329918919
## EM - StMoE: Iteration: 22 | log-likelihood: -295.81377318565
## EM - StMoE: Iteration: 23 | log-likelihood: -295.272843610563
## EM - StMoE: Iteration: 24 | log-likelihood: -294.805301515864
## EM - StMoE: Iteration: 25 | log-likelihood: -294.396951634114
## EM - StMoE: Iteration: 26 | log-likelihood: -294.03865118935
## EM - StMoE: Iteration: 27 | log-likelihood: -293.724199906059
## EM - StMoE: Iteration: 28 | log-likelihood: -293.447294858881
## EM - StMoE: Iteration: 29 | log-likelihood: -293.203227116674
## EM - StMoE: Iteration: 30 | log-likelihood: -292.987490686194
## EM - StMoE: Iteration: 31 | log-likelihood: -292.796733860318
## EM - StMoE: Iteration: 32 | log-likelihood: -292.628313952162
## EM - StMoE: Iteration: 33 | log-likelihood: -292.480241418299
## EM - StMoE: Iteration: 34 | log-likelihood: -292.349771257304
## EM - StMoE: Iteration: 35 | log-likelihood: -292.234661134085
## EM - StMoE: Iteration: 36 | log-likelihood: -292.132800735838
## EM - StMoE: Iteration: 37 | log-likelihood: -292.042889967497
## EM - StMoE: Iteration: 38 | log-likelihood: -291.964611066828
## EM - StMoE: Iteration: 39 | log-likelihood: -291.896459623868
## EM - StMoE: Iteration: 40 | log-likelihood: -291.836957692395
## EM - StMoE: Iteration: 41 | log-likelihood: -291.784907790534
## EM - StMoE: Iteration: 42 | log-likelihood: -291.739318760895
## EM - StMoE: Iteration: 43 | log-likelihood: -291.699354898641
## EM - StMoE: Iteration: 44 | log-likelihood: -291.664586280772
## EM - StMoE: Iteration: 45 | log-likelihood: -291.634520112614
## EM - StMoE: Iteration: 46 | log-likelihood: -291.608485861599
## EM - StMoE: Iteration: 47 | log-likelihood: -291.585921455668
## EM - StMoE: Iteration: 48 | log-likelihood: -291.566579995594
## EM - StMoE: Iteration: 49 | log-likelihood: -291.550607136155
## EM - StMoE: Iteration: 50 | log-likelihood: -291.537586133563
## EM - StMoE: Iteration: 51 | log-likelihood: -291.527312062964
## EM - StMoE: Iteration: 52 | log-likelihood: -291.519227355934
## EM - StMoE: Iteration: 53 | log-likelihood: -291.512767347862
## EM - StMoE: Iteration: 54 | log-likelihood: -291.50758954475
## EM - StMoE: Iteration: 55 | log-likelihood: -291.50362504771
## EM - StMoE: Iteration: 56 | log-likelihood: -291.500637533478
## EM - StMoE: Iteration: 57 | log-likelihood: -291.498436628236

Summary

stmoe$summary()
## ------------------------------------------
## Fitted Skew t Mixture-of-Experts model
## ------------------------------------------
## 
## StMoE model with K = 2 experts:
## 
##  log-likelihood df       AIC       BIC      ICL
##       -291.4984 12 -303.4984 -328.7861 -330.052
## 
## Clustering table (Number of observations in each expert):
## 
##   1   2 
## 249 251 
## 
## Regression coefficients:
## 
##     Beta(k = 1)  Beta(k = 2)
## 1   -0.01731038 -0.008204154
## X^1  2.54462467 -2.662526958
## 
## Variances:
## 
##  Sigma2(k = 1) Sigma2(k = 2)
##      0.4699626     0.4833262

Plots

Mean curve

stmoe$plot(what = "meancurve")

Confidence regions

stmoe$plot(what = "confregions")

Clusters

stmoe$plot(what = "clusters")

Log-likelihood

stmoe$plot(what = "loglikelihood")

Application to real dataset

Load data

library(MASS)
data("mcycle")
x <- mcycle$times
y <- mcycle$accel

Set up StMoE model parameters

K <- 4 # Number of regressors/experts
p <- 2 # Order of the polynomial regression (regressors/experts)
q <- 1 # Order of the logistic regression (gating network)

Set up EM parameters

n_tries <- 1
max_iter <- 1500
threshold <- 1e-5
verbose <- TRUE
verbose_IRLS <- FALSE

Estimation

stmoe <- emStMoE(X = x, Y = y, K = K, p = p, q = q, n_tries = n_tries,
                 max_iter = max_iter, threshold = threshold,
                 verbose = verbose, verbose_IRLS = verbose_IRLS)
## EM - StMoE: Iteration: 1 | log-likelihood: -599.704686306091
## EM - StMoE: Iteration: 2 | log-likelihood: -589.177245024554
## EM - StMoE: Iteration: 3 | log-likelihood: -584.258664980734
## EM - StMoE: Iteration: 4 | log-likelihood: -579.780708380152
## EM - StMoE: Iteration: 5 | log-likelihood: -572.593937103229
## EM - StMoE: Iteration: 6 | log-likelihood: -567.533624962108
## EM - StMoE: Iteration: 7 | log-likelihood: -564.916531103629
## EM - StMoE: Iteration: 8 | log-likelihood: -563.636338240914
## EM - StMoE: Iteration: 9 | log-likelihood: -563.032972903022
## EM - StMoE: Iteration: 10 | log-likelihood: -562.708229702697
## EM - StMoE: Iteration: 11 | log-likelihood: -562.442578375562
## EM - StMoE: Iteration: 12 | log-likelihood: -562.139554992207
## EM - StMoE: Iteration: 13 | log-likelihood: -561.797347356582
## EM - StMoE: Iteration: 14 | log-likelihood: -561.51456624407
## EM - StMoE: Iteration: 15 | log-likelihood: -561.347541244204
## EM - StMoE: Iteration: 16 | log-likelihood: -561.243370031966
## EM - StMoE: Iteration: 17 | log-likelihood: -561.157438189918
## EM - StMoE: Iteration: 18 | log-likelihood: -561.076091859316
## EM - StMoE: Iteration: 19 | log-likelihood: -560.99604130374
## EM - StMoE: Iteration: 20 | log-likelihood: -560.916525911356
## EM - StMoE: Iteration: 21 | log-likelihood: -560.837227654519
## EM - StMoE: Iteration: 22 | log-likelihood: -560.758231401767
## EM - StMoE: Iteration: 23 | log-likelihood: -560.683539906393
## EM - StMoE: Iteration: 24 | log-likelihood: -560.610516088278
## EM - StMoE: Iteration: 25 | log-likelihood: -560.538884295335
## EM - StMoE: Iteration: 26 | log-likelihood: -560.468316675564
## EM - StMoE: Iteration: 27 | log-likelihood: -560.398709930566
## EM - StMoE: Iteration: 28 | log-likelihood: -560.329540180392
## EM - StMoE: Iteration: 29 | log-likelihood: -560.260433321012
## EM - StMoE: Iteration: 30 | log-likelihood: -560.191166264344
## EM - StMoE: Iteration: 31 | log-likelihood: -560.120991910242
## EM - StMoE: Iteration: 32 | log-likelihood: -560.049847078626
## EM - StMoE: Iteration: 33 | log-likelihood: -559.977164326401
## EM - StMoE: Iteration: 34 | log-likelihood: -559.902230594832
## EM - StMoE: Iteration: 35 | log-likelihood: -559.824827058357
## EM - StMoE: Iteration: 36 | log-likelihood: -559.744030517946
## EM - StMoE: Iteration: 37 | log-likelihood: -559.659360773843
## EM - StMoE: Iteration: 38 | log-likelihood: -559.569978327468
## EM - StMoE: Iteration: 39 | log-likelihood: -559.475367095079
## EM - StMoE: Iteration: 40 | log-likelihood: -559.374397439111
## EM - StMoE: Iteration: 41 | log-likelihood: -559.265781830517
## EM - StMoE: Iteration: 42 | log-likelihood: -559.147803284952
## EM - StMoE: Iteration: 43 | log-likelihood: -559.018293837663
## EM - StMoE: Iteration: 44 | log-likelihood: -558.874956937144
## EM - StMoE: Iteration: 45 | log-likelihood: -558.715281913706
## EM - StMoE: Iteration: 46 | log-likelihood: -558.536888728305
## EM - StMoE: Iteration: 47 | log-likelihood: -558.337848254137
## EM - StMoE: Iteration: 48 | log-likelihood: -558.118686858278
## EM - StMoE: Iteration: 49 | log-likelihood: -557.889086486191
## EM - StMoE: Iteration: 50 | log-likelihood: -557.685599848232
## EM - StMoE: Iteration: 51 | log-likelihood: -557.55795315765
## EM - StMoE: Iteration: 52 | log-likelihood: -557.501772239777
## EM - StMoE: Iteration: 53 | log-likelihood: -557.47857461365
## EM - StMoE: Iteration: 54 | log-likelihood: -557.467718112311
## EM - StMoE: Iteration: 55 | log-likelihood: -557.461929524426
## EM - StMoE: Iteration: 56 | log-likelihood: -557.458368069358

Summary

stmoe$summary()
## ------------------------------------------
## Fitted Skew t Mixture-of-Experts model
## ------------------------------------------
## 
## StMoE model with K = 4 experts:
## 
##  log-likelihood df       AIC       BIC     ICL
##       -557.4584 30 -587.4584 -630.8136 -630.81
## 
## Clustering table (Number of observations in each expert):
## 
##  1  2  3  4 
## 28 37 31 37 
## 
## Regression coefficients:
## 
##     Beta(k = 1) Beta(k = 2) Beta(k = 3) Beta(k = 4)
## 1   -2.85744704 1019.297923 -1806.17744 309.7337234
## X^1  0.65069122 -107.140081   111.18090 -12.8682522
## X^2 -0.06515735    2.520794    -1.66483   0.1320431
## 
## Variances:
## 
##  Sigma2(k = 1) Sigma2(k = 2) Sigma2(k = 3) Sigma2(k = 4)
##       10.83015      445.0682      572.2567      478.8128

Plots

Mean curve

stmoe$plot(what = "meancurve")

Confidence regions

stmoe$plot(what = "confregions")

Clusters

stmoe$plot(what = "clusters")

Log-likelihood

stmoe$plot(what = "loglikelihood")