remi.mahmoud@institut-agro.fr
JDS 2025
Joint work with D. Causeur
2025-06-05
Context
Functional ANOVA (fANOVA)
Local fANOVA
Simulation study
Application to real-world data
Discussion and perspectives
The functional linear model:
For all \(t \in {\cal T}\), the following model is assumed:
\[ Y(t) = \beta_{0}(t) + \beta_{1}(t) x_{1} + \ldots + \beta_{p}(t) x_{p} + \varepsilon(t), \]
where:
Pointwise null hypothesis: \(H_{0t}: \beta(t) = 0, \ \forall t \in \cal T\)
Global null hypothesis: \(H_0 = \{H_{0t}, \ t \in \cal T \}\)
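To make the pointwise hypotheses concrete, here is a minimal sketch for the special case of a single binary covariate \(x_1\) (two conditions), with curves observed on a common grid. In that case the least-squares estimate of \(\beta_1(t)\) is the difference of group means at \(t\), and the pointwise statistic is a pooled-variance \(t\) statistic. The function name `pointwise_t_statistics` and the toy curves are assumptions made for illustration only; this is not the procedure used in the talk.

```python
import math
from statistics import mean


def pointwise_t_statistics(curves, x):
    """Pointwise estimate of beta_1(t) and its t statistic for a binary covariate.

    curves: list of curves, each a list of values on a common time grid.
    x: list of 0/1 group labels, one per curve.
    Assumes at least two curves per group and non-zero residual variance.
    """
    group0 = [c for c, xi in zip(curves, x) if xi == 0]
    group1 = [c for c, xi in zip(curves, x) if xi == 1]
    n0, n1 = len(group0), len(group1)
    estimates, t_stats = [], []
    for j in range(len(curves[0])):
        y0 = [c[j] for c in group0]
        y1 = [c[j] for c in group1]
        m0, m1 = mean(y0), mean(y1)
        # Residual sum of squares of the linear model at time point j
        rss = sum((v - m0) ** 2 for v in y0) + sum((v - m1) ** 2 for v in y1)
        sigma2 = rss / (n0 + n1 - 2)
        std_error = math.sqrt(sigma2 * (1 / n0 + 1 / n1))
        estimates.append(m1 - m0)
        t_stats.append((m1 - m0) / std_error)
    return estimates, t_stats


# Toy data: 4 curves observed at 2 time points, 2 curves per condition
curves = [[0.0, 1.0], [0.2, 1.2], [0.1, 3.0], [0.3, 3.4]]
x = [0, 0, 1, 1]
estimates, t_stats = pointwise_t_statistics(curves, x)
```

Each \(H_{0t}\) is tested separately here, so the statistics ignore the dependence between neighbouring time points.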
A decomposition to take time dependence into account
A proposal made by Sheu et al. 2016 and used in Causeur et al. 2020
An example of significantly different curves:
We conclude that the mean curves differ between conditions…
But another question arises: where does this difference come from?
Need to find a more accurate approach to conclude
In a nutshell
Inputs:
Remove all intervals from the time frame that are significant.
While the fANOVA on the remaining whole time frame is not significant:
Underlying observations generated by the model: \[Y_{ij} = z_i^T \beta_j + \varepsilon_{ij}\]
with:
Parameters:
\[ \beta(t) = \begin{cases} \mathcal{GP}(m(t), K(t,t')) & \text{if } t \in ]0.3 ; 0.4] \ \cup \ ]0.7 ; 1] \\ 0 & \text{else} \end{cases} \]
with covariance function \(K(t,t') = 0.1 \, e^{-0.3 |t - t'|}\) and
\[ m(t) = \mathbb{1}_{t \in ]0.3, 0.4] } +0.5 \times \mathbb{1}_{t \in ]0.7, 1] }\]
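As an illustration of this specification, the following sketch draws one path of \(\beta(t)\) on a regular grid. It relies on the fact that a Gaussian process with an exponential covariance \(\sigma^2 e^{-\theta |t - t'|}\) is Markov (an Ornstein-Uhlenbeck process), so it can be simulated exactly with an AR(1) recursion. The grid of 100 points on \(]0, 1]\), the seed, the function names, and the choice of a single path shared by both intervals are assumptions made for illustration, not details taken from the simulation study.

```python
import math
import random


def mean_function(t):
    # m(t) = 1 on ]0.3, 0.4], 0.5 on ]0.7, 1], 0 elsewhere
    if 0.3 < t <= 0.4:
        return 1.0
    if 0.7 < t <= 1.0:
        return 0.5
    return 0.0


def in_support(t):
    # beta(t) is non-zero only on ]0.3, 0.4] U ]0.7, 1]
    return (0.3 < t <= 0.4) or (0.7 < t <= 1.0)


def simulate_beta(grid, variance=0.1, rate=0.3, rng=None):
    """Draw beta(t) on an increasing grid, with K(t, t') = variance * exp(-rate * |t - t'|)."""
    rng = rng or random.Random()
    path = []
    previous_t, previous_value = None, None
    for t in grid:
        if previous_t is None:
            # Stationary start: N(0, variance)
            value = rng.gauss(0.0, math.sqrt(variance))
        else:
            # Exact AR(1) transition of the zero-mean process between grid points
            rho = math.exp(-rate * (t - previous_t))
            value = rho * previous_value + rng.gauss(
                0.0, math.sqrt(variance * (1.0 - rho ** 2))
            )
        path.append(value)
        previous_t, previous_value = t, value
    # Add the mean on the support and set beta to zero elsewhere
    return [mean_function(t) + z if in_support(t) else 0.0 for t, z in zip(grid, path)]


grid = [k / 100 for k in range(1, 101)]  # hypothetical grid on ]0, 1]
beta = simulate_beta(grid, rng=random.Random(2025))
```

With this grid, 40 of the 100 points fall in the support (10 in the first interval and 30 in the second).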
Lasso regression (Tibshirani):
\(\hat{\beta} = \underset{\beta}{\text{argmin}} \left( \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right)\)
Multiple testing procedure (BH correction)
Interval-wise testing procedure (Pini et al. 2018)
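For reference, the BH correction listed among the competing methods can be sketched as follows: given one p-value per time point, it rejects the \(k\) smallest p-values, where \(k\) is the largest rank such that \(p_{(k)} \le k \alpha / m\). The function name `benjamini_hochberg` and the example p-values are illustrative assumptions, not results from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected by BH."""
    m = len(p_values)
    # Indices of the p-values sorted in increasing order
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= k * alpha / m (0 if none)
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * alpha / m:
            cutoff_rank = rank
    rejected = [False] * m
    for index in order[:cutoff_rank]:
        rejected[index] = True
    return rejected


# Illustrative pointwise p-values on a grid of 10 time points
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, alpha=0.05)
```

In this toy example only the first two time points are flagged. The selected set is a collection of individual points, which then has to be read as intervals.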
The moment that every statistician waits for / fears
Fusarium head blight: fungal disease \(\Rightarrow\) substantial damage (yield / food safety / added value)
Mycotoxins produced by Fusarium (e.g. nivalenol (NIV))
Example with 320 farms sharing the same agronomic practices
Application of local fANOVA in our case:
\(Y\) is a time series of a climate variable (e.g. water excess) and \(x_1 = \mathbb{1}_{\text{NIV > legal threshold}}\)
Development version available on Github/RemiMahmoud/TrustMe
This is ongoing work
Promising but mixed simulation results
Convincing application on some real-world cases
Some issues still need to be fixed (choice of hyperparameters, cases with poor results, etc.).
Observations represented by curves or functions
Old but new: original studies by Grenander and Rao (1948 and 1952, resp.), but major contributions from Ramsay and Silverman 2005
Like neural networks, the field has grown in recent years thanks to greater data availability and computing power
Reminder of the goal (in our case):
Let \((I_i)_{i = 1,\cdots,m}\) be the collection of intervals linked to this difference
(Reminder) In our simulation study, \(m = 2, \ I_1 = ]0.3,0.4] \text{ and } I_2 = ]0.7,1]\)
Metrics used
Mean Overlap: \(\text{Mean Overlap} = \frac{1}{m} \sum_{i=1}^{m} \frac{\text{# points selected in } I_i}{|I_i|}\)
\(\text{Sensitivity} = \frac{\text{TP}}{\text{TP + FN}}\)
\(F1 = \frac{2 \times \text{Mean Overlap} \times \text{Sensitivity}}{\text{Mean Overlap} + \text{Sensitivity}}\)
Other metrics exist (mean distance to closest interval, FDR etc.)
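The three metrics above can be computed as in the sketch below. It assumes that the selection is a boolean flag per grid point, that \(|I_i|\) is counted as the number of grid points in \(I_i\), and that TP and FN are counted over the points of the true intervals. These conventions, the function name `evaluate_selection`, and the example selection are assumptions made for illustration.

```python
def evaluate_selection(grid, selected, true_intervals):
    """Mean overlap, sensitivity and F1 of a pointwise selection.

    grid: time points; selected: one boolean per time point;
    true_intervals: list of (lower, upper) pairs, read as ]lower, upper].
    """
    overlaps = []
    true_positives = 0
    false_negatives = 0
    for lower, upper in true_intervals:
        inside = [s for t, s in zip(grid, selected) if lower < t <= upper]
        hits = sum(inside)
        overlaps.append(hits / len(inside))  # share of I_i that is selected
        true_positives += hits
        false_negatives += len(inside) - hits
    mean_overlap = sum(overlaps) / len(overlaps)
    sensitivity = true_positives / (true_positives + false_negatives)
    if mean_overlap + sensitivity == 0:
        f1 = 0.0
    else:
        f1 = 2 * mean_overlap * sensitivity / (mean_overlap + sensitivity)
    return mean_overlap, sensitivity, f1


grid = [k / 100 for k in range(1, 101)]  # hypothetical grid on ]0, 1]
# Hypothetical selection: all of ]0.3, 0.4] and the first half of ]0.7, 1]
selected = [(0.3 < t <= 0.4) or (0.7 < t <= 0.85) for t in grid]
mean_overlap, sensitivity, f1 = evaluate_selection(
    grid, selected, [(0.3, 0.4), (0.7, 1.0)]
)
```

Here the first interval is fully recovered and the second only half, so the mean overlap is 0.75 while the sensitivity, which pools the points of both intervals, is 25/40 = 0.625.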
Pointwise null hypothesis: \(H_{0t}: \beta(t) = 0, \ \forall t \in \cal T\)
Global null hypothesis: \(H_0 = \{H_{0t}, \ t \in \cal T \}\)
A common approach:
But time dependence is generally not taken into account (Sheu et al. 2016) \(\Rightarrow\) increased risk of type I error!