Grapevine LeafTraits multisensor NIR. v2.0 standardized NIRS package: 4 spectral sources, 31 declared targets. Auto-generated from dataset_card.json (verify before publication).

| Family | Score |
|---|---|
| Integrity | 0.00 |
| Local artefacts | 1.00 |
| Noise | 0.01 |
| PCA outliers | 0.64 |
| Distance to reference | 0.47 |
| Repeatability | 0.00 |
| Baseline / shape | 0.80 |
| Multi-regime structure | 0.90 |

| Diagnostic | Score | Strength | Signals | Likely interpretation |
|---|---|---|---|---|
| Splice / detector junction (×4) | 0.74 | strong | Spike rate 1.00, Jump rate 1.00, SNR not degraded 1.00 | Discontinuity at detector junctions, local calibration, or a different probe. |
| Calibration / white reference error (×2) | 0.72 | medium | Mahalanobis / T2 1.00, Baseline/mean/area 1.00, local artefacts 1.00 | Systematic offset between campaigns, instruments, or white reference. |
| Different background (×2) | 0.68 | medium | Mahalanobis / T2 1.00, Baseline/mean/area 1.00, PCA Q 0.54 | Systematic effect of the sample support, white/black background, transflectance, or measurement environment. |
| Spectrum outside the valid domain (×2) | 0.64 | medium | Mahalanobis / T2 1.00, PCA structure 1.00, RMS/SAM to reference 0.52 | Different but physically plausible variety, species, batch, or condition. |
| Interpolation / resampling error (×1) | 0.63 | medium | Spike rate 1.00, Jump rate 1.00, SNR normal/high 1.00 | Numerical artefacts or incorrect spectral processing. |
| Probe / geometry difference (×2) | 0.59 | medium | Mahalanobis / T2 1.00, Baseline/mean/area 1.00, PCA Q 0.54 | Change in illumination, collection, angle, or probe-to-sample distance. |
| VERA25-like signature (×4) | 0.56 | medium | Spike rate 1.00, Jump rate 1.00, RMS/SAM to reference 0.61 | Possible combination of probe change and splice, amplified by geometry, background, or calibration. |
| Multi-regime dataset (×2) | 0.55 | medium | PCA structure 1.00, Mahalanobis / T2 1.00, PCA Q 0.54 | Mixture of campaigns, operators, batches, setups, or spectral sub-populations. |

| Property | Value |
|---|---|
| Wavelengths | 125 |
| Axis range | 908.1–1676 (unit not declared) |
| Mean spacing | 6.19 (unit not declared) |
| Grid | uniform |
| Observations | 2,079 |
| Value range | 0.183 – 1.14 |
| Mean range | 0.291 – 0.922 |
| Mean level | 0.6891 |
| Area | 529.6 |
| PTP | 0.6303 |
| Noise RMS | 0.00038862 |
| SNR | 1.8e+03 |
| SNR dB | 6e+01 dB |
| Dynamic range | 0.63 |
| Smoothness | 0.003526 |
| Saturated | 0.0% |
| X-outliers | 839 |
| NaN ratio | 0.00% |
| Inf count | 0 |
| Zero ratio | 0.00% |
| Spike count | 16,136 |
| Spike rate | 6.31% |
| Jump count | 3,993 |
| Jump rate | 1.55% |
| Clip fraction | 0.00% |
| Baseline slope | -0.56517 |
| Curvature RMS | 0.0034567 |
| D1 RMS | 0.015147 |
| RMS to mean | 0.028612 |
| RMS p95 | 0.069466 |
| SAM to mean | 0.018983 |
| SAM p95 | 0.047382 |
| Affine offset p95 | 0.11049 |
| Affine gain p95 Δ | 0.099355 |
| Affine residual p95 | 0.013675 |
| Xcorr lag p95 | 0 |
| PCA Q p95/median | 3.9 |
| Hotelling T2 p95/median | 4.4 |
| Mahalanobis H p95/median | 2.1 |
| Repeat groups | 0 |
| Effective rank | 1.4 |
| PCs → 95% var | 2 |
| PCs → 99% var | 3 |
| Top-10 cum. var | 100.0% |
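
The summary prints "SNR dB" with a single significant figure (6e+01 dB), which hides the actual value. The sketch below recovers it from the linear SNR of 1773.2 listed for this source in the metric table. It assumes the amplitude convention 20·log10(SNR); the card does not state the convention, but this one reproduces the printed figure whereas 10·log10 would give about 32 dB. The function name `snr_to_db` is illustrative.

```python
import math


def snr_to_db(snr: float) -> float:
    """Convert a linear SNR to decibels (assumed convention: 20 * log10)."""
    if snr <= 0:
        raise ValueError("SNR must be positive")
    return 20.0 * math.log10(snr)


snr_db = snr_to_db(1773.2)
print(f"{snr_db:.1f} dB")   # value with one decimal
print(f"{snr_db:.0e} dB")   # one-significant-figure form, as printed in the summary
```

Under this assumption the source sits near 65 dB, well above the 40 dB level that the scoring rule treats as low alert.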

| Family | Computed metric | Value | Score | Level | Dataset interpretation | Typical causes | Computation / scoring |
|---|---|---|---|---|---|---|---|
| Data integrity | NaN ratio (`integrity.nan_ratio`) | 0% | 0.00 | low | Complete spectrum | Acquisition/export error | `count(isnan(X)) / X.size`; alert = `min(1, nan_ratio / 0.05)` |
| Data integrity | Inf count (`integrity.inf_count`) | 0 | 0.00 | low | Normal | Invalid computations | `count(isinf(X))`; alert = `min(1, inf_count / 1)` |
| Data integrity | Zero ratio (`integrity.zero_ratio`) | 0% | 0.00 | low | Normal | Export, saturation | `count(X == 0) / count(finite X)`; alert = `min(1, zero_ratio / 0.05)` |
| Global amplitude | Mean reflectance (`amplitude.mean_reflectance`) | 0.68911 | 1.00 | high | Atypical value: too bright / background visible, or too dark | Background, geometry | `mean(X finite)`; alert reuses baseline/shape drift because absolute reflectance ranges are technology-dependent |
| Global amplitude | Area under curve (`amplitude.area_under_curve`) | 529.59 | 1.00 | high | Atypical value: illumination difference, or normal | Probe distance | `trapezoid(mean_spectrum, spectral_axis)`; alert reuses baseline/shape drift because area scale depends on axis and units |
| Global amplitude | Peak-to-peak, PTP (`amplitude.peak_to_peak`) | 0.63025 | 0.00 | low | High variability | Saturation | `max(mean_spectrum) - min(mean_spectrum)`; alert increases when dynamic range is abnormally flat |
| Global amplitude | Variance (`amplitude.variance`) | 0.04683 | 0.00 | low | Normal or heterogeneous | Poor contact | `var(X finite)`; alert increases when variance/dynamic range is abnormally flat |
| Noise | Noise RMS (`noise.noise_rms`) | 0.00038862 | 0.01 | low | Stable | Lamp, detector | `median MAD(second derivative) * 1.4826 / sqrt(6)`; alert = `noise_rms / signal_scale`, saturated at 5% |
| Noise | SNR (`noise.snr`) | 1773.2 | 0.00 | low | Good signal | Acquisition | `mean(abs(X)) / noise_rms`; alert decreases with SNR dB; >= 40 dB is treated as low alert |
| Noise | Bandwise SNR (`noise.bandwise_snr_min`) | 762.29 | 0.00 | low | Reliable region | Detector | `min(abs(mean_spectrum) / local second-derivative noise)`; alert decreases with worst-band SNR dB; >= 35 dB is treated as low alert |
| Local artefacts | Spike count (`artefacts.spike_count`) | 16,136 | 1.00 | high | Artefacts | Cosmic rays, splice | count of robust outliers in the second derivative; alert follows spike_rate, saturated at 1% |
| Local artefacts | Spike rate (`artefacts.spike_rate`) | 6.31% | 1.00 | high | Suspect spectrum | Interpolation | `spike_count / (n_samples * (n_features - 2))`; alert = `min(1, spike_rate / 0.01)` |
| Local artefacts | Jump count (`artefacts.jump_count`) | 3,993 | 1.00 | high | Detector splice | Splice | count of robust outliers in the first derivative; alert follows jump_rate, saturated at 1% |
| Local artefacts | Jump rate (`artefacts.jump_rate`) | 1.55% | 1.00 | high | Spectral problem | Calibration | `jump_count / (n_samples * (n_features - 1))`; alert = `min(1, jump_rate / 0.01)` |
| Local artefacts | Clip fraction (`artefacts.clip_fraction`) | 0.00077% | 0.00 | low | Normal | Saturated detector | fraction of finite cells equal to repeated min/max extrema; alert = `min(1, clip_fraction / 0.01)` |
| Spectral shape | Baseline slope (`shape.baseline_slope`) | -0.56517 | 1.00 | high | Drift | Illumination | linear slope of mean_spectrum over normalized axis; alert = `abs(slope / signal_scale)`, saturated at 0.5 |
| Spectral shape | Curvature RMS (`shape.curvature_rms`) | 0.0034567 | 0.50 | medium | Unusual shape | Background, splice | median RMS(second derivative per spectrum); alert = `curvature_rms / signal_scale`, saturated at 1% |
| Spectral shape | D1 RMS (`shape.d1_rms`) | 0.015147 | 0.44 | medium | Structured spectrum | Biology or artefact | median RMS(first derivative per spectrum); alert = `d1_rms / signal_scale`, saturated at 5% |
| Multivariate outliers | PCA Q, SPE (`outliers.pca_q_ratio`) | 3.936 | 0.49 | medium | Atypical spectrum | Artefact, mixture | `p95(Q/SPE residual) / median(Q/SPE residual)`; alert = `min(1, pca_q_ratio / 8)` |
| Multivariate outliers | Hotelling T² (`outliers.hotelling_t2_ratio`) | 4.4206 | 0.55 | medium | Extreme but consistent | Natural variability | `p95(Hotelling T2) / median(Hotelling T2)`; alert = `min(1, hotelling_t2_ratio / 8)` |
| Multivariate outliers | Mahalanobis H (`outliers.mahalanobis_h_ratio`) | 2.1025 | 0.53 | medium | Global outlier | Different domain | `p95(sqrt(T2)) / median(sqrt(T2))`; alert = `min(1, mahalanobis_h_ratio / 4)` |
| Comparison to reference | RMS to mean spectrum (`reference.rms_to_mean_spectrum_p95`) | 0.069466 | 0.40 | low | Typical | Domain shift | p95 RMS distance to dataset mean spectrum; alert = `RMS_p95 / signal_scale`, saturated at 25% |
| Comparison to reference | Spectral Angle Mapper, SAM (`reference.sam_to_mean_spectrum_p95`) | 0.047382 | 0.14 | low | Similar | Background, geometry | p95 spectral angle to dataset mean spectrum; alert = `min(1, SAM_p95 / 0.35 rad)` |
| Repeatability | RMS intra-ID (`repeatability.rms_intra_id`) | n/a | 0.00 | low | Stable | Positioning | median RMS distance to repeated-sample centroid; alert = `RMS_intra_ID / signal_scale`, saturated at 10% |
| Repeatability | SAM intra-ID (`repeatability.sam_intra_id`) | n/a | 0.00 | low | Stable | Acquisition | median SAM to repeated-sample centroid; alert = `min(1, SAM_intra_ID / 0.15 rad)` |
| Repeatability | CV intra-ID (`repeatability.cv_intra_id`) | n/a | 0.00 | low | Stable | Operator | median within-ID band CV; alert = `min(1, CV_intra_ID / 0.25)` |
| Dataset structure | PCA score density (`structure.pca_score_density`) | 33.539 | 0.98 | high | Sub-populations | Different batches | `1 / median kNN distance in PCA score space`; alert follows density_cv/profile structure complexity, not raw density alone |
| Dataset structure | Local Outlier Factor, LOF (`structure.local_outlier_factor_p95`) | 2.9542 | 0.98 | high | Isolated spectrum | Rare cases | p95 approximate LOF from PCA-score kNN distances; alert = `min(1, max(0, LOF_p95 - 1) / 2)` |
| Dataset structure | Isolation Forest score (`structure.isolation_forest_score_p95`) | 0.58117 | 0.98 | high | Atypical spectrum | Various causes | p95 IsolationForest anomaly score on PCA scores; alert follows structure complexity; raw score is implementation-dependent |
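
As a worked check of the spike and jump rows above, the rates can be recomputed from the reported counts. This is only a sketch: it takes the formulas from the "Computation / scoring" column at face value, uses the 2,079 observations and 125 wavelengths from the summary, and the helper name `rate_alert` is illustrative.

```python
def rate_alert(count: int, n_samples: int, n_positions: int,
               saturation: float = 0.01) -> tuple[float, float]:
    """Return (rate, alert) where alert = min(1, rate / saturation)."""
    rate = count / (n_samples * n_positions)
    return rate, min(1.0, rate / saturation)


n_samples, n_features = 2079, 125

# A second derivative has n_features - 2 positions, a first derivative n_features - 1.
spike_rate, spike_alert = rate_alert(16136, n_samples, n_features - 2)
jump_rate, jump_alert = rate_alert(3993, n_samples, n_features - 1)

print(f"spike rate {spike_rate:.2%}, alert {spike_alert:.2f}")
print(f"jump rate  {jump_rate:.2%}, alert {jump_alert:.2f}")
```

Both rates exceed the 1% saturation point, which is why both alerts are pinned at 1.00 even though the spike rate is roughly four times the jump rate.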

| Target | max \|r\| | axis @ max | mean \|r\| | \|r\| ≥ 0.5 |
|---|---|---|---|---|
| WQ | 0.859 | 1.4e+03 | 0.712 | 85.6% |
| rh_s | 0.739 | 1.03e+03 | 0.609 | 76.0% |
| rh_r | 0.917 | 920 | 0.806 | 100.0% |
| Tref | 0.564 | 995 | 0.304 | 8.0% |
| Tleaf | 0.274 | 908 | 0.0683 | 0.0% |
| Qamb | 0.549 | 908 | 0.396 | 2.4% |
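
The four columns of the target table can be read as summaries of the per-band Pearson correlation between spectra and one target. The sketch below shows one way to compute them in plain Python. It assumes the last column is the fraction of bands with an absolute correlation of at least 0.5; the card does not spell this out, and the function names and toy data are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_summary(spectra, target, axis, threshold=0.5):
    """Summarize |r| between each spectral band (column) and the target."""
    abs_r = [abs(pearson([row[j] for row in spectra], target))
             for j in range(len(axis))]
    best = max(range(len(axis)), key=abs_r.__getitem__)
    return {
        "max_abs_r": abs_r[best],
        "axis_at_max": axis[best],
        "mean_abs_r": sum(abs_r) / len(abs_r),
        "frac_above": sum(r >= threshold for r in abs_r) / len(abs_r),
    }


# Toy data: 4 observations, 3 bands (hypothetical values).
axis = [900, 1000, 1100]
target = [1.0, 2.0, 3.0, 4.0]
spectra = [
    [0.25, 0.50, 0.5],
    [0.50, 0.25, 0.5],
    [0.75, 0.50, 0.5],
    [1.00, 0.25, 0.5],
]
summary = correlation_summary(spectra, target, axis)
print(summary)
```

In the toy data the first band tracks the target exactly, the second is weakly anti-correlated, and the third is constant, so only one band in three passes the 0.5 threshold.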

| Property | Value |
|---|---|
| Wavelengths | 257 |
| Axis range | 1350–2550 (unit not declared) |
| Mean spacing | 4.69 (unit not declared) |
| Grid | irregular |
| Observations | 2,079 |
| Value range | 0.0419 – 0.863 |
| Mean range | 0.0877 – 0.706 |
| Mean level | 0.354 |
| Area | 376.2 |
| PTP | 0.618 |
| Noise RMS | 0.0002045 |
| SNR | 1.7e+03 |
| SNR dB | 6e+01 dB |
| Dynamic range | 0.618 |
| Smoothness | 0.001708 |
| Saturated | 0.0% |
| X-outliers | 237 |
| NaN ratio | 0.00% |
| Inf count | 0 |
| Zero ratio | 0.00% |
| Spike count | 23,474 |
| Spike rate | 4.43% |
| Jump count | 0 |
| Jump rate | 0.00% |
| Clip fraction | 0.00% |
| Baseline slope | -0.38156 |
| Curvature RMS | 0.0017026 |
| D1 RMS | 0.010475 |
| RMS to mean | 0.028357 |
| RMS p95 | 0.08065 |
| SAM to mean | 0.022787 |
| SAM p95 | 0.054605 |
| Affine offset p95 | 0.040768 |
| Affine gain p95 Δ | 0.13414 |
| Affine residual p95 | 0.013998 |
| Xcorr lag p95 | 0 |
| PCA Q p95/median | 4.3 |
| Hotelling T2 p95/median | 8.6 |
| Mahalanobis H p95/median | 2.9 |
| Repeat groups | 0 |
| Effective rank | 1.2 |
| PCs → 95% var | 1 |
| PCs → 99% var | 3 |
| Top-10 cum. var | 100.0% |

| Family | Computed metric | Value | Score | Level | Dataset interpretation | Typical causes | Computation / scoring |
|---|---|---|---|---|---|---|---|
| Data integrity | NaN ratio (`integrity.nan_ratio`) | 0% | 0.00 | low | Complete spectrum | Acquisition/export error | `count(isnan(X)) / X.size`; alert = `min(1, nan_ratio / 0.05)` |
| Data integrity | Inf count (`integrity.inf_count`) | 0 | 0.00 | low | Normal | Invalid computations | `count(isinf(X))`; alert = `min(1, inf_count / 1)` |
| Data integrity | Zero ratio (`integrity.zero_ratio`) | 0% | 0.00 | low | Normal | Export, saturation | `count(X == 0) / count(finite X)`; alert = `min(1, zero_ratio / 0.05)` |
| Global amplitude | Mean reflectance (`amplitude.mean_reflectance`) | 0.35402 | 1.00 | high | Atypical value: too bright / background visible, or too dark | Background, geometry | `mean(X finite)`; alert reuses baseline/shape drift because absolute reflectance ranges are technology-dependent |
| Global amplitude | Area under curve (`amplitude.area_under_curve`) | 376.17 | 1.00 | high | Atypical value: illumination difference, or normal | Probe distance | `trapezoid(mean_spectrum, spectral_axis)`; alert reuses baseline/shape drift because area scale depends on axis and units |
| Global amplitude | Peak-to-peak, PTP (`amplitude.peak_to_peak`) | 0.61801 | 0.00 | low | High variability | Saturation | `max(mean_spectrum) - min(mean_spectrum)`; alert increases when dynamic range is abnormally flat |
| Global amplitude | Variance (`amplitude.variance`) | 0.03239 | 0.00 | low | Normal or heterogeneous | Poor contact | `var(X finite)`; alert increases when variance/dynamic range is abnormally flat |
| Noise | Noise RMS (`noise.noise_rms`) | 0.0002045 | 0.01 | low | Stable | Lamp, detector | `median MAD(second derivative) * 1.4826 / sqrt(6)`; alert = `noise_rms / signal_scale`, saturated at 5% |
| Noise | SNR (`noise.snr`) | 1731.1 | 0.00 | low | Good signal | Acquisition | `mean(abs(X)) / noise_rms`; alert decreases with SNR dB; >= 40 dB is treated as low alert |
| Noise | Bandwise SNR (`noise.bandwise_snr_min`) | 197.18 | 0.00 | low | Reliable region | Detector | `min(abs(mean_spectrum) / local second-derivative noise)`; alert decreases with worst-band SNR dB; >= 35 dB is treated as low alert |
| Local artefacts | Spike count (`artefacts.spike_count`) | 23,474 | 1.00 | high | Artefacts | Cosmic rays, splice | count of robust outliers in the second derivative; alert follows spike_rate, saturated at 1% |
| Local artefacts | Spike rate (`artefacts.spike_rate`) | 4.43% | 1.00 | high | Suspect spectrum | Interpolation | `spike_count / (n_samples * (n_features - 2))`; alert = `min(1, spike_rate / 0.01)` |
| Local artefacts | Jump count (`artefacts.jump_count`) | 0 | 0.00 | low | Continuous | Splice | count of robust outliers in the first derivative; alert follows jump_rate, saturated at 1% |
| Local artefacts | Jump rate (`artefacts.jump_rate`) | 0% | 0.00 | low | Normal | Calibration | `jump_count / (n_samples * (n_features - 1))`; alert = `min(1, jump_rate / 0.01)` |
| Local artefacts | Clip fraction (`artefacts.clip_fraction`) | 0.000374% | 0.00 | low | Normal | Saturated detector | fraction of finite cells equal to repeated min/max extrema; alert = `min(1, clip_fraction / 0.01)` |
| Spectral shape | Baseline slope (`shape.baseline_slope`) | -0.38156 | 1.00 | high | Drift | Illumination | linear slope of mean_spectrum over normalized axis; alert = `abs(slope / signal_scale)`, saturated at 0.5 |
| Spectral shape | Curvature RMS (`shape.curvature_rms`) | 0.0017026 | 0.28 | low | Smooth | Background, splice | median RMS(second derivative per spectrum); alert = `curvature_rms / signal_scale`, saturated at 1% |
| Spectral shape | D1 RMS (`shape.d1_rms`) | 0.010475 | 0.34 | low | Flat | Biology or artefact | median RMS(first derivative per spectrum); alert = `d1_rms / signal_scale`, saturated at 5% |
| Multivariate outliers | PCA Q, SPE (`outliers.pca_q_ratio`) | 4.2894 | 0.54 | medium | Atypical spectrum | Artefact, mixture | `p95(Q/SPE residual) / median(Q/SPE residual)`; alert = `min(1, pca_q_ratio / 8)` |
| Multivariate outliers | Hotelling T² (`outliers.hotelling_t2_ratio`) | 8.6294 | 1.00 | high | Extreme but consistent | Natural variability | `p95(Hotelling T2) / median(Hotelling T2)`; alert = `min(1, hotelling_t2_ratio / 8)` |
| Multivariate outliers | Mahalanobis H (`outliers.mahalanobis_h_ratio`) | 2.9376 | 0.73 | high | Global outlier | Different domain | `p95(sqrt(T2)) / median(sqrt(T2))`; alert = `min(1, mahalanobis_h_ratio / 4)` |
| Comparison to reference | RMS to mean spectrum (`reference.rms_to_mean_spectrum_p95`) | 0.08065 | 0.52 | medium | Different spectrum | Domain shift | p95 RMS distance to dataset mean spectrum; alert = `RMS_p95 / signal_scale`, saturated at 25% |
| Comparison to reference | Spectral Angle Mapper, SAM (`reference.sam_to_mean_spectrum_p95`) | 0.054605 | 0.16 | low | Similar | Background, geometry | p95 spectral angle to dataset mean spectrum; alert = `min(1, SAM_p95 / 0.35 rad)` |
| Repeatability | RMS intra-ID (`repeatability.rms_intra_id`) | n/a | 0.00 | low | Stable | Positioning | median RMS distance to repeated-sample centroid; alert = `RMS_intra_ID / signal_scale`, saturated at 10% |
| Repeatability | SAM intra-ID (`repeatability.sam_intra_id`) | n/a | 0.00 | low | Stable | Acquisition | median SAM to repeated-sample centroid; alert = `min(1, SAM_intra_ID / 0.15 rad)` |
| Repeatability | CV intra-ID (`repeatability.cv_intra_id`) | n/a | 0.00 | low | Stable | Operator | median within-ID band CV; alert = `min(1, CV_intra_ID / 0.25)` |
| Dataset structure | PCA score density (`structure.pca_score_density`) | 177.42 | 1.00 | high | Sub-populations | Different batches | `1 / median kNN distance in PCA score space`; alert follows density_cv/profile structure complexity, not raw density alone |
| Dataset structure | Local Outlier Factor, LOF (`structure.local_outlier_factor_p95`) | 6.133 | 1.00 | high | Isolated spectrum | Rare cases | p95 approximate LOF from PCA-score kNN distances; alert = `min(1, max(0, LOF_p95 - 1) / 2)` |
| Dataset structure | Isolation Forest score (`structure.isolation_forest_score_p95`) | 0.59134 | 1.00 | high | Atypical spectrum | Various causes | p95 IsolationForest anomaly score on PCA scores; alert follows structure complexity; raw score is implementation-dependent |
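
The three multivariate outlier scores in the table above follow the same pattern: a p95/median ratio divided by a saturation constant and capped at 1. A minimal sketch, using the ratios reported for this source and the constants from the scoring column (the helper name `ratio_alert` is illustrative):

```python
def ratio_alert(p95_over_median: float, saturation: float) -> float:
    """alert = min(1, ratio / saturation), as stated in the scoring column."""
    return min(1.0, p95_over_median / saturation)


alerts = {
    "pca_q": ratio_alert(4.2894, 8),          # PCA Q (SPE)
    "hotelling_t2": ratio_alert(8.6294, 8),   # Hotelling T2
    "mahalanobis_h": ratio_alert(2.9376, 4),  # Mahalanobis H = sqrt(T2)
}
for name, value in alerts.items():
    print(f"{name}: {value:.2f}")
```

The Hotelling T² ratio is the only one past its saturation constant, which explains why it alone reaches 1.00 in this source.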

| Target | max \|r\| | axis @ max | mean \|r\| | \|r\| ≥ 0.5 |
|---|---|---|---|---|
| WQ | 0.853 | 2.23e+03 | 0.804 | 100.0% |
| rh_s | 0.386 | 1.41e+03 | 0.283 | 0.0% |
| rh_r | 0.847 | 1.89e+03 | 0.768 | 100.0% |
| Tref | 0.776 | 1.91e+03 | 0.19 | 7.0% |
| Tleaf | 0.41 | 1.92e+03 | 0.137 | 0.0% |
| Qamb | 0.445 | 1.4e+03 | 0.338 | 0.0% |

| Property | Value |
|---|---|
| Wavelengths | 276 |
| Axis range | 908.1–2550 (unit not declared) |
| Mean spacing | 5.97 (unit not declared) |
| Grid | irregular |
| Observations | 2,079 |
| Value range | -0.0373 – 1.14 |
| Mean range | 0.0761 – 0.922 |
| Mean level | 0.4647 |
| Area | 749.2 |
| PTP | 0.8457 |
| Noise RMS | 0.00029617 |
| SNR | 1.6e+03 |
| SNR dB | 6e+01 dB |
| Dynamic range | 0.846 |
| Smoothness | 0.002792 |
| Saturated | 0.0% |
| X-outliers | 792 |
| NaN ratio | 0.00% |
| Inf count | 0 |
| Zero ratio | 0.00% |
| Spike count | 39,144 |
| Spike rate | 6.87% |
| Jump count | 2,797 |
| Jump rate | 0.49% |
| Clip fraction | 0.00% |
| Baseline slope | -0.90159 |
| Curvature RMS | 0.0027695 |
| D1 RMS | 0.012716 |
| RMS to mean | 0.029728 |
| RMS p95 | 0.072858 |
| SAM to mean | 0.034256 |
| SAM p95 | 0.080953 |
| Affine offset p95 | 0.077528 |
| Affine gain p95 Δ | 0.068242 |
| Affine residual p95 | 0.025152 |
| Xcorr lag p95 | 0 |
| PCA Q p95/median | 3.4 |
| Hotelling T2 p95/median | 3.8 |
| Mahalanobis H p95/median | 1.9 |
| Repeat groups | 0 |
| Effective rank | 1.6 |
| PCs → 95% var | 3 |
| PCs → 99% var | 4 |
| Top-10 cum. var | 99.9% |
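
The "Mean spacing" and "Grid" entries can be reproduced from the axis alone. The sketch below assumes that mean spacing is (last − first) / (n − 1) and that "uniform" means every step matches the mean within a small relative tolerance; both are assumptions, as is the tolerance value, and the irregular axis is made up for illustration.

```python
def describe_axis(axis, rel_tol=1e-3):
    """Return (mean_spacing, is_uniform) for an increasing spectral axis."""
    if len(axis) < 2:
        raise ValueError("need at least two axis points")
    steps = [b - a for a, b in zip(axis, axis[1:])]
    mean_spacing = (axis[-1] - axis[0]) / (len(axis) - 1)
    is_uniform = all(abs(s - mean_spacing) <= rel_tol * abs(mean_spacing)
                     for s in steps)
    return mean_spacing, is_uniform


# Arithmetic check against the summary: 276 points spanning 908.1 to 2550.
reported_spacing = (2550 - 908.1) / (276 - 1)
print(round(reported_spacing, 2))

uniform_axis = [400 + i for i in range(2101)]            # 400..2500, step 1
irregular_axis = [908.1, 914.3, 920.5, 1350.0, 1354.7]   # hypothetical
print(describe_axis(uniform_axis))
print(describe_axis(irregular_axis))
```

A mean spacing on an irregular grid is only an average: local steps may be much smaller or larger, which matters for derivative-based metrics such as spike and jump counts.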

| Family | Computed metric | Value | Score | Level | Dataset interpretation | Typical causes | Computation / scoring |
|---|---|---|---|---|---|---|---|
| Data integrity | NaN ratio (`integrity.nan_ratio`) | 0% | 0.00 | low | Complete spectrum | Acquisition/export error | `count(isnan(X)) / X.size`; alert = `min(1, nan_ratio / 0.05)` |
| Data integrity | Inf count (`integrity.inf_count`) | 0 | 0.00 | low | Normal | Invalid computations | `count(isinf(X))`; alert = `min(1, inf_count / 1)` |
| Data integrity | Zero ratio (`integrity.zero_ratio`) | 0% | 0.00 | low | Normal | Export, saturation | `count(X == 0) / count(finite X)`; alert = `min(1, zero_ratio / 0.05)` |
| Global amplitude | Mean reflectance (`amplitude.mean_reflectance`) | 0.46471 | 1.00 | high | Atypical value: too bright / background visible, or too dark | Background, geometry | `mean(X finite)`; alert reuses baseline/shape drift because absolute reflectance ranges are technology-dependent |
| Global amplitude | Area under curve (`amplitude.area_under_curve`) | 749.2 | 1.00 | high | Atypical value: illumination difference, or normal | Probe distance | `trapezoid(mean_spectrum, spectral_axis)`; alert reuses baseline/shape drift because area scale depends on axis and units |
| Global amplitude | Peak-to-peak, PTP (`amplitude.peak_to_peak`) | 0.8457 | 0.00 | low | High variability | Saturation | `max(mean_spectrum) - min(mean_spectrum)`; alert increases when dynamic range is abnormally flat |
| Global amplitude | Variance (`amplitude.variance`) | 0.079173 | 0.00 | low | Normal or heterogeneous | Poor contact | `var(X finite)`; alert increases when variance/dynamic range is abnormally flat |
| Noise | Noise RMS (`noise.noise_rms`) | 0.00029617 | 0.01 | low | Stable | Lamp, detector | `median MAD(second derivative) * 1.4826 / sqrt(6)`; alert = `noise_rms / signal_scale`, saturated at 5% |
| Noise | SNR (`noise.snr`) | 1569.1 | 0.00 | low | Good signal | Acquisition | `mean(abs(X)) / noise_rms`; alert decreases with SNR dB; >= 40 dB is treated as low alert |
| Noise | Bandwise SNR (`noise.bandwise_snr_min`) | 174.3 | 0.00 | low | Reliable region | Detector | `min(abs(mean_spectrum) / local second-derivative noise)`; alert decreases with worst-band SNR dB; >= 35 dB is treated as low alert |
| Local artefacts | Spike count (`artefacts.spike_count`) | 39,144 | 1.00 | high | Artefacts | Cosmic rays, splice | count of robust outliers in the second derivative; alert follows spike_rate, saturated at 1% |
| Local artefacts | Spike rate (`artefacts.spike_rate`) | 6.87% | 1.00 | high | Suspect spectrum | Interpolation | `spike_count / (n_samples * (n_features - 2))`; alert = `min(1, spike_rate / 0.01)` |
| Local artefacts | Jump count (`artefacts.jump_count`) | 2,797 | 0.49 | medium | Detector splice | Splice | count of robust outliers in the first derivative; alert follows jump_rate, saturated at 1% |
| Local artefacts | Jump rate (`artefacts.jump_rate`) | 0.489% | 0.49 | medium | Spectral problem | Calibration | `jump_count / (n_samples * (n_features - 1))`; alert = `min(1, jump_rate / 0.01)` |
| Local artefacts | Clip fraction (`artefacts.clip_fraction`) | 0.000349% | 0.00 | low | Normal | Saturated detector | fraction of finite cells equal to repeated min/max extrema; alert = `min(1, clip_fraction / 0.01)` |
| Spectral shape | Baseline slope (`shape.baseline_slope`) | -0.90159 | 1.00 | high | Drift | Illumination | linear slope of mean_spectrum over normalized axis; alert = `abs(slope / signal_scale)`, saturated at 0.5 |
| Spectral shape | Curvature RMS (`shape.curvature_rms`) | 0.0027695 | 0.33 | low | Smooth | Background, splice | median RMS(second derivative per spectrum); alert = `curvature_rms / signal_scale`, saturated at 1% |
| Spectral shape | D1 RMS (`shape.d1_rms`) | 0.012716 | 0.30 | low | Flat | Biology or artefact | median RMS(first derivative per spectrum); alert = `d1_rms / signal_scale`, saturated at 5% |
| Multivariate outliers | PCA Q, SPE (`outliers.pca_q_ratio`) | 3.4102 | 0.43 | medium | Atypical spectrum | Artefact, mixture | `p95(Q/SPE residual) / median(Q/SPE residual)`; alert = `min(1, pca_q_ratio / 8)` |
| Multivariate outliers | Hotelling T² (`outliers.hotelling_t2_ratio`) | 3.7927 | 0.47 | medium | Extreme but consistent | Natural variability | `p95(Hotelling T2) / median(Hotelling T2)`; alert = `min(1, hotelling_t2_ratio / 8)` |
| Multivariate outliers | Mahalanobis H (`outliers.mahalanobis_h_ratio`) | 1.9475 | 0.49 | medium | Global outlier | Different domain | `p95(sqrt(T2)) / median(sqrt(T2))`; alert = `min(1, mahalanobis_h_ratio / 4)` |
| Comparison to reference | RMS to mean spectrum (`reference.rms_to_mean_spectrum_p95`) | 0.072858 | 0.34 | low | Typical | Domain shift | p95 RMS distance to dataset mean spectrum; alert = `RMS_p95 / signal_scale`, saturated at 25% |
| Comparison to reference | Spectral Angle Mapper, SAM (`reference.sam_to_mean_spectrum_p95`) | 0.080953 | 0.23 | low | Similar | Background, geometry | p95 spectral angle to dataset mean spectrum; alert = `min(1, SAM_p95 / 0.35 rad)` |
| Repeatability | RMS intra-ID (`repeatability.rms_intra_id`) | n/a | 0.00 | low | Stable | Positioning | median RMS distance to repeated-sample centroid; alert = `RMS_intra_ID / signal_scale`, saturated at 10% |
| Repeatability | SAM intra-ID (`repeatability.sam_intra_id`) | n/a | 0.00 | low | Stable | Acquisition | median SAM to repeated-sample centroid; alert = `min(1, SAM_intra_ID / 0.15 rad)` |
| Repeatability | CV intra-ID (`repeatability.cv_intra_id`) | n/a | 0.00 | low | Stable | Operator | median within-ID band CV; alert = `min(1, CV_intra_ID / 0.25)` |
| Dataset structure | PCA score density (`structure.pca_score_density`) | 10.826 | 0.73 | high | Sub-populations | Different batches | `1 / median kNN distance in PCA score space`; alert follows density_cv/profile structure complexity, not raw density alone |
| Dataset structure | Local Outlier Factor, LOF (`structure.local_outlier_factor_p95`) | 2.1955 | 0.60 | medium | Isolated spectrum | Rare cases | p95 approximate LOF from PCA-score kNN distances; alert = `min(1, max(0, LOF_p95 - 1) / 2)` |
| Dataset structure | Isolation Forest score (`structure.isolation_forest_score_p95`) | 0.56305 | 0.73 | high | Atypical spectrum | Various causes | p95 IsolationForest anomaly score on PCA scores; alert follows structure complexity; raw score is implementation-dependent |
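
The SAM and LOF rows above combine a distance-like quantity with a simple capped alert. The sketch below shows a standard spectral angle computation in plain Python and then applies the two alert formulas from the scoring column to the values reported for this source. The function names are illustrative, and the spectral angle shown is the textbook definition, which may differ in detail from the package's implementation.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # clamp rounding noise
    return math.acos(cos)


def sam_alert(sam_p95: float) -> float:
    """alert = min(1, SAM_p95 / 0.35 rad)."""
    return min(1.0, sam_p95 / 0.35)


def lof_alert(lof_p95: float) -> float:
    """alert = min(1, max(0, LOF_p95 - 1) / 2)."""
    return min(1.0, max(0.0, lof_p95 - 1.0) / 2.0)


# A pure gain change leaves the angle near zero; orthogonal vectors give pi/2.
print(spectral_angle([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(spectral_angle([1.0, 0.0], [0.0, 1.0]))

sam_score = sam_alert(0.080953)
lof_score = lof_alert(2.1955)
print(f"SAM alert {sam_score:.2f}, LOF alert {lof_score:.2f}")
```

Because the angle ignores multiplicative gain, a low SAM alert alongside a high amplitude or baseline alert is consistent with spectra that differ mainly in scale rather than shape.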

| Target | max \|r\| | axis @ max | mean \|r\| | \|r\| ≥ 0.5 |
|---|---|---|---|---|
| WQ | 0.859 | 1.4e+03 | 0.693 | 82.2% |
| rh_s | 0.739 | 1.03e+03 | 0.573 | 89.1% |
| rh_r | 0.917 | 920 | 0.678 | 83.7% |
| Tref | 0.564 | 995 | 0.316 | 5.1% |
| Tleaf | 0.274 | 908 | 0.0775 | 0.0% |
| Qamb | 0.547 | 908 | 0.35 | 1.1% |

| Property | Value |
|---|---|
| Wavelengths | 2,101 |
| Axis range | 400–2,500 (unit not declared) |
| Mean spacing | 1 (unit not declared) |
| Grid | uniform |
| Observations | 2,079 |
| Value range | -0.00114 – 1.23 |
| Mean range | 0.00928 – 0.81 |
| Mean level | 0.4862 |
| Area | 1021 |
| PTP | 0.8011 |
| Noise RMS | 1.1581e-05 |
| SNR | 4.2e+04 |
| SNR dB | 9e+01 dB |
| Dynamic range | 0.801 |
| Smoothness | 0.0001406 |
| Saturated | 0.0% |
| X-outliers | 593 |
| NaN ratio | 0.00% |
| Inf count | 0 |
| Zero ratio | 0.00% |
| Spike count | 189,107 |
| Spike rate | 4.33% |
| Jump count | 192,921 |
| Jump rate | 4.42% |
| Clip fraction | 0.00% |
| Baseline slope | -0.025943 |
| Curvature RMS | 0.00011825 |
| D1 RMS | 0.0020168 |
| RMS to mean | 0.04301 |
| RMS p95 | 0.12254 |
| SAM to mean | 0.048416 |
| SAM p95 | 0.10745 |
| Affine offset p95 | 0.079443 |
| Affine gain p95 Δ | 0.1589 |
| Affine residual p95 | 0.05105 |
| Xcorr lag p95 | 0 |
| PCA Q p95/median | 3.2 |
| Hotelling T2 p95/median | 4.3 |
| Mahalanobis H p95/median | 2.1 |
| Repeat groups | 0 |
| Effective rank | 2 |
| PCs → 95% var | 3 |
| PCs → 99% var | 6 |
| Top-10 cum. var | 99.8% |
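
The "PCs → 95% var" and "PCs → 99% var" entries presumably count the smallest number of leading principal components whose cumulative explained variance reaches the threshold. A minimal sketch under that assumption, with made-up eigenvalues (the card does not list the actual ones) and an illustrative function name:

```python
def components_for_variance(eigenvalues, threshold):
    """Smallest k such that the top-k eigenvalues explain >= threshold of the total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Hypothetical PCA eigenvalues dominated by one component.
eigenvalues = [90.0, 6.0, 3.5, 0.3, 0.2]
k95 = components_for_variance(eigenvalues, 0.95)
k99 = components_for_variance(eigenvalues, 0.99)
print(k95, k99)
```

Small counts like these indicate that most spectral variance lies along a few directions, which is typical when a global effect such as baseline level or gain dominates the dataset.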
| Famille | Métrique calculée | Valeur | Score | Niveau | Interprétation dataset | Causes typiques | Calcul / scoring |
|---|---|---|---|---|---|---|---|
| Intégrité des données | NaN ratiointegrity.nan_ratio | 0% | 0.00 | faible | Spectre complet | Erreur acquisition/export | count(isnan(X)) / X.sizealert = min(1, nan_ratio / 0.05) |
| Intégrité des données | Inf countintegrity.inf_count | 0 | 0.00 | faible | Normal | Calculs invalides | count(isinf(X))alert = min(1, inf_count / 1) |
| Intégrité des données | Zero ratiointegrity.zero_ratio | 0% | 0.00 | faible | Normal | Export, saturation | count(X == 0) / count(finite X)alert = min(1, zero_ratio / 0.05) |
| Amplitude globale | Mean reflectanceamplitude.mean_reflectance | 0.48621 | 0.18 | faible | Trop sombre | Fond, géométrie | mean(X finite)alert reuses baseline/shape drift because absolute reflectance ranges are technology-dependent |
| Amplitude globale | Area under curveamplitude.area_under_curve | 1021.4 | 0.18 | faible | Normal | Distance sonde | trapezoid(mean_spectrum, spectral_axis)alert reuses baseline/shape drift because area scale depends on axis and units |
| Amplitude globale | Peak-to-peak (PTP)amplitude.peak_to_peak | 0.8011 | 0.00 | faible | Variabilité forte | Saturation | max(mean_spectrum) - min(mean_spectrum)alert increases when dynamic range is abnormally flat |
| Amplitude globale | Varianceamplitude.variance | 0.076824 | 0.00 | faible | Normal ou hétérogène | Mauvais contact | var(X finite)alert increases when variance/dynamic range is abnormally flat |
| Bruit | Noise RMSnoise.noise_rms | 1.1581e-05 | 0.00 | faible | Stable | Lampe, détecteur | median MAD(second derivative) * 1.4826 / sqrt(6)alert = noise_rms / signal_scale, saturated at 5% |
| Bruit | SNRnoise.snr | 41984 | 0.00 | faible | Bon signal | Acquisition | mean(abs(X)) / noise_rmsalert decreases with SNR dB; >=40 dB is treated as low alert |
| Bruit | Bandwise SNRnoise.bandwise_snr_min | 113.62 | 0.00 | faible | Zone fiable | Détecteur | min(abs(mean_spectrum) / local second-derivative noise)alert decreases with worst-band SNR dB; >=35 dB is treated as low alert |
| Artefacts locaux | Spike countartefacts.spike_count | 189,107 | 1.00 | fort | Artefacts | Cosmic rays, splice | count robust outliers in second derivativealert follows spike_rate, saturated at 1% |
| Artefacts locaux | Spike rateartefacts.spike_rate | 4.33% | 1.00 | fort | Spectre suspect | Interpolation | spike_count / (n_samples * (n_features - 2))alert = min(1, spike_rate / 0.01) |
| Artefacts locaux | Jump countartefacts.jump_count | 192,921 | 1.00 | fort | Raccord détecteur | Splice | count robust outliers in first derivativealert follows jump_rate, saturated at 1% |
| Artefacts locaux | Jump rateartefacts.jump_rate | 4.42% | 1.00 | fort | Problème spectral | Calibration | jump_count / (n_samples * (n_features - 1))alert = min(1, jump_rate / 0.01) |
| Artefacts locaux | Clip fractionartefacts.clip_fraction | 4.58e-05% | 0.00 | faible | Normal | Détecteur saturé | fraction of finite cells equal to repeated min/max extremaalert = min(1, clip_fraction / 0.01) |
| Forme spectrale | Baseline slopeshape.baseline_slope | -0.025943 | 0.06 | faible | Stable | Éclairement | linear slope of mean_spectrum over normalized axisalert = abs(slope / signal_scale), saturated at 0.5 |
| Forme spectrale | Curvature RMSshape.curvature_rms | 0.00011825 | 0.01 | faible | Lisse | Fond, splice | median RMS(second derivative per spectrum)alert = curvature_rms / signal_scale, saturated at 1% |
| Forme spectrale | D1 RMSshape.d1_rms | 0.0020168 | 0.05 | faible | Plat | Biologie ou artefact | median RMS(first derivative per spectrum)alert = d1_rms / signal_scale, saturated at 5% |
| Multivariate outliers | PCA Q (SPE) `outliers.pca_q_ratio` | 3.2137 | 0.40 | weak | Conforming | Artifact, mixture | p95(Q/SPE residual) / median(Q/SPE residual); alert = min(1, pca_q_ratio / 8) |
| Multivariate outliers | Hotelling T² `outliers.hotelling_t2_ratio` | 4.3113 | 0.54 | medium | Extreme but consistent | Natural variability | p95(Hotelling T2) / median(Hotelling T2); alert = min(1, hotelling_t2_ratio / 8) |
| Multivariate outliers | Mahalanobis H `outliers.mahalanobis_h_ratio` | 2.0764 | 0.52 | medium | Global outlier | Different domain | p95(sqrt(T2)) / median(sqrt(T2)); alert = min(1, mahalanobis_h_ratio / 4) |
| Comparison to reference | RMS to mean spectrum `reference.rms_to_mean_spectrum_p95` | 0.12254 | 0.61 | medium | Different spectrum | Domain shift | p95 RMS distance to dataset mean spectrum; alert = RMS_p95 / signal_scale, saturated at 25% |
| Comparison to reference | Spectral Angle Mapper (SAM) `reference.sam_to_mean_spectrum_p95` | 0.10745 | 0.31 | weak | Similar | Background, geometry | p95 spectral angle to dataset mean spectrum; alert = min(1, SAM_p95 / 0.35 rad) |
| Repeatability | RMS intra-ID `repeatability.rms_intra_id` | n/a | 0.00 | weak | Stable | Positioning | median RMS distance to repeated-sample centroid; alert = RMS_intra_ID / signal_scale, saturated at 10% |
| Repeatability | SAM intra-ID `repeatability.sam_intra_id` | n/a | 0.00 | weak | Stable | Acquisition | median SAM to repeated-sample centroid; alert = min(1, SAM_intra_ID / 0.15 rad) |
| Repeatability | CV intra-ID `repeatability.cv_intra_id` | n/a | 0.00 | weak | Stable | Operator | median within-ID band CV; alert = min(1, CV_intra_ID / 0.25) |
| Dataset structure | PCA score density `structure.pca_score_density` | 2.5152 | 0.89 | strong | Subpopulations | Different batches | 1 / median kNN distance in PCA score space; alert follows density_cv/profile structure complexity, not raw density alone |
| Dataset structure | Local Outlier Factor (LOF) `structure.local_outlier_factor_p95` | 2.6731 | 0.84 | strong | Isolated spectrum | Rare cases | p95 approximate LOF from PCA-score kNN distances; alert = min(1, max(0, LOF_p95 - 1) / 2) |
| Dataset structure | Isolation Forest score `structure.isolation_forest_score_p95` | 0.57491 | 0.89 | strong | Atypical spectrum | Various causes | p95 IsolationForest anomaly score on PCA scores; alert follows structure complexity; raw score is implementation-dependent |
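
The alert column in the rows above is a saturating transform of the raw value. As a sanity check, the sketch below recomputes five of the alert scores from the raw values in the table, using only the formulas quoted in its last column. The helper names `saturating_alert` and `lof_alert` are illustrative and are not taken from any package.

```python
def saturating_alert(value: float, saturation: float) -> float:
    """alert = min(1, value / saturation), as quoted in the table."""
    return min(1.0, value / saturation)


def lof_alert(lof_p95: float) -> float:
    """alert = min(1, max(0, LOF_p95 - 1) / 2), as quoted in the table."""
    return min(1.0, max(0.0, lof_p95 - 1.0) / 2.0)


# Raw values copied from the table rows above.
alerts = {
    "outliers.pca_q_ratio": saturating_alert(3.2137, 8),
    "outliers.hotelling_t2_ratio": saturating_alert(4.3113, 8),
    "outliers.mahalanobis_h_ratio": saturating_alert(2.0764, 4),
    "reference.sam_to_mean_spectrum_p95": saturating_alert(0.10745, 0.35),
    "structure.local_outlier_factor_p95": lof_alert(2.6731),
}

for key, alert in alerts.items():
    print(f"{key}: {alert:.2f}")
```

The printed scores (0.40, 0.54, 0.52, 0.31, 0.84) agree with the alert column. The metrics whose alert depends on `signal_scale` or on "structure complexity" cannot be recomputed this way, because those quantities are not given in the card.
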
| Target | max \|r\| | axis @ max | mean \|r\| | \|r\| ≥ 0.5 |
|---|---|---|---|---|
| WQ | 0.618 | 2,252 | 0.313 | 28.9% |
| rh_s | 0.431 | 2,186 | 0.258 | 0.0% |
| rh_r | 0.86 | 802 | 0.583 | 59.7% |
| Tref | 0.809 | 856 | 0.539 | 59.3% |
| Tleaf | 0.376 | 1,662 | 0.237 | 0.0% |
| Qamb | 0.407 | 1,667 | 0.322 | 0.0% |
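
The card does not say how the columns of this table are computed. One plausible reading is a per-band Pearson correlation between each spectral variable and the target, summarized over the spectral axis. The sketch below follows that reading on a toy array; the function names, the convention of returning 0 for a constant band, and the toy data are all assumptions. Rows with a missing target value would need to be dropped first.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 for a constant input (assumed convention)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_summary(X, y, axis, threshold=0.5):
    """Summarize |r| between every band (column of X) and the target y."""
    abs_r = [abs(pearson_r([row[j] for row in X], y)) for j in range(len(axis))]
    best = max(range(len(axis)), key=abs_r.__getitem__)
    return {
        "max_abs_r": abs_r[best],
        "axis_at_max": axis[best],
        "mean_abs_r": sum(abs_r) / len(abs_r),
        "frac_above": sum(r >= threshold for r in abs_r) / len(abs_r),
    }


# Toy data: 4 samples x 3 bands (hypothetical values).
X = [[0.1, 2.0, 5.0], [0.3, 4.0, 5.0], [0.2, 6.0, 5.0], [0.4, 8.0, 5.0]]
y = [1.0, 2.0, 3.0, 4.0]
axis = [800, 1000, 1200]

summary = correlation_summary(X, y, axis)
print(summary)
```

In the toy data the band at 1000 is proportional to the target, so it carries the maximum, while the constant band at 1200 contributes 0 to the mean.
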
| Family | Metric | What it detects | High value = | Low value = | Typical causes | Computation / score |
|---|---|---|---|---|---|---|
| Data integrity | NaN ratio | Missing data | Corrupted spectrum | Complete spectrum | Acquisition/export error | count(isnan(X)) / X.size; alert = min(1, nan_ratio / 0.05) |
| Data integrity | Inf count | Infinite values | Corruption | Normal | Invalid computations | count(isinf(X)); alert = min(1, inf_count / 1) |
| Data integrity | Zero ratio | Zero columns or cells | Truncated spectrum | Normal | Export, saturation | count(X == 0) / count(finite X); alert = min(1, zero_ratio / 0.05) |
| Global amplitude | Mean reflectance | Mean level | Too bright / background visible | Too dark | Background, geometry | mean(X finite); alert reuses baseline/shape drift because absolute reflectance ranges are technology-dependent |
| Global amplitude | Area under curve | Overall intensity | Illumination difference | Normal | Probe distance | trapezoid(mean_spectrum, spectral_axis); alert reuses baseline/shape drift because area scale depends on axis and units |
| Global amplitude | Peak-to-peak (PTP) | Dynamic range | High variability | Flat spectrum | Saturation | max(mean_spectrum) - min(mean_spectrum); alert increases when dynamic range is abnormally flat |
| Global amplitude | Variance | Spectral variability | Normal or heterogeneous | Flat spectrum | Poor contact | var(X finite); alert increases when variance/dynamic range is abnormally flat |
| Noise | Noise RMS | High-frequency noise | Noisy | Stable | Lamp, detector | median MAD(second derivative) * 1.4826 / sqrt(6); alert = noise_rms / signal_scale, saturated at 5% |
| Noise | SNR | Signal quality | Good signal | Poor signal | Acquisition | mean(abs(X)) / noise_rms; alert decreases with SNR dB; >=40 dB is treated as low alert |
| Noise | Bandwise SNR | Localized noise | Reliable region | Problematic region | Detector | min(abs(mean_spectrum) / local second-derivative noise); alert decreases with worst-band SNR dB; >=35 dB is treated as low alert |
| Local artifacts | Spike count | Narrow peaks | Artifacts | Clean spectrum | Cosmic rays, splice | count robust outliers in second derivative; alert follows spike_rate, saturated at 1% |
| Local artifacts | Spike rate | Peak density | Suspect spectrum | Normal | Interpolation | spike_count / (n_samples * (n_features - 2)); alert = min(1, spike_rate / 0.01) |
| Local artifacts | Jump count | Discontinuities | Detector splice | Continuous | Splice | count robust outliers in first derivative; alert follows jump_rate, saturated at 1% |
| Local artifacts | Jump rate | Jump frequency | Spectral problem | Normal | Calibration | jump_count / (n_samples * (n_features - 1)); alert = min(1, jump_rate / 0.01) |
| Local artifacts | Clip fraction | Saturation | Clipping | Normal | Saturated detector | fraction of finite cells equal to repeated min/max extrema; alert = min(1, clip_fraction / 0.01) |
| Spectral shape | Baseline slope | Overall slope | Drift | Stable | Illumination | linear slope of mean_spectrum over normalized axis; alert = abs(slope / signal_scale), saturated at 0.5 |
| Spectral shape | Curvature RMS | Curvature | Unusual shape | Smooth | Background, splice | median RMS(second derivative per spectrum); alert = curvature_rms / signal_scale, saturated at 1% |
| Spectral shape | D1 RMS | Local variability | Structured spectrum | Flat | Biology or artifact | median RMS(first derivative per spectrum); alert = d1_rms / signal_scale, saturated at 5% |
| Multivariate outliers | PCA Q (SPE) | Variation not explained by PCA | Atypical spectrum | Conforming | Artifact, mixture | p95(Q/SPE residual) / median(Q/SPE residual); alert = min(1, pca_q_ratio / 8) |
| Multivariate outliers | Hotelling T² | Extreme position within the PCA model | Extreme but consistent | Central | Natural variability | p95(Hotelling T2) / median(Hotelling T2); alert = min(1, hotelling_t2_ratio / 8) |
| Multivariate outliers | Mahalanobis H | Distance to the point cloud | Global outlier | Normal population | Different domain | p95(sqrt(T2)) / median(sqrt(T2)); alert = min(1, mahalanobis_h_ratio / 4) |
| Comparison to reference | RMS to mean spectrum | Mean distance | Different spectrum | Typical | Domain shift | p95 RMS distance to dataset mean spectrum; alert = RMS_p95 / signal_scale, saturated at 25% |
| Comparison to reference | Spectral Angle Mapper (SAM) | Shape difference | Different shape | Similar | Background, geometry | p95 spectral angle to dataset mean spectrum; alert = min(1, SAM_p95 / 0.35 rad) |
| Repeatability | RMS intra-ID | Reproducibility | Poor repeatability | Stable | Positioning | median RMS distance to repeated-sample centroid; alert = RMS_intra_ID / signal_scale, saturated at 10% |
| Repeatability | SAM intra-ID | Shape variation | Unstable | Stable | Acquisition | median SAM to repeated-sample centroid; alert = min(1, SAM_intra_ID / 0.15 rad) |
| Repeatability | CV intra-ID | Internal variability | Poor control | Stable | Operator | median within-ID band CV; alert = min(1, CV_intra_ID / 0.25) |
| Dataset structure | PCA score density | Clusters | Subpopulations | Homogeneous | Different batches | 1 / median kNN distance in PCA score space; alert follows density_cv/profile structure complexity, not raw density alone |
| Dataset structure | Local Outlier Factor (LOF) | Local anomaly | Isolated spectrum | Normal population | Rare cases | p95 approximate LOF from PCA-score kNN distances; alert = min(1, max(0, LOF_p95 - 1) / 2) |
| Dataset structure | Isolation Forest score | Global anomaly | Atypical spectrum | Normal | Various causes | p95 IsolationForest anomaly score on PCA scores; alert follows structure complexity; raw score is implementation-dependent |
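
The spike and jump rates in the table are described only as counts of "robust outliers" in the second and first derivative, divided by the number of derivative values. A minimal sketch of one way to compute them follows. It assumes finite differences as the derivative, a median/MAD rule with a hypothetical threshold `k = 6`, and pooling of all spectra; none of these choices is confirmed by the card.

```python
import statistics


def finite_difference(spectrum, order):
    """Apply `order` successive first differences to one spectrum."""
    values = list(spectrum)
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


def robust_outlier_rate(spectra, order, k=6.0):
    """Fraction of derivative values farther than k robust sigmas from the median.

    order=1 gives a jump rate, order=2 a spike rate. With equal-length spectra
    the denominator equals n_samples * (n_features - order).
    """
    pooled = [v for s in spectra for v in finite_difference(s, order)]
    med = statistics.median(pooled)
    mad = statistics.median(abs(v - med) for v in pooled)
    sigma = 1.4826 * mad  # MAD rescaled to a Gaussian standard deviation
    count = sum(abs(v - med) > k * sigma for v in pooled)
    return count / len(pooled)


# Toy data: one smooth ramp and the same ramp with a single spike.
# The ramp is noise-free, so the MAD is 0 and any deviation counts as an outlier.
smooth = [i / 8 for i in range(8)]
spiked = list(smooth)
spiked[4] += 1.0

spike_rate = robust_outlier_rate([smooth, spiked], order=2)
jump_rate = robust_outlier_rate([smooth, spiked], order=1)
spike_alert = min(1.0, spike_rate / 0.01)  # formula quoted in the table
jump_alert = min(1.0, jump_rate / 0.01)
print(spike_rate, jump_rate, spike_alert, jump_alert)
```

A single spike touches three second-difference values and two first-difference values, which is why both alerts saturate at 1 on such a small example. With the 1% saturation quoted in the table, an alert of 1.00 only says that at least one derivative value in a hundred is flagged.
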
| Technology | Adaptations / metrics | Targeted anomalies | Practical comment |
|---|---|---|---|
| UV-Vis 300-1000 nm | Baseline, overall slope, edge drift at 300-350 and 900-1000 nm; per-zone metrics | Stray light, bad white reference, saturation, weak signal at the extremes | The edges are often unstable; also compute edge/middle scores. |
| UV-Vis 300-1000 nm | Saturation / clipping near maximum absorbance or maximum reflectance | Clipped signal | Very important when absorption is strong. |
| UV-Vis 300-1000 nm | Red edge, position of the maximum, band ratios for plant material | Biological shift or optical artifact | Helps distinguish a real change from an acquisition problem. |
| UV-Vis 300-1000 nm | Smoothness / roughness index | High-frequency noise | Often more informative than SNR alone. |
| MIR / ATR-FTIR | ATR contact quality index: overall intensity, total area, depth of key bands | Poor crystal-sample contact | Crucial: many anomalies come from ATR contact. |
| MIR / ATR-FTIR | CO2 / H2O atmospheric bands | Poor atmospheric correction | Spurious peaks are common. |
| MIR / ATR-FTIR | Baseline curvature / rubber-band residual | Scattering, contact, baseline drift | Very useful before PCA. |
| MIR / ATR-FTIR | Peak position shift | Poor spectral alignment / calibration | Important in FTIR because small shifts matter. |
| MIR / ATR-FTIR | Band area ratios on known bands | Chemically inconsistent spectrum | Adapt per matrix: polysaccharides, proteins, lipids, etc. |
| HS-MS | Total Ion Current (TIC), Base Peak Intensity (BPI) | Weak injection, unstable ionization | MS equivalent of the overall spectral level. |
| HS-MS | Number of detected peaks | Spectrum too sparse or too noisy | Too few = poor signal; too many = noise/contamination. |
| HS-MS | Mass accuracy / m/z drift | Mass calibration problem | Fundamental in HRMS. |
| HS-MS | Retention time drift for LC/GC-MS | Chromatographic drift | Track on standards/QC pools. |
| HS-MS | Blank contamination score | Contaminants / carry-over | Compare samples against blanks. |
| HS-MS | Internal standard CV | Instrumental variability | Very robust when standards are available. |
| HS-MS | Missingness per feature | Detection instability | Crucial for filtering variables. |
| With repetitions | Intra-sample RMS | Overall repeatability | Applicable to all technologies. |
| With repetitions | Intra-sample SAM / correlation | Shape repeatability | Very useful for spectra. |
| With repetitions | Intra-sample CV per band / feature | Local repeatability | Detects unstable regions. |
| With repetitions | ICC or variance components | Share of sample vs technical variance | Very useful with several repetitions per sample. |
| With repetitions | Distance to the intra-ID centroid | Aberrant repetition | Lets you flag the bad repetition rather than the whole sample. |
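
The repetition-based checks in the last rows all rest on the same step: group spectra by sample ID, build a centroid per ID, and measure each repetition against it. The sketch below illustrates that step for the RMS distance. The function name and the choice to return `None` when no ID has at least two repetitions are assumptions of this example.

```python
import math
import statistics
from collections import defaultdict


def rms_intra_id(spectra, sample_ids):
    """Median RMS distance of repetitions to their per-ID centroid.

    Returns (median, per_id_distances). The median is None when no sample ID
    has at least two repetitions, so repeatability cannot be assessed.
    """
    groups = defaultdict(list)
    for spectrum, sample_id in zip(spectra, sample_ids):
        groups[sample_id].append(spectrum)

    distances = {}
    for sample_id, reps in groups.items():
        if len(reps) < 2:
            continue  # a single spectrum has no spread around its centroid
        centroid = [sum(col) / len(reps) for col in zip(*reps)]
        distances[sample_id] = [
            math.sqrt(sum((v - c) ** 2 for v, c in zip(rep, centroid)) / len(centroid))
            for rep in reps
        ]

    if not distances:
        return None, distances
    pooled = [d for per_id in distances.values() for d in per_id]
    return statistics.median(pooled), distances


# Toy data: sample "a" measured twice, sample "b" once.
median_rms, per_id = rms_intra_id(
    [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]], ["a", "a", "b"]
)
print(median_rms, per_id)

# With one spectrum per ID there is nothing to compare.
no_reps, _ = rms_intra_id([[1.0, 1.0], [3.0, 3.0]], ["a", "b"])
print(no_reps)
```

Keeping the per-ID distances, not only the median, is what allows a single aberrant repetition to be flagged instead of the whole sample.
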
| Potential bug family | Methods to add | What it detects | Status in the explorer |
|---|---|---|---|
| Global spectral shift | Inter-dataset correlation of mean spectra, DTW, cross-correlation, comparison of peak positions | Wavelength shift, misalignment, different interpolation | Partially computed: cross-correlation lag and dispersion of peak positions vs the mean spectrum. |
| Baseline / offset / gain | Regression of each spectrum on the mean spectrum: x = a + b·ref + residual; tracking of a, b and residual RMS | Additive offset, multiplicative effect, baseline drift | Computed in reference.affine_*. |
| Row mix-up / X-M-Y mismatch | Index checks, row hashes, ID duplication, intra-ID spectral distance, inconsistent labels | Shuffled rows, misaligned metadata, Y assigned to the wrong spectrum | Partially covered by intra-ID repeatability; index/hash checks still to be added to the canonical pipeline. |
| Information leakage / badly split repetitions | GroupKFold by sample_id vs random StratifiedKFold; audit of partitions by sample_id | Artificially good performance caused by repetitions | Requires splits and a model benchmark; not computed by the descriptive card. |
| Label bugs | Samples close in X but different in Y, confident learning, systematic FP/FN errors | Swapped Y values, data-entry errors, ambiguous classes | Requires Y and/or a model; recommended for the supervised explorer. |
| Hidden subdomains | PCA/UMAP/t-SNE + unsupervised clustering + association with dataset/Y/date/operator | Undocumented batches, campaigns, probes, backgrounds | Partially computed by the PCA/LOF structure metrics; UMAP/t-SNE are outside the static card. |
| Unknown localized artifacts | Wavelength x dataset map: mean difference, variance difference, per-wavelength KS test | Unanticipated abnormal spectral regions | To be computed at bank level when several datasets share a spectral axis. |
| Instrumental breaks | Discontinuities in derivatives, changepoint detection | Splice, detector junction, unexpected local jump | Computed via jump/spike rates; more advanced changepoint detection still to be added. |
| Spectral mixing / contamination | NMF / unmixing / convex-hull reconstruction | External component: background, plastic, soil | Not computed automatically; requires component assumptions or a large library. |
| Unstable but predictive features | Model importance vs per-variable QC instability | A model that learns an artifact rather than a biological signal | Requires a supervised model; recommended for benchmark reports. |
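
The "Baseline / offset / gain" row describes a per-spectrum regression against the mean spectrum, x = a + b ref + residual. A minimal ordinary-least-squares sketch of that idea is shown below. It is an illustration only and not the code behind `reference.affine_*`; the function name `affine_fit` is hypothetical.

```python
import math


def affine_fit(spectrum, reference):
    """Least-squares fit of spectrum = a + b * reference + residual.

    Returns (a, b, residual_rms): a is the additive offset, b the
    multiplicative gain, residual_rms what the affine model cannot explain.
    """
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_x = sum(spectrum) / n
    sxx = sum((r - mean_ref) ** 2 for r in reference)
    sxy = sum((r - mean_ref) * (x - mean_x) for r, x in zip(reference, spectrum))
    b = sxy / sxx
    a = mean_x - b * mean_ref
    residual_rms = math.sqrt(
        sum((x - (a + b * r)) ** 2 for r, x in zip(reference, spectrum)) / n
    )
    return a, b, residual_rms


# Toy data: the spectrum is the reference with gain 2 and offset 0.5.
reference = [1.0, 2.0, 3.0, 4.0]
spectrum = [2.5, 4.5, 6.5, 8.5]
a, b, residual_rms = affine_fit(spectrum, reference)
print(a, b, residual_rms)
```

A spectrum that differs from the reference only by offset and gain yields a near-zero residual, so a large residual RMS points to a shape change rather than a baseline or illumination effect.
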
| n / missing | 2,079 / 0 |
|---|---|
| Classes | 1,168 |
| Balance (entropy) | 0.97 |
| Imbalance ratio | 11 |
| Top class | 4,83831026999362 (11) |
| n / missing | 2,079 / 643 |
|---|---|
| Classes | 1,423 |
| Balance (entropy) | 1 |
| Imbalance ratio | 2 |
| Top class | 0,730133752950433 (2) |
| n / missing | 2,079 / 1,993 |
|---|---|
| Mean ± SD | 88.62 ± 15 |
| Median | 87 |
| Range | 61 – 137 |
| CV | 0.17 |
| Skew / kurtosis | 0.43 / 0.56 |
| Normal? | yes |
| n / missing | 2,079 / 1,972 |
|---|---|
| Classes | 107 |
| Balance (entropy) | 1 |
| Imbalance ratio | 1 |
| Top class | 1,639896968 (1) |
| n / missing | 2,079 / 1,974 |
|---|---|
| Classes | 102 |
| Balance (entropy) | 1 |
| Imbalance ratio | 2 |
| Top class | -0,637 (2) |
| n / missing | 2,079 / 1,975 |
|---|---|
| Classes | 104 |
| Balance (entropy) | 1 |
| Imbalance ratio | 1 |
| Top class | 183,344392601678 (1) |
| n / missing | 2,079 / 1,975 |
|---|---|
| Classes | 104 |
| Balance (entropy) | 1 |
| Imbalance ratio | 1 |
| Top class | 11,3140818666648 (1) |
| n / missing | 2,079 / 1 |
|---|---|
| Classes | 2,074 |
| Balance (entropy) | 1 |
| Imbalance ratio | 2 |
| Top class | 0,3240405 (2) |
| n / missing | 2,079 / 1 |
|---|---|
| Classes | 1,739 |
| Balance (entropy) | 0.99 |
| Imbalance ratio | 5 |
| Top class | 2,918605 (5) |
| n / missing | 2,079 / 1 |
|---|---|
| Classes | 2,072 |
| Balance (entropy) | 1 |
| Imbalance ratio | 2 |
| Top class | 0,0147365 (2) |
| n / missing | 2,079 / 1 |
|---|---|
| Classes | 2,077 |
| Balance (entropy) | 1 |
| Imbalance ratio | 2 |
| Top class | 0,0002330015 (2) |
| n / missing | 2,079 / 1 |
|---|---|
| Classes | 2,077 |
| Balance (entropy) | 1 |
| Imbalance ratio | 2 |
| Top class | 2,3235295 (2) |
| n / missing | 2,079 / 1 |
|---|---|
| Classes | 2,076 |
| Balance (entropy) | 1 |
| Imbalance ratio | 2 |
| Top class | 2,5061955 (2) |
| n / missing | 2,079 / 1 |
|---|---|
| Classes | 2,077 |
| Balance (entropy) | 1 |
| Imbalance ratio | 2 |
| Top class | 4,5957205 (2) |
| n / missing | 2,079 / 1 |
|---|---|
| Classes | 2,075 |
| Balance (entropy) | 1 |
| Imbalance ratio | 2 |
| Top class | 2,441038 (2) |
| n / missing | 2,079 / 1 |
|---|---|
| Classes | 2,077 |
| Balance (entropy) | 1 |
| Imbalance ratio | 2 |
| Top class | 22,7950135 (2) |
| n / missing | 2,079 / 1 |
|---|---|
| Classes | 2,077 |
| Balance (entropy) | 1 |
| Imbalance ratio | 2 |
| Top class | 22,859375 (2) |
| n / missing | 2,079 / 1 |
|---|---|
| Classes | 2,077 |
| Balance (entropy) | 1 |
| Imbalance ratio | 2 |
| Top class | 45,213667 (2) |
| n / missing | 2,079 / 1 |
|---|---|
| Classes | 2,076 |
| Balance (entropy) | 1 |
| Imbalance ratio | 2 |
| Top class | 110,3550185 (2) |
| n / missing | 2,079 / 1 |
|---|---|
| Classes | 2,077 |
| Balance (entropy) | 1 |
| Imbalance ratio | 2 |
| Top class | 108,7147905 (2) |
| n / missing | 2,079 / 1 |
|---|---|
| Classes | 2,075 |
| Balance (entropy) | 1 |
| Imbalance ratio | 2 |
| Top class | 0,3937105 (2) |
| n / missing | 2,079 / 1 |
|---|---|
| Classes | 2,077 |
| Balance (entropy) | 1 |
| Imbalance ratio | 2 |
| Top class | 57,317551 (2) |
| n / missing | 2,079 / 2,071 |
|---|---|
| Mean ± SD | 47.38 ± 7.56 |
| Median | 45 |
| Range | 38 – 59 |
| CV | 0.16 |
| Skew / kurtosis | 0.33 / -1.2 |
| Normal? | yes |
| n / missing | 2,079 / 2,072 |
|---|---|
| Mean ± SD | 47.43 ± 8.7 |
| Median | 48 |
| Range | 33 – 58 |
| CV | 0.183 |
| Skew / kurtosis | -0.51 / -0.25 |
| n / missing | 2,079 / 2,071 |
|---|---|
| Mean ± SD | 32.25 ± 1.16 |
| Median | 32 |
| Range | 31 – 34 |
| CV | 0.0361 |
| Skew / kurtosis | 0.81 / -0.5 |
| Normal? | yes |
| n / missing | 2,079 / 2,066 |
|---|---|
| Mean ± SD | 32.85 ± 1.41 |
| Median | 33 |
| Range | 31 – 35 |
| CV | 0.0428 |
| Skew / kurtosis | 0.11 / -1 |
| Normal? | yes |
| n / missing | 2,079 / 2,045 |
|---|---|
| Mean ± SD | 101 ± 0 |
| Median | 101 |
| Range | 101 – 101 |
| CV | 0 |
| n / missing | 2,079 / 1,045 |
|---|---|
| Mean ± SD | 913.1 ± 470 |
| Median | 842 |
| Range | 131 – 1,932 |
| CV | 0.515 |
| Skew / kurtosis | 0.42 / -0.85 |
| Normal? | no |
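
For the numeric summaries, the CV row is consistent with SD divided by mean (for the table just above, 470 / 913.1 ≈ 0.515). The sketch below computes the same kind of summary on a toy list. The use of the sample standard deviation, moment-based skewness and excess kurtosis are assumptions, since the card does not state its estimators, and the normality test behind the "Normal?" row is not reproduced.

```python
import statistics


def describe(values):
    """Mean, SD, CV, skewness and excess kurtosis of a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator); an assumption
    # Central moments with an n denominator, used for skewness and kurtosis.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "sd": sd,
        "cv": sd / mean,
        "skew": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])
print(stats)

# Cross-check against the rounded values of the table above.
print(round(470 / 913.1, 3))
```

Negative kurtosis values in the tables suggest that excess kurtosis is reported, which is why 3 is subtracted here.
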
| n / missing | 2,079 / 1 |
|---|---|
| Classes | 2,074 |
| Balance (entropy) | 1 |
| Imbalance ratio | 2 |
| Top class | 0,3240405 (2) |
| n / missing | 2,079 / 536 |
|---|---|
| Classes | 246 |
| Balance (entropy) | 1 |
| Imbalance ratio | 13 |
| Top class | V1576 (13) |
| n / missing | 2,079 / 0 |
|---|---|
| Classes | 1 |
| Balance (entropy) | 0 |
| Imbalance ratio | 1 |
| Top class | not_available_in_grapevine_source (2,079) |
| n / missing | 2,079 / 0 |
|---|---|
| Classes | 2,079 |
| Balance (entropy) | 1 |
| Imbalance ratio | 1 |
| Top class | 4005_20/07/21 (1) |
| n / missing | 2,079 / 0 |
|---|---|
| Classes | 2 |
| Balance (entropy) | 0.82 |
| Imbalance ratio | 3 |
| Top class | Greenhouse (1,543) |
| n / missing | 2,079 / 0 |
|---|---|
| Classes | 2 |
| Balance (entropy) | 0.29 |
| Imbalance ratio | 20 |
| Top class | HT (1,972) |
| n / missing | 2,079 / 0 |
|---|---|
| Classes | 2 |
| Balance (entropy) | 0.006 |
| Imbalance ratio | 2,078 |
| Top class | True (2,078) |
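
The class tables do not define "Balance (entropy)" or "Imbalance ratio". One reading that reproduces the three two-class tables above is Shannon entropy divided by log(number of classes), and the largest class count divided by the smallest. The sketch below checks that reading; the minority counts are derived as 2,079 minus the top-class count, and the function names are illustrative.

```python
import math


def balance_entropy(counts):
    """Shannon entropy of the class distribution, normalized by log(K).

    Returns 0.0 for a single class, 1.0 for perfectly balanced classes.
    """
    k = len(counts)
    if k < 2:
        return 0.0
    n = sum(counts)
    entropy = -sum(c / n * math.log(c / n) for c in counts if c > 0)
    return entropy / math.log(k)


def imbalance_ratio(counts):
    """Largest class count divided by the smallest class count."""
    return max(counts) / min(counts)


# (top-class count, remaining count) for the three two-class tables above.
tables = {
    "Greenhouse": [1543, 536],
    "HT": [1972, 107],
    "True": [2078, 1],
}
for name, counts in tables.items():
    print(name, round(balance_entropy(counts), 3), round(imbalance_ratio(counts), 1))
```

Under this reading the balance values come out at about 0.82, 0.29 and 0.006, and the ratios at about 2.9, 18.4 and 2,078, which matches the tables after rounding.
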
| Alignment level | observation |
|---|---|
| Sample id available | no |
| Samples | 2,079 |
| Observations (total) | 8,316 |
| Reps per sample | min 1 · mean 1 · max 1 |
| original | historical_splits_documented_not_applied: 2,079 documented · not applied |
|---|---|
| Contributor | GRAPEVINE LeafTraits local source files |
|---|---|
| Origin · script [manual] | source_to_standard.py — standardization script (maintainer-only) |
| Tier | private |
|---|---|
| License | LicenseRef-not-cleared |
| Permitted use | Research and benchmarking; private use only. |
| Access policy | Manual download / private-use-only per source. |
| Redistribution | Rights retained from source metadata; review before public redistribution. |
| Content version | 1.0.0 |
| Schema / protocol | 2.0 |
| Content hash | 2b36cd90db0658ae… |
| Processing hash | 22f21b9ac57b6f69… |
| Metadata hash | bda8a3d59ea36308… |

    # pip install nirs4all-datasets
    from nirs4all_datasets import get

    # private dataset: export requires a Dataverse token
    ds = get("grapevine_leaftraits_multisensor_nir", token="…")
    X, y = ds.x(), ds.y()
    print(X.shape, y.shape)

Metadata downloads are available for public datasets only. The dataset bytes are never served here — fetch them from the origin / DOI above.