Datasets
ASTRAI datasets are grouped into three categories: Physical Simulations (A),
Real Supernova Observations (B), and GenAI Simulator (C).
Each dataset comes with a minimal open sample (CSV, 5 rows) for quick inspection and the full files in compact formats (e.g. .parquet) for research use.
FAIR artefacts (metadata, README, provenance, dictionary, and citation) are being added incrementally and are clearly marked below.
Need the full files? See “Licence & Citation” for terms and preferred citation, then follow the repository or the contact instructions where noted.
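As a quick-inspection aid, the sketch below reads a 5-row CSV sample using only the Python standard library. The column names (curve_id, time_days, luminosity) and the values are hypothetical placeholders, not the real ASTRAI schema; the data dictionary of each dataset is the authoritative source for field names and units.

```python
import csv
import io

# Hypothetical sample content: column names and values are placeholders,
# not the real ASTRAI schema (see dictionary.csv for the actual fields).
SAMPLE_CSV = """curve_id,time_days,luminosity
0,0.5,1.2e42
0,1.0,1.5e42
0,1.5,1.7e42
1,0.5,9.8e41
1,1.0,1.1e42
"""


def _to_number(value):
    """Convert a CSV cell to float when possible, otherwise keep the string."""
    try:
        return float(value)
    except ValueError:
        return value


def preview_rows(text, limit=5):
    """Return up to `limit` rows of a CSV sample as dictionaries."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        if len(rows) >= limit:
            break
        rows.append({key: _to_number(cell) for key, cell in row.items()})
    return rows


rows = preview_rows(SAMPLE_CSV)
print(len(rows), "rows; columns:", list(rows[0]))
```

For the full .parquet files, a call such as pandas.read_parquet (which needs pyarrow or fastparquet installed) is one common option, assuming pandas is available in your environment.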
A. Physical Simulations
A1. Physical models (4 parameters)
Overview. Semi-analytic explosions with four core parameters controlling energy, radius, ejecta mass, and radioactive contribution. The release includes time-dependent observables plus generation metadata.
Intended use: parameter-recovery benchmarks, uncertainty calibration, and sanity checks against semi-analytic expectations.
Primary files
Last updated: 2025-08-23
Preview (CSV)
First 5 rows from a tiny sample file for curves; the full dataset is available via the csv download above.
First 5 rows from a tiny sample file for parameters; the full dataset is available via the csv download above.
FAIR artefacts (status)
- Metadata record · metadata.json
- README · README.md
- Data dictionary · dictionary.csv
- Provenance & methods · provenance.md
- Licensing & citation · LICENCE · citation.txt
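Because the FAIR artefacts are being added incrementally, it can help to check which ones a downloaded dataset already contains. The sketch below is a minimal example that assumes the artefacts sit as flat files in one dataset directory; that layout and the function name fair_status are assumptions, while the file names are the ones listed above.

```python
from pathlib import Path
import tempfile

# File names as listed above; the flat directory layout is an assumption.
FAIR_ARTEFACTS = {
    "Metadata record": ["metadata.json"],
    "README": ["README.md"],
    "Data dictionary": ["dictionary.csv"],
    "Provenance & methods": ["provenance.md"],
    "Licensing & citation": ["LICENCE", "citation.txt"],
}


def fair_status(dataset_dir):
    """Map each FAIR artefact label to the files missing from dataset_dir."""
    root = Path(dataset_dir)
    return {
        label: [name for name in names if not (root / name).is_file()]
        for label, names in FAIR_ARTEFACTS.items()
    }


# Demo on a temporary directory that holds only two of the artefacts.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "README.md").write_text("placeholder")
    (Path(tmp) / "LICENCE").write_text("placeholder")
    status = fair_status(tmp)

for label, missing in status.items():
    state = "complete" if not missing else "missing " + ", ".join(missing)
    print(f"{label}: {state}")
```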
A2. Physical models (7 parameters)
Overview. An expanded physical grid with seven parameters to capture a broader range of progenitor and CSM scenarios. Useful for ablation studies and robustness tests when moving from compact to richer physical descriptions.
Intended use: stress-testing model generalisation and examining parameter identifiability under increased realism.
Primary files
Last updated:
Preview (CSV)
First 5 rows from a tiny sample file for curves; the full dataset is available via the .parquet download above.
First 5 rows from a tiny sample file for parameters; the full dataset is available via the .parquet download above.
FAIR artefacts (status)
- Metadata record · metadata.json
- README · README.md
- Data dictionary · dictionary.csv
- Provenance & methods · provenance.md
- Licensing & citation · LICENCE · citation.txt
B. Real Supernova Observations
B1. Real observations
Overview. Curated multi-band observations with cadence, passbands, uncertainty model, and basic quality flags. Where possible, records are mapped to standard naming and include minimal provenance notes.
Intended use: evaluating methods trained on synthetic data, domain shift studies, and end-to-end validation on real light curves.
Primary files
Last updated:
Preview (CSV)
First 5 rows from a tiny sample file.
FAIR artefacts (status)
- Metadata record · metadata.json
- README · README.md
- Data dictionary · dictionary.csv
- Provenance & methods · provenance.md
- Licensing & citation · LICENCE
C. GenAI Simulator
C1. Synthetic GenAI data
Overview. Synthetic light curves produced by the ASTRAI GenAI model to augment coverage where observations are sparse.
Paired .parquet files provide the light-curve time series; releases will ship with transparent generation settings.
Intended use: data augmentation, controlled experiments on cadence/noise, and benchmarking generalisation.
Primary files
Last updated:
Preview (CSV)
First 5 rows from a tiny sample file.
FAIR artefacts (status)
- Metadata record · metadata.json
- README · README.md
- Data dictionary · dictionary.csv
- Provenance & methods · provenance.md
- Licensing & citation · LICENCE
Source Code
Heads up: FAIR artefacts are being published in stages. Items marked “Coming soon” will appear in the next updates; “External” links point to project-controlled sources (e.g., GitHub or a data catalogue) when appropriate.
ASTRAI Core Repository
Planned contents
- Training & evaluation scripts
- Model architectures (physical + GenAI variants)
- Data loaders and preprocessing utilities
- Reproducible configs and example notebooks
Publications
This section lists journal & conference submissions, technical diagrams/notes, and selected
Journal & Conference Submissions
- A Dual AI Framework for the Automatic Characterisation of low-interacting H-rich SNe within the ASTRAI (Advanced Supernova Transient Research with Artificial Intelligence) project
Abstract Link to ScienceDirect
Scope: early results on physical-model light curves; baselines & GenAI augmentation plan.
Diagrams & Technical Notes
- Four panes show how Radius, Mass, Energy and Nickel mass shape light-curve morphology in the 4-parameter synthetic dataset.
- Mean effect of ±0.5/1.0/1.5 σ parameter variations on the LC for the 7-parameter model, with a zoom on the first 10 days after the explosion.
- Log-log variance of each parameter's contribution to the LC across epochs (7-parameter model), highlighting which features are constrained at early vs late times.
- Decomposition of the observation-mask pipeline (daylight, Moon, Sun, weather, combined mask, sampling and lag) used to emulate realistic LSST cadence.
- Synthetic LC after undersampling and corruption, fed through PPReg → LCGen to obtain the reconstructed curve.
- End-to-end schema: Original LC → Preproc/Augment → PPReg → LCGen → Generated LC, with characterisation, generation and reconstruction losses.
- Detailed view of the residual MLP blocks (Linear + LeakyReLU + BatchNorm) chosen for the PPReg (left) and LCGen (right) after the hyperparameter search.
- Single multi-target PPReg variant: one MLP regresses all physical parameters at once before LCGen.
- Ensemble PPReg variant: one specialised MLP per physical parameter, then aggregated into the LCGen input.
- Per-epoch RMSE for generation, reconstruction and reconstruction+noise pipelines, showing higher error around the peak and stable accuracy on the tail.
- Char→Gen reconstruction of SN 2000cb against observed bolometric LC, with residuals and inferred 4-parameter values.
- Char→Gen reconstruction of SN 2006au against observed bolometric LC, with residuals and inferred parameters.
- Char→Gen reconstruction of SN 2006V against observed bolometric LC, with residuals and inferred parameters.
- Char→Gen reconstruction of SN 2009E against observed bolometric LC, with residuals and inferred parameters.
- Char→Gen reconstruction of SN 2018hna against observed bolometric LC, with residuals and inferred parameters.
- Char→Gen reconstruction of SN 2020bah against observed bolometric LC, with residuals and inferred parameters.
- Char→Gen reconstruction of SN 2021aatd against observed bolometric LC, with residuals and inferred parameters.
- Char→Gen reconstruction of SN 2021wun against observed bolometric LC, with residuals and inferred parameters.
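The per-epoch RMSE diagram listed above can be read with the following computation in mind. This is only an illustrative sketch: it assumes all curves share one time grid and that errors are averaged over curves at each epoch. The quantity actually compared (for example log-luminosity or magnitude) and any normalisation are not stated here, and the function name per_epoch_rmse is hypothetical.

```python
import math


def per_epoch_rmse(true_curves, predicted_curves):
    """RMSE at each epoch, averaged over curves sampled on a shared time grid."""
    if not true_curves or len(true_curves) != len(predicted_curves):
        raise ValueError("need two non-empty, equally sized sets of curves")
    n_epochs = len(true_curves[0])
    rmse = []
    for t in range(n_epochs):
        # Squared error of every curve at epoch t.
        squared = [
            (pred[t] - true[t]) ** 2
            for true, pred in zip(true_curves, predicted_curves)
        ]
        rmse.append(math.sqrt(sum(squared) / len(squared)))
    return rmse


# Toy values (not ASTRAI data): two curves, three epochs each.
true_lcs = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
pred_lcs = [[1.0, 2.5, 3.0], [1.0, 1.5, 3.0]]
epoch_errors = per_epoch_rmse(true_lcs, pred_lcs)
print(epoch_errors)  # [0.0, 0.5, 0.0]
```

Reporting the error per epoch rather than as a single number is what makes patterns such as a larger error around the peak visible.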
Licence & Citation
To support ethical reuse and proper attribution, ASTRAI provides default licensing and citation templates for datasets and software.
Important: if a dataset or repository includes its own LICENSE, citation.txt, or DOI,
that local file overrides the defaults below. Always prefer the per-item files when present.
If you adapt the datasets or code, indicate changes and, where practical, link back to this hub so others can find the original materials.
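The override rule above (per-item files win over the defaults) can be expressed as a small lookup. The sketch below is an assumption about how one might automate it, not part of any ASTRAI tooling: resolve_terms is a hypothetical helper, it checks both LICENSE and LICENCE spellings because both appear on this page, and it does not handle per-item DOIs.

```python
from pathlib import Path
import tempfile

DEFAULT_LICENCE = "CC BY 4.0"
# Both spellings are used on this page, so both are checked.
LICENCE_FILES = ("LICENSE", "LICENCE")


def resolve_terms(item_dir, default_citation):
    """Return (licence, citation), preferring per-item files over defaults."""
    root = Path(item_dir)
    licence = DEFAULT_LICENCE
    for name in LICENCE_FILES:
        candidate = root / name
        if candidate.is_file():
            licence = candidate.read_text().strip()
            break
    citation_file = root / "citation.txt"
    if citation_file.is_file():
        citation = citation_file.read_text().strip()
    else:
        citation = default_citation
    return licence, citation


# Demo: the item ships its own citation.txt but no licence file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "citation.txt").write_text("Dataset-specific citation\n")
    licence, citation = resolve_terms(tmp, "Default template citation")

print(licence)
print(citation)
```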
Licence & how to cite
Licence (default): Creative Commons Attribution 4.0 International (CC BY 4.0). You must provide appropriate credit and indicate if changes were made. Read the licence.
Recommended citation (plain text)
Andrea Claudio Grasso, Vincenzo Del Zoppo, Alessio Mezzina, Marco Cataldo, Giuseppe Puglisi, Fabio Spampinato, Stefano Pio Cosentino, Maria Letizia Pumo, Luca Naso,
A dual AI framework for the automatic characterisation of low-interacting H-rich SNe within the ASTRAI (Advanced Supernova Transient Research with Artificial Intelligence) project,
Astronomy and Computing,
Volume 56,
2026,
101116,
ISSN 2213-1337,
https://doi.org/10.1016/j.ascom.2026.101116.
(https://www.sciencedirect.com/science/article/pii/S2213133726000582)
Abstract: Rapidly increasing data volumes from high cadence surveys, and the still larger streams anticipated from the Legacy Survey of Space and Time (LSST), pose a major challenge for the timely and accurate analysis of transient phenomena such as core-collapse Supernovae (CC SNe). Traditional approaches based on semi-analytical or hydrodynamical models, while physically grounded, hinder the large-scale characterisation of these transient events. We present a novel machine learning approach, focusing on low-interacting hydrogen-rich (H-rich) SNe. This approach encompasses two stages: a generative model produces synthetic light curves (LCs) to augment sparse or incomplete datasets, and deep learning models characterise LCs by regressing fundamental physical parameters. The characterisation stage uses an ensemble of neural networks, each specialised to regress a single parameter, improving interpretability, modularity and robustness. To account for observational systematics, we employ data augmentation strategies that mimic realistic noise and cadence conditions. Benchmarking against semi-analytical and hydrodynamical simulations demonstrates that the proposed generative method recovers LCs with error comparable to or smaller than observational noise, while achieving inference speeds orders of magnitude faster than traditional modelling pipelines (with a throughput beyond 10^5 LCs per second on a single GPU hardware). This could enable large-scale characterisation of H-rich SNe within modern and upcoming survey data streams, preparing the scientific community to handle the massive LCs flow expected from LSST.
Keywords: Supernovae; Transient astrophysics; LSST; Light curves; Deep learning; GenAI; Synthetic data
URL: https://astrai.koexai.com/resources/ Licence: CC BY 4.0.
BibTeX paper
@article{GRASSO2026101116,
title = {A dual AI framework for the automatic characterisation of low-interacting H-rich SNe within the ASTRAI (Advanced Supernova Transient Research with Artificial Intelligence) project},
journal = {Astronomy and Computing},
volume = {56},
pages = {101116},
year = {2026},
issn = {2213-1337},
doi = {10.1016/j.ascom.2026.101116},
url = {https://www.sciencedirect.com/science/article/pii/S2213133726000582},
author = {Andrea Claudio Grasso and Vincenzo {Del Zoppo} and Alessio Mezzina and Marco Cataldo and Giuseppe Puglisi and Fabio Spampinato and Stefano Pio Cosentino and Maria Letizia Pumo and Luca Naso},
keywords = {Supernovae, Transient astrophysics, LSST, Light curves, Deep learning, GenAI, Synthetic data},
abstract = {Rapidly increasing data volumes from high cadence surveys, and the still larger streams anticipated from the Legacy Survey of Space and Time (LSST), pose a major challenge for the timely and accurate analysis of transient phenomena such as core-collapse Supernovae (CC SNe). Traditional approaches based on semi-analytical or hydrodynamical models, while physically grounded, hinder the large-scale characterisation of these transient events. We present a novel machine learning approach, focusing on low-interacting hydrogen-rich (H-rich) SNe. This approach encompasses two stages: a generative model produces synthetic light curves (LCs) to augment sparse or incomplete datasets, and deep learning models characterise LCs by regressing fundamental physical parameters. The characterisation stage uses an ensemble of neural networks, each specialised to regress a single parameter, improving interpretability, modularity and robustness. To account for observational systematics, we employ data augmentation strategies that mimic realistic noise and cadence conditions. Benchmarking against semi-analytical and hydrodynamical simulations demonstrates that the proposed generative method recovers LCs with error comparable to or smaller than observational noise, while achieving inference speeds orders of magnitude faster than traditional modelling pipelines (with a throughput beyond $10^5$ LCs per second on a single GPU hardware). This could enable large-scale characterisation of H-rich SNe within modern and upcoming survey data streams, preparing the scientific community to handle the massive LCs flow expected from LSST.}
}
Tip: if a dataset provides its own citation.txt or DOI, please use that instead of the template above.
Software — Licence & how to cite
Licence (intended): GNU General Public License v3.0 (GPLv3), to be confirmed in the repository.
A copy of the licence will be included as LICENSE in the repo.
About GPL v3.
Recommended software citation (plain text)
ASTRAI Project (2025). ASTRAI Core (v0.1): Generative models and characterisation tools. Source code. URL: https://astrai.koexai.com/resources/ Licence: GPLv3.
Software BibTeX (template)
@software{astrai_core_v0_1_2025,
author = {Koexai Srl},
title = {ASTRAI Source Code},
year = {2025},
version = {0.1},
url = {https://astrai.koexai.com/resources/},
license = {GPLv3},
note = {Replace with repository URL and tag when public}
}