
Anomaly Detection at Scale: From 30 Million to 1 Billion Sources
Abstract: Astrophysical anomalies – objects with unusual features, morphologies, or properties – are critical for understanding diverse processes, including hierarchical assembly through mergers, environmental effects on galaxies, and cosmological effects such as lensing. To identify such rare objects, we developed AnomalyMatch, a semi-supervised machine learning method that has been applied successfully across multiple scales: discovering 57 gravitational lenses among 600,000 JWST sources, 61 jellyfish galaxies among 380,000 Euclid Q1 sources, and ~1,000 new anomalies among 99.6 million sources in the Hubble Legacy Archive. We now aim to apply this approach to Euclid’s first data release (DR1), where the survey area expands from 63.1 to ~2,000 square degrees and source counts grow from ~30 million to ~1 billion. At this scale, conventional image creation and storage methods become impractical. We therefore introduce Cutana, a parallelized, memory-aware cutout creator that generates thousands of images per second. We demonstrate how Cutana and AnomalyMatch work together to enable anomaly detection at unprecedented scales, and we present our planned projects for Euclid DR1.