
Euclid Q1 Merged Objects HATS Catalog: Introduction

This tutorial is an introduction to the content and format of the Euclid Q1 Merged Objects HATS Catalog. Later tutorials in this series will show how to load quality samples. See Euclid Tutorial Notebooks: Catalogs for a list of tutorials in this series.

Learning Goals

In this tutorial, we will:

Load the Parquet schema of the Merged Objects catalog from cloud storage.

Explore the catalog contents, including the 14 joined Euclid Q1 tables and the additional columns IRSA added.

Find the HEALPix pixels that cover each Euclid Deep Field.

Perform a basic query using a PyArrow dataset filter.

1. Introduction

The Euclid Q1 catalogs were derived from Euclid photometry and spectroscopy, taken by the Visible Camera (VIS) and the Near-Infrared Spectrometer and Photometer (NISP), and from photometry taken by other ground-based instruments. The data include several flux measurements per band, several redshift estimates, several morphology parameters, etc. Each was derived for different science goals using different algorithms or configurations.

The Euclid Q1 Merged Objects HATS Catalog was produced by IRSA by joining 14 of the original catalogs on object ID (column: object_id). Following the Hierarchical Adaptive Tiling Scheme (HATS) framework, the data were then partitioned spatially (by right ascension and declination) and written as an Apache Parquet dataset.
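To illustrate the join conceptually, here is a minimal sketch with two toy tables sharing object_id (the column names and values are made up for illustration; the actual catalogs have hundreds of columns each):

```python
import pandas as pd

# Toy stand-ins for two of the 14 source tables, with columns prefixed
# by their table of origin (values are invented for illustration).
mer = pd.DataFrame({"object_id": [1, 2], "mer_flux_vis": [10.0, 20.0]})
phz = pd.DataFrame({"object_id": [1, 2], "phz_phz_median": [0.5, 1.1]})

# Join on object_id to produce one wide row per object.
merged = mer.merge(phz, on="object_id")
```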

The catalog is served from an AWS S3 cloud storage bucket. Access is free and no credentials are required.

2. Imports

# Uncomment the next line to install dependencies if needed.
# %pip install hpgeom pandas pyarrow
import hpgeom  # Find HEALPix indexes from RA and Dec
import pyarrow.compute as pc  # Filter the catalog
import pyarrow.dataset  # Load the catalog
import pyarrow.fs  # Simple S3 filesystem pointer
import pyarrow.parquet  # Load the schema

3. Load Parquet Metadata

First we’ll load the Parquet schema (column information) of the Merged Objects catalog so we can use it in later sections. The Parquet schema is accessible from a few locations, all of which include the column names and types. Here, we load it from the _common_metadata file because it also includes the column units and descriptions.

# AWS S3 paths.
s3_bucket = "nasa-irsa-euclid-q1"
dataset_prefix = "contributed/q1/merged_objects/hats/euclid_q1_merged_objects-hats/dataset"

dataset_path = f"{s3_bucket}/{dataset_prefix}"
schema_path = f"{dataset_path}/_common_metadata"

# S3 pointer. Use `anonymous=True` to access without credentials.
s3 = pyarrow.fs.S3FileSystem(anonymous=True)
# Load the Parquet schema.
schema = pyarrow.parquet.read_schema(schema_path, filesystem=s3)

# There are almost 1600 columns in this dataset.
print(f"{len(schema)} columns in the Euclid Q1 Merged Objects catalog")
1594 columns in the Euclid Q1 Merged Objects catalog

4. Merged Objects Catalog Contents

The Merged Objects catalog contains data from 14 Euclid Q1 tables, joined on the column object_id. The tables were produced by three Euclid processing functions: MER (multi-wavelength mosaics on common spatial and pixel scales), PHZ (photometric redshifts), and SPE (spectroscopy). The subsections below include the table names, links to reference papers, URLs to the original table schemas, and examples of how the column names were transformed for the Merged Objects catalog.

The original tables’ column names are mostly in all caps. In the Merged Objects catalog and the catalogs available through IRSA’s TAP service, all column names have been lower-cased. In addition, all non-alphanumeric characters have been replaced with an underscore for compatibility with various libraries and services. Finally, the original table name has been prepended to column names in the Merged Objects catalog, both for provenance and to avoid duplicates. An example that includes all of these transformations is: E(B-V) -> physparamqso_e_b_v_.
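The transformation rules above (lower-casing, replacing non-alphanumeric characters with underscores, and prepending the table name) can be sketched as a small helper. This function is illustrative, not part of the catalog tooling:

```python
import re

def merged_column_name(table: str, original: str) -> str:
    """Illustrative sketch of the Merged Objects naming convention."""
    lowered = original.lower()
    # Replace every non-alphanumeric character with an underscore.
    sanitized = re.sub(r"[^0-9a-z]", "_", lowered)
    # Prepend the originating table name for provenance.
    return f"{table}_{sanitized}"

print(merged_column_name("physparamqso", "E(B-V)"))  # physparamqso_e_b_v_
```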

Three columns have special names that differ from the standard naming convention described above:

Seven additional columns have been added to the Merged Objects catalog that are not in the original Euclid tables. They are described below, after the Euclid tables.

4.1 MER tables

The Euclid MER processing function produced three tables. The reference paper is Euclid Collaboration: Romelli et al., 2025 (hereafter, Romelli). The tables are:

Main table (mer)

Morphology (morph)

Cutouts (cutouts)

Find all columns from these tables in the Parquet schema:

mer_prefixes = ["mer_", "morph_", "cutouts_"]
mer_col_counts = {p: len([n for n in schema.names if n.startswith(p)]) for p in mer_prefixes}

print(f"MER tables: {sum(mer_col_counts.values())} columns total")
for prefix, count in mer_col_counts.items():
    print(f"  {prefix}: {count}")
MER tables: 593 columns total
  mer_: 466
  morph_: 103
  cutouts_: 24

4.2 PHZ tables

The Euclid PHZ processing function produced eight tables. The reference paper is Euclid Collaboration: Tucci et al., 2025 (hereafter, Tucci). The tables are:

Photometric Redshifts (phz)

Classifications (class)

Galaxy Physical Parameters (physparam)

Galaxy SEDs (galaxysed)

QSO Physical Parameters (physparamqso)

Star Parameters (starclass)

Star SEDs (starsed)

NIR Physical Parameters (physparamnir)

Find all columns from these tables in the Parquet schema:

phz_prefixes = ["phz_", "class_", "physparam_", "galaxysed_", "physparamqso_",
                "starclass_", "starsed_", "physparamnir_"]
phz_col_counts = {p: len([n for n in schema.names if n.startswith(p)]) for p in phz_prefixes}

print(f"PHZ tables: {sum(phz_col_counts.values())} columns total")
for prefix, count in phz_col_counts.items():
    print(f"  {prefix}: {count}")
PHZ tables: 567 columns total
  phz_: 60
  class_: 12
  physparam_: 92
  galaxysed_: 119
  physparamqso_: 55
  starclass_: 54
  starsed_: 119
  physparamnir_: 56

4.3 SPE tables

The Euclid SPE processing function produced three tables from which data are included in the Merged Objects catalog. The reference paper is Euclid Collaboration: Le Brun et al., 2025 (hereafter, Le Brun).

These tables required special handling because they contain multiple rows per object (identified by column object_id). The tables were pivoted before being joined so that the Merged Objects catalog contains one row per object. The pivoted columns were named by combining at least the table name, the original column name, and the rank of the redshift estimate (i.e., the value in the original ‘SPE_RANK’ column).
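The pivot can be sketched with a toy table. The column names and values below are invented stand-ins; the real SPE tables contain many more columns, and the pivoted names also incorporate other identifiers where needed:

```python
import pandas as pd

# Toy stand-in for an SPE table: multiple ranked redshift candidates
# per object (values are invented for illustration).
spe = pd.DataFrame({
    "object_id": [1, 1, 2],
    "spe_rank": [0, 1, 0],
    "spe_z": [1.2, 0.4, 2.7],
})

# Pivot so each object occupies one row, folding the rank into the
# column name, as in the Merged Objects catalog.
wide = spe.pivot(index="object_id", columns="spe_rank", values="spe_z")
wide.columns = [f"z_spe_z_{rank}" for rank in wide.columns]
```

Objects with fewer candidates than the maximum rank simply have nulls in the higher-rank columns.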

The tables are:

Spectroscopic Redshifts (z)

Spectral Line Measurements (lines)

Models (models)

Find all columns from these tables in the Parquet schema:

spe_prefixes = ["z_", "lines_", "models_"]
spe_col_counts = {p: len([n for n in schema.names if n.startswith(p)]) for p in spe_prefixes}

print(f"SPE tables: {sum(spe_col_counts.values())} columns total")
for prefix, count in spe_col_counts.items():
    print(f"  {prefix[:-1]}: {count}")
SPE tables: 424 columns total
  z: 159
  lines: 165
  models: 100

4.4 Additional columns

The following columns were added to the Merged Objects catalog but do not appear in the original Euclid tables.

Euclid columns:

HEALPix columns:

These HEALPix indexes correspond to the object’s RA and Dec coordinates. They are useful for spatial queries, as demonstrated in the Euclid Deep Fields section below.
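Assuming the nested HEALPix scheme (each order subdivides a pixel into four children), the three index columns encode the same position at three resolutions and are related by bit shifts. A minimal sketch with a made-up index value:

```python
# In the nested scheme, dropping two bits per order maps a pixel to its
# parent pixel. The order-29 value below is invented for illustration.
healpix_29 = 3 << 60  # hypothetical _healpix_29 value
healpix_19 = healpix_29 >> (2 * (29 - 19))  # parent pixel at order 19
healpix_9 = healpix_29 >> (2 * (29 - 9))    # parent pixel at order 9
```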

The HEALPix, Euclid object ID, and Euclid tile ID columns appear first:

schema.names[:5]
['_healpix_29', '_healpix_19', '_healpix_9', 'tileid', 'object_id']

HATS columns:

These are the HATS partitioning columns. They appear in the Parquet file names but are not included inside the files. However, PyArrow automatically makes them available as regular columns when the dataset is loaded as demonstrated in these tutorials.

The HATS columns appear at the end:

schema.names[-3:]
['Norder', 'Dir', 'Npix']

4.5 Find columns of interest

The subsections above show how to find all columns from a given Euclid table as well as the additional columns. Here we show some additional techniques for finding columns.

# Access the data type using the `field` method.
schema.field("mer_flux_y_2fwhm_aper")
pyarrow.Field<mer_flux_y_2fwhm_aper: float>
# The column metadata includes unit and description.
# Parquet metadata is always stored as bytestrings, which are denoted by a leading 'b'.
schema.field("mer_flux_y_2fwhm_aper").metadata
{b'unit': b'uJy', b'description': b'NIR Y band source aperture photometry flux (2 FWHM diameter) on PSF-matched images'}

Euclid Q1 offers many flux measurements, both from Euclid detections and from external ground-based surveys. They are given in microjanskys, so all flux columns can be found by searching the metadata for this unit.

# Find all flux columns.
flux_columns = [field.name for field in schema if field.metadata[b"unit"] == b"uJy"]

print(f"{len(flux_columns)} flux columns. First four are:")
flux_columns[:4]
585 flux columns. First four are:
['mer_flux_vis_1fwhm_aper', 'mer_flux_vis_2fwhm_aper', 'mer_flux_vis_3fwhm_aper', 'mer_flux_vis_4fwhm_aper']

Columns associated with external surveys are identified by the inclusion of “ext” in the name.

external_flux_columns = [name for name in flux_columns if "ext" in name]
print(f"{len(external_flux_columns)} flux columns from external surveys. First four are:")
external_flux_columns[:4]
282 flux columns from external surveys. First four are:
['mer_flux_u_ext_decam_1fwhm_aper', 'mer_flux_u_ext_decam_2fwhm_aper', 'mer_flux_u_ext_decam_3fwhm_aper', 'mer_flux_u_ext_decam_4fwhm_aper']

5. Euclid Deep Fields

Euclid Q1 includes data from three Euclid Deep Fields: EDF-N (North), EDF-S (South), EDF-F (Fornax; also in the southern hemisphere). There is also a small amount of data from a fourth field: LDN1641 (Lynds’ Dark Nebula 1641), which was observed for technical reasons during Euclid’s verification phase. The fields are described in Euclid Collaboration: Aussel et al., 2025 and can be seen on this skymap.

The regions are well separated, so a simple cone search can distinguish them without requiring a precise radius. Rather than filtering on RA and Dec values directly, we can load data more efficiently using the HEALPix order 9 pixels that cover each field. These pixel lists will be used in later tutorials.

# EDF-N (Euclid Deep Field - North)
ra, dec, radius = 269.733, 66.018, 4  # 20 sq deg
edfn_k9_pixels = hpgeom.query_circle(hpgeom.order_to_nside(9), ra, dec, radius, inclusive=True)

# EDF-S (Euclid Deep Field - South)
ra, dec, radius = 61.241, -48.423, 5  # 23 sq deg
edfs_k9_pixels = hpgeom.query_circle(hpgeom.order_to_nside(9), ra, dec, radius, inclusive=True)

# EDF-F (Euclid Deep Field - Fornax)
ra, dec, radius = 52.932, -28.088, 3  # 10 sq deg
edff_k9_pixels = hpgeom.query_circle(hpgeom.order_to_nside(9), ra, dec, radius, inclusive=True)

# LDN1641 (Lynds' Dark Nebula 1641)
ra, dec, radius = 85.74, -8.39, 1.5  # 6 sq deg
ldn_k9_pixels = hpgeom.query_circle(hpgeom.order_to_nside(9), ra, dec, radius, inclusive=True)

6. Basic Query

To demonstrate a basic query, we’ll search for objects with a galaxy photometric redshift estimate of 6.0 (largest possible). Other tutorials in this series will show more complex queries, and describe the redshifts and other data in more detail. PyArrow dataset filters are described at Filtering by Expressions, and the list of available functions is at Compute Functions.

dataset = pyarrow.dataset.dataset(dataset_path, partitioning="hive", filesystem=s3, schema=schema)

highz_objects = dataset.to_table(
    columns=["object_id", "phz_phz_median"], filter=pc.field("phz_phz_median") == 6
).to_pandas()
highz_objects

About this notebook

Authors: Troy Raen, Vandana Desai, Andreas Faisst, Shoubaneh Hemmati, Jaladh Singhal, Brigitta Sipőcz, Jessica Krick, the IRSA Data Science Team, and the Euclid NASA Science Center at IPAC (ENSCI).

Updated: 2025-12-23

Contact: IRSA Helpdesk