From 6dcc5d02ff848ef01ec6763eb5af042ae38cd5f4 Mon Sep 17 00:00:00 2001 From: Jaladh Singhal Date: Tue, 26 May 2026 21:00:05 -0700 Subject: [PATCH 1/4] Add tutorial for accessing ZTF DR24 light curves from HATS catalog --- tutorials/ztf/ztf_lightcurves.md | 434 +++++++++++++++++++++++++++++++ 1 file changed, 434 insertions(+) create mode 100644 tutorials/ztf/ztf_lightcurves.md diff --git a/tutorials/ztf/ztf_lightcurves.md b/tutorials/ztf/ztf_lightcurves.md new file mode 100644 index 00000000..19f3e612 --- /dev/null +++ b/tutorials/ztf/ztf_lightcurves.md @@ -0,0 +1,434 @@ +--- +authors: +- name: Jaladh Singhal +- name: IRSA Data Science Team +jupytext: + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.19.3 +kernelspec: + display_name: irsa-tutorials + language: python + name: python3 +--- + +(ztf-lightcurves-lsdb)= +# Access ZTF DR24 Light Curves from HATS Catalog + ++++ + +## Learning Goals + +By the end of this tutorial, you will learn how to: + +- Open ZTF DR24 HATS catalogs for light curves and the Objects Table using `lsdb`. +- Retrieve light curves for specific sources by ZTF object IDs using an index search. +- Retrieve light curves for sources in a sky region using a cone search on RA and Dec. +- Cross-reference the Objects Table to enrich cone search results with per-source variability statistics. +- Plot ZTF light curves filtered by variability. + ++++ + +## Introduction + +The ZTF DR24 enhanced data products at IRSA include two [HATS](https://irsa.ipac.caltech.edu/docs/parquet_catalogs/#hats) (Hierarchical Adaptive Tiling Scheme) catalogs hosted on AWS S3: + +- **Lightcurves catalog**: one row per ZTF object, with a nested column storing the full photometry time series — timestamps, magnitudes, uncertainties, and quality flags. 
+- **Objects Table**: one row per ZTF object per band, with collapsed light curve metrics such as magnitude RMS, chi-squared variability statistic, number of good observations, and mean magnitude. + +These HATS catalogs offer a scalable, cloud-native alternative to the ZTF light curve service, enabling efficient access especially when the service is overloaded. +The [lsdb](https://docs.lsdb.io/en/latest/index.html) Python library provides a convenient interface for working with HATS catalogs, including spatial queries and object-ID-based lookups. + +This tutorial covers two common entry points for accessing ZTF light curves: + +1. **Object IDs**: you have specific ZTF object IDs — from a previous query, a catalog crossmatch, or a published source list — and want their light curves directly. +2. **RA/Dec**: you have sky coordinates and want all ZTF sources within a given radius. + +Both approaches are demonstrated below. An optional section then shows how to join the position search results with the Objects Table to select and plot the most variable sources using robust variability statistics. + +For more context on ZTF DR24 data products, refer to the [ZTF DR24 release notes](https://irsa.ipac.caltech.edu/data/ZTF/docs/releases/ztf_release_notes_latest) and [explanatory supplement](https://irsa.ipac.caltech.edu/data/ZTF/docs/ztf_explanatory_supplement.pdf) at IRSA. + ++++ + +## Imports + +```{code-cell} ipython3 +# Uncomment the next line to install dependencies if needed. +# !pip install s3fs "lsdb>=0.6.6,<0.8" pyarrow pandas numpy astropy matplotlib +``` + +```{code-cell} ipython3 +import s3fs +import lsdb +import pyarrow.parquet as pq +from astropy.coordinates import SkyCoord +import numpy as np +import pandas as pd +from astropy import units as u +import os +import matplotlib.pyplot as plt +from dask.distributed import Client +``` + +```{code-cell} ipython3 +pd.set_option("display.max_colwidth", None) +pd.set_option("display.min_rows", 18) +``` + +## 1. 
Locate ZTF DR24 HATS Catalogs in the Cloud + +From IRSA's [cloud data access page](https://irsa.ipac.caltech.edu/cloud_access/), we identify the S3 bucket and path prefixes for the ZTF DR24 HATS catalogs: + +```{code-cell} ipython3 +ztf_bucket = "ipac-irsa-ztf" +ztf_lc_hats_prefix = "ztf/enhanced/dr24/lc/hats" # Light curves catalog +ztf_objects_hats_prefix = "ztf/enhanced/dr24/objects/hats" # Objects table +``` + +[s3fs](https://s3fs.readthedocs.io/en/latest/) provides a filesystem-like Python interface for AWS S3 buckets. +First, we create an S3 filesystem object with anonymous access: + +```{code-cell} ipython3 +s3 = s3fs.S3FileSystem(anon=True) +``` + +Let's list the contents of the ZTF DR24 lightcurves HATS **collection**: + +```{code-cell} ipython3 +s3.ls(f"{ztf_bucket}/{ztf_lc_hats_prefix}") +``` + +The listing shows, in order, the collection properties, the catalog, the index table, and the margin cache. +You can explore the subdirectories to see how this HATS collection follows the directory structure described in IRSA's documentation on [HATS partitioning and HATS Collections](https://irsa.ipac.caltech.edu/docs/parquet_catalogs/#hats). + +Per the documentation, the Parquet file containing the schema for this catalog is stored in `dataset/_common_metadata`. +Let's save its path for later use (using the catalog name identified from the listing above): + +```{code-cell} ipython3 +ztf_lc_schema_path = "ztf_dr24_lc-hats/dataset/_common_metadata" # ztf_dr24_lc-hats is the catalog name identified above +``` + +Similarly, let's list the ZTF DR24 Objects Table HATS collection: + +```{code-cell} ipython3 +s3.ls(f"{ztf_bucket}/{ztf_objects_hats_prefix}") +``` + +```{code-cell} ipython3 +ztf_objects_schema_path = "ztf_dr24_objects-hats/dataset/_common_metadata" # ztf_dr24_objects-hats is the catalog name identified above +``` + +## 2. Explore the Catalog Schemas + +Before querying the catalogs, let's inspect what columns are available in each.
+We read schemas from the `_common_metadata` files, which also contain column metadata such as units and descriptions: + +```{code-cell} ipython3 +def pq_schema_to_df(schema): + """Convert a PyArrow schema to a Pandas DataFrame.""" + return pd.DataFrame( + [ + ( + field.name, + str(field.type), + field.metadata.get(b"unit", b"").decode(), + field.metadata.get(b"description", b"").decode() + ) + for field in schema + ], + columns=["name", "type", "unit", "description"] + ) +``` + +```{code-cell} ipython3 +ztf_lc_schema = pq.read_schema( + f"s3://{ztf_bucket}/{ztf_lc_hats_prefix}/{ztf_lc_schema_path}", + filesystem=s3 +) +ztf_lc_schema_df = pq_schema_to_df(ztf_lc_schema) +ztf_lc_schema_df +``` + +Notice the `lightcurve` column — this is a **nested column** that stores the full photometric time series for each ZTF object. +Each element of `lightcurve` is itself a table with columns including: + +- `hmjd`: Heliocentric-based Modified Julian Date of each observation +- `mag` / `magerr`: Magnitude and its uncertainty +- `clrcoeff`: Linear color coefficient term from photometric calibration +- `catflags`: Photometric/image quality flags encoded as bits (described in the [explanatory supplement](https://irsa.ipac.caltech.edu/data/ZTF/docs/ztf_explanatory_supplement.pdf) section 13.6; set `catflags == 0` to keep only clean epochs) + +```{code-cell} ipython3 +ztf_lc_columns = ["objectid", "objra", "objdec", "filterid", "nepochs", "lightcurve"] +``` + +## 3. Get Light Curves by Object ID + +If you have specific ZTF object IDs, you can retrieve their light curves directly using an index search — no spatial filter needed. +This is the fastest approach for targeted lookups. + +### 3.1 Open the Light Curves Catalog + +We open the ZTF DR24 light curves HATS catalog. 
No data is read yet — lsdb opens catalogs [lazily](https://docs.lsdb.io/en/latest/tutorials/lazy_operations.html): + +```{code-cell} ipython3 +ztf_lc_catalog = lsdb.open_catalog( + f"s3://{ztf_bucket}/{ztf_lc_hats_prefix}", + columns=ztf_lc_columns +) +ztf_lc_catalog +``` + +### 3.2 Identify the Index Column + +The ZTF DR24 light curves HATS catalog ships with an ancillary index table that enables fast lookups by object ID. +Let's identify which column is indexed: + +```{code-cell} ipython3 +ztf_lc_idx_column = list(ztf_lc_catalog.hc_collection.all_indexes.keys())[0] +print(f"Index column: {ztf_lc_idx_column}") +``` + +### 3.3 Perform an Index Search + +We use the same object IDs from the [ZTF light curve API docs](https://irsa.ipac.caltech.edu/docs/program_interface/ztf_lightcurve_api.html) multi-object example — you can compare results from this tutorial directly with that service. +In your workflow, these IDs might come from a previous query, a catalog crossmatch, or a published source table: + +```{code-cell} ipython3 +object_ids = [686103400034440, 686103400106565] +object_ids +``` + +```{code-cell} ipython3 +ztf_lcs_by_id = ztf_lc_catalog.id_search(values={ztf_lc_idx_column: object_ids}) +ztf_lcs_by_id +``` + +### 3.4 Compute and Inspect the Results + +```{code-cell} ipython3 +ztf_lcs_by_id_df = ztf_lcs_by_id.compute() +ztf_lcs_by_id_df +``` + +```{code-cell} ipython3 +print(f"Found {len(ztf_lcs_by_id_df)} light curves for {len(object_ids)} objects.") +``` + +Each row is one ZTF object. The `lightcurve` column contains a nested DataFrame per object. 
+Let's inspect the light curve of the first object: + +```{code-cell} ipython3 +ztf_lcs_by_id_df['lightcurve'].iloc[0] +``` + +### 3.5 Plot Light Curves + +```{code-cell} ipython3 +fig, axs = plt.subplots(len(ztf_lcs_by_id_df), 1, + figsize=(10, 4 * len(ztf_lcs_by_id_df)), + constrained_layout=True) + +if len(ztf_lcs_by_id_df) == 1: + axs = [axs] + +for ax, (_, row) in zip(axs, ztf_lcs_by_id_df.iterrows()): + lc = row['lightcurve'].query("catflags == 0") + title = f"ZTF Object {row['objectid']} (RA={row['objra']:.4f}°, Dec={row['objdec']:.4f}°)" + pts = ax.plot(lc['hmjd'], lc['mag'], '.', markersize=4, zorder=3) + ax.errorbar( + lc['hmjd'], lc['mag'], yerr=lc['magerr'], + fmt='none', ecolor=pts[0].get_color(), elinewidth=0.8, alpha=0.3, zorder=2 + ) + ax.set_ylabel("Magnitude") + ax.set_xlabel("HMJD") + ax.invert_yaxis() + ax.set_title(title, fontsize=10) + +fig.suptitle("ZTF DR24 Light Curves — Object ID Search Results", fontsize=13, y=1.02) +plt.show() +``` + +## 4. Get Light Curves by Sky Position + +If you have sky coordinates and want all ZTF sources within a given area, use a cone search. + +### 4.1 Define a Spatial Filter + +We use the same sky position as the [ZTF light curve API docs](https://irsa.ipac.caltech.edu/docs/program_interface/ztf_lightcurve_api.html) positional example: + +```{code-cell} ipython3 +target = SkyCoord(ra=298.0025, dec=29.87147, unit="deg") # same as ZTF light curve API docs positional example +search_radius = 5 * u.arcsec +``` + +Using lsdb, we define a cone [search object](https://docs.lsdb.io/en/latest/tutorials/region_selection.html#4.-The-Search-object) for this region: + +```{code-cell} ipython3 +spatial_filter = lsdb.ConeSearch( + ra=target.ra.deg, + dec=target.dec.deg, + radius_arcsec=search_radius.to(u.arcsec).value +) +``` + +### 4.2 Define Row Filters + +In addition to the spatial filter, we can pre-filter rows using Parquet column statistics. 
+Here we keep only objects with more than 100 epochs, focusing on well-sampled light curves: + +```{code-cell} ipython3 +row_filters = [["nepochs", ">", 100]] +``` + +### 4.3 Open the Filtered Light Curves Catalog + +We open the catalog with both filters applied. lsdb evaluates this lazily — no data is read yet: + +```{code-cell} ipython3 +ztf_lc_cone = lsdb.open_catalog( + f"s3://{ztf_bucket}/{ztf_lc_hats_prefix}", + search_filter=spatial_filter, + columns=ztf_lc_columns, + filters=row_filters +) +ztf_lc_cone +``` + +Notice that only the partitions overlapping the cone are included, avoiding reads of the full catalog. + +### 4.4 Compute and Inspect the Results + +Now we execute the query by calling `compute()`. The ZTF DR24 LC catalog stores full nested light curves per HATS partition — each partition can be several gigabytes regardless of cone size. We create a Dask client with `memory_limit=None` to avoid per-worker memory caps: + +```{code-cell} ipython3 +def get_nworkers(catalog): + return min(os.cpu_count(), catalog.npartitions + 1) + +with Client(n_workers=get_nworkers(ztf_lc_cone), + threads_per_worker=1, + memory_limit=None # each partition can be several GB; avoid per-worker cap + ) as client: + print(f"You can monitor progress in the Dask dashboard at {client.dashboard_link}") + ztf_lc_cone_df = ztf_lc_cone.compute() +``` + +```{code-cell} ipython3 +ztf_lc_cone_df +``` + +```{code-cell} ipython3 +print(f"Found {len(ztf_lc_cone_df)} ZTF light curves for the search criteria.") +``` + +Each row corresponds to one ZTF object. The `lightcurve` column contains a nested DataFrame per object: + +```{code-cell} ipython3 +ztf_lc_cone_df['lightcurve'].iloc[0] +``` + +## 5. [Optional] Look Up Additional Info from the Objects Table + +```{note} +This section is optional — skip it if you only need the raw light curves from section 4. +``` + +### 5.1 Explore the Objects Table Schema + +The Objects Table contains per-band summary statistics for each ZTF source. 
+Let's inspect its schema to identify columns of interest: + +```{code-cell} ipython3 +ztf_objects_schema = pq.read_schema( + f"s3://{ztf_bucket}/{ztf_objects_hats_prefix}/{ztf_objects_schema_path}", + filesystem=s3 +) +pq_schema_to_df(ztf_objects_schema) +``` + +We'll select a subset of columns useful for characterizing variable sources: + +```{code-cell} ipython3 +ztf_objects_columns = ['oid', 'ra', 'dec', 'filtercode', 'ngoodobsrel', 'chisq', 'magrms', 'meanmag', 'medianabsdev'] +``` + +### 5.2 Open the Objects Table + +We reuse the same `spatial_filter` from section 4 to retrieve Objects Table entries for the same sky region: + +```{code-cell} ipython3 +ztf_objects_cone = lsdb.open_catalog( + f"s3://{ztf_bucket}/{ztf_objects_hats_prefix}", + search_filter=spatial_filter, + columns=ztf_objects_columns +) +ztf_objects_cone +``` + +### 5.3 Compute and Inspect + +```{code-cell} ipython3 +with Client(n_workers=get_nworkers(ztf_objects_cone), + threads_per_worker=1, + memory_limit=None) as client: + ztf_objects_cone_df = ztf_objects_cone.compute() +``` + +```{code-cell} ipython3 +ztf_objects_cone_df +``` + +### 5.4 Merge Objects Table Info into Light Curves + +We join the Objects Table with the position search light curves on the shared object ID: + +```{code-cell} ipython3 +objects_cols_to_merge = ['oid', 'filtercode', 'ngoodobsrel', 'chisq', 'magrms', 'meanmag', 'medianabsdev'] +combined_df = ztf_lc_cone_df.merge( + ztf_objects_cone_df[objects_cols_to_merge], + left_on='objectid', + right_on='oid', + how='inner' +) +combined_df +``` + +## 6. 
Plot Most Variable Light Curves from the Position Search + +Using the `chisq` column from the Objects Table, we select the top 3 most variable sources from the position search and plot their light curves annotated with summary statistics: + +```{code-cell} ipython3 +most_variable = combined_df.nlargest(3, 'chisq') + +fig, axs = plt.subplots(len(most_variable), 1, + figsize=(10, 4 * len(most_variable)), + constrained_layout=True) + +if len(most_variable) == 1: + axs = [axs] + +for ax, (_, row) in zip(axs, most_variable.iterrows()): + lc = row['lightcurve'].query("catflags == 0") + title = (f"ZTF Object {row['objectid']} ({row['filtercode']} band)\n" + f"χ²={row['chisq']:.2f}, RMS mag={row['magrms']:.4f}, " + f"mean mag={row['meanmag']:.3f}, N good obs={int(row['ngoodobsrel'])}") + pts = ax.plot(lc['hmjd'], lc['mag'], '.', markersize=4, zorder=3) + ax.errorbar( + lc['hmjd'], lc['mag'], yerr=lc['magerr'], + fmt='none', ecolor=pts[0].get_color(), elinewidth=0.8, alpha=0.3, zorder=2 + ) + ax.set_ylabel("Magnitude") + ax.set_xlabel("HMJD") + ax.invert_yaxis() + ax.set_title(title, fontsize=10) + +fig.suptitle("Most Variable ZTF DR24 Sources from Position Search (annotated with Objects Table data)", fontsize=13, y=1.02) +plt.show() +``` + +## About this notebook + +Updated: 2026-05-26 + +Contact: the [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions or to report problems. 
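The tutorial filters each nested light curve with `catflags == 0` and then ranks sources by the Objects Table `chisq` column. The plain-Python sketch below illustrates both steps on a made-up light curve stored as a list of per-epoch dicts, standing in for the nested `lightcurve` table. The helpers `clean_epochs` and `reduced_chisq` are hypothetical, and the reduced χ² definition (scatter about the inverse-variance weighted mean magnitude) is an assumption for illustration only; it is not necessarily how the DR24 Objects Table computes `chisq`, so check the explanatory supplement for the actual definition.

```python
def clean_epochs(epochs):
    """Keep only epochs with no quality flags set (catflags == 0).

    Mirrors lc.query("catflags == 0") from the tutorial, but operates on a
    list of dicts instead of a pandas DataFrame.
    """
    return [ep for ep in epochs if ep["catflags"] == 0]


def reduced_chisq(mags, magerrs):
    """Reduced chi-square of magnitudes about their inverse-variance weighted mean.

    Assumed, illustrative definition; it may differ from the ZTF DR24
    Objects Table `chisq` column.
    """
    if len(mags) != len(magerrs):
        raise ValueError("mags and magerrs must have the same length")
    if len(mags) < 2:
        raise ValueError("need at least two epochs")
    weights = [1.0 / err**2 for err in magerrs]
    wmean = sum(w * m for w, m in zip(weights, mags)) / sum(weights)
    chisq = sum(((m - wmean) / err) ** 2 for m, err in zip(mags, magerrs))
    # N - 1 degrees of freedom: one parameter (the weighted mean) is estimated
    return chisq / (len(mags) - 1)


# Hypothetical light curve: three epochs, one of which has a nonzero flag
epochs = [
    {"hmjd": 58300.1, "mag": 17.0, "magerr": 0.1, "catflags": 0},
    {"hmjd": 58301.1, "mag": 17.2, "magerr": 0.1, "catflags": 0},
    {"hmjd": 58302.1, "mag": 19.5, "magerr": 0.1, "catflags": 4},  # flagged epoch
]
good = clean_epochs(epochs)
stat = reduced_chisq([ep["mag"] for ep in good], [ep["magerr"] for ep in good])
print(len(good), round(stat, 6))
```

With the flagged epoch dropped, each remaining magnitude sits one error bar from the mean, so the statistic is about 2.0; keeping the flagged 19.5 mag point would inflate it enormously, which is why the flag cut comes before any variability ranking.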
From 9192ca336ef2ac60f833f140ab7c3eaf25089b9a Mon Sep 17 00:00:00 2001 From: Jaladh Singhal Date: Wed, 27 May 2026 16:09:41 -0700 Subject: [PATCH 2/4] Fix narrative and some cleanup --- tutorials/ztf/ztf_lightcurves.md | 74 ++++++++++++++++++-------------- 1 file changed, 41 insertions(+), 33 deletions(-) diff --git a/tutorials/ztf/ztf_lightcurves.md b/tutorials/ztf/ztf_lightcurves.md index 19f3e612..47142c9a 100644 --- a/tutorials/ztf/ztf_lightcurves.md +++ b/tutorials/ztf/ztf_lightcurves.md @@ -1,7 +1,8 @@ --- authors: - name: Jaladh Singhal -- name: IRSA Data Science Team +- name: Troy Raen +- name: "Brigitta Sip\u0151cz" jupytext: text_representation: extension: .md @@ -14,7 +15,7 @@ kernelspec: name: python3 --- -(ztf-lightcurves-lsdb)= +(ztf-lightcurves)= # Access ZTF DR24 Light Curves from HATS Catalog +++ @@ -25,9 +26,9 @@ By the end of this tutorial, you will learn how to: - Open ZTF DR24 HATS catalogs for light curves and the Objects Table using `lsdb`. - Retrieve light curves for specific sources by ZTF object IDs using an index search. -- Retrieve light curves for sources in a sky region using a cone search on RA and Dec. +- Retrieve light curves for sources in a sky region using a cone search. - Cross-reference the Objects Table to enrich cone search results with per-source variability statistics. -- Plot ZTF light curves filtered by variability. +- Plot ZTF light curves (filtered by variability statistics). +++ @@ -38,7 +39,7 @@ The ZTF DR24 enhanced data products at IRSA include two [HATS](https://irsa.ipac - **Lightcurves catalog**: one row per ZTF object, with a nested column storing the full photometry time series — timestamps, magnitudes, uncertainties, and quality flags. - **Objects Table**: one row per ZTF object per band, with collapsed light curve metrics such as magnitude RMS, chi-squared variability statistic, number of good observations, and mean magnitude. 
-These HATS catalogs offer a scalable, cloud-native alternative to the ZTF light curve service, enabling efficient access especially when the service is overloaded. +These HATS catalogs offer a scalable, cloud-native alternative to the [ZTF light curve service](https://irsa.ipac.caltech.edu/docs/program_interface/ztf_lightcurve_api.html), enabling efficient access especially when the service is overloaded. The [lsdb](https://docs.lsdb.io/en/latest/index.html) Python library provides a convenient interface for working with HATS catalogs, including spatial queries and object-ID-based lookups. This tutorial covers two common entry points for accessing ZTF light curves: @@ -56,7 +57,7 @@ For more context on ZTF DR24 data products, refer to the [ZTF DR24 release notes ```{code-cell} ipython3 # Uncomment the next line to install dependencies if needed. -# !pip install s3fs "lsdb>=0.6.6,<0.8" pyarrow pandas numpy astropy matplotlib +# !pip install s3fs "lsdb>=0.6.6,<0.8" pyarrow pandas astropy matplotlib ``` ```{code-cell} ipython3 @@ -64,7 +65,6 @@ import s3fs import lsdb import pyarrow.parquet as pq from astropy.coordinates import SkyCoord -import numpy as np import pandas as pd from astropy import units as u import os @@ -120,7 +120,7 @@ s3.ls(f"{ztf_bucket}/{ztf_objects_hats_prefix}") ztf_objects_schema_path = "ztf_dr24_objects-hats/dataset/_common_metadata" # ztf_dr24_objects-hats is the catalog name identified above ``` -## 2. Explore the Catalog Schemas +## 2. Explore the Catalog Schema Before querying the catalogs, let's inspect what columns are available in each. We read schemas from the `_common_metadata` files, which also contain column metadata such as units and descriptions: @@ -152,12 +152,8 @@ ztf_lc_schema_df ``` Notice the `lightcurve` column — this is a **nested column** that stores the full photometric time series for each ZTF object. 
-Each element of `lightcurve` is itself a table with columns including: - -- `hmjd`: Heliocentric-based Modified Julian Date of each observation -- `mag` / `magerr`: Magnitude and its uncertainty -- `clrcoeff`: Linear color coefficient term from photometric calibration -- `catflags`: Photometric/image quality flags encoded as bits (described in the [explanatory supplement](https://irsa.ipac.caltech.edu/data/ZTF/docs/ztf_explanatory_supplement.pdf) section 13.6; set `catflags == 0` to keep only clean epochs) +Each element of `lightcurve` is itself a table with columns including `hmjd`, `mag`, `magerr`, `clrcoeff`, and `catflags`. +We save the list of columns we need for later use when opening the catalog with `lsdb`: ```{code-cell} ipython3 ztf_lc_columns = ["objectid", "objra", "objdec", "filterid", "nepochs", "lightcurve"] @@ -197,7 +193,7 @@ In your workflow, these IDs might come from a previous query, a catalog crossmat ```{code-cell} ipython3 object_ids = [686103400034440, 686103400106565] -object_ids ``` ```{code-cell} ipython3 @@ -207,6 +202,8 @@ ztf_lcs_by_id ### 3.4 Compute and Inspect the Results +Now we execute the query planned in the previous steps by calling `compute()`. This is the step where data is actually read into memory, as a pandas DataFrame. + ```{code-cell} ipython3 ztf_lcs_by_id_df = ztf_lcs_by_id.compute() ztf_lcs_by_id_df @@ -224,6 +221,7 @@ ztf_lcs_by_id_df['lightcurve'].iloc[0] ``` ### 3.5 Plot Light Curves +When plotting the light curves, we apply a `catflags == 0` filter to keep only clean epochs (the quality flag bits are described in section 13.6 of the [explanatory supplement](https://irsa.ipac.caltech.edu/data/ZTF/docs/ztf_explanatory_supplement.pdf)).
```{code-cell} ipython3 fig, axs = plt.subplots(len(ztf_lcs_by_id_df), 1, @@ -234,7 +232,7 @@ if len(ztf_lcs_by_id_df) == 1: axs = [axs] for ax, (_, row) in zip(axs, ztf_lcs_by_id_df.iterrows()): - lc = row['lightcurve'].query("catflags == 0") + lc = row['lightcurve'].query("catflags == 0") # to keep only clean epochs title = f"ZTF Object {row['objectid']} (RA={row['objra']:.4f}°, Dec={row['objdec']:.4f}°)" pts = ax.plot(lc['hmjd'], lc['mag'], '.', markersize=4, zorder=3) ax.errorbar( @@ -256,10 +254,10 @@ If you have sky coordinates and want all ZTF sources within a given area, use a ### 4.1 Define a Spatial Filter -We use the same sky position as the [ZTF light curve API docs](https://irsa.ipac.caltech.edu/docs/program_interface/ztf_lightcurve_api.html) positional example: +We use the same sky position as the [ZTF light curve API docs](https://irsa.ipac.caltech.edu/docs/program_interface/ztf_lightcurve_api.html) positional example, but you can specify any coordinates and search radius you want: ```{code-cell} ipython3 -target = SkyCoord(ra=298.0025, dec=29.87147, unit="deg") # same as ZTF light curve API docs positional example +target = SkyCoord(ra=298.0025, dec=29.87147, unit="deg") search_radius = 5 * u.arcsec ``` @@ -276,10 +274,13 @@ spatial_filter = lsdb.ConeSearch( ### 4.2 Define Row Filters In addition to the spatial filter, we can pre-filter rows using Parquet column statistics. -Here we keep only objects with more than 100 epochs, focusing on well-sampled light curves: +Here we keep only objects with more than 50 epochs, focusing on well-sampled light curves: ```{code-cell} ipython3 -row_filters = [["nepochs", ">", 100]] +row_filters = [ + ["nepochs", ">", 50], + # additional filters can be added here if desired + ] ``` ### 4.3 Open the Filtered Light Curves Catalog @@ -300,7 +301,7 @@ Notice that only the partitions overlapping the cone are included, avoiding read ### 4.4 Compute and Inspect the Results -Now we execute the query by calling `compute()`.
The ZTF DR24 LC catalog stores full nested light curves per HATS partition — each partition can be several gigabytes regardless of cone size. We create a Dask client with `memory_limit=None` to avoid per-worker memory caps: +Now we execute the query by calling `compute()`. Each HATS partition of the ZTF DR24 LC catalog stores full nested light curves, so partitions can be large. We wrap the `compute()` call in a Dask client to process multiple partitions in parallel (when the cone overlaps more than one) and to monitor progress in the Dask dashboard; `memory_limit=None` disables the per-worker memory cap. ```{code-cell} ipython3 def get_nworkers(catalog): @@ -315,11 +316,11 @@ with Client(n_workers=get_nworkers(ztf_lc_cone), ``` ```{code-cell} ipython3 -ztf_lc_cone_df +print(f"Found {len(ztf_lc_cone_df)} ZTF light curves for the search criteria.") ``` ```{code-cell} ipython3 -print(f"Found {len(ztf_lc_cone_df)} ZTF light curves for the search criteria.") +ztf_lc_cone_df.head(5) ``` Each row corresponds to one ZTF object. The `lightcurve` column contains a nested DataFrame per object: ```{code-cell} ipython3 @@ -331,7 +332,7 @@ ztf_lc_cone_df['lightcurve'].iloc[0] ## 5. [Optional] Look Up Additional Info from the Objects Table ```{note} -This section is optional — skip it if you only need the raw light curves from section 4. +This section is optional — skip it if you don't need additional information beyond the raw light curves from section 4.
``` ### 5.1 Explore the Objects Table Schema @@ -347,7 +348,7 @@ ztf_objects_schema = pq.read_schema( pq_schema_to_df(ztf_objects_schema) ``` -We'll select a subset of columns useful for characterizing variable sources: +We'll select a subset of columns useful for characterizing and annotating variable sources: ```{code-cell} ipython3 ztf_objects_columns = ['oid', 'ra', 'dec', 'filtercode', 'ngoodobsrel', 'chisq', 'magrms', 'meanmag', 'medianabsdev'] @@ -355,7 +356,7 @@ ztf_objects_columns = ['oid', 'ra', 'dec', 'filtercode', 'ngoodobsrel', 'chisq', ### 5.2 Open the Objects Table -We reuse the same `spatial_filter` from section 4 to retrieve Objects Table entries for the same sky region: +We reuse the same `spatial_filter` from section 4 to retrieve Objects Table entries for the same sky region. This ensures we only retrieve rows for sources in the region covered by the position search, rather than reading the whole Objects Table. ```{code-cell} ipython3 ztf_objects_cone = lsdb.open_catalog( @@ -381,12 +382,11 @@ ztf_objects_cone_df ### 5.4 Merge Objects Table Info into Light Curves -We join the Objects Table with the position search light curves on the shared object ID: +We merge the Objects Table with the position search light curves on the shared object ID via an inner join, which keeps only sources present in both tables (for example, Objects Table rows for sources excluded by the `nepochs` row filter are dropped): ```{code-cell} ipython3 -objects_cols_to_merge = ['oid', 'filtercode', 'ngoodobsrel', 'chisq', 'magrms', 'meanmag', 'medianabsdev'] combined_df = ztf_lc_cone_df.merge( - ztf_objects_cone_df[objects_cols_to_merge], + ztf_objects_cone_df, left_on='objectid', right_on='oid', how='inner' @@ -396,11 +396,17 @@ combined_df ## 6. Plot Most Variable Light Curves from the Position Search -Using the `chisq` column from the Objects Table, we select the top 3 most variable sources from the position search and plot their light curves annotated with summary statistics: +Using the `chisq` column as a rough variability indicator, we select the 3 most variable sources from the combined position search and Objects Table results.
```{code-cell} ipython3 +# If you skipped section 5, replace the line below with `most_variable = ztf_lc_cone_df.head(3)` and remove the Objects Table fields (filtercode, chisq, magrms, meanmag, ngoodobsrel) from the plot title below most_variable = combined_df.nlargest(3, 'chisq') +most_variable +``` + +Then we plot their light curves annotated with summary statistics: +```{code-cell} ipython3 fig, axs = plt.subplots(len(most_variable), 1, figsize=(10, 4 * len(most_variable)), constrained_layout=True) @@ -409,7 +415,7 @@ if len(most_variable) == 1: axs = [axs] for ax, (_, row) in zip(axs, most_variable.iterrows()): - lc = row['lightcurve'].query("catflags == 0") + lc = row['lightcurve'].query("catflags == 0") # to keep only clean epochs title = (f"ZTF Object {row['objectid']} ({row['filtercode']} band)\n" f"χ²={row['chisq']:.2f}, RMS mag={row['magrms']:.4f}, " f"mean mag={row['meanmag']:.3f}, N good obs={int(row['ngoodobsrel'])}") @@ -429,6 +435,8 @@ plt.show() ## About this notebook -Updated: 2026-05-26 +**Updated:** 2026-05-27 + +**Contact:** the [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions or to report problems. -Contact: the [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions or to report problems. +**AI Acknowledgement:** This tutorial was developed with the assistance of AI tools.
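Sections 5.4 and 6 combine two steps: an inner join of light curves (keyed by `objectid`) with Objects Table rows (keyed by `oid`), followed by a selection of the rows with the largest `chisq`. The stdlib-only sketch below reproduces that logic on a few made-up rows to make the inner-join semantics explicit: an Objects Table row with no matching light curve (for instance, a source removed by the `nepochs > 50` row filter) does not survive the join. The helpers `inner_join` and `select_most_variable` and all IDs and values are hypothetical; in the tutorial itself, pandas `DataFrame.merge(..., how='inner')` and `DataFrame.nlargest` do this work.

```python
import heapq


def inner_join(lightcurves, objects):
    """Inner-join light-curve rows (key 'objectid') with Objects Table rows (key 'oid').

    Only IDs present in both inputs are kept, like pandas merge(how='inner').
    """
    # Index the Objects Table rows by oid so the join is linear in row count
    objects_by_oid = {}
    for obj in objects:
        objects_by_oid.setdefault(obj["oid"], []).append(obj)
    joined = []
    for lc in lightcurves:
        for obj in objects_by_oid.get(lc["objectid"], []):
            joined.append({**lc, **obj})
    return joined


def select_most_variable(rows, n=3, key="chisq"):
    """Return the n rows with the largest `key`, like DataFrame.nlargest(n, key)."""
    return heapq.nlargest(n, rows, key=lambda row: row[key])


# Hypothetical inputs: object 3 has an Objects Table row but no light curve
lightcurves = [
    {"objectid": 1, "nepochs": 120},
    {"objectid": 2, "nepochs": 80},
    {"objectid": 4, "nepochs": 300},
]
objects = [
    {"oid": 1, "filtercode": "zg", "chisq": 1.1},
    {"oid": 2, "filtercode": "zr", "chisq": 7.5},
    {"oid": 3, "filtercode": "zr", "chisq": 50.0},  # dropped by the inner join
    {"oid": 4, "filtercode": "zi", "chisq": 3.2},
]
combined = inner_join(lightcurves, objects)
top = select_most_variable(combined, n=2)
print([row["objectid"] for row in top])
```

Object 3 has the largest `chisq` but never reaches the ranking because it has no light curve. On overlapping non-key column names, `{**lc, **obj}` lets the Objects Table value win, whereas pandas `merge` would keep both columns with `_x`/`_y` suffixes.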
From bfad5882cdf342598ec1abfe87129139d4b6cdb2 Mon Sep 17 00:00:00 2001 From: Jaladh Singhal Date: Wed, 27 May 2026 16:26:19 -0700 Subject: [PATCH 3/4] Add ztf notebook to TOC --- toc.yml | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/toc.yml b/toc.yml index e70ab95c..804533be 100644 --- a/toc.yml +++ b/toc.yml @@ -60,6 +60,10 @@ project: - title: Spitzer children: - file: tutorials/spitzer/plot_Spitzer_IRS_spectra.md + - title: ZTF + children: + - title: DR24 Light Curves (HATS) + file: tutorials/ztf/ztf_lightcurves.md - title: Simulated Data file: tutorials/simulated-data/simulated.md children: From 716de78371e5ac7a36e9bd2a7baaca7cc5d405ee Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Brigitta=20Sip=C5=91cz?= Date: Wed, 27 May 2026 20:39:27 -0700 Subject: [PATCH 4/4] Adding new lsdb hats notebook to the ignore list for oldestdeps test --- tox.ini | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tox.ini b/tox.ini index 53d20428..474cad22 100644 --- a/tox.ini +++ b/tox.ini @@ -50,7 +50,7 @@ install_command = # lsdb has tighter minimum dependencies, deal with it here for now, long term handle it from the notebook metadata # We need to do this here before the dependencies are installed to work around deps conflicts # SED fitting notebook uses numpy 2.0+ functionality, ignore it from the oldest job - oldestdeps: bash -c "echo tutorials/techniques-and-tools/irsa-hats-with-lsdb >> ignore_testing; echo tutorials/simulated-data/OpenUniverse2024/openuniverse2024_SED_fit.md >> ignore_testing; sed -i -e 's|lsdb|\#lsdb|g' tutorial_requirements.txt && python -I -m pip install $@" + oldestdeps: bash -c "echo tutorials/techniques-and-tools/irsa-hats-with-lsdb >> ignore_testing; echo tutorials/ztf/ztf_lightcurves >> ignore_testing; echo tutorials/simulated-data/OpenUniverse2024/openuniverse2024_SED_fit.md >> ignore_testing; sed -i -e 's|lsdb|\#lsdb|g' tutorial_requirements.txt && python -I -m pip install $@" # Adding back the default install command; commented 
out version for clear cases, more complex one if we need to add more conditional skips # !oldestdeps: python -I -m pip install {opts} {packages}