Batch Module

Multi-source batch photometry over sky regions.

Configuration and utilities for batch photometry.

class spxquery.batch.config.BatchConfig(center_ra: float, center_dec: float, radius: float, catalog_path: Path, coverage_mode: str = 'any', bands: List[str] | None = None, mjd_range: Tuple[float, float] | None = None, max_images: int = 500, output_dir: Path = <factory>, max_download_workers: int = 4, max_extract_workers: int = 12, photometry: PhotometryConfig = <factory>, num_buckets: int = 64, keep_bucket_files: bool = False)

Bases: object

Configuration for multi-source batch photometry over a sky region.

Parameters:
  • center_ra (float) – Right ascension of the sky region center, in degrees (ICRS).

  • center_dec (float) – Declination of the sky region center, in degrees (ICRS).

  • radius (float) – Search radius in degrees.

  • catalog_path (Path) – CSV file with columns targetid, ra, dec.

  • coverage_mode (str) – "any" (INTERSECTS) or "full" (CONTAINS).

  • bands (list of str or None) – Bands to query, e.g. ["D1", "D3"]. None = all.

  • mjd_range (tuple of (float, float) or None) – (mjd_min, mjd_max) to filter observations by time. None = no time filter (all epochs).

  • max_images (int) – Safety limit: an error is raised if the query returns more images than this.

  • output_dir (Path) – Root directory for all batch outputs.

  • max_download_workers (int) – Parallel download threads.

  • max_extract_workers (int) – Parallel extraction processes (spawn-based).

  • photometry (PhotometryConfig) – Photometry parameters forwarded to extraction.

  • num_buckets (int) – Hash-partition buckets for aggregation.

  • keep_bucket_files (bool) – Keep temporary bucket CSVs after aggregation.
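The coverage_mode and max_images parameters above can be pictured with a small sketch. The helper names (coverage_predicate, check_image_count) are hypothetical and not part of spxquery, and the exception types are assumptions, since this page does not say which exception the library raises.

```python
# Hypothetical helpers for illustration; not part of spxquery.
COVERAGE_PREDICATES = {"any": "INTERSECTS", "full": "CONTAINS"}


def coverage_predicate(coverage_mode: str) -> str:
    """Translate a coverage_mode string into its spatial predicate name."""
    try:
        return COVERAGE_PREDICATES[coverage_mode]
    except KeyError:
        allowed = sorted(COVERAGE_PREDICATES)
        raise ValueError(
            f"coverage_mode must be one of {allowed}, got {coverage_mode!r}"
        ) from None


def check_image_count(n_images: int, max_images: int = 500) -> None:
    """Raise if a query returned more images than the safety limit.

    RuntimeError is an assumed exception type for this sketch.
    """
    if n_images > max_images:
        raise RuntimeError(
            f"query returned {n_images} images, above max_images={max_images}"
        )
```

Raising before any download starts keeps an overly large radius from triggering hundreds of full-frame transfers by accident.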

center_ra: float
center_dec: float
radius: float
catalog_path: Path
coverage_mode: str = 'any'
bands: List[str] | None = None
mjd_range: Tuple[float, float] | None = None
max_images: int = 500
output_dir: Path
max_download_workers: int = 4
max_extract_workers: int = 12
photometry: PhotometryConfig
num_buckets: int = 64
keep_bucket_files: bool = False
property image_dir: Path
property per_image_dir: Path
property lightcurve_dir: Path
property bucket_dir: Path
__init__(center_ra: float, center_dec: float, radius: float, catalog_path: Path, coverage_mode: str = 'any', bands: List[str] | None = None, mjd_range: Tuple[float, float] | None = None, max_images: int = 500, output_dir: Path = <factory>, max_download_workers: int = 4, max_extract_workers: int = 12, photometry: PhotometryConfig = <factory>, num_buckets: int = 64, keep_bucket_files: bool = False) -> None
spxquery.batch.config.load_catalog(catalog_path: Path) -> List[Source]

Load a source catalog CSV into a list of Source objects.

Expected columns: targetid, ra, dec.
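As a rough illustration of the expected catalog layout, the sketch below reads a CSV with the three columns into simple records. CatalogSource and read_catalog are hypothetical stand-ins, not spxquery's Source class or load_catalog(), and the column validation is an assumption about reasonable behavior.

```python
import csv
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class CatalogSource:
    """Hypothetical stand-in for spxquery's Source; mirrors the CSV columns."""
    targetid: str
    ra: float   # degrees
    dec: float  # degrees


def read_catalog(catalog_path: Path) -> List[CatalogSource]:
    """Read a CSV with columns targetid, ra, dec into a list of records."""
    with open(catalog_path, newline="") as fh:
        reader = csv.DictReader(fh)
        missing = {"targetid", "ra", "dec"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"catalog is missing columns: {sorted(missing)}")
        return [
            CatalogSource(row["targetid"], float(row["ra"]), float(row["dec"]))
            for row in reader
        ]
```

A matching input file would start with the header line `targetid,ra,dec`, followed by one row per source.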

Region-based query for SPHEREx full-frame images.

Delegates to spxquery.core.query.query_spherex_region().

spxquery.batch.query.query_region_observations(config: BatchConfig) -> QueryResults

Query SPHEREx archive for full-frame images covering a sky region.

Thin wrapper that translates BatchConfig fields into query_spherex_region() parameters.

Parameters:

config (BatchConfig) – Batch configuration with region definition and query parameters.

Returns:

Matching observations with download URLs.

Return type:

QueryResults

Multi-source aperture photometry extraction from SPHEREx images.

spxquery.batch.extract.process_single_image(image_path: Path, sources: List[Source], config: PhotometryConfig, output_dir: Path, skip_existing: bool = True) -> Path | None

Extract aperture photometry for all catalog sources in one image.

Optimized for batch processing: pre-computes shared arrays (background mask, error map, pixel scale) once per image, then uses local cutouts for per-source photometry instead of operating on the full image.

Parameters:
  • image_path (Path) – Path to a SPHEREx MEF FITS file.

  • sources (list of Source) – All catalog sources to check.

  • config (PhotometryConfig) – Photometry extraction parameters.

  • output_dir (Path) – Directory for per-image CSV output.

  • skip_existing (bool) – Skip images that already have an output CSV.

Returns:

Path to the output CSV, or None if skipped / no results.

Return type:

Path or None
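The local-cutout idea described above can be sketched in pure Python. This is an assumed illustration of clipping a square window around a source position, not the library's implementation. The function names, the half_size parameter, and the nested-list image are all hypothetical.

```python
from typing import List, Sequence, Tuple


def cutout_bounds(
    x: float, y: float, half_size: int, shape: Tuple[int, int]
) -> Tuple[int, int, int, int]:
    """Return (y0, y1, x0, x1) for a square window clipped to the image edges."""
    ny, nx = shape
    xc, yc = int(round(x)), int(round(y))
    x0 = max(xc - half_size, 0)
    x1 = min(xc + half_size + 1, nx)
    y0 = max(yc - half_size, 0)
    y1 = min(yc + half_size + 1, ny)
    return y0, y1, x0, x1


def extract_cutout(
    image: Sequence[Sequence[float]], x: float, y: float, half_size: int
) -> List[List[float]]:
    """Slice a small window around (x, y) instead of touching the full image."""
    shape = (len(image), len(image[0]))
    y0, y1, x0, x1 = cutout_bounds(x, y, half_size, shape)
    return [list(row[x0:x1]) for row in image[y0:y1]]
```

Per-source work then scales with the window size rather than the full-frame size, which matters when many catalog sources fall on the same image.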

spxquery.batch.extract.run_extraction(image_dir: Path, sources: List[Source], config: PhotometryConfig, output_dir: Path, n_workers: int = 12, skip_existing: bool = True) -> int

Run multi-source extraction across all images in a directory.

Parameters:
  • image_dir (Path) – Directory containing SPHEREx FITS files (searched recursively).

  • sources (list of Source) – Catalog sources to extract photometry for.

  • config (PhotometryConfig) – Photometry parameters.

  • output_dir (Path) – Per-image CSV output directory.

  • n_workers (int) – Number of parallel workers.

  • skip_existing (bool) – Skip images with existing output CSVs.

Returns:

Number of newly processed images.

Return type:

int
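The recursive search and skip_existing behavior can be pictured with the sketch below. It assumes images end in `.fits` and that each image's output is named `<image stem>.csv`. Both conventions and the pending_images helper are hypothetical, since this page does not specify the file naming.

```python
from pathlib import Path
from typing import List


def pending_images(
    image_dir: Path, output_dir: Path, skip_existing: bool = True
) -> List[Path]:
    """List FITS files under image_dir that still need a per-image CSV."""
    images = sorted(image_dir.rglob("*.fits"))  # recursive search
    if not skip_existing:
        return images
    # Assumed naming convention: <image stem>.csv inside output_dir.
    return [p for p in images if not (output_dir / f"{p.stem}.csv").exists()]
```

Only the paths returned here would be handed to the worker pool, so an interrupted run can be restarted without redoing finished images.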

Aggregate per-image photometry CSVs into per-source light curves.

spxquery.batch.aggregate.aggregate_lightcurves(image_csv_dir: Path, lightcurve_dir: Path, bucket_dir: Path, num_buckets: int = 64, clean: bool = False, keep_bucket_files: bool = False) -> int

Aggregate per-image CSVs into individual source light curves.

Two-phase bucket design keeps memory bounded:
  1. Stream per-image CSVs into hash-partitioned bucket files.

  2. Process one bucket at a time, sort, write per-source CSVs.
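The two phases above can be sketched with the standard library. This is an illustration under stated assumptions, not the library's code: the column set (targetid, mjd, flux), the bucket file names, sorting by mjd, and the return value (number of light curves written) are all hypothetical.

```python
import csv
import zlib
from collections import defaultdict
from contextlib import ExitStack
from pathlib import Path

FIELDS = ["targetid", "mjd", "flux"]  # assumed columns, for illustration only


def bucket_index(targetid: str, num_buckets: int) -> int:
    """Stable hash so a given source always lands in the same bucket."""
    return zlib.crc32(targetid.encode("utf-8")) % num_buckets


def aggregate_sketch(image_csv_dir: Path, lightcurve_dir: Path,
                     bucket_dir: Path, num_buckets: int = 64) -> int:
    bucket_dir.mkdir(parents=True, exist_ok=True)
    lightcurve_dir.mkdir(parents=True, exist_ok=True)

    # Phase 1: stream rows from each per-image CSV into bucket files.
    with ExitStack() as stack:
        writers = []
        for i in range(num_buckets):
            fh = stack.enter_context(
                open(bucket_dir / f"bucket_{i:04d}.csv", "w", newline=""))
            writer = csv.DictWriter(fh, fieldnames=FIELDS)
            writer.writeheader()
            writers.append(writer)
        for csv_path in sorted(image_csv_dir.glob("*.csv")):
            with open(csv_path, newline="") as src:
                for row in csv.DictReader(src):
                    idx = bucket_index(row["targetid"], num_buckets)
                    writers[idx].writerow({k: row[k] for k in FIELDS})

    # Phase 2: load one bucket at a time, sort each source by time, write it.
    n_written = 0
    for bucket_path in sorted(bucket_dir.glob("bucket_*.csv")):
        by_source = defaultdict(list)
        with open(bucket_path, newline="") as fh:
            for row in csv.DictReader(fh):
                by_source[row["targetid"]].append(row)
        for targetid, rows in by_source.items():
            rows.sort(key=lambda r: float(r["mjd"]))
            with open(lightcurve_dir / f"{targetid}.csv", "w", newline="") as out:
                writer = csv.DictWriter(out, fieldnames=FIELDS)
                writer.writeheader()
                writer.writerows(rows)
            n_written += 1
    return n_written
```

Because all rows for one source hash to the same bucket, peak memory is bounded by the largest bucket rather than by the full set of per-image CSVs. Raising num_buckets shrinks each bucket at the cost of more open file handles in phase 1.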

Batch photometry pipeline — orchestrates query, download, extract, aggregate.

spxquery.batch.pipeline.load_query_summary(output_dir: Path) -> dict

Load a previously saved query_summary.yaml.

Parameters:

output_dir (Path) – Root batch output directory containing the YAML file.

class spxquery.batch.pipeline.BatchPipeline(config: BatchConfig)

Bases: object

Multi-source batch photometry pipeline.

Four stages: query -> download -> extract -> aggregate. Each stage can be run independently for resumable execution.

Parameters:

config (BatchConfig) – Region, catalog, and processing configuration.

__init__(config: BatchConfig)
run_query()

Stage 1: Query IRSA for full-frame images covering the region.

run_download(skip_existing: bool = True) -> List[DownloadResult]

Stage 2: Download full-frame FITS images (no cutouts).

run_extract(skip_existing: bool = True) -> int

Stage 3: Extract multi-source photometry from each image.

run_aggregate(clean: bool = False) -> int

Stage 4: Aggregate per-image CSVs into per-source light curves.

run_all(skip_existing: bool = True)

Run all four stages sequentially.
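Since each stage has its own method, a resumed run can call only the stages it still needs. The sketch below is a hypothetical driver, not part of spxquery. It works with any object that exposes the four run_* methods listed above. Whether a later stage can start in a fresh process depends on what earlier stages saved under output_dir, which this page does not detail.

```python
from typing import Any, Dict, Iterable

STAGES = ("query", "download", "extract", "aggregate")


def run_stages(pipeline: Any, stages: Iterable[str] = STAGES,
               skip_existing: bool = True) -> Dict[str, Any]:
    """Run the requested stages in pipeline order (hypothetical helper)."""
    requested = set(stages)
    unknown = requested - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")

    results: Dict[str, Any] = {}
    for stage in STAGES:  # always query -> download -> extract -> aggregate
        if stage not in requested:
            continue
        if stage == "query":
            results[stage] = pipeline.run_query()
        elif stage == "download":
            results[stage] = pipeline.run_download(skip_existing=skip_existing)
        elif stage == "extract":
            results[stage] = pipeline.run_extract(skip_existing=skip_existing)
        else:
            results[stage] = pipeline.run_aggregate(clean=False)
    return results
```

With a real pipeline object, `run_stages(pipeline, stages=("extract", "aggregate"))` would re-run only the last two stages after downloads have completed.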

spxquery.batch.pipeline.run_batch(catalog: str, center_ra: float, center_dec: float, radius: float, output_dir: str = 'batch_output', bands: List[str] | None = None, coverage_mode: str = 'any', max_images: int = 500, max_download_workers: int = 4, max_extract_workers: int = 12, skip_existing: bool = True, photometry_config: PhotometryConfig | None = None) -> BatchPipeline

Run the full batch photometry pipeline with one function call.

Parameters:
  • catalog (str) – Path to CSV with columns targetid, ra, dec.

  • center_ra (float) – Right ascension of the region center, in degrees.

  • center_dec (float) – Declination of the region center, in degrees.

  • radius (float) – Region radius in degrees.

  • output_dir (str) – Root output directory.

  • bands (list of str or None) – Bands to query. None = all.

  • coverage_mode (str) – "any" (INTERSECTS) or "full" (CONTAINS).

  • max_images (int) – Safety limit: an error is raised if the query returns more images than this.

  • max_download_workers (int) – Parallel download threads.

  • max_extract_workers (int) – Parallel extraction processes.

  • skip_existing (bool) – Resume mode — skip already-processed images.

  • photometry_config (PhotometryConfig or None) – Override default photometry parameters.

Returns:

The pipeline instance (for inspecting results).

Return type:

BatchPipeline