Batch Photometry

The batch photometry module (spxquery.batch) enables multi-source aperture photometry over a sky region. Unlike the single-source pipeline, which downloads small cutouts around one target, the batch module queries for full-frame images covering a circular region and then extracts photometry for all catalog sources in each image in a single pass.

When to Use Batch vs. Single-Source

Feature	Single-Source (`SPXQueryPipeline`)	Batch (`spxquery.batch`)
Targets	One source per run	Multiple sources from catalog CSV
Downloads	Cutout images (~100 KB each)	Full-frame images (~70 MB each)
Query	Point search (CONESearch)	Region search (CIRCLE + INTERSECTS)
Output	Per-source light curve + plot	Per-source light curves (CSV only)
Use case	Detailed analysis of one object	Survey of many objects in a region

Quick Start

from spxquery.batch import run_batch

run_batch(
    catalog="sources.csv",
    center_ra=270.0,
    center_dec=66.56,
    radius=1.0,
    bands=["D3", "D4"],
)

This single call will:

1. Query the IRSA TAP service for full-frame images covering a 1° radius circle
2. Download matching images in D3 and D4 bands
3. Extract aperture photometry for all catalog sources in each image
4. Aggregate per-image results into per-source light curves

Catalog Format

The source catalog must be a CSV file with columns targetid, ra, dec:

targetid,ra,dec
39633458707826492,265.623,66.531
39633451346821630,266.445,65.636
39633453829850190,266.794,65.983

Additional columns (flux, redshift, etc.) are ignored. Coordinates must be in degrees (ICRS).
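A catalog can be checked against these rules before starting a long run. The following is a minimal sketch using only the standard library; `read_catalog` is a hypothetical helper and not part of spxquery, and the `redshift` values in the demo are made up to show that extra columns are ignored.

```python
import csv
import io

REQUIRED_COLUMNS = ("targetid", "ra", "dec")


def read_catalog(handle):
    """Return (targetid, ra, dec) tuples from an open CSV handle.

    Hypothetical helper: extra columns are ignored, IDs are kept as strings.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"catalog is missing columns: {missing}")

    sources = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        ra, dec = float(row["ra"]), float(row["dec"])
        # Coordinates are expected in degrees (ICRS)
        if not (0.0 <= ra <= 360.0 and -90.0 <= dec <= 90.0):
            raise ValueError(f"line {line_no}: coordinates out of range ({ra}, {dec})")
        sources.append((row["targetid"], ra, dec))
    return sources


example = io.StringIO(
    "targetid,ra,dec,redshift\n"
    "39633458707826492,265.623,66.531,0.12\n"
    "39633451346821630,266.445,65.636,0.45\n"
)
sources = read_catalog(example)
print(len(sources), sources[0])
```

Keeping `targetid` as a string avoids precision loss for long integer IDs if the catalog is later passed through tools that default to floating point.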

Configuration

from pathlib import Path
from spxquery.batch import BatchConfig, run_batch
from spxquery.core.config import PhotometryConfig

config = BatchConfig(
    # Sky region
    center_ra=270.0,
    center_dec=66.56,
    radius=1.0,
    catalog_path=Path("sources.csv"),

    # Query filters
    coverage_mode="any",
    bands=["D3", "D4"],
    mjd_range=(60800, 61000),

    # Safety
    max_images=500,

    # Output
    output_dir=Path("batch_output"),

    # Parallelism
    max_download_workers=4,
    max_extract_workers=12,

    # Photometry parameters (forwarded to extraction)
    photometry=PhotometryConfig(
        aperture_method="fwhm",
        fwhm_multiplier=2.5,
        background_method="window",
        window_size=30,
        subtract_zodi=True,
    ),
)

run_batch(config)

Configuration Parameters

Region and Query

Parameter	Type	Default	Description
`center_ra`	float	required	Region center RA in degrees (0–360)
`center_dec`	float	required	Region center Dec in degrees (−90 to +90)
`radius`	float	required	Search radius in degrees
`catalog_path`	Path	required	CSV file with `targetid`, `ra`, `dec` columns
`coverage_mode`	str	`"any"`	`"any"` = image overlaps region; `"full"` = image fully contains region
`bands`	list[str]	`None`	Filter by band, e.g. `["D1", "D3"]`. `None` = all bands
`mjd_range`	tuple	`None`	Time filter as `(mjd_min, mjd_max)`. `None` = no filter
`max_images`	int	500	Raise error if query returns more images than this

Processing

Parameter	Type	Default	Description
`max_download_workers`	int	4	Parallel download threads
`max_extract_workers`	int	12	Parallel extraction processes
`output_dir`	Path	`"batch_output"`	Root directory for all outputs
`num_buckets`	int	64	Hash-partition buckets for aggregation
`keep_bucket_files`	bool	`False`	Keep temporary bucket CSVs after aggregation
`photometry`	PhotometryConfig	defaults	Photometry parameters (see Parameter Configuration)

Coverage Modes

The coverage mode controls which images are selected from the archive:

`any` (INTERSECTS)

Selects images whose footprint overlaps with the search circle. This is the most inclusive mode — it returns all images that touch the region, even if only a small corner overlaps.

Query: INTERSECTS(p.poly, CIRCLE('ICRS', ra, dec, radius)) = 1

Use when you want maximum coverage and don’t mind some images only partially covering your region.

`full` (CONTAINS)

Selects images that fully contain the search circle. This ensures every returned image covers the entire region, so all catalog sources are present in every image.

Query: CONTAINS(CIRCLE('ICRS', ra, dec, radius), p.poly) = 1

Use when you need complete source coverage across all images (e.g., for consistent light curves). Since SPHEREx full-frame images are ~3.5° across, a `full` search with radius < 1° returns fewer images than `any`, but each returned image covers the whole region.
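The mapping from coverage mode to ADQL predicate can be sketched as a small function. This only reproduces the two query fragments quoted above; `spatial_predicate` is a hypothetical name, and how spxquery assembles its full TAP query is not shown here.

```python
def spatial_predicate(coverage_mode, ra, dec, radius, footprint="p.poly"):
    """Build the ADQL spatial predicate for a coverage mode (illustrative only)."""
    circle = f"CIRCLE('ICRS', {ra}, {dec}, {radius})"
    if coverage_mode == "any":
        # Image footprint overlaps the search circle
        return f"INTERSECTS({footprint}, {circle}) = 1"
    if coverage_mode == "full":
        # Search circle lies entirely inside the image footprint
        return f"CONTAINS({circle}, {footprint}) = 1"
    raise ValueError(f"unknown coverage_mode: {coverage_mode!r}")


print(spatial_predicate("any", 270.0, 66.56, 1.0))
print(spatial_predicate("full", 270.0, 66.56, 1.0))
```

Note the argument order: in ADQL, `CONTAINS(a, b)` is true when `a` lies inside `b`, so the circle comes first in `full` mode.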

Band and Time Filtering

Band Selection

SPHEREx has 6 detectors covering different wavelength ranges:

Band	Wavelength (μm)	Resolving Power
D1	0.75–1.09	R ≈ 39
D2	1.10–1.62	R ≈ 41
D3	1.63–2.41	R ≈ 41
D4	2.42–3.82	R ≈ 35
D5	3.83–4.41	R ≈ 112
D6	4.42–5.00	R ≈ 128

To query only specific bands:

config = BatchConfig(
    ...,
    bands=["D3", "D4"],  # Only the 1.63–3.82 μm detectors
)

Setting bands=None (default) queries all 6 bands.
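If you start from a wavelength interval rather than detector names, the band table above can be turned into a lookup. This is a sketch under that assumption; `bands_for_range` is a hypothetical helper, and the ranges are copied from the table.

```python
# Wavelength ranges in μm, copied from the band table above
BAND_RANGES_UM = {
    "D1": (0.75, 1.09),
    "D2": (1.10, 1.62),
    "D3": (1.63, 2.41),
    "D4": (2.42, 3.82),
    "D5": (3.83, 4.41),
    "D6": (4.42, 5.00),
}


def bands_for_range(wl_min_um, wl_max_um):
    """Return the bands whose wavelength range overlaps [wl_min_um, wl_max_um]."""
    return [
        band
        for band, (lo, hi) in BAND_RANGES_UM.items()
        if lo <= wl_max_um and hi >= wl_min_um
    ]


bands = bands_for_range(2.0, 3.0)
print(bands)
```

The resulting list can be passed as the `bands` argument of `BatchConfig`.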

MJD Range

Filter observations by Modified Julian Date to restrict to a specific time window:

config = BatchConfig(
    ...,
    mjd_range=(60800, 61000),  # ~200 days
)

This is applied as a post-query filter. Use it to limit the number of downloaded images when the region has extensive temporal coverage.
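To derive an `mjd_range` from calendar dates, count days from the MJD epoch (1858-11-17). A minimal standard-library sketch; `to_mjd` is a hypothetical helper that works on whole days (midnight UTC) only.

```python
from datetime import date

MJD_EPOCH = date(1858, 11, 17)  # MJD 0


def to_mjd(day):
    """Integer Modified Julian Date at 00:00 UTC of the given calendar day."""
    return (day - MJD_EPOCH).days


mjd_range = (to_mjd(date(2025, 5, 5)), to_mjd(date(2025, 11, 21)))
print(mjd_range)
```

For sub-day precision, a dedicated time library such as astropy's `Time` is the more robust choice.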

Pipeline Stages

The batch pipeline has four stages: Query → Download → Extract → Aggregate.

Query

Queries the IRSA TAP service using ADQL spatial predicates. The search region is a circle defined by its center RA/Dec and radius. Results include download URLs, observation IDs, band information, and timestamps.

Download

Downloads full-frame FITS images from IRSA. Uses the same parallel download engine as the single-source pipeline, but without cutout parameters.

Extract

For each image, the extraction stage:

1. Reads the MEF file once (IMAGE, FLAGS, VARIANCE, ZODI extensions)
2. Projects all catalog sources onto the image via batch WCS transformation
3. Filters to sources within the field of view
4. Extracts aperture photometry for each in-FOV source using pre-computed shared arrays:
   - Background quality mask (combined bitmask)
   - Error array (sqrt of variance)
   - Pixel scale
5. Writes one per-image CSV file
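The field-of-view filter in the steps above can be illustrated on already-projected pixel coordinates. This sketch assumes 0-based pixel coordinates and a 2040×2040 image; `in_fov` and its `margin` parameter are hypothetical and do not describe the module's actual edge handling.

```python
def in_fov(xs, ys, nx=2040, ny=2040, margin=0.0):
    """Return indices of sources whose pixel position falls on the image.

    NaN coordinates (failed projections) compare False and are dropped.
    """
    return [
        i
        for i, (x, y) in enumerate(zip(xs, ys))
        if margin <= x < nx - margin and margin <= y < ny - margin
    ]


# Made-up pixel positions for five catalog sources
xs = [10.0, -5.0, 2039.0, 2100.0, float("nan")]
ys = [10.0, 100.0, 2039.0, 50.0, 5.0]
keep = in_fov(xs, ys)
print(keep)
```

A positive `margin` would exclude sources whose aperture would run off the detector edge.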

Aggregate

Combines per-image CSVs into per-source light curves using hash-partitioned bucket aggregation:

1. Partition all per-image rows into hash buckets by target_id
2. Sort each bucket by (target_id, mjd)
3. Write one CSV per source

This approach avoids loading the entire dataset into memory.
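The three aggregation steps can be sketched with the standard library. This is an illustration of hash-partitioned aggregation in general, not the module's implementation: `aggregate` is a hypothetical function, CRC32 stands in for whatever hash the module uses, and all per-image CSVs are assumed to share one header containing `target_id` and `mjd`.

```python
import csv
import itertools
import tempfile
import zlib
from pathlib import Path


def aggregate(per_image_dir, lightcurve_dir, num_buckets=64):
    """Combine per-image CSVs into one CSV per source, one bucket at a time."""
    per_image_dir, lightcurve_dir = Path(per_image_dir), Path(lightcurve_dir)
    lightcurve_dir.mkdir(parents=True, exist_ok=True)

    with tempfile.TemporaryDirectory() as tmp:
        bucket_paths = [Path(tmp) / f"bucket_{i:03d}.csv" for i in range(num_buckets)]
        handles = [p.open("w", newline="") for p in bucket_paths]
        writers = [None] * num_buckets
        try:
            # Pass 1: stream every row into the bucket chosen by its target_id
            for csv_path in sorted(per_image_dir.glob("*.csv")):
                with csv_path.open(newline="") as fh:
                    reader = csv.DictReader(fh)
                    for row in reader:
                        # Deterministic hash (built-in hash() is salted per process)
                        b = zlib.crc32(row["target_id"].encode()) % num_buckets
                        if writers[b] is None:
                            writers[b] = csv.DictWriter(handles[b], fieldnames=reader.fieldnames)
                            writers[b].writeheader()
                        writers[b].writerow(row)
        finally:
            for handle in handles:
                handle.close()

        # Pass 2: only one bucket is held in memory at any time
        for path in bucket_paths:
            with path.open(newline="") as fh:
                rows = list(csv.DictReader(fh))
            rows.sort(key=lambda r: (r["target_id"], float(r["mjd"])))
            for target_id, group in itertools.groupby(rows, key=lambda r: r["target_id"]):
                group = list(group)
                with (lightcurve_dir / f"{target_id}.csv").open("w", newline="") as fh:
                    writer = csv.DictWriter(fh, fieldnames=list(group[0].keys()))
                    writer.writeheader()
                    writer.writerows(group)
```

Because every row of a given source lands in the same bucket, peak memory is bounded by the largest bucket rather than by the full dataset; raising `num_buckets` trades more temporary files for smaller buckets.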

Output Structure

batch_output/
├── images/                              # Downloaded full-frame FITS files
│   ├── level2_2025W25_1B_0263_4D3_*.fits
│   └── ...
├── per_image/                           # Per-image photometry CSVs
│   ├── level2_2025W25_1B_0263_4D3_*_photometry.csv
│   └── ...
├── lightcurves/                         # Per-source light curves
│   ├── 39633458707826492.csv
│   ├── 39633451346821630.csv
│   └── ...
└── query_summary.yaml                   # Query metadata (region, bands, observations)

Query Summary YAML

After run_query(), a query_summary.yaml is saved to the output directory with the query metadata:

query_time: "2026-05-18T14:30:00"
region:
  center_ra: 270.0
  center_dec: 66.6
  radius_deg: 0.3
  coverage_mode: full
filters:
  bands: [D3]
  mjd_range: [60791.0, 60793.0]
n_observations: 18
band_counts: {D3: 18}
time_span_days: 1.4
observations:
  - obs_id: "2025W17_4B_0277_1"
    band: D3
    mjd: 60791.575318
    wavelength_um: 2.0150
    download_url: "https://..."

Load it programmatically:

from spxquery.batch import load_query_summary

summary = load_query_summary("batch_output/")
print(f"Found {summary['n_observations']} observations across {summary['band_counts']}")

Per-Image CSV Columns

Each per-image CSV contains photometry for all in-FOV sources from one observation:

Column	Unit	Description
`target_id`	—	Source identifier from catalog
`ra`, `dec`	deg	Source coordinates
`obs_id`	—	Observation ID
`band`	—	Detector band (D1–D6)
`mjd`	days	Modified Julian Date
`x`, `y`	pixels	Pixel coordinates on image
`flux`	μJy	Background-subtracted flux
`flux_error`	μJy	Flux uncertainty
`mag_ab`	mag	AB magnitude
`mag_ab_error`	mag	Magnitude uncertainty
`wavelength`	μm	Central wavelength
`bandwidth`	μm	Bandpass width
`flag`	—	Combined pixel flags (bitwise OR)
`bg_level`	μJy/arcsec²	Estimated background per pixel
`bg_error`	μJy/arcsec²	Background uncertainty
`aperture_radius`	pixels	Aperture radius used
`filename`	—	Source FITS filename
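The `flux` and `mag_ab` columns are related by the standard AB magnitude definition (zero point 3631 Jy). The sketch below applies that definition with first-order error propagation; whether the module computes `mag_ab_error` in exactly this way, and how it treats non-positive fluxes, is an assumption. `flux_to_ab` is a hypothetical helper.

```python
import math


def flux_to_ab(flux_ujy, flux_error_ujy):
    """Convert a flux in μJy to an AB magnitude and its first-order uncertainty."""
    if flux_ujy <= 0:
        # A magnitude is undefined for non-positive flux
        return math.nan, math.nan
    mag = -2.5 * math.log10(flux_ujy * 1e-6 / 3631.0)
    mag_error = 2.5 / math.log(10) * flux_error_ujy / flux_ujy
    return mag, mag_error


mag, mag_error = flux_to_ab(100.0, 5.0)
print(f"{mag:.3f} +/- {mag_error:.3f}")
```

For faint or negative measurements it is usually safer to work with `flux` and `flux_error` directly, since the magnitude error is only a good approximation at high signal-to-noise.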

Light Curve CSV Columns

Each light curve CSV contains all observations for one source across all images:

obs_id,band,mjd,x,y,flux,flux_error,mag_ab,mag_ab_error,wavelength,bandwidth,flag,bg_level,bg_error,aperture_radius
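A light curve file can be read with the standard `csv` module. The sketch below selects one band and keeps points above a signal-to-noise threshold; `select_detections` is a hypothetical helper, the demo rows are made up, and only the columns the function uses are included for brevity.

```python
import csv
import io


def select_detections(handle, band, min_snr=3.0):
    """Return time-sorted (mjd, flux, flux_error) tuples for one band."""
    points = []
    for row in csv.DictReader(handle):
        flux, error = float(row["flux"]), float(row["flux_error"])
        if row["band"] == band and error > 0 and flux / error >= min_snr:
            points.append((float(row["mjd"]), flux, error))
    return sorted(points)


# Made-up rows with a subset of the light curve columns
demo = io.StringIO(
    "obs_id,band,mjd,flux,flux_error\n"
    "obs_a,D3,60792.1,120.0,10.0\n"
    "obs_b,D3,60791.6,15.0,10.0\n"
    "obs_c,D4,60791.9,300.0,12.0\n"
    "obs_d,D3,60791.2,90.0,9.0\n"
)
points = select_detections(demo, "D3")
print(points)
```

With a real file, replace the `StringIO` object with `open("batch_output/lightcurves/<targetid>.csv", newline="")`.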

Step-by-Step API

For more control over individual stages:

from pathlib import Path
from spxquery.batch import BatchPipeline, BatchConfig
from spxquery.core.config import PhotometryConfig

config = BatchConfig(
    center_ra=270.0,
    center_dec=66.56,
    radius=1.0,
    catalog_path=Path("sources.csv"),
    bands=["D3"],
    coverage_mode="full",
    output_dir=Path("batch_output"),
)

pipeline = BatchPipeline(config)

# Run stages individually
pipeline.run_query()       # TAP query → observations list
pipeline.run_download()    # Parallel download → images/
pipeline.run_extract()     # Multi-source extraction → per_image/
pipeline.run_aggregate()   # Bucket aggregation → lightcurves/

# Or run all at once
pipeline.run_all()

Incremental Execution

The extract stage supports incremental processing — if a per-image CSV already exists, that image is skipped:

# First run: processes all images
pipeline.run_extract()

# Later: only new images are processed
pipeline.run_extract()  # skip_existing=True by default
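The skip logic amounts to checking for an existing output file per image. A minimal sketch, assuming the `<image stem>_photometry.csv` naming shown in the output structure; `images_to_process` is a hypothetical helper, not the pipeline's own method.

```python
from pathlib import Path


def images_to_process(image_dir, per_image_dir, skip_existing=True):
    """List FITS images that still need extraction."""
    pending = []
    for image in sorted(Path(image_dir).glob("*.fits")):
        output = Path(per_image_dir) / f"{image.stem}_photometry.csv"
        if skip_existing and output.exists():
            continue  # already extracted in an earlier run
        pending.append(image)
    return pending
```

One consequence of a file-existence check is that changing photometry parameters does not invalidate old results; delete the affected per-image CSVs (or disable skipping) to force re-extraction.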

Performance

The batch extraction is optimized for processing many sources across many images:

Pre-computed shared arrays: Error map, background quality mask, and pixel scale are computed once per image (not per source)
Local cutout photometry: Aperture photometry operates on small cutouts instead of full 2040×2040 images
Batch WCS projection: All source coordinates are projected in a single WCS call
Combined bitmask flag filtering: Single bitwise operation replaces per-flag loops
Bucket-based aggregation: Memory-efficient aggregation via hash partitioning
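The combined-bitmask idea can be shown in a few lines: the bad-flag bits are folded into one integer up front, so each pixel needs a single AND instead of one test per flag. The bit positions below are hypothetical and are not the actual SPHEREx flag definitions.

```python
from functools import reduce
from operator import or_

BAD_BITS = (0, 3, 7)  # hypothetical flag bit positions

# Fold all bad bits into one mask, computed once
BAD_MASK = reduce(or_, (1 << bit for bit in BAD_BITS), 0)


def is_bad(flag_value):
    """True if any bad bit is set: one bitwise AND instead of a per-flag loop."""
    return (flag_value & BAD_MASK) != 0


def combined_flag(pixel_flags):
    """Bitwise OR of the flags of all pixels in an aperture."""
    return reduce(or_, pixel_flags, 0)


print(BAD_MASK, is_bad(8), is_bad(4), combined_flag([1, 4, 4, 16]))
```

On a NumPy flag image the same test becomes a single vectorized expression of the form `(flags & BAD_MASK) != 0`.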

Typical performance (single-threaded, single image):

Sources in FOV	Time per image
5	~110 ms
17	~85 ms
34	~85 ms

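I/O (reading FITS from disk) dominates the per-image cost, which is why the time per image is nearly independent of the number of sources. With 12 parallel workers, throughput scales near-linearly as long as storage bandwidth does not become the bottleneck.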