Batch Photometry
The batch photometry module (spxquery.batch) enables multi-source aperture photometry over a sky region. Unlike the single-source pipeline, which downloads small cutouts around one target, the batch module queries for full-frame images covering a circular region and then extracts photometry for all catalog sources in each image simultaneously.
When to Use Batch vs. Single-Source
| Feature | Single-Source | Batch (`spxquery.batch`) |
|---|---|---|
| Targets | One source per run | Multiple sources from catalog CSV |
| Downloads | Cutout images (~100 KB each) | Full-frame images (~70 MB each) |
| Query | Point search (CONESearch) | Region search (CIRCLE with INTERSECTS or CONTAINS) |
| Output | Per-source light curve + plot | Per-source light curves (CSV only) |
| Use case | Detailed analysis of one object | Survey of many objects in a region |
Quick Start
```python
from spxquery.batch import run_batch

run_batch(
    catalog="sources.csv",
    center_ra=270.0,
    center_dec=66.56,
    radius=1.0,
    bands=["D3", "D4"],
)
```
This single call will:

1. Query the IRSA TAP service for full-frame images matching a circle of radius 1° centered on RA = 270.0°, Dec = 66.56°
2. Download the matching images in the D3 and D4 bands
3. Extract aperture photometry for all catalog sources in each image
4. Aggregate the per-image results into per-source light curves
Catalog Format
The source catalog must be a CSV file with columns targetid, ra, dec:
```csv
targetid,ra,dec
39633458707826492,265.623,66.531
39633451346821630,266.445,65.636
39633453829850190,266.794,65.983
```
Additional columns (flux, redshift, etc.) are ignored. Coordinates must be in degrees (ICRS).
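Before launching a long run, it can help to validate the catalog up front. The sketch below uses only the standard library; `load_catalog` is a hypothetical helper (not part of `spxquery.batch`) that assumes the format above: a header row containing `targetid`, `ra`, `dec`, and coordinates in degrees.

```python
import csv
import io
from typing import Iterable, List, Tuple

REQUIRED_COLUMNS = ("targetid", "ra", "dec")


def load_catalog(lines: Iterable[str]) -> List[Tuple[str, float, float]]:
    """Parse catalog CSV lines into (targetid, ra, dec) tuples.

    Hypothetical helper: checks the required header columns and the
    coordinate ranges (RA in [0, 360), Dec in [-90, 90]). Extra columns
    such as flux or redshift are ignored.
    """
    reader = csv.DictReader(lines)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"catalog is missing columns: {missing}")

    sources = []
    for row in reader:
        ra, dec = float(row["ra"]), float(row["dec"])
        if not (0.0 <= ra < 360.0 and -90.0 <= dec <= 90.0):
            raise ValueError(f"bad coordinates for {row['targetid']}: {ra}, {dec}")
        # Keep IDs as strings: 17-digit IDs would lose precision if parsed as floats.
        sources.append((row["targetid"], ra, dec))
    return sources


example = """targetid,ra,dec,redshift
39633458707826492,265.623,66.531,0.12
39633451346821630,266.445,65.636,0.40
"""
print(load_catalog(io.StringIO(example)))
```

For a file on disk, pass an open handle instead: `with open("sources.csv", newline="") as f: load_catalog(f)`.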
Configuration
```python
from pathlib import Path

from spxquery.batch import BatchConfig, run_batch
from spxquery.core.config import PhotometryConfig

config = BatchConfig(
    # Sky region
    center_ra=270.0,
    center_dec=66.56,
    radius=1.0,
    catalog_path=Path("sources.csv"),
    # Query filters
    coverage_mode="any",
    bands=["D3", "D4"],
    mjd_range=(60800, 61000),
    # Safety
    max_images=500,
    # Output
    output_dir=Path("batch_output"),
    # Parallelism
    max_download_workers=4,
    max_extract_workers=12,
    # Photometry parameters (forwarded to extraction)
    photometry=PhotometryConfig(
        aperture_method="fwhm",
        fwhm_multiplier=2.5,
        background_method="window",
        window_size=30,
        subtract_zodi=True,
    ),
)

run_batch(config)
```
Configuration Parameters
Region and Query
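| Parameter | Type | Default | Description |
|---|---|---|---|
| `center_ra` | float | required | Region center RA in degrees (0–360) |
| `center_dec` | float | required | Region center Dec in degrees (−90 to +90) |
| `radius` | float | required | Search radius in degrees |
| `catalog_path` | Path | required | CSV file with `targetid`, `ra`, `dec` columns |
| `coverage_mode` | str | | Image selection mode: `"any"` (INTERSECTS) or `"full"` (CONTAINS); see Coverage Modes |
| `bands` | list[str] | `None` | Filter by band, e.g. `["D3", "D4"]`; `None` queries all 6 bands |
| `mjd_range` | tuple | | Time filter as a (start, end) MJD pair, e.g. `(60800, 61000)` |
| `max_images` | int | 500 | Raise an error if the query returns more images than this |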
Processing
| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_download_workers` | int | 4 | Parallel download threads |
| `max_extract_workers` | int | 12 | Parallel extraction processes |
| `output_dir` | Path | | Root directory for all outputs |
| | int | 64 | Hash-partition buckets for aggregation |
| | bool | | Keep temporary bucket CSVs after aggregation |
| `photometry` | PhotometryConfig | defaults | Photometry parameters (see Parameter Configuration) |
Coverage Modes
The coverage mode controls which images are selected from the archive:
any (INTERSECTS)
Selects images whose footprint overlaps with the search circle. This is the most inclusive mode — it returns all images that touch the region, even if only a small corner overlaps.
Query: INTERSECTS(p.poly, CIRCLE('ICRS', ra, dec, radius)) = 1
Use when you want maximum coverage and don’t mind some images only partially covering your region.
full (CONTAINS)
Selects images that fully contain the search circle. This ensures every returned image covers the entire region, so every catalog source inside the circle appears in every image.
Query: CONTAINS(CIRCLE('ICRS', ra, dec, radius), p.poly) = 1
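Use when you need every source measured in every image (e.g., for consistent light curves). Since SPHEREx full-frame images are ~3.7° across, a circle can only be fully contained when its diameter is smaller than the image footprint. In practice, a full-mode search with radius < 1° returns fewer images than an any-mode search, but each returned image covers the whole region.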
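The two modes differ only in the spatial predicate. A minimal sketch that assembles the predicates shown above as strings; `spatial_predicate` is hypothetical, and the `p.poly` footprint column and `p` alias are copied from the example queries rather than from a documented schema.

```python
def spatial_predicate(ra: float, dec: float, radius: float, mode: str = "any") -> str:
    """Return the ADQL spatial predicate for a coverage mode (illustrative sketch)."""
    if not (0.0 <= ra < 360.0 and -90.0 <= dec <= 90.0 and radius > 0):
        raise ValueError("invalid region")
    circle = f"CIRCLE('ICRS', {ra}, {dec}, {radius})"
    if mode == "any":
        # Image footprint overlaps the circle at all.
        return f"INTERSECTS(p.poly, {circle}) = 1"
    if mode == "full":
        # ADQL CONTAINS(a, b) is 1 when a lies entirely inside b,
        # so this keeps footprints that contain the whole circle.
        return f"CONTAINS({circle}, p.poly) = 1"
    raise ValueError(f"unknown coverage mode: {mode!r}")


print(spatial_predicate(270.0, 66.56, 1.0, "full"))
```

Argument order matters for `CONTAINS`: swapping the two regions would instead ask whether the image footprint lies inside the circle, which almost never holds for a 1° search radius.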
Band and Time Filtering
Band Selection
SPHEREx has 6 detectors covering different wavelength ranges:
| Band | Wavelength (μm) | Resolving Power |
|---|---|---|
| D1 | 0.75–1.09 | R ≈ 39 |
| D2 | 1.10–1.62 | R ≈ 41 |
| D3 | 1.63–2.41 | R ≈ 41 |
| D4 | 2.42–3.82 | R ≈ 35 |
| D5 | 3.83–4.41 | R ≈ 112 |
| D6 | 4.42–5.00 | R ≈ 128 |
To query only specific bands:
```python
config = BatchConfig(
    ...,
    bands=["D3", "D4"],  # Only 1.63–3.82 μm
)
```
Setting bands=None (default) queries all 6 bands.
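The band table can also drive band selection for a feature of interest. The sketch below is a hypothetical helper (not part of `spxquery`) that encodes the wavelength ranges from the table and returns the bands covering the requested wavelengths; the result could be passed as `bands=`.

```python
# Wavelength coverage per detector band in μm, copied from the table above.
BAND_RANGES_UM = {
    "D1": (0.75, 1.09),
    "D2": (1.10, 1.62),
    "D3": (1.63, 2.41),
    "D4": (2.42, 3.82),
    "D5": (3.83, 4.41),
    "D6": (4.42, 5.00),
}


def bands_for_wavelengths(*wavelengths_um: float) -> list[str]:
    """Return the sorted band names whose range covers any given wavelength.

    The table's ranges are rounded to 0.01 μm, so a value in a rounding
    gap (e.g. 1.095) matches no band in this sketch.
    """
    selected = set()
    for wl in wavelengths_um:
        for band, (lo, hi) in BAND_RANGES_UM.items():
            if lo <= wl <= hi:
                selected.add(band)
    return sorted(selected)


# 2.2 μm (K band) and the 3.3 μm PAH feature fall in D3 and D4.
print(bands_for_wavelengths(2.2, 3.3))
```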
MJD Range
Filter observations by Modified Julian Date to restrict to a specific time window:
```python
config = BatchConfig(
    ...,
    mjd_range=(60800, 61000),  # ~200 days
)
```
This is applied as a post-query filter. Use it to limit the number of downloaded images when the region has extensive temporal coverage.
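Because the MJD cut happens after the query, it amounts to filtering the returned observation records. A sketch, assuming each record is a dict with an `mjd` key like the `observations` entries in `query_summary.yaml`; whether spxquery treats the bounds as inclusive is not stated here, so this version (a hypothetical `filter_by_mjd`) includes both ends, and treating `None` as "no cut" is likewise an assumption.

```python
from typing import Optional, Sequence, Tuple


def filter_by_mjd(
    observations: Sequence[dict], mjd_range: Optional[Tuple[float, float]]
) -> list:
    """Keep observations whose MJD lies in mjd_range (both ends inclusive).

    Sketch only: mjd_range=None is assumed to mean "no time cut".
    """
    if mjd_range is None:
        return list(observations)
    start, end = mjd_range
    if start > end:
        raise ValueError("mjd_range must be (start, end) with start <= end")
    return [obs for obs in observations if start <= obs["mjd"] <= end]


# Toy records, not real SPHEREx observations.
observations = [
    {"obs_id": "a", "mjd": 60791.58},
    {"obs_id": "b", "mjd": 60850.00},
    {"obs_id": "c", "mjd": 61010.25},
]
kept = filter_by_mjd(observations, (60800, 61000))
print([o["obs_id"] for o in kept])
```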
Pipeline Stages
The batch pipeline has four stages: Query → Download → Extract → Aggregate.
Query
Queries the IRSA TAP service using ADQL spatial predicates. The search region is defined by a circle (center RA/Dec + radius). Results include download URLs, observation IDs, band information, and time stamps.
Download
Downloads full-frame FITS images from IRSA. Uses the same parallel download engine as the single-source pipeline, but without cutout parameters.
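Downloading many independent files is I/O-bound, so a thread pool is a natural fit. A sketch of the idea with the pool sized like `max_download_workers`; `download_images` and its `fetch` hook are hypothetical and are not the spxquery download engine. The default `fetch` is `urllib.request.urlretrieve`, but any callable taking `(url, path)` works, which also keeps the sketch testable offline.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict


def download_images(
    urls: Dict[str, str],
    dest_dir: Path,
    max_workers: int = 4,
    fetch: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> list:
    """Download {filename: url} into dest_dir in parallel; return new paths."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Skip files that are already on disk from an earlier run.
    todo = {name: url for name, url in urls.items() if not (dest_dir / name).exists()}

    def _one(item):
        name, url = item
        target = dest_dir / name
        tmp = dest_dir / (name + ".part")
        fetch(url, str(tmp))  # write under a temporary name first...
        tmp.replace(target)   # ...so an interrupted download never looks complete
        return target

    # map() re-raises a worker's exception when its result is collected.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(_one, todo.items()))
```

Writing to a `.part` file and renaming on success means the skip-existing check never mistakes a truncated file for a finished one.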
Extract
For each image, the extraction stage:

1. Reads the multi-extension FITS (MEF) file once (IMAGE, FLAGS, VARIANCE, and ZODI extensions)
2. Projects all catalog sources onto the image via a single batch WCS transformation
3. Filters to sources within the field of view
4. Extracts aperture photometry for each in-FOV source using arrays pre-computed once per image and shared across sources:
   - Background quality mask (combined bitmask)
   - Error array (square root of the variance)
   - Pixel scale
5. Writes a per-image CSV file
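A simplified NumPy sketch of this per-image work. It assumes pixel coordinates for all sources were already obtained from one batched WCS call (for example astropy's `WCS.all_world2pix` on the whole coordinate arrays) and shows the shared pieces: one combined bitmask, one error array, the field-of-view cut, and aperture sums on small local cutouts. `extract_image`, the `bad_bits` argument, and the whole-pixel aperture without background subtraction are illustrative assumptions, not the spxquery implementation.

```python
import numpy as np


def extract_image(image, flags, variance, x, y, bad_bits, radius):
    """Aperture sums for many sources on one image (simplified sketch).

    x, y: 0-based pixel coordinates of all catalog sources (pixel centers
    at integer coordinates). bad_bits: flag bit indices to exclude.
    Returns (indices of in-FOV sources, fluxes, flux errors).
    """
    ny, nx = image.shape
    # Shared, computed once per image: combined bitmask and error map.
    combined = 0
    for bit in bad_bits:
        combined |= 1 << bit
    good = (flags & combined) == 0  # one array-wide bitwise test
    error = np.sqrt(variance)

    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    in_fov = np.flatnonzero((x >= 0) & (x <= nx - 1) & (y >= 0) & (y <= ny - 1))

    r = int(np.ceil(radius))
    fluxes, errors = [], []
    for i in in_fov:
        # Local cutout around the source instead of the full frame.
        xc, yc = int(round(x[i])), int(round(y[i]))
        x0, x1 = max(xc - r, 0), min(xc + r + 1, nx)
        y0, y1 = max(yc - r, 0), min(yc + r + 1, ny)
        yy, xx = np.mgrid[y0:y1, x0:x1]
        inside = (xx - x[i]) ** 2 + (yy - y[i]) ** 2 <= radius**2
        aperture = inside & good[y0:y1, x0:x1]
        fluxes.append(image[y0:y1, x0:x1][aperture].sum())
        # Errors add in quadrature over the aperture pixels.
        errors.append(np.sqrt((error[y0:y1, x0:x1][aperture] ** 2).sum()))
    return in_fov, np.array(fluxes), np.array(errors)
```

Only the loop body scales with the number of sources, and it touches a few dozen pixels each, which is consistent with I/O dominating the per-image cost.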
Aggregate
Combines per-image CSVs into per-source light curves using hash-partitioned bucket aggregation:

1. Partition all per-image rows into hash buckets by `target_id`
2. Sort each bucket by `(target_id, mjd)`
3. Write one CSV per source

This approach avoids loading the entire dataset into memory.
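The three aggregation steps can be sketched with the standard library alone. This illustrates the technique and is not spxquery's code: `aggregate_lightcurves` is hypothetical, rows are assumed to be dicts sharing the same keys (including `target_id` and `mjd`), and buckets are chosen with `zlib.crc32` because Python's built-in `hash()` of strings changes between interpreter runs.

```python
import csv
import itertools
import zlib
from pathlib import Path
from typing import Iterable


def aggregate_lightcurves(rows: Iterable[dict], work_dir: Path, n_buckets: int = 64) -> None:
    """Write lightcurves/<target_id>.csv, holding one bucket in memory at a time."""
    bucket_dir, out_dir = work_dir / "buckets", work_dir / "lightcurves"
    bucket_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    for old in bucket_dir.glob("bucket_*.csv"):
        old.unlink()  # do not mix in buckets left over from a previous run

    # 1. Partition: stream rows into bucket files chosen by a stable hash.
    writers, handles, fieldnames = {}, [], None
    try:
        for row in rows:
            fieldnames = fieldnames or list(row)
            b = zlib.crc32(str(row["target_id"]).encode()) % n_buckets
            if b not in writers:
                fh = open(bucket_dir / f"bucket_{b:03d}.csv", "w", newline="")
                handles.append(fh)
                writers[b] = csv.DictWriter(fh, fieldnames=fieldnames)
                writers[b].writeheader()
            writers[b].writerow(row)
    finally:
        for fh in handles:
            fh.close()

    # 2. Sort each bucket by (target_id, mjd); 3. write one CSV per source.
    for path in sorted(bucket_dir.glob("bucket_*.csv")):
        with open(path, newline="") as fh:
            reader = csv.DictReader(fh)
            cols = [c for c in reader.fieldnames if c != "target_id"]
            bucket = sorted(reader, key=lambda r: (r["target_id"], float(r["mjd"])))
        for target_id, group in itertools.groupby(bucket, key=lambda r: r["target_id"]):
            with open(out_dir / f"{target_id}.csv", "w", newline="") as out:
                writer = csv.DictWriter(out, fieldnames=cols, extrasaction="ignore")
                writer.writeheader()
                writer.writerows(group)
```

Peak memory is bounded by the largest bucket rather than by the full dataset, so raising the bucket count trades more open files for a smaller in-memory sort.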
Output Structure
```
batch_output/
├── images/                 # Downloaded full-frame FITS files
│   ├── level2_2025W25_1B_0263_4D3_*.fits
│   └── ...
├── per_image/              # Per-image photometry CSVs
│   ├── level2_2025W25_1B_0263_4D3_*_photometry.csv
│   └── ...
├── lightcurves/            # Per-source light curves
│   ├── 39633458707826492.csv
│   ├── 39633451346821630.csv
│   └── ...
└── query_summary.yaml      # Query metadata (region, bands, observations)
```
Query Summary YAML
After run_query(), a query_summary.yaml is saved to the output directory with the query metadata:
```yaml
query_time: "2026-05-18T14:30:00"
region:
  center_ra: 270.0
  center_dec: 66.6
  radius_deg: 0.3
  coverage_mode: full
filters:
  bands: [D3]
  mjd_range: [60791.0, 60793.0]
n_observations: 18
band_counts: {D3: 18}
time_span_days: 1.4
observations:
  - obs_id: "2025W17_4B_0277_1"
    band: D3
    mjd: 60791.575318
    wavelength_um: 2.0150
    download_url: "https://..."
```
Load it programmatically:
```python
from spxquery.batch import load_query_summary

summary = load_query_summary("batch_output/")
print(f"Found {summary['n_observations']} observations across {summary['band_counts']}")
```
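The aggregate fields of the summary follow directly from the observation list. A sketch of how `n_observations`, `band_counts`, and `time_span_days` could be derived; `summarize_observations` is hypothetical, assumes records shaped like the `observations` entries above, and rounds the span to one decimal only to match the YAML example's presentation.

```python
from collections import Counter


def summarize_observations(observations: list) -> dict:
    """Derive summary counts from a list of observation dicts (sketch)."""
    mjds = [obs["mjd"] for obs in observations]
    return {
        "n_observations": len(observations),
        "band_counts": dict(Counter(obs["band"] for obs in observations)),
        # Days between the first and last exposure (0.0 if nothing matched).
        "time_span_days": round(max(mjds) - min(mjds), 1) if mjds else 0.0,
    }


# Toy records for illustration only.
obs = [
    {"obs_id": "x1", "band": "D3", "mjd": 60791.575318},
    {"obs_id": "x2", "band": "D3", "mjd": 60792.9},
    {"obs_id": "x3", "band": "D4", "mjd": 60792.1},
]
print(summarize_observations(obs))
```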
Per-Image CSV Columns
Each per-image CSV contains photometry for all in-FOV sources from one observation:
| Column | Unit | Description |
|---|---|---|
| `target_id` | — | Source identifier from catalog |
| | deg | Source coordinates |
| `obs_id` | — | Observation ID |
| `band` | — | Detector band (D1–D6) |
| `mjd` | days | Modified Julian Date |
| `x`, `y` | pixels | Pixel coordinates on image |
| `flux` | μJy | Background-subtracted flux |
| `flux_error` | μJy | Flux uncertainty |
| `mag_ab` | mag | AB magnitude |
| `mag_ab_error` | mag | Magnitude uncertainty |
| `wavelength` | μm | Central wavelength |
| `bandwidth` | μm | Bandpass width |
| `flag` | — | Combined pixel flags (bitwise OR) |
| `bg_level` | μJy/arcsec² | Estimated local background level |
| `bg_error` | μJy/arcsec² | Background uncertainty |
| `aperture_radius` | pixels | Aperture radius used |
| | — | Source FITS filename |
Light Curve CSV Columns
Each light curve CSV contains all observations for one source across all images:
```csv
obs_id,band,mjd,x,y,flux,flux_error,mag_ab,mag_ab_error,wavelength,bandwidth,flag,bg_level,bg_error,aperture_radius
```
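For downstream analysis, a light-curve file can be read with the standard `csv` module. The sketch below skips points with any pixel flag set and averages the remaining fluxes per band with inverse-variance weights. Treating `flag == 0` as "clean", the weighted mean itself, and the `weighted_mean_flux` name are illustrative choices, not spxquery recommendations.

```python
import csv
import io
import math
from collections import defaultdict


def weighted_mean_flux(lines) -> dict:
    """Inverse-variance weighted mean flux (μJy) and its error, per band."""
    sums = defaultdict(lambda: [0.0, 0.0])  # band -> [sum(w * flux), sum(w)]
    for row in csv.DictReader(lines):
        err = float(row["flux_error"])
        if int(row["flag"]) != 0 or not err > 0:
            continue  # skip flagged points and unusable uncertainties
        w = 1.0 / err**2
        sums[row["band"]][0] += w * float(row["flux"])
        sums[row["band"]][1] += w
    return {band: (wf / w, math.sqrt(1.0 / w)) for band, (wf, w) in sums.items()}


# Synthetic rows using the light-curve column layout; the third point is flagged.
sample = """obs_id,band,mjd,x,y,flux,flux_error,mag_ab,mag_ab_error,wavelength,bandwidth,flag,bg_level,bg_error,aperture_radius
o1,D3,60800.1,100.2,200.4,10.0,1.0,21.40,0.11,2.02,0.05,0,0.1,0.01,3.0
o2,D3,60801.1,100.3,200.5,20.0,1.0,20.65,0.05,2.02,0.05,0,0.1,0.01,3.0
o3,D3,60802.1,100.1,200.3,99.0,1.0,18.91,0.01,2.02,0.05,4,0.1,0.01,3.0
"""
print(weighted_mean_flux(io.StringIO(sample)))
```

For a file on disk, pass an open handle, e.g. `weighted_mean_flux(open("batch_output/lightcurves/39633458707826492.csv", newline=""))`.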
Step-by-Step API
For more control over individual stages:
```python
from pathlib import Path

from spxquery.batch import BatchPipeline, BatchConfig
from spxquery.core.config import PhotometryConfig

config = BatchConfig(
    center_ra=270.0,
    center_dec=66.56,
    radius=1.0,
    catalog_path=Path("sources.csv"),
    bands=["D3"],
    coverage_mode="full",
    output_dir=Path("batch_output"),
)

pipeline = BatchPipeline(config)

# Run stages individually
pipeline.run_query()      # TAP query → observations list
pipeline.run_download()   # Parallel download → images/
pipeline.run_extract()    # Multi-source extraction → per_image/
pipeline.run_aggregate()  # Bucket aggregation → lightcurves/

# Or run all at once
pipeline.run_all()
```
Incremental Execution
The extract stage supports incremental processing — if a per-image CSV already exists, that image is skipped:
```python
# `pipeline` is the BatchPipeline instance from the step-by-step example above

# First run: processes all images
pipeline.run_extract()

# Later: only new images are processed
pipeline.run_extract()  # skip_existing=True by default
```
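The skip-existing check amounts to mapping each image to its expected output file. A hypothetical sketch (not spxquery's code), assuming the `<image stem>_photometry.csv` naming shown in the output tree:

```python
from pathlib import Path


def images_to_process(image_dir: Path, per_image_dir: Path, skip_existing: bool = True) -> list:
    """List FITS images whose per-image photometry CSV does not exist yet."""
    pending = []
    for image in sorted(image_dir.glob("*.fits")):
        out = per_image_dir / f"{image.stem}_photometry.csv"
        if skip_existing and out.exists():
            continue  # already extracted in an earlier run
        pending.append(image)
    return pending
```

One caveat of any existence-based check: a run interrupted mid-write can leave a partial CSV that later runs will skip, so writing to a temporary name and renaming on completion keeps the check reliable.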
Performance
The batch extraction is optimized for processing many sources across many images:
- Pre-computed shared arrays: Error map, background quality mask, and pixel scale are computed once per image (not per source)
- Local cutout photometry: Aperture photometry operates on small cutouts instead of full 2040×2040 images
- Batch WCS projection: All source coordinates are projected in a single WCS call
- Combined bitmask flag filtering: A single bitwise operation replaces per-flag loops
- Bucket-based aggregation: Memory-efficient aggregation via hash partitioning
Typical performance (single-threaded, single image):
| Sources in FOV | Time per image |
|---|---|
| 5 | ~110 ms |
| 17 | ~85 ms |
| 34 | ~85 ms |
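I/O (reading the FITS file from disk) dominates the per-image cost, so the time per image is roughly independent of the number of sources in the field. With 12 parallel extraction workers, throughput scales near-linearly as long as disk I/O is not the bottleneck.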