# Batch Photometry

The batch photometry module (`spxquery.batch`) enables multi-source aperture photometry over a sky region. Unlike the single-source pipeline which downloads small cutouts around one target, the batch module queries for **full-frame images** covering a circular region, then extracts photometry for **all catalog sources** in each image simultaneously.

## When to Use Batch vs. Single-Source

| Feature | Single-Source (`SPXQueryPipeline`) | Batch (`spxquery.batch`) |
|---------|-------------------------------------|--------------------------|
| Targets | One source per run | Multiple sources from catalog CSV |
| Downloads | Cutout images (~100 KB each) | Full-frame images (~70 MB each) |
| Query | Point search (CONESearch) | Region search (CIRCLE + INTERSECTS) |
| Output | Per-source light curve + plot | Per-source light curves (CSV only) |
| Use case | Detailed analysis of one object | Survey of many objects in a region |

## Quick Start

```python
from spxquery.batch import run_batch

run_batch(
    catalog="sources.csv",
    center_ra=270.0,
    center_dec=66.56,
    radius=1.0,
    bands=["D3", "D4"],
)
```

This single call will:

1. Query the IRSA TAP service for full-frame images covering a 1° radius circle
2. Download matching images in D3 and D4 bands
3. Extract aperture photometry for all catalog sources in each image
4. Aggregate per-image results into per-source light curves

## Catalog Format

The source catalog must be a CSV file with columns `targetid`, `ra`, `dec`:

```csv
targetid,ra,dec
39633458707826492,265.623,66.531
39633451346821630,266.445,65.636
39633453829850190,266.794,65.983
```

Additional columns (flux, redshift, etc.) are ignored. Coordinates must be in degrees (ICRS).
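
Before starting a long run it can help to check the catalog against the format described above. The following is a minimal sketch using only the standard library; `read_catalog` is a hypothetical helper and not part of `spxquery`, and the `redshift` column and its values are made up to show that extra columns are simply ignored.

```python
import csv
import io

REQUIRED_COLUMNS = ("targetid", "ra", "dec")


def read_catalog(handle):
    """Parse catalog rows, keeping only the three required columns."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"catalog is missing columns: {missing}")

    sources = []
    for line_no, row in enumerate(reader, start=2):  # the header is line 1
        ra, dec = float(row["ra"]), float(row["dec"])
        # Coordinates are expected in degrees (ICRS).
        if not (0.0 <= ra <= 360.0 and -90.0 <= dec <= 90.0):
            raise ValueError(f"line {line_no}: coordinates out of range")
        # Keep the ID as text: 17-digit IDs lose precision if parsed as floats.
        sources.append((row["targetid"], ra, dec))
    return sources


# Illustrative input; the redshift column is hypothetical and is ignored.
sample = io.StringIO(
    "targetid,ra,dec,redshift\n"
    "39633458707826492,265.623,66.531,0.12\n"
    "39633451346821630,266.445,65.636,0.40\n"
)
sources = read_catalog(sample)
print(sources[0])
```

To check a real file, pass an open file handle instead of the `StringIO` object, for example `read_catalog(open("sources.csv", newline=""))`.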
## Configuration

```python
from pathlib import Path
from spxquery.batch import BatchConfig, run_batch
from spxquery.core.config import PhotometryConfig

config = BatchConfig(
    # Sky region
    center_ra=270.0,
    center_dec=66.56,
    radius=1.0,
    catalog_path=Path("sources.csv"),

    # Query filters
    coverage_mode="any",
    bands=["D3", "D4"],
    mjd_range=(60800, 61000),

    # Safety
    max_images=500,

    # Output
    output_dir=Path("batch_output"),

    # Parallelism
    max_download_workers=4,
    max_extract_workers=12,

    # Photometry parameters (forwarded to extraction)
    photometry=PhotometryConfig(
        aperture_method="fwhm",
        fwhm_multiplier=2.5,
        background_method="window",
        window_size=30,
        subtract_zodi=True,
    ),
)

run_batch(config)
```

### Configuration Parameters

#### Region and Query

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `center_ra` | float | required | Region center RA in degrees (0–360) |
| `center_dec` | float | required | Region center Dec in degrees (−90 to +90) |
| `radius` | float | required | Search radius in degrees |
| `catalog_path` | Path | required | CSV file with `targetid`, `ra`, `dec` columns |
| `coverage_mode` | str | `"any"` | `"any"` = image overlaps region; `"full"` = image fully contains region |
| `bands` | list[str] | `None` | Filter by band, e.g. `["D1", "D3"]`. `None` = all bands |
| `mjd_range` | tuple | `None` | Time filter as `(mjd_min, mjd_max)`. `None` = no filter |
| `max_images` | int | 500 | Raise error if query returns more images than this |

#### Processing

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `max_download_workers` | int | 4 | Parallel download threads |
| `max_extract_workers` | int | 12 | Parallel extraction processes |
| `output_dir` | Path | `"batch_output"` | Root directory for all outputs |
| `num_buckets` | int | 64 | Hash-partition buckets for aggregation |
| `keep_bucket_files` | bool | `False` | Keep temporary bucket CSVs after aggregation |
| `photometry` | PhotometryConfig | defaults | Photometry parameters (see {doc}`parameters`) |

## Coverage Modes

The coverage mode controls which images are selected from the archive:

### `any` (INTERSECTS)

Selects images whose footprint **overlaps** with the search circle. This is the most inclusive mode: it returns all images that touch the region, even if only a small corner overlaps.

```
Query: INTERSECTS(p.poly, CIRCLE('ICRS', ra, dec, radius)) = 1
```

Use when you want maximum coverage and don't mind some images only partially covering your region.

### `full` (CONTAINS)

Selects images that **fully contain** the search circle. This ensures every returned image covers the entire region, so all catalog sources are present in every image.

```
Query: CONTAINS(CIRCLE('ICRS', ra, dec, radius), p.poly) = 1
```

Use when you need complete source coverage across all images (e.g., for consistent light curves). Since SPHEREx full-frame images are ~3.7° across, a `full` search with radius < 1° will return a smaller but more complete subset.
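
The mapping from coverage mode to ADQL predicate can be sketched as a small function. This is an illustration built from the two query fragments quoted above, not the module's actual query builder; `spatial_predicate` is a hypothetical name, and the table alias `p` is taken from those fragments.

```python
def spatial_predicate(ra, dec, radius, coverage_mode="any"):
    """Return the ADQL spatial condition for the given coverage mode."""
    circle = f"CIRCLE('ICRS', {ra}, {dec}, {radius})"
    if coverage_mode == "any":
        # Image footprint overlaps the search circle.
        return f"INTERSECTS(p.poly, {circle}) = 1"
    if coverage_mode == "full":
        # Search circle lies entirely inside the image footprint.
        return f"CONTAINS({circle}, p.poly) = 1"
    raise ValueError(f"unknown coverage_mode: {coverage_mode!r}")


any_clause = spatial_predicate(270.0, 66.56, 1.0, "any")
full_clause = spatial_predicate(270.0, 66.56, 1.0, "full")
print(any_clause)
print(full_clause)
```

Note the argument order in `CONTAINS`: in ADQL the first geometry is the one being tested for containment within the second, so the circle comes first when the image must cover the whole region.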
## Band and Time Filtering

### Band Selection

SPHEREx has 6 detectors covering different wavelength ranges:

| Band | Wavelength (μm) | Resolving Power |
|------|-----------------|-----------------|
| D1 | 0.75–1.09 | R ≈ 39 |
| D2 | 1.10–1.62 | R ≈ 41 |
| D3 | 1.63–2.41 | R ≈ 41 |
| D4 | 2.42–3.82 | R ≈ 35 |
| D5 | 3.83–4.41 | R ≈ 112 |
| D6 | 4.42–5.00 | R ≈ 128 |

To query only specific bands:

```python
config = BatchConfig(
    ...,
    bands=["D3", "D4"],  # Only 1.63–3.82 μm
)
```

Setting `bands=None` (default) queries all 6 bands.

### MJD Range

Filter observations by Modified Julian Date to restrict to a specific time window:

```python
config = BatchConfig(
    ...,
    mjd_range=(60800, 61000),  # ~200 days
)
```

This is applied as a post-query filter. Use it to limit the number of downloaded images when the region has extensive temporal coverage.

## Pipeline Stages

The batch pipeline has four stages: **Query → Download → Extract → Aggregate**.

### Query

Queries the IRSA TAP service using ADQL spatial predicates. The search region is defined by a circle (center RA/Dec + radius). Results include download URLs, observation IDs, band information, and time stamps.

### Download

Downloads full-frame FITS images from IRSA. Uses the same parallel download engine as the single-source pipeline, but without cutout parameters.

### Extract

For each image, the extraction stage:

1. Reads the MEF file once (IMAGE, FLAGS, VARIANCE, ZODI extensions)
2. Projects all catalog sources onto the image via batch WCS transformation
3. Filters to sources within the field of view
4. Extracts aperture photometry for each in-FOV source using pre-computed shared arrays:
   - Background quality mask (combined bitmask)
   - Error array (sqrt of variance)
   - Pixel scale
5. Writes per-image CSV files

### Aggregate

Combines per-image CSVs into per-source light curves using hash-partitioned bucket aggregation:

1. Partition all per-image rows into hash buckets by `target_id`
2. Sort each bucket by `(target_id, mjd)`
3. Write one CSV per source

This approach avoids loading the entire dataset into memory.

## Output Structure

```
batch_output/
├── images/                    # Downloaded full-frame FITS files
│   ├── level2_2025W25_1B_0263_4D3_*.fits
│   └── ...
├── per_image/                 # Per-image photometry CSVs
│   ├── level2_2025W25_1B_0263_4D3_*_photometry.csv
│   └── ...
├── lightcurves/               # Per-source light curves
│   ├── 39633458707826492.csv
│   ├── 39633451346821630.csv
│   └── ...
└── query_summary.yaml         # Query metadata (region, bands, observations)
```

### Query Summary YAML

After `run_query()`, a `query_summary.yaml` is saved to the output directory with the query metadata:

```yaml
query_time: "2026-05-18T14:30:00"
region:
  center_ra: 270.0
  center_dec: 66.6
  radius_deg: 0.3
  coverage_mode: full
filters:
  bands: [D3]
  mjd_range: [60791.0, 60793.0]
n_observations: 18
band_counts: {D3: 18}
time_span_days: 1.4
observations:
  - obs_id: "2025W17_4B_0277_1"
    band: D3
    mjd: 60791.575318
    wavelength_um: 2.0150
    download_url: "https://..."
```

Load it programmatically:

```python
from spxquery.batch import load_query_summary

summary = load_query_summary("batch_output/")
print(f"Found {summary['n_observations']} observations across {summary['band_counts']}")
```

### Per-Image CSV Columns

Each per-image CSV contains photometry for all in-FOV sources from one observation:

| Column | Unit | Description |
|--------|------|-------------|
| `target_id` | - | Source identifier from catalog |
| `ra`, `dec` | deg | Source coordinates |
| `obs_id` | - | Observation ID |
| `band` | - | Detector band (D1–D6) |
| `mjd` | days | Modified Julian Date |
| `x`, `y` | pixels | Pixel coordinates on image |
| `flux` | μJy | Background-subtracted flux |
| `flux_error` | μJy | Flux uncertainty |
| `mag_ab` | mag | AB magnitude |
| `mag_ab_error` | mag | Magnitude uncertainty |
| `wavelength` | μm | Central wavelength |
| `bandwidth` | μm | Bandpass width |
| `flag` | - | Combined pixel flags (bitwise OR) |
| `bg_level` | μJy/arcsec² | Estimated background per pixel |
| `bg_error` | μJy/arcsec² | Background uncertainty |
| `aperture_radius` | pixels | Aperture radius used |
| `filename` | - | Source FITS filename |

### Light Curve CSV Columns

Each light curve CSV contains all observations for one source across all images:

```
obs_id,band,mjd,x,y,flux,flux_error,mag_ab,mag_ab_error,wavelength,bandwidth,flag,bg_level,bg_error,aperture_radius
```

## Step-by-Step API

For more control over individual stages:

```python
from pathlib import Path
from spxquery.batch import BatchPipeline, BatchConfig
from spxquery.core.config import PhotometryConfig

config = BatchConfig(
    center_ra=270.0,
    center_dec=66.56,
    radius=1.0,
    catalog_path=Path("sources.csv"),
    bands=["D3"],
    coverage_mode="full",
    output_dir=Path("batch_output"),
)

pipeline = BatchPipeline(config)

# Run stages individually
pipeline.run_query()      # TAP query → observations list
pipeline.run_download()   # Parallel download → images/
pipeline.run_extract()    # Multi-source extraction → per_image/
pipeline.run_aggregate()  # Bucket aggregation → lightcurves/

# Or run all at once
pipeline.run_all()
```

### Incremental Execution

The extract stage supports incremental processing: if a per-image CSV already exists, that image is skipped.

```python
# First run: processes all images
pipeline.run_extract()

# Later: only new images are processed
pipeline.run_extract()  # skip_existing=True by default
```

## Performance

The batch extraction is optimized for processing many sources across many images:

- **Pre-computed shared arrays**: Error map, background quality mask, and pixel scale are computed once per image (not per source)
- **Local cutout photometry**: Aperture photometry operates on small cutouts instead of full 2040×2040 images
- **Batch WCS projection**: All source coordinates are projected in a single WCS call
- **Combined bitmask flag filtering**: Single bitwise operation replaces per-flag loops
- **Bucket-based aggregation**: Memory-efficient aggregation via hash partitioning
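
The bucket-based aggregation can be illustrated with a minimal standard-library sketch of the three steps (partition by `target_id`, sort by `(target_id, mjd)`, write one CSV per source). This is not the module's implementation: `bucket_index` and `aggregate` are hypothetical names, the use of CRC32 as the hash is an assumption, and the demo rows carry only three made-up columns.

```python
import csv
import itertools
import tempfile
import zlib
from pathlib import Path


def bucket_index(target_id, num_buckets=64):
    # CRC32 is stable across runs, unlike Python's salted built-in hash().
    return zlib.crc32(str(target_id).encode("utf-8")) % num_buckets


def aggregate(per_image_csvs, out_dir, num_buckets=64):
    """Partition rows into bucket files, then write one sorted CSV per source."""
    bucket_dir = Path(out_dir) / "buckets"
    lc_dir = Path(out_dir) / "lightcurves"
    bucket_dir.mkdir(parents=True, exist_ok=True)
    lc_dir.mkdir(parents=True, exist_ok=True)

    # Pass 1: stream each row into the bucket chosen by its target_id.
    # Assumes all per-image CSVs share the same header.
    fieldnames = None
    for path in per_image_csvs:
        with open(path, newline="") as src:
            reader = csv.DictReader(src)
            fieldnames = fieldnames or reader.fieldnames
            for row in reader:
                idx = bucket_index(row["target_id"], num_buckets)
                bucket = bucket_dir / f"bucket_{idx:03d}.csv"
                is_new = not bucket.exists()
                with open(bucket, "a", newline="") as dst:
                    writer = csv.DictWriter(dst, fieldnames=fieldnames)
                    if is_new:
                        writer.writeheader()
                    writer.writerow(row)

    # Pass 2: only one bucket is held in memory at a time.
    for bucket in sorted(bucket_dir.glob("bucket_*.csv")):
        with open(bucket, newline="") as src:
            rows = sorted(
                csv.DictReader(src),
                key=lambda r: (r["target_id"], float(r["mjd"])),
            )
        for target_id, group in itertools.groupby(rows, key=lambda r: r["target_id"]):
            with open(lc_dir / f"{target_id}.csv", "w", newline="") as dst:
                writer = csv.DictWriter(dst, fieldnames=fieldnames)
                writer.writeheader()
                writer.writerows(group)
    return lc_dir


def write_csv(path, rows):
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["target_id", "mjd", "flux"])
        writer.writeheader()
        writer.writerows(rows)


# Demo with made-up values: two "per-image" files, source A appears in both.
workdir = Path(tempfile.mkdtemp())
write_csv(workdir / "img1.csv", [
    {"target_id": "A", "mjd": 60802.5, "flux": 11.0},
    {"target_id": "B", "mjd": 60802.5, "flux": 3.0},
])
write_csv(workdir / "img2.csv", [{"target_id": "A", "mjd": 60801.5, "flux": 10.0}])

lc_dir = aggregate([workdir / "img1.csv", workdir / "img2.csv"], workdir / "out", num_buckets=4)
with open(lc_dir / "A.csv", newline="") as fh:
    light_curve_a = list(csv.DictReader(fh))
print([row["mjd"] for row in light_curve_a])
```

Because every row for a given source hashes to the same bucket, each bucket can be sorted and split independently, so peak memory is bounded by the largest bucket rather than the full dataset. Reopening the bucket file for every row keeps the sketch short but is slow; a real implementation would keep the writers open.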

Typical performance (single-threaded, single image):

| Sources in FOV | Time per image |
|----------------|---------------|
| 5 | ~110 ms |
| 17 | ~85 ms |
| 34 | ~85 ms |

I/O (reading FITS from disk) dominates the per-image cost, which is why the time per image barely changes with the number of sources. With 12 parallel workers, throughput scales near-linearly as long as disk reads are not the bottleneck.
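
For planning a run, the figures quoted in this document (~70 MB per full-frame image, ~85 ms extraction time per image, 12 extraction workers) can be combined into a rough estimate. The sketch below is back-of-the-envelope arithmetic only: `estimate_batch_cost` is a hypothetical helper, it assumes ideal linear scaling across workers, and it ignores download time entirely.

```python
def estimate_batch_cost(n_images, image_mb=70.0, ms_per_image=85.0, workers=12):
    """Return (download volume in GB, extraction wall time in seconds)."""
    download_gb = n_images * image_mb / 1000.0
    # Ideal case: per-image work is split evenly across all workers.
    extract_s = n_images * ms_per_image / 1000.0 / workers
    return download_gb, extract_s


# A query that hits the default max_images limit of 500.
download_gb, extract_s = estimate_batch_cost(500)
print(f"{download_gb:.1f} GB to download, ~{extract_s:.1f} s of extraction")
```

At the default `max_images=500` limit this gives about 35 GB of downloads against only a few seconds of extraction, so in practice the download stage and disk space are likely to be the constraints worth planning around.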