Pipeline Architecture
SPXQuery uses a flexible, resumable pipeline architecture that processes SPHEREx data through four distinct stages.
Four-Stage Pipeline
The pipeline executes in this order:
1. Query Stage
Query the IRSA SPHEREx archive using TAP (Table Access Protocol).
What it does:
Searches for observations matching your source coordinates (RA/Dec)
Filters by spectral bands (D1-D6)
Resolves datalink URLs for each observation
Saves query results and metadata
Output:
results/query_summary.yaml - Observation metadata, time span, data size
Key features:
Automatic coordinate matching within search radius
Band filtering (query specific bands or all)
URL caching to avoid repeated datalink queries
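As a rough illustration of the coordinate matching and band filtering described above, the sketch below builds an ADQL cone-search string of the kind a TAP service accepts. It is not SPXQuery's actual query: the table name, the `band` column, and the function name are all hypothetical placeholders, and only the `CONTAINS`/`POINT`/`CIRCLE` geometry functions are standard ADQL.

```python
def build_cone_query(ra, dec, radius_deg, bands=None,
                     table="hypothetical.spherex_obs"):
    """Return an ADQL string selecting observations near (ra, dec).

    The table name and the `band` column are assumptions made for
    illustration; s_ra and s_dec follow common ObsCore naming.
    """
    if not (0.0 <= ra < 360.0 and -90.0 <= dec <= 90.0):
        raise ValueError("ra must be in [0, 360) and dec in [-90, 90]")

    # Standard ADQL cone search: is the observation inside the circle?
    where = [
        "CONTAINS(POINT('ICRS', s_ra, s_dec), "
        f"CIRCLE('ICRS', {ra:.6f}, {dec:.6f}, {radius_deg:.6f})) = 1"
    ]

    # Optional band filter, e.g. bands=["D1", "D2"]; None means all bands.
    if bands:
        quoted = ", ".join(f"'{b}'" for b in bands)
        where.append(f"band IN ({quoted})")

    return f"SELECT * FROM {table} WHERE " + " AND ".join(where)


query = build_cone_query(304.69, 42.44, radius_deg=0.01, bands=["D1", "D2"])
print(query)
```

The resulting string could then be submitted with any TAP client; the real column and table names should be taken from the archive's own schema.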
2. Download Stage
Download FITS files from IRSA with optional cutout support.
What it does:
Downloads spectral images via HTTP
Applies cutout parameters if specified (can reduce file size by about 90%, depending on the cutout size)
Organizes files by band (data/D1/, data/D2/, etc.)
Tracks download progress with parallel workers
Output:
data/D*/ - FITS files organized by spectral band
Download progress logging
Key features:
Parallel downloads (configurable workers, default: 4)
Skip existing files to enable resume
Retry logic with exponential backoff
Progress tracking for large datasets
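The skip-existing and retry behavior listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's download code: `download_with_retry`, its parameters, and the injected `fetch` callable are hypothetical names, and the delay schedule (base delay doubled on each attempt) is one common form of exponential backoff.

```python
import time
from pathlib import Path


def download_with_retry(fetch, url, dest, max_retries=3, base_delay=1.0,
                        sleep=time.sleep):
    """Fetch `url` into `dest`, skipping existing files and retrying on error.

    `fetch` is any callable that returns the file content as bytes and
    raises OSError on network failure (hypothetical interface).
    """
    dest = Path(dest)
    if dest.exists():
        # Skip existing files so an interrupted run can resume cheaply.
        return "skipped"

    for attempt in range(max_retries + 1):
        try:
            data = fetch(url)
            break
        except OSError:
            if attempt == max_retries:
                raise  # out of retries: let the caller decide what to do
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** attempt)

    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
    return "downloaded"
```

Passing `fetch` and `sleep` in as arguments keeps the sketch independent of any particular HTTP library and makes the retry schedule easy to test without waiting.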
3. Processing Stage
Extract aperture photometry from FITS files.
What it does:
Parses Multi-Extension FITS (MEF) structure
Extracts flux using circular aperture photometry (fixed or FWHM-based sizing)
Estimates background using annulus or window method
Subtracts zodiacal background from ZODI extension
Handles pixel flags for quality assessment
Repairs variance for flagged pixels with valid flux
Computes flux uncertainties from variance maps
Output:
results/photometry.json - Per-observation photometry results
Photometry metadata (aperture size, background estimation method)
Key features:
Adaptive apertures: FWHM-based sizing with PSF extraction (optional)
Dual background methods: Annulus (traditional) or window (crowded fields)
Variance repair: Automatic handling of NaN variance for flagged pixels
Zodiacal light subtraction (from ZODI extension)
Pixel flag tracking (FLAGS extension)
Spectral WCS handling for wavelength extraction
Parallel processing (configurable workers, default: 10)
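To make the core measurement concrete, here is a deliberately simplified sketch of circular aperture photometry with an annulus background estimate. It is an assumption-laden illustration rather than SPXQuery's implementation: the function name and arguments are hypothetical, pixels are counted whole by their centers, the background is a plain median of the annulus, and the uncertainty ignores the error on the background. Zodiacal subtraction, pixel flags, and variance repair are left out.

```python
import math
from statistics import median


def aperture_photometry(image, variance, x0, y0, r_ap, r_in, r_out):
    """Return (flux, flux_err) for a circular aperture centered on (x0, y0).

    `image` and `variance` are 2D lists (rows of pixel values) of equal shape.
    Radii are in pixels; the background annulus spans r_in..r_out.
    """
    ap_flux, ap_var, sky = [], [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            r = math.hypot(x - x0, y - y0)
            if r <= r_ap:
                ap_flux.append(value)          # pixel inside the aperture
                ap_var.append(variance[y][x])
            elif r_in <= r <= r_out:
                sky.append(value)              # pixel inside the annulus

    if not ap_flux or not sky:
        raise ValueError("aperture or annulus contains no pixels")

    # Per-pixel background from the annulus, scaled to the aperture area.
    background = median(sky)
    flux = sum(ap_flux) - background * len(ap_flux)

    # Uncertainty from the variance map: sum variances, then take the root.
    flux_err = math.sqrt(sum(ap_var))
    return flux, flux_err


# Synthetic example: flat background of 2.0 with a 10.0 source at the center.
image = [[2.0] * 11 for _ in range(11)]
image[5][5] += 10.0
variance = [[0.25] * 11 for _ in range(11)]
flux, flux_err = aperture_photometry(image, variance, 5, 5, 1.5, 3.0, 5.0)
print(flux, flux_err)
```

A production pipeline would normally work on array data and weight pixels by their fractional overlap with the aperture; the loop form here is only meant to show the arithmetic.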
4. Visualization Stage
Generate publication-quality plots with quality control.
What it does:
Creates combined spectral and temporal plots
Applies quality filtering (SNR threshold, bad pixel flags)
Marks rejected measurements with visual indicators
Generates light curve CSV file
Output:
results/combined_plot.png - Multi-panel visualization
results/lightcurve.csv - Time-series photometry data
Key features:
Quality control: good measurements (filled circles) vs. rejected (gray crosses)
Customizable colormaps, marker sizes, and figure parameters
Respects user’s matplotlibrc settings
Optional magnitude vs. flux plotting
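The quality filtering step can be pictured as a simple split of the measurements into good and rejected sets before plotting. The sketch below assumes each measurement is a dictionary with `flux`, `flux_err`, and `flags` keys, and uses a hypothetical SNR threshold and bad-pixel bitmask; none of these names or defaults are taken from SPXQuery itself.

```python
def split_by_quality(measurements, snr_min=3.0, bad_flag_mask=0b1):
    """Split measurements into (good, rejected) lists.

    A measurement is rejected if its signal-to-noise ratio is below
    `snr_min` or if any bit in `bad_flag_mask` is set in its flags.
    The field names, threshold, and mask are illustrative assumptions.
    """
    good, rejected = [], []
    for m in measurements:
        # Treat a non-positive uncertainty as unusable (SNR of zero).
        snr = m["flux"] / m["flux_err"] if m["flux_err"] > 0 else 0.0
        has_bad_flag = bool(m["flags"] & bad_flag_mask)
        if snr >= snr_min and not has_bad_flag:
            good.append(m)
        else:
            rejected.append(m)
    return good, rejected


measurements = [
    {"flux": 10.0, "flux_err": 1.0, "flags": 0},  # good
    {"flux": 1.0, "flux_err": 1.0, "flags": 0},   # low SNR
    {"flux": 10.0, "flux_err": 1.0, "flags": 1},  # flagged pixel
]
good, rejected = split_by_quality(measurements)
print(len(good), len(rejected))
```

The two lists can then be drawn with different markers, matching the filled-circle and gray-cross convention described above.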
Pipeline Execution Modes
SPXQuery supports three execution modes:
One-Click Execution
Run all stages automatically:
from spxquery.core.pipeline import run_pipeline

run_pipeline(
    ra=304.69,
    dec=42.44,
    output_dir="output",
    cutout_size="200px",
)
Step-by-Step Execution
Run individual stages with dependency checking:
from spxquery import SPXQueryPipeline, Source, QueryConfig
source = Source(ra=304.69, dec=42.44, name="my_source")
config = QueryConfig(source=source, output_dir="output")
pipeline = SPXQueryPipeline(config)
# Run stages individually
pipeline.run_query()
pipeline.run_download(skip_existing=True)
pipeline.run_processing()
pipeline.run_visualization()
The pipeline automatically checks dependencies before each stage. For example, you cannot run processing until both query and download have completed.
Resumable Execution
The pipeline saves state after each stage to {source_name}.yaml. To resume after an interruption, reload the saved state:
from spxquery import SPXQueryPipeline, QueryConfig

# Load configuration from saved state
config = QueryConfig.from_saved_state(
    source_name="my_source",
    output_dir="output",
)
pipeline = SPXQueryPipeline(config)
pipeline.resume()  # Automatically runs remaining stages
What gets saved:
Completed stages
Query results (observations, time span, data size)
Downloaded file paths
Photometry results
All configuration parameters
Stage Dependencies
The pipeline enforces these dependencies:
query: No dependencies (always runs first)
download: Requires query
processing: Requires query + download
visualization: Requires query + download + processing
If you try to run a stage without its dependencies, the pipeline will raise an error.
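The dependency rules above amount to a small lookup table plus a check before each stage. The following is a minimal sketch of that idea; the names `STAGE_DEPENDENCIES` and `check_dependencies` and the choice of `RuntimeError` are assumptions for illustration, and the actual exception type raised by SPXQuery may differ.

```python
# Stage -> stages that must already be complete (as listed in this section).
STAGE_DEPENDENCIES = {
    "query": [],
    "download": ["query"],
    "processing": ["query", "download"],
    "visualization": ["query", "download", "processing"],
}


def check_dependencies(stage, completed_stages):
    """Raise RuntimeError if `stage` is requested before its prerequisites."""
    if stage not in STAGE_DEPENDENCIES:
        raise ValueError(f"Unknown stage: {stage!r}")
    missing = [dep for dep in STAGE_DEPENDENCIES[stage]
               if dep not in completed_stages]
    if missing:
        raise RuntimeError(
            f"Cannot run '{stage}': missing completed stage(s): "
            + ", ".join(missing)
        )


check_dependencies("download", completed_stages=["query"])  # passes silently
try:
    check_dependencies("processing", completed_stages=["query"])
except RuntimeError as exc:
    print(exc)
```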
Customizing Pipeline Stages
You can customize which stages to run:
# Only query and download, skip processing
pipeline = SPXQueryPipeline(config, pipeline_stages=["query", "download"])
pipeline.run_full_pipeline()
This is useful for:
Downloading data for later analysis
Re-running specific stages with different parameters
Integrating SPXQuery into custom workflows
State Persistence
State files ({source_name}.yaml) contain:
# Pipeline state
stage: complete
completed_stages:
  - query
  - download
  - processing
  - visualization
pipeline_stages:
  - query
  - download
  - processing
  - visualization

# Query results
query_results:
  observations: [...]
  time_span_days: 33.1
  total_size_gb: 0.51

# Downloaded files and photometry
downloaded_files: [...]
photometry_results: [...]
This enables:
Resume after interruptions (network failures, crashes)
Audit trail of completed work
Configuration recovery (auto-load parameters from saved state)
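Given a state file with the `completed_stages` and `pipeline_stages` keys shown above, deciding what to run on resume reduces to a list difference. The helper below is a hypothetical sketch of that logic operating on an already-parsed state dictionary; it is not the package's own resume code.

```python
def remaining_stages(state):
    """Return configured stages that are not yet complete, in pipeline order.

    `state` is the parsed content of a state file (a plain dict). Missing
    keys are treated as empty lists so a fresh state yields nothing to skip.
    """
    done = set(state.get("completed_stages", []))
    return [stage for stage in state.get("pipeline_stages", [])
            if stage not in done]


# Example: a run interrupted after the download stage.
state = {
    "completed_stages": ["query", "download"],
    "pipeline_stages": ["query", "download", "processing", "visualization"],
}
print(remaining_stages(state))
```

Iterating over `pipeline_stages` rather than over the set difference preserves the execution order, which matters because later stages depend on earlier ones.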
Error Handling
The pipeline handles common errors gracefully:
Network failures: Retry logic with exponential backoff (configurable)
Missing files: Skip and continue processing remaining files
Invalid FITS: Log error and skip observation
Photometry failures: Mark as bad and continue
Interrupted execution: Resume from last completed stage
Errors are logged to help diagnose issues without stopping the entire pipeline.
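The "log, skip, and continue" behavior described here follows a familiar pattern, sketched below under stated assumptions: `process_all`, the `process_one` callable, and the logger name are hypothetical, and the set of exceptions treated as recoverable is an illustrative choice rather than the pipeline's actual policy.

```python
import logging

logger = logging.getLogger("pipeline_sketch")  # hypothetical logger name


def process_all(paths, process_one):
    """Apply `process_one` to each path, logging and skipping failures.

    Returns (results, failed) so the caller can report or retry the
    files that could not be processed.
    """
    results, failed = [], []
    for path in paths:
        try:
            results.append(process_one(path))
        except (OSError, ValueError) as exc:
            # One bad file (missing, unreadable, invalid) should not
            # stop the remaining files from being processed.
            logger.error("Skipping %s: %s", path, exc)
            failed.append(path)
    return results, failed
```

Returning the list of failed paths alongside the results keeps the failures visible instead of burying them in the log.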