STAC Generation¶

Generate STAC (SpatioTemporal Asset Catalog) metadata for GeoParquet datasets.

What is STAC?¶

STAC is a specification for describing geospatial data with standardized metadata, so that datasets can be indexed, searched, and discovered by STAC-aware catalogs and tools.

Single File → STAC Item¶

Generate a STAC Item JSON for a single GeoParquet file:

gpio stac roads.parquet roads.json \
  --bucket s3://source.coop/my-org/roads/

Creates roads.json with:

Bounding box from data
GeoParquet asset link
PMTiles overview (if overview.pmtiles exists)
Projection information (CRS, geometry types)

Partitioned Dataset → STAC Collection¶

Generate Collection + Items for partitioned datasets:

gpio stac partitioned/ . \
  --bucket s3://source.coop/my-org/roads/

Creates:

collection.json - Overall dataset metadata in output directory
partitioned/usa.json, can.json, etc. - Per-partition Items co-located with data

STAC Best Practice: Items are written alongside their parquet files, not in a separate directory. This follows the STAC principle of co-locating metadata with data for better organization and discoverability.

Public URL Mapping¶

Convert S3 URIs to public HTTPS URLs:

gpio stac data.parquet output.json \
  --bucket s3://my-bucket/roads/ \
  --public-url https://data.example.com/roads/

Use --public-url to map S3 bucket prefixes to public HTTPS URLs for your assets.
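The mapping can be pictured as a prefix swap. The following is a minimal shell sketch, assuming the --bucket prefix is replaced by the --public-url prefix and the rest of the object key is left unchanged. to_public_url is a hypothetical helper for illustration, not part of gpio:

```shell
# Hypothetical helper: swap an S3 prefix for a public HTTPS prefix.
# Assumes both prefixes end with "/" as in the example above.
to_public_url() {
  uri=$1; bucket=$2; public=$3
  case "$uri" in
    "$bucket"*) printf '%s%s\n' "$public" "${uri#"$bucket"}" ;;
    *)          printf '%s\n' "$uri" ;;   # outside the prefix: leave unchanged
  esac
}

to_public_url "s3://my-bucket/roads/data.parquet" \
  "s3://my-bucket/roads/" "https://data.example.com/roads/"
```

With the prefixes from the example above, the call prints https://data.example.com/roads/data.parquet.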

PMTiles Overviews¶

gpio stac automatically detects PMTiles overview files and adds them as assets for map visualization.

Detection rules:

Exactly 1 .pmtiles file in directory → included as asset
0 files → warning, continue without overview
More than 1 file → error, clean up duplicates
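To check a directory before running the command, the three cases (zero, exactly one, or more than one .pmtiles file) can be sketched in shell. find_overview is a hypothetical helper that mirrors the rules listed here. It is not gpio's actual implementation, and the message wording is made up:

```shell
# Hypothetical helper mirroring the detection rules; not gpio's actual code.
find_overview() {
  dir=$1
  count=0
  found=""
  for f in "$dir"/*.pmtiles; do
    [ -e "$f" ] || continue        # unmatched glob stays literal; skip it
    count=$((count + 1))
    found=$f
  done
  if [ "$count" -eq 1 ]; then
    printf '%s\n' "$found"         # exactly one: usable as the overview asset
  elif [ "$count" -eq 0 ]; then
    echo "warning: no .pmtiles file in $dir, continuing without overview" >&2
  else
    echo "error: $count .pmtiles files in $dir, remove duplicates" >&2
    return 1
  fi
}
```

The function prints the overview path on success and returns a non-zero status only in the duplicate case, so it can gate a publishing script.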

Create PMTiles overview:

Use tippecanoe to create PMTiles from your vector data.

Standard naming: Use overview.pmtiles for consistency.
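As a starting point, a tippecanoe invocation might look like the sketch below. It assumes a recent felt/tippecanoe release (which writes PMTiles when the output name ends in .pmtiles) and a GeoJSON source named roads.geojson. Both file names are placeholders, and the zoom and thinning flags are one reasonable choice rather than a recommendation from gpio:

```shell
# Assumed input and output names; adjust to your data.
input="roads.geojson"
output="overview.pmtiles"

if command -v tippecanoe >/dev/null 2>&1 && [ -f "$input" ]; then
  # -zg guesses a max zoom from the data; --drop-densest-as-needed thins
  # features so that tiles stay under tippecanoe's size limit.
  tippecanoe -o "$output" -zg --drop-densest-as-needed "$input"
else
  echo "tippecanoe or $input not found; skipping overview" >&2
fi
```

Place the resulting file in the same directory as the parquet files so the detection rules above can find it.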

Overwriting Existing STAC Files¶

If the output location already contains a valid STAC Collection or Item, the command will error to prevent accidental overwrites:

# Error if output already exists
gpio stac data.parquet output.json --bucket s3://...

# Use --overwrite to allow overwriting
gpio stac data.parquet output.json --bucket s3://... --overwrite

Note: The command also exits with an error if the input contains only STAC files and no parquet files. If the input directory contains both STAC files and parquet files, STAC metadata is generated from the parquet files.

Validation¶

Check STAC compliance:

gpio check stac output.json

Validates:

STAC spec compliance
Required fields
Asset href resolution (local files)
Best practices

End-to-End Workflow¶

# 1. Convert to optimized GeoParquet
gpio convert roads.geojson roads.parquet

# 2. Partition by country
gpio partition admin roads.parquet partitioned/ \
  --dataset gaul --levels country

# 3. Create PMTiles overview (optional, see https://github.com/felt/tippecanoe)

# 4. Generate STAC collection
# Items written next to parquet files, collection.json in partitioned/
gpio stac partitioned/ partitioned/ \
  --bucket s3://my-bucket/roads/ \
  --public-url https://data.example.com/roads/

# 5. Validate
gpio check stac partitioned/collection.json

# 6. Upload to S3 (external)
# Single sync uploads both data and metadata together
aws s3 sync partitioned/ s3://my-bucket/roads/

Directory structure after step 4:

partitioned/
├── collection.json          # Collection metadata
├── overview.pmtiles         # Optional overview
├── usa.parquet
├── usa.json                 # STAC Item for USA
├── can.parquet
├── can.json                 # STAC Item for Canada
└── ...
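Before uploading, a quick way to confirm this layout is to check that every parquet file has a sibling Item. The sketch below assumes Items share the parquet file's base name with a .json extension, as in the tree above. missing_items is a hypothetical helper, not a gpio command:

```shell
# Hypothetical helper: list parquet files that lack a sibling STAC Item.
missing_items() {
  dir=$1
  missing=0
  for p in "$dir"/*.parquet; do
    [ -e "$p" ] || continue            # no parquet files matched
    item="${p%.parquet}.json"          # usa.parquet -> usa.json
    if [ ! -f "$item" ]; then
      echo "missing item: $item"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]                 # status 0 only if nothing is missing
}

missing_items partitioned || echo "some partitions have no STAC Item" >&2
```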

Options¶

Custom IDs¶

# Custom Item ID
gpio stac data.parquet output.json \
  --item-id my-roads \
  --bucket s3://...

# Custom Collection ID
gpio stac partitions/ output/ \
  --collection-id global-roads \
  --bucket s3://...

Verbose Output¶

gpio stac data.parquet output.json \
  --bucket s3://... \
  --verbose

Metadata Extracted¶

STAC Items automatically include:

Bounding box - Calculated from geometry data
Geometry - GeoJSON Polygon from dataset extent
CRS - From GeoParquet metadata (EPSG code, PROJJSON, or WKT)
Geometry types - From GeoParquet metadata
Datetime - From file modification time
Assets - GeoParquet file and PMTiles overview (if present)
Links - Self link, and collection link (for items in collections)
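The relationship between the bounding box and the Item geometry can be illustrated with a small sketch. It assumes the geometry is simply the rectangle spanned by the bounding box, which may not match gpio's output exactly. bbox_to_polygon is a hypothetical helper:

```shell
# Hypothetical helper: turn a bbox into a GeoJSON Polygon (closed,
# counter-clockwise ring). Arguments: minx miny maxx maxy, the same
# order used by a STAC bbox.
bbox_to_polygon() {
  minx=$1; miny=$2; maxx=$3; maxy=$4
  printf '{"type":"Polygon","coordinates":[[[%s,%s],[%s,%s],[%s,%s],[%s,%s],[%s,%s]]]}\n' \
    "$minx" "$miny" "$maxx" "$miny" "$maxx" "$maxy" "$minx" "$maxy" "$minx" "$miny"
}

bbox_to_polygon -10 40 5 50
```

The ring starts and ends at the same corner, as GeoJSON requires for polygon rings.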

Best Practices¶

Co-locate metadata with data - Items are automatically written alongside parquet files
Use consistent naming - overview.pmtiles for PMTiles files
Validate before publishing - Run gpio check stac before upload
Include PMTiles - Enables interactive map visualization
Use public URLs - Map S3 URIs to HTTPS with --public-url for web access
Custom IDs - Use meaningful IDs for better discoverability
Single directory uploads - With co-located metadata, upload data and STAC files together