STAC Generation¶
Generate STAC (SpatioTemporal Asset Catalog) metadata for GeoParquet datasets.
What is STAC?¶
STAC is a specification for describing geospatial data with standardized metadata. It enables dataset discovery and cataloging on platforms and catalogs.
Single File → STAC Item¶
Generate a STAC Item JSON for a single GeoParquet file:
gpio stac roads.parquet roads.json \
--bucket s3://source.coop/my-org/roads/
Creates roads.json with:
- Bounding box from data
- GeoParquet asset link
- PMTiles overview (if
overview.pmtilesexists) - Projection information (CRS, geometry types)
Partitioned Dataset → STAC Collection¶
Generate Collection + Items for partitioned datasets:
gpio stac partitioned/ . \
--bucket s3://source.coop/my-org/roads/
Creates:
collection.json- Overall dataset metadata in output directorypartitioned/usa.json,can.json, etc. - Per-partition Items co-located with data
STAC Best Practice: Items are written alongside their parquet files, not in a separate directory. This follows the STAC principle of co-locating metadata with data for better organization and discoverability.
Public URL Mapping¶
Convert S3 URIs to public HTTPS URLs:
gpio stac data.parquet output.json \
--bucket s3://my-bucket/roads/ \
--public-url https://data.example.com/roads/
Use --public-url to map S3 bucket prefixes to public HTTPS URLs for your assets.
PMTiles Overviews¶
STAC automatically detects PMTiles overview files for map visualization.
Detection rules:
- Exactly 1
.pmtilesfile in directory → included as asset - 0 files → warning, continue without overview
-
1 files → error, clean up duplicates
Create PMTiles overview:
Use tippecanoe to create PMTiles from your vector data.
Standard naming: Use overview.pmtiles for consistency.
Overwriting Existing STAC Files¶
If the output location already contains a valid STAC Collection or Item, the command will error to prevent accidental overwrites:
# Error if output already exists
gpio stac data.parquet output.json --bucket s3://...
# Use --overwrite to allow overwriting
gpio stac data.parquet output.json --bucket s3://... --overwrite
Note: The command will error if the input is a pure STAC file (no parquet files). If the input directory contains both STAC files and parquet files, it will generate from the parquet files.
Validation¶
Check STAC compliance:
gpio check stac output.json
Validates:
- STAC spec compliance
- Required fields
- Asset href resolution (local files)
- Best practices
End-to-End Workflow¶
# 1. Convert to optimized GeoParquet
gpio convert roads.geojson roads.parquet
# 2. Partition by country
gpio partition admin roads.parquet partitioned/ \
--dataset gaul --levels country
# 3. Create PMTiles overview (optional, see https://github.com/felt/tippecanoe)
# 4. Generate STAC collection
# Items written next to parquet files, collection.json in partitioned/
gpio stac partitioned/ partitioned/ \
--bucket s3://my-bucket/roads/ \
--public-url https://data.example.com/roads/
# 5. Validate
gpio check stac partitioned/collection.json
# 6. Upload to S3 (external)
# Single sync uploads both data and metadata together
aws s3 sync partitioned/ s3://my-bucket/roads/
Directory structure after step 4:
partitioned/
├── collection.json # Collection metadata
├── overview.pmtiles # Optional overview
├── usa.parquet
├── usa.json # STAC Item for USA
├── can.parquet
├── can.json # STAC Item for Canada
└── ...
Options¶
Custom IDs¶
# Custom Item ID
gpio stac data.parquet output.json \
--item-id my-roads \
--bucket s3://...
# Custom Collection ID
gpio stac partitions/ output/ \
--collection-id global-roads \
--bucket s3://...
Verbose Output¶
gpio stac data.parquet output.json \
--bucket s3://... \
--verbose
Metadata Extracted¶
STAC Items automatically include:
- Bounding box - Calculated from geometry data
- Geometry - GeoJSON Polygon from dataset extent
- CRS - From GeoParquet metadata (EPSG code, PROJJSON, or WKT)
- Geometry types - From GeoParquet metadata
- Datetime - From file modification time
- Assets - GeoParquet file and PMTiles overview (if present)
- Links - Self link, and collection link (for items in collections)
Best Practices¶
- Co-locate metadata with data - Items are automatically written alongside parquet files
- Use consistent naming -
overview.pmtilesfor PMTiles files - Validate before publishing - Run
gpio check stacbefore upload - Include PMTiles - Enables interactive map visualization
- Use public URLs - Map S3 URIs to HTTPS with
--public-urlfor web access - Custom IDs - Use meaningful IDs for better discoverability
- Single directory uploads - With co-located metadata, upload data and STAC files together