geoparquet-io

Fast I/O and transformation tools for GeoParquet files using PyArrow and DuckDB.

Features

  • Fast: Built on PyArrow and DuckDB for high-performance operations
  • Comprehensive: Sort, partition, enhance, validate, and upload GeoParquet files
  • Cloud-Native: Upload to S3, GCS, and Azure with parallel transfers
  • Spatial Indexing: Add bbox, H3 hexagonal cells, KD-tree partitions, and admin divisions
  • Best Practices: Automatic optimization following the GeoParquet 1.1 specification
  • Flexible: CLI and Python API for any workflow
  • Tested: Extensive test suite across Python 3.9-3.13 and all platforms

Quick Example

# Install
pip install geoparquet-io

# Convert Shapefile/GeoJSON/GeoPackage/CSV to optimized GeoParquet
gpio convert input.shp output.parquet

# Inspect file structure and metadata
gpio inspect myfile.parquet

# Check file quality and best practices
gpio check all myfile.parquet

# Add bounding box column for faster queries
gpio add bbox input.parquet output.parquet

# Sort using Hilbert curve for spatial locality
gpio sort hilbert input.parquet output_sorted.parquet

# Partition into separate files by country
gpio partition admin buildings.parquet output_dir/

Why geoparquet-io?

GeoParquet is a cloud-native geospatial data format that stores vector geometries in Parquet's columnar layout, together with metadata describing them. This toolkit helps you:

  • Optimize file layout for cloud-native access patterns
  • Add spatial indices for faster queries and analysis
  • Validate compliance with GeoParquet best practices
  • Transform large datasets efficiently using columnar operations
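
To make the "spatial indices for faster queries" point concrete, here is a minimal sketch of why a bounding-box column helps. It models each Parquet row group by the extent of the features it holds, then keeps only the row groups that overlap a query window. The `BBox` type and both functions are hypothetical names for illustration; this is not geoparquet-io's API, and real readers get these extents from Parquet column statistics.

```python
from typing import List, NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box (hypothetical helper type for this sketch)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: BBox, b: BBox) -> bool:
    """Return True when the two boxes overlap or touch."""
    return (
        a.xmin <= b.xmax
        and a.xmax >= b.xmin
        and a.ymin <= b.ymax
        and a.ymax >= b.ymin
    )


def row_groups_to_read(group_extents: List[BBox], query: BBox) -> List[int]:
    """Indices of row groups whose extent overlaps the query window.

    Every other row group can be skipped without reading its geometries.
    """
    return [i for i, extent in enumerate(group_extents) if intersects(extent, query)]


# Three made-up row-group extents (lon/lat) and a small query window.
extents = [
    BBox(-10.0, 35.0, 0.0, 45.0),
    BBox(0.0, 45.0, 10.0, 55.0),
    BBox(100.0, -10.0, 110.0, 0.0),
]
window = BBox(2.0, 46.0, 3.0, 47.0)
selected = row_groups_to_read(extents, window)  # only the second group overlaps
```

The tighter each row group's extent, the more groups a query can skip, which is why spatial sorting and bbox columns work well together.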

Getting Started

New to geoparquet-io? Start with the Quick Example above, then browse the Command Reference below.

Command Reference

  • convert - Convert vector formats to optimized GeoParquet
  • inspect - Examine file metadata and preview data
  • check - Validate files and fix issues automatically
  • sort - Spatially sort using Hilbert curves
  • add - Enhance files with spatial indices
  • partition - Split files into optimized partitions
  • upload - Upload files to cloud storage (S3, GCS, Azure)
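
For readers curious about what sorting along a Hilbert curve means, the following is a rough pure-Python sketch of the idea, not the code `gpio sort hilbert` runs. It uses the classic iterative mapping from grid cell to curve distance; `hilbert_index` and `hilbert_sort` are hypothetical names, and sorting points rather than full geometries is a simplifying assumption.

```python
from typing import List, Tuple


def hilbert_index(x: int, y: int, order: int) -> int:
    """Distance of grid cell (x, y) along a Hilbert curve on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_sort(points: List[Tuple[float, float]], order: int = 16) -> List[Tuple[float, float]]:
    """Sort (x, y) points by the Hilbert index of the grid cell they fall in."""
    if not points:
        return []
    min_x = min(p[0] for p in points)
    min_y = min(p[1] for p in points)
    # Guard against a zero-width extent when all points share a coordinate.
    span_x = (max(p[0] for p in points) - min_x) or 1.0
    span_y = (max(p[1] for p in points) - min_y) or 1.0
    cells = (1 << order) - 1

    def key(p: Tuple[float, float]) -> int:
        gx = int((p[0] - min_x) / span_x * cells)
        gy = int((p[1] - min_y) / span_y * cells)
        return hilbert_index(gx, gy, order)

    return sorted(points, key=key)


# On a 2x2 grid the curve visits (0,0), (0,1), (1,1), (1,0) in that order.
corners = [(1.0, 0.0), (0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
ordered = hilbert_sort(corners, order=1)
```

Because neighbouring positions on the curve are neighbouring cells on the grid, features that end up next to each other in the file tend to be close together on the map, which is the spatial locality the sort command aims for.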

Support

License

Apache 2.0 - See LICENSE for details.