Converting to GeoParquet

The convert command transforms vector formats into optimized GeoParquet files with all best practices applied automatically.

Basic Usage

gpio convert input.shp output.parquet

Automatically applies:

  • ZSTD compression (level 15)
  • Row groups of 100,000 rows
  • Bbox column with proper metadata
  • Hilbert spatial ordering
  • GeoParquet 1.1.0 metadata

Supported Input Formats

Auto-detected by file extension:

  • Shapefile (.shp)
  • GeoJSON (.geojson, .json)
  • GeoPackage (.gpkg)
  • File Geodatabase (.gdb)
  • CSV/TSV (.csv, .tsv, .txt) - See CSV/TSV Support below

Any format supported by DuckDB's spatial extension can be read.
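Because the input format is detected from the file extension, a whole directory can be converted with a small loop. The following is a minimal sketch, not part of gpio: convert_dir is a hypothetical helper name, and it assumes each output should be written next to its input. File Geodatabases (.gdb) are left out because they are usually directories rather than single files.

```shell
# convert_dir is a hypothetical helper, not part of gpio.
# Usage: convert_dir DIRECTORY
convert_dir() (
  dir=$1
  for f in "$dir"/*.shp "$dir"/*.geojson "$dir"/*.json "$dir"/*.gpkg; do
    # Unmatched globs stay literal, so skip anything that is not a file.
    [ -f "$f" ] || continue
    # Strip the extension and write the output next to the input.
    gpio convert "$f" "${f%.*}.parquet" || exit 1
  done
)
```

Two inputs that share a stem (for example roads.shp and roads.geojson) would write to the same output path, so rename one of them first.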

Remote Files

Read from cloud storage or HTTPS:

# Convert remote file
gpio convert https://example.com/data.geojson local.parquet

# Convert from S3
gpio convert s3://bucket/input.parquet local-optimized.parquet

See Remote Files Guide for authentication setup.

Options

Skip Hilbert Ordering

For faster conversion when spatial ordering isn't critical:

gpio convert large.gpkg output.parquet --skip-hilbert

Trade-off: Faster conversion but less optimal for spatial queries.

Custom Compression

Control compression type and level:

# GZIP compression
gpio convert input.shp output.parquet --compression GZIP --compression-level 6

# Uncompressed (not recommended)
gpio convert input.geojson output.parquet --compression UNCOMPRESSED

Available compression types:

  • ZSTD (default, level 15) - Best compression + speed balance
  • GZIP (level 1-9) - Wide compatibility
  • BROTLI (level 1-11) - High compression
  • LZ4 - Fastest decompression
  • SNAPPY - Fast compression
  • UNCOMPRESSED - No compression
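To see how the codecs compare on your own data, one option is to convert the same input once per codec and print the resulting file sizes. This is a rough sketch under a few assumptions: compare_codecs is a hypothetical helper, not a gpio command, and it assumes --compression can be passed without --compression-level so that each codec falls back to its default level.

```shell
# compare_codecs is a hypothetical helper, not part of gpio.
# Usage: compare_codecs input.shp
compare_codecs() (
  input=$1
  base=${input%.*}
  for codec in ZSTD GZIP BROTLI LZ4 SNAPPY UNCOMPRESSED; do
    out="$base.$codec.parquet"
    # Assumption: omitting --compression-level uses the codec's default level.
    gpio convert "$input" "$out" --compression "$codec" >/dev/null || exit 1
    # wc -c reports the output size in bytes.
    printf '%s\t%s\n' "$codec" "$(wc -c < "$out" | tr -d ' ')"
  done
)
```

Each line of output is a codec name and a byte count separated by a tab, so the result can be piped into sort -k2 -n to rank the codecs by size.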

Verbose Output

Track progress and see detailed information:

gpio convert input.gpkg output.parquet --verbose

Shows:

  • Geometry column detection
  • Dataset bounds calculation
  • Bbox column creation
  • Hilbert ordering progress
  • File size and validation

Examples

Basic Shapefile Conversion

gpio convert buildings.shp buildings.parquet

Output:

Converting buildings.shp...
Done in 2.3s
Output: buildings.parquet (4.2 MB)
✓ Output passes GeoParquet validation

Large Dataset Without Hilbert

gpio convert large_dataset.gpkg output.parquet --skip-hilbert

Skips Hilbert ordering for faster processing on large files.

Custom Compression Settings

gpio convert roads.geojson roads.parquet \
  --compression ZSTD \
  --compression-level 22 \
  --verbose

Maximum ZSTD compression with progress tracking.

Convert and Inspect

# Convert
gpio convert input.shp output.parquet

# Verify
gpio inspect output.parquet

# Validate
gpio check all output.parquet

CSV/TSV Support

Geometry columns are auto-detected. WKT columns (wkt, geometry, geom) are checked first, then lat/lon pairs (lat/lon, latitude/longitude).

# Auto-detect WKT or lat/lon
gpio convert points.csv points.parquet

# Explicit columns
gpio convert data.csv out.parquet --wkt-column geom_wkt
gpio convert data.csv out.parquet --lat-column lat --lon-column lng

# Custom delimiter
gpio convert data.txt out.parquet --delimiter "|"

CRS and Validation

Default: WGS84 (EPSG:4326). Override with --crs for WKT data:

gpio convert projected.csv out.parquet --crs EPSG:3857

Validates lat/lon ranges (latitude -90 to 90, longitude -180 to 180) and warns when large coordinate values suggest a projected CRS.
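The same range rule can be applied as a pre-flight check before running a long conversion. The sketch below only illustrates the rule and is not how gpio implements its validation. check_latlon is a hypothetical helper, and it assumes a comma-delimited file with a header row, numeric coordinate values, and no quoted commas.

```shell
# check_latlon is a hypothetical pre-flight check, not part of gpio.
# Usage: check_latlon FILE LAT_FIELD_NUMBER LON_FIELD_NUMBER
check_latlon() {
  awk -F, -v lat="$2" -v lon="$3" '
    NR == 1 { next }   # skip the header row
    $lat < -90  || $lat > 90  ||
    $lon < -180 || $lon > 180 {
      printf "line %d: lat=%s lon=%s out of range\n", NR, $lat, $lon
      bad = 1
    }
    END { exit bad }   # non-zero exit status if any row failed
  ' "$1"
}
```

For a file whose second and third fields hold the coordinates, a possible use is: check_latlon points.csv 2 3 && gpio convert points.csv points.parquet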

Invalid Geometries

Fails on invalid WKT by default. Skip with --skip-invalid:

gpio convert messy.csv out.parquet --skip-invalid

This skips invalid rows and disables Hilbert ordering. Mixed geometry types are supported.
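Since --skip-invalid disables Hilbert ordering, it can be worth trying a strict conversion first and falling back only when that fails. A minimal sketch of that pattern follows. convert_lenient is a hypothetical wrapper, and it assumes gpio convert returns a non-zero exit status on failure, which this page does not state explicitly.

```shell
# convert_lenient is a hypothetical wrapper, not part of gpio.
# Usage: convert_lenient input.csv output.parquet
convert_lenient() {
  # First attempt: strict mode, which keeps Hilbert ordering.
  if gpio convert "$1" "$2"; then
    return 0
  fi
  # Assumption: a non-zero exit status means the conversion failed.
  echo "strict conversion failed; retrying with --skip-invalid" >&2
  gpio convert "$1" "$2" --skip-invalid
}
```

Note that the retry runs after any failure, not only after invalid WKT, so check the first error message before trusting the fallback output.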

Delimiters

Auto-detects comma and tab. Override with --delimiter for semicolon, pipe, or any single character.

gpio convert data.csv out.parquet --delimiter ";"
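When the delimiter of a file is unknown, a quick look at the header line is often enough to pick a value for --delimiter. The sketch below counts candidate separators in the first line and prints the most frequent one. guess_delimiter is a hypothetical helper, not part of gpio, and it can be misled by headers that contain quoted separators.

```shell
# guess_delimiter is a hypothetical helper, not part of gpio.
# Usage: guess_delimiter FILE
guess_delimiter() {
  head -n 1 "$1" | awk '{
    # gsub returns the number of matches; replacing each character
    # with itself leaves the line unchanged.
    comma = gsub(/,/, ",")
    tab   = gsub(/\t/, "\t")
    semi  = gsub(/;/, ";")
    pipe  = gsub(/\|/, "|")
    # Comma is the fallback; a candidate wins only with a strictly higher count.
    best = ","; max = comma
    if (tab  > max) { best = "\t"; max = tab }
    if (semi > max) { best = ";";  max = semi }
    if (pipe > max) { best = "|";  max = pipe }
    printf "%s", best
  }'
}
```

A possible use: gpio convert data.txt out.parquet --delimiter "$(guess_delimiter data.txt)"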

Performance

The convert command uses DuckDB's spatial extension, which in the benchmarks below is among the fastest options for GeoParquet conversion, especially for large files.

Benchmarks on representative datasets:

Dataset              Size     Features   DuckDB   PyOGRIO   ogr2ogr   Fiona
GAUL L2 Shapefile    739 MB   45k        4.6s     5.9s      4.1s      187s
Argentina Roads      1.1 GB   3.5M       30s      66s       117s      349s

DuckDB also uses significantly less memory than alternatives (near-zero vs 600MB-2GB for GeoPandas).

To run your own benchmarks:

gpio benchmark input.geojson --iterations 3

See gpio benchmark for details.

See Also