Uploading to Cloud Storage

The upload command copies GeoParquet files to cloud object storage (S3, GCS, Azure) with parallel transfers and progress tracking.

Basic Usage

# Single file to S3
gpio upload input.parquet s3://bucket/path/output.parquet --profile my-profile

# Directory to S3
gpio upload data/ s3://bucket/dataset/ --profile my-profile

Supported Destinations

The storage provider is selected by the destination URL scheme:

  • AWS S3 - s3://bucket/path/
  • Google Cloud Storage - gs://bucket/path/
  • Azure Blob Storage - az://account/container/path/
  • HTTP stores - https://...

Authentication

AWS S3

Use AWS profiles configured in ~/.aws/credentials:

gpio upload data.parquet s3://bucket/file.parquet --profile my-profile

Profile credentials are automatically loaded from AWS CLI configuration.
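
A typical ~/.aws/credentials entry looks like this (the profile name and key values below are AWS's documented placeholders, not real credentials):

[my-profile]
# placeholder values; never commit real keys
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY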

Google Cloud Storage

Uses application default credentials. Set up with:

gcloud auth application-default login
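
Once credentials are in place, a GCS upload should need no additional flags (the bucket name below is a placeholder):

gpio upload data.parquet gs://my-bucket/file.parquet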

Azure Blob Storage

Uses Azure CLI credentials. Set up with:

az login
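
With an active CLI session, an Azure upload follows the az://account/container/path scheme (the account and container names below are placeholders):

gpio upload data.parquet az://myaccount/mycontainer/file.parquet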

Options

Pattern Filtering

Upload only specific file types:

# Only JSON files
gpio upload data/ s3://bucket/dataset/ --pattern "*.json"

# Only Parquet files
gpio upload data/ s3://bucket/dataset/ --pattern "*.parquet"

Parallel Uploads

Control concurrency for directory uploads:

# Upload 8 files in parallel (default: 4)
gpio upload data/ s3://bucket/dataset/ --max-files 8

Trade-off: higher parallelism makes uploads faster but uses more bandwidth and memory.

Chunk Concurrency

Control concurrent chunks within each file:

# More concurrent chunks per file (default: 12)
gpio upload large.parquet s3://bucket/file.parquet --chunk-concurrency 20
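
The two settings compose: --max-files controls how many files transfer at once, while --chunk-concurrency controls parallelism within each file. On a fast connection you might raise both (the values here are illustrative, not tuned recommendations):

# 8 files at a time, 20 chunks per file
gpio upload data/ s3://bucket/dataset/ --max-files 8 --chunk-concurrency 20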

Custom Chunk Size

Override default multipart upload chunk size:

# 10MB chunks instead of the default 5MB
gpio upload data.parquet s3://bucket/file.parquet --chunk-size 10485760
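
If you'd rather not work out byte counts by hand, the shell can do the arithmetic inline (assuming a POSIX-compatible shell):

# $((10 * 1024 * 1024)) expands to 10485760
gpio upload data.parquet s3://bucket/file.parquet --chunk-size $((10 * 1024 * 1024))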

Error Handling

By default, the command keeps uploading the remaining files if one of them fails. To stop on the first error instead:

# Stop immediately on first error
gpio upload data/ s3://bucket/dataset/ --fail-fast

Dry Run

Preview what would be uploaded without actually uploading:

gpio upload data/ s3://bucket/dataset/ --dry-run

The dry run shows:

  • Files that would be uploaded
  • Total size
  • Destination paths
  • AWS profile (if specified)

Directory Structure

When uploading directories, the structure is preserved:

# Input structure:
data/
├── region1/
│   ├── file1.parquet
│   └── file2.parquet
└── region2/
    └── file3.parquet

# After upload to s3://bucket/dataset/:
s3://bucket/dataset/region1/file1.parquet
s3://bucket/dataset/region1/file2.parquet
s3://bucket/dataset/region2/file3.parquet

See Also