Configuration Guide

This guide explains how to configure storage backends in genro-storage.

Overview

Storage backends are configured through the StorageManager.configure() method. You can configure storage from:

  1. Python dictionaries - for programmatic configuration

  2. YAML files - for declarative configuration

  3. JSON files - for declarative configuration

All three methods accept the same list-of-mounts format and support the same backends.

Configuration Methods

From Python Dictionaries

The most flexible method for programmatic configuration:

from genro_storage import StorageManager

storage = StorageManager()
storage.configure([
    {
        'name': 'home',
        'type': 'local',
        'path': '/home/user'
    },
    {
        'name': 'uploads',
        'type': 's3',
        'bucket': 'my-app-uploads',
        'region': 'eu-west-1'
    }
])

From YAML Files

Best for environment-specific configuration files:

storage.yaml:

# Local development storage
- name: home
  type: local
  path: /home/user

- name: temp
  type: local
  path: /tmp/app

# Production S3 storage
- name: uploads
  type: s3
  bucket: prod-app-uploads
  region: eu-west-1
  prefix: uploads/

- name: backups
  type: s3
  bucket: prod-app-backups
  region: eu-west-1

Python code:

storage = StorageManager()
storage.configure('/etc/app/storage.yaml')

From JSON Files

JSON is an alternative to YAML and uses the same structure:

storage.json:

[
    {
        "name": "home",
        "type": "local",
        "path": "/home/user"
    },
    {
        "name": "uploads",
        "type": "s3",
        "bucket": "prod-app-uploads",
        "region": "eu-west-1"
    }
]

Python code:

storage = StorageManager()
storage.configure('./config/storage.json')
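
Because the file formats mirror the dictionary form, a JSON config file can be generated from the same list of mounts. The sketch below uses only the standard library; the helper names write_json_config and read_json_config are hypothetical and are not part of genro-storage.

```python
import json
import tempfile
from pathlib import Path

# The same kind of mount list used in the dictionary example.
mounts = [
    {"name": "home", "type": "local", "path": "/home/user"},
    {"name": "uploads", "type": "s3", "bucket": "prod-app-uploads", "region": "eu-west-1"},
]


def write_json_config(mount_list, path):
    """Serialize a list of mount dictionaries to a JSON config file (hypothetical helper)."""
    Path(path).write_text(json.dumps(mount_list, indent=4), encoding="utf-8")


def read_json_config(path):
    """Load the mount list back as a plain list of dictionaries (hypothetical helper)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "storage.json"
    write_json_config(mounts, config_path)
    loaded = read_json_config(config_path)
```

The round trip returns the original list unchanged, which is why a file path and a Python list can be used interchangeably with configure().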

Storage Backend Types

Local Filesystem

Access files on the local filesystem.

Required fields:

  • name: Mount point name

  • type: Must be "local"

  • path: Absolute path to base directory

Example:

- name: home
  type: local
  path: /home/user

- name: temp
  type: local
  path: /tmp/app

Python code:

storage.configure([
    {'name': 'home', 'type': 'local', 'path': '/home/user'},
    {'name': 'temp', 'type': 'local', 'path': '/tmp/app'}
])

Usage:

node = storage.node('home:documents/report.pdf')
content = node.read()
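
Node paths take the form mount:relative/path, where the part before the colon is the mount name from the configuration. As a rough illustration of that convention (not the library's own parser), a hypothetical split_mount_path helper could look like this:

```python
def split_mount_path(full_path):
    """Split 'mount:relative/path' into (mount, relative_path).

    Hypothetical helper for illustration; genro-storage does its own parsing.
    """
    mount, sep, relative = full_path.partition(":")
    if not sep or not mount:
        raise ValueError(f"Expected 'mount:path', got {full_path!r}")
    return mount, relative


mount, relative = split_mount_path("home:documents/report.pdf")
```

With the configuration above, the mount name home maps to /home/user, so the node refers to documents/report.pdf under that base directory.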

Amazon S3

Access files in Amazon S3 buckets.

Required fields:

  • name: Mount point name

  • type: Must be "s3"

  • bucket: S3 bucket name

Optional fields:

  • prefix: Path prefix within bucket (default: "")

  • region: AWS region (default: from AWS config)

  • anon: Anonymous access (default: False)

  • key: AWS access key (default: from AWS config)

  • secret: AWS secret key (default: from AWS config)

  • endpoint_url: Custom S3 endpoint for S3-compatible services

Example:

# Standard S3
- name: uploads
  type: s3
  bucket: my-app-uploads
  region: eu-west-1
  prefix: uploads/

# With credentials
- name: backups
  type: s3
  bucket: my-app-backups
  region: us-east-1
  key: AKIAIOSFODNN7EXAMPLE
  secret: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

# Anonymous public bucket
- name: public-data
  type: s3
  bucket: public-datasets
  anon: true

Usage:

# Upload file
node = storage.node('uploads:2024/report.pdf')
node.write(pdf_data, mode='wb')

# List directory
folder = storage.node('uploads:2024')
for file in folder.children():
    print(f"{file.basename}: {file.size} bytes")

Google Cloud Storage

Access files in Google Cloud Storage buckets.

Required fields:

  • name: Mount point name

  • type: Must be "gcs"

  • bucket: GCS bucket name

Optional fields:

  • prefix: Path prefix within bucket (default: "")

  • token: Path to service account JSON key file

  • project: GCP project ID

Example:

- name: backups
  type: gcs
  bucket: my-app-backups
  project: my-gcp-project
  token: /etc/secrets/gcp-service-account.json

Python code:

storage.configure([{
    'name': 'backups',
    'type': 'gcs',
    'bucket': 'my-app-backups',
    'token': '/etc/secrets/gcp-service-account.json'
}])

Azure Blob Storage

Access files in Azure Blob Storage containers.

Required fields:

  • name: Mount point name

  • type: Must be "azure"

  • container: Container name

  • account_name: Storage account name

Optional fields:

  • account_key: Storage account key

  • sas_token: Shared access signature token

  • connection_string: Full connection string

Example:

- name: archive
  type: azure
  container: archives
  account_name: mystorageaccount
  account_key: xxxxxxxxxxxxxxxxxxxxx

HTTP Storage (Read-Only)

Access files via HTTP/HTTPS (read-only).

Required fields:

  • name: Mount point name

  • type: Must be "http"

  • base_url: Base URL for HTTP requests

Example:

- name: cdn
  type: http
  base_url: https://cdn.example.com

Python code:

storage.configure([{
    'name': 'cdn',
    'type': 'http',
    'base_url': 'https://cdn.example.com'
}])

# Read-only access
node = storage.node('cdn:assets/logo.png')
image_data = node.read(mode='rb')

Memory Storage (Testing)

In-memory storage for testing and development.

Required fields:

  • name: Mount point name

  • type: Must be "memory"

Example:

# Perfect for unit tests
storage.configure([{'name': 'test', 'type': 'memory'}])

node = storage.node('test:temp.txt')
node.write("test data")
assert node.read() == "test data"

Permission Control

Configure access permissions for any storage backend to restrict operations.

Permission Levels

Three permission levels are available:

  • readonly: Only read operations (read, exists, list)

  • readwrite: Read and write operations, but no delete

  • delete: Full access including delete (default)

Example:

storage.configure([
    # Read-only access to public data
    {'name': 'public', 'type': 'http', 'base_url': 'https://cdn.example.com',
     'permissions': 'readonly'},

    # Read-write but no delete for backups
    {'name': 'backups', 'type': 'gcs', 'bucket': 'my-backups',
     'permissions': 'readwrite'},

    # Full access (default)
    {'name': 'uploads', 'type': 's3', 'bucket': 'my-uploads'}
])
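
The three levels form an ordering: each level allows everything the previous one does. The sketch below models that ordering in plain Python. The mapping of operations to levels is partly an assumption: read, exists, list and delete follow the list above, while placing mkdir under readwrite is a guess.

```python
# Levels in increasing order of privilege, as listed above.
LEVELS = ("readonly", "readwrite", "delete")

# Minimum level assumed for each operation (hypothetical mapping;
# 'mkdir' under 'readwrite' is an assumption, not documented behavior).
REQUIRED_LEVEL = {
    "read": "readonly",
    "exists": "readonly",
    "list": "readonly",
    "write": "readwrite",
    "mkdir": "readwrite",
    "delete": "delete",
}


def is_allowed(level, operation):
    """Return True if a mount configured at `level` may perform `operation`."""
    return LEVELS.index(level) >= LEVELS.index(REQUIRED_LEVEL[operation])
```

For example, is_allowed('readwrite', 'write') is true while is_allowed('readwrite', 'delete') is false, matching the "read and write, but no delete" description.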

Using Permissions

Permissions are validated at configuration time and enforced at runtime:

# Read-only mount
storage.configure([
    {'name': 'public', 'type': 's3', 'bucket': 'public-data',
     'permissions': 'readonly'}
])

node = storage.node('public:file.txt')

# Allowed operations
content = node.read()
exists = node.exists
files = node.children()

# Each of the following raises StoragePermissionError
node.write("data")
node.delete()
node.mkdir()

YAML Configuration

# Read-only CDN
- name: cdn
  type: http
  base_url: https://cdn.example.com
  permissions: readonly

# Read-write backups (no delete)
- name: backups
  type: gcs
  bucket: my-backups
  permissions: readwrite

# Full access uploads
- name: uploads
  type: s3
  bucket: my-uploads
  permissions: delete

Permission Validation

Permissions are validated against backend capabilities:

# Invalid: HTTP is read-only, cannot request readwrite
storage.configure([
    {'name': 'cdn', 'type': 'http', 'base_url': 'https://cdn.example.com',
     'permissions': 'readwrite'}  # Error!
])
# Raises: StorageConfigError: Backend is read-only
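
One way to picture this check is as a ceiling per backend type: a requested level is rejected when it exceeds what the backend can do. The sketch below is an illustration only. It raises ValueError rather than the library's StorageConfigError, it assumes http is the only read-only backend, and it assumes a mount without an explicit permissions field falls back to the backend's ceiling.

```python
LEVELS = ("readonly", "readwrite", "delete")

# Highest level each backend type can honour. Only 'http' is documented
# as read-only; treating every other type as fully writable is an assumption.
MAX_LEVEL = {"http": "readonly"}


def effective_permissions(mount):
    """Return the level to enforce, or raise ValueError if the request is too high."""
    ceiling = MAX_LEVEL.get(mount["type"], "delete")
    requested = mount.get("permissions")
    if requested is None:
        # Assumption: with no explicit level, use the most the backend allows.
        return ceiling
    if requested not in LEVELS:
        raise ValueError(f"Unknown permission level: {requested!r}")
    if LEVELS.index(requested) > LEVELS.index(ceiling):
        raise ValueError(f"Backend {mount['type']!r} supports at most {ceiling!r}")
    return requested
```

Running a check like this before calling configure() surfaces the mismatch in a test or a CI step instead of at application startup.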

Best Practices

  1. Principle of least privilege - Use readonly when possible

  2. Protect backups - Use readwrite to prevent accidental deletion

  3. Validate at startup - Permissions are checked at configuration time

  4. Use for security - Restrict access to sensitive storage locations

Advanced Configuration

Multiple Configurations

You can call configure() multiple times. Mounts with the same name are replaced:

# Initial setup
storage.configure([
    {'name': 'home', 'type': 'local', 'path': '/home/user'}
])

# Add more mounts later
storage.configure([
    {'name': 'uploads', 'type': 's3', 'bucket': 'my-bucket'}
])

# Replace existing mount
storage.configure([
    {'name': 'home', 'type': 'local', 'path': '/mnt/newlocation'}
])
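
The replace-by-name behavior described here is the same as updating a dictionary keyed by mount name. The sketch below models it with plain data structures; apply_config is a hypothetical stand-in and says nothing about how StorageManager stores mounts internally.

```python
def apply_config(mounts, new_entries):
    """Merge new entries into `mounts`, replacing any entry with the same name."""
    merged = dict(mounts)
    for entry in new_entries:
        merged[entry["name"]] = entry
    return merged


mounts = {}
# Initial setup
mounts = apply_config(mounts, [{"name": "home", "type": "local", "path": "/home/user"}])
# Add another mount later
mounts = apply_config(mounts, [{"name": "uploads", "type": "s3", "bucket": "my-bucket"}])
# Replace the existing 'home' mount
mounts = apply_config(mounts, [{"name": "home", "type": "local", "path": "/mnt/newlocation"}])
```

After the three calls there are still only two mounts, and home points at the new location.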

Environment-Specific Configuration

Use different configuration files per environment:

import os

storage = StorageManager()

# Load environment-specific config
env = os.getenv('APP_ENV', 'development')
config_file = f'/etc/app/storage-{env}.yaml'
storage.configure(config_file)

storage-development.yaml:

- name: uploads
  type: local
  path: /tmp/dev-uploads

storage-production.yaml:

- name: uploads
  type: s3
  bucket: prod-uploads
  region: eu-west-1
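
When the file name is derived from APP_ENV, a typo in the variable only shows up once the file fails to load. A small hypothetical helper, config_path_for, can check for the file first and report which environment was requested. It uses only the standard library and does not touch genro-storage.

```python
import os
from pathlib import Path


def config_path_for(config_dir, environ=os.environ, default_env="development"):
    """Return the storage config path for the current APP_ENV, or raise if absent."""
    env = environ.get("APP_ENV", default_env)
    path = Path(config_dir) / f"storage-{env}.yaml"
    if not path.is_file():
        raise FileNotFoundError(f"No storage config for environment {env!r}: {path}")
    return path
```

The returned path can then be converted with str() and passed to storage.configure() as in the example above.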

Configuration from Environment Variables

Build configuration dynamically from environment:

import os

storage = StorageManager()
storage.configure([
    {
        'name': 'uploads',
        'type': 's3',
        'bucket': os.getenv('S3_BUCKET'),
        'region': os.getenv('AWS_REGION', 'eu-west-1'),
        'key': os.getenv('AWS_ACCESS_KEY_ID'),
        'secret': os.getenv('AWS_SECRET_ACCESS_KEY')
    }
])
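
A variant of this pattern fails early when the required bucket is not set and leaves out credentials that are not present, so that the documented default (AWS config) applies. The function name s3_mount_from_env is hypothetical; the environment variable names are the ones used above.

```python
import os


def s3_mount_from_env(environ=os.environ):
    """Build the 'uploads' S3 mount dictionary from environment variables."""
    bucket = environ.get("S3_BUCKET")
    if not bucket:
        # 'bucket' is a required field, so stop here with a clear message.
        raise RuntimeError("S3_BUCKET must be set to configure the 'uploads' mount")

    mount = {
        "name": "uploads",
        "type": "s3",
        "bucket": bucket,
        "region": environ.get("AWS_REGION", "eu-west-1"),
    }
    # Only pass explicit credentials when both are present; otherwise omit
    # the fields so the backend falls back to the AWS configuration.
    key = environ.get("AWS_ACCESS_KEY_ID")
    secret = environ.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        mount["key"] = key
        mount["secret"] = secret
    return mount
```

The result is a single mount dictionary, so it would be passed as storage.configure([s3_mount_from_env()]).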

Checking Configured Mounts

# List all configured mounts
print(storage.get_mount_names())
# ['home', 'uploads', 'temp']

# Check if mount exists
if storage.has_mount('uploads'):
    node = storage.node('uploads:file.txt')
else:
    print("Uploads storage not configured")

Complete Example

/etc/app/storage-prod.yaml:

# Local temporary storage
- name: temp
  type: local
  path: /tmp/app

# User uploads to S3
- name: uploads
  type: s3
  bucket: prod-app-uploads
  region: eu-west-1
  prefix: uploads/

# Backups to GCS
- name: backups
  type: gcs
  bucket: prod-app-backups
  token: /etc/secrets/gcp-key.json

# Static assets from CDN
- name: cdn
  type: http
  base_url: https://cdn.example.com

Python application:

from genro_storage import StorageManager

# Initialize and configure
storage = StorageManager()
storage.configure('/etc/app/storage-prod.yaml')

# Process upload
upload = storage.node('temp:processing/image.jpg')
upload.write(uploaded_data, mode='wb')

# Save to S3
final = storage.node('uploads:2024/images/photo.jpg')
upload.copy_to(final)

# Backup to GCS
backup = storage.node('backups:daily/2024-10-26/photo.jpg')
final.copy_to(backup)

# Cleanup temp
upload.delete()

# Access CDN asset
logo = storage.node('cdn:assets/logo.png')
logo_data = logo.read(mode='rb')

Best Practices

  1. Use YAML for declarative configs - easier to read and maintain

  2. Separate configs per environment - development, staging, production

  3. Store credentials securely - use secrets managers, not config files

  4. Use meaningful mount names - uploads, backups, not s3_1

  5. Configure once at startup - don’t reconfigure during runtime

  6. Use prefixes in cloud storage - organize files within buckets

  7. Test with memory backend - fast, no cleanup needed

Troubleshooting

Mount not found error:

# Error: StorageNotFoundError: Mount point 'uploads' not found
node = storage.node('uploads:file.txt')

# Check configured mounts
print(storage.get_mount_names())

# Verify mount is configured
if not storage.has_mount('uploads'):
    storage.configure([{'name': 'uploads', 'type': 's3', 'bucket': 'my-bucket'}])

Invalid configuration:

# Missing required field
storage.configure([{'name': 'uploads', 'type': 's3'}])  # Missing 'bucket'!
# Raises: StorageConfigError: Missing required field 'bucket' for S3 storage
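
To catch this kind of mistake before calling configure(), for instance in a unit test over the config files, the required fields listed in this guide can be checked directly. The table below is copied from the backend sections above; missing_fields is a hypothetical helper, not a genro-storage function.

```python
# Required fields per backend type, copied from the sections of this guide.
REQUIRED_FIELDS = {
    "local": ("name", "type", "path"),
    "s3": ("name", "type", "bucket"),
    "gcs": ("name", "type", "bucket"),
    "azure": ("name", "type", "container", "account_name"),
    "http": ("name", "type", "base_url"),
    "memory": ("name", "type"),
}


def missing_fields(mount):
    """Return the required fields absent from a single mount dictionary."""
    backend = mount.get("type")
    if backend not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown storage type: {backend!r}")
    return [field for field in REQUIRED_FIELDS[backend] if field not in mount]


problems = missing_fields({"name": "uploads", "type": "s3"})
```

Here problems contains only 'bucket', the same field the error message above reports.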

Path escaping base directory:

storage.configure([{'name': 'home', 'type': 'local', 'path': '/home/user'}])
node = storage.node('home:../../../etc/passwd')
# Raises: ValueError: Path escapes base directory
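
The idea behind this check is to normalize the joined path and confirm it still sits under the base directory. The sketch below shows one common way to do that with the standard library. It is not genro-storage's implementation, it does not resolve symlinks, and the name resolve_within is hypothetical.

```python
import posixpath


def resolve_within(base, relative):
    """Join `relative` onto `base`, rejecting results that fall outside `base`."""
    base = posixpath.normpath(base)
    # normpath collapses '..' segments, so an escaping path ends up outside base.
    candidate = posixpath.normpath(posixpath.join(base, relative))
    if candidate != base and not candidate.startswith(base.rstrip("/") + "/"):
        raise ValueError("Path escapes base directory")
    return candidate
```

Comparing against the base plus a trailing slash matters: a plain prefix test would wrongly accept /home/username as being inside /home/user.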