Storage Backends

This page describes the available storage backends and their configuration options.

Overview

genro-storage supports multiple storage backends through fsspec. Each backend is configured with a mount point name and backend-specific parameters.
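
As an illustration of the name-plus-parameters shape, here is a minimal sketch that checks a mount dictionary for the keys this page marks as required. `REQUIRED_KEYS` and `missing_keys` are hypothetical helpers written for this example, not part of genro-storage, and the table covers only a few of the backends.

```python
# Required keys per backend type, copied from the "required" comments
# in the configuration examples on this page (partial list).
REQUIRED_KEYS = {
    "local": {"path"},
    "memory": set(),
    "s3": {"bucket"},
    "gcs": {"bucket"},
    "azure": {"container", "account_name"},
    "http": {"base_url"},
    "smb": {"host", "share"},
    "sftp": {"host", "username"},
}


def missing_keys(mount: dict) -> set:
    """Return the required keys that a mount configuration lacks."""
    backend_keys = REQUIRED_KEYS.get(mount.get("type"), set())
    required = {"name", "type"} | backend_keys
    return required - set(mount)


# An S3 mount without a bucket is reported as incomplete.
print(missing_keys({"name": "uploads", "type": "s3"}))
```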

Local Storage

Store files on the local filesystem.

Configuration:

{
    'name': 'home',
    'type': 'local',
    'path': '/home/user'  # required: absolute path
}

Use cases:

  • Development and testing

  • Local file processing

  • Temporary storage

Memory Storage

In-memory storage for testing.

Configuration:

{
    'name': 'test',
    'type': 'memory'
}

Use cases:

  • Unit testing

  • Fast temporary storage

  • Mock storage in tests

Amazon S3

Store files in Amazon S3 buckets.

Installation:

pip install genro-storage[s3]

Configuration:

{
    'name': 'uploads',
    'type': 's3',
    'bucket': 'my-bucket',      # required
    'prefix': 'uploads/',       # optional
    'region': 'eu-west-1',      # optional
    'anon': False               # optional: anonymous access
}

Authentication:

Uses boto3 credentials (environment variables, ~/.aws/credentials, or IAM roles).

Google Cloud Storage

Store files in Google Cloud Storage buckets.

Installation:

pip install genro-storage[gcs]

Configuration:

{
    'name': 'backups',
    'type': 'gcs',
    'bucket': 'my-backups',           # required
    'prefix': '',                     # optional
    'token': 'path/to/key.json'       # optional
}

Azure Blob Storage

Store files in Azure Blob Storage.

Installation:

pip install genro-storage[azure]

Configuration:

{
    'name': 'archive',
    'type': 'azure',
    'container': 'archives',          # required
    'account_name': 'myaccount',      # required
    'account_key': '...'              # optional
}

HTTP Storage

Read-only access to files via HTTP.

Configuration:

{
    'name': 'cdn',
    'type': 'http',
    'base_url': 'https://cdn.example.com'  # required
}

Note: HTTP storage is read-only.

Base64 Storage

Store data inline as base64-encoded strings, similar to data URIs.

Configuration:

{
    'name': 'data',
    'type': 'base64'
}

Usage:

# Read inline base64 data
node = storage.node('data:SGVsbG8gV29ybGQ=')  # "Hello World" encoded
content = node.read()  # Returns "Hello World"

# Writing re-encodes the content, so the node's path changes to the new base64 string
node = storage.node('data:')
node.write("New content")
print(node.path)  # TmV3IGNvbnRlbnQ= (base64 encoded)

# Copy from other storage to base64 for inline use
s3_image = storage.node('uploads:photo.jpg')
b64_image = storage.node('data:')
s3_image.copy_to(b64_image)
data_uri = f"data:image/jpeg;base64,{b64_image.path}"

Use cases:

  • Embed small files directly in configuration or databases

  • Create data URIs for inline images in HTML/CSS

  • Store secrets or tokens as encoded strings (base64 is an encoding, not encryption)

  • Testing with inline test data

Features:

  • Read from base64-encoded paths

  • Write to create/update base64 content (path updates automatically)

  • Supports both text and binary data

  • Automatic encoding/decoding

  • Compatible with standard base64 encoding

  • Handles multiline base64 strings

Limitations:

  • Not suitable for large files (base64 increases size by ~33%)

  • Path changes after every write (mutable path behavior)

  • No directory operations (delete, mkdir, list_dir raise errors)
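
The encoding behaviour and the size overhead can be checked with Python's standard `base64` module alone. This sketch does not use genro-storage; it only reproduces the encoded string from the usage example and measures the roughly one-third growth.

```python
import base64

# Encode and decode the same payload used in the usage example.
payload = b"Hello World"
encoded = base64.b64encode(payload).decode("ascii")
decoded = base64.b64decode(encoded)
print(encoded)  # SGVsbG8gV29ybGQ=

# The encoded string can be dropped straight into a data URI.
data_uri = f"data:text/plain;base64,{encoded}"
print(data_uri)

# Size overhead: every 3 input bytes become 4 output characters.
raw = bytes(3000)
overhead = len(base64.b64encode(raw)) / len(raw) - 1
print(f"{overhead:.0%}")  # 33%
```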

SMB/CIFS Storage

Access Windows and Samba network shares.

Installation:

pip install genro-storage[smb]

Configuration:

{
    'name': 'fileserver',
    'type': 'smb',
    'host': '192.168.1.100',      # required: SMB server
    'share': 'documents',         # required: share name
    'username': 'user',           # optional
    'password': 'secret',         # optional
    'domain': 'WORKGROUP',        # optional
    'port': 445                   # optional (default: 445)
}

Authentication:

SMB supports both guest access (no credentials) and authenticated access with username/password. For domain environments, specify the domain parameter.

Use cases:

  • Access files on Windows file servers

  • Connect to NAS devices with SMB/CIFS support

  • Integrate with corporate network shares

  • Cross-platform file sharing in enterprise environments

Features:

  • Full read/write support

  • Directory operations (mkdir, list, delete)

  • Works with Windows, Samba, and NAS devices

  • Supports SMB2/SMB3 protocols

SFTP/SSH Storage

Secure file transfer over SSH.

Installation:

pip install genro-storage[sftp]

Configuration:

# Password authentication
{
    'name': 'server1',
    'type': 'sftp',
    'host': 'server.example.com',    # required
    'username': 'deploy',             # required
    'password': 'secret',             # optional
    'port': 22                        # optional (default: 22)
}

# Key-based authentication
{
    'name': 'server2',
    'type': 'sftp',
    'host': '192.168.1.50',
    'username': 'user',
    'key_filename': '/home/user/.ssh/id_rsa',  # optional
    'passphrase': 'keypass',                    # optional
    'timeout': 30                               # optional
}

Authentication:

Supports both password and SSH key-based authentication. Key-based authentication is recommended for automated deployments and CI/CD pipelines.

Use cases:

  • Secure file transfer to Linux/Unix servers

  • Automated deployments via SSH

  • Access files on VPS and cloud instances

  • Backup and sync operations over secure connections

Features:

  • Full read/write support

  • Directory operations

  • SSH key and password authentication

  • Configurable timeouts

ZIP Archives

Access ZIP archives as virtual filesystems.

Configuration:

# Read from existing ZIP
{
    'name': 'backup',
    'type': 'zip',
    'file': '/backups/data.zip',     # required: path to ZIP file
    'mode': 'r'                       # optional: 'r', 'w', 'a' (default: 'r')
}

# Create new ZIP
{
    'name': 'archive',
    'type': 'zip',
    'file': '/output/archive.zip',
    'mode': 'w'
}

Usage:

# Read from ZIP archive
node = storage.node('backup:config/settings.json')
config = node.read()

# Extract specific file
log = storage.node('backup:logs/app.log')
log.copy_to(storage.node('home:extracted_log.txt'))

# Create new archive
source = storage.node('home:documents/report.pdf')
source.copy_to(storage.node('archive:reports/report.pdf'))

Use cases:

  • Read configuration from deployment archives

  • Extract specific files without full decompression

  • Create backup archives programmatically

  • Distribute application bundles

Features:

  • Read and write support

  • Transparent compression

  • Standard ZIP format compatibility

  • Fast random access to archived files
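
For a sense of what reading a single member without unpacking the archive involves, the following standard-library sketch builds a small ZIP in memory and reads one entry back. It uses `zipfile` directly rather than genro-storage or fsspec, and the member names are made up for the example.

```python
import io
import zipfile

# Build a small archive in memory so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("config/settings.json", '{"debug": false}')
    archive.writestr("logs/app.log", "started\n")

# Reopen for reading: listing names and reading one member
# does not decompress the other entries.
with zipfile.ZipFile(buffer, mode="r") as archive:
    names = archive.namelist()
    settings = archive.read("config/settings.json").decode("utf-8")

print(names)
print(settings)
```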

Note: Built into fsspec, no additional dependencies required.

TAR Archives

Read TAR archives (including compressed .tar.gz, .tar.bz2, .tar.xz).

Configuration:

# Read TAR archive
{
    'name': 'logs',
    'type': 'tar',
    'file': '/var/log/archive.tar.gz'   # required: path to TAR file
}

# Compression auto-detected from extension
{
    'name': 'backup',
    'type': 'tar',
    'file': '/backups/data.tar.bz2'
}

Usage:

# Read from TAR archive
node = storage.node('logs:app.log')
content = node.read()

# List archive contents
for item in storage.node('logs:').list():
    print(item)

# Extract to local storage
archived = storage.node('backup:important.txt')
archived.copy_to(storage.node('home:restored.txt'))

Supported compressions:

  • .tar - Uncompressed TAR

  • .tar.gz or .tgz - Gzip compressed

  • .tar.bz2 - Bzip2 compressed

  • .tar.xz - XZ/LZMA compressed

Compression is automatically detected from the file extension.
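
Extension-based detection can be pictured as a suffix lookup. The helper below is a hypothetical sketch of that idea using the suffixes listed above; it is not the code genro-storage or fsspec actually runs.

```python
from typing import Optional

# Suffix -> compression name, following the list of supported compressions.
TAR_SUFFIXES = {
    ".tar.gz": "gzip",
    ".tgz": "gzip",
    ".tar.bz2": "bzip2",
    ".tar.xz": "xz",
    ".tar": None,  # uncompressed
}


def detect_tar_compression(filename: str) -> Optional[str]:
    """Return the compression implied by a TAR file name, or None if uncompressed."""
    lowered = filename.lower()
    for suffix, compression in TAR_SUFFIXES.items():
        if lowered.endswith(suffix):
            return compression
    raise ValueError(f"not a recognised TAR file name: {filename!r}")


print(detect_tar_compression("/var/log/archive.tar.gz"))  # gzip
print(detect_tar_compression("/backups/data.tar.bz2"))    # bzip2
```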

Use cases:

  • Process log archives without extraction

  • Access files in backup archives

  • Read distribution packages

  • Analyze compressed TAR files

Features:

  • Automatic compression detection

  • Multiple compression format support

  • Fast archive browsing

  • No temporary extraction required

Limitations:

  • Read-only: TAR archives cannot be created, modified, or updated

  • Use ZIP backend if write access is needed

Note: Built into fsspec, no additional dependencies required.

Git Repositories

Read files from local Git repositories at specific commits, branches, or tags.

Installation:

pip install pygit2

Configuration:

# Access repository at HEAD
{
    'name': 'myrepo',
    'type': 'git',
    'path': '/path/to/repo.git'   # required: path to Git repository
}

# Access specific branch/tag/commit
{
    'name': 'production',
    'type': 'git',
    'path': '/path/to/repo.git',
    'ref': 'v1.0.0'               # optional: branch, tag, or commit SHA
}

Usage:

# Read file from repository
node = storage.node('myrepo:src/main.py')
content = node.read()

# List repository files
for item in storage.node('myrepo:src').list():
    print(item)

# Compare different versions (assumes a second Git mount named 'staging', configured with another ref)
current = storage.node('production:config.yaml')
staging = storage.node('staging:config.yaml')
if current.md5 != staging.md5:
    print("Configuration differs between production and staging")
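
The comparison above relies on each node's `md5` attribute. Assuming that attribute is the hexadecimal MD5 digest of the file content (an assumption, since this page does not define it), the equivalent check on raw bytes looks roughly like this with `hashlib`. The two byte strings stand in for file contents.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


# Stand-ins for the content of config.yaml at two different refs.
production = b"debug: false\n"
staging = b"debug: true\n"

differs = md5_hex(production) != md5_hex(staging)
if differs:
    print("Configuration differs between production and staging")
```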

Use cases:

  • Read configuration from specific Git commits

  • Access historical versions of files

  • Compare files across branches

  • Browse repository contents without checkout

  • Build tools that need version-specific access

Features:

  • Access any commit, branch, or tag

  • Read files without full checkout

  • Version history access

  • Fast repository browsing

  • No working directory required

Limitations:

  • Read-only: cannot commit to or modify the repository; Git access via fsspec has no write support

  • Requires pygit2 library

  • Only works with local repositories (use GitHub backend for remote)

Note: Requires pygit2 package for Git access.

GitHub Repositories

Read files from GitHub repositories via API, with support for branches, tags, and commits.

Configuration:

# Public repository (no authentication)
{
    'name': 'opensource',
    'type': 'github',
    'org': 'genropy',              # required: GitHub organization/user
    'repo': 'genro-storage'        # required: repository name
}

# Specific branch/tag/commit
{
    'name': 'release',
    'type': 'github',
    'org': 'genropy',
    'repo': 'genro-storage',
    'sha': 'v1.0.0'                # optional: branch, tag, or commit SHA
}

# Private repository (with authentication)
{
    'name': 'private',
    'type': 'github',
    'org': 'mycompany',
    'repo': 'secret-project',
    'username': 'myusername',      # required for private repos
    'token': 'ghp_xxxxxxxxxxxxx'   # required for private repos
}

Usage:

# Read file from GitHub
node = storage.node('opensource:README.md')
content = node.read()

# Download configuration from release tag
config_node = storage.node('release:config/production.yaml')
config_node.copy_to(storage.node('home:config.yaml'))

# List repository contents
for item in storage.node('opensource:src').list():
    print(f"File: {item.name}, Size: {item.size}")

Authentication:

For private repositories or to increase API rate limits:

  1. Create a Personal Access Token at https://github.com/settings/tokens

  2. Include both username and token in configuration

  3. Token needs repo scope for private repository access
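
To avoid hard-coding the token in a configuration file, one option is to build the mount dictionary from environment variables. This is a sketch only: `github_mount` and the variable names `GITHUB_USERNAME` and `GITHUB_TOKEN` are assumptions made for the example, not names that genro-storage reads by itself.

```python
import os


def github_mount(name: str, org: str, repo: str) -> dict:
    """Build a GitHub mount dict, adding credentials only if both are set."""
    mount = {"name": name, "type": "github", "org": org, "repo": repo}
    username = os.environ.get("GITHUB_USERNAME")  # hypothetical variable name
    token = os.environ.get("GITHUB_TOKEN")        # hypothetical variable name
    if username and token:
        mount["username"] = username
        mount["token"] = token
    return mount


# Print only the keys so the token never reaches the console.
print(sorted(github_mount("opensource", "genropy", "genro-storage")))
```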

Use cases:

  • Download configuration from GitHub releases

  • Access documentation files

  • Fetch schemas or templates from repositories

  • CI/CD pipelines reading from GitHub

  • Tools that process files from multiple GitHub repos

Features:

  • Access public and private repositories

  • Read any commit, branch, or tag

  • No local clone required

  • Works over HTTPS

  • Efficient API-based access

Limitations:

  • Read-only: Cannot push commits

  • API rate limits: 60 req/hour (unauthenticated), 5000 req/hour (authenticated)

  • Requires internet connection

  • Not suitable for large binary files (use Git clone for that)
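
The rate limits translate into a minimum average spacing between requests. A small worked calculation, with `min_interval_seconds` as a hypothetical helper:

```python
def min_interval_seconds(requests_per_hour: int) -> float:
    """Average spacing between requests that stays within an hourly limit."""
    return 3600 / requests_per_hour


print(min_interval_seconds(60))    # 60.0 (unauthenticated)
print(min_interval_seconds(5000))  # 0.72 (authenticated)
```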

Note: Built into fsspec, no additional dependencies required. Authentication requires a GitHub Personal Access Token.

WebDAV Storage

Access remote files via WebDAV protocol (Nextcloud, ownCloud, SharePoint, etc.).

Installation:

pip install genro-storage[webdav]
# or
pip install webdav4

Configuration:

# Basic configuration
{
    'name': 'nextcloud',
    'type': 'webdav',
    'url': 'https://cloud.example.com/remote.php/dav/files/username'
}

# With username/password authentication
{
    'name': 'sharepoint',
    'type': 'webdav',
    'url': 'https://sharepoint.company.com/documents',
    'username': 'user@company.com',
    'password': 'secret'
}

# With bearer token authentication
{
    'name': 'owncloud',
    'type': 'webdav',
    'url': 'https://owncloud.example.com/remote.php/webdav',
    'token': 'bearer_token_here'
}

Usage:

# Read file from WebDAV
node = storage.node('nextcloud:Documents/report.pdf')
data = node.read_bytes()

# Upload file to WebDAV
local = storage.node('home:photo.jpg')
local.copy_to(storage.node('nextcloud:Photos/vacation.jpg'))

# Create directory
storage.node('sharepoint:Projects/NewProject').mkdir()

# List remote files
for item in storage.node('owncloud:Documents').list():
    print(f"{item.name}: {item.size} bytes")

# Delete remote file
storage.node('nextcloud:temp/old_file.txt').delete()

Supported services:

  • Nextcloud - Open-source file sync and share

  • ownCloud - Enterprise file sync and share

  • SharePoint - Microsoft collaboration platform

  • Box - Cloud content management

  • Any WebDAV server - Standard protocol support

Use cases:

  • Sync files with Nextcloud/ownCloud

  • Access corporate SharePoint documents

  • Remote file backup and restore

  • Collaborative document management

  • Cross-platform file sharing

Features:

  • Full read/write support

  • Create and delete files

  • Directory operations

  • Works with any WebDAV-compliant server

  • Standard HTTP/HTTPS protocol

Limitations:

  • Requires network connection

  • Performance depends on network speed

  • Authentication required for most servers

  • Some servers may have file size limits

Note: Requires webdav4 package. Ensure your WebDAV server URL is correct (often includes /remote.php/dav/ for Nextcloud/ownCloud).
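
For Nextcloud, the files endpoint follows the pattern shown in the first configuration example. A hypothetical helper that assembles it from a server address and a user name might look like this; `nextcloud_webdav_url` is not part of genro-storage, and other servers use different paths.

```python
from urllib.parse import quote


def nextcloud_webdav_url(base_url: str, username: str) -> str:
    """Build the Nextcloud WebDAV files URL for a given user."""
    # Percent-encode the user name so characters such as '@' are safe in the path.
    user = quote(username, safe="")
    return f"{base_url.rstrip('/')}/remote.php/dav/files/{user}"


mount = {
    "name": "nextcloud",
    "type": "webdav",
    "url": nextcloud_webdav_url("https://cloud.example.com/", "username"),
}
print(mount["url"])
```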

LibArchive Storage

Read files from various archive formats using libarchive (ZIP, TAR, RAR, 7z, ISO, and more).

Installation:

pip install genro-storage[libarchive]
# or
pip install libarchive-c

Note: Also requires the system libarchive library:

  • macOS: brew install libarchive

  • Ubuntu/Debian: apt-get install libarchive-dev

  • CentOS/RHEL: yum install libarchive-devel

  • Windows: Pre-built binaries available

Configuration:

# Read any archive format
{
    'name': 'backup',
    'type': 'libarchive',
    'file': '/backups/data.tar.gz'
}

{
    'name': 'install',
    'type': 'libarchive',
    'file': '/downloads/software.zip'
}

{
    'name': 'iso',
    'type': 'libarchive',
    'file': '/images/linux.iso'
}

Usage:

# Read from archive
node = storage.node('backup:important.txt')
content = node.read()

# List archive contents
for item in storage.node('install:').list():
    print(f"{item.name}: {item.size} bytes")

# Extract specific files
archived = storage.node('backup:database/config.json')
archived.copy_to(storage.node('home:restored_config.json'))

# Browse ISO image contents
for file in storage.node('iso:boot').list():
    print(file.name)

Supported formats:

  • ZIP - ZIP archives

  • TAR - TAR archives (with gzip, bzip2, xz, lzma compression)

  • RAR - RAR archives

  • 7z - 7-Zip archives

  • ISO - ISO disk images

  • ARJ - ARJ archives

  • CAB - Microsoft Cabinet files

  • LHA/LZH - LHA/LZH archives

  • And many more - See libarchive documentation

Use cases:

  • Read files from RAR archives (which the ZIP backend cannot open)

  • Browse ISO disk images

  • Access files in 7z archives

  • Extract from various legacy archive formats

  • Unified interface for all archive types

Features:

  • Supports 20+ archive formats

  • Automatic format detection

  • Compression support for most formats

  • Fast archive browsing

  • No temporary extraction required
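
Format detection generally works by inspecting the first bytes of a file rather than its name. The sketch below illustrates the idea with a handful of well-known signatures; it is a simplified stand-in, not libarchive's actual detection logic, and `sniff_format` is a hypothetical helper.

```python
import gzip
import io
import zipfile
from typing import Optional

# Well-known leading byte signatures for a few container/compression formats.
SIGNATURES = [
    (b"PK\x03\x04", "zip"),
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"Rar!\x1a\x07", "rar"),
]


def sniff_format(header: bytes) -> Optional[str]:
    """Guess a format name from the first bytes of a file, or None if unknown."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


# Build tiny samples in memory and sniff their headers.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hi")

print(sniff_format(buffer.getvalue()[:8]))     # zip
print(sniff_format(gzip.compress(b"hi")[:8]))  # gzip
```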

Limitations:

  • Read-only: Cannot create or modify archives

  • System dependency: Requires libarchive library installed

  • Performance: May be slower than format-specific backends

  • Not all archive formats support random access

Note: Requires both the libarchive-c Python package and the system libarchive library. For write access to ZIP archives, use the dedicated ZIP backend instead; the TAR backend is read-only as well.