Storage Backends
This page describes the available storage backends and their configuration options.
Overview
genro-storage supports multiple storage backends through fsspec. Each backend is configured with a mount point name and backend-specific parameters.
Local Storage
Store files on the local filesystem.
Configuration:
{
'name': 'home',
'type': 'local',
'path': '/home/user' # required: absolute path
}
Use cases: - Development and testing - Local file processing - Temporary storage
Memory Storage
In-memory storage for testing.
Configuration:
{
'name': 'test',
'type': 'memory'
}
Use cases: - Unit testing - Fast temporary storage - Mock storage in tests
Amazon S3
Store files in Amazon S3 buckets.
Installation:
pip install genro-storage[s3]
Configuration:
{
'name': 'uploads',
'type': 's3',
'bucket': 'my-bucket', # required
'prefix': 'uploads/', # optional
'region': 'eu-west-1', # optional
'anon': False # optional: anonymous access
}
Authentication:
Uses boto3 credentials (environment variables, ~/.aws/credentials, or IAM roles).
Google Cloud Storage
Store files in Google Cloud Storage buckets.
Installation:
pip install genro-storage[gcs]
Configuration:
{
'name': 'backups',
'type': 'gcs',
'bucket': 'my-backups', # required
'prefix': '', # optional
'token': 'path/to/key.json' # optional
}
Azure Blob Storage
Store files in Azure Blob Storage.
Installation:
pip install genro-storage[azure]
Configuration:
{
'name': 'archive',
'type': 'azure',
'container': 'archives', # required
'account_name': 'myaccount', # required
'account_key': '...' # optional
}
HTTP Storage
Read-only access to files via HTTP.
Configuration:
{
'name': 'cdn',
'type': 'http',
'base_url': 'https://cdn.example.com' # required
}
Note: HTTP storage is read-only.
Base64 Storage
Store data inline as base64-encoded strings, similar to data URIs.
Configuration:
{
'name': 'data',
'type': 'base64'
}
Usage:
# Read inline base64 data
node = storage.node('data:SGVsbG8gV29ybGQ=') # "Hello World" encoded
content = node.read() # Returns "Hello World"
# Write creates/updates the base64 path (writable with mutable paths)
node = storage.node('data:')
node.write("New content")
print(node.path) # TmV3IGNvbnRlbnQ= (base64 encoded)
# Copy from other storage to base64 for inline use
s3_image = storage.node('uploads:photo.jpg')
b64_image = storage.node('data:')
s3_image.copy_to(b64_image)
data_uri = f"data:image/jpeg;base64,{b64_image.path}"
Use cases:
Embed small files directly in configuration or databases
Create data URIs for inline images in HTML/CSS
Store secrets or tokens as encoded strings
Testing with inline test data
Features:
Read from base64-encoded paths
Write to create/update base64 content (path updates automatically)
Supports both text and binary data
Automatic encoding/decoding
Compatible with standard base64 encoding
Handles multiline base64 strings
Limitations:
Not suitable for large files (base64 increases size by ~33%)
Path changes after every write (mutable path behavior)
No directory operations (delete, mkdir, list_dir raise errors)
SMB/CIFS Storage
Access Windows and Samba network shares.
Installation:
pip install genro-storage[smb]
Configuration:
{
'name': 'fileserver',
'type': 'smb',
'host': '192.168.1.100', # required: SMB server
'share': 'documents', # required: share name
'username': 'user', # optional
'password': 'secret', # optional
'domain': 'WORKGROUP', # optional
'port': 445 # optional (default: 445)
}
Authentication:
SMB supports both guest access (no credentials) and authenticated access with username/password. For domain environments, specify the domain parameter.
Use cases:
Access files on Windows file servers
Connect to NAS devices with SMB/CIFS support
Integrate with corporate network shares
Cross-platform file sharing in enterprise environments
Features:
Full read/write support
Directory operations (mkdir, list, delete)
Works with Windows, Samba, and NAS devices
Supports SMB2/SMB3 protocols
SFTP/SSH Storage
Secure file transfer over SSH.
Installation:
pip install genro-storage[sftp]
Configuration:
# Password authentication
{
'name': 'server1',
'type': 'sftp',
'host': 'server.example.com', # required
'username': 'deploy', # required
'password': 'secret', # optional
'port': 22 # optional (default: 22)
}
# Key-based authentication
{
'name': 'server2',
'type': 'sftp',
'host': '192.168.1.50',
'username': 'user',
'key_filename': '/home/user/.ssh/id_rsa', # optional
'passphrase': 'keypass', # optional
'timeout': 30 # optional
}
Authentication:
Supports both password and SSH key-based authentication. Key-based authentication is recommended for automated deployments and CI/CD pipelines.
Use cases:
Secure file transfer to Linux/Unix servers
Automated deployments via SSH
Access files on VPS and cloud instances
Backup and sync operations over secure connections
Features:
Full read/write support
Directory operations
SSH key and password authentication
Configurable timeouts
ZIP Archives
Access ZIP archives as virtual filesystems.
Configuration:
# Read from existing ZIP
{
'name': 'backup',
'type': 'zip',
'file': '/backups/data.zip', # required: path to ZIP file
'mode': 'r' # optional: 'r', 'w', 'a' (default: 'r')
}
# Create new ZIP
{
'name': 'archive',
'type': 'zip',
'file': '/output/archive.zip',
'mode': 'w'
}
Usage:
# Read from ZIP archive
node = storage.node('backup:config/settings.json')
config = node.read()
# Extract specific file
log = storage.node('backup:logs/app.log')
log.copy_to(storage.node('home:extracted_log.txt'))
# Create new archive
source = storage.node('home:documents/report.pdf')
source.copy_to(storage.node('archive:reports/report.pdf'))
Use cases:
Read configuration from deployment archives
Extract specific files without full decompression
Create backup archives programmatically
Distribute application bundles
Features:
Read and write support
Transparent compression
Standard ZIP format compatibility
Fast random access to archived files
Note: Built-in to fsspec, no additional dependencies required.
TAR Archives
Read TAR archives (including compressed .tar.gz, .tar.bz2, .tar.xz).
Configuration:
# Read TAR archive
{
'name': 'logs',
'type': 'tar',
'file': '/var/log/archive.tar.gz' # required: path to TAR file
}
# Compression auto-detected from extension
{
'name': 'backup',
'type': 'tar',
'file': '/backups/data.tar.bz2'
}
Usage:
# Read from TAR archive
node = storage.node('logs:app.log')
content = node.read()
# List archive contents
for item in storage.node('logs:').list():
print(item)
# Extract to local storage
archived = storage.node('backup:important.txt')
archived.copy_to(storage.node('home:restored.txt'))
Supported compressions:
.tar- Uncompressed TAR.tar.gzor.tgz- Gzip compressed.tar.bz2- Bzip2 compressed.tar.xz- XZ/LZMA compressed
Compression is automatically detected from the file extension.
Use cases:
Process log archives without extraction
Access files in backup archives
Read distribution packages
Analyze compressed TAR files
Features:
Automatic compression detection
Multiple compression format support
Fast archive browsing
No temporary extraction required
Limitations:
Read-only: TAR archives cannot be modified
No write support: Cannot create or update TAR files
Use ZIP backend if write access is needed
Note: Built-in to fsspec, no additional dependencies required.
Git Repositories
Read files from local Git repositories at specific commits, branches, or tags.
Installation:
pip install pygit2
Configuration:
# Access repository at HEAD
{
'name': 'myrepo',
'type': 'git',
'path': '/path/to/repo.git' # required: path to Git repository
}
# Access specific branch/tag/commit
{
'name': 'production',
'type': 'git',
'path': '/path/to/repo.git',
'ref': 'v1.0.0' # optional: branch, tag, or commit SHA
}
Usage:
# Read file from repository
node = storage.node('myrepo:src/main.py')
content = node.read()
# List repository files
for item in storage.node('myrepo:src').list():
print(item)
# Compare different versions
current = storage.node('production:config.yaml')
staging = storage.node('staging:config.yaml')
if current.md5 != staging.md5:
print("Configuration differs between production and staging")
Use cases:
Read configuration from specific Git commits
Access historical versions of files
Compare files across branches
Browse repository contents without checkout
Build tools that need version-specific access
Features:
Access any commit, branch, or tag
Read files without full checkout
Version history access
Fast repository browsing
No working directory required
Limitations:
Read-only: Cannot commit or modify repository
No write support: Git repositories are read-only via fsspec
Requires pygit2 library
Only works with local repositories (use GitHub backend for remote)
Note: Requires pygit2 package for Git access.
GitHub Repositories
Read files from GitHub repositories via API, with support for branches, tags, and commits.
Configuration:
# Public repository (no authentication)
{
'name': 'opensource',
'type': 'github',
'org': 'genropy', # required: GitHub organization/user
'repo': 'genro-storage' # required: repository name
}
# Specific branch/tag/commit
{
'name': 'release',
'type': 'github',
'org': 'genropy',
'repo': 'genro-storage',
'sha': 'v1.0.0' # optional: branch, tag, or commit SHA
}
# Private repository (with authentication)
{
'name': 'private',
'type': 'github',
'org': 'mycompany',
'repo': 'secret-project',
'username': 'myusername', # required for private repos
'token': 'ghp_xxxxxxxxxxxxx' # required for private repos
}
Usage:
# Read file from GitHub
node = storage.node('opensource:README.md')
content = node.read()
# Download configuration from release tag
config_node = storage.node('release:config/production.yaml')
config_node.copy_to(storage.node('local:config.yaml'))
# List repository contents
for item in storage.node('opensource:src').list():
print(f"File: {item.name}, Size: {item.size}")
Authentication:
For private repositories or to increase API rate limits:
Create a Personal Access Token at https://github.com/settings/tokens
Include both
usernameandtokenin configurationToken needs
reposcope for private repository access
Use cases:
Download configuration from GitHub releases
Access documentation files
Fetch schemas or templates from repositories
CI/CD pipelines reading from GitHub
Tools that process files from multiple GitHub repos
Features:
Access public and private repositories
Read any commit, branch, or tag
No local clone required
Works over HTTPS
Efficient API-based access
Limitations:
Read-only: Cannot push commits
API rate limits: 60 req/hour (unauthenticated), 5000 req/hour (authenticated)
Requires internet connection
Not suitable for large binary files (use Git clone for that)
Note: Built-in to fsspec, no additional dependencies required. Authentication requires GitHub Personal Access Token.
WebDAV Storage
Access remote files via WebDAV protocol (Nextcloud, ownCloud, SharePoint, etc.).
Installation:
pip install genro-storage[webdav]
# or
pip install webdav4
Configuration:
# Basic configuration
{
'name': 'nextcloud',
'type': 'webdav',
'url': 'https://cloud.example.com/remote.php/dav/files/username'
}
# With username/password authentication
{
'name': 'sharepoint',
'type': 'webdav',
'url': 'https://sharepoint.company.com/documents',
'username': 'user@company.com',
'password': 'secret'
}
# With bearer token authentication
{
'name': 'owncloud',
'type': 'webdav',
'url': 'https://owncloud.example.com/remote.php/webdav',
'token': 'bearer_token_here'
}
Usage:
# Read file from WebDAV
node = storage.node('nextcloud:Documents/report.pdf')
data = node.read_bytes()
# Upload file to WebDAV
local = storage.node('home:photo.jpg')
local.copy_to(storage.node('nextcloud:Photos/vacation.jpg'))
# Create directory
storage.node('sharepoint:Projects/NewProject').mkdir()
# List remote files
for item in storage.node('owncloud:Documents').list():
print(f"{item.name}: {item.size} bytes")
# Delete remote file
storage.node('nextcloud:temp/old_file.txt').delete()
Supported services:
Nextcloud - Open-source file sync and share
ownCloud - Enterprise file sync and share
SharePoint - Microsoft collaboration platform
Box - Cloud content management
Any WebDAV server - Standard protocol support
Use cases:
Sync files with Nextcloud/ownCloud
Access corporate SharePoint documents
Remote file backup and restore
Collaborative document management
Cross-platform file sharing
Features:
Full read/write support
Create and delete files
Directory operations
Works with any WebDAV-compliant server
Standard HTTP/HTTPS protocol
Limitations:
Requires network connection
Performance depends on network speed
Authentication required for most servers
Some servers may have file size limits
Note: Requires webdav4 package. Ensure your WebDAV server URL is correct (often includes /remote.php/dav/ for Nextcloud/ownCloud).
LibArchive Storage
Read files from various archive formats using libarchive (ZIP, TAR, RAR, 7z, ISO, and more).
Installation:
pip install genro-storage[libarchive]
# or
pip install libarchive-c
Note: Also requires system libarchive library:
macOS:
brew install libarchiveUbuntu/Debian:
apt-get install libarchive-devCentOS/RHEL:
yum install libarchive-develWindows: Pre-built binaries available
Configuration:
# Read any archive format
{
'name': 'backup',
'type': 'libarchive',
'file': '/backups/data.tar.gz'
}
{
'name': 'install',
'type': 'libarchive',
'file': '/downloads/software.zip'
}
{
'name': 'iso',
'type': 'libarchive',
'file': '/images/linux.iso'
}
Usage:
# Read from archive
node = storage.node('backup:important.txt')
content = node.read()
# List archive contents
for item in storage.node('install:').list():
print(f"{item.name}: {item.size} bytes")
# Extract specific files
archived = storage.node('backup:database/config.json')
archived.copy_to(storage.node('local:restored_config.json'))
# Browse ISO image contents
for file in storage.node('iso:boot').list():
print(file.name)
Supported formats:
ZIP - ZIP archives
TAR - TAR archives (with gzip, bzip2, xz, lzma compression)
RAR - RAR archives
7z - 7-Zip archives
ISO - ISO disk images
ARJ - ARJ archives
CAB - Microsoft Cabinet files
LHA/LZH - LHA/LZH archives
And many more - See libarchive documentation
Use cases:
Read files from RAR archives (unlike ZIP backend)
Browse ISO disk images
Access files in 7z archives
Extract from various legacy archive formats
Unified interface for all archive types
Features:
Supports 20+ archive formats
Automatic format detection
Compression support for most formats
Fast archive browsing
No temporary extraction required
Limitations:
Read-only: Cannot create or modify archives
System dependency: Requires libarchive library installed
Performance: May be slower than format-specific backends
Not all archive formats support random access
Note: Requires both libarchive-c Python package and system libarchive library. For write access to ZIP/TAR, use the dedicated ZIP or TAR backends instead.