API Reference
This page provides the complete API documentation for genro-storage.
StorageManager
- class genro_storage.StorageManager[source]
Bases: object
Main entry point for configuring and accessing storage.
StorageManager is responsible for:
- Configuring mount points that map to storage backends
- Creating StorageNode instances for file/directory access
- Managing the lifecycle of storage backend connections
A mount point is a logical name (e.g., “home”, “uploads”, “s3”) that maps to an actual storage backend (local filesystem, S3 bucket, etc.).
Examples
>>> # Create manager
>>> storage = StorageManager()
>>>
>>> # Configure from file
>>> storage.configure('/etc/app/storage.yaml')
>>>
>>> # Configure programmatically
>>> storage.configure([
...     {'name': 'home', 'type': 'local', 'path': '/home/user'},
...     {'name': 'uploads', 'type': 's3', 'bucket': 'my-bucket'}
... ])
>>>
>>> # Access files
>>> node = storage.node('home:documents/report.pdf')
>>> content = node.read_text()
- __init__()[source]
Initialize a new StorageManager with no configured mounts.
After initialization, you must call configure() to set up mount points before you can access any files.
Examples
>>> from genro_storage import StorageManager
>>> storage = StorageManager()
- configure(source)[source]
Configure mount points from various sources.
This method can be called multiple times. If a mount with the same name already exists, it will be replaced with the new configuration.
- Parameters:
source (Annotated[str | list[dict[str, Any]], 'Configuration source: path to YAML/JSON file or list of mount configurations']) – Configuration source, one of:
- str: Path to a YAML or JSON configuration file
- list[dict]: List of mount configurations
- Raises:
FileNotFoundError – If configuration file doesn’t exist
StorageConfigError – If configuration format is invalid
TypeError – If source is neither str nor list
- Configuration Dictionary Format:
Each mount configuration dict must have:
name (str, required): Mount point name (e.g., “home”, “uploads”)
type (str, required): Backend type (“local”, “s3”, “gcs”, “azure”, “http”, “memory”)
Additional fields depend on type (see examples below)
Examples
Local Storage:
>>> storage.configure([{
...     'name': 'home',
...     'type': 'local',
...     'path': '/home/user'  # required: absolute path
... }])
S3 Storage:
>>> storage.configure([{
...     'name': 'uploads',
...     'type': 's3',
...     'bucket': 'my-bucket',   # required
...     'prefix': 'uploads/',    # optional, default: ""
...     'region': 'eu-west-1',   # optional
...     'anon': False            # optional, default: False
... }])
GCS Storage:
>>> storage.configure([{
...     'name': 'backups',
...     'type': 'gcs',
...     'bucket': 'my-backups',                   # required
...     'prefix': '',                             # optional
...     'token': 'path/to/service-account.json'   # optional
... }])
Azure Blob Storage:
>>> storage.configure([{
...     'name': 'archive',
...     'type': 'azure',
...     'container': 'archives',       # required
...     'account_name': 'myaccount',   # required
...     'account_key': '...'           # optional if using managed identity
... }])
HTTP Storage (read-only):
>>> storage.configure([{
...     'name': 'cdn',
...     'type': 'http',
...     'base_url': 'https://cdn.example.com'  # required
... }])
Memory Storage (for testing):
>>> storage.configure([{
...     'name': 'test',
...     'type': 'memory'
... }])
From YAML File:
# storage.yaml
- name: home
  type: local
  path: /home/user
- name: uploads
  type: s3
  bucket: my-app-uploads
  region: eu-west-1
>>> storage.configure('/etc/app/storage.yaml')
From JSON File:
[
  {
    "name": "home",
    "type": "local",
    "path": "/home/user"
  },
  {
    "name": "uploads",
    "type": "s3",
    "bucket": "my-app-uploads",
    "region": "eu-west-1"
  }
]
>>> storage.configure('./config/storage.json')
Multiple Calls (mounts are replaced if same name):
>>> storage.configure([{'name': 'home', 'type': 'local', 'path': '/home/user'}])
>>> storage.configure([{'name': 'uploads', 'type': 's3', 'bucket': 'my-bucket'}])
>>> # Now both 'home' and 'uploads' are configured
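The replace-on-same-name behaviour of repeated configure() calls can be pictured as a dictionary keyed by mount name. The following is a minimal sketch under that assumption, not genro-storage's implementation: `MountRegistry` and `REQUIRED_KEYS` are hypothetical names, and a plain ValueError stands in for StorageConfigError.

```python
from typing import Any

REQUIRED_KEYS = ("name", "type")  # hypothetical constant mirroring the format above


class MountRegistry:
    """Toy registry imitating the documented configure() semantics."""

    def __init__(self) -> None:
        self._mounts: dict[str, dict[str, Any]] = {}

    def configure(self, source: list[dict[str, Any]]) -> None:
        if not isinstance(source, list):
            raise TypeError("this sketch only accepts a list of dicts")
        for config in source:
            missing = [key for key in REQUIRED_KEYS if key not in config]
            if missing:
                # The real library documents StorageConfigError here.
                raise ValueError(f"mount config is missing {missing}")
            # Same name: the newer configuration replaces the older one.
            self._mounts[config["name"]] = dict(config)

    def get_mount_names(self) -> list[str]:
        return list(self._mounts)


registry = MountRegistry()
registry.configure([{"name": "home", "type": "local", "path": "/home/user"}])
registry.configure([{"name": "uploads", "type": "memory"}])
registry.configure([{"name": "home", "type": "local", "path": "/srv/home"}])
```

After the three calls the registry still holds two mounts, and 'home' points at the path given in the last call.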
- add_mount(config)[source]
Add or update a single mount point.
If a mount with the same name already exists, it will be replaced.
- Parameters:
config (Annotated[dict[str, Any], 'Mount configuration dictionary']) – Mount configuration dictionary with ‘name’ and ‘type’ fields
- Raises:
StorageConfigError – If configuration is invalid
Examples
>>> storage.add_mount({
...     'name': 'uploads',
...     'type': 's3',
...     'bucket': 'my-bucket'
... })
- delete_mount(name)[source]
Delete a mount point.
- Parameters:
name (Annotated[str, 'Mount point name to delete']) – Name of the mount point to remove
- Raises:
KeyError – If mount point doesn’t exist
Examples
>>> storage.delete_mount('uploads')
- node(mount_or_path=None, *path_parts, version=None)[source]
Create a StorageNode pointing to a file or directory.
This is the primary way to access files and directories. The path uses a mount:path format where the mount name refers to a configured storage backend.
When called without arguments, creates a dummy/accumulator node that can be used to build content from multiple sources.
- Parameters:
mount_or_path (Annotated[str | None, 'Mount name or full path (mount:path format), or None for dummy node']) – Either: - Full path with mount: “mount:path/to/file” - Just mount name: “mount” - None: creates a dummy accumulator node (no storage backend)
*path_parts (str) – Additional path components to join
version (Annotated[int | str | None, 'Optional version: int for index (-1=latest), str for version_id']) – Optional version specifier for versioned storage (S3, GCS). If specified, creates a read-only snapshot node of that version. Can be int (index: -1=latest, -2=previous) or str (version_id).
- Returns:
A new StorageNode instance
- Return type:
StorageNode
- Raises:
KeyError – If mount point doesn’t exist (wrapped as StorageNotFoundError)
ValueError – If path format is invalid
- Path Normalization:
Multiple slashes collapsed: “a//b” → “a/b”
Leading/trailing slashes stripped
No support for “..” (parent directory) - raises ValueError
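The normalization rules above can be sketched in a few lines. This is an illustration of the documented behaviour only, assuming the mount name is everything before the first colon; `split_mount_path` is a hypothetical helper, not a genro-storage function.

```python
def split_mount_path(mount_or_path: str, *path_parts: str) -> tuple[str, str]:
    """Return (mount, normalized_path) following the rules listed above."""
    mount, _, first = mount_or_path.partition(":")
    segments = []
    for part in (first, *path_parts):
        for segment in part.split("/"):
            if segment == "..":
                raise ValueError("parent directory traversal is not allowed")
            if segment:
                # Skipping empty segments collapses '//' and strips
                # leading/trailing slashes in one step.
                segments.append(segment)
    return mount, "/".join(segments)


mount, path = split_mount_path("home:/documents//", "reports", "q4.pdf")
```

Here `mount` is 'home' and `path` is 'documents/reports/q4.pdf'.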
Examples
Full path in one string:
>>> node = storage.node('home:documents/report.pdf')
Mount + path parts:
>>> node = storage.node('home', 'documents', 'report.pdf')
Mix styles:
>>> node = storage.node('home:documents', 'reports', 'q4.pdf')
Dynamic composition:
>>> user_id = '123'
>>> year = '2024'
>>> node = storage.node('uploads', 'users', user_id, year, 'avatar.jpg')
>>> # Result: uploads:users/123/2024/avatar.jpg
Just mount (root of storage):
>>> node = storage.node('home')
>>> # Result: home:
Path with special characters:
>>> # Spaces and unicode are OK
>>> node = storage.node('home:My Documents/Café Menu.pdf')
Invalid paths (will raise ValueError):
>>> # Parent directory traversal not allowed
>>> node = storage.node('home:documents/../etc/passwd')  # ValueError
Dummy node (accumulator):
>>> dummy = storage.node()  # No parameters
>>> dummy.append(node1)
>>> dummy.extend(node2, node3)
>>> dummy.read_text()  # Concatenates all sources
- iternode(*nodes)[source]
Create a virtual node that concatenates multiple nodes lazily.
This creates a virtual node (no physical storage) that accumulates references to other nodes. Content is only read when materialized via read_text(), read_bytes(), copy_to(), or zip().
- Parameters:
*nodes – StorageNode instances to concatenate
- Returns:
Virtual node with concatenation capability
- Return type:
StorageNode
Examples
>>> # Create from existing nodes
>>> n1 = storage.node('mem:part1.txt')
>>> n2 = storage.node('mem:part2.txt')
>>> combined = storage.iternode(n1, n2)
>>>
>>> # Read concatenated content
>>> content = combined.read_text()
>>>
>>> # Add more nodes
>>> n3 = storage.node('mem:part3.txt')
>>> combined.append(n3)
>>>
>>> # Save to file
>>> result = storage.node('mem:result.txt')
>>> combined.copy_to(result)
>>>
>>> # Create ZIP
>>> zip_bytes = combined.zip()
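Lazy concatenation means the virtual node stores references and reads nothing until it is materialized. The sketch below illustrates that idea with plain callables; `LazyConcat` and `make_reader` are hypothetical names and do not reflect how genro-storage implements iternode.

```python
from typing import Callable


class LazyConcat:
    """Holds zero-argument readers and only calls them when materialized."""

    def __init__(self, *readers: Callable[[], str]) -> None:
        self._readers = list(readers)

    def append(self, reader: Callable[[], str]) -> None:
        self._readers.append(reader)

    def read_text(self) -> str:
        # Sources are read here, not at construction or append time.
        return "".join(reader() for reader in self._readers)


calls = []  # records the moment each source is actually read


def make_reader(text: str) -> Callable[[], str]:
    def reader() -> str:
        calls.append(text)
        return text
    return reader


combined = LazyConcat(make_reader("part1 "), make_reader("part2 "))
combined.append(make_reader("part3"))
reads_before = len(calls)        # nothing has been read yet
content = combined.read_text()   # all three sources are read now
```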
- diffnode(node1, node2)[source]
Create a virtual node that generates a diff between two nodes.
This creates a virtual node that generates a unified diff between two text files. The diff is only computed when materialized via read_text() or copy_to().
- Parameters:
node1 (StorageNode) – First node (old version)
node2 (StorageNode) – Second node (new version)
- Returns:
Virtual node with diff capability
- Return type:
StorageNode
- Raises:
ValueError – If nodes contain binary data
Examples
>>> # Compare two versions
>>> v1 = storage.node('mem:config_v1.txt')
>>> v2 = storage.node('mem:config_v2.txt')
>>> diff = storage.diffnode(v1, v2)
>>>
>>> # Read diff
>>> changes = diff.read_text()
>>>
>>> # Save diff to file
>>> diff_file = storage.node('mem:changes.diff')
>>> diff.copy_to(diff_file)
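For readers unfamiliar with the unified diff format, the standard library's difflib produces the same kind of output. Whether diffnode() uses difflib internally is an assumption; `unified_diff_text` is a hypothetical helper shown only to illustrate the format.

```python
import difflib


def unified_diff_text(old: str, new: str,
                      old_name: str = "old", new_name: str = "new") -> str:
    """Return a unified diff between two text blobs as a single string."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(lines)


changes = unified_diff_text(
    "debug = false\nport = 80\n",
    "debug = true\nport = 80\n",
    "config_v1.txt",
    "config_v2.txt",
)
```

The result starts with `--- config_v1.txt` and `+++ config_v2.txt` header lines, followed by hunks in which removed lines are prefixed with `-` and added lines with `+`. Identical inputs yield an empty string.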
- get_mount_names()[source]
Get list of configured mount names.
Examples
>>> storage.configure([
...     {'name': 'home', 'type': 'local', 'path': '/home/user'},
...     {'name': 'uploads', 'type': 's3', 'bucket': 'my-bucket'}
... ])
>>> print(storage.get_mount_names())
['home', 'uploads']
- has_mount(name)[source]
Check if a mount point is configured.
- Parameters:
name (Annotated[str, 'Mount point name to check']) – Mount point name to check
- Returns:
True if mount exists
- Return type:
bool
Examples
>>> if storage.has_mount('uploads'):
...     node = storage.node('uploads:file.txt')
... else:
...     print("Uploads storage not configured")
StorageNode
- class genro_storage.StorageNode(manager, mount_name, path, version=None)[source]
Bases: object
Represents a file or directory in a storage backend.
StorageNode provides a unified interface for file operations across different storage backends (local, S3, GCS, Azure, HTTP, etc.).
Note
Users should not instantiate StorageNode directly. Use StorageManager.node() instead.
The node can represent either a file or a directory. Use the properties isfile and isdir to determine the type.
Examples
>>> # Get a node via StorageManager
>>> node = storage.node('home:documents/report.pdf')
>>>
>>> # Check if it exists
>>> if node.exists:
...     print(f"File size: {node.size} bytes")
>>>
>>> # Read content
>>> content = node.read_text()
>>>
>>> # Write content
>>> node.write_text("Hello World")
- fullpath
Full path including mount point (e.g., “home:documents/file.txt”)
- exists
True if file or directory exists
- isfile
True if node points to a file
- isdir
True if node points to a directory
- size
File size in bytes
- mtime
Last modification time as Unix timestamp
- basename
Filename with extension
- stem
Filename without extension
- suffix
File extension including dot
- parent
Parent directory as StorageNode
- __init__(manager, mount_name, path, version=None)[source]
Initialize a StorageNode.
- Parameters:
manager (StorageManager) – The StorageManager instance that owns this node
mount_name (str | None) – Name of the mount point (e.g., “home”, “uploads”), or None for dummy node
path (str | None) – Relative path within the mount (e.g., “documents/file.txt”), or None for dummy node
version (int | str | None) – Optional version specifier for versioned storage. If set, the node becomes a read-only snapshot of that version.
Note
This should not be called directly. Use StorageManager.node() instead.
- property fullpath: str
Full path including mount point.
- Returns:
Full path in format “mount:path/to/file”
- Return type:
str
Examples
>>> node = storage.node('home:documents/report.pdf')
>>> print(node.fullpath)
home:documents/report.pdf
- property path: str
Relative path within the mount.
- Returns:
Path relative to mount point (without mount prefix)
- Return type:
str
Examples
>>> node = storage.node('home:documents/report.pdf')
>>> print(node.path)
documents/report.pdf
>>> # For base64 backend, this is the base64-encoded content
>>> node = storage.node('b64:SGVsbG8=')
>>> print(node.path)
SGVsbG8=
- property exists: bool
True if file or directory exists.
- Returns:
- True if the file or directory exists on the storage backend.
Virtual nodes always return False.
- Return type:
bool
Examples
>>> if node.exists:
...     print("File exists!")
... else:
...     print("File not found")
- property isfile: bool
True if node points to a file.
- Returns:
True if this node is a file, False if directory or doesn’t exist
- Return type:
bool
Examples
>>> if node.isfile:
...     data = node.read_bytes()
- property isdir: bool
True if node points to a directory.
- Returns:
True if this node is a directory, False if file or doesn’t exist
- Return type:
bool
Examples
>>> if node.isdir:
...     for child in node.children():
...         print(child.basename)
- property size: int
File size in bytes.
- Returns:
Size of the file in bytes
- Return type:
int
- Raises:
FileNotFoundError – If file doesn’t exist
ValueError – If node is a directory (directories don’t have size)
Examples
>>> print(f"File size: {node.size} bytes")
>>> print(f"File size: {node.size / 1024:.1f} KB")
- property mtime: float
Last modification time as Unix timestamp.
- Returns:
Unix timestamp of last modification time
- Return type:
float
Examples
>>> from datetime import datetime
>>> mod_time = datetime.fromtimestamp(node.mtime)
>>> print(f"Modified: {mod_time}")
- property basename: str
Filename with extension.
- Returns:
The filename including extension
- Return type:
str
Examples
>>> node = storage.node('home:documents/report.pdf')
>>> print(node.basename)
report.pdf
- property stem: str
Filename without extension.
- Returns:
The filename without extension
- Return type:
str
Examples
>>> node = storage.node('home:documents/report.pdf')
>>> print(node.stem)
report
- property suffix: str
File extension including dot.
- Returns:
The file extension including the leading dot (e.g., “.pdf”)
- Return type:
str
Examples
>>> node = storage.node('home:documents/report.pdf')
>>> print(node.suffix)
.pdf
- property parent: StorageNode
Parent directory as StorageNode.
- Returns:
A new StorageNode pointing to the parent directory
- Return type:
StorageNode
Examples
>>> node = storage.node('home:documents/reports/q4.pdf')
>>> parent = node.parent
>>> print(parent.fullpath)
home:documents/reports
- property dirname: str
Parent directory fullpath as string.
Convenience property that returns the fullpath of the parent directory as a string, equivalent to parent.fullpath.
- Returns:
Parent directory fullpath (e.g., ‘home:documents/reports’)
- Return type:
str
Examples
>>> node = storage.node('home:documents/reports/q4.pdf')
>>> print(node.dirname)
home:documents/reports
>>>
>>> # Compare with parent property
>>> print(node.parent.fullpath)
home:documents/reports
>>> # dirname is a shortcut for the above
- property ext: str
File extension without leading dot.
Convenience property that returns the file extension without the leading dot, which is easier to use in comparisons and type checks than suffix.
- Returns:
Extension without dot (e.g., ‘pdf’, ‘txt’), or empty string if no extension
- Return type:
str
Examples
>>> node = storage.node('home:documents/report.pdf')
>>> print(node.ext)
pdf
>>> print(node.suffix)  # Compare with suffix
.pdf
>>>
>>> # More convenient for comparisons
>>> if node.ext == 'pdf':
...     process_pdf(node)
>>>
>>> # Instead of remembering the dot
>>> if node.suffix == '.pdf':
...     process_pdf(node)
- splitext()[source]
Split path into filename and extension.
Similar to os.path.splitext(), returns a tuple of (filename, extension). The extension includes the leading dot. The filename includes the full path without the extension.
Examples
>>> node = storage.node('home:documents/report.pdf')
>>> name, ext = node.splitext()
>>> print(name)
documents/report
>>> print(ext)
.pdf
>>>
>>> # Useful for renaming with different extension
>>> name, _ = node.splitext()
>>> new_path = f'{name}.docx'
>>> new_node = storage.node(f'home:{new_path}')
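The naming properties (basename, stem, suffix, ext) and splitext() follow the same conventions as Python's own path tools. As a point of comparison, and without claiming that genro-storage delegates to these modules, the standard library gives the same values for the example path:

```python
import posixpath
from pathlib import PurePosixPath

path = "documents/report.pdf"
p = PurePosixPath(path)

parts = {
    "basename": p.name,             # filename with extension
    "stem": p.stem,                 # filename without extension
    "suffix": p.suffix,             # extension including the dot
    "ext": p.suffix.lstrip("."),    # extension without the dot
}

# Like os.path.splitext(): the extension keeps its dot and the first
# element keeps the directory part of the path.
name, ext = posixpath.splitext(path)
```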
- property ext_attributes: tuple[float | None, int | None, bool]
Commonly-used file attributes as a tuple.
Convenience property for getting (mtime, size, isdir) together in one call. Returns None values if file doesn’t exist. Size is None for directories.
- Returns:
- (mtime, size, isdir) where:
mtime: Modification time as Unix timestamp or None
size: File size in bytes or None (None for directories)
isdir: True if directory, False otherwise
- Return type:
tuple[float | None, int | None, bool]
Examples
>>> node = storage.node('home:document.pdf')
>>> mtime, size, isdir = node.ext_attributes
>>> if mtime and size:
...     print(f'File: {size} bytes, modified at {mtime}')
>>>
>>> # More concise than
>>> mtime = node.mtime
>>> size = node.size
>>> isdir = node.isdir
- property md5hash: str
MD5 hash of file content.
For cloud storage (S3, GCS, Azure), retrieves hash from metadata (fast). For local storage, computes hash by reading file in blocks (slower).
- Returns:
MD5 hash as lowercase hexadecimal string (32 characters)
- Return type:
str
- Raises:
FileNotFoundError – If file doesn’t exist
ValueError – If node is a directory
Examples
>>> hash1 = node1.md5hash
>>> hash2 = node2.md5hash
>>> if hash1 == hash2:
...     print("Files have identical content")
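Computing the hash "by reading file in blocks" keeps memory use constant regardless of file size. A minimal sketch of that technique with hashlib follows; the function name `md5_in_blocks` and the 64 KiB default block size are assumptions made for illustration.

```python
import hashlib
import io
from typing import BinaryIO


def md5_in_blocks(stream: BinaryIO, block_size: int = 64 * 1024) -> str:
    """Hash a binary stream incrementally; returns 32 lowercase hex chars."""
    digest = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b"".
    for block in iter(lambda: stream.read(block_size), b""):
        digest.update(block)
    return digest.hexdigest()


# A tiny block size forces several reads; the result does not depend on it.
hash1 = md5_in_blocks(io.BytesIO(b"Hello World"), block_size=4)
hash2 = md5_in_blocks(io.BytesIO(b"Hello World"))
```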
- property mimetype: str
Get MIME type from file extension.
Uses Python’s mimetypes module to guess the MIME type based on the file extension. Returns ‘application/octet-stream’ if type cannot be determined.
- Returns:
MIME type string (e.g., ‘image/png’, ‘application/pdf’)
- Return type:
str
Examples
>>> jpg = storage.node('photos:image.jpg')
>>> jpg.mimetype
'image/jpeg'
>>>
>>> pdf = storage.node('documents:report.pdf')
>>> pdf.mimetype
'application/pdf'
>>>
>>> # Use for HTTP responses
>>> response.headers['Content-Type'] = node.mimetype
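The lookup described above, an extension-based guess with a fallback to 'application/octet-stream', can be reproduced directly with the mimetypes module. `guess_mimetype` is a hypothetical helper name used for this sketch.

```python
import mimetypes


def guess_mimetype(filename: str) -> str:
    """Guess a MIME type from the extension, with the documented fallback."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime or "application/octet-stream"


jpeg_type = guess_mimetype("image.jpg")
pdf_type = guess_mimetype("report.pdf")
unknown_type = guess_mimetype("README")  # no extension, so the fallback applies
```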
- property capabilities
Get capabilities of underlying backend.
Returns backend capabilities which describe what features are supported, such as versioning, metadata, presigned URLs, etc.
If this node is a versioned snapshot (created with version parameter), the versioning capabilities are disabled since the node is read-only.
- Returns:
Object describing supported features
- Return type:
BackendCapabilities
Examples
>>> if node.capabilities.versioning:
...     versions = node.versions
>>> if node.capabilities.presigned_urls:
...     url = node.get_presigned_url()
- open(mode='r', version=None, as_of=None)[source]
Open file with optional version control support.
- Parameters:
mode (str) – File mode (‘r’, ‘rb’, ‘w’, ‘wb’, ‘a’, ‘ab’)
version (int | str | None) – Version to open:
- None: Latest version (default)
- str: Specific version_id (e.g., ‘abc123…’)
- int: Version index, with negative indexing supported:
-1: Latest version
-2: Previous version
0: Oldest version
1: Second oldest version
as_of (datetime | None) – Open file as it was at this datetime
- Returns:
File-like object (context manager)
- Return type:
BinaryIO | TextIO
- Raises:
ValueError – If both version and as_of provided, or invalid mode for historical versions
IndexError – If version index out of range
FileNotFoundError – If no version found for as_of date
PermissionError – If backend doesn’t support versioning
Examples
>>> # Latest version
>>> with node.open() as f:
...     data = f.read()
>>> # Previous version (pythonic!)
>>> with node.open(version=-2) as f:
...     previous = f.read()
>>> # Specific version by ID
>>> with node.open(version='abc123xyz') as f:
...     old_content = f.read()
>>> # Version at date
>>> from datetime import datetime
>>> with node.open(as_of=datetime(2024, 1, 15)) as f:
...     historical = f.read()
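The version selection rules for open() can be modelled on an ordinary list ordered from oldest to newest, which makes the negative-index convention identical to Python list indexing. The sketch below is an assumption-based illustration: `Version`, `resolve_version`, and the "latest version at or before as_of" rule are hypothetical, and the exceptions only mirror the ones documented above.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Sequence, Union


class Version(NamedTuple):
    version_id: str
    modified: datetime


def resolve_version(
    versions: Sequence[Version],
    version: Union[int, str, None] = None,
    as_of: Optional[datetime] = None,
) -> Version:
    """Pick one entry from `versions`, ordered oldest to newest."""
    if version is not None and as_of is not None:
        raise ValueError("pass either version or as_of, not both")
    if as_of is not None:
        candidates = [v for v in versions if v.modified <= as_of]
        if not candidates:
            raise FileNotFoundError(f"no version at or before {as_of}")
        return candidates[-1]
    if version is None:
        return versions[-1]
    if isinstance(version, int):
        return versions[version]  # raises IndexError when out of range
    for candidate in versions:
        if candidate.version_id == version:
            return candidate
    raise FileNotFoundError(f"unknown version_id: {version!r}")


history = [
    Version("v1", datetime(2024, 1, 1)),
    Version("v2", datetime(2024, 1, 10)),
    Version("v3", datetime(2024, 2, 1)),
]
latest = resolve_version(history)
previous = resolve_version(history, version=-2)
historical = resolve_version(history, as_of=datetime(2024, 1, 15))
```

With this history, `latest` is v3, `previous` is v2, and `historical` is v2 because it was the newest version on 15 January 2024.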
- read(mode='r', encoding='utf-8')[source]
Read file content in text or binary mode.
- Parameters:
- Returns:
File content as text or bytes depending on mode
- Return type:
str | bytes
- Raises:
FileNotFoundError – If file doesn’t exist
ValueError – If mode is invalid
Examples
>>> # Read as text (default)
>>> content = node.read()
>>> content = node.read(mode='r')
>>>
>>> # Read as binary
>>> data = node.read(mode='rb')
- write(data, mode='w', encoding='utf-8', skip_if_unchanged=False)[source]
Write data to file in text or binary mode.
- Parameters:
data (Annotated[str | bytes, 'Data to write (str for text, bytes for binary)']) – Data to write (str for text mode, bytes for binary mode)
mode (Annotated[str, "Write mode: 'w' for text, 'wb' for binary"]) – Write mode - ‘w’ for text (default), ‘wb’ for binary
encoding (Annotated[str, 'Text encoding (only for text mode)']) – Text encoding (used only for text mode)
skip_if_unchanged (Annotated[bool, 'Skip writing if content is identical']) – If True, skip writing if content identical
- Returns:
True if written, False if skipped
- Return type:
bool
- Raises:
TypeError – If data type doesn’t match mode
ValueError – If mode is invalid
Examples
>>> # Write text (default)
>>> node.write('Hello World')
>>> node.write('Hello', mode='w')
>>>
>>> # Write binary
>>> node.write(b'binary data', mode='wb')
>>>
>>> # Skip if unchanged
>>> written = node.write('content', skip_if_unchanged=True)
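The skip_if_unchanged flag and the boolean return value can be illustrated with a dictionary standing in for a storage backend. This sketch compares full content in memory; how genro-storage detects identical content (for example by hash) is not stated here, so treat the comparison as an assumption. `write_if_changed` is a hypothetical name.

```python
def write_if_changed(store: dict, key: str, data: bytes,
                     skip_if_unchanged: bool = False) -> bool:
    """Write `data` under `key`; return True if written, False if skipped."""
    if skip_if_unchanged and store.get(key) == data:
        return False  # identical content already present, nothing to do
    store[key] = data
    return True


store = {}
first = write_if_changed(store, "notes.txt", b"content", skip_if_unchanged=True)
second = write_if_changed(store, "notes.txt", b"content", skip_if_unchanged=True)
third = write_if_changed(store, "notes.txt", b"content")  # flag off: always writes
```

The first call writes because the key is new, the second is skipped, and the third writes again because skipping is opt-in.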
- read_text(encoding='utf-8')[source]
Read file content as text.
Convenience method equivalent to read(mode='r', encoding=encoding). Compatible with pathlib.Path API.
- Parameters:
encoding (str) – Text encoding (default: ‘utf-8’)
- Returns:
File content as text
- Return type:
str
- Raises:
FileNotFoundError – If file doesn’t exist
Examples
>>> content = node.read_text() >>> content = node.read_text(encoding='latin-1')
- read_bytes()[source]
Read file content as bytes.
Convenience method equivalent to read(mode='rb'). Compatible with pathlib.Path API.
- Returns:
File content as bytes
- Return type:
bytes
- Raises:
FileNotFoundError – If file doesn’t exist
Examples
>>> data = node.read_bytes()
- write_text(text, encoding='utf-8', skip_if_unchanged=False)[source]
Write text content to file.
Convenience method equivalent to write(text, mode='w', encoding=encoding, skip_if_unchanged=skip_if_unchanged). Compatible with pathlib.Path API.
- Parameters:
- Returns:
True if file was written, False if skipped
- Return type:
bool
- Raises:
TypeError – If text is not str
ValueError – If node is a versioned snapshot (read-only)
Examples
>>> node.write_text("Hello World")
>>> node.write_text("Content", encoding='latin-1')
>>> written = node.write_text("New", skip_if_unchanged=True)
- write_bytes(data, skip_if_unchanged=False)[source]
Write binary content to file.
Convenience method equivalent to write(data, mode='wb', skip_if_unchanged=skip_if_unchanged). Compatible with pathlib.Path API.
- Parameters:
- Returns:
True if file was written, False if skipped
- Return type:
bool
- Raises:
TypeError – If data is not bytes
ValueError – If node is a versioned snapshot (read-only)
Examples
>>> node.write_bytes(b"Binary data")
>>> written = node.write_bytes(data, skip_if_unchanged=True)
- copy_to(dest, include=None, exclude=None, filter=None, skip='never', skip_fn=None, progress=None, on_file=None, on_skip=None)[source]
Copy file or directory to destination with filtering and skip logic.
Supports filtering which files to copy (source-based) and skipping existing files (destination-based) for efficient incremental backups.
- Filtering (applied to source files):
‘include’: Glob patterns for files to include (whitelist)
‘exclude’: Glob patterns for files to exclude (blacklist)
‘filter’: Custom function(node, relpath) -> bool
- Skip strategies (applied to destination files):
‘never’: Always copy (overwrite existing files) - default
‘exists’: Skip if destination file exists (fastest)
‘size’: Skip if destination exists and has same size (fast)
‘hash’: Skip if destination exists and has same content/MD5 (accurate)
‘custom’: Use custom skip function
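The skip strategies amount to a small decision function evaluated per file. The sketch below works on in-memory bytes rather than StorageNode objects, with `dest=None` meaning the destination does not exist; `should_skip` is a hypothetical name and the logic is an interpretation of the list above, not the library's code.

```python
import hashlib
from typing import Callable, Optional


def should_skip(
    src: bytes,
    dest: Optional[bytes],
    skip: str = "never",
    skip_fn: Optional[Callable[[bytes, Optional[bytes]], bool]] = None,
) -> bool:
    """Decide whether copying `src` over `dest` can be skipped."""
    if skip == "custom":
        if skip_fn is None:
            raise ValueError("skip='custom' requires skip_fn")
        return skip_fn(src, dest)
    if skip == "never" or dest is None:
        return False  # always copy, or nothing to compare against
    if skip == "exists":
        return True
    if skip == "size":
        return len(src) == len(dest)
    if skip == "hash":
        return hashlib.md5(src).digest() == hashlib.md5(dest).digest()
    raise ValueError(f"unknown skip strategy: {skip!r}")


same_size_different_content = (b"abc", b"xyz")
skipped_by_size = should_skip(*same_size_different_content, skip="size")
skipped_by_hash = should_skip(*same_size_different_content, skip="hash")
```

The example pair shows why 'hash' is described as the accurate option: 'size' would skip the copy even though the content differs, while 'hash' would not.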
- Parameters:
dest (StorageNode | str) – Destination node or path string
include (str | list[str] | None) – Glob pattern(s) for files to include. If specified, only matching files are copied (whitelist mode). Can be string or list of strings.
exclude (str | list[str] | None) – Glob pattern(s) for files to exclude. Applied after include. Can be string or list of strings.
filter (Callable[[StorageNode, str], bool] | None) – Custom filter function(node, relative_path) -> bool. Return True to include file, False to exclude. Applied after include/exclude patterns.
skip (SkipStrategy | Literal['never', 'exists', 'size', 'hash', 'custom']) – Skip strategy (default: ‘never’ = always copy)
skip_fn (Callable[[StorageNode, StorageNode], bool] | None) – Custom skip function(src, dest) -> bool (required if skip='custom')
progress (Callable[[int, int], None] | None) – Callback(current, total) called after each file
on_file (Callable[[StorageNode], None] | None) – Callback(src_node) called after each file copied
on_skip (Callable[[StorageNode, str], None] | None) – Callback(src_node, reason) called when file is skipped
- Returns:
Destination StorageNode
- Raises:
FileNotFoundError – If source doesn’t exist
ValueError – If skip='custom' but no skip_fn provided
- Return type:
StorageNode
Examples
>>> # Simple copy (overwrite) - default behavior
>>> src.copy_to(dest)
>>>
>>> # Copy only Python files
>>> src.copy_to(dest, include='*.py')
>>>
>>> # Copy all except logs and temp files
>>> src.copy_to(dest, exclude=['*.log', '*.tmp', '__pycache__/**'])
>>>
>>> # Combine include and exclude
>>> src.copy_to(dest, include='*.py', exclude='test_*.py')
>>>
>>> # Custom filter: only files smaller than 10MB
>>> src.copy_to(dest, filter=lambda node, path: node.size < 10_000_000)
>>>
>>> # Filter by modification time
>>> from datetime import datetime, timedelta
>>> cutoff = datetime.now() - timedelta(days=7)
>>> src.copy_to(dest, filter=lambda n, p: n.mtime > cutoff.timestamp())
>>>
>>> # Combine filtering and skip strategy
>>> src.copy_to(dest,
...             include=['*.py', '*.json'],
...             exclude='__pycache__/**',
...             skip='hash')  # Skip if content identical
>>>
>>> # Full-featured backup with tracking
>>> src.copy_to(dest,
...             exclude=['*.log', '*.tmp', 'node_modules/**'],
...             filter=lambda n, p: n.size < 100_000_000,
...             skip='hash',
...             progress=lambda c, t: print(f"{c}/{t}"))
- Performance Notes:
Filtering is applied before copying (saves bandwidth)
skip='exists': ~1-2ms per file (only existence check)
skip='size': ~2-5ms per file (existence + size read)
skip='hash':
- S3/GCS: ~5-10ms per file (ETag from metadata, fast)
- Local: ~100ms per MB (must read file to compute MD5)
For cloud storage, ‘hash’ is efficient due to ETag metadata. For local storage, ‘size’ is usually sufficient.
Note
Include/exclude patterns match against relative paths from source
If copying to base64 backend, destination path will be updated
Filtering is source-based (which files to copy)
Skip logic is destination-based (whether to overwrite)
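The order of the source-side filters (include, then exclude, then the custom function) can be sketched with fnmatch on relative paths. This is an illustration only: `select_paths` is a hypothetical helper, the custom filter here receives just the path rather than (node, relpath), and it is an assumption that genro-storage's glob matching behaves like fnmatch, where `*` also matches `/`.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Union

Patterns = Union[str, Iterable[str], None]


def _as_list(patterns: Patterns) -> list:
    """Accept a single pattern, several patterns, or None."""
    if patterns is None:
        return []
    return [patterns] if isinstance(patterns, str) else list(patterns)


def select_paths(paths, include=None, exclude=None, filter=None):
    """Apply include, then exclude, then the custom filter, in that order."""
    include_patterns = _as_list(include)
    exclude_patterns = _as_list(exclude)
    selected = []
    for path in paths:
        if include_patterns and not any(fnmatchcase(path, p) for p in include_patterns):
            continue  # whitelist mode: must match at least one pattern
        if any(fnmatchcase(path, p) for p in exclude_patterns):
            continue
        if filter is not None and not filter(path):
            continue
        selected.append(path)
    return selected


files = ["app.py", "test_app.py", "debug.log", "config.json", "__pycache__/app.pyc"]
only_sources = select_paths(files, include="*.py", exclude="test_*.py")
without_noise = select_paths(files, exclude=["*.log", "__pycache__/**"])
```

`only_sources` contains just 'app.py', and `without_noise` keeps the two Python files and the JSON file.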
- append(node)[source]
Append a node to this virtual node (iternode only).
This method is only available for virtual nodes created with storage.iternode(). It adds a node reference to the accumulation list. Content is read lazily when materialized.
- Parameters:
node (StorageNode) – StorageNode to append
- Raises:
ValueError – If not a virtual iternode
Examples
>>> iternode = storage.iternode()
>>> n1 = storage.node('mem:part1.txt')
>>> iternode.append(n1)
>>> content = iternode.read_text()  # Materializes here
- extend(*nodes)[source]
Extend this virtual node with multiple nodes (iternode only).
This method is only available for virtual nodes created with storage.iternode(). It adds multiple node references to the accumulation list. Content is read lazily when materialized.
- Parameters:
*nodes (StorageNode) – StorageNodes to append
- Raises:
ValueError – If not a virtual iternode
Examples
>>> iternode = storage.iternode(n1)
>>> iternode.extend(n2, n3, n4)
>>> content = iternode.read_text()  # Materializes all
- zip()[source]
Create ZIP archive from node content.
Behavior depends on node type:
- Regular file: Creates ZIP containing that file
- Regular directory: Creates ZIP with all files recursively
- Virtual iternode: Creates ZIP with all accumulated nodes as separate files
- Returns:
ZIP archive as bytes
- Return type:
bytes
- Raises:
ValueError – If node doesn’t exist (for regular nodes)
Examples
>>> # ZIP a directory
>>> docs = storage.node('home:documents')
>>> zip_bytes = docs.zip()
>>>
>>> # ZIP accumulated files
>>> iternode = storage.iternode(n1, n2, n3)
>>> zip_bytes = iternode.zip()
>>>
>>> # Save ZIP
>>> archive = storage.node('home:backup.zip')
>>> archive.write_bytes(zip_bytes)
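Returning a ZIP archive as bytes, rather than writing it to disk, can be done with zipfile and an in-memory buffer. The sketch below shows that technique on a plain mapping of names to content; `zip_named_blobs` is a hypothetical helper and says nothing about how zip() names entries internally.

```python
import io
import zipfile


def zip_named_blobs(blobs: dict) -> bytes:
    """Build a ZIP archive in memory from {archive_name: bytes}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in blobs.items():
            archive.writestr(name, data)
    # The archive is complete once the ZipFile context has closed.
    return buffer.getvalue()


zip_bytes = zip_named_blobs({"part1.txt": b"first", "docs/part2.txt": b"second"})

# Reading it back confirms each source became a separate archive member.
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
    names = archive.namelist()
    second = archive.read("docs/part2.txt")
```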
- child(*parts)[source]
Get a child node by path components.
- Parameters:
*parts (Annotated[str, 'Path components to append']) – Path components to append. Can be:
- Single string with path separators: ‘aaa/bbb/ccc’
- Multiple strings: ‘aaa’, ‘bbb’, ‘ccc’
- Returns:
Child node with combined path
- Return type:
StorageNode
Examples
>>> docs = storage.node('home:documents')
>>>
>>> # Single path string
>>> report = docs.child('2024/reports/q4.pdf')
>>>
>>> # Multiple components
>>> report = docs.child('2024', 'reports', 'q4.pdf')
>>>
>>> # Both produce: 'home:documents/2024/reports/q4.pdf'
- local_path(mode='r')[source]
Get local filesystem path for this file.
Returns a context manager that provides a local filesystem path. For local storage, returns the actual path. For remote storage (S3, GCS, etc.), downloads to a temporary file, yields the temp path, and uploads changes on exit.
This is essential for integrating with external tools that only work with local filesystem paths (ffmpeg, ImageMagick, etc.).
- Parameters:
mode (str) – Access mode:
- ‘r’: Read-only (download, no upload)
- ‘w’: Write-only (no download, upload on exit)
- ‘rw’: Read-write (download and upload)
- Returns:
Context manager yielding str (local filesystem path)
Examples
>>> import subprocess
>>>
>>> # Process video with ffmpeg
>>> video = storage.node('s3:videos/input.mp4')
>>> with video.local_path(mode='r') as path:
...     subprocess.run(['ffmpeg', '-i', path, 'output.mp4'])
>>>
>>> # Modify image in place
>>> image = storage.node('s3:photos/pic.jpg')
>>> with image.local_path(mode='rw') as path:
...     subprocess.run(['convert', path, '-resize', '800x600', path])
>>> # Changes automatically uploaded to S3
Notes
For local storage, returns the actual path (no copy)
For remote storage, uses temporary files
Temporary files are automatically cleaned up on exit
Large files are streamed in chunks to avoid memory issues
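The download, yield, upload, clean-up cycle for remote storage maps naturally onto a generator-based context manager. The sketch below uses a dictionary as a stand-in for a remote bucket and reads whole files at once instead of streaming; the function signature and the choice not to upload when the body raises are assumptions for illustration, not the library's behaviour.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def local_path(remote: dict, key: str, mode: str = "r"):
    """Yield a temporary local path for remote[key]."""
    if mode not in ("r", "w", "rw"):
        raise ValueError(f"invalid mode: {mode!r}")
    # Keep the extension so external tools can detect the file type.
    fd, path = tempfile.mkstemp(suffix=os.path.splitext(key)[1])
    try:
        with os.fdopen(fd, "wb") as handle:
            if "r" in mode:  # 'r' and 'rw' download first
                handle.write(remote[key])
        yield path
        if "w" in mode:  # 'w' and 'rw' upload on a clean exit
            with open(path, "rb") as handle:
                remote[key] = handle.read()
    finally:
        if os.path.exists(path):
            os.remove(path)  # the temporary file never outlives the block


bucket = {"notes/todo.txt": b"hello"}
with local_path(bucket, "notes/todo.txt", mode="rw") as path:
    temp_path = path
    with open(path, "ab") as handle:
        handle.write(b" world")
```

After the block, the bucket entry holds the modified content and the temporary file has been removed.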
- call(*args, callback=None, async_mode=False, return_output=False, **subprocess_kwargs)[source]
Execute external command with automatic local_path management.
Automatically manages local filesystem paths for StorageNode arguments, downloading from cloud storage as needed and uploading changes after execution. Perfect for integrating with external tools like ffmpeg, imagemagick, pandoc, etc.
- Parameters:
*args – Command arguments (str or StorageNode). StorageNode arguments are automatically converted to local paths.
callback (Callable[[], None] | None) – Function to call on completion (async mode only)
async_mode (bool) – Run in background thread (default: False)
return_output (bool) – Return subprocess output as string (default: False)
**subprocess_kwargs – Additional arguments passed to subprocess.run() (e.g., cwd, env, timeout, shell, etc.)
- Returns:
- Command output if return_output=True, None otherwise
In async mode, returns immediately (None)
- Return type:
str | None
- Raises:
subprocess.CalledProcessError – If command exits with non-zero status
FileNotFoundError – If command executable not found
Examples
>>> # Video conversion (cloud storage)
>>> input_video = storage.node('s3:videos/input.mp4')
>>> output_video = storage.node('s3:videos/output.mp4')
>>> input_video.call('ffmpeg', '-i', input_video, '-vcodec', 'h264', output_video)
>>> # Automatically downloads input, uploads output
>>> # Image resize (local storage)
>>> image = storage.node('home:photos/photo.jpg')
>>> image.call('convert', image, '-resize', '800x600', image)
>>> # With callback (async)
>>> def on_complete():
...     print("Processing complete!")
>>> video.call('ffmpeg', '-i', video, 'output.mp4',
...            callback=on_complete, async_mode=True)
>>> # Returns immediately, callback called when done
>>> # Capture output
>>> pdf = storage.node('documents:report.pdf')
>>> info = pdf.call('pdfinfo', pdf, return_output=True)
>>> print(info)
>>> # With subprocess options
>>> script = storage.node('scripts:process.py')
>>> script.call('python', script, 'arg1', 'arg2',
...             cwd='/tmp', timeout=60, env={'DEBUG': '1'})
Notes
StorageNode arguments use local_path(mode='rw') automatically
Files are downloaded before command execution
Modified files are uploaded after command execution
In async mode, cleanup happens in background thread
Use return_output=False for commands with large output
For shell commands, use shell=True in subprocess_kwargs
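The interplay of return_output, async_mode and callback described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: run_command is a hypothetical stand-in, it handles plain string arguments only (no StorageNode download or upload), and it is not the library's implementation.

```python
import subprocess
import sys
import threading


def run_command(*args, callback=None, async_mode=False,
                return_output=False, **subprocess_kwargs):
    """Hypothetical stand-in for the documented call() semantics."""

    def _run():
        # check=True raises subprocess.CalledProcessError on a non-zero exit
        result = subprocess.run(
            [str(arg) for arg in args],
            check=True,
            capture_output=return_output,
            text=return_output,
            **subprocess_kwargs,
        )
        return result.stdout if return_output else None

    def _run_then_notify():
        _run()
        if callback is not None:
            callback()

    if async_mode:
        # Return immediately; the callback fires from the worker thread
        threading.Thread(target=_run_then_notify, daemon=True).start()
        return None
    return _run()


# Synchronous call that captures output
output = run_command(sys.executable, "-c", "print('hello')", return_output=True)

# Asynchronous call that signals completion through the callback
done = threading.Event()
immediate = run_command(sys.executable, "-c", "pass",
                        callback=done.set, async_mode=True)
finished = done.wait(timeout=30)
```

In this sketch the callback is only invoked in async mode, mirroring the parameter description above.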
- serve(environ, start_response, download=False, download_name=None, cache_max_age=None)[source]
Serve file via WSGI interface with caching support.
Serves the file through a WSGI application with:
- ETag support for caching (304 Not Modified responses)
- Content-Disposition headers for downloads
- Cache-Control headers
- Efficient streaming for large files
Perfect for integrating storage with web frameworks like Flask, Django, Pyramid, or any WSGI application.
- Parameters:
environ (dict) – WSGI environment dict (contains HTTP headers, request info)
start_response (callable) – WSGI start_response callable
download (bool) – If True, force download with Content-Disposition: attachment
download_name (str | None) – Custom filename for downloads (default: basename of file)
cache_max_age (int | None) – Cache-Control max-age in seconds (default: no caching)
- Returns:
Response body as list of byte chunks (WSGI response)
- Return type:
- Raises:
FileNotFoundError – If file doesn’t exist
StorageError – If file cannot be read
Examples
>>> # Flask integration
>>> from flask import Flask
>>> app = Flask(__name__)
>>>
>>> @app.route('/files/<path:filepath>')
... def serve_file(filepath):
...     node = storage.node(f'uploads:{filepath}')
...     # Return a WSGI callable so Flask supplies its own start_response
...     # and the status and headers set by serve() are not discarded
...     return lambda environ, start_response: node.serve(
...         environ, start_response, cache_max_age=3600)
>>>
>>> # Download endpoint
>>> @app.route('/download/<path:filepath>')
... def download_file(filepath):
...     node = storage.node(f'uploads:{filepath}')
...     return lambda environ, start_response: node.serve(
...         environ, start_response,
...         download=True, download_name='report.pdf')
>>>
>>> # Plain WSGI application
>>> def application(environ, start_response):
...     path = environ['PATH_INFO']
...     node = storage.node(f'static:{path}')
...     if not node.exists:
...         start_response('404 Not Found', [('Content-Type', 'text/plain')])
...         return [b'Not Found']
...     return node.serve(environ, start_response, cache_max_age=86400)
Notes
ETag is computed as “{mtime}-{size}” for efficient caching
Returns 304 Not Modified when client ETag matches
Uses local_path() for efficient cloud storage serving
Streams large files in chunks (doesn’t load entire file in memory)
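The caching behaviour in these notes can be illustrated for a plain local file. The sketch below is an assumption-laden illustration, not the library's code: serve_local_file, make_etag and iter_chunks are hypothetical names, the ETag is quoted and built from the integer mtime, and the body is returned as a chunk iterator.

```python
import os
import tempfile


def make_etag(mtime, size):
    """Build an ETag from mtime and size (exact formatting is an assumption)."""
    return f'"{int(mtime)}-{size}"'


def iter_chunks(path, chunk_size):
    """Yield the file in fixed-size chunks instead of reading it whole."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def serve_local_file(path, environ, start_response,
                     cache_max_age=None, chunk_size=64 * 1024):
    """Hypothetical WSGI helper: ETag check, Cache-Control, chunked body."""
    stat = os.stat(path)
    etag = make_etag(stat.st_mtime, stat.st_size)
    headers = [("ETag", etag)]
    if cache_max_age is not None:
        headers.append(("Cache-Control", f"max-age={cache_max_age}"))
    # The client already holds this exact version: answer without a body
    if environ.get("HTTP_IF_NONE_MATCH") == etag:
        start_response("304 Not Modified", headers)
        return []
    headers.append(("Content-Length", str(stat.st_size)))
    start_response("200 OK", headers)
    return iter_chunks(path, chunk_size)


# Demo: the first request gets the body, the second revalidates with the ETag
calls = []


def record(status, headers):
    calls.append((status, dict(headers)))


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "hello.txt")
    with open(demo_path, "wb") as handle:
        handle.write(b"hello world")
    first_body = b"".join(
        serve_local_file(demo_path, {}, record, cache_max_age=3600))
    etag = calls[0][1]["ETag"]
    second_body = b"".join(
        serve_local_file(demo_path, {"HTTP_IF_NONE_MATCH": etag}, record))
```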
- get_metadata()[source]
Get custom metadata for this file.
Returns user-defined metadata attached to the file. Supported for cloud storage (S3, GCS, Azure). For local storage, returns empty dict.
- Returns:
Metadata key-value pairs
- Return type:
- Raises:
FileNotFoundError – If file doesn’t exist
Examples
>>> file = storage.node('s3:documents/report.pdf')
>>> metadata = file.get_metadata()
>>> print(metadata.get('Author'))
John Doe
- set_metadata(metadata)[source]
Set custom metadata for this file.
Attaches user-defined metadata to the file. Supported for cloud storage (S3, GCS, Azure). For local storage, raises PermissionError.
- Parameters:
metadata (Annotated[dict[str, str], 'Metadata key-value pairs to set']) – Metadata key-value pairs to set
- Raises:
FileNotFoundError – If file doesn’t exist
PermissionError – If backend doesn’t support metadata
ValueError – If metadata keys/values are invalid
Examples
>>> file = storage.node('s3:documents/report.pdf')
>>> file.set_metadata({
...     'Author': 'John Doe',
...     'Version': '1.0',
...     'Department': 'Engineering'
... })
Notes
Keys and values must be strings
This typically replaces all existing metadata
Cloud providers may have size/format restrictions
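Because set_metadata typically replaces all existing metadata, changing one key usually means reading, merging and writing back. The following is a minimal sketch of that pattern; InMemoryNode and update_metadata are hypothetical names used only for illustration, and the stand-in node models just the two metadata methods.

```python
class InMemoryNode:
    """Stand-in for a node: only get_metadata/set_metadata are modelled."""

    def __init__(self, metadata=None):
        self._metadata = dict(metadata or {})

    def get_metadata(self):
        return dict(self._metadata)

    def set_metadata(self, metadata):
        # Replace semantics: anything not passed in is dropped
        self._metadata = dict(metadata)


def update_metadata(node, changes):
    """Merge changes into the existing metadata, then write the full set back."""
    merged = node.get_metadata()
    merged.update(changes)
    for key, value in merged.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise ValueError("metadata keys and values must be strings")
    node.set_metadata(merged)
    return merged


node = InMemoryNode({"Author": "John Doe", "Version": "1.0"})
merged = update_metadata(node, {"Version": "1.1"})
```

The same helper should work with any object that exposes get_metadata() and set_metadata() with the semantics described above.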
- url(expires_in=3600, **kwargs)[source]
Generate public URL for accessing this file.
Returns a URL that can be used to access the file directly. For cloud storage (S3, GCS), generates a presigned/signed URL. For HTTP storage, returns the direct URL. For local storage, returns None.
- Parameters:
expires_in (int) – URL expiration time in seconds (default: 3600 = 1 hour)
**kwargs – Backend-specific options
- Returns:
Public URL or None if not supported
- Return type:
str | None
Examples
>>> # S3 presigned URL
>>> file = storage.node('s3:documents/report.pdf')
>>> url = file.url()
>>> print(url)
https://bucket.s3.amazonaws.com/documents/report.pdf?X-Amz-...
>>>
>>> # Custom expiration (24 hours)
>>> url = file.url(expires_in=86400)
Notes
Cloud storage URLs are temporary and expire
Use this for sharing files externally
HTTP URLs are direct (no expiration)
- internal_url(nocache=False)[source]
Generate internal/relative URL for this file.
Returns a URL suitable for internal application use. Optionally includes cache busting parameters.
- Parameters:
nocache (bool) – If True, append mtime for cache busting
- Returns:
Internal URL or None if not supported
- Return type:
str | None
Examples
>>> file = storage.node('home:static/app.js')
>>> url = file.internal_url(nocache=True)
>>> print(url)
/storage/home/static/app.js?mtime=1234567890
Notes
Useful for web applications
Cache busting helps with CDN/browser caching
- property versions: list[dict]
Get list of available versions for this file.
Returns version history for versioned storage (S3 with versioning enabled). For non-versioned storage, returns empty list.
Examples
>>> file = storage.node('s3:documents/report.pdf')
>>> for v in file.versions:
...     print(f"Version {v['version_id']}: {v['last_modified']}")
Notes
Only S3 with versioning enabled returns versions
Empty list if versioning not supported
- property version_count: int
Get total number of versions available.
- Returns:
Number of versions, or 0 if versioning not supported
- Return type:
int
Examples
>>> print(f"File has {node.version_count} versions")
- compact_versions(dry_run=False)[source]
Compact version history by removing consecutive duplicates.
Scans version history and removes versions that have identical content to the immediately preceding version. This cleans up unnecessary versions created by repeated writes of the same content, reducing storage costs.
The rule: For each pair of consecutive versions with the same ETag, delete the second (more recent) one, keeping the first (older) one.
Non-consecutive duplicates are preserved to maintain history (e.g., reverting to an earlier state).
- Parameters:
dry_run (bool) – If True, only report what would be deleted without actually deleting
- Returns:
Number of versions removed (or would be removed if dry_run=True)
- Return type:
- Raises:
PermissionError – If versioning not supported
Examples
>>> # Check what would be removed
>>> count = node.compact_versions(dry_run=True)
>>> print(f"Would remove {count} duplicate versions")

>>> # Actually compact the history
>>> removed = node.compact_versions()
>>> print(f"Removed {removed} redundant versions")
Notes
Only works with backends that support versioning
Requires backend to support version deletion (S3)
Preserves the oldest of each duplicate pair
History of changes is maintained (non-consecutive duplicates kept)
Useful for reducing storage costs on versioned buckets
- Example scenario:
v1: content A (etag: xxx)
v2: content A (etag: xxx) ← REMOVED (consecutive duplicate)
v3: content B (etag: yyy)
v4: content B (etag: yyy) ← REMOVED (consecutive duplicate)
v5: content A (etag: xxx) ← KEPT (not consecutive to v1, shows revert)
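The selection rule from the scenario above can be expressed as a small pure function. This sketch assumes the version list is ordered oldest to newest and that each entry carries an 'etag' key next to 'version_id'. Both are assumptions, and find_consecutive_duplicates is a hypothetical name.

```python
def find_consecutive_duplicates(versions):
    """Return the version_ids that repeat the ETag of the version just before them.

    `versions` is assumed to be ordered oldest to newest.
    """
    to_delete = []
    previous_etag = None
    for version in versions:
        if previous_etag is not None and version["etag"] == previous_etag:
            # Same content as the immediately preceding version: redundant
            to_delete.append(version["version_id"])
        previous_etag = version["etag"]
    return to_delete


history = [
    {"version_id": "v1", "etag": "xxx"},
    {"version_id": "v2", "etag": "xxx"},
    {"version_id": "v3", "etag": "yyy"},
    {"version_id": "v4", "etag": "yyy"},
    {"version_id": "v5", "etag": "xxx"},
]
redundant = find_consecutive_duplicates(history)
```

A dry run corresponds to reporting len(redundant) without deleting anything.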
- fill_from_url(url, timeout=30)[source]
Download content from URL and write to this file.
Fetches content from the specified URL and writes it to this storage node. Useful for downloading files from the internet into storage.
- Parameters:
- Raises:
ValueError – If URL is invalid
IOError – If download fails
PermissionError – If storage is read-only
Examples
>>> # Download image from internet
>>> img = storage.node('s3:downloads/logo.png')
>>> img.fill_from_url('https://example.com/logo.png')
>>>
>>> # Download with custom timeout
>>> file = storage.node('local:data.json')
>>> file.fill_from_url('https://api.example.com/data', timeout=60)
Notes
Uses urllib for HTTP requests (no external dependencies)
Overwrites existing file if present
Parent directory must exist or backend must support auto-creation
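A urllib-only download along the lines of these notes might look like the sketch below. It writes to a local path rather than a storage node, the helper name download_to_path is hypothetical, and the demo uses a data: URL so that it runs without network access.

```python
import os
import tempfile
import urllib.request
from urllib.parse import urlparse


def download_to_path(dest_path, url, timeout=30):
    """Fetch `url` with urllib and overwrite `dest_path` with the response body."""
    if not urlparse(url).scheme:
        raise ValueError(f"Invalid URL: {url!r}")
    # urllib.error.URLError is a subclass of OSError (IOError)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    with open(dest_path, "wb") as handle:
        handle.write(data)
    return len(data)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "greeting.txt")
    written = download_to_path(target, "data:text/plain;base64,SGVsbG8=")
    with open(target, "rb") as handle:
        downloaded = handle.read()
```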
- to_base64(mime=None, include_uri=True)[source]
Encode file content as base64 string.
Converts the file content to a base64-encoded string, optionally formatted as a data URI for direct embedding in HTML/CSS.
- Parameters:
- Returns:
Base64-encoded string or data URI
- Return type:
- Raises:
FileNotFoundError – If file doesn’t exist
ValueError – If node is a directory
Examples
>>> # Data URI with auto-detected MIME type
>>> img = storage.node('images:logo.png')
>>> data_uri = img.to_base64()
>>> print(data_uri)
data:image/png;base64,iVBORw0KGgo...
>>>
>>> # Raw base64 without URI wrapper
>>> b64 = img.to_base64(include_uri=False)
>>> print(b64)
iVBORw0KGgo...
>>>
>>> # Custom MIME type
>>> data_uri = img.to_base64(mime='image/x-icon')
Notes
Useful for embedding small images/files in HTML
MIME type auto-detection based on file extension
Large files will result in very long strings
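The data URI format and extension-based MIME detection can be reproduced with the standard library. This is a sketch rather than the library's code: encode_data_uri is a hypothetical name, and the fallback to application/octet-stream for unknown extensions is an assumption.

```python
import base64
import mimetypes


def encode_data_uri(data, filename, mime=None, include_uri=True):
    """Base64-encode bytes, optionally wrapped as a data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    if not include_uri:
        return encoded
    if mime is None:
        # Guess from the file extension; the fallback value is an assumption
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    return f"data:{mime};base64,{encoded}"


text_uri = encode_data_uri(b"Hello", "greeting.txt")
raw = encode_data_uri(b"Hello", "greeting.txt", include_uri=False)
icon_uri = encode_data_uri(b"Hello", "logo.png", mime="image/x-icon")
```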
- __eq__(other)[source]
Compare nodes by content (MD5 hash).
Two nodes are considered equal if they have the same file content, regardless of their path or location. Comparison is done via MD5 hash.
- Parameters:
other (object) – Another StorageNode or object to compare
- Returns:
True if both nodes have identical content
- Return type:
bool
Examples
>>> file1 = storage.node('home:original.txt')
>>> file2 = storage.node('backup:copy.txt')
>>> if file1 == file2:
...     print("Files have identical content")
Notes
Only files can be compared (directories return False)
Non-existent files return False
Comparing with non-StorageNode returns NotImplemented
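The comparison rules in these notes can be modelled with a small class that hashes in-memory bytes. ContentNode is a hypothetical stand-in (None represents a missing file); the real node presumably obtains the hash from its backend.

```python
import hashlib


class ContentNode:
    """Stand-in node holding bytes in memory; None models a missing file."""

    def __init__(self, data):
        self._data = data

    def md5(self):
        return hashlib.md5(self._data).hexdigest()

    def __eq__(self, other):
        if not isinstance(other, ContentNode):
            # Let Python try the reflected comparison, then fall back to identity
            return NotImplemented
        if self._data is None or other._data is None:
            return False
        return self.md5() == other.md5()


original = ContentNode(b"same bytes")
backup = ContentNode(b"same bytes")
changed = ContentNode(b"other bytes")
missing = ContentNode(None)
```

Note that defining __eq__ without __hash__ makes instances of this sketch unhashable, which is standard Python behaviour.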
Exceptions
Exception classes for genro-storage.
All exceptions inherit from StorageError base class for easy catching. Exceptions also inherit from standard Python exceptions where appropriate to maintain compatibility with existing code.
- exception genro_storage.exceptions.StorageError[source]
Bases: Exception
Base exception for all storage-related errors.
This is the base class that all genro-storage exceptions inherit from. You can catch this to handle any storage-related error.
Examples
>>> try:
...     node.read_bytes()
... except StorageError as e:
...     print(f"Storage error occurred: {e}")
- exception genro_storage.exceptions.StorageNotFoundError[source]
Bases: StorageError, FileNotFoundError
Raised when a file, directory, or mount point is not found.
This exception inherits from both StorageError and FileNotFoundError, so it can be caught by either exception type.
- Common causes:
Attempting to access a mount point that hasn’t been configured
Reading a file that doesn’t exist
Accessing a path in a non-existent directory
Examples
>>> try:
...     node = storage.node('missing_mount:file.txt')
... except StorageNotFoundError:
...     print("Mount or file not found")
- exception genro_storage.exceptions.StoragePermissionError[source]
Bases: StorageError, PermissionError
Raised when a permission-related error occurs.
This exception inherits from both StorageError and PermissionError, so it can be caught by either exception type.
- Common causes:
Insufficient permissions to read/write a file
Insufficient AWS/GCS/Azure credentials or permissions
Attempting to write to a read-only storage backend (e.g., HTTP)
Examples
>>> try:
...     node.write_bytes(b'data')
... except StoragePermissionError:
...     print("Permission denied")
- exception genro_storage.exceptions.StorageConfigError[source]
Bases: StorageError, ValueError
Raised when configuration is invalid.
This exception inherits from both StorageError and ValueError, so it can be caught by either exception type.
- Common causes:
Invalid configuration format (missing required fields)
Unsupported storage backend type
Invalid path format
Malformed YAML/JSON configuration file
Examples
>>> try:
...     storage.configure([{'name': 'test'}])  # missing 'type'
... except StorageConfigError as e:
...     print(f"Configuration error: {e}")
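The practical effect of the dual inheritance described in this section is that existing handlers for built-in exceptions keep working. The sketch below redefines the hierarchy locally so that it runs without genro-storage installed; the class names mirror the documented ones, but these are stand-ins and caught_by is a hypothetical helper.

```python
class StorageError(Exception):
    """Local stand-in for the documented base class."""


class StorageNotFoundError(StorageError, FileNotFoundError):
    """Stand-in: catchable as StorageError or FileNotFoundError."""


class StoragePermissionError(StorageError, PermissionError):
    """Stand-in: catchable as StorageError or PermissionError."""


class StorageConfigError(StorageError, ValueError):
    """Stand-in: catchable as StorageError or ValueError."""


def caught_by(exc, handler_type):
    """Raise exc and report whether an `except handler_type` clause catches it."""
    try:
        raise exc
    except handler_type:
        return True
    except Exception:
        return False


legacy_handler = caught_by(StorageNotFoundError("missing"), FileNotFoundError)
generic_handler = caught_by(StorageNotFoundError("missing"), StorageError)
unrelated_handler = caught_by(StorageConfigError("bad config"), FileNotFoundError)
```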
Backend Classes
Base Backend
- class genro_storage.backends.StorageBackend[source]
Bases: ABC
Abstract base class for storage backends.
All storage backend implementations (Local, S3, GCS, Azure, HTTP, etc.) must inherit from this class and implement all abstract methods.
This ensures a consistent interface across all storage types and makes it easy to add new backends in the future.
Note
Backend implementations should not be instantiated directly by users. They are created internally by StorageManager based on configuration.
- Capability System:
Capabilities are automatically derived from methods decorated with @capability. The decorator populates the _capabilities set during class definition, and __init_subclass__ ensures proper inheritance.
- classmethod __init_subclass__(**kwargs)[source]
Automatically collect and inherit capabilities when subclass is created.
This method is called when a subclass of StorageBackend is defined. It collects PROTOCOL_CAPABILITIES from parent classes and merges them.
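One way such a decorator and __init_subclass__ hook can cooperate is sketched below. It illustrates the general technique only: the marker attribute _capability_name, the frozenset storage and the class names are assumptions, and the actual genro-storage internals (including PROTOCOL_CAPABILITIES) may be organised differently.

```python
def capability(name):
    """Mark a method as providing the named capability (illustrative only)."""
    def decorator(func):
        func._capability_name = name
        return func
    return decorator


class SketchBackend:
    _capabilities = frozenset()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Start from everything the parent classes already declared
        inherited = set()
        for base in cls.__mro__[1:]:
            inherited |= getattr(base, "_capabilities", frozenset())
        # Add capabilities declared directly in this class body
        own = {
            attr._capability_name
            for attr in vars(cls).values()
            if hasattr(attr, "_capability_name")
        }
        cls._capabilities = frozenset(inherited | own)

    @classmethod
    def get_capabilities(cls):
        return set(cls._capabilities)


class ReadOnlyBackend(SketchBackend):
    @capability("read")
    def read_bytes(self, path):
        return b""


class ReadWriteBackend(ReadOnlyBackend):
    @capability("write")
    def write_bytes(self, path, data):
        return None
```

Here ReadWriteBackend reports both capabilities even though it only declares 'write' itself.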
- classmethod get_capabilities(protocol=None)[source]
Get capability set for a given protocol.
For single-protocol backends (LocalStorage, Base64Backend), protocol parameter is optional and defaults to the backend’s only protocol. For multi-protocol backends (FsspecBackend), protocol must be specified.
- Parameters:
protocol (str | None) – Protocol name (e.g., 's3', 'gcs', 'local', 'base64'). If None, returns capabilities for the only available protocol.
- Returns:
Set of capability names
- Return type:
Examples
>>> # Single-protocol backend
>>> LocalStorage.get_capabilities()  # protocol auto-detected
{'read', 'write', 'delete', 'mkdir', ...}

>>> # Multi-protocol backend
>>> FsspecBackend.get_capabilities('s3')
{'read', 'write', 'metadata', 'presigned_urls', ...}
- property capabilities: BackendCapabilities
Return the capabilities of this backend instance.
For single-protocol backends, automatically uses the only protocol. For multi-protocol backends (FsspecBackend), uses self.protocol.
- Returns:
Object describing supported features
- Return type:
BackendCapabilities
Examples
>>> backend = LocalStorage('/tmp')
>>> caps = backend.capabilities
>>> if caps.versioning:
...     versions = backend.get_versions('file.txt')
- classmethod get_json_info(protocol=None)[source]
Return complete backend information in JSON format.
This classmethod can be overridden by backend subclasses to provide complete information including configuration schema, capabilities, and description. This is useful for UI generation and documentation.
The default implementation returns capabilities derived from @capability decorators, but no schema information.
- Parameters:
protocol (str | None) – Protocol name for multi-protocol backends (optional for single-protocol)
- Returns:
Backend information with schema, capabilities, and description
- Return type:
Examples
>>> # Single-protocol backend
>>> info = LocalStorage.get_json_info()
>>> print(info['schema']['fields'])
[{'name': 'path', 'type': 'text', 'required': True}]

>>> # Multi-protocol backend
>>> info = FsspecBackend.get_json_info('s3')
>>> print(info['capabilities']['metadata'])
True
- abstractmethod exists(path)[source]
Check if a file or directory exists.
- Parameters:
path (str) – Relative path within this storage backend
- Returns:
True if file or directory exists
- Return type:
bool
Examples
>>> exists = backend.exists('documents/report.pdf')
- abstractmethod is_file(path)[source]
Check if path points to a file.
- Parameters:
path (str) – Relative path within this storage backend
- Returns:
True if path is a file, False otherwise
- Return type:
bool
Examples
>>> if backend.is_file('documents/report.pdf'):
...     print("It's a file")
- abstractmethod is_dir(path)[source]
Check if path points to a directory.
- Parameters:
path (str) – Relative path within this storage backend
- Returns:
True if path is a directory, False otherwise
- Return type:
bool
Examples
>>> if backend.is_dir('documents'):
...     print("It's a directory")
- abstractmethod size(path)[source]
Get file size in bytes.
- Parameters:
path (str) – Relative path to file
- Returns:
File size in bytes
- Return type:
int
- Raises:
FileNotFoundError – If file doesn’t exist
ValueError – If path is a directory
Examples
>>> size = backend.size('documents/report.pdf')
>>> print(f"File is {size} bytes")
- abstractmethod mtime(path)[source]
Get last modification time.
- Parameters:
path (str) – Relative path to file or directory
- Returns:
Unix timestamp of last modification
- Return type:
- Raises:
FileNotFoundError – If path doesn’t exist
Examples
>>> from datetime import datetime
>>> timestamp = backend.mtime('documents/report.pdf')
>>> mod_time = datetime.fromtimestamp(timestamp)
- abstractmethod open(path, mode='rb')[source]
Open a file and return file-like object.
- Parameters:
- Returns:
File-like object supporting context manager
- Return type:
BinaryIO | TextIO
- Raises:
FileNotFoundError – If file doesn’t exist (in read mode)
PermissionError – If insufficient permissions
Examples
>>> with backend.open('file.txt', 'rb') as f:
...     data = f.read()
- abstractmethod read_bytes(path)[source]
Read entire file as bytes.
- Parameters:
path (str) – Relative path to file
- Returns:
Complete file contents
- Return type:
- Raises:
FileNotFoundError – If file doesn’t exist
Examples
>>> data = backend.read_bytes('image.jpg')
- abstractmethod read_text(path, encoding='utf-8')[source]
Read entire file as text.
- Parameters:
- Returns:
Complete file contents as string
- Return type:
- Raises:
FileNotFoundError – If file doesn’t exist
UnicodeDecodeError – If encoding is incorrect
Examples
>>> content = backend.read_text('document.txt')
- abstractmethod write_bytes(path, data)[source]
Write bytes to file.
- Parameters:
- Raises:
PermissionError – If insufficient permissions
FileNotFoundError – If parent directory doesn’t exist
Examples
>>> backend.write_bytes('file.bin', b'Hello')
- abstractmethod write_text(path, text, encoding='utf-8')[source]
Write text to file.
- Parameters:
- Raises:
PermissionError – If insufficient permissions
FileNotFoundError – If parent directory doesn’t exist
Examples
>>> backend.write_text('file.txt', 'Hello World')
- abstractmethod delete(path, recursive=False)[source]
Delete file or directory.
- Parameters:
- Raises:
FileNotFoundError – If path doesn’t exist (implementation may choose to be idempotent)
ValueError – If path is non-empty directory and recursive=False
Examples
>>> backend.delete('file.txt')
>>> backend.delete('folder', recursive=True)
- abstractmethod list_dir(path)[source]
List directory contents.
- Parameters:
path (str) – Relative path to directory
- Returns:
List of names (not full paths) in the directory
- Return type:
- Raises:
FileNotFoundError – If directory doesn’t exist
ValueError – If path is not a directory
Examples
>>> names = backend.list_dir('documents')
>>> for name in names:
...     print(name)  # Just 'report.pdf', not 'documents/report.pdf'
- abstractmethod mkdir(path, parents=False, exist_ok=False)[source]
Create directory.
- Parameters:
- Raises:
FileExistsError – If exists and exist_ok=False
FileNotFoundError – If parent doesn’t exist and parents=False
Examples
>>> backend.mkdir('new_folder')
>>> backend.mkdir('a/b/c', parents=True)
- abstractmethod copy(src_path, dest_backend, dest_path)[source]
Copy file/directory to another backend.
This method handles cross-backend copying efficiently, streaming data when possible to avoid loading large files in memory.
- Parameters:
src_path (str) – Source path in this backend
dest_backend (StorageBackend) – Destination backend (may be different type)
dest_path (str) – Destination path in dest_backend
- Returns:
New destination path if the destination backend changes it (e.g., base64 backend), or None if the path is unchanged
- Return type:
str | None
- Raises:
FileNotFoundError – If source doesn’t exist
PermissionError – If insufficient permissions
Examples
>>> # Copy within same backend
>>> backend.copy('file.txt', backend, 'backup/file.txt')
>>>
>>> # Copy to different backend
>>> backend.copy('file.txt', other_backend, 'file.txt')
- get_hash(path)[source]
Get MD5 hash from filesystem metadata if available.
This method attempts to retrieve the MD5 hash from the storage backend’s metadata without reading the file content. For cloud storage like S3, this uses the ETag. For local storage, this returns None and the hash must be computed by reading the file.
- Parameters:
path (str) – Relative path to file
- Returns:
MD5 hash as hexadecimal string, or None if not available
- Return type:
str | None
Examples
>>> hash_value = backend.get_hash('file.txt')
>>> if hash_value:
...     print(f"MD5: {hash_value}")
- get_metadata(path)[source]
Get custom metadata for a file.
Returns user-defined metadata attached to the file. For cloud storage (S3, GCS, Azure), this retrieves custom metadata stored with the file. For local storage, this typically returns an empty dict or uses extended attributes if supported.
- Parameters:
path (str) – Relative path to file
- Returns:
Metadata key-value pairs
- Return type:
Examples
>>> metadata = backend.get_metadata('document.pdf')
>>> print(metadata.get('Content-Type'))
application/pdf
Notes
Keys and values are strings
Cloud storage may have restrictions on key names (e.g., lowercase only)
Returns empty dict if no metadata or not supported
- set_metadata(path, metadata)[source]
Set custom metadata for a file.
Attaches user-defined metadata to the file. For cloud storage (S3, GCS, Azure), this sets custom metadata that persists with the file. For local storage, this may use extended attributes if supported, or raise PermissionError if not supported.
- Parameters:
- Raises:
FileNotFoundError – If file doesn’t exist
PermissionError – If backend doesn’t support metadata
Examples
>>> backend.set_metadata('document.pdf', {
...     'Content-Type': 'application/pdf',
...     'Author': 'John Doe',
...     'Version': '1.0'
... })
Notes
Keys and values must be strings
Cloud storage may have restrictions (e.g., max metadata size)
This typically replaces all metadata (not merge)
- get_versions(path)[source]
Get list of available versions for a file.
Returns version history for versioned storage. Default implementation returns empty list (no versioning support).
- Parameters:
path (str) – Relative path to file
- Returns:
List of version info dicts
- Return type:
Notes
Override in subclasses that support versioning
S3 with versioning enabled can implement this
- open_version(path, version_id, mode='rb')[source]
Open a specific version of a file.
Default implementation raises PermissionError. Override in subclasses that support versioning (e.g., S3).
- Parameters:
- Raises:
PermissionError – Always (base implementation)
- delete_version(path, version_id)[source]
Delete a specific version of a file.
Removes a specific version from versioned storage. The current version and other versions remain unaffected. This is useful for cleaning up duplicate or unwanted versions.
Default implementation raises PermissionError. Override in subclasses that support versioning (e.g., S3).
- Parameters:
- Raises:
PermissionError – If backend doesn’t support versioning
FileNotFoundError – If version doesn’t exist
ValueError – If attempting to delete the only remaining version
Examples
>>> # Delete a specific version
>>> backend.delete_version('file.txt', 'abc123')
Notes
Cannot delete the current version if it’s the only version
Some backends may have restrictions on version deletion
This operation is typically irreversible
- url(path, expires_in=3600, **kwargs)[source]
Generate public URL for file access.
Returns a URL that can be used to access the file directly. For cloud storage (S3, GCS, Azure), this generates a presigned URL. For local storage, this returns None or a local file path URL.
- Parameters:
- Returns:
Public URL or None if not supported
- Return type:
str | None
Examples
>>> # S3 presigned URL (expires in 1 hour)
>>> url = backend.url('documents/report.pdf')
>>> print(url)
https://bucket.s3.amazonaws.com/documents/report.pdf?X-Amz-...
>>>
>>> # Custom expiration (24 hours)
>>> url = backend.url('video.mp4', expires_in=86400)
Notes
Cloud storage URLs are temporary and expire
Local storage typically returns None
HTTP storage returns the direct URL
- internal_url(path, nocache=False)[source]
Generate internal/relative URL for file access.
Returns a URL suitable for internal application use, typically relative to the application’s base URL. Optionally includes cache busting parameters.
- Parameters:
- Returns:
Internal URL or None if not supported
- Return type:
str | None
Examples
>>> # Simple internal URL
>>> url = backend.internal_url('images/logo.png')
>>> print(url)
/storage/home/images/logo.png
>>>
>>> # With cache busting
>>> url = backend.internal_url('app.js', nocache=True)
>>> print(url)
/storage/home/app.js?mtime=1234567890
Notes
Useful for web applications
Cache busting helps with CDN/browser caching
Format depends on application configuration
- local_path(path, mode='r')[source]
Get a local filesystem path for the file.
Returns a context manager that provides a local filesystem path to the file. For local storage, this returns the actual path. For remote storage (S3, GCS, etc.), this downloads the file to a temporary location, yields the temp path, and uploads changes back on exit if the file was modified.
This is essential for integrating with external tools that only work with local filesystem paths (ffmpeg, ImageMagick, etc.).
- Parameters:
- Returns:
Context manager yielding str (local filesystem path)
Examples
>>> import subprocess
>>>
>>> # Process remote file with external tool
>>> with backend.local_path('video.mp4', mode='r') as local_path:
...     subprocess.run(['ffmpeg', '-i', local_path, 'output.mp4'])
>>>
>>> # Modify remote file in place
>>> with backend.local_path('image.jpg', mode='rw') as local_path:
...     subprocess.run(['convert', local_path, '-resize', '800x600', local_path])
>>> # Changes automatically uploaded on exit
Notes
For read mode ('r'), the file is downloaded but not uploaded
For write mode ('w'), the file is uploaded on exit
For read-write mode ('rw'), both download and upload occur
Temporary files are automatically cleaned up on exit
For local storage, returns the original path (no copy)
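The download, yield, upload cycle described in these notes can be sketched as a context manager. In this illustration a plain dict stands in for the remote store and temp_local_path is a hypothetical name. Unlike the documented behaviour, the sketch uploads unconditionally in 'w' and 'rw' modes instead of detecting modifications.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temp_local_path(remote, key, mode="r"):
    """Yield a temporary local path for remote[key] (dict used as fake remote)."""
    if mode not in ("r", "w", "rw"):
        raise ValueError(f"Unsupported mode: {mode!r}")
    # Keep the extension so external tools can detect the file type
    fd, tmp_path = tempfile.mkstemp(suffix=os.path.splitext(key)[1])
    os.close(fd)
    try:
        if "r" in mode:
            with open(tmp_path, "wb") as handle:
                handle.write(remote[key])  # "download"
        yield tmp_path
        if "w" in mode:
            with open(tmp_path, "rb") as handle:
                remote[key] = handle.read()  # "upload"
    finally:
        # Runs on success and on error, so the temporary file never lingers
        os.remove(tmp_path)


remote_store = {"notes.txt": b"hello"}

with temp_local_path(remote_store, "notes.txt", mode="rw") as used_path:
    with open(used_path, "ab") as handle:
        handle.write(b" world")

with temp_local_path(remote_store, "notes.txt", mode="r") as read_path:
    with open(read_path, "ab") as handle:
        handle.write(b" (local scratch edit)")
```

If the body of the with block raises, the upload step is skipped and only the cleanup runs.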
- close()[source]
Close backend and release resources.
This method is called when the backend is no longer needed. Implementations should close any open connections, file handles, etc.
The default implementation does nothing. Backends that manage resources should override this method.
Examples
>>> backend.close()
Local Storage
- class genro_storage.backends.LocalStorage(path)[source]
Bases: StorageBackend
Local filesystem storage backend.
This backend provides access to files on the local filesystem. All paths are relative to a configured base directory.
The base_path can be either a string or a callable that returns a string. When a callable is provided, it will be evaluated each time the base_path property is accessed, allowing for dynamic paths (e.g., user-specific directories).
- Parameters:
path (Union[str, Callable[[], str]]) – Absolute path to the base directory, or callable returning path
- Raises:
ValueError – If resolved path is not absolute or not a directory
FileNotFoundError – If resolved path doesn’t exist
Examples
>>> # Static path
>>> backend = LocalStorage('/home/user')
>>>
>>> # Dynamic path with context-based callable (no parameters)
>>> def get_user_dir():
...     user_id = get_current_user()
...     return f'/data/users/{user_id}'
>>> backend = LocalStorage(get_user_dir)
>>>
>>> # Switched mount: callable with prefix parameter
>>> # Single mount behaves like multiple mounts based on first path component
>>> def resource_resolver(prefix):
...     # prefix = 'sys', 'adm', 'gnr', etc.
...     return f'/path/to/{prefix}-package'
>>> backend = LocalStorage(resource_resolver)
>>> # Accessing 'sys/folder/file.txt' routes to '/path/to/sys-package/folder/file.txt'
>>> # Accessing 'adm/folder/file.txt' routes to '/path/to/adm-package/folder/file.txt'
>>>
>>> # Access files relative to base
>>> data = backend.read_bytes('documents/report.pdf')
Note
Switched Mounts: When the callable accepts a parameter, it receives the first path component (prefix) and should return the base directory for that prefix. The backend then appends the remaining path. This allows a single mount to route to different base directories based on the prefix.
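The routing rule for static paths, zero-argument callables and prefix callables can be sketched as follows. resolve_path is a hypothetical helper, and detecting the callable's arity with inspect.signature is an assumption about how the distinction could be made, not a statement about LocalStorage internals.

```python
import inspect
from pathlib import PurePosixPath


def resolve_path(base, relative_path):
    """Resolve relative_path against a static base, a callable, or a prefix callable."""
    if not callable(base):
        return str(PurePosixPath(base) / relative_path)
    if len(inspect.signature(base).parameters) == 0:
        # Dynamic base: evaluated on every access
        return str(PurePosixPath(base()) / relative_path)
    # Switched mount: the first path component selects the base directory
    prefix, _, remainder = relative_path.partition("/")
    return str(PurePosixPath(base(prefix)) / remainder)


def resource_resolver(prefix):
    return f"/path/to/{prefix}-package"


static_result = resolve_path("/home/user", "documents/report.pdf")
dynamic_result = resolve_path(lambda: "/data/users/42", "avatar.png")
switched_result = resolve_path(resource_resolver, "sys/folder/file.txt")
```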
- __init__(path)[source]
Initialize LocalStorage backend.
- Parameters:
path (str | Callable[[], str]) – Absolute path or callable returning absolute path
- Raises:
ValueError – If path (string only) is not absolute or not a directory
FileNotFoundError – If path (string only) doesn’t exist
Note
When path is a callable, validation is deferred until first access. This allows configuration before the context (e.g., current user) is available.
- property base_path: Path
Get current base path (evaluates callable if needed).
- Returns:
Current base path as Path object
- classmethod get_json_info()[source]
Return complete backend information in JSON format.
- Returns:
Backend information with schema, capabilities, and description.
- Return type:
- copy(src_path, dest_backend, dest_path)[source]
Copy file/directory to another backend.
For local-to-local copies, uses efficient filesystem operations. For copies to other backends, streams the data.
- local_path(path, mode='r')[source]
Get local filesystem path (returns the actual path).
For local storage, this simply returns the actual filesystem path since the file is already local. No temporary copy is needed.
- Parameters:
- Returns:
Context manager yielding str (the actual filesystem path)
Examples
>>> import subprocess
>>> with backend.local_path('video.mp4') as local_path:
...     subprocess.run(['ffmpeg', '-i', local_path, 'out.mp4'])
Base64 Backend
- class genro_storage.backends.Base64Backend[source]
Bases: StorageBackend
Storage backend that decodes base64 data from the path/URI.
This backend treats the path as base64-encoded data and provides read-only access to the decoded content. It’s useful for embedding small amounts of data directly in URIs without requiring actual file storage.
- _creation_time
Fixed timestamp for mtime() calls
- property capabilities: BackendCapabilities
Return the capabilities of this backend.
Overrides the base implementation to add base64-specific meta-capabilities.
- classmethod get_json_info()[source]
Return complete backend information in JSON format.
- Returns:
Backend information with schema, capabilities, and description.
- Return type:
- size(path)[source]
Get size of decoded data in bytes.
- Parameters:
path (str) – Base64-encoded string
- Returns:
Size of decoded data
- Raises:
FileNotFoundError – If invalid base64
- Return type:
- mtime(path)[source]
Get modification time.
- Parameters:
path (str) – Base64-encoded string
- Returns:
Fixed timestamp (base64 data has no modification time)
- Raises:
FileNotFoundError – If invalid base64
- Return type:
- open(path, mode='rb')[source]
Open base64 data as file-like object.
- Parameters:
- Returns:
File-like object (BytesIO or StringIO)
- Raises:
FileNotFoundError – If invalid base64 (read modes only)
- Return type:
Note
Write modes return empty BytesIO/StringIO. The caller must handle retrieving the content and calling write_bytes/write_text to get the new base64 path.
- read_bytes(path)[source]
Read and decode base64 data.
- Parameters:
path (str) – Base64-encoded string
- Returns:
Decoded bytes
- Raises:
FileNotFoundError – If invalid base64
- Return type:
- read_text(path, encoding='utf-8')[source]
Read and decode base64 data as text.
- Parameters:
- Returns:
Decoded text string
- Raises:
FileNotFoundError – If invalid base64
UnicodeDecodeError – If data is not valid text
- Return type:
- write_bytes(path, data)[source]
Write bytes to base64 node.
Creates a new base64-encoded string from the data. The path parameter is ignored as the base64 content itself becomes the new path.
- Parameters:
- Returns:
New base64-encoded path
- Return type:
Note
This operation changes the node’s path to the new base64 string. The old path becomes invalid.
Examples
>>> new_path = backend.write_bytes("old", b"Hello")
>>> # new_path is now "SGVsbG8=" (base64 of "Hello")
- write_text(path, text, encoding='utf-8')[source]
Write text to base64 node.
Creates a new base64-encoded string from the text. The path parameter is ignored as the base64 content itself becomes the new path.
- Parameters:
- Returns:
New base64-encoded path
- Return type:
Note
This operation changes the node’s path to the new base64 string. The old path becomes invalid.
Examples
>>> new_path = backend.write_text("old", "Hello World")
>>> # new_path is now "SGVsbG8gV29ybGQ=" (base64 of "Hello World")
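The path values quoted in the write_bytes and write_text examples can be reproduced with the standard library. The helpers below are hypothetical and only illustrate the encoding step; they do not show how the backend updates a node.

```python
import base64


def bytes_to_path(data: bytes) -> str:
    """Hypothetical helper: encode raw bytes as a base64 path string."""
    return base64.b64encode(data).decode("ascii")


def text_to_path(text: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: encode text to bytes first, then to base64."""
    return bytes_to_path(text.encode(encoding))


bytes_path = bytes_to_path(b"Hello")
text_path = text_to_path("Hello World")

# Decoding the new path returns the original content
round_trip = base64.b64decode(text_path).decode("utf-8")
```

Because the content is the path, any change to the data necessarily produces a different path, which is why the old one becomes invalid.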
- delete(path, recursive=False)[source]
Delete operation not supported.
- Parameters:
- Raises:
PermissionError – Always (read-only backend)
- list_dir(path)[source]
List directory contents.
- Parameters:
path (str) – Base64-encoded string
- Returns:
Empty list
- Raises:
ValueError – Always (no directories in base64 backend)
- Return type:
- mkdir(path, parents=False, exist_ok=False)[source]
Create directory operation not supported.
- Parameters:
- Raises:
PermissionError – Always (read-only backend)
- copy(src_path, dest_backend, dest_path)[source]
Copy base64 data to another backend.
This decodes the base64 data and writes it to the destination backend.
- Parameters:
src_path (str) – Base64-encoded source data
dest_backend (StorageBackend) – Destination backend
dest_path (str) – Destination path
- Returns:
New destination path if the destination backend changes it, or None if the path is unchanged
- Return type:
str | None
- Raises:
FileNotFoundError – If invalid base64
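The decode-then-write flow of copy() can be sketched as follows. Both `InMemoryBackend` and `copy_base64` are hypothetical names introduced for illustration; the real destination would be any StorageBackend, and the assumption here is that its write_bytes returns None when the path is unchanged.

```python
import base64
from typing import Optional


class InMemoryBackend:
    """Hypothetical destination backend that keeps files in a dict."""

    def __init__(self) -> None:
        self.files = {}

    def write_bytes(self, path: str, data: bytes) -> Optional[str]:
        self.files[path] = data
        return None  # this destination never changes the path


def copy_base64(src_path: str, dest_backend, dest_path: str) -> Optional[str]:
    """Decode the base64 source and hand the bytes to the destination."""
    data = base64.b64decode(src_path, validate=True)
    # Propagate whatever the destination reports as the new path
    return dest_backend.write_bytes(dest_path, data)


dest = InMemoryBackend()
result = copy_base64("SGVsbG8=", dest, "greeting.txt")
```

A destination that derives its path from the content, such as another base64 backend, would return a new path here instead of None.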
- get_hash(path)[source]
Get MD5 hash of decoded data.
- Parameters:
path (str) – Base64-encoded string
- Returns:
MD5 hash of decoded data
- Raises:
FileNotFoundError – If invalid base64
- Return type:
str | None
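For reference, an MD5 hash of the decoded data can be computed with `hashlib`. The function name `decoded_md5` is hypothetical, and returning the hexadecimal digest is an assumption; the documentation above does not state the output format.

```python
import base64
import hashlib


def decoded_md5(path: str) -> str:
    """Hypothetical helper: MD5 of the decoded payload, as a hex string."""
    data = base64.b64decode(path, validate=True)
    return hashlib.md5(data).hexdigest()


digest = decoded_md5("SGVsbG8=")  # hash of b"Hello", not of the base64 text
```

The hash is taken over the decoded bytes, so two different encodings of the same payload would yield the same digest.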
- local_path(path, mode='r')[source]
Get local filesystem path for base64 data.
Creates a temporary file with the decoded base64 content. Since Base64Backend is read-only, write modes are not supported.
- Parameters:
- Returns:
Context manager yielding str (temp file path)
- Raises:
PermissionError – If mode is not ‘r’
FileNotFoundError – If invalid base64
Examples
>>> # Use base64 data with external tool
>>> import subprocess
>>> node = storage.node('b64:SGVsbG8gV29ybGQ=')
>>> with node.local_path() as path:
...     subprocess.run(['cat', path])
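A context manager with this behavior could be built from `tempfile` and `contextlib`. The sketch below is one possible shape, not the library's implementation: `base64_local_path` is a hypothetical name, and deleting the temporary file on exit is an assumption.

```python
import base64
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def base64_local_path(path: str, mode: str = "r"):
    """Hypothetical helper: expose decoded base64 data as a temp file."""
    if mode != "r":
        raise PermissionError("base64 data is read-only; only mode 'r' is supported")
    data = base64.b64decode(path, validate=True)
    fd, tmp_path = tempfile.mkstemp()
    try:
        # Write the decoded payload before handing the path to the caller
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        yield tmp_path
    finally:
        # Assumption: the temporary file is removed when the block exits
        os.unlink(tmp_path)


with base64_local_path("SGVsbG8gV29ybGQ=") as tmp_path:
    with open(tmp_path, "rb") as fh:
        content = fh.read()
```

Under this design, the yielded path is only valid inside the `with` block, so an external tool must finish with the file before the block exits.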