API Reference

This page provides the complete API documentation for genro-storage.

StorageManager

class genro_storage.StorageManager[source]

Bases: object

Main entry point for configuring and accessing storage.

StorageManager is responsible for:

  • Configuring mount points that map to storage backends

  • Creating StorageNode instances for file/directory access

  • Managing the lifecycle of storage backend connections

A mount point is a logical name (e.g., “home”, “uploads”, “s3”) that maps to an actual storage backend (local filesystem, S3 bucket, etc.).

Examples

>>> # Create manager
>>> storage = StorageManager()
>>>
>>> # Configure from file
>>> storage.configure('/etc/app/storage.yaml')
>>>
>>> # Configure programmatically
>>> storage.configure([
...     {'name': 'home', 'type': 'local', 'path': '/home/user'},
...     {'name': 'uploads', 'type': 's3', 'bucket': 'my-bucket'}
... ])
>>>
>>> # Access files
>>> node = storage.node('home:documents/report.pdf')
>>> content = node.read_text()
__init__()[source]

Initialize a new StorageManager with no configured mounts.

After initialization, you must call configure() to set up mount points before you can access any files.

Examples

>>> from genro_storage import StorageManager
>>> storage = StorageManager()
configure(source)[source]

Configure mount points from various sources.

This method can be called multiple times. If a mount with the same name already exists, it will be replaced with the new configuration.

Parameters:

source (Annotated[str | list[dict[str, Any]], 'Configuration source: path to YAML/JSON file or list of mount configurations']) – Configuration source. Can be:

  • str: Path to a YAML or JSON configuration file

  • list[dict]: List of mount configurations

Raises:
Configuration Dictionary Format:

Each mount configuration dict must have:

  • name (str, required): Mount point name (e.g., “home”, “uploads”)

  • type (str, required): Backend type (“local”, “s3”, “gcs”, “azure”, “http”, “memory”)

  • Additional fields depend on type (see examples below)

Examples

Local Storage:

>>> storage.configure([{
...     'name': 'home',
...     'type': 'local',
...     'path': '/home/user'  # required: absolute path
... }])

S3 Storage:

>>> storage.configure([{
...     'name': 'uploads',
...     'type': 's3',
...     'bucket': 'my-bucket',    # required
...     'prefix': 'uploads/',     # optional, default: ""
...     'region': 'eu-west-1',    # optional
...     'anon': False             # optional, default: False
... }])

GCS Storage:

>>> storage.configure([{
...     'name': 'backups',
...     'type': 'gcs',
...     'bucket': 'my-backups',   # required
...     'prefix': '',             # optional
...     'token': 'path/to/service-account.json'  # optional
... }])

Azure Blob Storage:

>>> storage.configure([{
...     'name': 'archive',
...     'type': 'azure',
...     'container': 'archives',      # required
...     'account_name': 'myaccount',  # required
...     'account_key': '...'          # optional if using managed identity
... }])

HTTP Storage (read-only):

>>> storage.configure([{
...     'name': 'cdn',
...     'type': 'http',
...     'base_url': 'https://cdn.example.com'  # required
... }])

Memory Storage (for testing):

>>> storage.configure([{
...     'name': 'test',
...     'type': 'memory'
... }])

From YAML File:

# storage.yaml
- name: home
  type: local
  path: /home/user

- name: uploads
  type: s3
  bucket: my-app-uploads
  region: eu-west-1
>>> storage.configure('/etc/app/storage.yaml')

From JSON File:

[
  {
    "name": "home",
    "type": "local",
    "path": "/home/user"
  },
  {
    "name": "uploads",
    "type": "s3",
    "bucket": "my-app-uploads",
    "region": "eu-west-1"
  }
]
>>> storage.configure('./config/storage.json')

Multiple Calls (mounts are replaced if same name):

>>> storage.configure([{'name': 'home', 'type': 'local', 'path': '/home/user'}])
>>> storage.configure([{'name': 'uploads', 'type': 's3', 'bucket': 'my-bucket'}])
>>> # Now both 'home' and 'uploads' are configured
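
The replace-by-name behaviour across repeated configure() calls can be pictured as a dictionary keyed on the mount name. The following is a minimal sketch under that assumption, not genro-storage's implementation: `MountRegistry` and its methods are hypothetical names, and only JSON files are handled because YAML needs a third-party parser.

```python
from __future__ import annotations

import json
from typing import Any


class MountRegistry:
    """Hypothetical stand-in for the manager's internal mount table."""

    def __init__(self) -> None:
        self._mounts: dict[str, dict[str, Any]] = {}

    def configure(self, source: str | list[dict[str, Any]]) -> None:
        # A string is treated as a path to a JSON file (assumption: YAML
        # is left out of this sketch).
        if isinstance(source, str):
            with open(source, encoding="utf-8") as fh:
                source = json.load(fh)
        for config in source:
            if "name" not in config or "type" not in config:
                raise ValueError("each mount needs 'name' and 'type'")
            # Same name: the newer configuration replaces the older one.
            self._mounts[config["name"]] = dict(config)

    def names(self) -> list[str]:
        return list(self._mounts)


registry = MountRegistry()
registry.configure([{"name": "home", "type": "local", "path": "/home/user"}])
registry.configure([{"name": "uploads", "type": "memory"}])
registry.configure([{"name": "home", "type": "local", "path": "/srv/data"}])
print(registry.names())
```

After the third call the registry still holds two mounts, and 'home' points at the most recent path.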
add_mount(config)[source]

Add or update a single mount point.

If a mount with the same name already exists, it will be replaced.

Parameters:

config (Annotated[dict[str, Any], 'Mount configuration dictionary']) – Mount configuration dictionary with ‘name’ and ‘type’ fields

Raises:

StorageConfigError – If configuration is invalid

Examples

>>> storage.add_mount({
...     'name': 'uploads',
...     'type': 's3',
...     'bucket': 'my-bucket'
... })
delete_mount(name)[source]

Delete a mount point.

Parameters:

name (Annotated[str, 'Mount point name to delete']) – Name of the mount point to remove

Raises:

KeyError – If mount point doesn’t exist

Examples

>>> storage.delete_mount('uploads')
node(mount_or_path=None, *path_parts, version=None)[source]

Create a StorageNode pointing to a file or directory.

This is the primary way to access files and directories. The path uses a mount:path format where the mount name refers to a configured storage backend.

When called without arguments, creates a dummy/accumulator node that can be used to build content from multiple sources.

Parameters:
  • mount_or_path (Annotated[str | None, 'Mount name or full path (mount:path format), or None for dummy node']) – One of:

    • Full path with mount: “mount:path/to/file”

    • Just mount name: “mount”

    • None: creates a dummy accumulator node (no storage backend)

  • *path_parts (str) – Additional path components to join

  • version (Annotated[int | str | None, 'Optional version: int for index (-1=latest), str for version_id']) – Optional version specifier for versioned storage (S3, GCS). If specified, creates a read-only snapshot node of that version. Can be int (index: -1=latest, -2=previous) or str (version_id).

Returns:

A new StorageNode instance

Return type:

StorageNode

Raises:
  • KeyError – If mount point doesn’t exist (wrapped as StorageNotFoundError)

  • ValueError – If path format is invalid

Path Normalization:
  • Multiple slashes collapsed: “a//b” → “a/b”

  • Leading/trailing slashes stripped

  • Parent directory segments (“..”) are not supported and raise ValueError
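
The mount:path composition and the normalization rules above can be sketched in a few lines. This is an illustration only, not the library's code: `split_and_normalize` is a hypothetical helper, and details such as how an empty mount name is reported are assumptions.

```python
from __future__ import annotations


def split_and_normalize(mount_or_path: str, *path_parts: str) -> tuple[str, str]:
    """Return (mount, normalized_path) following the rules listed above."""
    mount, sep, first = mount_or_path.partition(":")
    if not mount:
        raise ValueError("mount name is empty")
    # Join the inline path (if any) with the extra components.
    raw = "/".join([first, *path_parts]) if sep else "/".join(path_parts)
    # Dropping empty segments collapses "a//b" and strips leading and
    # trailing slashes in one step.
    segments = [seg for seg in raw.split("/") if seg]
    if ".." in segments:
        raise ValueError("parent directory traversal ('..') is not allowed")
    return mount, "/".join(segments)


print(split_and_normalize("home:documents//reports/", "q4.pdf"))
print(split_and_normalize("uploads", "users", "123", "2024", "avatar.jpg"))
print(split_and_normalize("home"))
```

Rejecting “..” outright, rather than resolving it, keeps a node from ever escaping the root of its mount.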

Examples

Full path in one string:

>>> node = storage.node('home:documents/report.pdf')

Mount + path parts:

>>> node = storage.node('home', 'documents', 'report.pdf')

Mix styles:

>>> node = storage.node('home:documents', 'reports', 'q4.pdf')

Dynamic composition:

>>> user_id = '123'
>>> year = '2024'
>>> node = storage.node('uploads', 'users', user_id, year, 'avatar.jpg')
>>> # Result: uploads:users/123/2024/avatar.jpg

Just mount (root of storage):

>>> node = storage.node('home')
>>> # Result: home:

Path with special characters:

>>> # Spaces and unicode are OK
>>> node = storage.node('home:My Documents/Café Menu.pdf')

Invalid paths (will raise ValueError):

>>> # Parent directory traversal not allowed
>>> node = storage.node('home:documents/../etc/passwd')  # ValueError

Dummy node (accumulator):

>>> dummy = storage.node()  # No parameters
>>> dummy.append(node1)
>>> dummy.extend(node2, node3)
>>> dummy.read_text()  # Concatenates all sources
iternode(*nodes)[source]

Create a virtual node that concatenates multiple nodes lazily.

This creates a virtual node (no physical storage) that accumulates references to other nodes. Content is only read when materialized via read_text(), read_bytes(), copy_to(), or zip().

Parameters:

*nodes – StorageNode instances to concatenate

Returns:

Virtual node with concatenation capability

Return type:

StorageNode

Examples

>>> # Create from existing nodes
>>> n1 = storage.node('mem:part1.txt')
>>> n2 = storage.node('mem:part2.txt')
>>> combined = storage.iternode(n1, n2)
>>>
>>> # Read concatenated content
>>> content = combined.read_text()
>>>
>>> # Add more nodes
>>> n3 = storage.node('mem:part3.txt')
>>> combined.append(n3)
>>>
>>> # Save to file
>>> result = storage.node('mem:result.txt')
>>> combined.copy_to(result)
>>>
>>> # Create ZIP
>>> zip_bytes = combined.zip()
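
To make the lazy behaviour concrete, here is a small stand-alone sketch. It assumes nothing about genro-storage internals: `LazyConcat` and `make_reader` are hypothetical names, and plain callables stand in for StorageNode instances.

```python
from typing import Callable, List


class LazyConcat:
    """Hypothetical accumulator: stores readers, reads only on demand."""

    def __init__(self, *readers: Callable[[], str]) -> None:
        self._readers: List[Callable[[], str]] = list(readers)

    def append(self, reader: Callable[[], str]) -> None:
        self._readers.append(reader)

    def extend(self, *readers: Callable[[], str]) -> None:
        self._readers.extend(readers)

    def read_text(self) -> str:
        # Sources are only touched here, in the order they were added.
        return "".join(reader() for reader in self._readers)


calls: List[str] = []


def make_reader(name: str, text: str) -> Callable[[], str]:
    def reader() -> str:
        calls.append(name)  # record when the source is actually read
        return text
    return reader


combined = LazyConcat(make_reader("part1", "Hello, "), make_reader("part2", "world"))
combined.append(make_reader("part3", "!"))
calls_before_read = list(calls)  # still empty: nothing has been read yet
text = combined.read_text()
print(text)
```

Because only references are stored, a source added or changed before materialization is reflected in the result.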
diffnode(node1, node2)[source]

Create a virtual node that generates a diff between two nodes.

This creates a virtual node that generates a unified diff between two text files. The diff is only computed when materialized via read_text() or copy_to().

Parameters:
Returns:

Virtual node with diff capability

Return type:

StorageNode

Raises:

ValueError – If nodes contain binary data

Examples

>>> # Compare two versions
>>> v1 = storage.node('mem:config_v1.txt')
>>> v2 = storage.node('mem:config_v2.txt')
>>> diff = storage.diffnode(v1, v2)
>>>
>>> # Read diff
>>> changes = diff.read_text()
>>>
>>> # Save diff to file
>>> diff_file = storage.node('mem:changes.diff')
>>> diff.copy_to(diff_file)
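
For readers unfamiliar with the unified diff format, the sketch below produces one with Python's standard difflib module. Whether diffnode uses difflib internally is an assumption; `unified_diff_text` is a hypothetical helper, and plain strings stand in for node content.

```python
import difflib


def unified_diff_text(old: str, new: str, old_name: str = "a", new_name: str = "b") -> str:
    """Return a unified diff of two text blobs as a single string."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff)


changes = unified_diff_text(
    "debug = false\nport = 80\n",
    "debug = true\nport = 80\n",
    "config_v1.txt",
    "config_v2.txt",
)
print(changes)
```

Identical inputs yield an empty string, which is a convenient "no changes" check.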
get_mount_names()[source]

Get list of configured mount names.

Returns:

List of mount point names

Return type:

list[str]

Examples

>>> storage.configure([
...     {'name': 'home', 'type': 'local', 'path': '/home/user'},
...     {'name': 'uploads', 'type': 's3', 'bucket': 'my-bucket'}
... ])
>>> print(storage.get_mount_names())
['home', 'uploads']
has_mount(name)[source]

Check if a mount point is configured.

Parameters:

name (Annotated[str, 'Mount point name to check']) – Mount point name to check

Returns:

True if mount exists

Return type:

bool

Examples

>>> if storage.has_mount('uploads'):
...     node = storage.node('uploads:file.txt')
... else:
...     print("Uploads storage not configured")
__repr__()[source]

String representation for debugging.

StorageNode

class genro_storage.StorageNode(manager, mount_name, path, version=None)[source]

Bases: object

Represents a file or directory in a storage backend.

StorageNode provides a unified interface for file operations across different storage backends (local, S3, GCS, Azure, HTTP, etc.).

Note

Users should not instantiate StorageNode directly. Use StorageManager.node() instead.

The node can represent either a file or a directory. Use the properties isfile and isdir to determine the type.

Examples

>>> # Get a node via StorageManager
>>> node = storage.node('home:documents/report.pdf')
>>>
>>> # Check if it exists
>>> if node.exists:
...     print(f"File size: {node.size} bytes")
>>>
>>> # Read content
>>> content = node.read_text()
>>>
>>> # Write content
>>> node.write_text("Hello World")
fullpath

Full path including mount point (e.g., “home:documents/file.txt”)

exists

True if file or directory exists

isfile

True if node points to a file

isdir

True if node points to a directory

size

File size in bytes

mtime

Last modification time as Unix timestamp

basename

Filename with extension

stem

Filename without extension

suffix

File extension including dot

parent

Parent directory as StorageNode

__init__(manager, mount_name, path, version=None)[source]

Initialize a StorageNode.

Parameters:
  • manager (StorageManager) – The StorageManager instance that owns this node

  • mount_name (str | None) – Name of the mount point (e.g., “home”, “uploads”), or None for dummy node

  • path (str | None) – Relative path within the mount (e.g., “documents/file.txt”), or None for dummy node

  • version (int | str | None) – Optional version specifier for versioned storage. If set, the node becomes a read-only snapshot of that version.

Note

This should not be called directly. Use StorageManager.node() instead.

property fullpath: str

Full path including mount point.

Returns:

Full path in format “mount:path/to/file”

Return type:

str

Examples

>>> node = storage.node('home:documents/report.pdf')
>>> print(node.fullpath)
home:documents/report.pdf
property path: str

Relative path within the mount.

Returns:

Path relative to mount point (without mount prefix)

Return type:

str

Examples

>>> node = storage.node('home:documents/report.pdf')
>>> print(node.path)
documents/report.pdf
>>> # For base64 backend, this is the base64-encoded content
>>> node = storage.node('b64:SGVsbG8=')
>>> print(node.path)
SGVsbG8=
property exists: bool

True if file or directory exists.

Returns:

True if the file or directory exists on the storage backend.

Virtual nodes always return False.

Return type:

bool

Examples

>>> if node.exists:
...     print("File exists!")
... else:
...     print("File not found")
property isfile: bool

True if node points to a file.

Returns:

True if this node is a file, False if directory or doesn’t exist

Return type:

bool

Examples

>>> if node.isfile:
...     data = node.read_bytes()
property isdir: bool

True if node points to a directory.

Returns:

True if this node is a directory, False if file or doesn’t exist

Return type:

bool

Examples

>>> if node.isdir:
...     for child in node.children():
...         print(child.basename)
property size: int

File size in bytes.

Returns:

Size of the file in bytes

Return type:

int

Raises:

Examples

>>> print(f"File size: {node.size} bytes")
>>> print(f"File size: {node.size / 1024:.1f} KB")
property mtime: float

Last modification time as Unix timestamp.

Returns:

Unix timestamp of last modification time

Return type:

float

Examples

>>> from datetime import datetime
>>> mod_time = datetime.fromtimestamp(node.mtime)
>>> print(f"Modified: {mod_time}")
property basename: str

Filename with extension.

Returns:

The filename including extension

Return type:

str

Examples

>>> node = storage.node('home:documents/report.pdf')
>>> print(node.basename)
report.pdf
property stem: str

Filename without extension.

Returns:

The filename without extension

Return type:

str

Examples

>>> node = storage.node('home:documents/report.pdf')
>>> print(node.stem)
report
property suffix: str

File extension including dot.

Returns:

The file extension including the leading dot (e.g., “.pdf”)

Return type:

str

Examples

>>> node = storage.node('home:documents/report.pdf')
>>> print(node.suffix)
.pdf
property parent: StorageNode

Parent directory as StorageNode.

Returns:

A new StorageNode pointing to the parent directory

Return type:

StorageNode

Examples

>>> node = storage.node('home:documents/reports/q4.pdf')
>>> parent = node.parent
>>> print(parent.fullpath)
home:documents/reports
property dirname: str

Parent directory fullpath as string.

Convenience property that returns the fullpath of the parent directory as a string, equivalent to parent.fullpath.

Returns:

Parent directory fullpath (e.g., ‘home:documents/reports’)

Return type:

str

Examples

>>> node = storage.node('home:documents/reports/q4.pdf')
>>> print(node.dirname)
home:documents/reports
>>>
>>> # Compare with parent property
>>> print(node.parent.fullpath)
home:documents/reports
>>> # dirname is a shortcut for the above
property ext: str

File extension without leading dot.

Convenience property for getting the file extension without the dot prefix, which is more convenient for comparisons and type checking than suffix.

Returns:

Extension without dot (e.g., ‘pdf’, ‘txt’), or empty string if no extension

Return type:

str

Examples

>>> node = storage.node('home:documents/report.pdf')
>>> print(node.ext)
pdf
>>> print(node.suffix)  # Compare with suffix
.pdf
>>>
>>> # More convenient for comparisons
>>> if node.ext == 'pdf':
...     process_pdf(node)
>>>
>>> # Instead of remembering the dot
>>> if node.suffix == '.pdf':
...     process_pdf(node)
splitext()[source]

Split path into filename and extension.

Similar to os.path.splitext(), returns a tuple of (filename, extension). The extension includes the leading dot. The filename includes the full path without the extension.

Returns:

(filename, extension) where extension includes the dot

Return type:

tuple[str, str]

Examples

>>> node = storage.node('home:documents/report.pdf')
>>> name, ext = node.splitext()
>>> print(name)
documents/report
>>> print(ext)
.pdf
>>>
>>> # Useful for renaming with different extension
>>> name, _ = node.splitext()
>>> new_path = f'{name}.docx'
>>> new_node = storage.node(f'home:{new_path}')
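
The name-related properties (basename, stem, suffix, ext, dirname) and splitext() mirror what Python's own path tools return for a POSIX-style path. The sketch below uses only the standard library on the relative path of a node; that genro-storage matches pathlib in edge cases such as multi-dot names like “archive.tar.gz” is an assumption.

```python
import posixpath
from pathlib import PurePosixPath

path = "documents/report.pdf"
p = PurePosixPath(path)

basename = p.name          # filename with extension
stem = p.stem              # filename without extension
suffix = p.suffix          # extension including the dot
ext = p.suffix[1:]         # extension without the dot
dirname = str(p.parent)    # parent directory (without the mount prefix)
root, extension = posixpath.splitext(path)

print(basename, stem, suffix, ext, dirname, root, extension)
```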
property ext_attributes: tuple[float | None, int | None, bool]

Commonly-used file attributes as a tuple.

Convenience property for getting (mtime, size, isdir) together in one call. Returns None values if file doesn’t exist. Size is None for directories.

Returns:

(mtime, size, isdir) where:
  • mtime: Modification time as Unix timestamp or None

  • size: File size in bytes or None (None for directories)

  • isdir: True if directory, False otherwise

Return type:

tuple

Examples

>>> node = storage.node('home:document.pdf')
>>> mtime, size, isdir = node.ext_attributes
>>> if mtime and size:
...     print(f'File: {size} bytes, modified at {mtime}')
>>>
>>> # More concise than
>>> mtime = node.mtime
>>> size = node.size
>>> isdir = node.isdir
property md5hash: str

MD5 hash of file content.

For cloud storage (S3, GCS, Azure), retrieves hash from metadata (fast). For local storage, computes hash by reading file in blocks (slower).

Returns:

MD5 hash as lowercase hexadecimal string (32 characters)

Return type:

str

Raises:

Examples

>>> hash1 = node1.md5hash
>>> hash2 = node2.md5hash
>>> if hash1 == hash2:
...     print("Files have identical content")
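
Reading "in blocks" means the file is never loaded into memory all at once. A minimal sketch of that technique with hashlib follows; `md5_in_blocks` and the 64 KiB default block size are illustrative choices, not values taken from genro-storage.

```python
import hashlib
import io
from typing import BinaryIO


def md5_in_blocks(stream: BinaryIO, block_size: int = 64 * 1024) -> str:
    """Hash a binary stream block by block and return the hex digest."""
    digest = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b"".
    for block in iter(lambda: stream.read(block_size), b""):
        digest.update(block)
    return digest.hexdigest()


block_hash = md5_in_blocks(io.BytesIO(b"hello world"), block_size=4)
print(block_hash)
```

The digest is the same regardless of block size, so the block size only trades memory use against the number of read calls.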
property mimetype: str

Get MIME type from file extension.

Uses Python’s mimetypes module to guess the MIME type based on the file extension. Returns ‘application/octet-stream’ if type cannot be determined.

Returns:

MIME type string (e.g., ‘image/png’, ‘application/pdf’)

Return type:

str

Examples

>>> jpg = storage.node('photos:image.jpg')
>>> jpg.mimetype
'image/jpeg'
>>>
>>> pdf = storage.node('documents:report.pdf')
>>> pdf.mimetype
'application/pdf'
>>>
>>> # Use for HTTP responses
>>> response.headers['Content-Type'] = node.mimetype
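
The lookup described above can be reproduced directly with the standard mimetypes module. `guess_mimetype` is a hypothetical helper that shows the extension-based guess and the fallback value; it is a sketch of the described behaviour, not the property's actual source.

```python
import mimetypes


def guess_mimetype(filename: str) -> str:
    """Guess a MIME type from the extension, with a generic fallback."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime or "application/octet-stream"


print(guess_mimetype("report.pdf"))
print(guess_mimetype("file_without_extension"))
```

Because the guess is based on the name alone, a misnamed file is reported with the type its extension suggests, not the type of its content.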
property capabilities

Get capabilities of underlying backend.

Returns backend capabilities which describe what features are supported, such as versioning, metadata, presigned URLs, etc.

If this node is a versioned snapshot (created with version parameter), the versioning capabilities are disabled since the node is read-only.

Returns:

Object describing supported features

Return type:

BackendCapabilities

Examples

>>> if node.capabilities.versioning:
...     versions = node.versions
>>> if node.capabilities.presigned_urls:
...     url = node.get_presigned_url()
open(mode='r', version=None, as_of=None)[source]

Open file with optional version control support.

Parameters:
  • mode (str) – File mode (‘r’, ‘rb’, ‘w’, ‘wb’, ‘a’, ‘ab’)

  • version (int | str | None) –

    Version to open:

    • None: Latest version (default)

    • str: Specific version_id (e.g., ‘abc123…’)

    • int: Version index with negative indexing support:

      • -1: Latest version

      • -2: Previous version

      • 0: Oldest version

      • 1: Second oldest version

  • as_of (datetime | None) – Open file as it was at this datetime

Returns:

File-like object (context manager)

Return type:

BinaryIO | TextIO

Raises:

Examples

>>> # Latest version
>>> with node.open() as f:
...     data = f.read()
>>> # Previous version (pythonic!)
>>> with node.open(version=-2) as f:
...     previous = f.read()
>>> # Specific version by ID
>>> with node.open(version='abc123xyz') as f:
...     old_content = f.read()
>>> # Version at date
>>> from datetime import datetime
>>> with node.open(as_of=datetime(2024, 1, 15)) as f:
...     historical = f.read()
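
The integer form of version follows Python list indexing over the version history. The sketch below assumes the history is available as a list ordered oldest to newest; `resolve_version` is a hypothetical helper, and the exceptions it raises are not necessarily those raised by open().

```python
from __future__ import annotations


def resolve_version(version_ids: list[str], version: int | str | None) -> str:
    """Map a version specifier to a version_id (history is oldest -> newest)."""
    if not version_ids:
        raise ValueError("no versions available")
    if version is None:
        return version_ids[-1]          # default: latest
    if isinstance(version, int):
        return version_ids[version]     # list indexing handles negatives
    if version not in version_ids:
        raise KeyError(version)
    return version


history = ["v-old", "v-mid", "v-new"]
print(resolve_version(history, -1), resolve_version(history, -2), resolve_version(history, 0))
```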
read(mode='r', encoding='utf-8')[source]

Read file content in text or binary mode.

Parameters:
  • mode (Annotated[str, "Read mode: 'r' for text, 'rb' for binary"]) – Read mode - ‘r’ for text (default), ‘rb’ for binary

  • encoding (Annotated[str, 'Text encoding (only for text mode)']) – Text encoding (used only for text mode)

Returns:

File content as text or bytes depending on mode

Return type:

str | bytes

Raises:

Examples

>>> # Read as text (default)
>>> content = node.read()
>>> content = node.read(mode='r')
>>>
>>> # Read as binary
>>> data = node.read(mode='rb')
write(data, mode='w', encoding='utf-8', skip_if_unchanged=False)[source]

Write data to file in text or binary mode.

Parameters:
  • data (Annotated[str | bytes, 'Data to write (str for text, bytes for binary)']) – Data to write (str for text mode, bytes for binary mode)

  • mode (Annotated[str, "Write mode: 'w' for text, 'wb' for binary"]) – Write mode - ‘w’ for text (default), ‘wb’ for binary

  • encoding (Annotated[str, 'Text encoding (only for text mode)']) – Text encoding (used only for text mode)

  • skip_if_unchanged (Annotated[bool, 'Skip writing if content is identical']) – If True, skip writing if content identical

Returns:

True if written, False if skipped

Return type:

bool

Raises:

Examples

>>> # Write text (default)
>>> node.write('Hello World')
>>> node.write('Hello', mode='w')
>>>
>>> # Write binary
>>> node.write(b'binary data', mode='wb')
>>>
>>> # Skip if unchanged
>>> written = node.write('content', skip_if_unchanged=True)
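
The boolean return value makes it possible to tell a real write from a skipped one. A minimal sketch of that contract, using a dict as a stand-in backend (`write_if_changed` is a hypothetical helper; how the library compares content internally is not specified here):

```python
from __future__ import annotations


def write_if_changed(store: dict[str, bytes], key: str, data: bytes,
                     skip_if_unchanged: bool = False) -> bool:
    """Write data under key; return False if the write was skipped."""
    if skip_if_unchanged and store.get(key) == data:
        return False  # identical content already stored
    store[key] = data
    return True


store: dict[str, bytes] = {}
first = write_if_changed(store, "notes.txt", b"content", skip_if_unchanged=True)
second = write_if_changed(store, "notes.txt", b"content", skip_if_unchanged=True)
third = write_if_changed(store, "notes.txt", b"content")
print(first, second, third)
```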
read_text(encoding='utf-8')[source]

Read file content as text.

Convenience method equivalent to read(mode='r', encoding=encoding). Compatible with the pathlib.Path API.

Parameters:

encoding (str) – Text encoding (default: ‘utf-8’)

Returns:

File content as text

Return type:

str

Raises:

FileNotFoundError – If file doesn’t exist

Examples

>>> content = node.read_text()
>>> content = node.read_text(encoding='latin-1')
read_bytes()[source]

Read file content as bytes.

Convenience method equivalent to read(mode='rb'). Compatible with the pathlib.Path API.

Returns:

File content as bytes

Return type:

bytes

Raises:

FileNotFoundError – If file doesn’t exist

Examples

>>> data = node.read_bytes()
write_text(text, encoding='utf-8', skip_if_unchanged=False)[source]

Write text content to file.

Convenience method equivalent to write(text, mode='w', encoding=encoding, skip_if_unchanged=skip_if_unchanged). Compatible with the pathlib.Path API.

Parameters:
  • text (str) – Text content to write

  • encoding (str) – Text encoding (default: ‘utf-8’)

  • skip_if_unchanged (bool) – Skip write if content identical (default: False)

Returns:

True if file was written, False if skipped

Return type:

bool

Raises:

Examples

>>> node.write_text("Hello World")
>>> node.write_text("Content", encoding='latin-1')
>>> written = node.write_text("New", skip_if_unchanged=True)
write_bytes(data, skip_if_unchanged=False)[source]

Write binary content to file.

Convenience method equivalent to write(data, mode='wb', skip_if_unchanged=skip_if_unchanged). Compatible with the pathlib.Path API.

Parameters:
  • data (bytes) – Binary content to write

  • skip_if_unchanged (bool) – Skip write if content identical (default: False)

Returns:

True if file was written, False if skipped

Return type:

bool

Raises:
  • TypeError – If data is not bytes

  • ValueError – If node is a versioned snapshot (read-only)

Examples

>>> node.write_bytes(b"Binary data")
>>> written = node.write_bytes(data, skip_if_unchanged=True)
delete()[source]

Delete file or directory.

copy_to(dest, include=None, exclude=None, filter=None, skip='never', skip_fn=None, progress=None, on_file=None, on_skip=None)[source]

Copy file or directory to destination with filtering and skip logic.

Supports filtering which files to copy (source-based) and skipping existing files (destination-based) for efficient incremental backups.

Filtering (applied to source files):
  • ‘include’: Glob patterns for files to include (whitelist)

  • ‘exclude’: Glob patterns for files to exclude (blacklist)

  • ‘filter’: Custom function(node, relpath) -> bool

Skip strategies (applied to destination files):
  • ‘never’: Always copy (overwrite existing files) - default

  • ‘exists’: Skip if destination file exists (fastest)

  • ‘size’: Skip if destination exists and has same size (fast)

  • ‘hash’: Skip if destination exists and has same content/MD5 (accurate)

  • ‘custom’: Use custom skip function

Parameters:
  • dest (StorageNode | str) – Destination node or path string

  • include (str | list[str] | None) – Glob pattern(s) for files to include. If specified, only matching files are copied (whitelist mode). Can be string or list of strings.

  • exclude (str | list[str] | None) – Glob pattern(s) for files to exclude. Applied after include. Can be string or list of strings.

  • filter (Callable[[StorageNode, str], bool] | None) – Custom filter function(node, relative_path) -> bool. Return True to include file, False to exclude. Applied after include/exclude patterns.

  • skip (SkipStrategy | Literal['never', 'exists', 'size', 'hash', 'custom']) – Skip strategy (default: ‘never’ = always copy)

  • skip_fn (Callable[[StorageNode, StorageNode], bool] | None) – Custom skip function(src, dest) -> bool (required if skip=’custom’)

  • progress (Callable[[int, int], None] | None) – Callback(current, total) called after each file

  • on_file (Callable[[StorageNode], None] | None) – Callback(src_node) called after each file copied

  • on_skip (Callable[[StorageNode, str], None] | None) – Callback(src_node, reason) called when file is skipped

Returns:

Destination StorageNode

Raises:
Return type:

StorageNode

Examples

>>> # Simple copy (overwrite) - default behavior
>>> src.copy_to(dest)
>>>
>>> # Copy only Python files
>>> src.copy_to(dest, include='*.py')
>>>
>>> # Copy all except logs and temp files
>>> src.copy_to(dest, exclude=['*.log', '*.tmp', '__pycache__/**'])
>>>
>>> # Combine include and exclude
>>> src.copy_to(dest, include='*.py', exclude='test_*.py')
>>>
>>> # Custom filter: only files smaller than 10MB
>>> src.copy_to(dest, filter=lambda node, path: node.size < 10_000_000)
>>>
>>> # Filter by modification time
>>> from datetime import datetime, timedelta
>>> cutoff = datetime.now() - timedelta(days=7)
>>> src.copy_to(dest, filter=lambda n, p: n.mtime > cutoff.timestamp())
>>>
>>> # Combine filtering and skip strategy
>>> src.copy_to(dest,
...             include=['*.py', '*.json'],
...             exclude='__pycache__/**',
...             skip='hash')  # Skip if content identical
>>>
>>> # Full-featured backup with tracking
>>> src.copy_to(dest,
...             exclude=['*.log', '*.tmp', 'node_modules/**'],
...             filter=lambda n, p: n.size < 100_000_000,
...             skip='hash',
...             progress=lambda c, t: print(f"{c}/{t}"))
Performance Notes:
  • Filtering is applied before copying (saves bandwidth)

  • skip='exists': ~1-2ms per file (only existence check)

  • skip='size': ~2-5ms per file (existence + size read)

  • skip='hash':

    • S3/GCS: ~5-10ms per file (ETag from metadata, fast)

    • Local: ~100ms per MB (must read file to compute MD5)

For cloud storage, ‘hash’ is efficient due to ETag metadata. For local storage, ‘size’ is usually sufficient.
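
The skip strategies differ only in how much they need to know about the destination. The decision logic can be sketched as follows; `FileInfo` and `should_skip` are hypothetical names, and the real method works on StorageNode objects rather than this simplified record.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FileInfo:
    """Hypothetical snapshot of the attributes a skip check needs."""
    exists: bool
    size: int = 0
    md5: str = ""


def should_skip(strategy: str, src: FileInfo, dest: FileInfo,
                skip_fn: Optional[Callable[[FileInfo, FileInfo], bool]] = None) -> bool:
    """Return True if the copy of src onto dest should be skipped."""
    if strategy == "never":
        return False
    if strategy == "custom":
        if skip_fn is None:
            raise ValueError("skip='custom' requires skip_fn")
        return skip_fn(src, dest)
    if not dest.exists:
        return False                     # nothing to compare against
    if strategy == "exists":
        return True                      # cheapest: existence only
    if strategy == "size":
        return src.size == dest.size     # one extra metadata read
    if strategy == "hash":
        return src.md5 == dest.md5       # most accurate, may read content
    raise ValueError(f"unknown skip strategy: {strategy!r}")


src = FileInfo(exists=True, size=10, md5="aaa")
same_size = FileInfo(exists=True, size=10, md5="bbb")
print(should_skip("size", src, same_size), should_skip("hash", src, same_size))
```

The last line shows why 'hash' is the accurate choice: two files of equal size but different content are skipped by 'size' and copied by 'hash'.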

Note

  • Include/exclude patterns match against relative paths from source

  • If copying to base64 backend, destination path will be updated

  • Filtering is source-based (which files to copy)

  • Skip logic is destination-based (whether to overwrite)
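
The order in which the source-side filters apply (include, then exclude, then the custom function) can be sketched with the standard fnmatch module. This is an assumption-laden illustration: `should_copy` is a hypothetical helper, the library's glob semantics may differ from fnmatch (where “*” also matches “/”), and the custom callback here receives only the relative path rather than (node, relpath).

```python
from fnmatch import fnmatchcase
from typing import Callable, List, Optional, Sequence, Union

Patterns = Union[str, Sequence[str], None]


def _as_list(patterns: Patterns) -> List[str]:
    if patterns is None:
        return []
    return [patterns] if isinstance(patterns, str) else list(patterns)


def should_copy(relpath: str, include: Patterns = None, exclude: Patterns = None,
                custom: Optional[Callable[[str], bool]] = None) -> bool:
    """Apply include (whitelist), then exclude, then the custom filter."""
    inc, exc = _as_list(include), _as_list(exclude)
    if inc and not any(fnmatchcase(relpath, pat) for pat in inc):
        return False
    if any(fnmatchcase(relpath, pat) for pat in exc):
        return False
    if custom is not None and not custom(relpath):
        return False
    return True


print(should_copy("app.py", include="*.py", exclude="test_*.py"))
print(should_copy("test_app.py", include="*.py", exclude="test_*.py"))
```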

move_to(dest)[source]

Move file/directory to destination.

append(node)[source]

Append a node to this virtual node (iternode only).

This method is only available for virtual nodes created with storage.iternode(). It adds a node reference to the accumulation list. Content is read lazily when materialized.

Parameters:

node (StorageNode) – StorageNode to append

Raises:

ValueError – If not a virtual iternode

Examples

>>> iternode = storage.iternode()
>>> n1 = storage.node('mem:part1.txt')
>>> iternode.append(n1)
>>> content = iternode.read_text()  # Materializes here
extend(*nodes)[source]

Extend this virtual node with multiple nodes (iternode only).

This method is only available for virtual nodes created with storage.iternode(). It adds multiple node references to the accumulation list. Content is read lazily when materialized.

Parameters:

*nodes (StorageNode) – StorageNodes to append

Raises:

ValueError – If not a virtual iternode

Examples

>>> iternode = storage.iternode(n1)
>>> iternode.extend(n2, n3, n4)
>>> content = iternode.read_text()  # Materializes all
zip()[source]

Create ZIP archive from node content.

Behavior depends on node type:

  • Regular file: Creates ZIP containing that file

  • Regular directory: Creates ZIP with all files recursively

  • Virtual iternode: Creates ZIP with all accumulated nodes as separate files

Returns:

ZIP archive as bytes

Return type:

bytes

Raises:

ValueError – If node doesn’t exist (for regular nodes)

Examples

>>> # ZIP a directory
>>> docs = storage.node('home:documents')
>>> zip_bytes = docs.zip()
>>>
>>> # ZIP accumulated files
>>> iternode = storage.iternode(n1, n2, n3)
>>> zip_bytes = iternode.zip()
>>>
>>> # Save ZIP
>>> archive = storage.node('home:backup.zip')
>>> archive.write_bytes(zip_bytes)
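
Returning the archive as bytes implies it is built in memory. A minimal sketch of that technique with the standard zipfile module is shown here; `zip_members` is a hypothetical helper, a dict of name to content stands in for the nodes, and the compression method is an assumption.

```python
import io
import zipfile
from typing import Dict


def zip_members(members: Dict[str, bytes]) -> bytes:
    """Build a ZIP archive in memory and return it as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    return buffer.getvalue()


zip_bytes = zip_members({"part1.txt": b"one", "docs/part2.txt": b"two"})
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
    names = archive.namelist()
    restored = archive.read("docs/part2.txt")
print(names)
```

Since the whole archive lives in memory, this approach suits modest payloads; very large directories may call for a streamed alternative.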
children()[source]

List child nodes (if directory).

child(*parts)[source]

Get a child node by path components.

Parameters:

*parts (Annotated[str, 'Path components to append']) – Path components to append. Can be:

  • Single string with path separators: ‘aaa/bbb/ccc’

  • Multiple strings: ‘aaa’, ‘bbb’, ‘ccc’

Returns:

Child node with combined path

Return type:

StorageNode

Examples

>>> docs = storage.node('home:documents')
>>>
>>> # Single path string
>>> report = docs.child('2024/reports/q4.pdf')
>>>
>>> # Multiple components
>>> report = docs.child('2024', 'reports', 'q4.pdf')
>>>
>>> # Both produce: 'home:documents/2024/reports/q4.pdf'
mkdir(parents=False, exist_ok=False)[source]

Create directory.

local_path(mode='r')[source]

Get local filesystem path for this file.

Returns a context manager that provides a local filesystem path. For local storage, returns the actual path. For remote storage (S3, GCS, etc.), downloads to a temporary file, yields the temp path, and uploads changes on exit.

This is essential for integrating with external tools that only work with local filesystem paths (ffmpeg, ImageMagick, etc.).

Parameters:

mode (str) – Access mode:

  • ‘r’: Read-only (download, no upload)

  • ‘w’: Write-only (no download, upload on exit)

  • ‘rw’: Read-write (download and upload)

Returns:

Context manager yielding str (local filesystem path)

Examples

>>> import subprocess
>>>
>>> # Process video with ffmpeg
>>> video = storage.node('s3:videos/input.mp4')
>>> with video.local_path(mode='r') as path:
...     subprocess.run(['ffmpeg', '-i', path, 'output.mp4'])
>>>
>>> # Modify image in place
>>> image = storage.node('s3:photos/pic.jpg')
>>> with image.local_path(mode='rw') as path:
...     subprocess.run(['convert', path, '-resize', '800x600', path])
>>> # Changes automatically uploaded to S3

Notes

  • For local storage, returns the actual path (no copy)

  • For remote storage, uses temporary files

  • Temporary files are automatically cleaned up on exit

  • Large files are streamed in chunks to avoid memory issues
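
The download, yield, upload cycle for remote storage can be sketched as a context manager. This is an illustration under stated assumptions, not the library's implementation: a dict stands in for the remote bucket, the standalone `local_path` function below is a hypothetical analogue of the method, and content is copied in one piece rather than streamed.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def local_path(remote: Dict[str, bytes], key: str, mode: str = "r") -> Iterator[str]:
    """Yield a temporary local path for remote[key], syncing per mode."""
    if mode not in ("r", "w", "rw"):
        raise ValueError(f"unsupported mode: {mode!r}")
    fd, path = tempfile.mkstemp(suffix=os.path.splitext(key)[1])
    try:
        with os.fdopen(fd, "wb") as fh:
            if "r" in mode:              # 'r' and 'rw': download first
                fh.write(remote[key])
        yield path
        if "w" in mode:                  # 'w' and 'rw': upload on clean exit
            with open(path, "rb") as fh:
                remote[key] = fh.read()
    finally:
        if os.path.exists(path):         # always clean up the temp file
            os.remove(path)


bucket = {"photos/pic.txt": b"hello"}
with local_path(bucket, "photos/pic.txt", mode="rw") as path:
    with open(path, "ab") as fh:
        fh.write(b" world")
print(bucket["photos/pic.txt"])
```

In this sketch an exception inside the with block skips the upload but still removes the temporary file, so a failed external tool does not overwrite the remote copy.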

call(*args, callback=None, async_mode=False, return_output=False, **subprocess_kwargs)[source]

Execute external command with automatic local_path management.

Automatically manages local filesystem paths for StorageNode arguments, downloading from cloud storage as needed and uploading changes after execution. Perfect for integrating with external tools like ffmpeg, imagemagick, pandoc, etc.

Parameters:
  • *args – Command arguments (str or StorageNode). StorageNode arguments are automatically converted to local paths

  • callback (Callable[[], None] | None) – Function to call on completion (async mode only)

  • async_mode (bool) – Run in background thread (default: False)

  • return_output (bool) – Return subprocess output as string (default: False)

  • **subprocess_kwargs – Additional arguments passed to subprocess.run() (e.g., cwd, env, timeout, shell, etc.)

Returns:

Command output if return_output=True, None otherwise

In async mode, returns immediately (None)

Return type:

str | None

Raises:

Examples

>>> # Video conversion (cloud storage)
>>> input_video = storage.node('s3:videos/input.mp4')
>>> output_video = storage.node('s3:videos/output.mp4')
>>> input_video.call('ffmpeg', '-i', input_video, '-vcodec', 'h264', output_video)
>>> # Automatically downloads input, uploads output
>>> # Image resize (local storage)
>>> image = storage.node('home:photos/photo.jpg')
>>> image.call('convert', image, '-resize', '800x600', image)
>>> # With callback (async)
>>> def on_complete():
...     print("Processing complete!")
>>> input_video.call('ffmpeg', '-i', input_video, 'output.mp4',
...                  callback=on_complete, async_mode=True)
>>> # Returns immediately, callback called when done
>>> # Capture output
>>> pdf = storage.node('documents:report.pdf')
>>> info = pdf.call('pdfinfo', pdf, return_output=True)
>>> print(info)
>>> # With subprocess options
>>> script = storage.node('scripts:process.py')
>>> script.call('python', script, 'arg1', 'arg2',
...            cwd='/tmp', timeout=60, env={'DEBUG': '1'})

Notes

  • StorageNode arguments use local_path(mode='rw') automatically

  • Files are downloaded before command execution

  • Modified files are uploaded after command execution

  • In async mode, cleanup happens in background thread

  • Use return_output=False for commands with large output

  • For shell commands, use shell=True in subprocess_kwargs
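
The interplay of return_output, async_mode, and callback can be pictured with a minimal standard-library sketch. It is not the genro-storage implementation: run_command is a hypothetical stand-in that skips the StorageNode-to-path conversion entirely and only shows one way these options might wrap subprocess.run().

```python
import subprocess
import sys
import threading


def run_command(*args, callback=None, async_mode=False,
                return_output=False, **subprocess_kwargs):
    """Hypothetical stand-in for call(); every argument must already be a str."""

    def _run():
        if return_output:
            # Capture stdout as text so it can be handed back to the caller.
            subprocess_kwargs.setdefault("capture_output", True)
            subprocess_kwargs.setdefault("text", True)
        result = subprocess.run(list(args), check=True, **subprocess_kwargs)
        return result.stdout if return_output else None

    if not async_mode:
        return _run()

    def _worker():
        _run()
        if callback is not None:
            callback()  # runs in the background thread, after the command

    threading.Thread(target=_worker).start()
    return None  # async mode returns immediately


# Synchronous run that captures the child's output.
output = run_command(sys.executable, "-c", "print('done')", return_output=True)
```

In this sketch the callback runs only after the command has finished, which mirrors the documented "callback called when done" behaviour; error handling in the background thread is left out.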

serve(environ, start_response, download=False, download_name=None, cache_max_age=None)[source]

Serve file via WSGI interface with caching support.

Serves the file through a WSGI application with:

  • ETag support for caching (304 Not Modified responses)

  • Content-Disposition headers for downloads

  • Cache-Control headers

  • Efficient streaming for large files

Perfect for integrating storage with web frameworks like Flask, Django, Pyramid, or any WSGI application.

Parameters:
  • environ (dict) – WSGI environment dict (contains HTTP headers, request info)

  • start_response (callable) – WSGI start_response callable

  • download (bool) – If True, force download with Content-Disposition: attachment

  • download_name (str | None) – Custom filename for downloads (default: basename of file)

  • cache_max_age (int | None) – Cache-Control max-age in seconds (default: no caching)

Returns:

Response body as list of byte chunks (WSGI response)

Return type:

list[bytes]

Raises:

Examples

>>> # Flask integration
>>> from flask import Flask, request
>>> app = Flask(__name__)
>>>
>>> @app.route('/files/<path:filepath>')
... def serve_file(filepath):
...     node = storage.node(f'uploads:{filepath}')
...     return node.serve(request.environ, lambda s, h: None,
...                       cache_max_age=3600)
>>>
>>> # Download endpoint
>>> @app.route('/download/<path:filepath>')
... def download_file(filepath):
...     node = storage.node(f'uploads:{filepath}')
...     return node.serve(request.environ, lambda s, h: None,
...                       download=True,
...                       download_name='report.pdf')
>>>
>>> # Plain WSGI application
>>> def application(environ, start_response):
...     path = environ['PATH_INFO']
...     node = storage.node(f'static:{path}')
...     if not node.exists:
...         start_response('404 Not Found', [('Content-Type', 'text/plain')])
...         return [b'Not Found']
...     return node.serve(environ, start_response, cache_max_age=86400)

Notes

  • ETag is computed as “{mtime}-{size}” for efficient caching

  • Returns 304 Not Modified when client ETag matches

  • Uses local_path() for efficient cloud storage serving

  • Streams large files in chunks (doesn’t load entire file in memory)
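
The ETag and 304 behaviour noted above can be illustrated with a small WSGI helper for a plain local file. This is a sketch under assumptions, not the library's code: serve_local_file is a hypothetical name, the mtime is truncated to whole seconds, and the body is returned as a generator rather than the documented list of chunks.

```python
import os


def serve_local_file(path, environ, start_response, cache_max_age=None):
    """Hypothetical WSGI helper: ETag built as "{mtime}-{size}", 304 on match."""
    stat = os.stat(path)
    etag = f'"{int(stat.st_mtime)}-{stat.st_size}"'
    headers = [("ETag", etag)]
    if cache_max_age is not None:
        headers.append(("Cache-Control", f"max-age={cache_max_age}"))

    # A matching If-None-Match header means the client's copy is still valid.
    if environ.get("HTTP_IF_NONE_MATCH") == etag:
        start_response("304 Not Modified", headers)
        return []

    headers.append(("Content-Length", str(stat.st_size)))
    start_response("200 OK", headers)

    def chunks(block_size=64 * 1024):
        # Read the file block by block instead of loading it whole.
        with open(path, "rb") as handle:
            while True:
                block = handle.read(block_size)
                if not block:
                    break
                yield block

    return chunks()
```

Because the ETag depends only on modification time and size, it can be computed without reading the file content.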

get_metadata()[source]

Get custom metadata for this file.

Returns user-defined metadata attached to the file. Supported for cloud storage (S3, GCS, Azure). For local storage, returns empty dict.

Returns:

Metadata key-value pairs

Return type:

dict[str, str]

Raises:

FileNotFoundError – If file doesn’t exist

Examples

>>> file = storage.node('s3:documents/report.pdf')
>>> metadata = file.get_metadata()
>>> print(metadata.get('Author'))
John Doe
set_metadata(metadata)[source]

Set custom metadata for this file.

Attaches user-defined metadata to the file. Supported for cloud storage (S3, GCS, Azure). For local storage, raises PermissionError.

Parameters:

metadata (Annotated[dict[str, str], 'Metadata key-value pairs to set']) – Metadata key-value pairs to set

Raises:

Examples

>>> file = storage.node('s3:documents/report.pdf')
>>> file.set_metadata({
...     'Author': 'John Doe',
...     'Version': '1.0',
...     'Department': 'Engineering'
... })

Notes

  • Keys and values must be strings

  • This typically replaces all existing metadata

  • Cloud providers may have size/format restrictions

url(expires_in=3600, **kwargs)[source]

Generate public URL for accessing this file.

Returns a URL that can be used to access the file directly. For cloud storage (S3, GCS), generates a presigned/signed URL. For HTTP storage, returns the direct URL. For local storage, returns None.

Parameters:
  • expires_in (int) – URL expiration time in seconds (default: 3600 = 1 hour)

  • **kwargs – Backend-specific options

Returns:

Public URL or None if not supported

Return type:

str | None

Examples

>>> # S3 presigned URL
>>> file = storage.node('s3:documents/report.pdf')
>>> url = file.url()
>>> print(url)
https://bucket.s3.amazonaws.com/documents/report.pdf?X-Amz-...
>>>
>>> # Custom expiration (24 hours)
>>> url = file.url(expires_in=86400)

Notes

  • Cloud storage URLs are temporary and expire

  • Use this for sharing files externally

  • HTTP URLs are direct (no expiration)

internal_url(nocache=False)[source]

Generate internal/relative URL for this file.

Returns a URL suitable for internal application use. Optionally includes cache busting parameters.

Parameters:

nocache (bool) – If True, append mtime for cache busting

Returns:

Internal URL or None if not supported

Return type:

str | None

Examples

>>> file = storage.node('home:static/app.js')
>>> url = file.internal_url(nocache=True)
>>> print(url)
/storage/home/static/app.js?mtime=1234567890

Notes

  • Useful for web applications

  • Cache busting helps with CDN/browser caching

property versions: list[dict]

Get list of available versions for this file.

Returns version history for versioned storage (S3 with versioning enabled). For non-versioned storage, returns empty list.

Returns:

List of version info dicts

Return type:

list[dict]

Examples

>>> file = storage.node('s3:documents/report.pdf')
>>> for v in file.versions:
...     print(f"Version {v['version_id']}: {v['last_modified']}")

Notes

  • Only S3 with versioning enabled returns versions

  • Empty list if versioning not supported

property version_count: int

Get total number of versions available.

Returns:

Number of versions, or 0 if versioning not supported

Return type:

int

Examples

>>> print(f"File has {node.version_count} versions")
compact_versions(dry_run=False)[source]

Compact version history by removing consecutive duplicates.

Scans version history and removes versions that have identical content to the immediately preceding version. This cleans up unnecessary versions created by repeated writes of the same content, reducing storage costs.

The rule: For each pair of consecutive versions with the same ETag, delete the second (more recent) one, keeping the first (older) one.

Non-consecutive duplicates are preserved to maintain history (e.g., reverting to an earlier state).

Parameters:

dry_run (bool) – If True, only report what would be deleted without actually deleting

Returns:

Number of versions removed (or would be removed if dry_run=True)

Return type:

int

Raises:

PermissionError – If versioning is not supported

Examples

>>> # Check what would be removed
>>> count = node.compact_versions(dry_run=True)
>>> print(f"Would remove {count} duplicate versions")
>>> # Actually compact the history
>>> removed = node.compact_versions()
>>> print(f"Removed {removed} redundant versions")

Notes

  • Only works with backends that support versioning

  • Requires backend to support version deletion (S3)

  • Preserves the oldest of each duplicate pair

  • History of changes is maintained (non-consecutive duplicates kept)

  • Useful for reducing storage costs on versioned buckets

Example scenario:

v1: content A (etag: xxx)
v2: content A (etag: xxx) ← REMOVED (consecutive duplicate)
v3: content B (etag: yyy)
v4: content B (etag: yyy) ← REMOVED (consecutive duplicate)
v5: content A (etag: xxx) ← KEPT (not consecutive to v1, shows revert)
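
The selection rule can be sketched as a pure function over a version list. This is an illustration only: the list is assumed to be ordered oldest first, and the 'etag' key and the function name find_consecutive_duplicates are assumptions rather than part of the documented API.

```python
def find_consecutive_duplicates(versions):
    """Return the version_ids that the consecutive-duplicate rule would delete.

    `versions` is assumed to be ordered oldest first; each entry is a dict
    with 'version_id' and (hypothetically) 'etag' keys.
    """
    to_remove = []
    previous_etag = None
    for version in versions:
        if version["etag"] == previous_etag:
            # Same content as the immediately preceding version: drop this one.
            to_remove.append(version["version_id"])
        previous_etag = version["etag"]
    return to_remove


history = [
    {"version_id": "v1", "etag": "xxx"},
    {"version_id": "v2", "etag": "xxx"},
    {"version_id": "v3", "etag": "yyy"},
    {"version_id": "v4", "etag": "yyy"},
    {"version_id": "v5", "etag": "xxx"},
]
removed = find_consecutive_duplicates(history)
print(removed)  # ['v2', 'v4']
```

A dry run corresponds to computing this list and reporting its length without deleting anything.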

fill_from_url(url, timeout=30)[source]

Download content from URL and write to this file.

Fetches content from the specified URL and writes it to this storage node. Useful for downloading files from the internet into storage.

Parameters:
  • url (str) – URL to download from (http:// or https://)

  • timeout (int) – Request timeout in seconds (default: 30)

Raises:

Examples

>>> # Download image from internet
>>> img = storage.node('s3:downloads/logo.png')
>>> img.fill_from_url('https://example.com/logo.png')
>>>
>>> # Download with custom timeout
>>> file = storage.node('local:data.json')
>>> file.fill_from_url('https://api.example.com/data', timeout=60)

Notes

  • Uses urllib for HTTP requests (no external dependencies)

  • Overwrites existing file if present

  • Parent directory must exist or backend must support auto-creation
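
A download of this kind can be written with urllib alone. The sketch below is an assumption about the general shape, not the library's code: download_to is a hypothetical helper that writes to a local path instead of a storage node.

```python
import shutil
import urllib.request


def download_to(url, dest_path, timeout=30):
    """Hypothetical helper: stream the response body into dest_path."""
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(dest_path, "wb") as target:
        # copyfileobj moves the data in blocks, so large downloads are not
        # held in memory all at once; an existing file is overwritten.
        shutil.copyfileobj(response, target)
```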

to_base64(mime=None, include_uri=True)[source]

Encode file content as base64 string.

Converts the file content to a base64-encoded string, optionally formatted as a data URI for direct embedding in HTML/CSS.

Parameters:
  • mime (str | None) – MIME type to include in data URI (auto-detected if None)

  • include_uri (bool) – If True, format as data URI; if False, return raw base64

Returns:

Base64-encoded string or data URI

Return type:

str

Raises:

Examples

>>> # Data URI with auto-detected MIME type
>>> img = storage.node('images:logo.png')
>>> data_uri = img.to_base64()
>>> print(data_uri)
data:image/png;base64,iVBORw0KGgo...
>>>
>>> # Raw base64 without URI wrapper
>>> b64 = img.to_base64(include_uri=False)
>>> print(b64)
iVBORw0KGgo...
>>>
>>> # Custom MIME type
>>> data_uri = img.to_base64(mime='image/x-icon')

Notes

  • Useful for embedding small images/files in HTML

  • MIME type auto-detection based on file extension

  • Large files will result in very long strings
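
The encoding itself needs only the standard library. The following is a minimal sketch, assuming the MIME type is guessed from the file name with mimetypes; to_data_uri and its fallback type are illustrative choices, not the library's internals.

```python
import base64
import mimetypes


def to_data_uri(data, filename, mime=None, include_uri=True):
    """Hypothetical helper: encode bytes as raw base64 or as a data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    if not include_uri:
        return encoded
    if mime is None:
        # Guess from the extension; fall back to a generic binary type.
        guessed, _ = mimetypes.guess_type(filename)
        mime = guessed or "application/octet-stream"
    return f"data:{mime};base64,{encoded}"


raw = to_data_uri(b"hello", "note.txt", include_uri=False)
uri = to_data_uri(b"hello", "note.txt", mime="text/plain")
```

Base64 output is roughly one third longer than the input, which is why the approach suits small files.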

__repr__()[source]

String representation for debugging.

__str__()[source]

String representation.

__eq__(other)[source]

Compare nodes by content (MD5 hash).

Two nodes are considered equal if they have the same file content, regardless of their path or location. Comparison is done via MD5 hash.

Parameters:

other (object) – Another StorageNode or object to compare

Returns:

True if both nodes have identical content

Return type:

bool

Examples

>>> file1 = storage.node('home:original.txt')
>>> file2 = storage.node('backup:copy.txt')
>>> if file1 == file2:
...     print("Files have identical content")

Notes

  • Only files can be compared (directories return False)

  • Non-existent files return False

  • Comparing with non-StorageNode returns NotImplemented
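
The comparison rules listed above can be modelled with a small in-memory class. ContentNode is a hypothetical stand-in, not StorageNode itself; it only shows how content equality via MD5 and the NotImplemented return might fit together.

```python
import hashlib


class ContentNode:
    """Hypothetical stand-in for a file node that keeps its bytes in memory."""

    def __init__(self, data):
        self.data = data  # None models a non-existent file

    def md5(self):
        return hashlib.md5(self.data).hexdigest()

    def __eq__(self, other):
        if not isinstance(other, ContentNode):
            # Lets Python try the reflected comparison on the other operand.
            return NotImplemented
        if self.data is None or other.data is None:
            return False
        return self.md5() == other.md5()


same = ContentNode(b"report") == ContentNode(b"report")
different = ContentNode(b"report") == ContentNode(b"draft")
```

Comparing hashes rather than raw bytes allows a backend that already stores an MD5 digest in its metadata to answer without reading the file.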

__ne__(other)[source]

Compare nodes for inequality.

Parameters:

other (object) – Another StorageNode or object to compare

Returns:

True if nodes have different content

Return type:

bool

Examples

>>> if file1 != file2:
...     print("Files differ")

Exceptions

Exception classes for genro-storage.

All exceptions inherit from the StorageError base class for easy catching. Exceptions also inherit from standard Python exceptions where appropriate to maintain compatibility with existing code.
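
The effect of this dual inheritance can be shown with local stand-in classes that mirror the documented bases. The classes below are redefined here purely for illustration and are not imported from genro_storage.

```python
class StorageError(Exception):
    """Stand-in for the documented base class."""


class StorageNotFoundError(StorageError, FileNotFoundError):
    """Stand-in with the documented bases."""


class StoragePermissionError(StorageError, PermissionError):
    """Stand-in with the documented bases."""


class StorageConfigError(StorageError, ValueError):
    """Stand-in with the documented bases."""


caught_as = []
for handler in (StorageError, FileNotFoundError):
    try:
        raise StorageNotFoundError("missing_mount:file.txt")
    except handler:
        # The same exception is caught through either parent type.
        caught_as.append(handler.__name__)

print(caught_as)  # ['StorageError', 'FileNotFoundError']
```

Existing code that already handles FileNotFoundError, PermissionError, or ValueError keeps working, while new code can catch StorageError alone.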

exception genro_storage.exceptions.StorageError[source]

Bases: Exception

Base exception for all storage-related errors.

This is the base class that all genro-storage exceptions inherit from. You can catch this to handle any storage-related error.

Examples

>>> try:
...     node.read_bytes()
... except StorageError as e:
...     print(f"Storage error occurred: {e}")
exception genro_storage.exceptions.StorageNotFoundError[source]

Bases: StorageError, FileNotFoundError

Raised when a file, directory, or mount point is not found.

This exception inherits from both StorageError and FileNotFoundError, so it can be caught by either exception type.

Common causes:
  • Attempting to access a mount point that hasn’t been configured

  • Reading a file that doesn’t exist

  • Accessing a path in a non-existent directory

Examples

>>> try:
...     node = storage.node('missing_mount:file.txt')
... except StorageNotFoundError:
...     print("Mount or file not found")
exception genro_storage.exceptions.StoragePermissionError[source]

Bases: StorageError, PermissionError

Raised when a permission-related error occurs.

This exception inherits from both StorageError and PermissionError, so it can be caught by either exception type.

Common causes:
  • Insufficient permissions to read/write a file

  • Insufficient AWS/GCS/Azure credentials or permissions

  • Attempting to write to a read-only storage backend (e.g., HTTP)

Examples

>>> try:
...     node.write_bytes(b'data')
... except StoragePermissionError:
...     print("Permission denied")
exception genro_storage.exceptions.StorageConfigError[source]

Bases: StorageError, ValueError

Raised when configuration is invalid.

This exception inherits from both StorageError and ValueError, so it can be caught by either exception type.

Common causes:
  • Invalid configuration format (missing required fields)

  • Unsupported storage backend type

  • Invalid path format

  • Malformed YAML/JSON configuration file

Examples

>>> try:
...     storage.configure([{'name': 'test'}])  # missing 'type'
... except StorageConfigError as e:
...     print(f"Configuration error: {e}")

Backend Classes

Base Backend

class genro_storage.backends.StorageBackend[source]

Bases: ABC

Abstract base class for storage backends.

All storage backend implementations (Local, S3, GCS, Azure, HTTP, etc.) must inherit from this class and implement all abstract methods.

This ensures a consistent interface across all storage types and makes it easy to add new backends in the future.

Note

Backend implementations should not be instantiated directly by users. They are created internally by StorageManager based on configuration.

Capability System:

Capabilities are automatically derived from methods decorated with @capability. The decorator populates the _capabilities set during class definition, and __init_subclass__ ensures proper inheritance.

PROTOCOL_CAPABILITIES: dict[str, set[str]] = {}
classmethod __init_subclass__(**kwargs)[source]

Automatically collect and inherit capabilities when subclass is created.

This method is called when a subclass of StorageBackend is defined. It collects PROTOCOL_CAPABILITIES from parent classes and merges them.
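
The merging step can be sketched with __init_subclass__ alone. This is a simplified model, not the library's code: Backend, Local, and VersionedLocal are hypothetical classes, and the @capability decorator is left out so that only the inheritance of PROTOCOL_CAPABILITIES is shown.

```python
class Backend:
    """Hypothetical base class; merges capability maps down the hierarchy."""

    PROTOCOL_CAPABILITIES = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        merged = {}
        # Walk from the most generic class to the new subclass so that each
        # level adds its own capabilities on top of the inherited ones.
        for base in reversed(cls.__mro__):
            declared = vars(base).get("PROTOCOL_CAPABILITIES", {})
            for protocol, caps in declared.items():
                merged.setdefault(protocol, set()).update(caps)
        cls.PROTOCOL_CAPABILITIES = merged


class Local(Backend):
    PROTOCOL_CAPABILITIES = {"local": {"read", "write"}}


class VersionedLocal(Local):
    PROTOCOL_CAPABILITIES = {"local": {"versioning"}}


print(sorted(VersionedLocal.PROTOCOL_CAPABILITIES["local"]))
# ['read', 'versioning', 'write']
```

Each subclass receives a fresh merged dict, so extending a child class does not alter the parent's capability sets.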

classmethod get_capabilities(protocol=None)[source]

Get capability set for a given protocol.

For single-protocol backends (LocalStorage, Base64Backend), protocol parameter is optional and defaults to the backend’s only protocol. For multi-protocol backends (FsspecBackend), protocol must be specified.

Parameters:

protocol (str | None) – Protocol name (e.g., ‘s3’, ‘gcs’, ‘local’, ‘base64’). If None, returns capabilities for the only available protocol.

Returns:

Set of capability names

Return type:

set

Examples

>>> # Single-protocol backend
>>> LocalStorage.get_capabilities()  # protocol auto-detected
{'read', 'write', 'delete', 'mkdir', ...}
>>> # Multi-protocol backend
>>> FsspecBackend.get_capabilities('s3')
{'read', 'write', 'metadata', 'presigned_urls', ...}
property capabilities: BackendCapabilities

Return the capabilities of this backend instance.

For single-protocol backends, automatically uses the only protocol. For multi-protocol backends (FsspecBackend), uses self.protocol.

Returns:

Object describing supported features

Return type:

BackendCapabilities

Examples

>>> backend = LocalStorage('/tmp')
>>> caps = backend.capabilities
>>> if caps.versioning:
...     versions = backend.get_versions('file.txt')
classmethod get_json_info(protocol=None)[source]

Return complete backend information in JSON format.

This classmethod can be overridden by backend subclasses to provide complete information including configuration schema, capabilities, and description. This is useful for UI generation and documentation.

The default implementation returns capabilities derived from @capability decorators, but no schema information.

Parameters:

protocol (str | None) – Protocol name for multi-protocol backends (optional for single-protocol)

Returns:

Backend information with schema, capabilities, and description

Return type:

dict

Examples

>>> # Single-protocol backend
>>> info = LocalStorage.get_json_info()
>>> print(info['schema']['fields'])
[{'name': 'path', 'type': 'text', 'required': True}]
>>> # Multi-protocol backend
>>> info = FsspecBackend.get_json_info('s3')
>>> print(info['capabilities']['metadata'])
True
abstractmethod exists(path)[source]

Check if a file or directory exists.

Parameters:

path (str) – Relative path within this storage backend

Returns:

True if file or directory exists

Return type:

bool

Examples

>>> exists = backend.exists('documents/report.pdf')
abstractmethod is_file(path)[source]

Check if path points to a file.

Parameters:

path (str) – Relative path within this storage backend

Returns:

True if path is a file, False otherwise

Return type:

bool

Examples

>>> if backend.is_file('documents/report.pdf'):
...     print("It's a file")
abstractmethod is_dir(path)[source]

Check if path points to a directory.

Parameters:

path (str) – Relative path within this storage backend

Returns:

True if path is a directory, False otherwise

Return type:

bool

Examples

>>> if backend.is_dir('documents'):
...     print("It's a directory")
abstractmethod size(path)[source]

Get file size in bytes.

Parameters:

path (str) – Relative path to file

Returns:

File size in bytes

Return type:

int

Raises:

Examples

>>> size = backend.size('documents/report.pdf')
>>> print(f"File is {size} bytes")
abstractmethod mtime(path)[source]

Get last modification time.

Parameters:

path (str) – Relative path to file or directory

Returns:

Unix timestamp of last modification

Return type:

float

Raises:

FileNotFoundError – If path doesn’t exist

Examples

>>> from datetime import datetime
>>> timestamp = backend.mtime('documents/report.pdf')
>>> mod_time = datetime.fromtimestamp(timestamp)
abstractmethod open(path, mode='rb')[source]

Open a file and return file-like object.

Parameters:
  • path (str) – Relative path to file

  • mode (str) – File mode (‘r’, ‘rb’, ‘w’, ‘wb’, ‘a’, ‘ab’)

Returns:

File-like object supporting context manager

Return type:

BinaryIO | TextIO

Raises:

Examples

>>> with backend.open('file.txt', 'rb') as f:
...     data = f.read()
abstractmethod read_bytes(path)[source]

Read entire file as bytes.

Parameters:

path (str) – Relative path to file

Returns:

Complete file contents

Return type:

bytes

Raises:

FileNotFoundError – If file doesn’t exist

Examples

>>> data = backend.read_bytes('image.jpg')
abstractmethod read_text(path, encoding='utf-8')[source]

Read entire file as text.

Parameters:
  • path (str) – Relative path to file

  • encoding (str) – Text encoding

Returns:

Complete file contents as string

Return type:

str

Raises:

Examples

>>> content = backend.read_text('document.txt')
abstractmethod write_bytes(path, data)[source]

Write bytes to file.

Parameters:
  • path (str) – Relative path to file

  • data (bytes) – Bytes to write

Raises:

Examples

>>> backend.write_bytes('file.bin', b'Hello')
abstractmethod write_text(path, text, encoding='utf-8')[source]

Write text to file.

Parameters:
  • path (str) – Relative path to file

  • text (str) – String to write

  • encoding (str) – Text encoding

Raises:

Examples

>>> backend.write_text('file.txt', 'Hello World')
abstractmethod delete(path, recursive=False)[source]

Delete file or directory.

Parameters:
  • path (str) – Relative path to delete

  • recursive (bool) – If True, delete directories recursively

Raises:
  • FileNotFoundError – If path doesn’t exist (implementation may choose to be idempotent)

  • ValueError – If path is non-empty directory and recursive=False

Examples

>>> backend.delete('file.txt')
>>> backend.delete('folder', recursive=True)
abstractmethod list_dir(path)[source]

List directory contents.

Parameters:

path (str) – Relative path to directory

Returns:

List of names (not full paths) in the directory

Return type:

list[str]

Raises:

Examples

>>> names = backend.list_dir('documents')
>>> for name in names:
...     print(name)  # Just 'report.pdf', not 'documents/report.pdf'
abstractmethod mkdir(path, parents=False, exist_ok=False)[source]

Create directory.

Parameters:
  • path (str) – Relative path to create

  • parents (bool) – If True, create parent directories as needed

  • exist_ok (bool) – If True, don’t error if directory exists

Raises:

Examples

>>> backend.mkdir('new_folder')
>>> backend.mkdir('a/b/c', parents=True)
abstractmethod copy(src_path, dest_backend, dest_path)[source]

Copy file/directory to another backend.

This method handles cross-backend copying efficiently, streaming data when possible to avoid loading large files in memory.

Parameters:
  • src_path (str) – Source path in this backend

  • dest_backend (StorageBackend) – Destination backend (may be different type)

  • dest_path (str) – Destination path in dest_backend

Returns:

New destination path if the destination backend changes it (e.g., base64 backend), or None if the path is unchanged

Return type:

str | None

Raises:

Examples

>>> # Copy within same backend
>>> backend.copy('file.txt', backend, 'backup/file.txt')
>>>
>>> # Copy to different backend
>>> backend.copy('file.txt', other_backend, 'file.txt')
get_hash(path)[source]

Get MD5 hash from filesystem metadata if available.

This method attempts to retrieve the MD5 hash from the storage backend’s metadata without reading the file content. For cloud storage like S3, this uses the ETag. For local storage, this returns None and the hash must be computed by reading the file.

Parameters:

path (str) – Relative path to file

Returns:

MD5 hash as hexadecimal string, or None if not available

Return type:

str | None

Examples

>>> hash_value = backend.get_hash('file.txt')
>>> if hash_value:
...     print(f"MD5: {hash_value}")
get_metadata(path)[source]

Get custom metadata for a file.

Returns user-defined metadata attached to the file. For cloud storage (S3, GCS, Azure), this retrieves custom metadata stored with the file. For local storage, this typically returns an empty dict or uses extended attributes if supported.

Parameters:

path (str) – Relative path to file

Returns:

Metadata key-value pairs

Return type:

dict[str, str]

Examples

>>> metadata = backend.get_metadata('document.pdf')
>>> print(metadata.get('Content-Type'))
application/pdf

Notes

  • Keys and values are strings

  • Cloud storage may have restrictions on key names (e.g., lowercase only)

  • Returns empty dict if no metadata or not supported

set_metadata(path, metadata)[source]

Set custom metadata for a file.

Attaches user-defined metadata to the file. For cloud storage (S3, GCS, Azure), this sets custom metadata that persists with the file. For local storage, this may use extended attributes if supported, or raise PermissionError if not supported.

Parameters:
  • path (str) – Relative path to file

  • metadata (dict[str, str]) – Metadata key-value pairs to set

Raises:

Examples

>>> backend.set_metadata('document.pdf', {
...     'Content-Type': 'application/pdf',
...     'Author': 'John Doe',
...     'Version': '1.0'
... })

Notes

  • Keys and values must be strings

  • Cloud storage may have restrictions (e.g., max metadata size)

  • This typically replaces all metadata (not merge)

get_versions(path)[source]

Get list of available versions for a file.

Returns version history for versioned storage. Default implementation returns empty list (no versioning support).

Parameters:

path (str) – Relative path to file

Returns:

List of version info dicts

Return type:

list[dict]

Notes

  • Override in subclasses that support versioning

  • S3 with versioning enabled can implement this

open_version(path, version_id, mode='rb')[source]

Open a specific version of a file.

Default implementation raises PermissionError. Override in subclasses that support versioning (e.g., S3).

Parameters:
  • path (str) – Relative path to file

  • version_id (str) – Version identifier

  • mode (str) – Open mode (read-only)

Raises:

PermissionError – Always (base implementation)

delete_version(path, version_id)[source]

Delete a specific version of a file.

Removes a specific version from versioned storage. The current version and other versions remain unaffected. This is useful for cleaning up duplicate or unwanted versions.

Default implementation raises PermissionError. Override in subclasses that support versioning (e.g., S3).

Parameters:
  • path (str) – Relative path to file

  • version_id (str) – Version identifier to delete

Raises:

Examples

>>> # Delete a specific version
>>> backend.delete_version('file.txt', 'abc123')

Notes

  • Cannot delete the current version if it’s the only version

  • Some backends may have restrictions on version deletion

  • This operation is typically irreversible

url(path, expires_in=3600, **kwargs)[source]

Generate public URL for file access.

Returns a URL that can be used to access the file directly. For cloud storage (S3, GCS, Azure), this generates a presigned URL. For local storage, this returns None or a local file path URL.

Parameters:
  • path (str) – Relative path to file

  • expires_in (int) – URL expiration time in seconds (default: 3600 = 1 hour)

  • **kwargs – Backend-specific options

Returns:

Public URL or None if not supported

Return type:

str | None

Examples

>>> # S3 presigned URL (expires in 1 hour)
>>> url = backend.url('documents/report.pdf')
>>> print(url)
https://bucket.s3.amazonaws.com/documents/report.pdf?X-Amz-...
>>>
>>> # Custom expiration (24 hours)
>>> url = backend.url('video.mp4', expires_in=86400)

Notes

  • Cloud storage URLs are temporary and expire

  • Local storage typically returns None

  • HTTP storage returns the direct URL

internal_url(path, nocache=False)[source]

Generate internal/relative URL for file access.

Returns a URL suitable for internal application use, typically relative to the application’s base URL. Optionally includes cache busting parameters.

Parameters:
  • path (str) – Relative path to file

  • nocache (bool) – If True, append mtime as query parameter for cache busting

Returns:

Internal URL or None if not supported

Return type:

str | None

Examples

>>> # Simple internal URL
>>> url = backend.internal_url('images/logo.png')
>>> print(url)
/storage/home/images/logo.png
>>>
>>> # With cache busting
>>> url = backend.internal_url('app.js', nocache=True)
>>> print(url)
/storage/home/app.js?mtime=1234567890

Notes

  • Useful for web applications

  • Cache busting helps with CDN/browser caching

  • Format depends on application configuration

local_path(path, mode='r')[source]

Get a local filesystem path for the file.

Returns a context manager that provides a local filesystem path to the file. For local storage, this returns the actual path. For remote storage (S3, GCS, etc.), this downloads the file to a temporary location, yields the temp path, and uploads changes back on exit if the file was modified.

This is essential for integrating with external tools that only work with local filesystem paths (ffmpeg, ImageMagick, etc.).

Parameters:
  • path (str) – Relative path to file

  • mode (str) – Access mode - ‘r’ (read-only), ‘w’ (write-only), ‘rw’ (read-write)

Returns:

Context manager yielding str (local filesystem path)

Examples

>>> # Process remote file with external tool
>>> with backend.local_path('video.mp4', mode='r') as local_path:
...     subprocess.run(['ffmpeg', '-i', local_path, 'output.mp4'])
>>>
>>> # Modify remote file in place
>>> with backend.local_path('image.jpg', mode='rw') as local_path:
...     subprocess.run(['convert', local_path, '-resize', '800x600', local_path])
>>> # Changes automatically uploaded on exit

Notes

  • For read mode (‘r’), the file is downloaded but not uploaded

  • For write mode (‘w’), the file is uploaded on exit

  • For read-write mode (‘rw’), both download and upload occur

  • Temporary files are automatically cleaned up on exit

  • For local storage, returns the original path (no copy)
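
The download, yield, upload, cleanup sequence for remote storage can be sketched with a context manager over a temporary file. This is an illustration under assumptions: a plain dict stands in for the remote bucket, remote_local_path is a hypothetical name, and the sketch uploads unconditionally in write modes rather than detecting whether the file changed.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def remote_local_path(store, key, mode="r"):
    """Hypothetical sketch: `store` is a dict standing in for a remote bucket."""
    suffix = os.path.splitext(key)[1]  # keep the extension for external tools
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    try:
        if "r" in mode:
            with open(tmp_path, "wb") as handle:
                handle.write(store[key])  # "download"
        yield tmp_path
        if "w" in mode:
            with open(tmp_path, "rb") as handle:
                store[key] = handle.read()  # "upload"
    finally:
        os.remove(tmp_path)  # cleanup runs even if the body raised


bucket = {"image.jpg": b"original"}
with remote_local_path(bucket, "image.jpg", mode="rw") as local_file:
    with open(local_file, "wb") as handle:
        handle.write(b"resized")
```

If the body of the with block raises, the upload step is skipped and only the cleanup runs, so a failed external command does not overwrite the stored file in this sketch.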

close()[source]

Close backend and release resources.

This method is called when the backend is no longer needed. Implementations should close any open connections, file handles, etc.

The default implementation does nothing. Backends that manage resources should override this method.

Examples

>>> backend.close()

Local Storage

class genro_storage.backends.LocalStorage(path)[source]

Bases: StorageBackend

Local filesystem storage backend.

This backend provides access to files on the local filesystem. All paths are relative to a configured base directory.

The base_path can be either a string or a callable that returns a string. When a callable is provided, it will be evaluated each time the base_path property is accessed, allowing for dynamic paths (e.g., user-specific directories).

Parameters:

path (Union[str, Callable[[], str]]) – Absolute path to the base directory, or callable returning path

Raises:

Examples

>>> # Static path
>>> backend = LocalStorage('/home/user')
>>>
>>> # Dynamic path with context-based callable (no parameters)
>>> def get_user_dir():
...     user_id = get_current_user()
...     return f'/data/users/{user_id}'
>>> backend = LocalStorage(get_user_dir)
>>>
>>> # Switched mount: callable with prefix parameter
>>> # Single mount behaves like multiple mounts based on first path component
>>> def resource_resolver(prefix):
...     # prefix = 'sys', 'adm', 'gnr', etc.
...     return f'/path/to/{prefix}-package'
>>> backend = LocalStorage(resource_resolver)
>>> # Accessing 'sys/folder/file.txt' routes to '/path/to/sys-package/folder/file.txt'
>>> # Accessing 'adm/folder/file.txt' routes to '/path/to/adm-package/folder/file.txt'
>>>
>>> # Access files relative to base
>>> data = backend.read_bytes('documents/report.pdf')

Note

Switched Mounts: When the callable accepts a parameter, it receives the first path component (prefix) and should return the base directory for that prefix. The backend then appends the remaining path. This allows a single mount to route to different base directories based on the prefix.
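
The routing described in this note can be sketched as a small resolver. How LocalStorage actually detects that a callable takes a prefix is not stated here, so the use of inspect.signature below is an assumption, and resolve is a hypothetical helper.

```python
import inspect
from pathlib import Path


def resolve(base, relative_path):
    """Hypothetical resolver for a str, a zero-arg callable, or a prefix callable."""
    if not callable(base):
        return Path(base) / relative_path
    if len(inspect.signature(base).parameters) == 0:
        # Dynamic base path: evaluated on every access.
        return Path(base()) / relative_path
    # Switched mount: the first path component selects the base directory.
    prefix, _, rest = relative_path.partition("/")
    return Path(base(prefix)) / rest


def resource_resolver(prefix):
    return f"/path/to/{prefix}-package"


resolved = resolve(resource_resolver, "sys/folder/file.txt")
```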

__init__(path)[source]

Initialize LocalStorage backend.

Parameters:

path (str | Callable[[], str]) – Absolute path or callable returning absolute path

Raises:

Note

When path is a callable, validation is deferred until first access. This allows configuration before the context (e.g., current user) is available.

property base_path: Path

Get current base path (evaluates callable if needed).

Returns:

Current base path as Path object

classmethod get_json_info()[source]

Return complete backend information in JSON format.

Returns:

Backend information with schema, capabilities, and description.

Return type:

dict

exists(path)[source]

Check if file or directory exists.

is_file(path)[source]

Check if path points to a file.

is_dir(path)[source]

Check if path points to a directory.

size(path)[source]

Get file size in bytes.

mtime(path)[source]

Get last modification time.

open(path, mode='rb')[source]

Open file and return file-like object.

read_bytes(path)[source]

Read entire file as bytes.

read_text(path, encoding='utf-8')[source]

Read entire file as text.

write_bytes(path, data)[source]

Write bytes to file.

write_text(path, text, encoding='utf-8')[source]

Write text to file.

delete(path, recursive=False)[source]

Delete file or directory.

list_dir(path)[source]

List directory contents.

mkdir(path, parents=False, exist_ok=False)[source]

Create directory.

copy(src_path, dest_backend, dest_path)[source]

Copy file/directory to another backend.

For local-to-local copies, uses efficient filesystem operations. For copies to other backends, streams the data.
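
The two strategies can be illustrated with the standard library. This is a sketch under stated assumptions, not the library's code: `copy_local_to_local`, `copy_streaming`, and `CHUNK_SIZE` are hypothetical names, and an in-memory buffer stands in for a non-local destination.

```python
import io
import shutil
import tempfile
from pathlib import Path
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # arbitrary chunk size chosen for this sketch


def copy_local_to_local(src: Path, dest: Path) -> None:
    """Local destination: let the filesystem do the work."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)


def copy_streaming(src: Path, dest_stream: BinaryIO) -> int:
    """Other destinations: read and forward the data chunk by chunk."""
    copied = 0
    with src.open('rb') as reader:
        while chunk := reader.read(CHUNK_SIZE):
            dest_stream.write(chunk)
            copied += len(chunk)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'report.txt'
    src.write_bytes(b'x' * 100_000)

    dest = Path(tmp) / 'backup' / 'report.txt'
    copy_local_to_local(src, dest)
    local_copy_size = dest.stat().st_size

    remote = io.BytesIO()  # stands in for a non-local backend
    streamed = copy_streaming(src, remote)
```

Streaming keeps memory use bounded by the chunk size, which matters when the source file is large.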

local_path(path, mode='r')[source]

Get local filesystem path (returns the actual path).

For local storage, this simply returns the actual filesystem path since the file is already local. No temporary copy is needed.

Parameters:
  • path (str) – Relative path to file

  • mode (str) – Access mode (ignored for local storage)

Returns:

Context manager yielding str (the actual filesystem path)

Examples

>>> import subprocess
>>> with backend.local_path('video.mp4') as local_path:
...     subprocess.run(['ffmpeg', '-i', local_path, 'out.mp4'])

__repr__()[source]

String representation.

PROTOCOL_CAPABILITIES: dict[str, set[str]] = {'local': {'append_mode', 'atomic_operations', 'copy_optimization', 'delete', 'list_dir', 'mkdir', 'read', 'seek_support', 'write'}}
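
One way such a capability table might be consulted before attempting an operation is sketched below. The values are copied from the `PROTOCOL_CAPABILITIES` attributes documented on this page; merging them into a single dict and the `supports` helper are assumptions made for illustration.

```python
from typing import Dict, Set

# Values copied from the PROTOCOL_CAPABILITIES attributes on this page.
# Combining them into one dict is an assumption of this sketch.
CAPABILITIES: Dict[str, Set[str]] = {
    'local': {
        'append_mode', 'atomic_operations', 'copy_optimization', 'delete',
        'list_dir', 'mkdir', 'read', 'seek_support', 'write',
    },
    'base64': {'read', 'seek_support'},
}


def supports(protocol: str, capability: str) -> bool:
    """Hypothetical helper: True if the protocol declares the capability."""
    return capability in CAPABILITIES.get(protocol, set())


print(supports('local', 'mkdir'))
print(supports('base64', 'mkdir'))
```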

Base64 Backend

class genro_storage.backends.Base64Backend[source]

Bases: StorageBackend

Storage backend that decodes base64 data from the path/URI.

This backend treats the path as base64-encoded data and provides read-only access to the decoded content. It’s useful for embedding small amounts of data directly in URIs without requiring actual file storage.

_creation_time

Fixed timestamp for mtime() calls

__init__()[source]

Initialize the Base64 backend.

property capabilities: BackendCapabilities

Return the capabilities of this backend.

Overrides the base implementation to add base64-specific meta-capabilities.

classmethod get_json_info()[source]

Return complete backend information in JSON format.

Returns:

Backend information with schema, capabilities, and description.

Return type:

dict

exists(path)[source]

Check if the base64 data is valid.

Parameters:

path (str) – Base64-encoded string

Returns:

True if valid base64, False otherwise

Return type:

bool
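
A validity check of this kind could be sketched with the standard library as follows. Whether the backend validates strictly is not stated here, so the use of `validate=True` and the helper name `is_valid_base64` are assumptions.

```python
import base64
import binascii


def is_valid_base64(candidate: str) -> bool:
    """Return True if the string decodes as strict base64."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        base64.b64decode(candidate, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


valid = is_valid_base64('SGVsbG8gV29ybGQ=')
invalid = is_valid_base64('not base64!')
```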

is_file(path)[source]

Check if path is a valid base64 file.

Parameters:

path (str) – Base64-encoded string

Returns:

True if valid base64 (treated as a file)

Return type:

bool

is_dir(path)[source]

Check if path is a directory.

Parameters:

path (str) – Base64-encoded string

Returns:

Always False (base64 backend has no directories)

Return type:

bool

size(path)[source]

Get size of decoded data in bytes.

Parameters:

path (str) – Base64-encoded string

Returns:

Size of decoded data

Raises:

FileNotFoundError – If invalid base64

Return type:

int

mtime(path)[source]

Get modification time.

Parameters:

path (str) – Base64-encoded string

Returns:

Fixed timestamp (base64 data has no modification time)

Raises:

FileNotFoundError – If invalid base64

Return type:

float

open(path, mode='rb')[source]

Open base64 data as file-like object.

Parameters:
  • path (str) – Base64-encoded string (ignored for write modes)

  • mode (str) – Open mode (‘rb’, ‘r’, ‘wb’, ‘w’, ‘ab’, ‘a’)

Returns:

File-like object (BytesIO or StringIO)

Raises:

FileNotFoundError – If invalid base64 (read modes only)

Return type:

BinaryIO | TextIO

Note

Write modes return an empty BytesIO/StringIO buffer. The caller is responsible for retrieving the buffer's content and passing it to write_bytes() or write_text() to obtain the new base64 path.
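
The write-mode flow can be sketched with the standard library alone. Here an `io.BytesIO` stands in for the object returned by `open(path, 'wb')`, and `base64.b64encode` stands in for the encoding step that `write_bytes` is documented to perform; neither line calls the backend itself.

```python
import base64
import io

# Stand-in for the empty buffer a write mode returns.
buffer = io.BytesIO()
buffer.write(b'Hello')

# The caller retrieves the content and encodes it; with the real backend
# this content would be passed to write_bytes() instead.
content = buffer.getvalue()
new_path = base64.b64encode(content).decode('ascii')
print(new_path)
```

The resulting string is the new path; nothing is persisted until the caller takes this second step.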

read_bytes(path)[source]

Read and decode base64 data.

Parameters:

path (str) – Base64-encoded string

Returns:

Decoded bytes

Raises:

FileNotFoundError – If invalid base64

Return type:

bytes

read_text(path, encoding='utf-8')[source]

Read and decode base64 data as text.

Parameters:
  • path (str) – Base64-encoded string

  • encoding (str) – Text encoding (default: utf-8)

Returns:

Decoded text string

Raises:
Return type:

str

write_bytes(path, data)[source]

Write bytes to base64 node.

Creates a new base64-encoded string from the data. The path parameter is ignored as the base64 content itself becomes the new path.

Parameters:
  • path (str) – Ignored (base64 backend is pathless)

  • data (bytes) – Bytes to encode

Returns:

New base64-encoded path

Return type:

str

Note

This operation changes the node’s path to the new base64 string. The old path becomes invalid.

Examples

>>> new_path = backend.write_bytes("old", b"Hello")
>>> # new_path is now "SGVsbG8=" (base64 of "Hello")

write_text(path, text, encoding='utf-8')[source]

Write text to base64 node.

Creates a new base64-encoded string from the text. The path parameter is ignored as the base64 content itself becomes the new path.

Parameters:
  • path (str) – Ignored (base64 backend is pathless)

  • text (str) – String to encode

  • encoding (str) – Text encoding (default: utf-8)

Returns:

New base64-encoded path

Return type:

str

Note

This operation changes the node’s path to the new base64 string. The old path becomes invalid.

Examples

>>> new_path = backend.write_text("old", "Hello World")
>>> # new_path is now "SGVsbG8gV29ybGQ=" (base64 of "Hello World")

delete(path, recursive=False)[source]

Delete operation not supported.

Parameters:
  • path (str) – Unused

  • recursive (bool) – Unused

Raises:

PermissionError – Always (read-only backend)

list_dir(path)[source]

List directory contents.

Parameters:

path (str) – Base64-encoded string

Returns:

Empty list

Raises:

ValueError – Always (no directories in base64 backend)

Return type:

list[str]

mkdir(path, parents=False, exist_ok=False)[source]

Create directory operation not supported.

Parameters:
  • path (str) – Unused

  • parents (bool) – Unused

  • exist_ok (bool) – Unused

Raises:

PermissionError – Always (read-only backend)

copy(src_path, dest_backend, dest_path)[source]

Copy base64 data to another backend.

This decodes the base64 data and writes it to the destination backend.

Parameters:
  • src_path (str) – Base64-encoded source data

  • dest_backend (StorageBackend) – Destination backend

  • dest_path (str) – Destination path

Returns:

New destination path if the destination backend changes it, or None if the path is unchanged

Return type:

str | None

Raises:

FileNotFoundError – If invalid base64

get_hash(path)[source]

Get MD5 hash of decoded data.

Parameters:

path (str) – Base64-encoded string

Returns:

MD5 hash of decoded data

Raises:

FileNotFoundError – If invalid base64

Return type:

str | None
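
Hashing the decoded data, rather than the base64 string, can be sketched as below. Returning a hexadecimal digest is an assumption of this sketch, and `md5_of_base64` is a hypothetical helper, not part of the backend.

```python
import base64
import hashlib


def md5_of_base64(encoded: str) -> str:
    """Decode the base64 string, then hash the decoded bytes."""
    data = base64.b64decode(encoded, validate=True)
    return hashlib.md5(data).hexdigest()


digest = md5_of_base64('SGVsbG8gV29ybGQ=')  # base64 of "Hello World"
print(digest)
```

Hashing the decoded bytes means the digest matches that of the same content stored on any other backend.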

local_path(path, mode='r')[source]

Get local filesystem path for base64 data.

Creates a temporary file with the decoded base64 content. Since Base64Backend is read-only, write modes are not supported.

Parameters:
  • path (str) – Base64-encoded string

  • mode (str) – Access mode (only ‘r’ is supported)

Returns:

Context manager yielding str (temp file path)

Raises:

Examples

>>> # Use base64 data with external tool
>>> import subprocess
>>> node = storage.node('b64:SGVsbG8gV29ybGQ=')
>>> with node.local_path() as path:
...     subprocess.run(['cat', path])
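
The temporary-file behavior described for local_path can be approximated with a small context manager. This is a sketch under stated assumptions: `decoded_temp_file` is a hypothetical name, and removing the file on exit is assumed rather than documented.

```python
import base64
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def decoded_temp_file(encoded: str) -> Iterator[str]:
    """Write the decoded data to a temp file and yield its path."""
    data = base64.b64decode(encoded, validate=True)
    fd, tmp_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as handle:
            handle.write(data)
        yield tmp_path
    finally:
        # Assumption: the temporary file is removed when the block exits.
        os.remove(tmp_path)


with decoded_temp_file('SGVsbG8gV29ybGQ=') as tmp_path:
    with open(tmp_path, 'rb') as handle:
        content = handle.read()
    existed_inside = os.path.exists(tmp_path)

still_exists = os.path.exists(tmp_path)
```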

PROTOCOL_CAPABILITIES: dict[str, set[str]] = {'base64': {'read', 'seek_support'}}

Fsspec Backend