API Reference

This page provides the complete API documentation for genro-storage.

StorageManager

class genro_storage.StorageManager

Bases: object

Main entry point for configuring and accessing storage.

StorageManager is responsible for:

Configuring mount points that map to storage backends
Creating StorageNode instances for file/directory access
Managing the lifecycle of storage backend connections

A mount point is a logical name (e.g., “home”, “uploads”, “s3”) that maps to an actual storage backend (local filesystem, S3 bucket, etc.).

Examples

>>> # Create manager
>>> storage = StorageManager()
>>>
>>> # Configure from file
>>> storage.configure('/etc/app/storage.yaml')
>>>
>>> # Configure programmatically
>>> storage.configure([
...     {'name': 'home', 'type': 'local', 'path': '/home/user'},
...     {'name': 'uploads', 'type': 's3', 'bucket': 'my-bucket'}
... ])
>>>
>>> # Access files
>>> node = storage.node('home:documents/report.pdf')
>>> content = node.read_text()

__init__()

Initialize a new StorageManager with no configured mounts.

After initialization, you must call configure() to set up mount points before you can access any files.

Examples

>>> from genro_storage import StorageManager
>>> storage = StorageManager()

configure(source)

Configure mount points from various sources.

This method can be called multiple times. If a mount with the same name already exists, it will be replaced with the new configuration.

Parameters:

source (str | list[dict[str, Any]]) – Configuration source, one of:
- str: path to a YAML or JSON configuration file
- list[dict]: list of mount configurations

Raises:

FileNotFoundError – If configuration file doesn’t exist
StorageConfigError – If configuration format is invalid
TypeError – If source is neither str nor list

Configuration Dictionary Format:

Each mount configuration dict must have:

name (str, required): Mount point name (e.g., “home”, “uploads”)
type (str, required): Backend type (“local”, “s3”, “gcs”, “azure”, “http”, “memory”)
Additional fields depend on type (see examples below)

Examples

Local Storage:

>>> storage.configure([{
...     'name': 'home',
...     'type': 'local',
...     'path': '/home/user'  # required: absolute path
... }])

S3 Storage:

>>> storage.configure([{
...     'name': 'uploads',
...     'type': 's3',
...     'bucket': 'my-bucket',    # required
...     'prefix': 'uploads/',     # optional, default: ""
...     'region': 'eu-west-1',    # optional
...     'anon': False             # optional, default: False
... }])

GCS Storage:

>>> storage.configure([{
...     'name': 'backups',
...     'type': 'gcs',
...     'bucket': 'my-backups',   # required
...     'prefix': '',             # optional
...     'token': 'path/to/service-account.json'  # optional
... }])

Azure Blob Storage:

>>> storage.configure([{
...     'name': 'archive',
...     'type': 'azure',
...     'container': 'archives',      # required
...     'account_name': 'myaccount',  # required
...     'account_key': '...'          # optional if using managed identity
... }])

HTTP Storage (read-only):

>>> storage.configure([{
...     'name': 'cdn',
...     'type': 'http',
...     'base_url': 'https://cdn.example.com'  # required
... }])

Memory Storage (for testing):

>>> storage.configure([{
...     'name': 'test',
...     'type': 'memory'
... }])

From YAML File:

# storage.yaml
- name: home
  type: local
  path: /home/user

- name: uploads
  type: s3
  bucket: my-app-uploads
  region: eu-west-1

>>> storage.configure('/etc/app/storage.yaml')

From JSON File:

[
  {
    "name": "home",
    "type": "local",
    "path": "/home/user"
  },
  {
    "name": "uploads",
    "type": "s3",
    "bucket": "my-app-uploads",
    "region": "eu-west-1"
  }
]

>>> storage.configure('./config/storage.json')

Multiple Calls (a mount with the same name is replaced; other mounts are kept):

>>> storage.configure([{'name': 'home', 'type': 'local', 'path': '/home/user'}])
>>> storage.configure([{'name': 'uploads', 'type': 's3', 'bucket': 'my-bucket'}])
>>> # Now both 'home' and 'uploads' are configured
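The replace-by-name behavior of configure() and add_mount() can be modeled as a dictionary keyed by mount name. The sketch below is a hypothetical stand-in (`MountRegistry` is not part of genro-storage): it only reads JSON files, since YAML would need a third-party parser, and it raises ValueError where the library raises StorageConfigError.

```python
from __future__ import annotations

import json
from typing import Any


class MountRegistry:
    """Hypothetical model of mount configuration; not the library's class."""

    def __init__(self) -> None:
        self.mounts: dict[str, dict[str, Any]] = {}

    def configure(self, source: str | list[dict[str, Any]]) -> None:
        if isinstance(source, str):
            # Only JSON files are handled in this sketch.
            with open(source, encoding="utf-8") as f:
                source = json.load(f)
        elif not isinstance(source, list):
            raise TypeError("source must be a file path or a list of dicts")
        for config in source:
            self.add_mount(config)

    def add_mount(self, config: dict[str, Any]) -> None:
        for field in ("name", "type"):
            if field not in config:
                # Stand-in for StorageConfigError.
                raise ValueError(f"mount config is missing {field!r}")
        # Same name: the previous entry is overwritten; other mounts are kept.
        self.mounts[config["name"]] = dict(config)

    def get_mount_names(self) -> list[str]:
        return list(self.mounts)
```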

add_mount(config)

Add or update a single mount point.

If a mount with the same name already exists, it will be replaced.

Parameters:: config (dict[str, Any]) – Mount configuration dictionary with 'name' and 'type' fields
Raises:: StorageConfigError – If configuration is invalid

Examples

>>> storage.add_mount({
...     'name': 'uploads',
...     'type': 's3',
...     'bucket': 'my-bucket'
... })

delete_mount(name)

Delete a mount point.

Parameters:: name (str) – Name of the mount point to remove
Raises:: KeyError – If mount point doesn’t exist

Examples

>>> storage.delete_mount('uploads')

node(mount_or_path=None, *path_parts, version=None)

Create a StorageNode pointing to a file or directory.

This is the primary way to access files and directories. The path uses a mount:path format where the mount name refers to a configured storage backend.

When called without arguments, creates a dummy/accumulator node that can be used to build content from multiple sources.

Parameters:

mount_or_path (str | None) – Either:
- Full path with mount: 'mount:path/to/file'
- Just the mount name: 'mount'
- None: creates a dummy accumulator node (no storage backend)
*path_parts (str) – Additional path components to join
version (int | str | None) – Optional version specifier for versioned storage (S3, GCS). If specified, creates a read-only snapshot node of that version. An int is a version index (-1 = latest, -2 = previous); a str is a version_id.

Returns:

A new StorageNode instance

Return type:

StorageNode

Raises:

KeyError – If mount point doesn’t exist (wrapped as StorageNotFoundError)
ValueError – If path format is invalid

Path Normalization:

Multiple slashes collapsed: “a//b” → “a/b”
Leading/trailing slashes stripped
No support for “..” (parent directory) - raises ValueError
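The three normalization rules above fit in a few lines. This is a minimal sketch of the documented behavior, not genro-storage's implementation; `normalize_path` is a hypothetical helper name.

```python
def normalize_path(path: str) -> str:
    """Apply the documented rules to the part after 'mount:'."""
    # Splitting on "/" and dropping empty segments collapses "a//b"
    # and strips leading/trailing slashes in one pass.
    segments = [s for s in path.split("/") if s]
    if ".." in segments:
        raise ValueError(f"parent directory traversal not allowed: {path!r}")
    return "/".join(segments)
```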

Examples

Full path in one string:

>>> node = storage.node('home:documents/report.pdf')

Mount + path parts:

>>> node = storage.node('home', 'documents', 'report.pdf')

Mix styles:

>>> node = storage.node('home:documents', 'reports', 'q4.pdf')

Dynamic composition:

>>> user_id = '123'
>>> year = '2024'
>>> node = storage.node('uploads', 'users', user_id, year, 'avatar.jpg')
>>> # Result: uploads:users/123/2024/avatar.jpg
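The composition rule used in these examples (split at the first colon, then join the remaining pieces with '/') can be sketched as follows. `build_fullpath` is hypothetical: it only computes the resulting string and does not check that the mount exists.

```python
def build_fullpath(mount_or_path: str, *path_parts: str) -> str:
    # "home:documents" -> ("home", "documents"); a bare "home" -> ("home", "")
    mount, _, base = mount_or_path.partition(":")
    # Flatten the base path and extra parts into segments, dropping empty
    # ones so duplicate or leading/trailing slashes disappear.
    segments = [s for piece in (base, *path_parts) for s in piece.split("/") if s]
    if ".." in segments:
        raise ValueError("parent directory traversal not allowed")
    return f"{mount}:{'/'.join(segments)}"
```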

Just mount (root of storage):

>>> node = storage.node('home')
>>> # Result: home:

Path with special characters:

>>> # Spaces and unicode are OK
>>> node = storage.node('home:My Documents/Café Menu.pdf')

Invalid paths (will raise ValueError):

>>> # Parent directory traversal not allowed
>>> node = storage.node('home:documents/../etc/passwd')  # ValueError

Dummy node (accumulator):

>>> dummy = storage.node()  # No parameters
>>> dummy.append(node1)
>>> dummy.extend(node2, node3)
>>> dummy.read_text()  # Concatenates all sources

iternode(*nodes)

Create a virtual node that concatenates multiple nodes lazily.

This creates a virtual node (no physical storage) that accumulates references to other nodes. Content is only read when materialized via read_text(), read_bytes(), copy(), or zip().

Parameters:: *nodes – StorageNode instances to concatenate
Returns:: Virtual node with concatenation capability
Return type:: StorageNode

Examples

>>> # Create from existing nodes
>>> n1 = storage.node('mem:part1.txt')
>>> n2 = storage.node('mem:part2.txt')
>>> combined = storage.iternode(n1, n2)
>>>
>>> # Read concatenated content
>>> content = combined.read_text()
>>>
>>> # Add more nodes
>>> n3 = storage.node('mem:part3.txt')
>>> combined.append(n3)
>>>
>>> # Save to file
>>> result = storage.node('mem:result.txt')
>>> combined.copy(result)
>>>
>>> # Create ZIP
>>> zip_bytes = combined.zip()
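The lazy behavior described above can be illustrated without any storage backend: the virtual node keeps references and reads nothing until it is materialized. In this hypothetical sketch, zero-argument callables stand in for StorageNode references, and parts are joined with no separator; how genro-storage joins parts is not specified here.

```python
from typing import Callable, List

Reader = Callable[[], str]


class LazyConcat:
    """Hypothetical stand-in for a virtual iternode."""

    def __init__(self, *readers: Reader) -> None:
        # Only references are stored; nothing is read yet.
        self._readers: List[Reader] = list(readers)

    def append(self, reader: Reader) -> None:
        self._readers.append(reader)

    def extend(self, *readers: Reader) -> None:
        self._readers.extend(readers)

    def read_text(self) -> str:
        # Materialization point: every source is read now, in order.
        return "".join(reader() for reader in self._readers)
```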

diffnode(node1, node2)

Create a virtual node that generates a diff between two nodes.

This creates a virtual node that generates a unified diff between two text files. The diff is only computed when materialized via read_text() or copy().

Parameters:

node1 (StorageNode) – First node (old version)
node2 (StorageNode) – Second node (new version)

Returns:

Virtual node with diff capability

Return type:

StorageNode

Raises:

ValueError – If nodes contain binary data

Examples

>>> # Compare two versions
>>> v1 = storage.node('mem:config_v1.txt')
>>> v2 = storage.node('mem:config_v2.txt')
>>> diff = storage.diffnode(v1, v2)
>>>
>>> # Read diff
>>> changes = diff.read_text()
>>>
>>> # Save diff to file
>>> diff_file = storage.node('mem:changes.diff')
>>> diff.copy(diff_file)
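The unified diff produced when a diffnode is materialized can be approximated with Python's difflib. In this sketch, treating content that fails UTF-8 decoding as binary is an assumption; the library's binary detection is not described above.

```python
import difflib


def unified_diff_text(old: bytes, new: bytes,
                      old_name: str = "old", new_name: str = "new") -> str:
    try:
        old_text = old.decode("utf-8")
        new_text = new.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Undecodable content is treated as binary (assumption of this sketch).
        raise ValueError("cannot diff binary data") from exc
    diff_lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff_lines)
```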

get_mount_names()

Get list of configured mount names.

Returns:: List of mount point names
Return type:: list[str]

Examples

>>> storage.configure([
...     {'name': 'home', 'type': 'local', 'path': '/home/user'},
...     {'name': 'uploads', 'type': 's3', 'bucket': 'my-bucket'}
... ])
>>> print(storage.get_mount_names())
['home', 'uploads']

has_mount(name)

Check if a mount point is configured.

Parameters:: name (str) – Mount point name to check
Returns:: True if mount exists
Return type:: bool

Examples

>>> if storage.has_mount('uploads'):
...     node = storage.node('uploads:file.txt')
... else:
...     print("Uploads storage not configured")

__repr__()

String representation for debugging.

StorageNode

class genro_storage.StorageNode(manager, mount_name, path, version=None)

Bases: object

Represents a file or directory in a storage backend.

StorageNode provides a unified interface for file operations across different storage backends (local, S3, GCS, Azure, HTTP, etc.).

Note

Users should not instantiate StorageNode directly. Use StorageManager.node() instead.

The node can represent either a file or a directory. Use the properties isfile and isdir to determine the type.

Examples

>>> # Get a node via StorageManager
>>> node = storage.node('home:documents/report.pdf')
>>>
>>> # Check if it exists
>>> if node.exists:
...     print(f"File size: {node.size} bytes")
>>>
>>> # Read content
>>> content = node.read_text()
>>>
>>> # Write content
>>> node.write_text("Hello World")

fullpath: Full path including mount point (e.g., “home:documents/file.txt”)

exists: True if file or directory exists

isfile: True if node points to a file

isdir: True if node points to a directory

size: File size in bytes

mtime: Last modification time as Unix timestamp

basename: Filename with extension

stem: Filename without extension

suffix: File extension including dot

parent: Parent directory as StorageNode

__init__(manager, mount_name, path, version=None)

Initialize a StorageNode.

Parameters:

manager (StorageManager) – The StorageManager instance that owns this node
mount_name (str | None) – Name of the mount point (e.g., “home”, “uploads”), or None for dummy node
path (str | None) – Relative path within the mount (e.g., “documents/file.txt”), or None for dummy node
version (int | str | None) – Optional version specifier for versioned storage. If set, the node becomes a read-only snapshot of that version.

Note

This should not be called directly. Use StorageManager.node() instead.

property fullpath: str

Full path including mount point.

Returns:: Full path in format “mount:path/to/file”
Return type:: str

Examples

>>> node = storage.node('home:documents/report.pdf')
>>> node.fullpath
'home:documents/report.pdf'

property path: str

Relative path within the mount.

Returns:: Path relative to mount point (without mount prefix)
Return type:: str

Examples

>>> node = storage.node('home:documents/report.pdf')
>>> node.path
'documents/report.pdf'

>>> # For base64 backend, this is the base64-encoded content
>>> node = storage.node('b64:SGVsbG8=')
>>> node.path
'SGVsbG8='

property exists: bool

True if file or directory exists.

Returns:

True if the file or directory exists on the storage backend. Virtual nodes always return False.

Return type:

bool

Examples

>>> if node.exists:
...     print("File exists!")
... else:
...     print("File not found")

property isfile: bool

True if node points to a file.

Returns:: True if this node is a file, False if directory or doesn’t exist
Return type:: bool

Examples

>>> if node.isfile:
...     data = node.read_bytes()

property isdir: bool

True if node points to a directory.

Returns:: True if this node is a directory, False if file or doesn’t exist
Return type:: bool

Examples

>>> if node.isdir:
...     for child in node.children():
...         print(child.basename)

property size: int

File size in bytes.

Returns:

Size of the file in bytes

Return type:

int

Raises:

FileNotFoundError – If file doesn’t exist
ValueError – If node is a directory (directories don’t have size)

Examples

>>> print(f"File size: {node.size} bytes")
>>> print(f"File size: {node.size / 1024:.1f} KB")

property mtime: float

Last modification time as Unix timestamp.

Returns:: Unix timestamp of last modification time
Return type:: float

Examples

>>> from datetime import datetime
>>> mod_time = datetime.fromtimestamp(node.mtime)
>>> print(f"Modified: {mod_time}")

property basename: str

Filename with extension.

Returns:: The filename including extension
Return type:: str

Examples

>>> node = storage.node('home:documents/report.pdf')
>>> node.basename
'report.pdf'

property stem: str

Filename without extension.

Returns:: The filename without extension
Return type:: str

Examples

>>> node = storage.node('home:documents/report.pdf')
>>> node.stem
'report'

property suffix: str

File extension including dot.

Returns:: The file extension including the leading dot (e.g., “.pdf”)
Return type:: str

Examples

>>> node = storage.node('home:documents/report.pdf')
>>> node.suffix
'.pdf'

property parent: StorageNode

Parent directory as StorageNode.

Returns:: A new StorageNode pointing to the parent directory
Return type:: StorageNode

Examples

>>> node = storage.node('home:documents/reports/q4.pdf')
>>> parent = node.parent
>>> parent.fullpath
'home:documents/reports'
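For the path-derived properties (basename, stem, suffix, parent), the values in the examples above match what Python's pathlib.PurePosixPath computes on the part after the mount prefix. The helper below is a hypothetical sketch for comparison; mapping a root-level file's parent to 'home:' is an assumption, and genro-storage may treat edge cases such as multi-dot names ('archive.tar.gz') differently.

```python
from pathlib import PurePosixPath
from typing import Dict


def name_parts(fullpath: str) -> Dict[str, str]:
    """Derive basename/stem/suffix/parent from a 'mount:path' string."""
    mount, _, path = fullpath.partition(":")
    p = PurePosixPath(path)
    parent = str(p.parent)
    return {
        "basename": p.name,
        "stem": p.stem,
        "suffix": p.suffix,
        # PurePosixPath reports "." when there is no parent directory;
        # this sketch maps that to the mount root ("home:").
        "parent": f"{mount}:{'' if parent == '.' else parent}",
    }
```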

property dirname: str

Parent directory fullpath as string.

Convenience property that returns the fullpath of the parent directory as a string, equivalent to parent.fullpath.

Returns:: Parent directory fullpath (e.g., ‘home:documents/reports’)
Return type:: str

Examples

>>> node = storage.node('home:documents/reports/q4.pdf')
>>> node.dirname
'home:documents/reports'
>>>
>>> # Compare with parent property
>>> node.parent.fullpath
'home:documents/reports'
>>> # dirname is a shortcut for the above

property ext: str

File extension without leading dot.

Convenience property for getting the file extension without the dot prefix, which is more convenient for comparisons and type checking than suffix.

Returns:: Extension without dot (e.g., ‘pdf’, ‘txt’), or empty string if no extension
Return type:: str

Examples

>>> node = storage.node('home:documents/report.pdf')
>>> node.ext
'pdf'
>>> node.suffix  # Compare with suffix
'.pdf'
>>>
>>> # More convenient for comparisons
>>> if node.ext == 'pdf':
...     process_pdf(node)
>>>
>>> # Instead of remembering the dot
>>> if node.suffix == '.pdf':
...     process_pdf(node)

splitext()

Split path into filename and extension.

Similar to os.path.splitext(), returns a tuple of (filename, extension). The filename is the node's path within the mount (without the mount prefix) minus the extension; the extension includes the leading dot.

Returns:: (filename, extension) where extension includes the dot
Return type:: tuple[str, str]

Examples

>>> node = storage.node('home:documents/report.pdf')
>>> name, ext = node.splitext()
>>> name
'documents/report'
>>> ext
'.pdf'
>>>
>>> # Useful for renaming with different extension
>>> name, _ = node.splitext()
>>> new_path = f'{name}.docx'
>>> new_node = storage.node(f'home:{new_path}')
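splitext() and ext follow the same convention as os.path.splitext() applied to POSIX-style paths. A hypothetical sketch of both, plus the rename-with-new-extension pattern from the example, using posixpath directly (`split_and_ext` and `with_extension` are illustrative names, not library functions):

```python
import posixpath


def split_and_ext(path: str) -> tuple[str, str, str]:
    """Return (root, suffix, ext) for a mount-relative path."""
    root, suffix = posixpath.splitext(path)
    # ext is the suffix without its leading dot ("" when there is none).
    return root, suffix, suffix[1:]


def with_extension(path: str, new_ext: str) -> str:
    root, _ = posixpath.splitext(path)
    # Accept "docx" or ".docx" for convenience.
    return f"{root}.{new_ext.lstrip('.')}"
```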

property ext_attributes: tuple[float | None, int | None, bool]

Commonly-used file attributes as a tuple.

Convenience property for getting (mtime, size, isdir) together in one call. Returns None values if file doesn’t exist. Size is None for directories.

Returns:

(mtime, size, isdir) where:

mtime: Modification time as Unix timestamp or None
size: File size in bytes or None (None for directories)
isdir: True if directory, False otherwise

Return type:

tuple

Examples

>>> node = storage.node('home:document.pdf')
>>> mtime, size, isdir = node.ext_attributes
>>> if mtime and size:
...     print(f'File: {size} bytes, modified at {mtime}')
>>>
>>> # More concise than
>>> mtime = node.mtime
>>> size = node.size
>>> isdir = node.isdir
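On a local filesystem, the same (mtime, size, isdir) tuple can be derived from a single os.stat() call. This is a local-only analogue for illustration; `ext_attributes_local` is hypothetical, and the library's remote backends obtain these values in their own way.

```python
from __future__ import annotations

import os
import stat


def ext_attributes_local(path: str) -> tuple[float | None, int | None, bool]:
    try:
        st = os.stat(path)
    except FileNotFoundError:
        # Missing file: no mtime, no size, not a directory.
        return None, None, False
    is_dir = stat.S_ISDIR(st.st_mode)
    # Size is reported only for files, as documented above.
    return st.st_mtime, (None if is_dir else st.st_size), is_dir
```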

property md5hash: str

MD5 hash of file content.

For cloud storage (S3, GCS, Azure), retrieves hash from metadata (fast). For local storage, computes hash by reading file in blocks (slower).

Returns:

MD5 hash as lowercase hexadecimal string (32 characters)

Return type:

str

Raises:

FileNotFoundError – If file doesn’t exist
ValueError – If node is a directory

Examples

>>> hash1 = node1.md5hash
>>> hash2 = node2.md5hash
>>> if hash1 == hash2:
...     print("Files have identical content")
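The local-storage path described above (hashing the file block by block) looks roughly like this with hashlib. `md5_of_file` and its 64 KiB default block size are illustrative choices, not the library's.

```python
import hashlib


def md5_of_file(path: str, block_size: int = 64 * 1024) -> str:
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Feed fixed-size blocks so large files never sit fully in memory.
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()  # 32 lowercase hex characters
```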

property mimetype: str

Get MIME type from file extension.

Uses Python’s mimetypes module to guess the MIME type based on the file extension. Returns ‘application/octet-stream’ if type cannot be determined.

Returns:: MIME type string (e.g., ‘image/png’, ‘application/pdf’)
Return type:: str

Examples

>>> jpg = storage.node('photos:image.jpg')
>>> jpg.mimetype
'image/jpeg'
>>>
>>> pdf = storage.node('documents:report.pdf')
>>> pdf.mimetype
'application/pdf'
>>>
>>> # Use for HTTP responses
>>> response.headers['Content-Type'] = node.mimetype
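The documented fallback maps directly onto mimetypes.guess_type(), which returns None as the type for unknown extensions. A minimal sketch (`guess_mimetype` is a hypothetical name):

```python
import mimetypes


def guess_mimetype(filename: str) -> str:
    # guess_type() returns (type, encoding); type is None when unknown.
    mime, _encoding = mimetypes.guess_type(filename)
    return mime or "application/octet-stream"
```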

property capabilities

Get capabilities of underlying backend.

Returns backend capabilities which describe what features are supported, such as versioning, metadata, presigned URLs, etc.

If this node is a versioned snapshot (created with version parameter), the versioning capabilities are disabled since the node is read-only.

Returns:: Object describing supported features
Return type:: BackendCapabilities

Examples

>>> if node.capabilities.versioning:
...     versions = node.versions
>>> if node.capabilities.presigned_urls:
...     url = node.get_presigned_url()

open(mode='r', version=None, as_of=None)

Open file with optional version control support.

Parameters:

mode (str) – File mode (‘r’, ‘rb’, ‘w’, ‘wb’, ‘a’, ‘ab’)
version (int | str | None) – Version to open:
- None: latest version (default)
- str: specific version_id (e.g., 'abc123…')
- int: version index, with negative indexing:
  - -1: latest version
  - -2: previous version
  - 0: oldest version
  - 1: second-oldest version
as_of (datetime | None) – Open the file as it was at this datetime

Returns:

File-like object (context manager)

Return type:

BinaryIO | TextIO

Raises:

ValueError – If both version and as_of provided, or invalid mode for historical versions
IndexError – If version index out of range
FileNotFoundError – If no version found for as_of date
PermissionError – If backend doesn’t support versioning

Examples

>>> # Latest version
>>> with node.open() as f:
...     data = f.read()

>>> # Previous version (negative index)
>>> with node.open(version=-2) as f:
...     previous = f.read()

>>> # Specific version by ID
>>> with node.open(version='abc123xyz') as f:
...     old_content = f.read()

>>> # Version at date
>>> from datetime import datetime
>>> with node.open(as_of=datetime(2024, 1, 15)) as f:
...     historical = f.read()
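The version-selection rules for open() (None, a str version_id, an int index with negative indexing, or as_of) can be modeled over a list of versions ordered oldest to newest. In this sketch `Version` and `resolve_version` are hypothetical, and raising FileNotFoundError for an unknown version_id is an assumption, since the docs above do not say what happens in that case.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class Version:
    version_id: str
    modified: datetime


def resolve_version(versions: list[Version], version: int | str | None = None,
                    as_of: datetime | None = None) -> Version:
    """Pick one entry from versions, which must be ordered oldest -> newest."""
    if version is not None and as_of is not None:
        raise ValueError("pass either version or as_of, not both")
    if as_of is not None:
        # Latest version written at or before the requested moment.
        eligible = [v for v in versions if v.modified <= as_of]
        if not eligible:
            raise FileNotFoundError(f"no version found for {as_of}")
        return eligible[-1]
    if version is None:
        return versions[-1]  # latest
    if isinstance(version, int):
        # Python list indexing: -1 latest, -2 previous, 0 oldest.
        # Out-of-range indexes raise IndexError.
        return versions[version]
    for v in versions:
        if v.version_id == version:
            return v
    # Assumption: the docs do not specify the error for an unknown id.
    raise FileNotFoundError(f"unknown version_id {version!r}")
```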

read(mode='r', encoding='utf-8')

Read file content in text or binary mode.

Parameters:

mode (str) – Read mode: 'r' for text (default), 'rb' for binary
encoding (str) – Text encoding (used only in text mode)

Returns:

File content as text or bytes depending on mode

Return type:

str | bytes

Raises:

FileNotFoundError – If file doesn’t exist
ValueError – If mode is invalid

Examples

>>> # Read as text (default)
>>> content = node.read()
>>> content = node.read(mode='r')
>>>
>>> # Read as binary
>>> data = node.read(mode='rb')

write(data, mode='w', encoding='utf-8', skip_if_unchanged=False)

Write data to file in text or binary mode.

Parameters:

data (str | bytes) – Data to write (str for text mode, bytes for binary mode)
mode (str) – Write mode: 'w' for text (default), 'wb' for binary
encoding (str) – Text encoding (used only in text mode)
skip_if_unchanged (bool) – If True, skip writing when the content is identical

Returns:

True if written, False if skipped

Return type:

bool

Raises:

TypeError – If data type doesn’t match mode
ValueError – If mode is invalid

Examples

>>> # Write text (default)
>>> node.write('Hello World')
>>> node.write('Hello', mode='w')
>>>
>>> # Write binary
>>> node.write(b'binary data', mode='wb')
>>>
>>> # Skip if unchanged
>>> written = node.write('content', skip_if_unchanged=True)
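The mode checks and the skip_if_unchanged return value described above can be illustrated with an in-memory file. `MemoryFile` is hypothetical; it compares raw bytes, whereas a real backend might compare hashes or metadata instead.

```python
from __future__ import annotations


class MemoryFile:
    """Hypothetical in-memory file illustrating write() semantics."""

    def __init__(self) -> None:
        self.content: bytes | None = None

    def write(self, data: str | bytes, mode: str = "w",
              encoding: str = "utf-8", skip_if_unchanged: bool = False) -> bool:
        if mode == "w":
            if not isinstance(data, str):
                raise TypeError("mode 'w' requires str data")
            payload = data.encode(encoding)
        elif mode == "wb":
            if not isinstance(data, bytes):
                raise TypeError("mode 'wb' requires bytes data")
            payload = data
        else:
            raise ValueError(f"invalid write mode: {mode!r}")
        if skip_if_unchanged and self.content == payload:
            return False  # identical content: nothing written
        self.content = payload
        return True
```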

read_text(encoding='utf-8')

Read file content as text.

Convenience method equivalent to read(mode='r', encoding=encoding). Compatible with the pathlib.Path API.

Parameters:: encoding (str) – Text encoding (default: 'utf-8')
Returns:: File content as text
Return type:: str
Raises:: FileNotFoundError – If file doesn’t exist

Examples

>>> content = node.read_text()
>>> content = node.read_text(encoding='latin-1')

read_bytes()

Read file content as bytes.

Convenience method equivalent to read(mode='rb'). Compatible with the pathlib.Path API.

Returns:: File content as bytes
Return type:: bytes
Raises:: FileNotFoundError – If file doesn’t exist

Examples

>>> data = node.read_bytes()

write_text(text, encoding='utf-8', skip_if_unchanged=False)

Write text content to file.

Convenience method equivalent to write(text, mode='w', encoding=encoding, skip_if_unchanged=skip_if_unchanged). Compatible with the pathlib.Path API.

Parameters:

text (str) – Text content to write
encoding (str) – Text encoding (default: 'utf-8')
skip_if_unchanged (bool) – Skip write if content identical (default: False)

Returns:

True if file was written, False if skipped

Return type:

bool

Raises:

TypeError – If text is not str
ValueError – If node is a versioned snapshot (read-only)

Examples

>>> node.write_text("Hello World")
>>> node.write_text("Content", encoding='latin-1')
>>> written = node.write_text("New", skip_if_unchanged=True)

write_bytes(data, skip_if_unchanged=False)

Write binary content to file.

Convenience method equivalent to write(data, mode='wb', skip_if_unchanged=skip_if_unchanged). Compatible with the pathlib.Path API.

Parameters:

data (bytes) – Binary content to write
skip_if_unchanged (bool) – Skip write if content identical (default: False)

Returns:

True if file was written, False if skipped

Return type:

bool

Raises:

TypeError – If data is not bytes
ValueError – If node is a versioned snapshot (read-only)

Examples

>>> node.write_bytes(b"Binary data")
>>> written = node.write_bytes(data, skip_if_unchanged=True)

delete()

Delete file or directory.

copy_to(dest, include=None, exclude=None, filter=None, skip='never', skip_fn=None, progress=None, on_file=None, on_skip=None)

Copy file or directory to destination with filtering and skip logic.

Supports filtering which files to copy (source-based) and skipping existing files (destination-based) for efficient incremental backups.

Filtering (applied to source files):

‘include’: Glob patterns for files to include (whitelist)
‘exclude’: Glob patterns for files to exclude (blacklist)
‘filter’: Custom function(node, relpath) -> bool
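The order in which the three filters apply (include first, then exclude, then the custom function) can be sketched over a list of relative paths. Assumptions: patterns are matched with fnmatch.fnmatchcase, where '*' also matches '/' (unlike shell globbing), and the custom filter here receives only the relative path rather than (node, relpath). The library's matcher may behave differently; the parameter name `filter` mirrors copy_to() even though it shadows the builtin.

```python
from fnmatch import fnmatchcase
from typing import Callable, Iterable, List, Optional, Union

Patterns = Union[str, List[str], None]


def _as_list(patterns: Patterns) -> List[str]:
    if patterns is None:
        return []
    return [patterns] if isinstance(patterns, str) else list(patterns)


def select_files(relpaths: Iterable[str], include: Patterns = None,
                 exclude: Patterns = None,
                 filter: Optional[Callable[[str], bool]] = None) -> List[str]:
    inc, exc = _as_list(include), _as_list(exclude)
    selected = []
    for rel in relpaths:
        # 1. include is a whitelist: when given, a path must match a pattern.
        if inc and not any(fnmatchcase(rel, pat) for pat in inc):
            continue
        # 2. exclude is a blacklist applied to whatever include let through.
        if any(fnmatchcase(rel, pat) for pat in exc):
            continue
        # 3. The custom predicate runs last.
        if filter is not None and not filter(rel):
            continue
        selected.append(rel)
    return selected
```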

Skip strategies (applied to destination files):

‘never’: Always copy (overwrite existing files) - default
‘exists’: Skip if destination file exists (fastest)
‘size’: Skip if destination exists and has same size (fast)
‘hash’: Skip if destination exists and has same content/MD5 (accurate)
‘custom’: Use custom skip function
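The skip strategies reduce to a per-file decision made against the destination. This hypothetical sketch uses a small `FileInfo` record in place of StorageNode; treating a missing destination as "always copy" follows from the strategy descriptions, while passing dest=None to a custom function is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FileInfo:
    size: int
    md5: str


SkipFn = Callable[[FileInfo, Optional[FileInfo]], bool]


def should_skip(strategy: str, src: FileInfo, dest: Optional[FileInfo],
                skip_fn: Optional[SkipFn] = None) -> bool:
    """Return True when copying src can be skipped; dest is None if absent."""
    if strategy == "never":
        return False
    if strategy == "custom":
        if skip_fn is None:
            raise ValueError("skip='custom' requires skip_fn")
        return skip_fn(src, dest)
    if strategy not in ("exists", "size", "hash"):
        raise ValueError(f"unknown skip strategy: {strategy!r}")
    if dest is None:
        return False  # nothing at the destination yet: always copy
    if strategy == "exists":
        return True
    if strategy == "size":
        return src.size == dest.size
    return src.md5 == dest.md5  # "hash"
```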

Parameters:

dest (StorageNode | str) – Destination node or path string
include (str | list[str] | None) – Glob pattern(s) for files to include. If specified, only matching files are copied (whitelist mode). Can be string or list of strings.
exclude (str | list[str] | None) – Glob pattern(s) for files to exclude. Applied after include. Can be string or list of strings.
filter (Callable[[StorageNode, str], bool] | None) – Custom filter function(node, relative_path) -> bool. Return True to include file, False to exclude. Applied after include/exclude patterns.
skip (SkipStrategy | Literal['never', 'exists', 'size', 'hash', 'custom']) – Skip strategy (default: ‘never’ = always copy)
skip_fn (Callable[[StorageNode, StorageNode], bool] | None) – Custom skip function(src, dest) -> bool (required if skip='custom')
progress (Callable[[int, int], None] | None) – Callback(current, total) called after each file
on_file (Callable[[StorageNode], None] | None) – Callback(src_node) called after each file copied
on_skip (Callable[[StorageNode, str], None] | None) – Callback(src_node, reason) called when file is skipped

Returns:

Destination StorageNode

Raises:

FileNotFoundError – If source doesn’t exist
ValueError – If skip='custom' but no skip_fn is provided

Return type:

StorageNode

Examples

>>> # Simple copy (overwrite) - default behavior
>>> src.copy_to(dest)
>>>
>>> # Copy only Python files
>>> src.copy_to(dest, include='*.py')
>>>
>>> # Copy all except logs and temp files
>>> src.copy_to(dest, exclude=['*.log', '*.tmp', '__pycache__/**'])
>>>
>>> # Combine include and exclude
>>> src.copy_to(dest, include='*.py', exclude='test_*.py')
>>>
>>> # Custom filter: only files smaller than 10MB
>>> src.copy_to(dest, filter=lambda node, path: node.size < 10_000_000)
>>>
>>> # Filter by modification time
>>> from datetime import datetime, timedelta
>>> cutoff = datetime.now() - timedelta(days=7)
>>> src.copy_to(dest, filter=lambda n, p: n.mtime > cutoff.timestamp())
>>>
>>> # Combine filtering and skip strategy
>>> src.copy_to(dest,
...             include=['*.py', '*.json'],
...             exclude='__pycache__/**',
...             skip='hash')  # Skip if content identical
>>>
>>> # Full-featured backup with tracking
>>> src.copy_to(dest,
...             exclude=['*.log', '*.tmp', 'node_modules/**'],
...             filter=lambda n, p: n.size < 100_000_000,
...             skip='hash',
...             progress=lambda c, t: print(f"{c}/{t}"))

Performance Notes:

Filtering is applied before copying (saves bandwidth)
skip='exists': ~1-2ms per file (existence check only)
skip='size': ~2-5ms per file (existence check + size read)
skip='hash':
  S3/GCS: ~5-10ms per file (ETag from metadata, fast)
  Local: ~100ms per MB (must read the file to compute MD5)

For cloud storage, ‘hash’ is efficient due to ETag metadata. For local storage, ‘size’ is usually sufficient.

Note

Include/exclude patterns match against relative paths from source
If copying to base64 backend, destination path will be updated
Filtering is source-based (which files to copy)
Skip logic is destination-based (whether to overwrite)

move_to(dest)

Move file/directory to destination.

append(node)

Append a node to this virtual node (iternode only).

This method is only available for virtual nodes created with storage.iternode(). It adds a node reference to the accumulation list. Content is read lazily when materialized.

Parameters:: node (StorageNode) – StorageNode to append
Raises:: ValueError – If not a virtual iternode

Examples

>>> iternode = storage.iternode()
>>> n1 = storage.node('mem:part1.txt')
>>> iternode.append(n1)
>>> content = iternode.read_text()  # Materializes here

extend(*nodes)

Extend this virtual node with multiple nodes (iternode only).

This method is only available for virtual nodes created with storage.iternode(). It adds multiple node references to the accumulation list. Content is read lazily when materialized.

Parameters:: *nodes (StorageNode) – StorageNodes to append
Raises:: ValueError – If not a virtual iternode

Examples

>>> iternode = storage.iternode(n1)
>>> iternode.extend(n2, n3, n4)
>>> content = iternode.read_text()  # Materializes all

zip()

Create ZIP archive from node content.

Behavior depends on node type:

Regular file: creates a ZIP containing that file
Regular directory: creates a ZIP with all files, recursively
Virtual iternode: creates a ZIP with each accumulated node as a separate file

Returns:: ZIP archive as bytes
Return type:: bytes
Raises:: ValueError – If node doesn’t exist (for regular nodes)

Examples

>>> # ZIP a directory
>>> docs = storage.node('home:documents')
>>> zip_bytes = docs.zip()
>>>
>>> # ZIP accumulated files
>>> iternode = storage.iternode(n1, n2, n3)
>>> zip_bytes = iternode.zip()
>>>
>>> # Save ZIP
>>> archive = storage.node('home:backup.zip')
>>> archive.write_bytes(zip_bytes)
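Since zip() returns the archive as bytes, the archive has to be built in memory; zipfile with io.BytesIO does that. The sketch takes explicit archive member names, because how genro-storage names the entries for each node type is not specified above, so `zip_members` is only an illustration.

```python
import io
import zipfile
from typing import Dict


def zip_members(members: Dict[str, bytes]) -> bytes:
    """Build a ZIP archive in memory from {archive_name: content}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    # The archive is complete only after the ZipFile has been closed.
    return buffer.getvalue()
```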

children()

List child nodes (if directory).

child(*parts)

Get a child node by path components.

Parameters:: *parts (str) – Path components to append. Either:
- a single string with path separators: 'aaa/bbb/ccc'
- multiple strings: 'aaa', 'bbb', 'ccc'
Returns:: Child node with combined path
Return type:: StorageNode

Examples

>>> docs = storage.node('home:documents')
>>>
>>> # Single path string
>>> report = docs.child('2024/reports/q4.pdf')
>>>
>>> # Multiple components
>>> report = docs.child('2024', 'reports', 'q4.pdf')
>>>
>>> # Both produce: 'home:documents/2024/reports/q4.pdf'

mkdir(parents=False, exist_ok=False)

Create directory.

local_path(mode='r')

Get local filesystem path for this file.

Returns a context manager that provides a local filesystem path. For local storage, returns the actual path. For remote storage (S3, GCS, etc.), downloads to a temporary file, yields the temp path, and uploads changes on exit.

This is essential for integrating with external tools that only work with local filesystem paths (ffmpeg, ImageMagick, etc.).

Parameters:: mode (str) – Access mode:
- 'r': read-only (download, no upload)
- 'w': write-only (no download, upload on exit)
- 'rw': read-write (download and upload)
Returns:: Context manager yielding str (local filesystem path)

Examples

>>> import subprocess
>>>
>>> # Process video with ffmpeg
>>> video = storage.node('s3:videos/input.mp4')
>>> with video.local_path(mode='r') as path:
...     subprocess.run(['ffmpeg', '-i', path, 'output.mp4'])
>>>
>>> # Modify image in place
>>> image = storage.node('s3:photos/pic.jpg')
>>> with image.local_path(mode='rw') as path:
...     subprocess.run(['convert', path, '-resize', '800x600', path])
>>> # Changes automatically uploaded to S3

Notes

For local storage, returns the actual path (no copy)
For remote storage, uses temporary files
Temporary files are automatically cleaned up on exit
Large files are streamed in chunks to avoid memory issues
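The download/yield/upload cycle for remote storage can be sketched as a context manager over a plain dict acting as the remote backend. Everything here is hypothetical (`local_path_sketch`, the dict store). Unlike the library, it reads whole objects into memory instead of streaming, and it uploads only when the with-block exits without an exception, which is an assumption about error handling.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def local_path_sketch(store: Dict[str, bytes], key: str, mode: str = "r") -> Iterator[str]:
    if mode not in ("r", "w", "rw"):
        raise ValueError(f"invalid mode: {mode!r}")
    tmpdir = tempfile.mkdtemp()
    # Keep the original file name so tools that look at extensions still work.
    path = os.path.join(tmpdir, os.path.basename(key) or "data")
    try:
        if "r" in mode:
            # Download: materialize the remote object as a local file.
            with open(path, "wb") as f:
                f.write(store[key])
        yield path
        if "w" in mode:
            # Upload: reached only if the with-block did not raise.
            with open(path, "rb") as f:
                store[key] = f.read()
    finally:
        # Remove the temporary directory in every case.
        shutil.rmtree(tmpdir, ignore_errors=True)
```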

call(*args, callback=None, async_mode=False, return_output=False, **subprocess_kwargs)

Execute external command with automatic local_path management.

Automatically manages local filesystem paths for StorageNode arguments, downloading from cloud storage as needed and uploading changes after execution. Useful for integrating with external tools like ffmpeg, imagemagick, pandoc, etc.

Parameters:

*args – Command arguments (str or StorageNode). StorageNode arguments are automatically converted to local paths.
callback (Callable[[], None] | None) – Function to call on completion (async mode only)
async_mode (bool) – Run in background thread (default: False)
return_output (bool) – Return subprocess output as string (default: False)
**subprocess_kwargs – Additional arguments passed to subprocess.run() (e.g., cwd, env, timeout, shell, etc.)

Returns:

Command output as a string if return_output=True, otherwise None. In async mode, returns None immediately.

Return type:

str | None

Raises:

subprocess.CalledProcessError – If command exits with non-zero status
FileNotFoundError – If command executable not found

Examples

>>> # Video conversion (cloud storage)
>>> input_video = storage.node('s3:videos/input.mp4')
>>> output_video = storage.node('s3:videos/output.mp4')
>>> input_video.call('ffmpeg', '-i', input_video, '-vcodec', 'h264', output_video)
>>> # Automatically downloads input, uploads output

>>> # Image resize (local storage)
>>> image = storage.node('home:photos/photo.jpg')
>>> image.call('convert', image, '-resize', '800x600', image)

>>> # With callback (async)
>>> video = storage.node('s3:videos/input.mp4')
>>> def on_complete():
...     print("Processing complete!")
>>> video.call('ffmpeg', '-i', video, 'output.mp4',
...           callback=on_complete, async_mode=True)
>>> # Returns immediately, callback called when done

>>> # Capture output
>>> pdf = storage.node('documents:report.pdf')
>>> info = pdf.call('pdfinfo', pdf, return_output=True)
>>> print(info)

>>> # With subprocess options
>>> script = storage.node('scripts:process.py')
>>> script.call('python', script, 'arg1', 'arg2',
...            cwd='/tmp', timeout=60, env={'DEBUG': '1'})

Notes

StorageNode arguments use local_path(mode=’rw’) automatically
Files are downloaded before command execution
Modified files are uploaded after command execution
In async mode, cleanup happens in background thread
Use return_output=False for commands with large output
For shell commands, use shell=True in subprocess_kwargs
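The argument conversion described in these notes can be pictured with contextlib.ExitStack, which keeps every node's local-path context open while the command runs and exits them all afterwards. In this sketch `FakeNode` and `run_with_local_paths` are hypothetical stand-ins: the fake node's `local_path()` just yields a fixed path, whereas a real remote node would download and upload as described above.

```python
import contextlib
import subprocess
import sys
from typing import Iterator, Optional


class FakeNode:
    """Hypothetical stand-in for a StorageNode (not the real class)."""

    def __init__(self, path: str) -> None:
        self.path = path

    @contextlib.contextmanager
    def local_path(self, mode: str = "rw") -> Iterator[str]:
        # A real remote node would download here and upload on exit.
        yield self.path


def run_with_local_paths(*args, return_output: bool = False,
                         **subprocess_kwargs) -> Optional[str]:
    """Replace node arguments with local paths, then run the command."""
    with contextlib.ExitStack() as stack:
        argv = [
            stack.enter_context(a.local_path(mode="rw")) if isinstance(a, FakeNode)
            else str(a)
            for a in args
        ]
        # check=True turns a non-zero exit status into CalledProcessError
        result = subprocess.run(argv, check=True, capture_output=return_output,
                                text=True, **subprocess_kwargs)
    # Every local_path context has exited (uploads done) before we return.
    return result.stdout if return_output else None


if __name__ == "__main__":
    # The child process receives the local path, not the node object.
    out = run_with_local_paths(sys.executable, "-c", "import sys; print(sys.argv[1])",
                               FakeNode("/data/in.txt"), return_output=True)
    print(out.strip())
```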

serve(environ, start_response, download=False, download_name=None, cache_max_age=None)[source]

Serve file via WSGI interface with caching support.

Serves the file through a WSGI application with:
- ETag support for caching (304 Not Modified responses)
- Content-Disposition headers for downloads
- Cache-Control headers
- Efficient streaming for large files

Perfect for integrating storage with web frameworks like Flask, Django, Pyramid, or any WSGI application.

Parameters:

environ (dict) – WSGI environment dict (contains HTTP headers, request info)
start_response (callable) – WSGI start_response callable
download (bool) – If True, force download with Content-Disposition: attachment
download_name (str | None) – Custom filename for downloads (default: basename of file)
cache_max_age (int | None) – Cache-Control max-age in seconds (default: no caching)

Returns:

Response body as list of byte chunks (WSGI response)

Return type:

list[bytes]

Raises:

FileNotFoundError – If file doesn’t exist
StorageError – If file cannot be read

Examples

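>>> # Flask integration
>>> from flask import Flask, Response, request
>>> app = Flask(__name__)
>>>
>>> def serve_node(node, **options):
...     # Capture the status and headers that serve() passes to start_response
...     captured = {}
...     def start_response(status, headers, exc_info=None):
...         captured['status'], captured['headers'] = status, headers
...     body = node.serve(request.environ, start_response, **options)
...     return Response(body, status=captured['status'],
...                     headers=captured['headers'])
>>>
>>> @app.route('/files/<path:filepath>')
... def serve_file(filepath):
...     node = storage.node(f'uploads:{filepath}')
...     return serve_node(node, cache_max_age=3600)
>>>
>>> # Download endpoint
>>> @app.route('/download/<path:filepath>')
... def download_file(filepath):
...     node = storage.node(f'uploads:{filepath}')
...     return serve_node(node, download=True,
...                       download_name='report.pdf')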
>>>
>>> # Plain WSGI application
>>> def application(environ, start_response):
...     path = environ['PATH_INFO'].lstrip('/')
...     node = storage.node(f'static:{path}')
...     if not node.exists:
...         start_response('404 Not Found', [('Content-Type', 'text/plain')])
...         return [b'Not Found']
...     return node.serve(environ, start_response, cache_max_age=86400)

Notes

ETag is computed as “{mtime}-{size}” for efficient caching
Returns 304 Not Modified when client ETag matches
Uses local_path() for efficient cloud storage serving
Streams large files in chunks (doesn’t load entire file in memory)
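The ETag and 304 behaviour in these notes can be illustrated with a minimal stand-alone WSGI function that serves a local file. This is a sketch, not genro-storage's code: the quoting of the ETag, the 64 KiB chunk size, the `application/octet-stream` fallback and the simple equality check on If-None-Match (real clients may send several ETags) are all assumptions.

```python
import mimetypes
import os
from typing import Callable, Iterable, List, Optional, Tuple

CHUNK_SIZE = 64 * 1024  # assumed streaming chunk size


def make_etag(path: str) -> str:
    st = os.stat(path)
    # "{mtime}-{size}" as described in the notes, quoted as HTTP expects
    return f'"{st.st_mtime}-{st.st_size}"'


def iter_chunks(path: str) -> Iterable[bytes]:
    # Generator: the file is read lazily, one chunk at a time
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            yield chunk


def serve_file(environ: dict, start_response: Callable, path: str,
               cache_max_age: Optional[int] = None,
               download_name: Optional[str] = None) -> Iterable[bytes]:
    etag = make_etag(path)
    headers: List[Tuple[str, str]] = [("ETag", etag)]
    if cache_max_age is not None:
        headers.append(("Cache-Control", f"max-age={cache_max_age}"))
    if environ.get("HTTP_IF_NONE_MATCH") == etag:
        # The client's copy is current, so no body is sent
        start_response("304 Not Modified", headers)
        return []
    ctype = mimetypes.guess_type(path)[0] or "application/octet-stream"
    headers += [("Content-Type", ctype),
                ("Content-Length", str(os.path.getsize(path)))]
    if download_name:
        headers.append(("Content-Disposition",
                        f'attachment; filename="{download_name}"'))
    start_response("200 OK", headers)
    return iter_chunks(path)
```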

get_metadata()[source]

Get custom metadata for this file.

Returns user-defined metadata attached to the file. Supported for cloud storage (S3, GCS, Azure). For local storage, returns empty dict.

Returns:: Metadata key-value pairs
Return type:: dict[str, str]
Raises:: FileNotFoundError – If file doesn’t exist

Examples

>>> file = storage.node('s3:documents/report.pdf')
>>> metadata = file.get_metadata()
>>> metadata.get('Author')
'John Doe'

set_metadata(metadata)[source]

Set custom metadata for this file.

Attaches user-defined metadata to the file. Supported for cloud storage (S3, GCS, Azure). For local storage, raises PermissionError.

Parameters:

metadata (Annotated[dict[str, str], 'Metadata key-value pairs to set']) – Metadata key-value pairs to set

Raises:

FileNotFoundError – If file doesn’t exist
PermissionError – If backend doesn’t support metadata
ValueError – If metadata keys/values are invalid

Examples

>>> file = storage.node('s3:documents/report.pdf')
>>> file.set_metadata({
...     'Author': 'John Doe',
...     'Version': '1.0',
...     'Department': 'Engineering'
... })

Notes

Keys and values must be strings
This typically replaces all existing metadata
Cloud providers may have size/format restrictions

url(expires_in=3600, **kwargs)[source]

Generate public URL for accessing this file.

Returns a URL that can be used to access the file directly. For cloud storage (S3, GCS), generates a presigned/signed URL. For HTTP storage, returns the direct URL. For local storage, returns None.

Parameters:

expires_in (int) – URL expiration time in seconds (default: 3600 = 1 hour)
**kwargs – Backend-specific options

Returns:

Public URL or None if not supported

Return type:

str | None

Examples

>>> # S3 presigned URL
>>> file = storage.node('s3:documents/report.pdf')
>>> url = file.url()
>>> url
'https://bucket.s3.amazonaws.com/documents/report.pdf?X-Amz-...'
>>>
>>> # Custom expiration (24 hours)
>>> url = file.url(expires_in=86400)

Notes

Cloud storage URLs are temporary and expire
Use this for sharing files externally
HTTP URLs are direct (no expiration)

internal_url(nocache=False)[source]

Generate internal/relative URL for this file.

Returns a URL suitable for internal application use. Optionally includes cache busting parameters.

Parameters:: nocache (bool) – If True, append mtime for cache busting
Returns:: Internal URL or None if not supported
Return type:: str | None

Examples

>>> file = storage.node('home:static/app.js')
>>> url = file.internal_url(nocache=True)
>>> url
'/storage/home/static/app.js?mtime=1234567890'

Notes

Useful for web applications
Cache busting helps with CDN/browser caching
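A cache-busting URL of the shape shown in the example could be assembled as below. The `/storage/<mount>/<path>` layout, the `mtime` query parameter and the integer truncation are taken from the example output above and are assumptions, not a documented format; `build_internal_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_internal_url(mount: str, path: str, mtime: Optional[float] = None,
                       prefix: str = "/storage") -> str:
    """Return '<prefix>/<mount>/<path>', optionally with ?mtime=... appended."""
    url = f"{prefix}/{quote(mount)}/{quote(path.lstrip('/'))}"
    if mtime is not None:
        # A new mtime produces a new URL, so browsers and CDNs fetch a fresh copy.
        url += "?" + urlencode({"mtime": int(mtime)})
    return url
```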

property versions: list[dict]

Get list of available versions for this file.

Returns version history for versioned storage (S3 with versioning enabled). For non-versioned storage, returns empty list.

Returns:: List of version info dicts
Return type:: list[dict]

Examples

>>> file = storage.node('s3:documents/report.pdf')
>>> for v in file.versions:
...     print(f"Version {v['version_id']}: {v['last_modified']}")

Notes

Only S3 with versioning enabled returns versions
Empty list if versioning not supported

property version_count: int

Get total number of versions available.

Returns:: Number of versions, or 0 if versioning not supported
Return type:: int

Examples

>>> print(f"File has {node.version_count} versions")

compact_versions(dry_run=False)[source]

Compact version history by removing consecutive duplicates.

Scans version history and removes versions that have identical content to the immediately preceding version. This cleans up unnecessary versions created by repeated writes of the same content, reducing storage costs.

The rule: For each pair of consecutive versions with the same ETag, delete the second (more recent) one, keeping the first (older) one.

Non-consecutive duplicates are preserved to maintain history (e.g., reverting to an earlier state).

Parameters:: dry_run (bool) – If True, only report what would be deleted without actually deleting
Returns:: Number of versions removed (or would be removed if dry_run=True)
Return type:: int
Raises:: PermissionError – If versioning not supported

Examples

>>> # Check what would be removed
>>> count = node.compact_versions(dry_run=True)
>>> print(f"Would remove {count} duplicate versions")

>>> # Actually compact the history
>>> removed = node.compact_versions()
>>> print(f"Removed {removed} redundant versions")

Notes

Only works with backends that support versioning
Requires backend to support version deletion (S3)
Preserves the oldest of each duplicate pair
History of changes is maintained (non-consecutive duplicates kept)
Useful for reducing storage costs on versioned buckets

Example scenario::
v1: content A (etag: xxx)
v2: content A (etag: xxx) ← REMOVED (consecutive duplicate of v1)
v3: content B (etag: yyy)
v4: content B (etag: yyy) ← REMOVED (consecutive duplicate of v3)
v5: content A (etag: xxx) ← KEPT (differs from v4; records the revert to A)
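The consecutive-duplicate rule fits in a few lines. This sketch assumes the versions are ordered oldest to newest and that each dict carries `version_id` and `etag` keys (the `etag` key name is an assumption); like `dry_run=True`, it only reports which IDs would be deleted.

```python
from typing import Dict, List


def consecutive_duplicates(versions: List[Dict[str, str]]) -> List[str]:
    """Return IDs of versions whose ETag equals that of the immediately preceding version.

    `versions` must be ordered oldest -> newest.
    """
    to_delete = []
    for previous, current in zip(versions, versions[1:]):
        if current["etag"] == previous["etag"]:
            to_delete.append(current["version_id"])  # keep the older one
    return to_delete
```

Applied to the scenario above, it selects v2 and v4 and leaves v5 alone, because v5 is compared only with v4.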

fill_from_url(url, timeout=30)[source]

Download content from URL and write to this file.

Fetches content from the specified URL and writes it to this storage node. Useful for downloading files from the internet into storage.

Parameters:

url (str) – URL to download from (http:// or https://)
timeout (int) – Request timeout in seconds (default: 30)

Raises:

ValueError – If URL is invalid
IOError – If download fails
PermissionError – If storage is read-only

Examples

>>> # Download image from internet
>>> img = storage.node('s3:downloads/logo.png')
>>> img.fill_from_url('https://example.com/logo.png')
>>>
>>> # Download with custom timeout
>>> file = storage.node('local:data.json')
>>> file.fill_from_url('https://api.example.com/data', timeout=60)

Notes

Uses urllib for HTTP requests (no external dependencies)
Overwrites existing file if present
Parent directory must exist or backend must support auto-creation
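Since the notes say the download uses urllib, here is a minimal sketch of the same idea writing to a local file. `fill_local_from_url` is a hypothetical helper, not part of genro-storage; the scheme check mirrors the documented ValueError, and download failures surface as urllib.error.URLError, which is a subclass of OSError (IOError).

```python
import shutil
import urllib.request
from urllib.parse import urlparse


def fill_local_from_url(url: str, dest_path: str, timeout: int = 30) -> None:
    """Download url into dest_path, streaming instead of loading it all in memory."""
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL: {url!r}")
    # urlopen runs first, so the destination is only opened once the request succeeds
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, 64 * 1024)  # overwrites any existing file
```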

to_base64(mime=None, include_uri=True)[source]

Encode file content as base64 string.

Converts the file content to a base64-encoded string, optionally formatted as a data URI for direct embedding in HTML/CSS.

Parameters:

mime (str | None) – MIME type to include in data URI (auto-detected if None)
include_uri (bool) – If True, format as data URI; if False, return raw base64

Returns:

Base64-encoded string or data URI

Return type:

str

Raises:

FileNotFoundError – If file doesn’t exist
ValueError – If node is a directory

Examples

>>> # Data URI with auto-detected MIME type
>>> img = storage.node('images:logo.png')
>>> data_uri = img.to_base64()
>>> data_uri
'data:image/png;base64,iVBORw0KGgo...'
>>>
>>> # Raw base64 without URI wrapper
>>> b64 = img.to_base64(include_uri=False)
>>> b64
'iVBORw0KGgo...'
>>>
>>> # Custom MIME type
>>> data_uri = img.to_base64(mime='image/x-icon')

Notes

Useful for embedding small images/files in HTML
MIME type auto-detection based on file extension
Large files will result in very long strings
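A data URI is just the MIME type plus the base64 payload. This sketch uses the standard mimetypes module for extension-based detection; `to_data_uri` is a hypothetical helper, and the `application/octet-stream` fallback for unknown extensions is an assumption.

```python
import base64
import mimetypes
from typing import Optional


def to_data_uri(data: bytes, filename: str, mime: Optional[str] = None,
                include_uri: bool = True) -> str:
    encoded = base64.b64encode(data).decode("ascii")
    if not include_uri:
        return encoded
    if mime is None:
        # Guess from the extension; fall back to a generic binary type
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    return f"data:{mime};base64,{encoded}"
```

Base64 output is about 4/3 the size of the input, which is why the notes warn about long strings for large files.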

__repr__()[source]

String representation for debugging.

__str__()[source]

String representation.

__eq__(other)[source]

Compare nodes by content (MD5 hash).

Two nodes are considered equal if they have the same file content, regardless of their path or location. Comparison is done via MD5 hash.

Parameters:: other (object) – Another StorageNode or object to compare
Returns:: True if both nodes have identical content
Return type:: bool

Examples

>>> file1 = storage.node('home:original.txt')
>>> file2 = storage.node('backup:copy.txt')
>>> if file1 == file2:
...     print("Files have identical content")

Notes

Only files can be compared (directories return False)
Non-existent files return False
Comparing with non-StorageNode returns NotImplemented
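Content equality via MD5 can be computed without loading whole files into memory. This sketch works on local paths only; `md5_of_file` and `same_content` are hypothetical helpers that mimic the documented rules (missing files and directories never compare equal), and the 1 MiB chunk size is an assumption.

```python
import hashlib
import os
from typing import Optional


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> Optional[str]:
    """Return the hex MD5 of a file, or None if it is missing or a directory."""
    if not os.path.isfile(path):
        return None
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """Content equality in the spirit of StorageNode.__eq__."""
    hash_a = md5_of_file(path_a)
    return hash_a is not None and hash_a == md5_of_file(path_b)
```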

__ne__(other)[source]

Compare nodes for inequality.

Parameters:: other (object) – Another StorageNode or object to compare
Returns:: True if nodes have different content
Return type:: bool

Examples

>>> if file1 != file2:
...     print("Files differ")

Exceptions

Exception classes for genro-storage.

All exceptions inherit from StorageError base class for easy catching. Exceptions also inherit from standard Python exceptions where appropriate to maintain compatibility with existing code.

exception genro_storage.exceptions.StorageError[source]

Bases: Exception

Base exception for all storage-related errors.

This is the base class that all genro-storage exceptions inherit from. You can catch this to handle any storage-related error.

Examples

>>> try:
...     node.read_bytes()
... except StorageError as e:
...     print(f"Storage error occurred: {e}")

exception genro_storage.exceptions.StorageNotFoundError[source]

Bases: StorageError, FileNotFoundError

Raised when a file, directory, or mount point is not found.

This exception inherits from both StorageError and FileNotFoundError, so it can be caught by either exception type.

Common causes:

Attempting to access a mount point that hasn’t been configured
Reading a file that doesn’t exist
Accessing a path in a non-existent directory

Examples

>>> try:
...     node = storage.node('missing_mount:file.txt')
... except StorageNotFoundError:
...     print("Mount or file not found")

exception genro_storage.exceptions.StoragePermissionError[source]

Bases: StorageError, PermissionError

Raised when a permission-related error occurs.

This exception inherits from both StorageError and PermissionError, so it can be caught by either exception type.

Common causes:

Insufficient permissions to read/write a file
Insufficient AWS/GCS/Azure credentials or permissions
Attempting to write to a read-only storage backend (e.g., HTTP)

Examples

>>> try:
...     node.write_bytes(b'data')
... except StoragePermissionError:
...     print("Permission denied")

exception genro_storage.exceptions.StorageConfigError[source]

Bases: StorageError, ValueError

Raised when configuration is invalid.

This exception inherits from both StorageError and ValueError, so it can be caught by either exception type.

Common causes:

Invalid configuration format (missing required fields)
Unsupported storage backend type
Invalid path format
Malformed YAML/JSON configuration file

Examples

>>> try:
...     storage.configure([{'name': 'test'}])  # missing 'type'
... except StorageConfigError as e:
...     print(f"Configuration error: {e}")
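Because each exception also inherits from a standard exception, existing stdlib handlers keep working. The classes below re-declare the documented bases locally, purely to illustrate the pattern; in real code import them from genro_storage.exceptions instead. `fake_read` is a hypothetical function that simulates a failing storage call.

```python
class StorageError(Exception):
    """Base for all storage errors (mirrors the documented base class)."""


class StorageNotFoundError(StorageError, FileNotFoundError):
    pass


class StoragePermissionError(StorageError, PermissionError):
    pass


class StorageConfigError(StorageError, ValueError):
    pass


def fake_read(path: str) -> bytes:
    # Simulates a storage call failing with the storage-specific error
    raise StorageNotFoundError(f"not found: {path}")


try:
    fake_read("home:missing.txt")
except FileNotFoundError as exc:  # a plain stdlib handler still catches it
    caught = exc
```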

Backend Classes

Base Backend

class genro_storage.backends.StorageBackend[source]

Bases: ABC

Abstract base class for storage backends.

All storage backend implementations (Local, S3, GCS, Azure, HTTP, etc.) must inherit from this class and implement all abstract methods.

This ensures a consistent interface across all storage types and makes it easy to add new backends in the future.

Note

Backend implementations should not be instantiated directly by users. They are created internally by StorageManager based on configuration.

Capability System:: Capabilities are automatically derived from methods decorated with @capability. The decorator populates the _capabilities set during class definition, and __init_subclass__ ensures proper inheritance.

PROTOCOL_CAPABILITIES: dict[str, set[str]] = {}

classmethod __init_subclass__(**kwargs)[source]

Automatically collect and inherit capabilities when subclass is created.

This method is called when a subclass of StorageBackend is defined. It collects PROTOCOL_CAPABILITIES from parent classes and merges them.
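The capability mechanism can be sketched as a decorator that tags methods plus an `__init_subclass__` hook that collects the tags and merges those of parent classes. The names `capability`, `_capability_name`, `_capabilities` and `Backend` are illustrative assumptions; genro-storage's real decorator signature and its per-protocol PROTOCOL_CAPABILITIES structure may differ.

```python
from typing import Callable, ClassVar, FrozenSet


def capability(name: str) -> Callable:
    """Tag a method as providing the named capability."""
    def decorator(func: Callable) -> Callable:
        func._capability_name = name
        return func
    return decorator


class Backend:
    _capabilities: ClassVar[FrozenSet[str]] = frozenset()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Start from everything the parent classes declared...
        collected = set()
        for base in cls.__mro__[1:]:
            collected |= getattr(base, "_capabilities", frozenset())
        # ...then add capabilities tagged on methods defined in this class.
        for attr in vars(cls).values():
            name = getattr(attr, "_capability_name", None)
            if name is not None:
                collected.add(name)
        cls._capabilities = frozenset(collected)


class ReadOnly(Backend):
    @capability("read")
    def read_bytes(self, path: str) -> bytes:
        return b""


class ReadWrite(ReadOnly):
    @capability("write")
    def write_bytes(self, path: str, data: bytes) -> None:
        pass
```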

classmethod get_capabilities(protocol=None)[source]

Get capability set for a given protocol.

For single-protocol backends (LocalStorage, Base64Backend), protocol parameter is optional and defaults to the backend’s only protocol. For multi-protocol backends (FsspecBackend), protocol must be specified.

Parameters:: protocol (str | None) – Protocol name (e.g., ‘s3’, ‘gcs’, ‘local’, ‘base64’). If None, returns capabilities for the only available protocol.
Returns:: Set of capability names
Return type:: set

Examples

>>> # Single-protocol backend
>>> LocalStorage.get_capabilities()  # protocol auto-detected
{'read', 'write', 'delete', 'mkdir', ...}

>>> # Multi-protocol backend
>>> FsspecBackend.get_capabilities('s3')
{'read', 'write', 'metadata', 'presigned_urls', ...}

property capabilities: BackendCapabilities

Return the capabilities of this backend instance.

For single-protocol backends, automatically uses the only protocol. For multi-protocol backends (FsspecBackend), uses self.protocol.

Returns:: Object describing supported features
Return type:: BackendCapabilities

Examples

>>> backend = LocalStorage('/tmp')
>>> caps = backend.capabilities
>>> if caps.versioning:
...     versions = backend.get_versions('file.txt')

classmethod get_json_info(protocol=None)[source]

Return complete backend information in JSON format.

This classmethod can be overridden by backend subclasses to provide complete information including configuration schema, capabilities, and description. This is useful for UI generation and documentation.

The default implementation returns capabilities derived from @capability decorators, but no schema information.

Parameters:: protocol (str | None) – Protocol name for multi-protocol backends (optional for single-protocol)
Returns:: Backend information with schema, capabilities, and description
Return type:: dict

Examples

>>> # Single-protocol backend
>>> info = LocalStorage.get_json_info()
>>> print(info['schema']['fields'])
[{'name': 'path', 'type': 'text', 'required': True}]

>>> # Multi-protocol backend
>>> info = FsspecBackend.get_json_info('s3')
>>> print(info['capabilities']['metadata'])
True

abstractmethod exists(path)[source]

Check if a file or directory exists.

Parameters:: path (str) – Relative path within this storage backend
Returns:: True if file or directory exists
Return type:: bool

Examples

>>> exists = backend.exists('documents/report.pdf')

abstractmethod is_file(path)[source]

Check if path points to a file.

Parameters:: path (str) – Relative path within this storage backend
Returns:: True if path is a file, False otherwise
Return type:: bool

Examples

>>> if backend.is_file('documents/report.pdf'):
...     print("It's a file")

abstractmethod is_dir(path)[source]

Check if path points to a directory.

Parameters:: path (str) – Relative path within this storage backend
Returns:: True if path is a directory, False otherwise
Return type:: bool

Examples

>>> if backend.is_dir('documents'):
...     print("It's a directory")

abstractmethod size(path)[source]

Get file size in bytes.

Parameters:

path (str) – Relative path to file

Returns:

File size in bytes

Return type:

int

Raises:

FileNotFoundError – If file doesn’t exist
ValueError – If path is a directory

Examples

>>> size = backend.size('documents/report.pdf')
>>> print(f"File is {size} bytes")

abstractmethod mtime(path)[source]

Get last modification time.

Parameters:: path (str) – Relative path to file or directory
Returns:: Unix timestamp of last modification
Return type:: float
Raises:: FileNotFoundError – If path doesn’t exist

Examples

>>> from datetime import datetime
>>> timestamp = backend.mtime('documents/report.pdf')
>>> mod_time = datetime.fromtimestamp(timestamp)

abstractmethod open(path, mode='rb')[source]

Open a file and return file-like object.

Parameters:

path (str) – Relative path to file
mode (str) – File mode (‘r’, ‘rb’, ‘w’, ‘wb’, ‘a’, ‘ab’)

Returns:

File-like object supporting context manager

Return type:

BinaryIO | TextIO

Raises:

FileNotFoundError – If file doesn’t exist (in read mode)
PermissionError – If insufficient permissions

Examples

>>> with backend.open('file.txt', 'rb') as f:
...     data = f.read()

abstractmethod read_bytes(path)[source]

Read entire file as bytes.

Parameters:: path (str) – Relative path to file
Returns:: Complete file contents
Return type:: bytes
Raises:: FileNotFoundError – If file doesn’t exist

Examples

>>> data = backend.read_bytes('image.jpg')

abstractmethod read_text(path, encoding='utf-8')[source]

Read entire file as text.

Parameters:

path (str) – Relative path to file
encoding (str) – Text encoding

Returns:

Complete file contents as string

Return type:

str

Raises:

FileNotFoundError – If file doesn’t exist
UnicodeDecodeError – If encoding is incorrect

Examples

>>> content = backend.read_text('document.txt')

abstractmethod write_bytes(path, data)[source]

Write bytes to file.

Parameters:

path (str) – Relative path to file
data (bytes) – Bytes to write

Raises:

PermissionError – If insufficient permissions
FileNotFoundError – If parent directory doesn’t exist

Examples

>>> backend.write_bytes('file.bin', b'Hello')

abstractmethod write_text(path, text, encoding='utf-8')[source]

Write text to file.

Parameters:

path (str) – Relative path to file
text (str) – String to write
encoding (str) – Text encoding

Raises:

PermissionError – If insufficient permissions
FileNotFoundError – If parent directory doesn’t exist

Examples

>>> backend.write_text('file.txt', 'Hello World')

abstractmethod delete(path, recursive=False)[source]

Delete file or directory.

Parameters:

path (str) – Relative path to delete
recursive (bool) – If True, delete directories recursively

Raises:

FileNotFoundError – If path doesn’t exist (implementation may choose to be idempotent)
ValueError – If path is non-empty directory and recursive=False

Examples

>>> backend.delete('file.txt')
>>> backend.delete('folder', recursive=True)

abstractmethod list_dir(path)[source]

List directory contents.

Parameters:

path (str) – Relative path to directory

Returns:

List of names (not full paths) in the directory

Return type:

list[str]

Raises:

FileNotFoundError – If directory doesn’t exist
ValueError – If path is not a directory

Examples

>>> names = backend.list_dir('documents')
>>> for name in names:
...     print(name)  # Just 'report.pdf', not 'documents/report.pdf'

abstractmethod mkdir(path, parents=False, exist_ok=False)[source]

Create directory.

Parameters:

path (str) – Relative path to create
parents (bool) – If True, create parent directories as needed
exist_ok (bool) – If True, don’t error if directory exists

Raises:

FileExistsError – If exists and exist_ok=False
FileNotFoundError – If parent doesn’t exist and parents=False

Examples

>>> backend.mkdir('new_folder')
>>> backend.mkdir('a/b/c', parents=True)

abstractmethod copy(src_path, dest_backend, dest_path)[source]

Copy file/directory to another backend.

This method handles cross-backend copying efficiently, streaming data when possible to avoid loading large files in memory.

Parameters:

src_path (str) – Source path in this backend
dest_backend (StorageBackend) – Destination backend (may be different type)
dest_path (str) – Destination path in dest_backend

Returns:

The new destination path if the destination backend changes it (e.g., the base64 backend), or None if the path is unchanged.

Return type:

str | None

Raises:

FileNotFoundError – If source doesn’t exist
PermissionError – If insufficient permissions

Examples

>>> # Copy within same backend
>>> backend.copy('file.txt', backend, 'backup/file.txt')
>>>
>>> # Copy to different backend
>>> backend.copy('file.txt', other_backend, 'file.txt')
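Streaming between two backends only requires their open() methods. In this sketch `DirBackend` is a hypothetical minimal backend rooted at a local directory (not the real LocalStorage), and shutil.copyfileobj moves the data in fixed-size chunks; the 1 MiB chunk size is an assumption.

```python
import os
import shutil


class DirBackend:
    """Hypothetical minimal backend rooted at a local directory."""

    def __init__(self, root: str) -> None:
        self.root = root

    def open(self, path: str, mode: str = "rb"):
        return open(os.path.join(self.root, path), mode)

    def copy(self, src_path: str, dest_backend: "DirBackend", dest_path: str) -> None:
        # Stream in 1 MiB chunks so large files never sit fully in memory.
        with self.open(src_path, "rb") as src, dest_backend.open(dest_path, "wb") as dst:
            shutil.copyfileobj(src, dst, 1024 * 1024)
        # Returning None signals that the destination path is unchanged.
```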

get_hash(path)[source]

Get MD5 hash from filesystem metadata if available.

This method attempts to retrieve the MD5 hash from the storage backend’s metadata without reading the file content. For cloud storage like S3, this uses the ETag. For local storage, this returns None and the hash must be computed by reading the file.

Parameters:: path (str) – Relative path to file
Returns:: MD5 hash as hexadecimal string, or None if not available
Return type:: str | None

Examples

>>> hash_value = backend.get_hash('file.txt')
>>> if hash_value:
...     print(f"MD5: {hash_value}")

get_metadata(path)[source]

Get custom metadata for a file.

Returns user-defined metadata attached to the file. For cloud storage (S3, GCS, Azure), this retrieves custom metadata stored with the file. For local storage, this typically returns an empty dict or uses extended attributes if supported.

Parameters:: path (str) – Relative path to file
Returns:: Metadata key-value pairs
Return type:: dict[str, str]

Examples

>>> metadata = backend.get_metadata('document.pdf')
>>> metadata.get('Content-Type')
'application/pdf'

Notes

Keys and values are strings
Cloud storage may have restrictions on key names (e.g., lowercase only)
Returns empty dict if no metadata or not supported

set_metadata(path, metadata)[source]

Set custom metadata for a file.

Attaches user-defined metadata to the file. For cloud storage (S3, GCS, Azure), this sets custom metadata that persists with the file. For local storage, this may use extended attributes if supported, or raise PermissionError if not supported.

Parameters:

path (str) – Relative path to file
metadata (dict[str, str]) – Metadata key-value pairs to set

Raises:

FileNotFoundError – If file doesn’t exist
PermissionError – If backend doesn’t support metadata

Examples

>>> backend.set_metadata('document.pdf', {
...     'Content-Type': 'application/pdf',
...     'Author': 'John Doe',
...     'Version': '1.0'
... })

Notes

Keys and values must be strings
Cloud storage may have restrictions (e.g., max metadata size)
This typically replaces all metadata (not merge)

get_versions(path)[source]

Get list of available versions for a file.

Returns version history for versioned storage. Default implementation returns empty list (no versioning support).

Parameters:: path (str) – Relative path to file
Returns:: List of version info dicts
Return type:: list[dict]

Notes

Override in subclasses that support versioning
S3 with versioning enabled can implement this

open_version(path, version_id, mode='rb')[source]

Open a specific version of a file.

Default implementation raises PermissionError. Override in subclasses that support versioning (e.g., S3).

Parameters:

path (str) – Relative path to file
version_id (str) – Version identifier
mode (str) – Open mode (read-only)

Raises:

PermissionError – Always (base implementation)

delete_version(path, version_id)[source]

Delete a specific version of a file.

Removes a specific version from versioned storage. The current version and other versions remain unaffected. This is useful for cleaning up duplicate or unwanted versions.

Default implementation raises PermissionError. Override in subclasses that support versioning (e.g., S3).

Parameters:

path (str) – Relative path to file
version_id (str) – Version identifier to delete

Raises:

PermissionError – If backend doesn’t support versioning
FileNotFoundError – If version doesn’t exist
ValueError – If attempting to delete the only remaining version

Examples

>>> # Delete a specific version
>>> backend.delete_version('file.txt', 'abc123')

Notes

Cannot delete the current version if it’s the only version
Some backends may have restrictions on version deletion
This operation is typically irreversible

url(path, expires_in=3600, **kwargs)[source]

Generate public URL for file access.

Returns a URL that can be used to access the file directly. For cloud storage (S3, GCS, Azure), this generates a presigned URL. For local storage, this returns None or a local file path URL.

Parameters:

path (str) – Relative path to file
expires_in (int) – URL expiration time in seconds (default: 3600 = 1 hour)
**kwargs – Backend-specific options

Returns:

Public URL or None if not supported

Return type:

str | None

Examples

>>> # S3 presigned URL (expires in 1 hour)
>>> url = backend.url('documents/report.pdf')
>>> url
'https://bucket.s3.amazonaws.com/documents/report.pdf?X-Amz-...'
>>>
>>> # Custom expiration (24 hours)
>>> url = backend.url('video.mp4', expires_in=86400)

Notes

Cloud storage URLs are temporary and expire
Local storage typically returns None
HTTP storage returns the direct URL

internal_url(path, nocache=False)[source]

Generate internal/relative URL for file access.

Returns a URL suitable for internal application use, typically relative to the application’s base URL. Optionally includes cache busting parameters.

Parameters:

path (str) – Relative path to file
nocache (bool) – If True, append mtime as query parameter for cache busting

Returns:

Internal URL or None if not supported

Return type:

str | None

Examples

>>> # Simple internal URL
>>> url = backend.internal_url('images/logo.png')
>>> url
'/storage/home/images/logo.png'
>>>
>>> # With cache busting
>>> url = backend.internal_url('app.js', nocache=True)
>>> url
'/storage/home/app.js?mtime=1234567890'

Notes

Useful for web applications
Cache busting helps with CDN/browser caching
Format depends on application configuration

local_path(path, mode='r')[source]

Get a local filesystem path for the file.

Returns a context manager that provides a local filesystem path to the file. For local storage, this returns the actual path. For remote storage (S3, GCS, etc.), this downloads the file to a temporary location, yields the temp path, and uploads changes back on exit if the file was modified.

This is essential for integrating with external tools that only work with local filesystem paths (ffmpeg, ImageMagick, etc.).

Parameters:

path (str) – Relative path to file
mode (str) – Access mode - ‘r’ (read-only), ‘w’ (write-only), ‘rw’ (read-write)

Returns:

Context manager yielding str (local filesystem path)

Examples

>>> # Process remote file with external tool
>>> with backend.local_path('video.mp4', mode='r') as local_path:
...     subprocess.run(['ffmpeg', '-i', local_path, 'output.mp4'])
>>>
>>> # Modify remote file in place
>>> with backend.local_path('image.jpg', mode='rw') as local_path:
...     subprocess.run(['convert', local_path, '-resize', '800x600', local_path])
>>> # Changes automatically uploaded on exit

Notes

For read mode (‘r’), the file is downloaded but not uploaded
For write mode (‘w’), the file is uploaded on exit
For read-write mode (‘rw’), both download and upload occur
Temporary files are automatically cleaned up on exit
For local storage, returns the original path (no copy)

close()[source]

Close backend and release resources.

This method is called when the backend is no longer needed. Implementations should close any open connections, file handles, etc.

The default implementation does nothing. Backends that manage resources should override this method.

Examples

>>> backend.close()

Local Storage

class genro_storage.backends.LocalStorage(path)[source]

Bases: StorageBackend

Local filesystem storage backend.

This backend provides access to files on the local filesystem. All paths are relative to a configured base directory.

The base_path can be either a string or a callable that returns a string. When a callable is provided, it will be evaluated each time the base_path property is accessed, allowing for dynamic paths (e.g., user-specific directories).

Parameters:

path (Union[str, Callable[[], str]]) – Absolute path to the base directory, or callable returning path

Raises:

ValueError – If resolved path is not absolute or not a directory
FileNotFoundError – If resolved path doesn’t exist

Examples

>>> # Static path
>>> backend = LocalStorage('/home/user')
>>>
>>> # Dynamic path with context-based callable (no parameters)
>>> def get_user_dir():
...     user_id = get_current_user()
...     return f'/data/users/{user_id}'
>>> backend = LocalStorage(get_user_dir)
>>>
>>> # Switched mount: callable with prefix parameter
>>> # Single mount behaves like multiple mounts based on first path component
>>> def resource_resolver(prefix):
...     # prefix = 'sys', 'adm', 'gnr', etc.
...     return f'/path/to/{prefix}-package'
>>> backend = LocalStorage(resource_resolver)
>>> # Accessing 'sys/folder/file.txt' routes to '/path/to/sys-package/folder/file.txt'
>>> # Accessing 'adm/folder/file.txt' routes to '/path/to/adm-package/folder/file.txt'
>>>
>>> # Access files relative to base
>>> data = backend.read_bytes('documents/report.pdf')

Note

Switched Mounts: When the callable accepts a parameter, it receives the first path component (prefix) and should return the base directory for that prefix. The backend then appends the remaining path. This allows a single mount to route to different base directories based on the prefix.
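The three forms of path (static string, zero-argument callable, prefix callable) can be summarised in a small resolver. This is a hypothetical sketch: resolve_full_path is not a library function, and counting parameters with inspect.signature is one assumed way to tell the two callable forms apart; the source does not say how LocalStorage does it.

```python
import inspect
from pathlib import Path
from typing import Callable, Union

BaseSpec = Union[str, Callable[[], str], Callable[[str], str]]


def resolve_full_path(base: BaseSpec, rel_path: str) -> Path:
    """Map a mount-relative path to an absolute path (illustrative sketch)."""
    if isinstance(base, str):
        # Static base directory
        return Path(base) / rel_path
    if not inspect.signature(base).parameters:
        # Context-based callable: re-evaluated on every access
        return Path(base()) / rel_path
    # Switched mount: the first path component selects the base directory
    prefix, _, rest = rel_path.partition("/")
    return Path(base(prefix)) / rest
```

With the resource_resolver from the example above, 'sys/folder/file.txt' resolves to /path/to/sys-package/folder/file.txt.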

__init__(path)[source]

Initialize LocalStorage backend.

Parameters:

path (str | Callable[[], str]) – Absolute path or callable returning absolute path

Raises:

ValueError – If path (string only) is not absolute or not a directory
FileNotFoundError – If path (string only) doesn’t exist

Note

When path is a callable, validation is deferred until first access. This allows configuration before the context (e.g., current user) is available.

property base_path: Path

Get current base path (evaluates callable if needed).

Returns: Current base path as a Path object
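Deferred validation, as described in the __init__ note, can be sketched like this. LazyBase is a hypothetical helper, not the library's class; it only mirrors the documented rules: string paths are checked at construction, callables are evaluated and checked on each base_path access.

```python
from pathlib import Path
from typing import Callable, Union


class LazyBase:
    """Hypothetical helper: resolves and validates a base directory."""

    def __init__(self, path: Union[str, Callable[[], str]]) -> None:
        self._path = path
        if isinstance(path, str):
            self._validate(Path(path))  # strings are checked immediately

    @staticmethod
    def _validate(p: Path) -> Path:
        if not p.is_absolute():
            raise ValueError(f"path must be absolute: {p}")
        if not p.exists():
            raise FileNotFoundError(p)
        if not p.is_dir():
            raise ValueError(f"not a directory: {p}")
        return p

    @property
    def base_path(self) -> Path:
        # Callables are evaluated (and validated) on every access
        raw = self._path() if callable(self._path) else self._path
        return self._validate(Path(raw))
```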

classmethod get_json_info()[source]

Return complete backend information in JSON format.

Returns: Backend information with schema, capabilities, and description.
Return type: dict

exists(path)[source]

Check if file or directory exists.

is_file(path)[source]

Check if path points to a file.

is_dir(path)[source]

Check if path points to a directory.

size(path)[source]

Get file size in bytes.

mtime(path)[source]

Get last modification time.

open(path, mode='rb')[source]

Open file and return file-like object.

read_bytes(path)[source]

Read entire file as bytes.

read_text(path, encoding='utf-8')[source]

Read entire file as text.

write_bytes(path, data)[source]

Write bytes to file.

write_text(path, text, encoding='utf-8')[source]

Write text to file.

delete(path, recursive=False)[source]

Delete file or directory.

list_dir(path)[source]

List directory contents.

mkdir(path, parents=False, exist_ok=False)[source]

Create directory.

copy(src_path, dest_backend, dest_path)[source]

Copy file/directory to another backend.

For local-to-local copies, uses efficient filesystem operations. For copies to other backends, streams the data.
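The two copy strategies might look like this in plain Python. Both functions are hypothetical sketches: shutil.copy2 and shutil.copytree stand in for "efficient filesystem operations", open_dest is an assumed opener for the destination backend's write stream, and the 1 MiB chunk size is an arbitrary assumption.

```python
import shutil
from pathlib import Path
from typing import BinaryIO, Callable

CHUNK_SIZE = 1024 * 1024  # assumed streaming chunk size (1 MiB)


def copy_local_to_local(src: Path, dest: Path) -> None:
    """Both ends are local: delegate to shutil's filesystem-level copies."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    if src.is_dir():
        shutil.copytree(src, dest)
    else:
        shutil.copy2(src, dest)


def stream_to_backend(src: Path, open_dest: Callable[[], BinaryIO]) -> None:
    """Different backend: stream in fixed-size chunks to bound memory use."""
    with src.open("rb") as fin, open_dest() as fout:
        shutil.copyfileobj(fin, fout, CHUNK_SIZE)
```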

local_path(path, mode='r')[source]

Get local filesystem path (returns the actual path).

For local storage, this simply returns the actual filesystem path since the file is already local. No temporary copy is needed.

Parameters:

path (str) – Relative path to file
mode (str) – Access mode (ignored for local storage)

Returns:

Context manager yielding str (the actual filesystem path)

Examples

>>> with backend.local_path('video.mp4') as local_path:
...     subprocess.run(['ffmpeg', '-i', local_path, 'out.mp4'])
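For contrast with the temporary-file behaviour of remote backends, the local variant reduces to a context manager that yields the real path. This hypothetical local_path_passthrough is a sketch of the documented behaviour, not the library's code; note that any write through the yielded path modifies the original file directly.

```python
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def local_path_passthrough(base: Path, rel_path: str, mode: str = "r") -> Iterator[str]:
    """Yield the real file path: nothing is copied, uploaded or deleted."""
    # mode is accepted for interface compatibility but has no effect locally
    yield str(base / rel_path)
```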

__repr__()[source]

String representation.

PROTOCOL_CAPABILITIES: dict[str, set[str]] = {'local': {'append_mode', 'atomic_operations', 'copy_optimization', 'delete', 'list_dir', 'mkdir', 'read', 'seek_support', 'write'}}

Base64 Backend

class genro_storage.backends.Base64Backend[source]

Bases: StorageBackend

Storage backend that decodes base64 data from the path/URI.

This backend treats the path as base64-encoded data and provides read-only access to the decoded content. It’s useful for embedding small amounts of data directly in URIs without requiring actual file storage.
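The relationship between a node's path and its content comes down to standard base64 encoding. A minimal sketch with Python's base64 module; to_b64_path and from_b64_path are hypothetical helper names.

```python
import base64


def to_b64_path(data: bytes) -> str:
    """Encode content into the string that serves as the node's path."""
    return base64.b64encode(data).decode("ascii")


def from_b64_path(path: str) -> bytes:
    """Decode a node path back into its content."""
    return base64.b64decode(path, validate=True)
```

For example, to_b64_path(b"Hello World") gives "SGVsbG8gV29ybGQ=", the same string used in the local_path example further down.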

_creation_time: Fixed timestamp for mtime() calls

__init__()[source]

Initialize the Base64 backend.

property capabilities: BackendCapabilities

Return the capabilities of this backend.

Overrides the base implementation to add base64-specific meta-capabilities.

classmethod get_json_info()[source]

Return complete backend information in JSON format.

Returns: Backend information with schema, capabilities, and description.
Return type: dict

exists(path)[source]

Check if the base64 data is valid.

Parameters: path (str) – Base64-encoded string
Returns: True if valid base64, False otherwise
Return type: bool
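One plausible definition of "valid base64" is strict decoding with the standard library, sketched below. The library's exact validity rules are not stated here, so treat is_valid_b64 as an assumption.

```python
import base64


def is_valid_b64(path: str) -> bool:
    """Return True if path decodes as strict base64 (a sketch of exists())."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        base64.b64decode(path, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return True
```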

is_file(path)[source]

Check if path is a valid base64 file.

Parameters: path (str) – Base64-encoded string
Returns: True if valid base64 (treated as a file)
Return type: bool

is_dir(path)[source]

Check if path is a directory.

Parameters: path (str) – Base64-encoded string
Returns: Always False (the base64 backend has no directories)
Return type: bool

size(path)[source]

Get size of decoded data in bytes.

Parameters: path (str) – Base64-encoded string
Returns: Size of the decoded data
Raises: FileNotFoundError – If invalid base64
Return type: int

mtime(path)[source]

Get modification time.

Parameters: path (str) – Base64-encoded string
Returns: Fixed timestamp (base64 data has no modification time)
Raises: FileNotFoundError – If invalid base64
Return type: float

open(path, mode='rb')[source]

Open base64 data as file-like object.

Parameters:

path (str) – Base64-encoded string (ignored for write modes)
mode (str) – Open mode (‘rb’, ‘r’, ‘wb’, ‘w’, ‘ab’, ‘a’)

Returns:

File-like object (BytesIO or StringIO)

Raises:

FileNotFoundError – If invalid base64 (read modes only)

Return type:

BinaryIO | TextIO

Note

Write modes return an empty BytesIO or StringIO buffer. The caller must retrieve the buffer's content and pass it to write_bytes() or write_text() to obtain the new base64 path.
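The write-mode workflow from the note can be sketched with a plain BytesIO standing in for the buffer that open(path, 'wb') returns. The final encoding step only mirrors what write_bytes() is documented to produce; with the real backend you would call backend.write_bytes(path, payload) instead.

```python
import base64
import io

# Stand-in for the empty buffer returned by open(path, 'wb')
buf = io.BytesIO()
buf.write(b"Hello")

# Retrieve the content before closing: getvalue() fails on a closed BytesIO
payload = buf.getvalue()
buf.close()

# With the real backend: new_path = backend.write_bytes(path, payload)
# The equivalent encoding with the standard library:
new_path = base64.b64encode(payload).decode("ascii")
```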

read_bytes(path)[source]

Read and decode base64 data.

Parameters: path (str) – Base64-encoded string
Returns: Decoded bytes
Raises: FileNotFoundError – If invalid base64
Return type: bytes

read_text(path, encoding='utf-8')[source]

Read and decode base64 data as text.

Parameters:

path (str) – Base64-encoded string
encoding (str) – Text encoding (default: utf-8)

Returns:

Decoded text string

Raises:

FileNotFoundError – If invalid base64
UnicodeDecodeError – If data is not valid text

Return type:

str

write_bytes(path, data)[source]

Write bytes to base64 node.

Creates a new base64-encoded string from the data. The path parameter is ignored as the base64 content itself becomes the new path.

Parameters:

path (str) – Ignored (base64 backend is pathless)
data (bytes) – Bytes to encode

Returns:

New base64-encoded path

Return type:

str

Note

This operation changes the node’s path to the new base64 string. The old path becomes invalid.

Examples

>>> new_path = backend.write_bytes("old", b"Hello")
>>> # new_path is now "SGVsbG8=" (base64 of "Hello")

write_text(path, text, encoding='utf-8')[source]

Write text to base64 node.

Creates a new base64-encoded string from the text. The path parameter is ignored as the base64 content itself becomes the new path.

Parameters:

path (str) – Ignored (base64 backend is pathless)
text (str) – String to encode
encoding (str) – Text encoding (default: utf-8)

Returns:

New base64-encoded path

Return type:

str

Note

This operation changes the node’s path to the new base64 string. The old path becomes invalid.

Examples

>>> new_path = backend.write_text("old", "Hello World")
>>> # new_path is now "SGVsbG8gV29ybGQ=" (base64 of "Hello World")
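Because the text is encoded to bytes before base64 encoding, the encoding argument changes the resulting path for non-ASCII text, and read_text() must be called with the same encoding. A stdlib sketch; text_to_b64_path is a hypothetical name.

```python
import base64


def text_to_b64_path(text: str, encoding: str = "utf-8") -> str:
    # Text -> bytes with the given encoding, then bytes -> base64 string
    return base64.b64encode(text.encode(encoding)).decode("ascii")
```

ASCII-only text such as "Hello World" yields the same path under either encoding, while "café" does not.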

delete(path, recursive=False)[source]

Delete operation not supported.

Parameters:

path (str) – Unused
recursive (bool) – Unused

Raises:

PermissionError – Always (read-only backend)

list_dir(path)[source]

List directory contents.

Parameters: path (str) – Base64-encoded string
Returns: Empty list
Raises: ValueError – Always (no directories in base64 backend)
Return type: list[str]

mkdir(path, parents=False, exist_ok=False)[source]

Create directory operation not supported.

Parameters:

path (str) – Unused
parents (bool) – Unused
exist_ok (bool) – Unused

Raises:

PermissionError – Always (read-only backend)

copy(src_path, dest_backend, dest_path)[source]

Copy base64 data to another backend.

This decodes the base64 data and writes it to the destination backend.

Parameters:

src_path (str) – Base64-encoded source data
dest_backend (StorageBackend) – Destination backend
dest_path (str) – Destination path

Returns:

New destination path if the destination backend changes it, or None if the path is unchanged

Return type:

str | None

Raises:

FileNotFoundError – If invalid base64
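A minimal sketch of the documented copy semantics: decode, then hand the raw bytes to the destination's write_bytes() and return whatever it returns. BytesWriter, MemoryBackend and copy_base64 are hypothetical names; MemoryBackend is an in-memory destination used only for illustration.

```python
import base64
from typing import Dict, Optional, Protocol


class BytesWriter(Protocol):
    def write_bytes(self, path: str, data: bytes) -> Optional[str]: ...


class MemoryBackend:
    """Hypothetical in-memory destination used only for this illustration."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def write_bytes(self, path: str, data: bytes) -> Optional[str]:
        self.files[path] = data
        return None  # path unchanged


def copy_base64(src_path: str, dest_backend: BytesWriter, dest_path: str) -> Optional[str]:
    """Decode src_path and write the raw bytes to dest_backend."""
    try:
        data = base64.b64decode(src_path, validate=True)
    except ValueError as exc:  # binascii.Error subclasses ValueError
        raise FileNotFoundError(f"invalid base64: {src_path!r}") from exc
    # Destinations such as Base64Backend return a new path; others return None
    return dest_backend.write_bytes(dest_path, data)
```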

get_hash(path)[source]

Get MD5 hash of decoded data.

Parameters: path (str) – Base64-encoded string
Returns: MD5 hash of decoded data
Raises: FileNotFoundError – If invalid base64
Return type: str | None
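The hash is computed over the decoded bytes, not over the base64 text itself. A stdlib sketch under that reading; md5_of_b64 is a hypothetical name, it assumes a hex digest, and unlike the documented return type it never returns None.

```python
import base64
import hashlib


def md5_of_b64(path: str) -> str:
    # Hash the decoded content, not the encoded path string
    return hashlib.md5(base64.b64decode(path, validate=True)).hexdigest()
```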

local_path(path, mode='r')[source]

Get local filesystem path for base64 data.

Creates a temporary file with the decoded base64 content. Since Base64Backend is read-only, write modes are not supported.

Parameters:

path (str) – Base64-encoded string
mode (str) – Access mode (only ‘r’ is supported)

Returns:

Context manager yielding str (temp file path)

Raises:

PermissionError – If mode is not ‘r’
FileNotFoundError – If invalid base64

Examples

>>> # Use base64 data with external tool
>>> node = storage.node('b64:SGVsbG8gV29ybGQ=')
>>> with node.local_path() as path:
...     subprocess.run(['cat', path])
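The documented behaviour (decode into a temporary file, reject non-read modes, clean up on exit) can be sketched with the standard library. b64_local_path is a hypothetical helper and the exact cleanup strategy is an assumption.

```python
import base64
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def b64_local_path(path: str, mode: str = "r") -> Iterator[str]:
    """Sketch: decode into a temp file that is deleted on exit (read-only)."""
    if mode != "r":
        raise PermissionError("base64 data is read-only")
    try:
        data = base64.b64decode(path, validate=True)
    except ValueError as exc:  # binascii.Error subclasses ValueError
        raise FileNotFoundError(f"invalid base64: {path!r}") from exc
    fd, tmp_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        yield tmp_path
    finally:
        # Remove the temporary copy once the caller is done
        os.remove(tmp_path)
```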

PROTOCOL_CAPABILITIES: dict[str, set[str]] = {'base64': {'read', 'seek_support'}}
