Advanced Usage

Convenience Methods

The Container class provides a couple of convenience methods, which make it behave very similar to a dictionary:

>>> dc = Container(items=items)
>>> dc["content.json"]["uuid"]
'306e2c2d-a9f6-4306-8851-1ee0fceeb852'
>>> dc["log/console.txt"] = "Hello World!"
>>> "log/console.txt" in dc
True
>>> del dc["log/console.txt"]
>>> "log/console.txt" in dc
False

The method keys() returns a list of all full item names including the respective parts, values() a list of all item objects, and items() a list of all (name, item) tuples as you would expect from a dictionary object.

You may use the method hash() to calculate an SHA256 hash of the container content. The hex digest of this value is stored in the attribute hash of the item container.json.

Container objects generated from an items dictionary using the parameter items=... are mutable, which means that you can add, modify and delete items. As soon as you call one of the methods write(), upload(), freeze(), or hash(), the container becomes immutable. Containers loaded from a local file or a server are immutable as well.

An immutable container will throw an exception if you try to modify its content. However, this feature is not bulletproof. The Container class is not aware of any internal modifications of item objects.

You can convert an immutable container into a mutable one by calling its method release(). This generates a new UUID and resets the attributes replaces, created, storageTime, hash and modelVersion.

Server Storage

It is most convenient to store the server name and the API key in the configuration file. However, both values can also be specified as method parameters:

>>> dc.upload(server="...", key="...")
>>> dc = Container(uuid="306e2c2d-a9f6-4306-8851-1ee0fceeb852", server="...", key="...")

Files Items

To add an already existing file to the container (without loading it into the memory) assign a dictionary to an item with its path key being an instance of pathlib.Path. This file won’t be encoded during write() or upload() but will instead be added chunkwise to the container:

>>> dc["test.hdf5"] = {"path": pathlib.Path("/in/filesystem/path.hdf5"), "compression": zipfile.ZIP_DEFLATED, "compression_level": 9}

compression and compression_level are optional. If not specified, the default settings of the container will be used.

If you don’t want to load all items of a container into memory, you can use the ignore_items parameter. This will skip the specified items during decoding:

>>> dc = Container(file="...", ignore_items=["data/test.hdf5", "log/test.txt"])
>>> # print(dc["data/test.hdf5"]) <- this would throw an error!

If the container was loaded with ignore_items, it is not mutable! This could cause problems if a file was read and then deleted or modified between reading and writing.

File Formats

The Container class can handle virtually any file format. However, in order to store and read a certain file format, it needs to know how to convert the respective Python object into a bytes stream and vice versa. File formats are identified by their file extension. The following file extensions are currently supported by the package scidatacontainer out of the box:

Extension

File format

Python object

Required packages

.json

JSON file (UTF-8 encoding)

dictionary or others

.txt

Text file (UTF-8 encoding)

string

.log

Text file (UTF-8 encoding)

string

.pgm

Text file (UTF-8 encoding)

string

.png

PNG image file

NumPy array

cv2, numpy

.npy

NumPy array

NumPy array

numpy

.hdf5

HDF5 container

NumPy array/dict of NumPy arrays

h5py, numpy

.bin

Raw binary data file

bytes

Native support for image and NumPy objects is only available when your Python environment contains the packages cv2 and/or numpy. The container class tries to guess the format of items with unknown extension. However, it is more reliable to use the function register() to add alternative file extensions to already known file formats. The following commands will register the extension .py as a text file:

>>> from scidatacontainer import register
>>> register("py", "txt")

If you want to register another Python object, you need to provide a conversion class which can convert this object to and from a bytes string. This class should be inherited from the class AbstractFile. The storage of NumPy arrays for example may be realized by the following code:

 1import io
 2
 3import numpy as np
 4from scidatacontainer import AbstractFile, register
 5
 6
 7class NpyFile(AbstractFile):
 8    """Data conversion class for NumPy arrays (ndarray)."""
 9
10    allow_pickle = False
11
12    def encode(self):
13        """Convert NumPy array to bytes string."""
14
15        with io.BytesIO() as fp:
16            np.save(fp, self.data, allow_pickle=self.allow_pickle)
17            fp.seek(0)
18            data = fp.read()
19        return data
20
21    def decode(self, data):
22        """Decode NumPy array from bytes string."""
23
24        with io.BytesIO() as fp:
25            fp.write(data)
26            fp.seek(0)
27            self.data = np.load(fp, allow_pickle=self.allow_pickle)
28
29
30register("npy", NpyFile, np.ndarray)

The third argument of the function register() sets this conversion class as default for NumPy array objects overriding any previous default class. This argument is optional.

Hash values are usually derived from the bytes string of an encoded object. If you require a different behaviour, you may also override the method hash() of the class AbstractFile.