format

allennlp.tango.format


AllenNLP Tango is an experimental API. Parts of it might change or disappear with any new release.

T

T = TypeVar("T")

Format

class Format(Registrable,  Generic[T])

Formats write objects to directories and read them back out.

In AllenNLP, the objects that formats write are usually the results of Steps.

VERSION

class Format(Registrable,  Generic[T]):
 | ...
 | VERSION: int = NotImplemented

Formats can have versions. The version is part of a step's unique signature, as returned by Step.unique_id(), so when the version of a step's format changes, the step is recomputed.
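To illustrate why a version bump triggers recomputation, here is a minimal sketch. It does not reproduce how Step.unique_id() is actually computed; the function `unique_id`, its parameters, and the choice of SHA-256 are assumptions made for illustration only.

```python
import hashlib
import json


def unique_id(step_name: str, params: dict, format_version: int) -> str:
    """Hypothetical signature: a hash over step name, parameters, and format version."""
    payload = json.dumps(
        {"step": step_name, "params": params, "format_version": format_version},
        sort_keys=True,  # stable key order, so equal inputs give equal hashes
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


# Same step and parameters, but the format version was bumped from 1 to 2.
old_id = unique_id("tokenize", {"lowercase": True}, format_version=1)
new_id = unique_id("tokenize", {"lowercase": True}, format_version=2)

# The IDs differ, so a result cached under old_id is not found under new_id
# and the step has to run again.
print(old_id != new_id)  # True
```

Because the version feeds into the hash, nothing else has to track format changes: a stale cache entry is simply never looked up.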

default_implementation

class Format(Registrable,  Generic[T]):
 | ...
 | default_implementation = "dill"

write

class Format(Registrable,  Generic[T]):
 | ...
 | @abstractmethod
 | def write(self, artifact: T, dir: Union[str, PathLike])

Writes the artifact to the directory at dir.

read

class Format(Registrable,  Generic[T]):
 | ...
 | @abstractmethod
 | def read(self, dir: Union[str, PathLike]) -> T

Reads an artifact from the directory at dir and returns it.
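The write/read contract can be sketched without AllenNLP installed. In the example below, `SketchFormat` is a stand-in that mirrors the two abstract methods documented above (it is not the real Format class and omits Registrable), and `TextFormat`, `round_trip`, and the file name data.txt are hypothetical.

```python
import tempfile
from abc import ABC, abstractmethod
from os import PathLike
from pathlib import Path
from typing import Generic, TypeVar, Union

T = TypeVar("T")


class SketchFormat(ABC, Generic[T]):
    """Stand-in for the Format interface; not the AllenNLP class."""

    VERSION: int = NotImplemented

    @abstractmethod
    def write(self, artifact: T, dir: Union[str, PathLike]) -> None:
        ...

    @abstractmethod
    def read(self, dir: Union[str, PathLike]) -> T:
        ...


class TextFormat(SketchFormat[str]):
    """Hypothetical format that stores a string in one UTF-8 file."""

    VERSION = 1

    def write(self, artifact: str, dir: Union[str, PathLike]) -> None:
        (Path(dir) / "data.txt").write_text(artifact, encoding="utf-8")

    def read(self, dir: Union[str, PathLike]) -> str:
        return (Path(dir) / "data.txt").read_text(encoding="utf-8")


def round_trip(text: str) -> str:
    """Write a string to a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as d:
        fmt = TextFormat()
        fmt.write(text, d)
        return fmt.read(d)


print(round_trip("hello tango"))  # hello tango
```

With the real library, a subclass would presumably also be registered through Format.register(...), as the built-in formats further down are.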

checksum

class Format(Registrable,  Generic[T]):
 | ...
 | def checksum(self, dir: Union[str, PathLike]) -> str

Produces a checksum of a serialized artifact.

The default checksum mechanism computes a checksum of all the files in the directory except for metadata.json.
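A directory checksum of this kind might look like the sketch below. The hash function (SHA-256), the inclusion of relative paths in the hash, and the name `directory_checksum` are assumptions; the library's actual algorithm is not described here. The sketch also skips files named metadata.json at any depth, which may be broader than what the library does.

```python
import hashlib
import tempfile
from pathlib import Path


def directory_checksum(dir) -> str:
    """Hash every file under `dir`, skipping files named metadata.json."""
    dir = Path(dir)
    files = sorted(
        p for p in dir.rglob("*")
        if p.is_file() and p.name != "metadata.json"
    )
    h = hashlib.sha256()
    for path in files:
        # Include the relative path so that renaming a file changes the checksum.
        h.update(path.relative_to(dir).as_posix().encode("utf-8"))
        h.update(path.read_bytes())
    return h.hexdigest()


with tempfile.TemporaryDirectory() as d:
    (Path(d) / "data.bin").write_bytes(b"payload")
    before = directory_checksum(d)
    # Adding metadata.json must not affect the checksum.
    (Path(d) / "metadata.json").write_text("{}", encoding="utf-8")
    after = directory_checksum(d)

print(before == after)  # True
```

Sorting the file list keeps the result independent of the order in which the file system returns directory entries.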

DillFormat

@Format.register("dill")
class DillFormat(Format[T],  Generic[T]):
 | def __init__(self, compress: Optional[str] = None)

This format writes the artifact as a single file using dill (a drop-in replacement for pickle). Optionally, it can compress the data. This is very flexible, but not always the fastest.

This format has special support for iterables. If you write an iterator, the format consumes it. If the stored artifact was an iterator, read() returns an iterator that loads the items lazily.

VERSION

class DillFormat(Format[T],  Generic[T]):
 | ...
 | VERSION = 1

write

class DillFormat(Format[T],  Generic[T]):
 | ...
 | def write(self, artifact: T, dir: Union[str, PathLike])

read

class DillFormat(Format[T],  Generic[T]):
 | ...
 | def read(self, dir: Union[str, PathLike]) -> T

DillFormatIterator

class DillFormatIterator(Iterator[T],  Generic[T]):
 | def __init__(self, filename: Union[str, PathLike])

This class is used so we can return an iterator from DillFormat.read().
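The idea of a lazily reading iterator can be sketched as follows. This is not the DillFormatIterator implementation: it uses the standard library's pickle in place of dill, ignores compression, and the names `write_items` and `LazyPickleIterator` as well as the one-item-per-dump file layout are assumptions.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Generic, Iterator, TypeVar

T = TypeVar("T")


def write_items(items, filename) -> None:
    """Consume an iterable, pickling one item at a time into a single file."""
    with open(filename, "wb") as f:
        for item in items:
            pickle.dump(item, f)


class LazyPickleIterator(Iterator[T], Generic[T]):
    """Hypothetical lazy reader: unpickles one item per call to next()."""

    def __init__(self, filename) -> None:
        self._file = open(filename, "rb")

    def __iter__(self) -> Iterator[T]:
        return self

    def __next__(self) -> T:
        if self._file.closed:
            raise StopIteration
        try:
            return pickle.load(self._file)
        except EOFError:
            # End of file: release the handle and stop iterating.
            self._file.close()
            raise StopIteration


with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "data.pkl"
    write_items((n * n for n in range(4)), path)  # the generator is consumed here
    squares = list(LazyPickleIterator(path))

print(squares)  # [0, 1, 4, 9]
```

Only one item is held in memory per step, which is the point of returning an iterator rather than a list.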

__iter__

class DillFormatIterator(Iterator[T],  Generic[T]):
 | ...
 | def __iter__(self) -> Iterator[T]

JsonFormat

@Format.register("json")
class JsonFormat(Format[T],  Generic[T]):
 | def __init__(self, compress: Optional[str] = None)

This format writes the artifact as a single file in JSON format. Optionally, it can compress the data. This is very flexible, but not always the fastest.

This format has special support for iterables. If you write an iterator, the format consumes it. If the stored artifact was an iterator, read() returns an iterator that loads the items lazily.
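The behavior described above can be sketched with the standard library alone. The one-JSON-document-per-line layout, the helper names, and the value "gz" for `compress` are assumptions; the values JsonFormat actually accepts for compress and its on-disk layout are not listed here.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Any, Iterable, Iterator, Optional


def _open(path: Path, mode: str, compress: Optional[str]):
    # Only "gz" is handled in this sketch; anything else means no compression.
    if compress == "gz":
        return gzip.open(path, mode + "t", encoding="utf-8")
    return open(path, mode, encoding="utf-8")


def write_json_lines(items: Iterable[Any], path: Path, compress: Optional[str] = None) -> None:
    """Consume `items`, writing one JSON document per line."""
    with _open(path, "w", compress) as f:
        for item in items:
            f.write(json.dumps(item) + "\n")


def read_json_lines(path: Path, compress: Optional[str] = None) -> Iterator[Any]:
    """Yield items lazily, parsing one line at a time."""
    with _open(path, "r", compress) as f:
        for line in f:
            yield json.loads(line)


with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "data.jsonl.gz"
    source = iter([{"id": 1}, {"id": 2}])
    write_json_lines(source, path, compress="gz")
    leftover = list(source)  # empty: writing consumed the iterator
    restored = list(read_json_lines(path, compress="gz"))

print(leftover, restored)  # [] [{'id': 1}, {'id': 2}]
```

Note that the source iterator is empty after writing, so the data has to be read back from disk if it is needed again.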

VERSION

class JsonFormat(Format[T],  Generic[T]):
 | ...
 | VERSION = 2

write

class JsonFormat(Format[T],  Generic[T]):
 | ...
 | def write(self, artifact: T, dir: Union[str, PathLike])

read

class JsonFormat(Format[T],  Generic[T]):
 | ...
 | def read(self, dir: Union[str, PathLike]) -> T

JsonFormatIterator

class JsonFormatIterator(Iterator[T],  Generic[T]):
 | def __init__(self, filename: Union[str, PathLike])

This class is used so we can return an iterator from JsonFormat.read().

__iter__

class JsonFormatIterator(Iterator[T],  Generic[T]):
 | ...
 | def __iter__(self) -> Iterator[T]

TorchFormat

@Format.register("torch")
class TorchFormat(Format[T],  Generic[T])

This format writes the artifact using torch.save().

Unlike DillFormat, this has no special support for iterators.

VERSION

class TorchFormat(Format[T],  Generic[T]):
 | ...
 | VERSION = 2

write

class TorchFormat(Format[T],  Generic[T]):
 | ...
 | def write(self, artifact: T, dir: Union[str, PathLike])

read

class TorchFormat(Format[T],  Generic[T]):
 | ...
 | def read(self, dir: Union[str, PathLike]) -> T