format
allennlp.tango.format
AllenNLP Tango is an experimental API and parts of it might change or disappear every time we release a new version.
T¶
T = TypeVar("T")
Format¶
class Format(Registrable, Generic[T])
Formats write objects to directories and read them back out.
In the context of AllenNLP, the objects that are written by formats are usually results from Steps.
VERSION¶
class Format(Registrable, Generic[T]):
| ...
| VERSION: int = NotImplemented
Formats can have versions. A format's version is part of a step's unique signature, and therefore part of Step.unique_id(), so when a step's format changes, the step is recomputed.
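To illustrate why a version bump forces recomputation, here is a minimal sketch of a signature that folds the format version into a hash. The function name `sketch_unique_id`, the hashed fields, and the use of SHA-256 are all assumptions made for illustration; this is not how Step.unique_id() is actually implemented.

```python
import hashlib
import json


def sketch_unique_id(step_name: str, params: dict, format_version: int) -> str:
    """Hypothetical stand-in for a step signature (NOT AllenNLP's real scheme)."""
    # Serialize everything that defines the step, including the format version,
    # with sorted keys so the same inputs always produce the same string.
    payload = json.dumps(
        {"step": step_name, "params": params, "format_version": format_version},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


# Same step and parameters, but the format version differs.
old_id = sketch_unique_id("tokenize", {"lowercase": True}, format_version=1)
new_id = sketch_unique_id("tokenize", {"lowercase": True}, format_version=2)
same_id = sketch_unique_id("tokenize", {"lowercase": True}, format_version=1)
```

Because the version is part of the hashed payload, `old_id` and `new_id` differ, so a cache keyed on the signature would miss and the step would run again.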
default_implementation¶
class Format(Registrable, Generic[T]):
| ...
| default_implementation = "dill"
write¶
class Format(Registrable, Generic[T]):
| ...
| @abstractmethod
| def write(self, artifact: T, dir: Union[str, PathLike])
Writes the artifact to the directory at dir.
read¶
class Format(Registrable, Generic[T]):
| ...
| @abstractmethod
| def read(self, dir: Union[str, PathLike]) -> T
Reads an artifact from the directory at dir and returns it.
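The write/read contract can be sketched with a small standalone class. `PickleFormatSketch` and the file name `data.pkl` are hypothetical; the class only mirrors the method signatures shown above and deliberately does not subclass the real Format, so it runs with the standard library alone.

```python
import pickle
import tempfile
from os import PathLike
from pathlib import Path
from typing import Generic, TypeVar, Union

T = TypeVar("T")


class PickleFormatSketch(Generic[T]):
    """Hypothetical class mirroring Format's write/read interface."""

    VERSION = 1

    def write(self, artifact: T, dir: Union[str, PathLike]) -> None:
        # Store the whole artifact in a single file inside the directory.
        with open(Path(dir) / "data.pkl", "wb") as f:
            pickle.dump(artifact, f)

    def read(self, dir: Union[str, PathLike]) -> T:
        # Load the artifact back from the same file.
        with open(Path(dir) / "data.pkl", "rb") as f:
            return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    fmt = PickleFormatSketch()
    fmt.write({"accuracy": 0.9}, tmp)
    restored = fmt.read(tmp)
```

A real implementation would presumably subclass Format and register itself with Format.register(...), as the built-in formats below do.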
checksum¶
class Format(Registrable, Generic[T]):
| ...
| def checksum(self, dir: Union[str, PathLike]) -> str
Produces a checksum of a serialized artifact.
The default checksum mechanism computes a checksum of all the files in the directory except for metadata.json.
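One way such a directory checksum could work is sketched below. The function name `sketch_checksum`, the choice of SHA-256, the sorted traversal order, and hashing the relative file names alongside the contents are assumptions; the actual default implementation may differ in every one of these details.

```python
import hashlib
import tempfile
from os import PathLike
from pathlib import Path
from typing import Union


def sketch_checksum(dir: Union[str, PathLike]) -> str:
    """Hypothetical checksum over a directory, skipping metadata.json."""
    root = Path(dir)
    digest = hashlib.sha256()
    # Sort the paths so the result does not depend on filesystem ordering.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        relative = path.relative_to(root).as_posix()
        if relative == "metadata.json":
            continue  # metadata must not influence the checksum
        digest.update(relative.encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data.bin").write_bytes(b"artifact bytes")
    before = sketch_checksum(tmp)

    # Adding metadata.json leaves the checksum unchanged.
    (Path(tmp) / "metadata.json").write_text('{"note": "ignored"}')
    after_metadata = sketch_checksum(tmp)

    # Changing the artifact itself changes the checksum.
    (Path(tmp) / "data.bin").write_bytes(b"different bytes")
    after_change = sketch_checksum(tmp)
```

Excluding metadata.json means two directories holding the same artifact compare equal even if their metadata differs.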
DillFormat¶
@Format.register("dill")
class DillFormat(Format[T], Generic[T]):
| def __init__(self, compress: Optional[str] = None)
This format writes the artifact as a single file using dill (a drop-in replacement for pickle). Optionally, it can compress the data. This is very flexible, but not always the fastest.
This format has special support for iterables. If you write an iterator, it will consume the iterator. If you read an iterator, it will read the iterator lazily.
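The iterator behavior described above can be sketched with the standard library. This example uses pickle as a stand-in for dill, and the helper names `write_items` and `read_items` as well as the one-record-per-dump file layout are assumptions, not DillFormat's actual on-disk format.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Iterable, Iterator


def write_items(items: Iterable[Any], filename: Path) -> None:
    # Writing drains the iterable: each item is pickled as it is produced.
    with open(filename, "wb") as f:
        for item in items:
            pickle.dump(item, f)


def read_items(filename: Path) -> Iterator[Any]:
    # Reading is lazy: one item is unpickled per next() call.
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.pkl"
    source = (n * n for n in range(4))

    write_items(source, path)
    leftover = list(source)  # the generator was consumed by the write

    lazy = read_items(path)
    first = next(lazy)       # only the first record has been loaded so far
    rest = list(lazy)        # exhausting the generator also closes the file
```

The lazy read keeps memory use flat for large results, at the cost of the caller having to consume the iterator before the underlying file goes away.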
VERSION¶
class DillFormat(Format[T], Generic[T]):
| ...
| VERSION = 1
write¶
class DillFormat(Format[T], Generic[T]):
| ...
| def write(self, artifact: T, dir: Union[str, PathLike])
read¶
class DillFormat(Format[T], Generic[T]):
| ...
| def read(self, dir: Union[str, PathLike]) -> T
DillFormatIterator¶
class DillFormatIterator(Iterator[T], Generic[T]):
| def __init__(self, filename: Union[str, PathLike])
This class is used so we can return an iterator from DillFormat.read().
__iter__¶
class DillFormatIterator(Iterator[T], Generic[T]):
| ...
| def __iter__(self) -> Iterator[T]
JsonFormat¶
@Format.register("json")
class JsonFormat(Format[T], Generic[T]):
| def __init__(self, compress: Optional[str] = None)
This format writes the artifact as a single file in JSON format. Optionally, it can compress the data. This is very flexible, but not always the fastest.
This format has special support for iterables. If you write an iterator, it will consume the iterator. If you read an iterator, it will read the iterator lazily.
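As a rough sketch of the optional compression combined with lazy reading, the example below writes one JSON document per line into a gzip file. The helper names, the line-delimited layout, and the choice of gzip are assumptions; the values actually accepted by the `compress` argument and the real file layout are not specified here.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Any, Iterable, Iterator


def write_json_lines(items: Iterable[Any], filename: Path) -> None:
    # Text-mode gzip stream: each item becomes one compressed JSON line.
    with gzip.open(filename, "wt", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")


def read_json_lines(filename: Path) -> Iterator[Any]:
    # Decompress and parse one line at a time instead of loading everything.
    with gzip.open(filename, "rt", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.json.gz"
    records = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}]

    write_json_lines(iter(records), path)
    round_tripped = list(read_json_lines(path))

    # The file on disk starts with the two-byte gzip magic number.
    magic = path.read_bytes()[:2]
```

Unlike a pickle-based format, JSON only round-trips values that have a JSON representation, so this approach suits plain dicts, lists, strings, and numbers.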
VERSION¶
class JsonFormat(Format[T], Generic[T]):
| ...
| VERSION = 2
write¶
class JsonFormat(Format[T], Generic[T]):
| ...
| def write(self, artifact: T, dir: Union[str, PathLike])
read¶
class JsonFormat(Format[T], Generic[T]):
| ...
| def read(self, dir: Union[str, PathLike]) -> T
JsonFormatIterator¶
class JsonFormatIterator(Iterator[T], Generic[T]):
| def __init__(self, filename: Union[str, PathLike])
This class is used so we can return an iterator from JsonFormat.read().
__iter__¶
class JsonFormatIterator(Iterator[T], Generic[T]):
| ...
| def __iter__(self) -> Iterator[T]
TorchFormat¶
@Format.register("torch")
class TorchFormat(Format[T], Generic[T])
This format writes the artifact using torch.save().
Unlike DillFormat, this has no special support for iterators.
VERSION¶
class TorchFormat(Format[T], Generic[T]):
| ...
| VERSION = 2
write¶
class TorchFormat(Format[T], Generic[T]):
| ...
| def write(self, artifact: T, dir: Union[str, PathLike])
read¶
class TorchFormat(Format[T], Generic[T]):
| ...
| def read(self, dir: Union[str, PathLike]) -> T