AllenNLP Tango is an experimental API and parts of it might change or disappear every time we release a new version.
@Step.register("text_only")
class TextOnlyDataset(Step)
This step converts a dataset into another dataset that contains only the strings from the original dataset.
You can specify exactly which fields to keep from the original dataset (default is all of them). You can specify a minimum length of string to keep, to filter out strings that are too short.
class TextOnlyDataset(Step):
 | ...
 | DETERMINISTIC = True
class TextOnlyDataset(Step):
 | ...
 | def run(
 |     self,
 |     input: DatasetDict,
 |     *,
 |     fields_to_keep: Optional[Set[str]] = None,
 |     min_length: Optional[int] = None
 | ) -> DatasetDict
Converts the input dataset into another dataset that contains only the strings from the original dataset.
fields_to_keep is an optional set of field names that you want to keep in the result. If this is None, all fields are kept.
min_length specifies the minimum length that a string must have to be part of the result. If this is None, no strings are filtered out by length.
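The effect of the two parameters can be illustrated with a small standalone sketch. This is not the library's implementation: the function text_only, the plain-dict representation of splits, and the sample data are all hypothetical, and the sketch assumes flat instances and that length is measured in characters, neither of which is stated above.

```python
from typing import Dict, Iterator, List, Optional, Set


def text_only(
    splits: Dict[str, List[Dict[str, object]]],
    *,
    fields_to_keep: Optional[Set[str]] = None,
    min_length: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Hypothetical stand-in that mimics the documented behavior on plain dicts."""

    def strings_of(instance: Dict[str, object]) -> Iterator[str]:
        for name, value in instance.items():
            # Only string-valued fields can contribute to the result.
            if not isinstance(value, str):
                continue
            # None means "keep every field".
            if fields_to_keep is not None and name not in fields_to_keep:
                continue
            # None means "no length filter"; length is assumed to be in characters.
            if min_length is not None and len(value) < min_length:
                continue
            yield value

    return {
        split: [text for instance in instances for text in strings_of(instance)]
        for split, instances in splits.items()
    }


# Made-up sample data: one split with two instances.
splits = {
    "train": [
        {"title": "Hi", "body": "A longer passage of text.", "label": 1},
        {"title": "Second title", "body": "ok", "label": 0},
    ]
}

# Defaults: every string field is kept, non-string fields are dropped.
all_strings = text_only(splits)

# Keep only the "body" field, and only strings of at least 5 characters.
long_bodies = text_only(splits, fields_to_keep={"body"}, min_length=5)
```

With the defaults, the integer label field is dropped and every string survives; with fields_to_keep={"body"} and min_length=5, only the one sufficiently long body string remains.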