
text_only

allennlp.tango.text_only



AllenNLP Tango is an experimental API, and parts of it might change or disappear in any new release.

TextOnlyDataset

@Step.register("text_only")
class TextOnlyDataset(Step)

This step converts a dataset into another dataset that contains only the strings from the original dataset.

You can specify exactly which fields to keep from the original dataset (the default is all of them). You can also specify a minimum string length, so that strings shorter than this are filtered out.

DETERMINISTIC

class TextOnlyDataset(Step):
 | ...
 | DETERMINISTIC = True

run

class TextOnlyDataset(Step):
 | ...
 | def run(
 |     self,
 |     input: DatasetDict,
 |     *,
 |     fields_to_keep: Optional[Set[str]] = None,
 |     min_length: Optional[int] = None
 | ) -> DatasetDict

Turns the input dataset into another dataset that contains only the strings from the original dataset.

  • fields_to_keep is an optional set of field names that you want to keep in the result. If this is None, all fields are kept.
  • min_length specifies the minimum length that a string must have to be part of the result. If this is None, strings of any length are kept.
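
The filtering described above can be illustrated with a minimal pure-Python sketch. This is not AllenNLP's implementation. The function name `text_only_sketch` is hypothetical, and several details are assumptions made for illustration only: plain dicts stand in for `DatasetDict`, instances are flat, output items have the shape `{"text": ...}`, and length is measured in characters.

```python
from typing import Any, Dict, List, Optional, Set

# Assumption: an instance is a flat mapping from field name to value.
Instance = Dict[str, Any]


def text_only_sketch(
    splits: Dict[str, List[Instance]],
    *,
    fields_to_keep: Optional[Set[str]] = None,
    min_length: Optional[int] = None,
) -> Dict[str, List[Dict[str, str]]]:
    """Hypothetical stand-in for the step: keep only string values."""
    result: Dict[str, List[Dict[str, str]]] = {}
    for split_name, instances in splits.items():
        texts: List[Dict[str, str]] = []
        for instance in instances:
            for field_name, value in instance.items():
                # Non-string values (labels, ids, ...) are dropped.
                if not isinstance(value, str):
                    continue
                # None means "keep every field".
                if fields_to_keep is not None and field_name not in fields_to_keep:
                    continue
                # None means "no length filter"; length in characters is an assumption.
                if min_length is not None and len(value) < min_length:
                    continue
                texts.append({"text": value})
        result[split_name] = texts
    return result


# Toy data, invented for this example.
splits = {
    "train": [
        {"premise": "A dog runs.", "hypothesis": "Hi", "label": 1},
    ]
}
filtered = text_only_sketch(
    splits, fields_to_keep={"premise", "hypothesis"}, min_length=5
)
unfiltered = text_only_sketch(splits)
```

With these toy inputs, `filtered["train"]` holds only the premise string, because the hypothesis is shorter than `min_length` and the label is not a string. `unfiltered["train"]` holds both strings, since both arguments default to None.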