StanfordSentimentTreeBankDatasetReader( self, token_indexers: Dict[str, allennlp.data.token_indexers.token_indexer.TokenIndexer] = None, use_subtrees: bool = False, granularity: str = '5-class', **kwargs, ) -> None
Reads tokens and their sentiment labels from the Stanford Sentiment Treebank.
The Stanford Sentiment Treebank comes with labels from 0 to 4. The granularity setting controls how these labels are used:
- "5-class" uses these labels as is.
- "3-class" converts the problem into one of identifying whether a sentence has negative, neutral, or positive sentiment. In this case, 0 and 1 are grouped as label 0 (negative sentiment), 2 is converted to label 1 (neutral sentiment), and 3 and 4 are grouped as label 2 (positive sentiment).
- "2-class" turns it into a binary classification problem between positive and negative sentiment. 0 and 1 are grouped as label 0 (negative sentiment), 2 (neutral) is discarded, and 3 and 4 are grouped as label 1 (positive sentiment).
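
As a rough illustration of the three granularities, the label mapping described above could be sketched in plain Python as follows. The function name `convert_label` is hypothetical and not part of the reader's API, and returning `None` for a discarded neutral label is an assumption made for this sketch.

```python
from typing import Optional


def convert_label(label: str, granularity: str = "5-class") -> Optional[str]:
    """Map a raw treebank label ("0".."4") to the label for the chosen granularity.

    Hypothetical helper for illustration only. Returns None when the example
    would be discarded (the neutral label under "2-class").
    """
    value = int(label)
    if not 0 <= value <= 4:
        raise ValueError(f"label must be between 0 and 4, got {label!r}")

    if granularity == "5-class":
        return label  # labels are used as is
    if granularity == "3-class":
        if value < 2:
            return "0"  # negative
        if value == 2:
            return "1"  # neutral
        return "2"  # positive
    if granularity == "2-class":
        if value < 2:
            return "0"  # negative
        if value == 2:
            return None  # neutral is discarded
        return "1"  # positive
    raise ValueError(f"unknown granularity: {granularity!r}")
```

Under these assumptions, `convert_label("4", "3-class")` gives `"2"`, while `convert_label("2", "2-class")` gives `None`, signalling that the neutral example is dropped.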
Expected format for each input line: a linearized tree, where nodes are labeled by their sentiment.
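
To make the expected input format concrete, here is a minimal sketch of reading one linearized tree using only the standard library. The helper names (`parse_tree`, `leaves`, `subtrees`) and the sample line are made up for illustration, and this is not necessarily how the reader itself parses its input.

```python
import re
from typing import Iterator, List


def parse_tree(line: str):
    """Parse a linearized tree like "(3 (2 A) (4 good))" into (label, children).

    Each child is either a nested (label, children) tuple or a token string.
    Hypothetical helper; assumes well-formed, balanced input.
    """
    symbols = re.findall(r"\(|\)|[^\s()]+", line)
    pos = 0

    def node():
        nonlocal pos
        if symbols[pos] != "(":
            raise ValueError(f"expected '(' at symbol {pos}")
        pos += 1  # consume "("
        label = symbols[pos]
        pos += 1
        children = []
        while symbols[pos] != ")":
            if symbols[pos] == "(":
                children.append(node())  # nested, sentiment-labeled subtree
            else:
                children.append(symbols[pos])  # a token
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    tree = node()
    if pos != len(symbols):
        raise ValueError("trailing symbols after the root tree")
    return tree


def leaves(tree) -> List[str]:
    """Collect the tokens under a node, left to right."""
    tokens: List[str] = []
    for child in tree[1]:
        if isinstance(child, str):
            tokens.append(child)
        else:
            tokens.extend(leaves(child))
    return tokens


def subtrees(tree) -> Iterator[tuple]:
    """Yield the node itself, then every nested subtree, root first."""
    yield tree
    for child in tree[1]:
        if not isinstance(child, str):
            yield from subtrees(child)


example = parse_tree("(3 (2 A) (4 (3 lovely) (2 film)))")
print(example[0])                           # 3
print(leaves(example))                      # ['A', 'lovely', 'film']
print([t[0] for t in subtrees(example)])    # ['3', '2', '4', '3', '2']
```

In this sketch the root label and its leaves correspond to the whole sentence, while iterating over `subtrees` gives one labeled phrase per node, which is the kind of data the `use_subtrees` option refers to.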
The output of `read` is a list of `Instance`s, each holding the tokens and the sentiment label of a sentence or phrase.
Registered as a `DatasetReader` with name "sst_tokens".
- token_indexers : `Dict[str, TokenIndexer]`, optional (default = `None`). We use this to define the input representation for the text. See `TokenIndexer`.
- use_subtrees : `bool`, optional (default = `False`). Whether or not to use sentiment-tagged subtrees.
- granularity : `str`, optional (default = `"5-class"`). One of `"5-class"`, `"3-class"`, or `"2-class"`, indicating the number of sentiment labels to use.
StanfordSentimentTreeBankDatasetReader.text_to_instance( self, tokens: List[str], sentiment: str = None, ) -> allennlp.data.instance.Instance
We take pre-tokenized input here, because we don't have a tokenizer in this class.
- tokens : `List[str]`, required. The tokens in a given sentence.
- sentiment : `str`, optional (default = `None`). The sentiment for this sentence.
Returns an `Instance` containing the following fields:
- The tokens in the sentence or phrase.
- The sentiment label of the sentence or phrase.