stanford_sentiment_tree_bank
allennlp_models.classification.dataset_readers.stanford_sentiment_tree_bank
StanfordSentimentTreeBankDatasetReader#
@DatasetReader.register("sst_tokens")
class StanfordSentimentTreeBankDatasetReader(DatasetReader):
| def __init__(
| self,
| token_indexers: Dict[str, TokenIndexer] = None,
| tokenizer: Optional[Tokenizer] = None,
| use_subtrees: bool = False,
| granularity: str = "5-class",
| **kwargs
| ) -> None
Reads tokens and their sentiment labels from the Stanford Sentiment Treebank.
The Stanford Sentiment Treebank comes with labels from 0 to 4. "5-class" uses these labels as is. "3-class" converts the problem into one of identifying whether a sentence has negative, neutral, or positive sentiment. In this case, 0 and 1 are grouped as label 0 (negative sentiment), 2 is converted to label 1 (neutral sentiment), and 3 and 4 are grouped as label 2 (positive sentiment). "2-class" turns it into a binary classification problem between positive and negative sentiment: 0 and 1 are grouped as label 0 (negative sentiment), 2 (neutral) is discarded, and 3 and 4 are grouped as label 1 (positive sentiment).
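The regrouping described above can be sketched in plain Python. This is an illustration only, not the reader's actual code: the helper name map_sentiment is hypothetical, and it assumes labels arrive as the strings "0" through "4".

```python
from typing import Optional


def map_sentiment(sentiment: str, granularity: str = "5-class") -> Optional[str]:
    """Hypothetical helper: regroup a 0-4 treebank label for a granularity.

    Returns None when the example should be discarded
    (the neutral label under "2-class").
    """
    value = int(sentiment)
    if granularity == "5-class":
        return sentiment  # labels are used as is
    if granularity == "3-class":
        if value < 2:
            return "0"  # 0 and 1 -> negative
        if value == 2:
            return "1"  # 2 -> neutral
        return "2"  # 3 and 4 -> positive
    if granularity == "2-class":
        if value < 2:
            return "0"  # 0 and 1 -> negative
        if value == 2:
            return None  # neutral is discarded
        return "1"  # 3 and 4 -> positive
    raise ValueError(f"unknown granularity: {granularity!r}")
```

Returning None for the discarded neutral case mirrors the Optional[Instance] return type of text_to_instance documented further down.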
Expected format for each input line: a linearized tree, where nodes are labeled by their sentiment.
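To make the input format concrete, here is a minimal standard-library sketch that parses one linearized tree and lists the (tokens, label) pair for the root and for every subtree. The example line and the helpers parse_tree, leaves, and subtrees are hypothetical, and the sketch assumes that literal parentheses never appear inside tokens. It does not claim to reproduce how the reader itself parses the file.

```python
from typing import Iterator, List, Tuple, Union

# A node is (label, children); each child is either a nested node or a word.
Node = Tuple[str, List[Union["Node", str]]]


def parse_tree(line: str) -> Node:
    """Parse a linearized tree such as "(3 (2 A) (3 (3 fine) (2 film)))"."""
    symbols = line.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        if symbols[pos] != "(":
            raise ValueError("expected '(' at start of node")
        pos += 1
        label = symbols[pos]  # sentiment label of this node
        pos += 1
        children: List[Union[Node, str]] = []
        while symbols[pos] != ")":
            if symbols[pos] == "(":
                children.append(parse_node())
            else:
                children.append(symbols[pos])  # a word at a leaf
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return parse_node()


def leaves(node: Node) -> List[str]:
    """Collect the words under a node, left to right."""
    words: List[str] = []
    for child in node[1]:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


def subtrees(node: Node) -> Iterator[Node]:
    """Yield the node itself, then every nested node, in pre-order."""
    yield node
    for child in node[1]:
        if not isinstance(child, str):
            yield from subtrees(child)


tree = parse_tree("(3 (2 A) (3 (3 fine) (2 film)))")
sentence_example = (leaves(tree), tree[0])
subtree_examples = [(leaves(t), t[0]) for t in subtrees(tree)]
```

Here sentence_example holds only the full sentence with the root label, while subtree_examples also contains every labeled phrase, which is the kind of data the use_subtrees option refers to.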
The output of read is a list of Instances with the fields:
tokens : TextField
and
label : LabelField
Registered as a DatasetReader with name "sst_tokens".
Parameters
- token_indexers : Dict[str, TokenIndexer], optional (default = {"tokens": SingleIdTokenIndexer()})
  We use this to define the input representation for the text. See TokenIndexer.
- use_subtrees : bool, optional (default = False)
  Whether or not to use sentiment-tagged subtrees.
- granularity : str, optional (default = "5-class")
  One of "5-class", "3-class", or "2-class", indicating the number of sentiment labels to use.
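As an illustration of how these parameters might be set, the sketch below builds a configuration fragment as a Python dictionary and serializes it to JSON. The "type" value is the registered name given above; placing the block under a dataset_reader key follows the usual AllenNLP configuration layout, and the chosen values are arbitrary examples rather than recommended settings.

```python
import json

# Example values only; token_indexers is omitted so its default applies.
config_fragment = {
    "dataset_reader": {
        "type": "sst_tokens",
        "use_subtrees": True,
        "granularity": "3-class",
    }
}

config_json = json.dumps(config_fragment, indent=2)
print(config_json)
```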
text_to_instance#
class StanfordSentimentTreeBankDatasetReader(DatasetReader):
| ...
| def text_to_instance(
| self,
| tokens: List[str],
| sentiment: str = None
| ) -> Optional[Instance]
We take pre-tokenized input here, because we might not have a tokenizer in this class.
Parameters
- tokens : List[str]
  The tokens in a given sentence.
- sentiment : str, optional (default = None)
  The sentiment for this sentence.
Returns
- An Instance containing the following fields:
  - tokens : TextField
    The tokens in the sentence or phrase.
  - label : LabelField
    The sentiment label of the sentence or phrase.
apply_token_indexers#
class StanfordSentimentTreeBankDatasetReader(DatasetReader):
| ...
| def apply_token_indexers(self, instance: Instance) -> None