allennlp.semparse.contexts
A KnowledgeGraph is a graphical representation of some structured knowledge source: say a table, a figure, or an explicit knowledge base.
- class allennlp.semparse.contexts.knowledge_graph.KnowledgeGraph(entities: Set[str], neighbors: Dict[str, List[str]], entity_text: Dict[str, str] = None)
  Bases: object
  A KnowledgeGraph represents a collection of entities and their relationships. The KnowledgeGraph currently stores (untyped) neighborhood information and text representations of each entity (if there are any). The knowledge base itself can be a table (as in WikiTableQuestions), a figure (as in NLVR), or some other structured knowledge source. This abstract class needs to be inherited to implement the functionality appropriate for a given KB.
  All of the parameters listed below are stored as public attributes.
  - Parameters
    - entities : Set[str]
      The string identifiers of the entities in this knowledge graph. We sort this set and store it as a list, so that we get a guaranteed consistent ordering across separate runs of the code.
    - neighbors : Dict[str, List[str]]
      A mapping from string identifiers to other string identifiers, denoting which entities are neighbors in the graph.
    - entity_text : Dict[str, str]
      If you have additional text associated with each entity (other than its string identifier), you can store that here. This might be, e.g., the text in a table cell, or the description of a Wikipedia entity.
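As a concrete illustration of the data this class holds, here is a minimal stand-in built around a toy one-row table. This is a sketch, not the actual allennlp class; the entity identifiers and text are invented for the example:

```python
from typing import Dict, List, Set

class KnowledgeGraphSketch:
    """Illustrative stand-in for KnowledgeGraph, mirroring the
    attributes documented above (not the actual allennlp class)."""
    def __init__(self,
                 entities: Set[str],
                 neighbors: Dict[str, List[str]],
                 entity_text: Dict[str, str] = None) -> None:
        # Sort the entity set into a list for a consistent ordering
        # across runs, as the docstring describes.
        self.entities: List[str] = sorted(entities)
        self.neighbors = neighbors
        self.entity_text = entity_text or {}

# A toy one-row table: two columns, each a neighbor of the row entity.
graph = KnowledgeGraphSketch(
    entities={"row_1", "column:name", "column:population"},
    neighbors={
        "row_1": ["column:name", "column:population"],
        "column:name": ["row_1"],
        "column:population": ["row_1"],
    },
    entity_text={"column:name": "Name", "column:population": "Population"},
)
```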
- class allennlp.semparse.contexts.table_question_context.TableQuestionContext(table_data: List[Dict[str, Union[str, float, allennlp.semparse.common.date.Date]]], column_name_type_mapping: Dict[str, Set[str]], column_names: Set[str], question_tokens: List[allennlp.data.tokenizers.token.Token])
  Bases: object
  Representation of table context similar to the one used by Memory Augmented Policy Optimization (MAPO, Liang et al., 2018). Most of the functionality is a reimplementation of https://github.com/crazydonkey200/neural-symbolic-machines/blob/master/table/wtq/preprocess.py for extracting entities from a question given a table, and typing the table's columns as <string>, <date>, or <number>.
  - classmethod get_table_data_from_tagged_lines(lines: List[List[str]]) → Tuple[List[Dict[str, Dict[str, str]]], Dict[str, Set[str]]]
  - classmethod get_table_data_from_untagged_lines(lines: List[List[str]]) → Tuple[List[Dict[str, Dict[str, str]]], Dict[str, Set[str]]]
    This method will be called only when we do not have tagged information from CoreNLP, that is, when we are running the parser on data outside the WikiTableQuestions dataset. We try to do the same processing that CoreNLP does for WTQ, but what we do here may not be as effective.
  - get_table_knowledge_graph(self) → allennlp.semparse.contexts.knowledge_graph.KnowledgeGraph
  - static normalize_string(string: str) → str
    These are the transformation rules used to normalize cell and column names in Sempre. See edu.stanford.nlp.sempre.tables.StringNormalizationUtils.characterNormalize and edu.stanford.nlp.sempre.tables.TableTypeSystem.canonicalizeName. We reproduce those rules here to normalize and canonicalize cells and columns in the same way, so that we can match them against constants in logical forms appropriately.
- allennlp.semparse.contexts.atis_tables.convert_to_string_list_value_dict(trigger_dict: Dict[str, int]) → Dict[str, List[str]]
- allennlp.semparse.contexts.atis_tables.digit_to_query_time(digit: str) → List[int]
  Given a digit in the utterance, return a list of the times that it corresponds to.
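A bare hour in an utterance is ambiguous between its a.m. and p.m. readings; a hypothetical sketch of this mapping (not the actual implementation, which handles more cases) might look like:

```python
from typing import List

def digit_to_times_sketch(digit: str) -> List[int]:
    """Illustrative sketch: return both the a.m. and p.m. readings of a
    bare hour as HHMM military times, e.g. '7' -> [700, 1900]."""
    hour = int(digit)
    if hour % 12 == 0:
        # Midnight/noon-style hours collapse to 0 and 1200.
        return [0, 1200]
    return [hour * 100, (hour + 12) * 100]
```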
- allennlp.semparse.contexts.atis_tables.get_approximate_times(times: List[int]) → List[int]
  Given a list of times that follow a word such as "about", we return a list of times that could appear in the query as a result. For example, if "about 7pm" appears in the utterance, then we also want to add 1830 and 1930.
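The behaviour described above can be sketched as follows; the half-hour window is taken from the 1830/1930 example, and the rest is assumed rather than the exact implementation:

```python
from typing import List

def approximate_times_sketch(times: List[int]) -> List[int]:
    """For each HHMM military time, return the times 30 minutes
    before and after it (illustrative sketch)."""
    results: List[int] = []
    for time in times:
        # Convert HHMM to minutes past midnight, shift, convert back.
        minutes = (time // 100) * 60 + time % 100
        for shifted in (minutes - 30, minutes + 30):
            shifted %= 24 * 60  # wrap around midnight
            results.append((shifted // 60) * 100 + shifted % 60)
    return results
```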
- allennlp.semparse.contexts.atis_tables.get_costs_from_utterance(utterance: str, tokenized_utterance: List[allennlp.data.tokenizers.token.Token]) → Dict[str, List[int]]
- allennlp.semparse.contexts.atis_tables.get_date_from_utterance(tokenized_utterance: List[allennlp.data.tokenizers.token.Token], year: int = 1993) → List[datetime.datetime]
  When the year is not explicitly mentioned in the utterance, the query assumes that it is 1993, so we do the same here. If there is no mention of the month or day, then we do not return any dates from the utterance.
- allennlp.semparse.contexts.atis_tables.get_flight_numbers_from_utterance(utterance: str, tokenized_utterance: List[allennlp.data.tokenizers.token.Token]) → Dict[str, List[int]]
- allennlp.semparse.contexts.atis_tables.get_numbers_from_utterance(utterance: str, tokenized_utterance: List[allennlp.data.tokenizers.token.Token]) → Dict[str, List[int]]
  Given an utterance, this function finds all the numbers that are in the action space. Since we need to keep track of linking scores, we represent the numbers as a dictionary, where the keys are the string representations of the numbers and the values are lists of the token indices that trigger each number.
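The linking structure described above can be sketched with a stripped-down version that only handles literal digit tokens (the real function also handles number words and other triggers):

```python
from typing import Dict, List

def numbers_to_token_indices(tokens: List[str]) -> Dict[str, List[int]]:
    """Illustrative sketch: map each number's string form to the
    indices of the tokens that trigger it."""
    numbers: Dict[str, List[int]] = {}
    for index, token in enumerate(tokens):
        if token.isdigit():
            numbers.setdefault(token, []).append(index)
    return numbers
```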
- allennlp.semparse.contexts.atis_tables.get_time_range_end_from_utterance(utterance: str, tokenized_utterance: List[allennlp.data.tokenizers.token.Token]) → Dict[str, List[int]]
- allennlp.semparse.contexts.atis_tables.get_time_range_start_from_utterance(utterance: str, tokenized_utterance: List[allennlp.data.tokenizers.token.Token]) → Dict[str, List[int]]
- allennlp.semparse.contexts.atis_tables.get_times_from_utterance(utterance: str, char_offset_to_token_index: Dict[int, int], indices_of_approximate_words: Set[int]) → Dict[str, List[int]]
  Given an utterance, we get the numbers that correspond to times and convert them to values that may appear in the query. For example, we convert "7pm" to 1900.
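One case of this conversion can be sketched as follows; the real function handles many more time formats and also tracks token indices, so this is illustrative only:

```python
import re
from typing import List

def pm_times_sketch(utterance: str) -> List[int]:
    """Find '<hour>pm' mentions and turn them into HHMM military
    times, e.g. '7pm' -> 1900 (illustrative sketch)."""
    times: List[int] = []
    for match in re.finditer(r"\b(\d{1,2})\s*pm\b", utterance):
        hour = int(match.group(1)) % 12 + 12
        times.append(hour * 100)
    return times
```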
- allennlp.semparse.contexts.atis_tables.get_trigger_dict(trigger_lists: List[List[str]], trigger_dicts: List[Dict[str, List[str]]]) → Dict[str, List[str]]
An AtisSqlTableContext represents the SQL context in which an utterance appears for the ATIS dataset, with the grammar and the valid actions.
- class allennlp.semparse.contexts.atis_sql_table_context.AtisSqlTableContext(all_tables: Dict[str, List[str]] = None, tables_with_strings: Dict[str, List[str]] = None, database_file: str = None)
  Bases: object
  An AtisSqlTableContext represents the SQL context with a grammar of SQL and the valid actions based on the schema of the tables that it represents.
  - Parameters
    - all_tables : Dict[str, List[str]]
      A dictionary representing the SQL tables in the dataset; the keys are the names of the tables, which map to lists of the tables' column names.
    - tables_with_strings : Dict[str, List[str]]
      A dictionary representing the SQL tables that we want to generate strings for. The keys are the names of the tables, which map to lists of the tables' column names.
    - database_file : str, optional
      The directory in which to find the sqlite database file. We query the sqlite database to find the strings that are allowed.
A Text2SqlTableContext represents the SQL context in which an utterance appears for any of the text2sql datasets, with the grammar and the valid actions.
- allennlp.semparse.contexts.text2sql_table_context.update_grammar_numbers_and_strings_with_variables(grammar_dictionary: Dict[str, List[str]], prelinked_entities: Dict[str, Dict[str, str]], columns: Dict[str, allennlp.data.dataset_readers.dataset_utils.text2sql_utils.TableColumn]) → None
- allennlp.semparse.contexts.text2sql_table_context.update_grammar_to_be_variable_free(grammar_dictionary: Dict[str, List[str]])
  SQL is a predominantly variable-free language in terms of simple usage, in the sense that most queries do not create references to variables which are not already static tables in a dataset. However, it is possible to do this via derived tables. If we don't require this functionality, we can tighten the grammar, because we don't need to support aliased tables.
- allennlp.semparse.contexts.text2sql_table_context.update_grammar_values_with_variables(grammar_dictionary: Dict[str, List[str]], prelinked_entities: Dict[str, Dict[str, str]]) → None
- allennlp.semparse.contexts.text2sql_table_context.update_grammar_with_global_values(grammar_dictionary: Dict[str, List[str]], dataset_name: str)
- allennlp.semparse.contexts.text2sql_table_context.update_grammar_with_table_values(grammar_dictionary: Dict[str, List[str]], schema: Dict[str, List[allennlp.data.dataset_readers.dataset_utils.text2sql_utils.TableColumn]], cursor: sqlite3.Cursor) → None
- allennlp.semparse.contexts.text2sql_table_context.update_grammar_with_tables(grammar_dictionary: Dict[str, List[str]], schema: Dict[str, List[allennlp.data.dataset_readers.dataset_utils.text2sql_utils.TableColumn]]) → None
- allennlp.semparse.contexts.text2sql_table_context.update_grammar_with_untyped_entities(grammar_dictionary: Dict[str, List[str]]) → None
  Variables can be treated as numbers or strings if their type can be inferred. However, that can be difficult, so instead we can just treat them all as values and be a bit looser in the typing we allow in our grammar. Here we simply remove all references to number and string from the grammar, replacing them with value.
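A sketch of this rewrite over a grammar dictionary (the details are assumed, not copied from the implementation):

```python
from typing import Dict, List

def untype_grammar_sketch(grammar_dictionary: Dict[str, List[str]]) -> None:
    """Drop the 'number' and 'string' rules and replace references to
    them in every production with 'value' (illustrative sketch)."""
    grammar_dictionary.pop("number", None)
    grammar_dictionary.pop("string", None)
    for nonterminal, productions in grammar_dictionary.items():
        grammar_dictionary[nonterminal] = [
            production.replace("number", "value").replace("string", "value")
            for production in productions
        ]
```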
- class allennlp.semparse.contexts.sql_context_utils.SqlVisitor(grammar: parsimonious.grammar.Grammar, keywords_to_uppercase: List[str] = None)
  Bases: parsimonious.nodes.NodeVisitor
  SqlVisitor performs a depth-first traversal of the AST. It takes the parse tree and gives us an action sequence that resulted in that parse. Since the visitor has mutable state, we define a new SqlVisitor for each query. To get the action sequence, we create a SqlVisitor and call parse on it, which returns a list of actions. For example:

      sql_visitor = SqlVisitor(grammar_string)
      action_sequence = sql_visitor.parse(query)

  Importantly, this SqlVisitor skips over ws and wsp nodes, because they do not hold any meaning, and they make an action sequence much longer than it needs to be.
  - Parameters
    - grammar : Grammar
      A Grammar object that we use to parse the text.
    - keywords_to_uppercase : List[str], optional (default = None)
      Keywords in the grammar to uppercase. In the case of SQL, these might be SELECT, MAX, etc.
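The traversal itself can be pictured with a toy tree. This stand-alone sketch (not the real parsimonious-based visitor) emits one action per non-whitespace node, depth-first:

```python
from typing import List, Tuple

# A toy AST node: (rule_name, children). Leaf rules have no children.
Node = Tuple[str, list]

WHITESPACE_RULES = frozenset({"ws", "wsp"})

def action_sequence_sketch(node: Node) -> List[str]:
    """Depth-first walk emitting 'lhs -> [child rules]' actions,
    skipping ws/wsp nodes as described above."""
    rule, children = node
    kept = [child for child in children if child[0] not in WHITESPACE_RULES]
    actions: List[str] = []
    if kept:
        actions.append(f"{rule} -> [{', '.join(child[0] for child in kept)}]")
        for child in kept:
            actions.extend(action_sequence_sketch(child))
    return actions

tree = ("query", [("select_core", [("col", []), ("ws", []), ("table", [])]),
                  ("wsp", [])])
```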
  - add_action(self, node: parsimonious.nodes.Node) → None
    For each node, we accumulate the rules that generated its children in a list.
  - generic_visit(self, node: parsimonious.nodes.Node, visited_children: List[NoneType]) → List[str]
    Default visitor method.
    - Parameters
      node – The node we're visiting
      visited_children – The results of visiting the children of that node, in a list
    I'm not sure there's an implementation of this that makes sense across all (or even most) use cases, so we leave it to subclasses to implement for now.
- allennlp.semparse.contexts.sql_context_utils.action_sequence_to_sql(action_sequences: List[str]) → str
- allennlp.semparse.contexts.sql_context_utils.format_action(nonterminal: str, right_hand_side: str, is_string: bool = False, is_number: bool = False, keywords_to_uppercase: List[str] = None) → str
  This function formats an action as it appears in models. It splits productions based on the special ws and wsp rules, which are used in grammars to denote whitespace, and then rejoins these tokens into a formatted, comma-separated list. Importantly, note that it does not split on spaces in the grammar string, because these might not correspond to spaces in the language the grammar recognises.
  - Parameters
    - nonterminal : str, required
      The nonterminal in the action.
    - right_hand_side : str, required
      The right hand side of the action (i.e. the thing which is produced).
    - is_string : bool, optional (default = False)
      Whether the production produces a string. If it does, it is formatted as nonterminal -> ['string'].
    - is_number : bool, optional (default = False)
      Whether the production produces a number. If it does, it is formatted as nonterminal -> ['number'].
    - keywords_to_uppercase : List[str], optional (default = None)
      Keywords in the grammar to uppercase. In the case of SQL, these might be SELECT, MAX, etc.
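A simplified sketch of the formatting (keyword uppercasing and the string/number cases are omitted, and the splitting details are assumed):

```python
import re
from typing import List

def format_action_sketch(nonterminal: str, right_hand_side: str) -> str:
    """Split the production on the whitespace rule names ws/wsp, not
    on literal spaces, and rejoin the tokens as a bracketed,
    comma-separated list (illustrative sketch)."""
    pieces = re.split(r"\b(?:wsp|ws)\b", right_hand_side)
    tokens = [piece.strip() for piece in pieces if piece.strip()]
    return f"{nonterminal} -> [{', '.join(tokens)}]"
```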
- allennlp.semparse.contexts.sql_context_utils.format_grammar_string(grammar_dictionary: Dict[str, List[str]]) → str
  Formats a dictionary of production rules into the string format expected by the Parsimonious Grammar class.
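The shape of that conversion can be sketched as follows; the exact formatting is assumed, but Parsimonious's Grammar constructor does accept rules written one per line with `/`-separated ordered alternatives:

```python
from typing import Dict, List

def format_grammar_string_sketch(grammar_dictionary: Dict[str, List[str]]) -> str:
    """Render each nonterminal's productions as 'lhs = (alt1 / alt2)',
    one rule per line (illustrative sketch)."""
    return "\n".join(
        f"{nonterminal} = ({' / '.join(productions)})"
        for nonterminal, productions in grammar_dictionary.items()
    )
```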
- allennlp.semparse.contexts.sql_context_utils.initialize_valid_actions(grammar: parsimonious.grammar.Grammar, keywords_to_uppercase: List[str] = None) → Dict[str, List[str]]
  We initialize the valid actions with the global actions. These include the valid actions that result from the grammar and also those that result from the tables provided. The keys represent the nonterminals in the grammar and the values are lists of the valid actions of that nonterminal.
- class allennlp.semparse.contexts.quarel_utils.WorldTaggerExtractor(tagger_archive)
  Bases: object
- allennlp.semparse.contexts.quarel_utils.align_entities(extracted: List[str], literals: Dict[str, Any], stemmer: nltk.stem.porter.PorterStemmer) → List[str]
  Use stemming to attempt alignment between the extracted worlds and the given world literals. If more words align to one world than to the other, it is considered aligned.
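The stem-overlap idea behind this alignment can be sketched without NLTK by swapping in a toy suffix-stripping stemmer; the real code uses PorterStemmer, and everything here is illustrative:

```python
from typing import List

def toy_stem(word: str) -> str:
    """Toy stand-in for a real stemmer such as NLTK's PorterStemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def stem_overlap(query: str, reference: str) -> int:
    """Count query words whose stems appear among the stems of the
    reference's words; the world with the larger count wins."""
    reference_stems = {toy_stem(word) for word in reference.lower().split()}
    return sum(1 for word in query.lower().split()
               if toy_stem(word) in reference_stems)
```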
- allennlp.semparse.contexts.quarel_utils.from_bio(tags: List[str], target: str) → List[Tuple[int, int]]
- allennlp.semparse.contexts.quarel_utils.from_entity_cues_string(cues_string: str) → Dict[str, List[str]]
- allennlp.semparse.contexts.quarel_utils.from_qr_spec_string(qr_spec: str) → List[Dict[str, int]]
- allennlp.semparse.contexts.quarel_utils.get_explanation(logical_form: str, world_extractions: Dict[str, Any], answer_index: int, world: allennlp.semparse.worlds.quarel_world.QuarelWorld) → List[Dict[str, Any]]
  Create an explanation (as a list of header/content entries) for an answer.
- allennlp.semparse.contexts.quarel_utils.get_stem_overlaps(query: str, references: List[str], stemmer: nltk.stem.porter.PorterStemmer) → List[int]
- allennlp.semparse.contexts.quarel_utils.group_worlds(tags: List[str], tokens: List[str]) → Dict[str, List[str]]
- allennlp.semparse.contexts.quarel_utils.nl_triple(triple: List[str], nl_world: Dict[str, Any]) → str
- allennlp.semparse.contexts.quarel_utils.str_join(string_or_list: Union[str, List[str]], joiner: str, prefixes: str = '', postfixes: str = '') → str