allennlp.semparse.contexts

A KnowledgeGraph is a graph representation of some structured knowledge source: say a table, a figure, or an explicit knowledge base.

class allennlp.semparse.contexts.knowledge_graph.KnowledgeGraph(entities: Set[str], neighbors: Dict[str, List[str]], entity_text: Dict[str, str] = None)[source]

Bases: object

A KnowledgeGraph represents a collection of entities and their relationships.

The KnowledgeGraph currently stores (untyped) neighborhood information and a text representation of each entity (if any).

The knowledge base itself can be a table (as in WikiTableQuestions), a figure (as in NLVR) or some other structured knowledge source. This base class should be subclassed to implement the functionality appropriate for a given KB.

All of the parameters listed below are stored as public attributes.

Parameters
entities : Set[str]

The string identifiers of the entities in this knowledge graph. We sort this set and store it as a list. The sorting is so that we get a guaranteed consistent ordering across separate runs of the code.

neighbors : Dict[str, List[str]]

A mapping from each entity's string identifier to the identifiers of its neighbors in the graph.

entity_text : Dict[str, str], optional

If you have additional text associated with each entity (other than its string identifier), you can store that here. This might be, e.g., the text in a table cell, or the description of a wikipedia entity.
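As a concrete illustration, the three arguments for a tiny one-row, two-column table graph might look like the following. The `fb:`-style identifiers mimic the WikiTableQuestions convention but are invented for this example:

```python
# Hypothetical entities for a one-row table with "year" and "city"
# columns; identifiers are illustrative, not real WTQ output.
entities = {"fb:row.row.year", "fb:row.row.city",
            "fb:cell.2010", "fb:cell.chicago"}

# Each column is a neighbor of its cells, and vice versa.
neighbors = {
    "fb:row.row.year": ["fb:cell.2010"],
    "fb:cell.2010": ["fb:row.row.year"],
    "fb:row.row.city": ["fb:cell.chicago"],
    "fb:cell.chicago": ["fb:row.row.city"],
}

# Human-readable text associated with each entity.
entity_text = {
    "fb:row.row.year": "year",
    "fb:row.row.city": "city",
    "fb:cell.2010": "2010",
    "fb:cell.chicago": "Chicago",
}

# The class sorts the entity set so the ordering is consistent
# across separate runs of the code.
sorted_entities = sorted(entities)
```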

class allennlp.semparse.contexts.table_question_context.TableQuestionContext(table_data: List[Dict[str, Union[str, float, allennlp.semparse.common.date.Date]]], column_name_type_mapping: Dict[str, Set[str]], column_names: Set[str], question_tokens: List[allennlp.data.tokenizers.token.Token])[source]

Bases: object

Representation of table context similar to the one used by Memory Augmented Policy Optimization (MAPO; Liang et al., 2018). Most of the functionality is a reimplementation of https://github.com/crazydonkey200/neural-symbolic-machines/blob/master/table/wtq/preprocess.py for extracting entities from a question given a table, and for typing the table's columns as <string>, <date>, or <number>.

get_entities_from_question(self) → Tuple[List[Tuple[str, str]], List[Tuple[str, int]]][source]
classmethod get_table_data_from_tagged_lines(lines: List[List[str]]) → Tuple[List[Dict[str, Dict[str, str]]], Dict[str, Set[str]]][source]
classmethod get_table_data_from_untagged_lines(lines: List[List[str]]) → Tuple[List[Dict[str, Dict[str, str]]], Dict[str, Set[str]]][source]

This method will be called only when we do not have tagged information from CoreNLP. That is, when we are running the parser on data outside the WikiTableQuestions dataset. We try to do the same processing that CoreNLP does for WTQ, but what we do here may not be as effective.

get_table_knowledge_graph(self) → allennlp.semparse.contexts.knowledge_graph.KnowledgeGraph[source]
static normalize_string(string: str) → str[source]

These are the transformation rules used to normalize cell and column names in Sempre. See edu.stanford.nlp.sempre.tables.StringNormalizationUtils.characterNormalize and edu.stanford.nlp.sempre.tables.TableTypeSystem.canonicalizeName. We reproduce those rules here to normalize and canonicalize cells and columns in the same way, so that we can match them against constants in logical forms appropriately.

classmethod read_from_file(filename: str, question_tokens: List[allennlp.data.tokenizers.token.Token]) → 'TableQuestionContext'[source]
classmethod read_from_lines(lines: List, question_tokens: List[allennlp.data.tokenizers.token.Token]) → 'TableQuestionContext'[source]
allennlp.semparse.contexts.atis_tables.am_map_match_to_query_value(match: str)[source]
allennlp.semparse.contexts.atis_tables.convert_to_string_list_value_dict(trigger_dict: Dict[str, int]) → Dict[str, List[str]][source]
allennlp.semparse.contexts.atis_tables.digit_to_query_time(digit: str) → List[int][source]

Given a digit in the utterance, return a list of the times that it corresponds to.

allennlp.semparse.contexts.atis_tables.get_approximate_times(times: List[int]) → List[int][source]

Given a list of times that follow a word such as "about", we return a list of times that could appear in the query as a result. For example, if "about 7pm" appears in the utterance, then we also want to add 1830 and 1930.
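The described behavior can be sketched as follows; the real get_approximate_times may handle edge cases (midnight wraparound, duplicates) differently:

```python
def approximate_times_sketch(times):
    # For each military time (e.g. 1900), add the times 30 minutes
    # earlier and later (1830 and 1930), as described above.
    approximate = []
    for time in times:
        hour, minute = divmod(time, 100)
        total_minutes = hour * 60 + minute
        for delta in (-30, 30):
            shifted = (total_minutes + delta) % (24 * 60)
            approximate.append((shifted // 60) * 100 + shifted % 60)
    return approximate
```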

allennlp.semparse.contexts.atis_tables.get_costs_from_utterance(utterance: str, tokenized_utterance: List[allennlp.data.tokenizers.token.Token]) → Dict[str, List[int]][source]
allennlp.semparse.contexts.atis_tables.get_date_from_utterance(tokenized_utterance: List[allennlp.data.tokenizers.token.Token], year: int = 1993) → List[datetime.datetime][source]

When the year is not explicitly mentioned in the utterance, the query assumes that it is 1993 so we do the same here. If there is no mention of the month or day then we do not return any dates from the utterance.

allennlp.semparse.contexts.atis_tables.get_flight_numbers_from_utterance(utterance: str, tokenized_utterance: List[allennlp.data.tokenizers.token.Token]) → Dict[str, List[int]][source]
allennlp.semparse.contexts.atis_tables.get_numbers_from_utterance(utterance: str, tokenized_utterance: List[allennlp.data.tokenizers.token.Token]) → Dict[str, List[int]][source]

Given an utterance, this function finds all the numbers that are in the action space. Since we need to keep track of linking scores, we represent the numbers as a dictionary where the keys are the string representations of the numbers and the values are lists of the token indices that trigger each number.
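For instance, a hypothetical return value for the tokenized utterance below (token texts and indices are illustrative):

```python
# Tokens of a hypothetical utterance; the number "2" occurs at index 3.
tokens = ["show", "me", "the", "2", "cheapest", "fares"]

# Keys are string forms of the numbers found; values are lists of the
# indices of the tokens that triggered them.
numbers = {token: [index] for index, token in enumerate(tokens)
           if token.isdigit()}
```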

allennlp.semparse.contexts.atis_tables.get_time_range_end_from_utterance(utterance: str, tokenized_utterance: List[allennlp.data.tokenizers.token.Token]) → Dict[str, List[int]][source]
allennlp.semparse.contexts.atis_tables.get_time_range_start_from_utterance(utterance: str, tokenized_utterance: List[allennlp.data.tokenizers.token.Token]) → Dict[str, List[int]][source]
allennlp.semparse.contexts.atis_tables.get_times_from_utterance(utterance: str, char_offset_to_token_index: Dict[int, int], indices_of_approximate_words: Set[int]) → Dict[str, List[int]][source]

Given an utterance, we get the numbers that correspond to times and convert them to values that may appear in the query; for example, "7pm" becomes 1900.
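The am/pm conversion can be sketched as below; the real function also maps character offsets back to token indices for linking, which this sketch omits:

```python
def to_military_sketch(hour: int, meridian: str) -> int:
    # "7pm" -> 1900; "12am" -> 0. Edge-case handling in the real
    # implementation may differ.
    if meridian == "pm" and hour != 12:
        hour += 12
    if meridian == "am" and hour == 12:
        hour = 0
    return hour * 100
```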

allennlp.semparse.contexts.atis_tables.get_trigger_dict(trigger_lists: List[List[str]], trigger_dicts: List[Dict[str, List[str]]]) → Dict[str, List[str]][source]
allennlp.semparse.contexts.atis_tables.pm_map_match_to_query_value(match: str)[source]

An AtisSqlTableContext represents the SQL context in which an utterance appears for the ATIS dataset, with the grammar and the valid actions.

class allennlp.semparse.contexts.atis_sql_table_context.AtisSqlTableContext(all_tables: Dict[str, List[str]] = None, tables_with_strings: Dict[str, List[str]] = None, database_file: str = None)[source]

Bases: object

An AtisSqlTableContext represents the SQL context with a grammar of SQL and the valid actions based on the schema of the tables that it represents.

Parameters
all_tables: ``Dict[str, List[str]]``

A dictionary representing the SQL tables in the dataset; the keys are the names of the tables, which map to lists of the tables' column names.

tables_with_strings: ``Dict[str, List[str]]``

A dictionary representing the SQL tables that we want to generate strings for. The keys are the names of the tables, which map to lists of the tables' column names.

database_file : str, optional

The path to the SQLite database file. We query the database to find the strings that are allowed.
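To illustrate the expected argument shapes (the table and column names below are made up, not the real ATIS schema):

```python
# Hypothetical schemas: each table name maps to its column names.
all_tables = {
    "flight": ["flight_id", "from_airport", "to_airport"],
    "city": ["city_code", "city_name"],
}

# Tables whose string values we want to add to the grammar.
tables_with_strings = {"city": ["city_name"]}
```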

create_grammar_dict_and_strings(self) → Tuple[Dict[str, List[str]], List[Tuple[str, str]]][source]
get_grammar_dictionary(self) → Dict[str, List[str]][source]
get_grammar_string(self)[source]
get_valid_actions(self) → Dict[str, List[str]][source]

A Text2SqlTableContext represents the SQL context in which an utterance appears for any of the text2sql datasets, with the grammar and the valid actions.

allennlp.semparse.contexts.text2sql_table_context.update_grammar_numbers_and_strings_with_variables(grammar_dictionary: Dict[str, List[str]], prelinked_entities: Dict[str, Dict[str, str]], columns: Dict[str, allennlp.data.dataset_readers.dataset_utils.text2sql_utils.TableColumn]) → None[source]
allennlp.semparse.contexts.text2sql_table_context.update_grammar_to_be_variable_free(grammar_dictionary: Dict[str, List[str]])[source]

SQL is a predominantly variable-free language in terms of simple usage, in the sense that most queries do not create references to variables that are not already static tables in a dataset. However, it is possible to do this via derived tables. If we don't require this functionality, we can tighten the grammar, because we don't need to support aliased tables.

allennlp.semparse.contexts.text2sql_table_context.update_grammar_values_with_variables(grammar_dictionary: Dict[str, List[str]], prelinked_entities: Dict[str, Dict[str, str]]) → None[source]
allennlp.semparse.contexts.text2sql_table_context.update_grammar_with_global_values(grammar_dictionary: Dict[str, List[str]], dataset_name: str)[source]
allennlp.semparse.contexts.text2sql_table_context.update_grammar_with_table_values(grammar_dictionary: Dict[str, List[str]], schema: Dict[str, List[allennlp.data.dataset_readers.dataset_utils.text2sql_utils.TableColumn]], cursor: sqlite3.Cursor) → None[source]
allennlp.semparse.contexts.text2sql_table_context.update_grammar_with_tables(grammar_dictionary: Dict[str, List[str]], schema: Dict[str, List[allennlp.data.dataset_readers.dataset_utils.text2sql_utils.TableColumn]]) → None[source]
allennlp.semparse.contexts.text2sql_table_context.update_grammar_with_untyped_entities(grammar_dictionary: Dict[str, List[str]]) → None[source]

Variables can be treated as numbers or strings if their type can be inferred - however, that can be difficult, so instead, we can just treat them all as values and be a bit looser on the typing we allow in our grammar. Here we just remove all references to number and string from the grammar, replacing them with value.
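The replacement can be sketched as follows; the grammar rules shown are invented for illustration, not the real grammar:

```python
import re

def untype_entities_sketch(grammar_dictionary):
    # Hedged sketch: rewrite every right-hand side so references to the
    # 'number' and 'string' nonterminals become 'value', and drop those
    # two rules themselves. Rule names here are assumptions.
    out = {}
    for lhs, productions in grammar_dictionary.items():
        if lhs in ("number", "string"):
            continue
        out[lhs] = [re.sub(r"\b(number|string)\b", "value", production)
                    for production in productions]
    return out
```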

class allennlp.semparse.contexts.sql_context_utils.SqlVisitor(grammar: parsimonious.grammar.Grammar, keywords_to_uppercase: List[str] = None)[source]

Bases: parsimonious.nodes.NodeVisitor

SqlVisitor performs a depth-first traversal of the AST. It takes the parse tree and gives us an action sequence that resulted in that parse. Since the visitor has mutable state, we define a new SqlVisitor for each query. To get the action sequence, we create a SqlVisitor and call parse on it, which returns a list of actions. For example:

sql_visitor = SqlVisitor(grammar)
action_sequence = sql_visitor.parse(query)

Importantly, this SqlVisitor skips over ws and wsp nodes, because they do not hold any meaning, and make an action sequence much longer than it needs to be.

Parameters
grammar : Grammar

A Grammar object that we use to parse the text.

keywords_to_uppercase: ``List[str]``, optional, (default = None)

Keywords in the grammar to uppercase. In the case of SQL, this might be SELECT, MAX, etc.

add_action(self, node: parsimonious.nodes.Node) → None[source]

For each node, we accumulate the rules that generated its children in a list.

generic_visit(self, node: parsimonious.nodes.Node, visited_children: List[NoneType]) → List[str][source]

Default visitor method

Parameters
  • node – The node we’re visiting

  • visited_children – The results of visiting the children of that node, in a list

I’m not sure there’s an implementation of this that makes sense across all (or even most) use cases, so we leave it to subclasses to implement for now.

visit(self, node)[source]

See the NodeVisitor visit method. This just changes the order in which we visit nonterminals from right-to-left to left-to-right.

allennlp.semparse.contexts.sql_context_utils.action_sequence_to_sql(action_sequences: List[str]) → str[source]
allennlp.semparse.contexts.sql_context_utils.format_action(nonterminal: str, right_hand_side: str, is_string: bool = False, is_number: bool = False, keywords_to_uppercase: List[str] = None) → str[source]

This function formats an action as it appears in models. It splits productions based on the special ws and wsp rules, which are used in grammars to denote whitespace, and then rejoins these tokens into a formatted, comma-separated list. Importantly, note that it does not split on spaces in the grammar string, because these might not correspond to spaces in the language the grammar recognises.

Parameters
nonterminal : str, required

The nonterminal in the action.

right_hand_side : str, required

The right hand side of the action (i.e. the thing which is produced).

is_string : bool, optional (default = False)

Whether the production produces a string. If it does, it is formatted as nonterminal -> ['string']

is_number : bool, optional (default = False)

Whether the production produces a number. If it does, it is formatted as nonterminal -> ['number']

keywords_to_uppercase: ``List[str]``, optional, (default = None)

Keywords in the grammar to uppercase. In the case of sql, this might be SELECT, MAX etc.
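The shape of the resulting action strings can be sketched as follows. This is a deliberate simplification: unlike the real function, it splits on spaces (which is only safe when grammar literals contain no spaces), and the quoting details are assumptions:

```python
def format_action_sketch(nonterminal: str, right_hand_side: str,
                         is_string: bool = False,
                         is_number: bool = False) -> str:
    # Simplified sketch of the action format described above; the real
    # implementation also uppercases configured keywords.
    if is_string or is_number:
        return f"{nonterminal} -> ['{right_hand_side}']"
    # Drop the whitespace rules ws/wsp and rejoin the remaining
    # tokens as a bracketed, comma-separated list.
    tokens = [token for token in right_hand_side.strip("()").split()
              if token not in ("ws", "wsp")]
    return f"{nonterminal} -> [{', '.join(tokens)}]"
```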

allennlp.semparse.contexts.sql_context_utils.format_grammar_string(grammar_dictionary: Dict[str, List[str]]) → str[source]

Formats a dictionary of production rules into the string format expected by the Parsimonious Grammar class.
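A minimal sketch of the conversion, assuming the standard Parsimonious notation in which each rule is written `lhs = alt1 / alt2`; the real function's formatting details may differ:

```python
def format_grammar_string_sketch(grammar_dictionary):
    # Each nonterminal becomes one rule line; alternatives are joined
    # with "/", Parsimonious's ordered-choice operator.
    return "\n".join(f"{lhs} = {' / '.join(alternatives)}"
                     for lhs, alternatives in grammar_dictionary.items())
```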

allennlp.semparse.contexts.sql_context_utils.initialize_valid_actions(grammar: parsimonious.grammar.Grammar, keywords_to_uppercase: List[str] = None) → Dict[str, List[str]][source]

We initialize the valid actions with the global actions. These include the valid actions that result from the grammar and also those that result from the tables provided. The keys represent the nonterminals in the grammar and the values are lists of the valid actions of that nonterminal.

class allennlp.semparse.contexts.quarel_utils.WorldTaggerExtractor(tagger_archive)[source]

Bases: object

get_world_entities(self, question: str, tokenized_question: List[allennlp.data.tokenizers.token.Token] = None) → Dict[str, List[str]][source]
allennlp.semparse.contexts.quarel_utils.align_entities(extracted: List[str], literals: Dict[str, Any], stemmer: nltk.stem.porter.PorterStemmer) → List[str][source]

Use stemming to attempt alignment between extracted world and given world literals. If more words align to one world vs the other, it’s considered aligned.

allennlp.semparse.contexts.quarel_utils.delete_duplicates(expr: List) → List[source]
allennlp.semparse.contexts.quarel_utils.from_bio(tags: List[str], target: str) → List[Tuple[int, int]][source]
allennlp.semparse.contexts.quarel_utils.from_entity_cues_string(cues_string: str) → Dict[str, List[str]][source]
allennlp.semparse.contexts.quarel_utils.from_qr_spec_string(qr_spec: str) → List[Dict[str, int]][source]
allennlp.semparse.contexts.quarel_utils.get_explanation(logical_form: str, world_extractions: Dict[str, Any], answer_index: int, world: allennlp.semparse.worlds.quarel_world.QuarelWorld) → List[Dict[str, Any]][source]

Create an explanation (as a list of header/content entries) for an answer.

allennlp.semparse.contexts.quarel_utils.get_stem_overlaps(query: str, references: List[str], stemmer: nltk.stem.porter.PorterStemmer) → List[int][source]
allennlp.semparse.contexts.quarel_utils.get_words(string: str) → List[str][source]
allennlp.semparse.contexts.quarel_utils.group_worlds(tags: List[str], tokens: List[str]) → Dict[str, List[str]][source]
allennlp.semparse.contexts.quarel_utils.nl_arg(arg: Any, nl_world: Dict[str, Any]) → Any[source]
allennlp.semparse.contexts.quarel_utils.nl_attr(attr: str) → str[source]
allennlp.semparse.contexts.quarel_utils.nl_dir(sign: int) → str[source]
allennlp.semparse.contexts.quarel_utils.nl_triple(triple: List[str], nl_world: Dict[str, Any]) → str[source]
allennlp.semparse.contexts.quarel_utils.nl_world_string(world: List[str]) → str[source]
allennlp.semparse.contexts.quarel_utils.split_question(question: str) → List[str][source]
allennlp.semparse.contexts.quarel_utils.str_join(string_or_list: Union[str, List[str]], joiner: str, prefixes: str = '', postfixes: str = '') → str[source]
allennlp.semparse.contexts.quarel_utils.strip_entity_type(entity: str) → str[source]
allennlp.semparse.contexts.quarel_utils.to_camel(string: str) → str[source]
allennlp.semparse.contexts.quarel_utils.to_qr_spec_string(qr_coeff_sets: List[Dict[str, int]]) → str[source]
allennlp.semparse.contexts.quarel_utils.words_from_entity_string(entity: str) → str[source]