Command-line interface
Graphbrain provides a command-line interface that can be used to execute a variety of tasks. You can access it either by using python to run the graphbrain root module as a script:
python -m graphbrain ...
or using the provided command:
graphbrain ...
All cases below work with both.
Here’s an overview of the interface:
usage: graphbrain [-h] [--col COL] [--corefs] [--hg HG]
[--indir INDIR] [--infile INFILE] [--lang LANG]
[--outfile OUTFILE] [--parser PARSER]
[--sequence SEQUENCE] [--url URL]
command
positional arguments:
command command to execute
optional arguments:
-h, --help show this help message and exit
--col table column
--corefs perform coreference resolution
--hg HG hypergraph db
--indir INDIR input directory
--infile INFILE input file
--lang LANG language
--outfile OUTFILE output file
--parser PARSER parser
--sequence SEQUENCE sequence name
--url URL url
The only obligatory argument, command, is used to specify the task to perform. Each command uses a subset of the optional arguments. Presented below are the details for each command.
Commands
create
Creates an empty hypergraph database:
graphbrain --hg <hypergraph_database> create
export
Exports a hypergraph database to a JSON file:
graphbrain --hg <hypergraph_database> --outfile <json_file> export
import
Imports a hypergraph database from a JSON file:
graphbrain --hg <hypergraph_database> --infile <json_file> import
txt
Takes a text file as input and converts each one of its sentences to hyperedges, adding them to the hypergraph.
This is a very simple but also useful, general-purpose reader/parser.
wikipedia
Takes a wikipedia URL as input, extracts the contents and converts each one of its sentences to hyperedges, adding them to the hypergraph.
reddit
Takes a Reddit JSON corpus as input and converts each one of thread titles, and optionally thread comments to hyperedges, adding them to the hypergraph. Titles and comments are attributed to authors.
taxonomy
Derives a taxonomy from concepts defined with builders of modifiers. For example, (of/Br.ma founder/Cc.s psychoanalysis/Cc.s)
is a type of founder/Cc.s
, so the following hyperedge is added:
(type_of/P/. (of/Br.ma founder/Cc.s psychoanalysis/Cc.s) founder/Cc.s)
Or, if we consider modifier-defined concepts such as (black/Ma cat/Cc.s)
:
(type_of/P/. (black/Ma cat/Cc.s) cat/Cc.s)
names
Performs coreference resolution for compound proper name concepts, for example detecting that “Barack Obama” and “Obama” refer to the same person but “Michelle Obama” refers to someone else).
onto
Depends on: taxonomy
Performs coreference resolution based on probabilistic reasoning over taxonomies. For example, detecting that “United States” and “United States of America” refer to the same entity.
actors
Depends on: coreference resolution
We define actors as specific entities that are capable of acting in some sense. This simple command identifies hyperedges corresponding to actor by applying the following criteria:
The hyperedge or one of its coreferences appears at least two times as the subject of a declarative relation
The hyperedge is of type concept and subtype proper concept
If coreferences are used, the hyperedge is the main coreference
This command transverses the entire hypergraph to identify actors, and then adds hyperedges like the following:
(actor/P/. mary/Cp.s/en)
The above simply means that mary/Cp.s/en
was identified as an actor.
claims
Depends on: coreference resolution
Identifies hyperedges that represent a claim. Claims are sentences such as: “North Korea says it’s not afraid of US military strike”. The claim is that “North Korea is not afraid of US military strike” and the author of the claim is “North Korea”.
More specifically, claims are detected according to the following criteria:
Hyperedge is a relation with predicate of type
Pd
.The deep predicate atom of the predicate hyperedge has a lemma belonging to a predetermined lists of verb lemmas that denote a claim (e.g.: “say”, “claim”).
The hyperedge has a subject and a clausal complement. The first is used to identify the actor making the claim, the second the claim itself.
Claim relations follow the format:
(claim/P/. *actor* *claim* *edge*)
Furthermore, simple anaphora resolution on the claim is performed (e.g. in “Pink Panther says that she loves pink.”, the hyperedge for “she” is replaced with the hyperedge for “Pink Panther” in the claim). In these cases, pronouns are used to guess gender or nature of actors. Actors can be classified as female:
(female/P/. *actor*)
Or as a group:
(group/P/. *actor*)
Or as male:
(male/P/. *actor*)
Or as non-human:
(non-human/P/. *actor*)
conflicts
Depends on: coreference resolution
Identifies hyperedges that represent a conflict. Conflicts are sentences such as: “Germany warns Russia against military engagement in Syria”. The source of the expression of conflict here is “Germany”, the target is “Russia” and the topic is “military engagement in Syria”.
More specifically, claims are detected according to the following criteria:
Hyperedge is a relation with predicate of type
Pd
.The deep predicate atom of the predicate hyperedge has a lemma belonging to a predetermined lists of verb lemmas that denote an expression of conflict (e.g.: “warn”, “kill”).
The hyperedge has a subject and an object. The first is used to identify the actor originating the expression of conflict and the second the actor which is the target of this expression.
[optional] Beyond subject and object, if any specifier arguments are present, and their trigger atoms belong to a predetermined list (e.g. “over”, “against”), then topics of conflict are extracted from these specifiers.
Conflict relations follow the format:
(conflict/P/. *actor_orig* *actor_targ* *edge*)
These conflict relations are connected to their topics by further relations with the format:
(conflict-topic/P/. *actor_orig* *actor_targ* *concept* *edge*)