Semantic Hypergraph notation

SH notation is based on two simple principles:

Every hyperedge belongs to one of eight basic types.
The first element of a hyperedge is a connector, followed by arguments that can be atomic or non-atomic hyperedges.

This is enough to define a valid hyperedge that conveys the meaning of a sentence in natural language, for example “The sky is blue”:

(is/P (the/M sky/C) blue/C)

Beyond this, the notation allows for more things: argument roles, subtypes, further type-specific additional information and namespaces. We will introduce these concepts gradually, by order of general usefulness. The two principles above extended with argument roles (which we will yet introduce) already capture a great deal of information contained in natural language, and sophisticated knowledge inference and exploration tasks can be performed at this level, while keeping the notation light and friendly to humans.

Then, the additional notational devices can be employed when useful. Of course, it is always possible to utilize full notation for machine tasks while presented a simplified version for human-friendliness.

Hyperedge types

All valid semantic hyperedges are of one of the 8 types shown in the table below. The first 6 types can be explicit (directly annotating an atomic hyepredge) of implicit (inferred from the types of the elements of the hyperedge). The last two types are always implicit.

Code	Type	Purpose	Example
Atomic or non-atomic
Atomic or non-atomic
C	concept	Define atomic concepts	apple/C
P	predicate	Build relations	(is/P berlin/C nice/C)
M	modifier	Modify any other hyperedge type, including itself	(red/M shoes/C)
M	modifier	Modify any other hyperedge type, including itself	(red/M shoes/C)
B	builder	Build concepts from concepts	(of/B capital/C germany/C)
T	trigger	Build specifications	(in/T 1994/C)
J	conjunction	Define sequences of hyperedges	(and/J meat/C potatoes/C)
J	conjunction	Define sequences of hyperedges	(and/J meat/C potatoes/C)
Non-atomic only
Non-atomic only
R	relation	Express facts, statements, questions, orders, …	(is/P berlin/C nice/C)
R	relation	Express facts, statements, questions, orders, …	(is/P berlin/C nice/C)
S	specifier	Relation specification (e.g. condition, time, …)	(in/T 1976/C)
S	specifier	Relation specification (e.g. condition, time, …)	(in/T 1976/C)

Type inference rules

The table below shows how implicit hyperedge types are inferred.

Element types	Resulting type
(M x)	x
(B C C+)	C
(T [CR])	S
(P [CRS]+)	P
(J x y+)	x

We use the notation of regular expressions: the symbol + is used to denote one or more entities with the type that precedes it, while square brackets indicate several possibilities (for instance, [CRS]+ means “at least one of any of C, R or S” types). x means any type: (M x) is of type x.

Argument roles

The type part of the atom can include subparts after the type specifier. The meaning of these subsequent subparts is type-specific. The most useful of those are argument roles in predicates and builders. As the name indicates, they specify the role that the following arguments play in the construct.

Predicates

When present, the first additional information subpart for predicates is used to specify the role played in a relation by each of its parameters, with the following codes:

s: subject
p: passive subject
a: agent
c: subject complement
o: direct object
i: indirect object
x: specification
t: parataxis
j: interjection
r: relative relation
?: undetermined

These codes are used to build strings, where each character corresponds to the parameter of the relation in the equivalent position. For example, consider the hyperedge:

(is/P.sc (the/M sky/C) blue/C)

The sc subpart indicates that the first parameter (“the sky”) plays the role of subject, and the second one (“blue”), plays the role of subject complement.

Builders

When present, the first additional information subpart for builders is used to distinguish the main concepts from the auxiliary ones, with the following codes:

m: main concept
a: auxiliary concept

These codes are used to build strings, where each character corresponds to the parameter of the builder in the equivalent position. For example, consider the hyperedge:

(of/B.ma founder/C psychoanalysis/C)

The ma subpart indicates that the first concept following the builder should be considered a main concept, and the next one auxiliary. This means that “founder of psychoanalysis” is a type of “founder”. In other words, auxiliary concepts serve the role of making the main ones more specific.

Subtypes

Subtypes are represented by a lowercase character following the main type code. They provide further distinctions, for example that a predicate is declarative (Pd), or that a concept is common (Cc), or that a modifier is a determinant (Md):

(is/Pd.sc (the/Md sky/Cc) blue/Cc)

Below we show possible subtypes for several main types.

Concept

Code	Subtype	Example
Cc	common	apple/Cc
Cp	proper	mary/Cp
Cn	number	27/Cn
Ci	pronoun	she/Ci
Cw	interrogative	who/Cw

Predicate

Code	Subtype	Example
Pd	declarative	is/Pd
P?	interrogative	is/P?
P!	imperative	go/P!

Builder

Code	Subtype	Example
Bp	possessive	‘s/Bp
Br	relational	in/Br

Modifier

Code	Subtype	Example
Ma	adjective	green/Ma
Mp	possessive	my/Mp
Md	determinant	the/Md
M#	number	100/M#
Mn	negation	not/Mn
Mv	verbal	will/Mv

Trigger

Code	Subtype	Example
T?	conditional	if/Tc
Tt	temporal	when/Tt
Tl	local	where/Tl
Tm	modal	modal/Tm
T>	causal	because/T>
T=	comparative	like/T=
Tc	concessive	although/Tc

Further type-specific additional information

Beyond argument roles, other forms of type-specific additional information are possible for the various types.

Concepts

When present, the first additional information subpart for concepts indicates number, with the following codes:

s: singular, example: apple/Cc.s
p: plural, example: apples/Cc.p

Predicates

Beyond argument roles, a second additional information subpart for predicates can be used to specify the features of the verb underlying the predicate. The following 7 features are specified:

tense: past (<), present (|) or future (>)
verb form: finite (f) or infinitive (i)
aspect: perfect (f) or progressive (g)
mood
person: first (1), second (2) or third (3)
number: singular (s) or plural (p)
verb type

A string is built in the above order to specify the verb features of a predicate. Any feature can be left unspecified, by using a dash character (-). For example, consider the hyperedge:

(is/P?.cs.|f–3s- (what/Mw time/Cc.s) it/Ci)

The predicate specifies four verb features: present tense (|), finite form (f), third person (3) and singular number (s).

Modifiers

When the modifer is verbal, the first additional information subpart can be used to specify the features of the underlying verb. The notation is exactly the same as the one used for predicates, but in predicates this corresponds to the second additional information subpart. For example, consider the non-atomic predicate:

(have/Mv.|f—– (been/Mv.<pf—- tracking/Pd.sox.|pg—-))

Namespaces

Namespaces serve two functions:

To identify the language or symbolic space to which an atom belongs;
To distinguish atoms that have different meanings, but would otherwise correspond to the exact same string.

In the first case, we can specify that an atom corresponds to an English word like this:

sky/Cp.s/en

Or to a German word like this:

himmel/Cp.s/de

Or that it is a special atom defined by Graphbrain:

+/B/.

In the second case, another subparts can be added to provide a distinction. For example, suppose we want to distinguish Cambridge (UK) from Cambridge (Mass., USA). We could use:

cambridge/Cp.s/en.1
cambridge/Cp.s/en.2

Full atom structure

We show here the full atom structure, including all optional parts.

Special atoms

The two special atoms below come predefined with Graphbrain and are very frequently useful.

Atom	Purpose	Example
+/B/.	Define compound nouns	(+/B.am/. alan/Cp.s turing/Cp.s)
:/J/.	Generic conjunction