Basic hypergraph operations¶
At the heart of Graphbrain lies the Semantic Hypergraph (SH). In practical terms, we will talk simply about hypergraphs, and we will treat them as a type of database, which contains a searchable collection of hyperedges.
Graphbrain provides abstractions to create, modify and search persistent hypergraph databases, as well as to define and manipulate hyperedges. In this section we introduce these basic operations, upon which all aspects of the library rely on.
The two central functions of Graphbrain: hgraph() and hedge()¶
The root namespace graphbrain
contains the two most fundamental functions of the library:
hgraph(locator_string)
, which creates/opens a persistent hypergraph.hedge(source)
, which creates a hyperedge from a string or a Python list or tuple.
In fact, the latter is implemented in graphbrain.hyperedge
, but it is imported to the root namespace by default for convenience. We will see that, with just these two functions, a lot can be achieved.
Creating and manipulating hyperedges¶
Graphbrain defines the object class Hyperedge
, which provides a variety of methods to work with hyperedges. The full interface of this class is described in the API reference. However, these objects are not meant to be instantiated directly. Instead, the hedge
function can be used to create such an object directly from a string representation conforming to the SH notation. For example:
from graphbrain import hedge
edge = hedge('(plays/P.so mary/C chess/C)')
In the above example, edge
is an instance of the Hyperedge
class. Hyerpedges are Python sequences. In fact, the class Hyperedge
is derived from tuple
, so it makes it possible to do things such as:
person = edge[1]
In this case, person
will be assigned the second element of the initial hyperedge, which happens to be the atom mary/C
. Range selector also work, but they do not automatically produce hyperedges, because subranges of the element of a semantic hyperedge are not guaranteed to be valid semantic hyperedges themselves. Instead, simple tulpes are returned. For example, edge[1:]
from the example is not a valid hyperedge. Nevertheless, such tuples of hyperedges are often useful:
>>> edge[1:]
(mary/C, chess/C)
>>> type(edge[1:])
<class 'tuple'>
It is possible to test a hyperedge for atomicity like this:
>>> edge.is_atom()
False
>>> person.is_atom()
True
Another frequently useful task if that of determining the type of a hyperedge:
>>> edge.type()
'R'
>>> edge[0].type()
'P'
>>> person.type()
'C'
Creating and populating hypergraphs¶
Graphbrain hypergraphs are created and/or opened like this:
from graphbrain import hgraph
hg = hgraph('example.hg')
The argument 'example.hg'
corresponds to a local directory in the filesystem, where the hypergraph is persisted. A full path can also be provided, e.g.: hgraph('users/alice/books.hg')
. The object returned by this function is of type Hypergraph
. Like Hyperedge
, it provides a number of general-purpose methods to work with hypergraphs and is not meant to be directly instantiated.
Graphbrain currently only provides one implementation of hypergraph database, which is based on a local filesytem-based high-performance key-value store (LevelDB). By convention, we use the .hg
extension to identify directories containing hypergraph databases of this type. In the future, we expect other hypergraph database types to be included in the library (for example, fully in-memory hypergraphs for even higher performance at relatively small sizes, or distributed hypergraphs for huge datasets, fault-tolerance, etc.). Graphbrain is an open source project, so contributions from people interested in developing such implementations are very welcome!
Adding hyperedges to a hypergraph is simple. For example, let us add the edge that was defined above:
hg.add(edge)
We can then check if this edge exists in the hypergraph:
>>> hg.exists(edge)
True
Notice that adding a hyperedge to a hypergraph implies the recursive addition of all of the elements of the hyperedge, so it is also the case that:
>>> hg.exists(edge[1])
True
A distinction is made with the notion of primary hyperedge. A primary hyperedge is, by default, one that was added directly, while the recursively added ones are considered non-primary. It is possible to check this:
>>> hg.is_primary(edge)
True
>>> hg.is_primary(edge[1])
False
It is possible to add an edge as non-primary:
>>> moon = hedge('(of/B.ma moon/C jupiter/C)')
>>> hg.add(moon, primary=False)
(of/B.ma moon/C jupiter/C)
>>> hg.is_primary(moon)
False
As with Hyperedge
, the full range of methods of Hypergraph
is documented in the API reference.
The neighborhood of a hyperedge (star)¶
Hypergraphs are fundamentally about relationships. In an analogous fashion to graphs/networks, the neighborhood of an entity (other entities that it is directly connected to) is a simple but powerful concept. With graphbrain, the star()
method provides one type of neighborhood that is particularly natural for hypergraphs and has wide applicability: it produces the set of hyperedges that contain a given hyperedge. For example, let us populate a hypergraph like this:
>>> hg.add(hedge('(of/B.ma moon/C jupiter/C)'))
>>> hg.add(hedge('(of/B.ma moon/C saturn/C)'))
Se let us obtain the star of the atom moon/C
:
>>> hg.star(hedge('moon/C'))
<generator object at 0x102382d30>
It returns a generator, allowing for the iteration through a very large number of hyperedges without exhausting memory. In this case, let us just convert the generator into a list to see the results:
>>> list(hg.star(hedge('moon/C')))
[(of/B.ma moon/C jupiter/C), (of/B.ma moon/C saturn/C)]
Let us combine several of the previous ideas to define a specific type of neighborhood: the set of hyperedges of type concept that are directly connected to moon/C
:
concepts = set()
for edge in hg.star(hedge('moon/C')):
for subedge in edge:
if edge.type() == 'C':
concepts.add(subedge)
The set concepts
will then contain: moon/C
, jupiter/C
, saturn/C
.
Hyperedges containing a given set of hyperedges¶
The hypergraph database provides a very efficient way to query for all hyperedges that include a given set of hyperedges, with the method edges_with_edges()
:
>>> hg.add('(plays/P mary/C chess/C)')
(plays/P mary/C chess/C)
>>> hg.add('(plays/P john/C chess/C)')
(plays/P john/C chess/C)
>>> hg.add('(plays/P alice/C handball/C)')
(plays/P alice/C handball/C)
>>> list(hg.edges_with_edges([hedge('plays/P'), hedge('chess/C')]))
[(plays/P john/C chess/C), (plays/P mary/C chess/C)]
An optional root
argument can be added, further requiring the matching edges to contain an atom with that root (at the top level):
>>> list(hg.edges_with_edges([hedge('plays/C'), hedge('chess/C')], root='john'))
[(plays/C john/C chess/C)]
Searching for hyperedges¶
Another fundamental way to query a hyperedge is by search patterns. Search patterns are templates that match hyperedges. Graphbrain provides a sophisticated pattern language that allows for semantically rich modes of matching. This will be discussed in greater detail in the next section. For now, let us just consider the wildcard *
, which matches any hyperedge (atomic or not). For example, the pattern (of/B.ma * *)
matches both of the previously defined hyperedges. The search()
method of Hypergraph
allows for search using these patterns. Like star()
, it returns a generator:
>>> list(hg.search('(of/B.ma * *)'))
[(of/B.ma moon/C jupiter/C), (of/B.ma moon/C saturn/C)]
Degrees and deep degrees¶
In conventional graph theory, there is the notion of the degree of a node, which is the number of other nodes that it is directly connected to. This is a simple but generally useful measure of the centrality of a node in the graph. In hypergraphs we can also have the same notion of degree, with the only difference that a single hyperedge can connect one entity to several others. Graphbrain keeps track of the degree of every hyperedge, and the Hypergraph
class provides a method to obtain it:
>>> from graphbrain import *
>>> hg = hgraph('example.hg')
>>> hg.degree('alice/C')
0
The degree of any hyperedge that does not exist in the hypergraph is 0. Notice also that degree()
, as well as many other Hypergraph
methods, conveniently accept the string representation of hyperedge, and transparently perform the conversion.
Let us add a few hyperedges and check the resulting degrees:
>>> hg.add('(in/B alice/C wonderland/C)')
(in/B alice/C wonderland/C)
>>> hg.degree('alice/C')
1
>>> hg.degree('(in/B alice/C wonderland/C)')
0
>>> hg.add('(reads/P john/C (in/B alice/C wonderland/C))')
(reads/P john/C (in/B alice/C wonderland/C))
>>> hg.degree('(in/B alice/C wonderland/C)')
1
>>> hg.add('(plays/P alice/C chess/C)')
(plays/P alice/C chess/C)
>>> hg.degree('alice/C')
2
Given that hyperedges can contain recursively contain other hyperedges, we can also consider the deep degree, which takes into account deep connections. For example, consider the edge (reads/P john/C (in/B alice/C wonderland/C))
. For the calculation of degrees, john/C
is not considered here to be connected to alice/C
, but such a connection is counter for the deep degree:
>>> hg.degree('alice/C')
2
>>> hg.deep_degree('alice/C')
3
Hyperedge attributes¶
The hypergraph database allows for the association of attributes to hyperedges. These can be strings, integer or floats, and are identified by a label. For example, one can associate a hyperedge to the text that it corresponds to:
>>> hg.set_attribute('(in/B alice/C wonderland/C)', 'text', 'Alice in Wonderland')
True
>>> hg.get_str_attribute('(in/B alice/C wonderland/C)', 'text')
'Alice in Wonderland'
Notice that the method set_attribute()
is used to set attributes of any type, but it is up to the programmer to choose the getter method according to the desired output type:
>>> hg.set_attribute('alice/C', 'age', 7)
True
>>> hg.get_int_attribute('alice/C', 'age')
7
>>> hg.set_attribute('alice/C', 'height', 1.2)
True
>>> hg.get_float_attribute('alice/C', 'height')
1.2
In fact, this is how degrees and deep degrees are stored, respectively in the attributes “d” and “dd”, so these attribute names should not be used for other purposes. The call hg.degree(edge)
is equivalent to hg.get_int_attribute(edge, 'd')
.
Integer attributes can also be incremented and decremented:
>>> hg.add('(red/M button/C)')
(red/M button/C)
>>> hg.set_attribute('(red/M button/C)', 'clicks', 0)
True
>>> hg.inc_attribute('(red/M button/C)', 'clicks')
True
>>> hg.get_int_attribute('(red/M button/C)', 'clicks')
1
>>> hg.dec_attribute('(red/M button/C)', 'clicks')
True
>>> hg.get_int_attribute('(red/M button/C)', 'clicks')
0
Local and global counters¶
Normally, when adding a hyperedge that already exists, nothing is changed. It is sometimes useful to count occurrences while adding hyperedges, and in this case the count=True
optional argument can be specified when calling add()
. This increments the count
integer argument of the hyperedge every time it is added:
>>> hg.add('(counting/P sheep/C)', count=True)
(counting/P sheep/C)
>>> hg.get_int_attribute('(counting/P sheep/C)', 'count')
1
>>> hg.add('(counting/P sheep/C)', count=True)
(counting/P sheep/C)
>>> hg.get_int_attribute('(counting/P sheep/C)', 'count')
2
The hypergraph database also provides the following global counters:
Hypergraph.atom_count()
: total number of atomsHypergraph.edge_count()
: total number of hyperedgesHypergraph.primary_atom_count()
: total number of primary atomsHypergraph.primary_edge_count()
: total number of primary hyperedges
Working with hyperedge sequences¶
The hypergraph database provides for a mechanism to organize hyerpedges into sequences. This is useful when storing hyperedges extracted from natural language sources where the order in which they appear can be relevant. For example, we might be interested in parsing every sentence in a book into a hyperedge and then being able to know which hyperedges correspond to the sentence that came before and after.
A hyperedge can be added to a given sequence in the hypergraph (identified by a string label), at a given position. For example:
>>> hg.add_to_sequence('sentences', 0, '(is/P this/C (the/M (first/M sentence/C)))')
(seq/P/. sentences 0 (is/P this/C (the/M (first/M sentence/C))))
The outer edge with the special predicate seq/P/.
assigns the hyperedge to the sequence “sentences” at position 0. Sequences are thus defines using hypergraphic semantics in this very simple way, and can be directly manipulated using basic hypergraph operations. The methods provided to work with them are just a convenience. There is no “safety net” to maintain consistency, it is possible to insert several hyperedges in the same position, or skip positions. This is meant to be a very simple and fast mechanism. The method sequences
returns a generator for all the sequences contained in the hypergraph:
>>> list(hg.sequences())
[sentences]
The method sequence
provides a generator for all the hyperedges contained in a given sequence, in order:
>>> hg.add_to_sequence('sentences', 1, '(is/P this/C (the/M (second/M sentence/C)))')
(seq/P/. sentences 1 (is/P this/C (the/M (second/M sentence/C))))
>>> hg.add_to_sequence('sentences', 2, '(is/P this/C (the/M (third/M sentence/C)))')
(seq/P/. sentences 2 (is/P this/C (the/M (third/M sentence/C))))
>>> list(hg.sequence('sentences'))
[(is/P this/C (the/M (first/M sentence/C))), (is/P this/C (the/M (second/M sentence/C))), (is/P this/C (the/M (third/M sentence/C)))]