Ebran Grim Save

grim brings property graphs to the Nim language. Look around you: everything is a graph!

Project README

grim

grim brings property graphs to Nim

Grim brings the labeled property graph (LPG) data structure to the Nim language. This storage model is used in the Neo4j database and consists of labeled Nodes and Edges. The data itself is stored in key/value-pairs on these entities.

grim

News: See the Changelog.

  • [2021-01-03] v0.3.1 released!
  • [2021-01-01] v0.3.0 released!
  • [2020-02-07] v0.2.0 released!

Using grim | Documentation and API | Running the tests | Built with | Contributing | Authors | License

Using grim

Grim is provided with the Nimble package manager.

Prerequisites

Install the Nim compiler; see nim-lang for instructions.

Installing

Use nimble to install grim on your local machine:

nimble install grim

Use grim in a project by adding

requires "grim"

to its .nimble file.

Documentation and API

Basic | Iteration | Loading/saving | Graph-building DSL | Neo4j | Paths | Navigating paths

The grim API is quite user-friendly. Check out usage examples in the tests/ folder. The Northwind tutorial shows how to translate a relational SQL model (of the sales in a small company) to a labeled property graph.

The grim documentation is hosted on Github.

Basic

Create a new graph:

import grim

var g = newGraph("my graph")
echo g.name    
# => "my graph"
doAssert g.numberOfNodes == 0 and g.numberOfEdges == 0

Add nodes to the graph:

let 
  c1 = g.addNode("Country", %(name: "Sweden", GDP_per_capita: 53208.0))
  c2 = g.addNode("Country", %(name: "Norway", GDP_per_capita: 74536.0))
  c3 = g.addNode("Country", %(name: "Germany", GDP_per_capita: 52559.0))

  o1 = g.addNode("Organization", %(name: "NATO", founded: 1949, active: true))
  o2 = g.addNode("Organization", %(name: "EU", founded: 1993, active: true))
  o3 = g.addNode("Organization", %(name: "Nordic Council", founded: 1952))

Note:

  • The %operator converts tuples to key/value properties which are stored as heterogeneous data tables on the LPG.
  • addNode and addEdge return the oid for the entity (node or edge). The oid is a unique (string) identifier that can be set explicitly with the oid argument to addNode and addEdge, or is auto-generated if the oid argument is omitted.

Here is how data is extracted from the graph:

doAssert g.node(c1).label == "Country"
doAssert g.node(c1)["name"] == "Sweden"

Note:

  • The oids are used to identify the node.

Add edges to the graph:

let
  e1 = g.addEdge(c1, o2, "MEMBER_OF", %(since: 1995))
  e2 = g.addEdge(c1, o3, "MEMBER_OF", %(since: 1952))
  e3 = g.addEdge(c2, o3, "MEMBER_OF", %(since: 1952, membership: "full"))

Note:

  • The oids are used to identify the nodes and create the new edges (labeled "MEMBER_OF") with key/value properties.
  • Since no oid argument is given to addEdge, the oid is auto-generated (and returned in the e1, e2, and e3 variables).

Iteration

All nodes:

for node in g.nodes:
  echo node.label, ": ", node     

Nodes with a certain label:

for node in g.nodes("Country"):
  echo node.label, ": ", node

Nodes with filter (node is a special variable available when filtering nodes):

for node in g.nodes("Person", node.name == "Alice" and node.age < 22):
  echo node.label, ": ", node

All edges:

for edge in g.edges:
  echo edge.label, ": ", edge

All edges with a certain label:

for edge in g.edges("MEMBER_OF"):
  echo edge.label, ": ", edge

Edges with filter (edge is a special variable available when filtering edges):

for node in g.edges("MEMBER_OF", edge.since > 2010):
  echo node.label, ": ", node

All edges between two nodes:

for edge in g.edgesBetween(c1, o3):
  echo edge.label, ": ", edge

Over neighbor nodes:

for node in g.neighbors(c1):
  echo node.label, ": ", node

Note:

  • The graph is directional so addEdge(A, B, "LABEL") adds an edge with label "LABEL" pointing from A to B.
  • All iterators take a direction argument that specifies whether to include outgoing (A->B), incoming (A<-B) or both (A<->B) edges/neighbors.
  • The direction is specified with the enum values Direction.Out, Direction.In, and Direction.OutIn.

Loading and saving graphs

Graph structures can be loaded and saved in YAML format with the NimYAML library. The procs loadYaml and saveYaml can be used (there are examples in the tests/ folder).

import grim

var g = loadYaml("example.yaml")    # Load graph from YAML file
g.saveYaml("example2.yaml")         # Save a copy of the file

DSL for building graphs

A small DSL is provided to reduce boilerplate when building graphs. A toy example:

import grim/dsl

graph g "Some people":
  nodes:
    Person:
      "a nice guy":
        name: "Santa Claus"
        age: 108
      "a smart girl":
        name: "Jane Austen"
        wealth: 10304.3
  edges:
    "a nice guy" -> "a smart girl":
      DELIVERS:
        category: "writing material"
        value: 204

This will expose the graph in the remainder of the code as the mutable variable g. This example shows how to access graph data:

let
  p1 = g.node("a nice guy")
  p2 = g.node("a smart girl")

doAssert p1.label == "Character" and p2.label == "Character"
doAssert p1["name"].getStr == "Santa Claus"
doAssert p1["age"].getInt == 108
doAssert p2["name"].getStr == "Jane Austen"
doAssert p2["wealth"].getFloat == 10304.3

for e in g.edgesBetween("a nice guy", "a smart girl"):
  doAssert e.label == "DELIVERS"
  doAssert e["category"].getStr == "writing material"
  doAssert e["value"].getInt == 204

Communicating with a Neo4j database

The neo4j submodule is used to communicate with a Neo4j database. Data is transferred via Neo4j's http REST API since the bolt protocol is not supported at present.

import grim/[neo4j, utils]

The utils module provides the getEnvOrRaise proc, which reads an evironment variable or raises a runtime error when the variable is not defined.

let
    username = getEnvOrRaise("NEO4J_USERNAME")
    password = getEnvOrRaise("NEO4J_PASSWORD")
    hostname = getEnvOrRaise("NEO4J_HOSTNAME")

The contents of NEO4J_USERNAME and NEO4J_PASSWORD are self-explanatory, and the NEO4J_HOSTNAME contains the address to the database on the form mydatabase.com (or simply localhost if you are running a local instance).

Start the client and dump the database as a grim LPG:

  var
    client = initNeo4jClient(hostname, auth = (username, password))
    g = client.dump("my graph")
  
  echo g.describe

Paths

A path in the graph is defined by a sequence of continuous edges (members), which link together a number of nodes. The path can be walked (or traversed) by iterating from the beginning of the path. The paths starts at an anchor node.

var p = newPath(myNode)

p is now an empty path starting at myNode. We can now start building the path by repeatedly adding members to the path.

p = p.add(myFirstEdge).add(mySecondEdge).add(myThirdEdge)

The add proc returns the path and can therefore be chained. Note that paths and members are ref objects so to create a copy of a path we need to use the explicit copy function

var p2 = p.copy

Walk the path by iterating

for edge in p:
  echo edge
# myFirstEdge, mySecondEdge, myThirdEdge

Get the first, last, and n:th member by convenience functions:

echo p.first    # myFirstEdge
echo p.last     # myThirdEdge
echo p.nth(1)   # mySecondEdge (zero-indexed)

The first two are O(1) operations, but the last is O(n) and slower on long paths.

The real power of paths emerge when navigating path via patterns. This is an efficient method for simple traversal of similar paths in the graph. The paths can be scanned and modified in a single sweep.

Let's start our example by building a graph:

import grim/dsl

graph g "People":
  nodes:
    Person:
      "alice":
        name: "Alice"
      "bob":
        name: "Bob"
      "charlie":
        name: "Charlie"
  edges:
    "alice" -> "bob":
      KNOWS
    "bob" -> "charlie"
      KNOWS
    "bob" -> "charlie"
      KNOWS  

The graph is Alice -KNOWS-> Bob =KNOWS=> Charlie ,where -> and => denotes single and double edges, respectively. Let's say we want to navigate this graph by the pattern Person-KNOWS-Person. There are three such paths of length 1 (containing one member): one is Alice-Bob (which is a single edge) and the other two are Bob-Charlie (which is a double edge).

We start navigating with the graph's navigate proc:

var pc = g.navigate("Person")    # PathCollection

The navigation starts at an anchor node ("Person"). The result is a PathCollection, which is exactly what it sounds like: A collection of paths matching the given pattern. In other words, the navigate constructs a PathCollection with empty paths anchored at nodes matching the label Person. We can iterate over the matched paths in the PathCollection:

for path in pc:
  echo path, path.anchor["name"]
# ("Empty Path", "Alice"), ("Empty Path", "Bob"), ("Empty Path", "Charlie")
# The matching order is not guaranteed.

Let's now expand the navigation to the pattern matched by taking the step:

pc = pc.step("KNOWS", "Person")

With the help of the anchors, we have now matched all paths fulfilling the pattern Person-KNOWS-Person. Each step taken when navigating returns a modified copy of the PathCollection, encouraging sequential steps to be chained:

var pc = g
  .navigate("Person")
  .step("KNOWS", "Person")
  .step("KNOWS", "Person")

In fact, this pattern is so common that there is a convenience function for repeating a number of identical steps:

var pc = g
  .navigate("Person")
  .steps("KNOWS", "Person", 2)

This navigation will search the graph for motifs of the kind Person-KNOWS->Person-KNOWS->Person, i.e., friends of friends. Note that the match is exhaustive, i.e.,

var pc = g
  .navigate("Person")
  .step("KNOWS", "Person", 3)

will return no matches (i.e., a path collection of empty paths anchored at "Person" labels).

There is one more important path navigation function, namely follow. This will match all patterns of variable length until there are no more matching paths. In our example,

var pc = g
  .navigate("Person")
  .follow("KNOWS", "Person")

will match Person-KNOWS-Person, Person-KNOWS-Person-KNOWS-Person, etc. In our graph, we expect matches:

  • One 1-step path Alice-KNOWS-Bob.
  • Two 1-step paths Bob-KNOWS-Charlie (because of two edges).
  • Two 2-step paths Alice-KNOWS-Bob-KNOWS-Charlie (because of two edges between Bob and Charlie).

After matching patterns, we can simply iterate over the paths in the collection:

for path in pc:
  echo path, path.anchor
# Three 1-step paths, two 2-step paths
# Anchors: Alice (1-step path), Bob (1-step path), Bob (1-step path), Alice (2-step path), Alice (2-step path).

Running the tests

The unit tests can be run with nimble, they test basic usage of the package such as creating and modifying graphs, the DSL, and loading/saving graphs as YAML.

nimble test

Built With

  • Nim - The Nim programming language.
  • NimYAML - A pure YAML implementation for Nim.

Contributing

I'll be happy to accept and review PRs and discuss other things on submitted to Issues.

Authors

  • Erik G. Brandt - Original author - ebran

License

This project is licensed under the MIT License.

Open Source Agenda is not affiliated with "Ebran Grim" Project. README Source: ebran/grim
Stars
52
Open Issues
4
Last Commit
3 years ago
Repository

Open Source Agenda Badge

Open Source Agenda Rating