
Introducing the ArangoDB-DGL Adapter

 2 years ago
source link: https://www.arangodb.com/2022/01/introducing-the-arangodb-dgl-adapter/

ArangoDB DGL Adapter Getting Started Guide


Version: 1.0.2

Objective: Export graphs from ArangoDB, a multi-model graph database, to Deep Graph Library (DGL), a Python package for graph neural networks, and vice versa.

Setup

In [1]:
%%capture
!git clone -b oasis_connector --single-branch https://github.com/arangodb/interactive_tutorials.git
!git clone -b 1.0.2 --single-branch https://github.com/arangoml/dgl-adapter.git
!rsync -av interactive_tutorials/ ./ --exclude=.git
!pip3 install adbdgl_adapter==1.0.2
!pip3 install matplotlib
!pip3 install pyArango
!pip3 install networkx ## For drawing purposes 
In [2]:
import json
import oasis
import matplotlib.pyplot as plt

import dgl
import torch
import networkx as nx

from dgl import remove_self_loop
from dgl.data import KarateClubDataset
from dgl.data import MiniGCDataset

from adbdgl_adapter.adapter import ADBDGL_Adapter
from adbdgl_adapter.controller import ADBDGL_Controller
from adbdgl_adapter.typings import Json, ArangoMetagraph, DGLCanonicalEType, DGLDataDict
DGL backend not selected or invalid.  Assuming PyTorch for now.
Setting the default backend to "pytorch". You can change it in the ~/.dgl/config.json file or export the DGLBACKEND environment variable.  Valid options are: pytorch, mxnet, tensorflow (all lowercase)
Using backend: pytorch

Understanding DGL

(referenced from docs.dgl.ai)

Deep Graph Library (DGL) is a Python package for implementing graph neural network models on top of existing deep learning frameworks (currently PyTorch, MXNet and TensorFlow).

DGL represents a directed graph as a DGLGraph object. You can construct a graph by specifying the number of nodes in the graph as well as the list of source and destination nodes. Nodes in the graph have consecutive IDs starting from 0.

The following code constructs a directed "star" homogeneous graph with 6 nodes and 5 edges.

In [3]:
# A homogeneous graph with 6 nodes, and 5 edges
g = dgl.graph(([0, 0, 0, 0, 0], [1, 2, 3, 4, 5]))
print(g)

# Print the graph's canonical edge types
print("\nCanonical Edge Types: ", g.canonical_etypes)
# [('_N', '_E', '_N')]
# '_N' being the only Node type
# '_E' being the only Edge type
Graph(num_nodes=6, num_edges=5,
      ndata_schemes={}
      edata_schemes={})

Canonical Edge Types:  [('_N', '_E', '_N')]

In DGL, a heterogeneous graph (heterograph for short) is specified with a series of graphs as below, one per relation. Each relation is a string triplet (source node type, edge type, destination node type). Since relations disambiguate the edge types, DGL calls them canonical edge types:

In [4]:
# A heterogeneous graph with 8 nodes, and 7 edges
g = dgl.heterograph({
    ('user', 'follows', 'user'): (torch.tensor([0, 1]), torch.tensor([1, 2])),
    ('user', 'follows', 'game'): (torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])),
    ('user', 'plays', 'game'): (torch.tensor([1, 3]), torch.tensor([2, 3]))
})

print(g)
print("\nCanonical Edge Types: ", g.canonical_etypes)
print("\nNode Types: ", g.ntypes)
print("\nEdge Types: ", g.etypes)
Graph(num_nodes={'game': 4, 'user': 4},
      num_edges={('user', 'follows', 'game'): 3, ('user', 'follows', 'user'): 2, ('user', 'plays', 'game'): 2},
      metagraph=[('user', 'game', 'follows'), ('user', 'game', 'plays'), ('user', 'user', 'follows')])

Canonical Edge Types:  [('user', 'follows', 'game'), ('user', 'follows', 'user'), ('user', 'plays', 'game')]

Node Types:  ['game', 'user']

Edge Types:  ['follows', 'follows', 'plays']
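The node counts in the output above are not passed in explicitly. When no counts are given, DGL takes the largest node ID seen for each node type and adds one. The following plain-Python sketch reproduces that arithmetic without DGL; the helper name infer_num_nodes is hypothetical and is not part of the DGL API.

```python
# Plain-Python sketch (no DGL required). Same relations as the cell above,
# written with lists instead of tensors.
relations = {
    ("user", "follows", "user"): ([0, 1], [1, 2]),
    ("user", "follows", "game"): ([0, 1, 2], [1, 2, 3]),
    ("user", "plays", "game"): ([1, 3], [2, 3]),
}


def infer_num_nodes(relations):
    """Hypothetical helper: per node type, largest ID seen plus one."""
    counts = {}
    for (src_type, _etype, dst_type), (src_ids, dst_ids) in relations.items():
        counts[src_type] = max(counts.get(src_type, 0), max(src_ids) + 1)
        counts[dst_type] = max(counts.get(dst_type, 0), max(dst_ids) + 1)
    return counts


num_nodes = infer_num_nodes(relations)
num_edges = {rel: len(src_ids) for rel, (src_ids, _dst_ids) in relations.items()}

print(num_nodes)                 # {'user': 4, 'game': 4}
print(sum(num_nodes.values()))   # 8
print(sum(num_edges.values()))   # 7
```

Note that the printed metagraph lists each relation as (source type, destination type, edge type), whereas canonical edge types are ordered (source type, edge type, destination type).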

Many graph datasets contain attributes on nodes and edges. Although real-world node and edge attributes can be of arbitrary types, DGLGraph only accepts attributes stored in tensors (with numerical contents). Consequently, an attribute must have the same shape for all nodes or for all edges. In the context of deep learning, these attributes are often called features.

You can assign and retrieve node and edge features via the ndata and edata interfaces.

In [5]:
# A homogeneous graph with 6 nodes, and 5 edges
g = dgl.graph(([0, 0, 0, 0, 0], [1, 2, 3, 4, 5]))

# Assign an integer value for each node.
g.ndata['x'] = torch.tensor([151, 124, 41, 89, 76, 55])
# Assign a 4-dimensional edge feature vector for each edge.
g.edata['a'] = torch.randn(5, 4)

print(g)
print("\nNode Data X attribute: ", g.ndata['x'])
print("\nEdge Data A attribute: ", g.edata['a'])


# NOTE: The following ndata assignment will fail, since it supplies only 5 values for 6 nodes
# g.ndata['bad_attribute'] = torch.tensor([0,10,20,30,40])
Graph(num_nodes=6, num_edges=5,
      ndata_schemes={'x': Scheme(shape=(), dtype=torch.int64)}
      edata_schemes={'a': Scheme(shape=(4,), dtype=torch.float32)})

Node Data X attribute:  tensor([151, 124,  41,  89,  76,  55])

Edge Data A attribute:  tensor([[-0.9712,  0.3131, -1.7787, -0.4953],
        [ 1.5366, -0.8591, -1.4719,  0.5857],
        [-0.5803,  0.6757,  0.9276, -0.9756],
        [ 0.4396,  1.0612,  0.0943,  0.6856],
        [-0.8685, -1.3693, -0.1184, -1.0903]])
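The commented-out assignment above fails because a feature needs exactly one row per node (or edge), and every row must have the same shape. The following plain-Python sketch mimics that rule on nested lists; check_feature is a hypothetical helper for illustration and does not reproduce DGL's actual error messages.

```python
def check_feature(num_items, rows):
    """Hypothetical check: one row per node/edge, all rows the same length."""
    if len(rows) != num_items:
        raise ValueError(f"expected {num_items} rows, got {len(rows)}")
    # Scalars count as length 0; lists/tuples count by their length.
    row_lengths = {len(r) if isinstance(r, (list, tuple)) else 0 for r in rows}
    if len(row_lengths) > 1:
        raise ValueError("all rows must have the same shape")


# 6 nodes, 6 scalar values: accepted
check_feature(6, [151, 124, 41, 89, 76, 55])

# 6 nodes, 5 values: rejected, like the 'bad_attribute' example
try:
    check_feature(6, [0, 10, 20, 30, 40])
    message = ""
except ValueError as err:
    message = str(err)

print(message)  # expected 6 rows, got 5
```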

When multiple node/edge types are introduced, users need to specify the particular node/edge type when invoking a DGLGraph API for type-specific information. In addition, nodes/edges of different types have separate IDs.

In [6]:
g = dgl.heterograph({
    ('user', 'follows', 'user'): (torch.tensor([0, 1]), torch.tensor([1, 2])),
    ('user', 'follows', 'game'): (torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])),
    ('user', 'plays', 'game'): (torch.tensor([1, 3]), torch.tensor([2, 3]))
})

# Get the number of all nodes in the graph
print("All nodes: ", g.num_nodes())

# Get the number of user nodes
print("User nodes: ", g.num_nodes('user'))

# Nodes of different types have separate IDs,
# hence not well-defined without a type specified
# print(g.nodes())
#DGLError: Node type name must be specified if there are more than one node types.

print(g.nodes('user'))
All nodes:  8
User nodes:  4
tensor([0, 1, 2, 3])

To set/get features for a specific node/edge type, DGL provides two syntaxes: g.nodes['node_type'].data['feat_name'] and g.edges['edge_type'].data['feat_name'].

Note: If the graph only has one node/edge type, there is no need to specify the node/edge type.

In [7]:
g = dgl.heterograph({
    ('user', 'follows', 'user'): (torch.tensor([0, 1]), torch.tensor([1, 2])),
    ('user', 'follows', 'game'): (torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])),
    ('user', 'plays', 'game'): (torch.tensor([1, 3]), torch.tensor([2, 3]))
})

g.nodes['user'].data['age'] = torch.tensor([21, 16, 38, 64])
# An alternative (yet equivalent) syntax:
# g.ndata['age'] = {'user': torch.tensor([21, 16, 38, 64])}

print(g.ndata)
defaultdict(<class 'dict'>, {'age': {'user': tensor([21, 16, 38, 64])}})

For more info, visit https://docs.dgl.ai/en/0.6.x/.

Create a Temporary ArangoDB Instance

In [8]:
# Request temporary instance from the managed ArangoDB Cloud Oasis.
con = oasis.getTempCredentials()

# Connect to the db via the python-arango driver
db = oasis.connect_python_arango(con)

print('\n--------------------')
print("https://{}:{}".format(con["hostname"], con["port"]))
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
Requesting new temp credentials.
Temp database ready to use.

--------------------
https://tutorials.arangodb.cloud:8529
Username: TUT487i8kal98gb73c2iklds
Password: TUTn5t85w8t50kcupmo2mmyb
Database: TUTn187e39v9qho3768ilyk4
--------------------

Feel free to use the above URL to check out the UI!

Data Import

For demo purposes, we will be using the ArangoDB Fraud Detection example graph.

In [9]:
%%capture
!chmod -R 755 ./tools
!./tools/arangorestore -c none --server.endpoint http+ssl://{con["hostname"]}:{con["port"]} --server.username {con["username"]} --server.database {con["dbName"]} --server.password {con["password"]} --replication-factor 3  --input-directory "dgl-adapter/examples/data/fraud_dump" --include-system-collections true

Instantiate the Adapter

Connect the ArangoDB-DGL Adapter to our temporary ArangoDB cluster:

In [10]:
adbdgl_adapter = ADBDGL_Adapter(con)
Connecting to https://tutorials.arangodb.cloud:8529

ArangoDB to DGL

Via ArangoDB Graph

Data source

  • ArangoDB Fraud-Detection Graph

Package methods used

  • ADBDGL_Adapter.arangodb_graph_to_dgl()

Important notes

  • The name parameter in this case must point to an existing ArangoDB graph in your ArangoDB instance.
In [11]:
# Define graph name
graph_name = "fraud-detection"

# Create DGL graph from ArangoDB graph
dgl_g = adbdgl_adapter.arangodb_graph_to_dgl(graph_name)

# You can also provide valid Python-Arango AQL query options to the command above, like such:
# dgl_g = adbdgl_adapter.arangodb_graph_to_dgl(graph_name, ttl=1000, stream=True)
# See more here: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute

# Show graph data
print('\n--------------------')
print(dgl_g)
print(dgl_g.ntypes)
print(dgl_g.etypes)
DGL: fraud-detection created

--------------------
Graph(num_nodes={'account': 54, 'customer': 17},
      num_edges={('account', 'accountHolder', 'customer'): 54, ('account', 'transaction', 'account'): 62},
      metagraph=[('account', 'customer', 'accountHolder'), ('account', 'account', 'transaction')])
['account', 'customer']
['accountHolder', 'transaction']

Via ArangoDB Collections

Data source

  • ArangoDB Fraud-Detection Collections

Package methods used

  • ADBDGL_Adapter.arangodb_collections_to_dgl()

Important notes

  • The name parameter in this case is simply for naming your DGL graph.
  • The vertex_collections & edge_collections parameters must point to existing ArangoDB collections within your ArangoDB instance.
In [12]:
# Define collections
vertex_collections = {"account", "Class", "customer"}
edge_collections = {"accountHolder", "Relationship", "transaction"}

# Create DGL from ArangoDB collections
dgl_g = adbdgl_adapter.arangodb_collections_to_dgl("fraud-detection", vertex_collections, edge_collections)

# You can also provide valid Python-Arango AQL query options to the command above, like such:
# dgl_g = adbdgl_adapter.arangodb_collections_to_dgl("fraud-detection", vertex_collections, edge_collections, ttl=1000, stream=True)
# See more here: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute

# Show graph data
print('\n--------------------')
print(dgl_g)
print(dgl_g.ntypes)
print(dgl_g.etypes)
DGL: fraud-detection created

--------------------
Graph(num_nodes={'Class': 4, 'account': 54, 'customer': 17},
      num_edges={('Class', 'Relationship', 'Class'): 4, ('account', 'accountHolder', 'customer'): 54, ('account', 'transaction', 'account'): 62},
      metagraph=[('Class', 'Class', 'Relationship'), ('account', 'customer', 'accountHolder'), ('account', 'account', 'transaction')])
['Class', 'account', 'customer']
['Relationship', 'accountHolder', 'transaction']

Via ArangoDB Metagraph

Data source

  • ArangoDB Fraud-Detection Collections

Package methods used

  • ADBDGL_Adapter.arangodb_to_dgl()

Important notes

  • The name parameter in this case is simply for naming your DGL graph.
  • The metagraph parameter should contain collections & associated document attribute names that exist within your ArangoDB instance.
In [13]:
# Define Metagraph
fraud_detection_metagraph = {
    "vertexCollections": {
        "account": {"rank", "Balance", "customer_id"},
        "Class": {"concrete"},
        "customer": {"rank"},
    },
    "edgeCollections": {
        "accountHolder": {},
        "Relationship": {},
        "transaction": {"receiver_bank_id", "sender_bank_id", "transaction_amt"},
    },
}

# Create DGL Graph from attributes
dgl_g = adbdgl_adapter.arangodb_to_dgl('FraudDetection',  fraud_detection_metagraph)

# You can also provide valid Python-Arango AQL query options to the command above, like such:
# dgl_g = adbdgl_adapter.arangodb_to_dgl('FraudDetection', fraud_detection_metagraph, ttl=1000, stream=True)
# See more here: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute

# Show graph data
print('\n--------------')
print(dgl_g)
print('\n--------------')
print(dgl_g.ndata)
print('--------------\n')
print(dgl_g.edata)
DGL: FraudDetection created

--------------
Graph(num_nodes={'Class': 4, 'account': 54, 'customer': 17},
      num_edges={('Class', 'Relationship', 'Class'): 4, ('account', 'accountHolder', 'customer'): 54, ('account', 'transaction', 'account'): 62},
      metagraph=[('Class', 'Class', 'Relationship'), ('account', 'customer', 'accountHolder'), ('account', 'account', 'transaction')])

--------------
defaultdict(<class 'dict'>, {'concrete': {'Class': tensor([True, True, True, True])}, 'Balance': {'account': tensor([5331, 7630, 1433, 2201, 4837, 5817, 1689, 1042, 4104,   10, 2338,   10,
        3779,    0,  529,    0, 1992, 2912, 6367, 1819,    0,  221, 5062, 2372,
         841, 5393, 1138, 8414, 4064, 5686, 6294, 6540, 7358, 3452,    0, 3993,
          10,    0,  471, 8148, 5832, 1758, 1747, 1679, 6789, 1599, 8320,    0,
          10, 8626, 7199, 8644, 3879,   10])}, 'customer_id': {'account': tensor([10000009, 10000004, 10000004, 10000010, 10000002, 10000011, 10000015,
        10000006, 10000010,    10810, 10000002, 10000014, 10000008,        0,
        10000002,        0, 10000008, 10000006, 10000012, 10000015, 10000001,
        10000010, 10000015, 10000005, 10000009, 10000008, 10000011, 10000014,
        10000010, 10000006, 10000002, 10000007, 10000006, 10000005,        0,
        10000010,    10810,        0, 10000009, 10000006, 10000002, 10000005,
        10000009, 10000012, 10000007, 10000002, 10000014,        0,    10810,
        10000016, 10000006, 10000016, 10000013,    10810])}, 'rank': {'account': tensor([0.0021, 0.0031, 0.0052, 0.0021, 0.0046, 0.0037, 0.0032, 0.0042, 0.0021,
        0.0021, 0.0030, 0.0037, 0.0040, 0.0037, 0.0021, 0.0046, 0.0040, 0.0030,
        0.0026, 0.0032, 0.0021, 0.0034, 0.0032, 0.0021, 0.0021, 0.0035, 0.0026,
        0.0026, 0.0046, 0.0021, 0.0021, 0.0035, 0.0036, 0.0036, 0.0038, 0.0055,
        0.0021, 0.0041, 0.0044, 0.0021, 0.0030, 0.0035, 0.0033, 0.0026, 0.0071,
        0.0036, 0.0032, 0.0059, 0.0021, 0.0090, 0.0057, 0.0032, 0.0026, 0.0021]), 'customer': tensor([0.0135, 0.0050, 0.0062, 0.0066, 0.0096, 0.0088, 0.0089, 0.0047, 0.0066,
        0.0045, 0.0062, 0.0103, 0.0081, 0.0039, 0.0054, 0.0044, 0.0093])}})
--------------

defaultdict(<class 'dict'>, {'sender_bank_id': {('account', 'transaction', 'account'): tensor([10000000003, 10000000002, 10000000001, 10000000001, 10000000002,
        10000000003, 10000000003, 10000000002, 10000000002, 10000000003,
        10000000001, 10000000001,           0, 10000000003, 10000000003,
                  0, 10000000002,           0, 10000000001, 10000000003,
        10000000001, 10000000003, 10000000002,           0, 10000000003,
        10000000003, 10000000003, 10000000003, 10000000001, 10000000001,
        10000000002, 10000000001, 10000000003, 10000000003, 10000000001,
        10000000001,           0, 10000000003, 10000000002, 10000000001,
        10000000002, 10000000003, 10000000003, 10000000003, 10000000002,
        10000000003, 10000000002, 10000000003, 10000000002, 10000000001,
        10000000001,           0, 10000000003, 10000000003,           0,
        10000000003, 10000000003, 10000000001, 10000000001, 10000000003,
        10000000003, 10000000002])}, 'receiver_bank_id': {('account', 'transaction', 'account'): tensor([10000000003, 10000000003, 10000000001, 10000000002, 10000000002,
        10000000003, 10000000001, 10000000003, 10000000001, 10000000003,
        10000000002, 10000000003,           0, 10000000003, 10000000003,
                  0, 10000000001,           0, 10000000002, 10000000003,
        10000000003, 10000000003, 10000000001,           0, 10000000003,
        10000000002, 10000000003, 10000000003, 10000000001, 10000000001,
        10000000003, 10000000003, 10000000003, 10000000003, 10000000001,
        10000000002,           0, 10000000001, 10000000001, 10000000002,
        10000000001, 10000000003, 10000000003, 10000000003, 10000000001,
        10000000003, 10000000002, 10000000003, 10000000002, 10000000001,
        10000000003,           0, 10000000003, 10000000003,           0,
        10000000003, 10000000002, 10000000002, 10000000001, 10000000003,
        10000000003, 10000000003])}, 'transaction_amt': {('account', 'transaction', 'account'): tensor([9000,  299,  498,  954,  756,  627,  142,  946,  920, 9000,  421,  343,
        9000,  457, 9000, 9000,   53, 9000,  284,  120,  441, 9000,  364,  901,
        9000,  279, 9000, 9000,  273,  127,  952,  354,  795, 9000,  835,  761,
        9000,  478,  172,  804,  665,  995, 9000, 9000,  670, 9000,  340, 9000,
         747,  347,   52,  911,  762, 9000,    0,  790,  619,  491,  954, 9000,
        9000,  843])}})
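The metagraph passed in cell In [13] is an ordinary nested Python structure: collection names map to the attribute names to transfer. Before running a conversion, it can help to list exactly what is being requested. The sketch below does that in plain Python; requested_attributes is a hypothetical helper and is not part of the adapter package.

```python
fraud_detection_metagraph = {
    "vertexCollections": {
        "account": {"rank", "Balance", "customer_id"},
        "Class": {"concrete"},
        "customer": {"rank"},
    },
    "edgeCollections": {
        "accountHolder": {},
        "Relationship": {},
        "transaction": {"receiver_bank_id", "sender_bank_id", "transaction_amt"},
    },
}


def requested_attributes(metagraph):
    """Hypothetical helper: list (kind, collection, attribute) triples."""
    rows = []
    for kind in ("vertexCollections", "edgeCollections"):
        for collection, attributes in metagraph[kind].items():
            # sorted() works for both the sets and the empty {} used above
            for attribute in sorted(attributes):
                rows.append((kind, collection, attribute))
    return rows


rows = requested_attributes(fraud_detection_metagraph)
for row in rows:
    print(row)

# Collections listed without attributes contribute structure only
structure_only = [
    name
    for kind in ("vertexCollections", "edgeCollections")
    for name, attributes in fraud_detection_metagraph[kind].items()
    if not attributes
]
print(structure_only)  # ['accountHolder', 'Relationship']
```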

Via ArangoDB Metagraph with a custom controller

Data source

  • ArangoDB Fraud-Detection Collections

Package methods used

  • ADBDGL_Adapter.arangodb_to_dgl()
  • ADBDGL_Controller._adb_attribute_to_dgl_feature()

Important notes

  • The name parameter in this case is simply for naming your DGL graph.
  • The metagraph parameter should contain collections & associated document attribute names that exist within your ArangoDB instance.
  • We are creating a custom ADBDGL_Controller to specify how to convert our ArangoDB vertex/edge attributes into DGL node/edge features. The default ADBDGL_Controller lives in the adbdgl_adapter.controller module.
In [14]:
# Define Metagraph
fraud_detection_metagraph = {
    "vertexCollections": {
        "account": {"rank"},
        "Class": {"concrete", "name"},
        "customer": {"Sex", "Ssn", "rank"},
    },
    "edgeCollections": {
        "accountHolder": {},
        "Relationship": {},
        "transaction": {"receiver_bank_id", "sender_bank_id", "transaction_amt", "transaction_date", "trans_time"},
    },
}

# When converting to DGL via an ArangoDB Metagraph that contains non-numerical values, a user-defined 
# Controller class is required to specify how ArangoDB attributes should be converted to DGL features.
class FraudDetection_ADBDGL_Controller(ADBDGL_Controller):
    """ArangoDB-DGL controller.

    Responsible for controlling how ArangoDB attributes
    are converted into DGL features, and vice-versa.

    You can derive your own custom ADBDGL_Controller if you want to maintain
    consistency between your ArangoDB attributes & your DGL features.
    """

    def _adb_attribute_to_dgl_feature(self, key: str, col: str, val):
        """
        Given an ArangoDB attribute key, its assigned value (for an arbitrary document),
        and the collection it belongs to, convert it to a valid
        DGL feature: https://docs.dgl.ai/en/0.6.x/guide/graph-feature.html.

        NOTE: You must override this function if you want to transfer non-numerical
        ArangoDB attributes to DGL (DGL only accepts 'attributes' (a.k.a features)
        of numerical types). Read more about DGL features here:
        https://docs.dgl.ai/en/0.6.x/new-tutorial/2_dglgraph.html#assigning-node-and-edge-features-to-graph.

        :param key: The ArangoDB attribute key name
        :type key: str
        :param col: The ArangoDB collection of the ArangoDB document.
        :type col: str
        :param val: The assigned attribute value of the ArangoDB document.
        :type val: Any
        :return: The attribute's representation as a DGL Feature
        :rtype: Any
        """
        try:
            if col == "transaction":
                if key == "transaction_date":
                    return int(str(val).replace("-", ""))

                if key == "trans_time":
                    return int(str(val).replace(":", ""))

            if col == "customer":
                if key == "Sex":
                    return 0 if val == "M" else 1

                if key == "Ssn":
                    return int(str(val).replace("-", ""))

            if col == "Class":
                if key == "name":
                    if val == "Bank":
                        return 0
                    elif val == "Branch":
                        return 1
                    elif val == "Account":
                        return 2
                    elif val == "Customer":
                        return 3
                    else:
                        return -1
        except (ValueError, TypeError, SyntaxError):
            return 0

        return super()._adb_attribute_to_dgl_feature(key, col, val)

fraud_adbdgl_adapter = ADBDGL_Adapter(con, FraudDetection_ADBDGL_Controller())

# Create DGL Graph from attributes
dgl_g = fraud_adbdgl_adapter.arangodb_to_dgl('FraudDetection',  fraud_detection_metagraph)

# You can also provide valid Python-Arango AQL query options to the command above, like such:
# dgl_g = fraud_adbdgl_adapter.arangodb_to_dgl('FraudDetection', fraud_detection_metagraph, ttl=1000, stream=True)
# See more here: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute

# Show graph data
print('\n--------------')
print(dgl_g)
print('\n--------------')
print(dgl_g.ndata)
print('--------------\n')
print(dgl_g.edata)
Connecting to https://tutorials.arangodb.cloud:8529
DGL: FraudDetection created

--------------
Graph(num_nodes={'Class': 4, 'account': 54, 'customer': 17},
      num_edges={('Class', 'Relationship', 'Class'): 4, ('account', 'accountHolder', 'customer'): 54, ('account', 'transaction', 'account'): 62},
      metagraph=[('Class', 'Class', 'Relationship'), ('account', 'customer', 'accountHolder'), ('account', 'account', 'transaction')])

--------------
defaultdict(<class 'dict'>, {'concrete': {'Class': tensor([True, True, True, True])}, 'name': {'Class': tensor([0, 1, 2, 3])}, 'rank': {'account': tensor([0.0021, 0.0031, 0.0052, 0.0021, 0.0046, 0.0037, 0.0032, 0.0042, 0.0021,
        0.0021, 0.0030, 0.0037, 0.0040, 0.0037, 0.0021, 0.0046, 0.0040, 0.0030,
        0.0026, 0.0032, 0.0021, 0.0034, 0.0032, 0.0021, 0.0021, 0.0035, 0.0026,
        0.0026, 0.0046, 0.0021, 0.0021, 0.0035, 0.0036, 0.0036, 0.0038, 0.0055,
        0.0021, 0.0041, 0.0044, 0.0021, 0.0030, 0.0035, 0.0033, 0.0026, 0.0071,
        0.0036, 0.0032, 0.0059, 0.0021, 0.0090, 0.0057, 0.0032, 0.0026, 0.0021]), 'customer': tensor([0.0135, 0.0050, 0.0062, 0.0066, 0.0096, 0.0088, 0.0089, 0.0047, 0.0066,
        0.0045, 0.0062, 0.0103, 0.0081, 0.0039, 0.0054, 0.0044, 0.0093])}, 'Ssn': {'customer': tensor([123456786, 123456780, 123456780, 123456787, 123456780, 123456789,
        123456780, 123456785, 123456783, 123456784, 123456780, 123456788,
        123456782, 123456781, 123456780, 123456780, 111223333])}, 'Sex': {'customer': tensor([1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1])}})
--------------

defaultdict(<class 'dict'>, {'sender_bank_id': {('account', 'transaction', 'account'): tensor([10000000003, 10000000002, 10000000001, 10000000001, 10000000002,
        10000000003, 10000000003, 10000000002, 10000000002, 10000000003,
        10000000001, 10000000001,           0, 10000000003, 10000000003,
                  0, 10000000002,           0, 10000000001, 10000000003,
        10000000001, 10000000003, 10000000002,           0, 10000000003,
        10000000003, 10000000003, 10000000003, 10000000001, 10000000001,
        10000000002, 10000000001, 10000000003, 10000000003, 10000000001,
        10000000001,           0, 10000000003, 10000000002, 10000000001,
        10000000002, 10000000003, 10000000003, 10000000003, 10000000002,
        10000000003, 10000000002, 10000000003, 10000000002, 10000000001,
        10000000001,           0, 10000000003, 10000000003,           0,
        10000000003, 10000000003, 10000000001, 10000000001, 10000000003,
        10000000003, 10000000002])}, 'receiver_bank_id': {('account', 'transaction', 'account'): tensor([10000000003, 10000000003, 10000000001, 10000000002, 10000000002,
        10000000003, 10000000001, 10000000003, 10000000001, 10000000003,
        10000000002, 10000000003,           0, 10000000003, 10000000003,
                  0, 10000000001,           0, 10000000002, 10000000003,
        10000000003, 10000000003, 10000000001,           0, 10000000003,
        10000000002, 10000000003, 10000000003, 10000000001, 10000000001,
        10000000003, 10000000003, 10000000003, 10000000003, 10000000001,
        10000000002,           0, 10000000001, 10000000001, 10000000002,
        10000000001, 10000000003, 10000000003, 10000000003, 10000000001,
        10000000003, 10000000002, 10000000003, 10000000002, 10000000001,
        10000000003,           0, 10000000003, 10000000003,           0,
        10000000003, 10000000002, 10000000002, 10000000001, 10000000003,
        10000000003, 10000000003])}, 'transaction_date': {('account', 'transaction', 'account'): tensor([  201966,   201721,  2017528,  2018924,  2017516,  2018128,  2019213,
          201847,  2017914,   201966,  2017810, 20181020,        0,  2017724,
          201966,        0,  2019311,        0,  2018211,  2018125,   201932,
          201966,   201795,        0,   201966,  2017111,   201966,   201966,
         2019822,  2017317,  2019124,  2017121,  2017110,   201966,  2017717,
        20181012,        0, 20181023,  2019724,  2019611,  2019928,  2019117,
          201966,   201966,  2017328,   201966,  2019316,   201966,  2017914,
         2017521,   201713,        0,  2018124,   201966,        0,   201784,
          201713, 20171212,  2019413,   201966,   201966,   201887])}, 'trans_time': {('account', 'transaction', 'account'): tensor([1136, 1516, 1340, 1030, 1552, 1116, 1450,  924, 1046, 1426, 1247, 1459,
           0, 1459, 1258,    0, 1758,    0, 1230, 1210, 1252, 1039, 1741,    0,
        1420, 1713, 1710, 1028, 1636, 1054, 1658, 1332, 1316,  955, 1629, 1642,
           0, 1710,  932, 1652, 1018, 1527, 1555, 1640, 1158, 1035, 1015, 1133,
        1320, 1514, 1213,    0, 1133, 1340,    0, 1026, 1312, 1027, 1745, 1342,
        1520, 1141])}, 'transaction_amt': {('account', 'transaction', 'account'): tensor([9000,  299,  498,  954,  756,  627,  142,  946,  920, 9000,  421,  343,
        9000,  457, 9000, 9000,   53, 9000,  284,  120,  441, 9000,  364,  901,
        9000,  279, 9000, 9000,  273,  127,  952,  354,  795, 9000,  835,  761,
        9000,  478,  172,  804,  665,  995, 9000, 9000,  670, 9000,  340, 9000,
         747,  347,   52,  911,  762, 9000,    0,  790,  619,  491,  954, 9000,
        9000,  843])}})
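One thing to watch in the transaction_date output above: values such as 201966 suggest the dates are stored without zero padding (for example "2019-6-6"), so simply removing the hyphens can map two different dates to the same integer. The sketch below compares that approach with a zero-padded YYYYMMDD encoding. The "YYYY-M-D" input format is an assumption inferred from the output, and padded_date_to_int is a hypothetical alternative, not what the notebook's controller does.

```python
def naive_date_to_int(val):
    # Same approach as the controller above: drop the hyphens.
    return int(str(val).replace("-", ""))


def padded_date_to_int(val):
    # Hypothetical alternative: encode as YYYYMMDD with zero-padded month/day.
    # Assumes a "YYYY-M-D" string; raises ValueError on anything else.
    year, month, day = (int(part) for part in str(val).split("-"))
    return year * 10000 + month * 100 + day


# Two different dates collide under the naive encoding...
print(naive_date_to_int("2017-1-11"), naive_date_to_int("2017-11-1"))
# 2017111 2017111

# ...but stay distinct (and sortable) when zero-padded.
print(padded_date_to_int("2017-1-11"), padded_date_to_int("2017-11-1"))
# 20170111 20171101
```

If a function like this were used inside _adb_attribute_to_dgl_feature, the existing except clause would still turn malformed values into 0, since a failed unpacking or int() call raises ValueError.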

DGL to ArangoDB

Example 1: DGL Karate Graph

Data source

  • DGL KarateClubDataset

Package methods used

  • ADBDGL_Adapter.dgl_to_arangodb()

Important notes

  • The name parameter in this case is simply for naming your ArangoDB graph.
In [15]:
# Load the dgl graph & draw
dgl_karate_graph = KarateClubDataset()[0]
nx.draw(dgl_karate_graph.to_networkx(), with_labels=True)

# Create the ArangoDB graph
name = "Karate"
db.delete_graph(name, drop_collections=True, ignore_missing=True)
adb_karate_graph = adbdgl_adapter.dgl_to_arangodb(name, dgl_karate_graph)

print('\n--------------------')
print("https://{}:{}".format(con["hostname"], con["port"]))
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print(f"Inspect the graph here: https://tutorials.arangodb.cloud:8529/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}\n")
print(f"View the original graph below:")
ArangoDB: Karate created

--------------------
https://tutorials.arangodb.cloud:8529
Username: TUT487i8kal98gb73c2iklds
Password: TUTn5t85w8t50kcupmo2mmyb
Database: TUTn187e39v9qho3768ilyk4
--------------------

Inspect the graph here: https://tutorials.arangodb.cloud:8529/_db/TUTn187e39v9qho3768ilyk4/_admin/aardvark/index.html#graph/Karate

View the original graph below:
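The inspection link printed by the cell above hardcodes the tutorial hostname, while the other print statements read it from con. A small helper can build the link from the connection dictionary instead, which keeps it correct if you point the notebook at a different instance. This is a sketch: graph_url is a hypothetical name, the URL pattern is copied from the output above, and example_con holds only the keys the helper needs.

```python
def graph_url(con, graph_name):
    """Hypothetical helper: web UI link for a named graph, built from `con`."""
    return "https://{}:{}/_db/{}/_admin/aardvark/index.html#graph/{}".format(
        con["hostname"], con["port"], con["dbName"], graph_name
    )


# Values copied from the sample output above
example_con = {
    "hostname": "tutorials.arangodb.cloud",
    "port": 8529,
    "dbName": "TUTn187e39v9qho3768ilyk4",
}

print(graph_url(example_con, "Karate"))
```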

Example 2: DGL MiniGCDataset Graphs

Data source

  • DGL MiniGCDataset

Package methods used

  • ADBDGL_Adapter.dgl_to_arangodb()

Important notes

  • The name parameters in this case are simply for naming your ArangoDB graph.
In [16]:
# Load the dgl graphs & draw
dgl_lollipop_graph = remove_self_loop(MiniGCDataset(8, 7, 8)[3][0])
plt.figure(1)
nx.draw(dgl_lollipop_graph.to_networkx(), with_labels=True)

dgl_hypercube_graph = remove_self_loop(MiniGCDataset(8, 8, 9)[4][0])
plt.figure(2)
nx.draw(dgl_hypercube_graph.to_networkx(), with_labels=True)

dgl_clique_graph = remove_self_loop(MiniGCDataset(8, 6, 7)[6][0])
plt.figure(3)
nx.draw(dgl_clique_graph.to_networkx(), with_labels=True)

# Create the ArangoDB graphs
lollipop = "Lollipop"
hypercube = "Hypercube"
clique = "Clique"

db.delete_graph(lollipop, drop_collections=True, ignore_missing=True)
db.delete_graph(hypercube, drop_collections=True, ignore_missing=True)
db.delete_graph(clique, drop_collections=True, ignore_missing=True)

adb_lollipop_graph = adbdgl_adapter.dgl_to_arangodb(lollipop, dgl_lollipop_graph)
adb_hypercube_graph = adbdgl_adapter.dgl_to_arangodb(hypercube, dgl_hypercube_graph)
adb_clique_graph = adbdgl_adapter.dgl_to_arangodb(clique, dgl_clique_graph)

print('\n--------------------')
print("https://{}:{}".format(con["hostname"], con["port"]))
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print("\nInspect the graphs here:\n")
print(f"1) https://tutorials.arangodb.cloud:8529/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{lollipop}")
print(f"2) https://tutorials.arangodb.cloud:8529/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{hypercube}")
print(f"3) https://tutorials.arangodb.cloud:8529/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{clique}\n")
print(f"\nView the original graphs below:")
ArangoDB: Lollipop created
ArangoDB: Hypercube created
ArangoDB: Clique created

--------------------
https://tutorials.arangodb.cloud:8529
Username: TUT487i8kal98gb73c2iklds
Password: TUTn5t85w8t50kcupmo2mmyb
Database: TUTn187e39v9qho3768ilyk4
--------------------


Inspect the graphs here:

1) https://tutorials.arangodb.cloud:8529/_db/TUTn187e39v9qho3768ilyk4/_admin/aardvark/index.html#graph/Lollipop
2) https://tutorials.arangodb.cloud:8529/_db/TUTn187e39v9qho3768ilyk4/_admin/aardvark/index.html#graph/Hypercube
3) https://tutorials.arangodb.cloud:8529/_db/TUTn187e39v9qho3768ilyk4/_admin/aardvark/index.html#graph/Clique


View the original graphs below:

Example 3: DGL MiniGCDataset Graphs with a custom controller

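Data source

  • DGL MiniGCDataset

Package methods used

  • ADBDGL_Adapter.dgl_to_arangodb()
  • ADBDGL_Controller._dgl_feature_to_adb_attribute()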

Important notes

  • The name parameters in this case are simply for naming your ArangoDB graph.
  • We are creating a custom ADBDGL_Controller to specify how to convert our DGL node/edge features into ArangoDB vertex/edge attributes. The default ADBDGL_Controller lives in the adbdgl_adapter.controller module.
In [17]:
from torch import Tensor

# Load the dgl graphs
dgl_lollipop_graph = remove_self_loop(MiniGCDataset(8, 7, 8)[3][0])
dgl_hypercube_graph = remove_self_loop(MiniGCDataset(8, 8, 9)[4][0])
dgl_clique_graph = remove_self_loop(MiniGCDataset(8, 6, 7)[6][0])

# Add DGL Node & Edge Features to each graph
dgl_lollipop_graph.ndata["random_ndata"] = torch.tensor(
    [[i, i, i] for i in range(0, dgl_lollipop_graph.num_nodes())]
)
dgl_lollipop_graph.edata["random_edata"] = torch.rand(dgl_lollipop_graph.num_edges())

dgl_hypercube_graph.ndata["random_ndata"] = torch.rand(dgl_hypercube_graph.num_nodes())
dgl_hypercube_graph.edata["random_edata"] = torch.tensor(
    [[[i], [i], [i]] for i in range(0, dgl_hypercube_graph.num_edges())]
)

dgl_clique_graph.ndata['clique_ndata'] = torch.tensor([1,2,3,4,5,6])
dgl_clique_graph.edata['clique_edata'] = torch.tensor(
    [1 if i % 2 == 0 else 0 for i in range(0, dgl_clique_graph.num_edges())]
)


# When converting to ArangoDB from DGL, a user-defined Controller class
# is required to specify how DGL features (aka attributes) should be converted 
# into ArangoDB attributes. NOTE: A custom Controller is NOT needed if you want to
# keep the numerical-based values of your DGL features.
class Clique_ADBDGL_Controller(ADBDGL_Controller):
    """ArangoDB-DGL controller.

    Responsible for controlling how ArangoDB attributes
    are converted into DGL features, and vice-versa.

    You can derive your own custom ADBDGL_Controller if you want to maintain
    consistency between your ArangoDB attributes & your DGL features.
    """

    def _dgl_feature_to_adb_attribute(self, key: str, col: str, val: Tensor):
        """
        Given a DGL feature key, its assigned value (for an arbitrary node or edge),
        and the collection it belongs to, convert it to a valid ArangoDB attribute
        (e.g string, list, number, ...).

        NOTE: No action is needed here if you want to keep the numerical-based values
        of your DGL features.

        :param key: The DGL attribute key name
        :type key: str
        :param col: The ArangoDB collection of the (soon-to-be) ArangoDB document.
        :type col: str
        :param val: The assigned attribute value of the DGL node.
        :type val: Tensor
        :return: The feature's representation as an ArangoDB Attribute
        :rtype: Any
        """
        if key == "clique_ndata":
            if val == 1:
                return "one is fun"
            elif val == 2:
                return "two is blue"
            elif val == 3:
                return "three is free"
            elif val == 4:
                return "four is more"
            else:  # No special string for values 5 & 6
                return f"ERROR! Unrecognized value, got {val}"

        if key == "clique_edata":
            return bool(val)

        return super()._dgl_feature_to_adb_attribute(key, col, val)

# Re-instantiate a new adapter specifically for the Clique Graph Conversion
clique_adbgl_adapter = ADBDGL_Adapter(con, Clique_ADBDGL_Controller())

# Create the ArangoDB graphs
lollipop = "Lollipop_With_Attributes"
hypercube = "Hypercube_With_Attributes"
clique = "Clique_With_Attributes"

db.delete_graph(lollipop, drop_collections=True, ignore_missing=True)
db.delete_graph(hypercube, drop_collections=True, ignore_missing=True)
db.delete_graph(clique, drop_collections=True, ignore_missing=True)

adb_lollipop_graph = adbdgl_adapter.dgl_to_arangodb(lollipop, dgl_lollipop_graph)
adb_hypercube_graph = adbdgl_adapter.dgl_to_arangodb(hypercube, dgl_hypercube_graph)
adb_clique_graph = clique_adbgl_adapter.dgl_to_arangodb(clique, dgl_clique_graph) # Notice the new adapter here!

print('\n--------------------')
print("https://{}:{}".format(con["hostname"], con["port"]))
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print("\nInspect the graphs here:\n")
print(f"1) https://tutorials.arangodb.cloud:8529/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{lollipop}")
print(f"2) https://tutorials.arangodb.cloud:8529/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{hypercube}")
print(f"3) https://tutorials.arangodb.cloud:8529/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{clique}\n")
Connecting to https://tutorials.arangodb.cloud:8529
ArangoDB: Lollipop_With_Attributes created
ArangoDB: Hypercube_With_Attributes created
ArangoDB: Clique_With_Attributes created

--------------------
https://tutorials.arangodb.cloud:8529
Username: TUT487i8kal98gb73c2iklds
Password: TUTn5t85w8t50kcupmo2mmyb
Database: TUTn187e39v9qho3768ilyk4
--------------------


Inspect the graphs here:

1) https://tutorials.arangodb.cloud:8529/_db/TUTn187e39v9qho3768ilyk4/_admin/aardvark/index.html#graph/Lollipop_With_Attributes
2) https://tutorials.arangodb.cloud:8529/_db/TUTn187e39v9qho3768ilyk4/_admin/aardvark/index.html#graph/Hypercube_With_Attributes
3) https://tutorials.arangodb.cloud:8529/_db/TUTn187e39v9qho3768ilyk4/_admin/aardvark/index.html#graph/Clique_With_Attributes
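The if/elif chain in Clique_ADBDGL_Controller can also be written as a lookup table, which is easier to extend when more values need labels. The sketch below shows the idea on plain integers so it runs without DGL or PyTorch; CLIQUE_NDATA_LABELS and clique_ndata_to_label are hypothetical names, and the error text reports the integer rather than the tensor.

```python
# Hypothetical lookup table mirroring the labels used in the controller above
CLIQUE_NDATA_LABELS = {
    1: "one is fun",
    2: "two is blue",
    3: "three is free",
    4: "four is more",
}


def clique_ndata_to_label(val):
    """Map a clique_ndata value to its label, or to an error string."""
    number = int(val)  # int() also accepts a single-element tensor
    return CLIQUE_NDATA_LABELS.get(
        number, f"ERROR! Unrecognized value, got {number}"
    )


# Same six node values as dgl_clique_graph.ndata['clique_ndata']
labels = [clique_ndata_to_label(v) for v in [1, 2, 3, 4, 5, 6]]
for label in labels:
    print(label)
```

Inside a controller, the body of the clique_ndata branch would then reduce to a single return statement calling this function.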

