ArangoDB PyG Adapter Getting Started Guide

Version: 1.0.0

Objective: Export graphs from ArangoDB, the multi-model database for graph & beyond, to PyTorch Geometric (PyG), a Python package for graph neural networks, and vice versa.

Setup

In [1]:

%%capture
!pip install torch
!pip install adbpyg-adapter==1.0.0
!pip install adb-cloud-connector
!git clone -b 1.0.0 --single-branch https://github.com/arangoml/pyg-adapter.git

# For drawing purposes
!pip install matplotlib
!pip install networkx

In [2]:

# All imports

import pandas
import torch
from torch_geometric.data import Data, HeteroData
from torch_geometric.datasets import FakeDataset, FakeHeteroDataset, KarateClub
from torch_geometric.utils import to_networkx
from torch_geometric.typing import EdgeType

from adbpyg_adapter import ADBPyG_Adapter, ADBPyG_Controller
from adbpyg_adapter.encoders import IdentityEncoder, CategoricalEncoder
from adbpyg_adapter.typings import Json, ADBMetagraph, PyGMetagraph

from arango import ArangoClient
from adb_cloud_connector import get_temp_credentials

import json
import logging

import matplotlib.pyplot as plt
import networkx as nx

Understanding PyG

(referenced from pytorch-geometric.readthedocs.io)

PyG (PyTorch Geometric) is a library built upon PyTorch to easily write and train Graph Neural Networks (GNNs) for a wide range of applications related to structured data.

The PyG introduction referenced above is organized around the following topics:

Data Handling of Graphs
Common Benchmark Datasets
Mini-batches
Data Transforms
Learning Methods on Graphs
Exercises

PyG represents a graph as an instance of torch_geometric.data.Data, which holds the following attributes by default:

data.x: Node feature matrix with shape [num_nodes, num_node_features]
data.edge_index: Graph connectivity in COO format with shape [2, num_edges] and type torch.long
data.edge_attr: Edge feature matrix with shape [num_edges, num_edge_features]
data.y: Target to train against (may have arbitrary shape), e.g., node-level targets of shape [num_nodes, *] or graph-level targets of shape [1, *]

We show a simple example of an unweighted and undirected graph with three nodes and two undirected edges, stored as four directed entries in edge_index. Each node contains exactly one feature:

In [3]:

edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]], dtype=torch.long)
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)

data = Data(x=x, edge_index=edge_index)
print(data)

Data(x=[3, 1], edge_index=[2, 4])
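
To make the COO layout concrete, here is a minimal plain-Python sketch (no PyG required) of how the two undirected edges 0-1 and 1-2 expand into the four columns of edge_index used above. The helper name undirected_to_coo is our own and is not part of PyG.

```python
def undirected_to_coo(edges):
    """Expand undirected (u, v) pairs into COO source/target rows.

    Each undirected edge contributes two directed entries: u->v and v->u.
    """
    sources, targets = [], []
    for u, v in edges:
        sources.extend([u, v])
        targets.extend([v, u])
    return [sources, targets]


coo = undirected_to_coo([(0, 1), (1, 2)])
print(coo)  # [[0, 1, 1, 2], [1, 0, 2, 1]]
```

The first row holds source nodes and the second row holds target nodes, so column i describes the directed edge coo[0][i] -> coo[1][i].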

Besides holding a number of node-level, edge-level or graph-level attributes, Data provides a number of useful utility functions, e.g.:

In [4]:

print(data.keys)  # attribute names; depending on your PyG version this may need to be data.keys()

print(data['x'])


for key, item in data:
    print(f'{key} found in data')

print('edge_attr' in data)
print(data.num_nodes)
print(data.num_edges)
print(data.num_node_features)
print(data.has_isolated_nodes())
print(data.has_self_loops())
print(data.is_directed())

# Transfer data object to GPU (requires Tesla T4 GPU if running in Colab)
# device = torch.device('cuda')
# data = data.to(device)

['x', 'edge_index']
tensor([[-1.],
        [ 0.],
        [ 1.]])
x found in data
edge_index found in data
False
3
4
1
False
False
False
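
The three False values at the end follow directly from the edge list. The sketch below re-derives them in plain Python under simplifying assumptions. The function names mirror the PyG methods for readability, but this is not PyG's implementation.

```python
def has_self_loops(edge_index):
    """True if any edge starts and ends at the same node."""
    return any(u == v for u, v in zip(*edge_index))


def has_isolated_nodes(edge_index, num_nodes):
    """True if some node has no edge to a different node."""
    connected = set()
    for u, v in zip(*edge_index):
        if u != v:  # a self-loop alone does not connect a node to the rest
            connected.update((u, v))
    return len(connected) < num_nodes


def is_directed(edge_index):
    """True if at least one edge lacks its reverse counterpart."""
    pairs = set(zip(*edge_index))
    return any((v, u) not in pairs for u, v in pairs)


edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]
print(has_isolated_nodes(edge_index, num_nodes=3))
print(has_self_loops(edge_index))
print(is_directed(edge_index))
```

Every edge in the example has a reverse entry, which is why the graph counts as undirected even though edge_index stores directed pairs.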

PyG also supports heterogeneous graphs:

In [5]:

data = HeteroData()

data["user"].x = torch.tensor([[21], [16], [38], [64]])
data[("user", "follows", "user")].edge_index = torch.tensor([[0, 1], [1, 2]])
data[("user", "follows", "game")].edge_index = torch.tensor([[0, 1, 2], [0, 1, 2]])
data[("user", "plays", "game")].edge_index = torch.tensor([[3, 3], [1, 2]])
data[("user", "plays", "game")].edge_attr = torch.tensor([[3], [5]])

print(data)
print(data.node_types)
print(data.edge_types)

HeteroData(
  user={ x=[4, 1] },
  (user, follows, user)={ edge_index=[2, 2] },
  (user, follows, game)={ edge_index=[2, 3] },
  (user, plays, game)={
    edge_index=[2, 2],
    edge_attr=[2, 1]
  }
)
['user']
[('user', 'follows', 'user'), ('user', 'follows', 'game'), ('user', 'plays', 'game')]
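
Note that node_types lists only 'user', even though 'game' appears in two edge types: no attributes were ever assigned to 'game' nodes. The sketch below reproduces that bookkeeping with ordinary dicts as a stand-in for HeteroData. The variable names are ours, and the real class may track this differently.

```python
# Node stores: only types that were assigned attributes exist here.
node_stores = {"user": {"x": [[21], [16], [38], [64]]}}

# Edge stores, keyed by (source type, relation, target type).
edge_stores = {
    ("user", "follows", "user"): {"edge_index": [[0, 1], [1, 2]]},
    ("user", "follows", "game"): {"edge_index": [[0, 1, 2], [0, 1, 2]]},
    ("user", "plays", "game"): {
        "edge_index": [[3, 3], [1, 2]],
        "edge_attr": [[3], [5]],
    },
}

# Node types mentioned by at least one edge type.
referenced = {t for src, _, dst in edge_stores for t in (src, dst)}

# Types that edges point at but that carry no node attributes.
missing = sorted(referenced - set(node_stores))
print(missing)  # ['game']
```

A check like this can be handy before exporting a heterogeneous graph, since a node type without features may need attention.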

For more info, visit pytorch-geometric.readthedocs.io.

Create a Temporary ArangoDB Cloud Instance

In [6]:

# Request temporary instance from the managed ArangoDB Cloud Service.
con = get_temp_credentials()
print(json.dumps(con, indent=2))

# Connect to the db via the python-arango driver
db = ArangoClient(hosts=con["url"]).db(con["dbName"], con["username"], con["password"], verify=True)

Log: requesting new credentials...
Succcess: new credentials acquired
{
  "dbName": "TUTc7mc78w0qlchle9za0opmc",
  "username": "TUTy0d4nq3jcidztw4rf5nyy",
  "password": "TUTg7njua0hhwpfr1u2m2b2zc",
  "hostname": "tutorials.arangodb.cloud",
  "port": 8529,
  "url": "https://tutorials.arangodb.cloud:8529"
}

Feel free to use the above URL to check out the UI!
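
Since the cell above prints the password in clear text, you may prefer to redact it before logging or sharing notebook output. Here is a small sketch, assuming only the dictionary keys shown above. redact_credentials is a hypothetical helper, not part of adb-cloud-connector, and the sample values are placeholders.

```python
import json


def redact_credentials(con, secret_keys=("password",)):
    """Return a copy of the credentials dict with secret values masked."""
    return {k: ("***" if k in secret_keys else v) for k, v in con.items()}


# Placeholder values shaped like the credentials printed above
sample_con = {
    "dbName": "TUT-example-db",
    "username": "TUT-example-user",
    "password": "TUT-example-password",
    "hostname": "tutorials.arangodb.cloud",
    "port": 8529,
    "url": "https://tutorials.arangodb.cloud:8529",
}

safe = redact_credentials(sample_con)
print(json.dumps(safe, indent=2))
```

The original dictionary is left untouched, so it can still be used to open the database connection.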

Data Import

For demo purposes, we will be using the ArangoDB IMDB example graph.

In [7]:

!chmod -R 755 pyg-adapter/
!./pyg-adapter/tests/tools/arangorestore -c none --server.endpoint http+ssl://{con["hostname"]}:{con["port"]} --server.username {con["username"]} --server.database {con["dbName"]} --server.password {con["password"]} --replication-factor 3  --input-directory "pyg-adapter/tests/data/adb/imdb_dump" --include-system-collections true

2022-07-29T22:41:50Z [437] INFO [05c30] {restore} Connected to ArangoDB 'http+ssl://tutorials.arangodb.cloud:8529'
2022-07-29T22:41:50Z [437] INFO [abeb4] {restore} Database name in source dump is 'TUTdit9ohpgz1ntnbetsjstwi'
2022-07-29T22:41:50Z [437] INFO [9b414] {restore} # Re-creating document collection 'Movies'...
2022-07-29T22:41:50Z [437] INFO [9b414] {restore} # Re-creating document collection 'Users'...
2022-07-29T22:41:51Z [437] INFO [9b414] {restore} # Re-creating edge collection 'Ratings'...
2022-07-29T22:41:51Z [437] INFO [6d69f] {restore} # Dispatched 3 job(s), using 2 worker(s)
2022-07-29T22:41:51Z [437] INFO [94913] {restore} # Loading data into document collection 'Movies', data size: 68107 byte(s)
2022-07-29T22:41:51Z [437] INFO [94913] {restore} # Loading data into document collection 'Users', data size: 16717 byte(s)
2022-07-29T22:41:51Z [437] INFO [6ae09] {restore} # Successfully restored document collection 'Users'
2022-07-29T22:41:51Z [437] INFO [94913] {restore} # Loading data into edge collection 'Ratings', data size: 1407601 byte(s)
2022-07-29T22:41:51Z [437] INFO [6ae09] {restore} # Successfully restored document collection 'Movies'
2022-07-29T22:41:56Z [437] INFO [75e65] {restore} # Current restore progress: restored 2 of 3 collection(s), read 9270558 byte(s) from datafiles, sent 3 data batch(es) of 881948 byte(s) total size, queued jobs: 0, workers: 2
2022-07-29T22:41:58Z [437] INFO [69a73] {restore} # Still loading data into edge collection 'Ratings', 10660073 byte(s) restored
2022-07-29T22:41:58Z [437] INFO [6ae09] {restore} # Successfully restored edge collection 'Ratings'
2022-07-29T22:41:58Z [437] INFO [a66e1] {restore} Processed 3 collection(s) in 7.461191 s, read 11542023 byte(s) from datafiles, sent 4 data batch(es) of 11542020 byte(s) total size
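
If you want to run the same restore outside a notebook, the shell line above can be assembled as an argument list, which avoids quoting problems with special characters in passwords. The sketch below only builds the command. The flags are copied from the cell above, while build_restore_command and the placeholder credentials are our own.

```python
import shlex


def build_restore_command(
    con, input_dir, tool="./pyg-adapter/tests/tools/arangorestore"
):
    """Build the arangorestore argument list used in this guide."""
    return [
        tool,
        "-c", "none",
        "--server.endpoint", f"http+ssl://{con['hostname']}:{con['port']}",
        "--server.username", con["username"],
        "--server.database", con["dbName"],
        "--server.password", con["password"],
        "--replication-factor", "3",
        "--input-directory", input_dir,
        "--include-system-collections", "true",
    ]


# Placeholder values shaped like the temporary credentials
sample_con = {
    "dbName": "TUT-example-db",
    "username": "TUT-example-user",
    "password": "TUT-example-password",
    "hostname": "tutorials.arangodb.cloud",
    "port": 8529,
}

cmd = build_restore_command(sample_con, "pyg-adapter/tests/data/adb/imdb_dump")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```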

In [8]:

# Create the IMDB graph
db.create_graph(
    "imdb",
    edge_definitions=[
        {
            "edge_collection": "Ratings",
            "from_vertex_collections": ["Users"],
            "to_vertex_collections": ["Movies"],
        },
    ],
)

Out[8]:

<Graph imdb>
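
For graphs with several edge collections, writing edge definitions by hand gets repetitive. The following minimal sketch derives them from (from, edge, to) triplets, using the same dictionary keys as the cell above. edge_definitions_from_triplets is a hypothetical helper.

```python
from collections import defaultdict


def edge_definitions_from_triplets(triplets):
    """Group (from_col, edge_col, to_col) triplets into edge definitions."""
    grouped = defaultdict(lambda: {"from": set(), "to": set()})
    for from_col, edge_col, to_col in triplets:
        grouped[edge_col]["from"].add(from_col)
        grouped[edge_col]["to"].add(to_col)

    return [
        {
            "edge_collection": edge_col,
            "from_vertex_collections": sorted(cols["from"]),
            "to_vertex_collections": sorted(cols["to"]),
        }
        for edge_col, cols in grouped.items()
    ]


definitions = edge_definitions_from_triplets([("Users", "Ratings", "Movies")])
print(definitions)
```

The resulting list has the same shape as the edge_definitions argument passed to db.create_graph() above.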

Instantiate the Adapter

Connect the ArangoDB-PyG Adapter to our temporary ArangoDB cluster:

In [9]:

adbpyg_adapter = ADBPyG_Adapter(db)

[2022/07/29 22:41:58 +0000] [58] [INFO] - adbpyg_adapter: Instantiated ADBPyG_Adapter with database 'TUTc7mc78w0qlchle9za0opmc'

PyG to ArangoDB

Karate Graph

PyG Karate Graph

adbpyg_adapter.adapter.pyg_to_arangodb()

Notes

The name parameter is used to name your ArangoDB graph.

In [10]:

# Create the PyG graph & draw it
pyg_karate_graph = KarateClub()[0]
print(pyg_karate_graph)
nx.draw(to_networkx(pyg_karate_graph), with_labels=True)

name = "Karate"

# Delete the graph if it already exists
db.delete_graph(name, drop_collections=True, ignore_missing=True)

# Create the ArangoDB graph
adb_karate_graph = adbpyg_adapter.pyg_to_arangodb(name, pyg_karate_graph)

# You can also pass valid python-arango import_bulk options to the command above, like so:
# adb_karate_graph = adbpyg_adapter.pyg_to_arangodb(name, pyg_karate_graph, batch_size=5, on_duplicate="replace")
# See the full parameter list at https://docs.python-arango.com/en/main/specs.html#arango.collection.Collection.import_bulk

print('\n--------------------')
print("URL: " + con["url"])
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print(f"View the created graph here: {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}\n")
print(f"View the original graph below:\n")

Data(x=[34, 34], edge_index=[2, 156], y=[34], train_mask=[34])

[2022/07/29 22:41:58 +0000] [58] [INFO] - adbpyg_adapter: Created ArangoDB 'Karate' Graph

--------------------
URL: https://tutorials.arangodb.cloud:8529
Username: TUTy0d4nq3jcidztw4rf5nyy
Password: TUTg7njua0hhwpfr1u2m2b2zc
Database: TUTc7mc78w0qlchle9za0opmc
--------------------

View the created graph here: https://tutorials.arangodb.cloud:8529/_db/TUTc7mc78w0qlchle9za0opmc/_admin/aardvark/index.html#graph/Karate

View the original graph below:
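
The connection summary printed above is repeated in each of the following cells. If you adapt this notebook, a small helper can replace the repeated print statements. This is only a sketch: graph_url and format_connection_info are names we made up, the credentials are placeholders, and the URL pattern is copied from the cell above.

```python
def graph_url(con, name):
    """Web UI URL of a named graph, following the pattern used in this guide."""
    return f"{con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}"


def format_connection_info(con, name):
    """Return the connection summary as a single printable string."""
    rule = "-" * 20
    return "\n".join(
        [
            rule,
            f"URL: {con['url']}",
            f"Username: {con['username']}",
            f"Password: {con['password']}",
            f"Database: {con['dbName']}",
            rule,
            "",
            f"View the created graph here: {graph_url(con, name)}",
        ]
    )


# Placeholder values shaped like the temporary credentials
sample_con = {
    "dbName": "TUT-example-db",
    "username": "TUT-example-user",
    "password": "TUT-example-password",
    "url": "https://tutorials.arangodb.cloud:8529",
}

print(format_connection_info(sample_con, "Karate"))
```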

FakeHomogeneous Graph

PyG FakeDataset

adbpyg_adapter.adapter.pyg_to_arangodb()

Notes

The name parameter is used to name your ArangoDB graph.

In [11]:

# Create the PyG graph & draw it
pyg_homo_graph = FakeDataset(avg_num_nodes=30, edge_dim=1)[0] # 'edge_weight' property
print(pyg_homo_graph)
nx.draw(to_networkx(pyg_homo_graph), with_labels=True)

name = "FakeHomo"

# Delete the graph if it already exists
db.delete_graph(name, drop_collections=True, ignore_missing=True)

# Create the ArangoDB graph
adb_homo_graph = adbpyg_adapter.pyg_to_arangodb(name, pyg_homo_graph)

print('\n--------------------')
print("URL: " + con["url"])
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print(f"View the created graph here: {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}\n")
print(f"View the original graph below:\n")

Data(y=[25], edge_index=[2, 346], x=[25, 64], edge_weight=[346])

[2022/07/29 22:42:00 +0000] [58] [INFO] - adbpyg_adapter: Created ArangoDB 'FakeHomo' Graph

--------------------
URL: https://tutorials.arangodb.cloud:8529
Username: TUTy0d4nq3jcidztw4rf5nyy
Password: TUTg7njua0hhwpfr1u2m2b2zc
Database: TUTc7mc78w0qlchle9za0opmc
--------------------

View the created graph here: https://tutorials.arangodb.cloud:8529/_db/TUTc7mc78w0qlchle9za0opmc/_admin/aardvark/index.html#graph/FakeHomo

View the original graph below:
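
Conceptually, exporting a homogeneous graph means turning every node into a document and every column of edge_index into an edge document whose _from and _to fields reference node documents as "collection/key". The sketch below illustrates the idea in plain Python. It is not the adapter's implementation, and the collection names passed in are placeholders.

```python
def coo_to_documents(num_nodes, edge_index, node_col, edge_col):
    """Convert a COO edge list into ArangoDB-style node and edge documents."""
    nodes = [{"_key": str(i)} for i in range(num_nodes)]
    edges = [
        {"_from": f"{node_col}/{u}", "_to": f"{node_col}/{v}"}
        for u, v in zip(*edge_index)
    ]
    return {node_col: nodes, edge_col: edges}


docs = coo_to_documents(
    num_nodes=3,
    edge_index=[[0, 1, 1, 2], [1, 0, 2, 1]],
    node_col="example_nodes",  # placeholder collection name
    edge_col="example_edges",  # placeholder collection name
)
print(docs["example_nodes"])
print(docs["example_edges"][0])
```

Node features and edge weights would be added to these documents as further attributes.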

FakeHeterogeneous Graph

PyG FakeHeteroDataset

adbpyg_adapter.adapter.pyg_to_arangodb()

Notes

The name parameter is used to name your ArangoDB graph.

In [12]:

# Create the PyG graph
pyg_hetero_graph = FakeHeteroDataset(avg_num_nodes=30, edge_dim=2)[0] # 'edge_attr' property

name = "FakeHetero"

# Delete the graph if it already exists
db.delete_graph(name, drop_collections=True, ignore_missing=True)

# Create the ArangoDB graph
adb_hetero_graph = adbpyg_adapter.pyg_to_arangodb(name, pyg_hetero_graph)

print('\n--------------------')
print("URL: " + con["url"])
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print(f"View the created graph here: {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}\n")
print(f"View the original graph below:\n")

[2022/07/29 22:42:03 +0000] [58] [INFO] - adbpyg_adapter: Created ArangoDB 'FakeHetero' Graph

--------------------
URL: https://tutorials.arangodb.cloud:8529
Username: TUTy0d4nq3jcidztw4rf5nyy
Password: TUTg7njua0hhwpfr1u2m2b2zc
Database: TUTc7mc78w0qlchle9za0opmc
--------------------

View the created graph here: https://tutorials.arangodb.cloud:8529/_db/TUTc7mc78w0qlchle9za0opmc/_admin/aardvark/index.html#graph/FakeHetero

View the original graph below:

FakeHeterogeneous Graph with a PyG-ArangoDB metagraph

PyG FakeHeteroDataset

adbpyg_adapter.adapter.pyg_to_arangodb()

Notes

The name parameter is used to name your ArangoDB graph.
The metagraph parameter is an optional object mapping the PyG keys of the node & edge data to strings, lists of strings, or user-defined functions.
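
The three kinds of metagraph values can be pictured as follows. This sketch works on nested lists instead of tensors and DataFrames, and apply_metagraph_value is a hypothetical function written for illustration, not the adapter's code.

```python
def apply_metagraph_value(key, rows, value):
    """Turn per-node (or per-edge) data into a list of attribute dicts.

    rows  -- one entry per node/edge (a scalar or a list of features)
    value -- a string, a list of strings, or a callable
    """
    if isinstance(value, str):
        # 1) string: store the data under a new attribute name
        return [{value: row} for row in rows]
    if isinstance(value, list):
        # 2) list of strings: split each feature vector into named attributes
        for row in rows:
            if len(row) != len(value):
                raise ValueError(
                    f"'{key}' has {len(row)} features, expected {len(value)}"
                )
        return [dict(zip(value, row)) for row in rows]
    if callable(value):
        # 3) function: full control over the resulting attributes
        return value(rows)
    raise TypeError(f"Unsupported metagraph value for '{key}': {value!r}")


label_map = {0: "Kiwi", 1: "Blueberry", 2: "Avocado"}


def labels_to_records(labels):
    return [{"label_num": y, "label_str": label_map[y]} for y in labels]


renamed = apply_metagraph_value("x", [[0.1, 0.2]], "features")
split = apply_metagraph_value("edge_attr", [[1, 2], [3, 4]], ["a", "b"])
custom = apply_metagraph_value("y", [0, 2], labels_to_records)
print(renamed)
print(split)
print(custom)
```

The list form raises an error when the number of names does not match the number of features, which is why it is only suitable when the feature count is known in advance.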

In [13]:

# Create the PyG graph
pyg_hetero_graph = FakeHeteroDataset(
    num_node_types=2,
    num_edge_types=3,
    avg_num_nodes=20,
    avg_num_channels=3,  # avg number of features per node
    edge_dim=2,  # number of features per edge
    num_classes=3,  # number of unique label values
)[0]
print(pyg_hetero_graph)
nx.draw(to_networkx(pyg_hetero_graph.to_homogeneous()), with_labels=True)

# Define the metagraph
def y_tensor_to_2_column_dataframe(pyg_tensor):
    label_map = {0: "Kiwi", 1: "Blueberry", 2: "Avocado"}

    df = pandas.DataFrame(columns=["label_num", "label_str"])
    df["label_num"] = pyg_tensor.tolist()
    df["label_str"] = df["label_num"].map(label_map)

    return df

metagraph = {
    "nodeTypes": {
        "v0": {
            "x": "features",  # 1) you can specify a string value for attribute renaming
            "y": y_tensor_to_2_column_dataframe,  # 2) you can specify a function for user-defined handling, as long as the function returns a Pandas DataFrame
        },
    },
    "edgeTypes": {
        ("v0", "e0", "v0"): {
            # 3) you can specify a list of strings for tensor disassembly (if you know the number of node/edge features in advance)
            "edge_attr": ["a", "b"]
        },
    },
}

name = "FakeHetero"

db.delete_graph(name, drop_collections=True, ignore_missing=True)

# Create the ArangoDB graph with `explicit_metagraph=False`
adb_hetero_graph = adbpyg_adapter.pyg_to_arangodb(name, pyg_hetero_graph, metagraph, explicit_metagraph=False)

# Create the ArangoDB graph with `explicit_metagraph=True`
# With `explicit_metagraph=True`, the node & edge types omitted from the metagraph will NOT be converted to ArangoDB.
# Only 'v0' and ('v0', 'e0', 'v0') will be brought over (i.e., 'v1', ('v0', 'e0', 'v1'), ... are ignored):
## adb_hetero_graph_explicit = adbpyg_adapter.pyg_to_arangodb(name, pyg_hetero_graph, metagraph, explicit_metagraph=True)

print('\n--------------------')
print("URL: " + con["url"])
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print(f"View the created graph here: {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}\n")
print(f"View the original graph below:\n")

HeteroData(
  v0={
    x=[18, 2],
    y=[18]
  },
  v1={ x=[19, 3] },
  (v1, e0, v1)={
    edge_index=[2, 154],
    edge_attr=[154, 2]
  },
  (v1, e0, v0)={
    edge_index=[2, 141],
    edge_attr=[141, 2]
  },
  (v0, e0, v0)={
    edge_index=[2, 134],
    edge_attr=[134, 2]
  }
)

[2022/07/29 22:42:05 +0000] [58] [INFO] - adbpyg_adapter: Created ArangoDB 'FakeHetero' Graph

--------------------
URL: https://tutorials.arangodb.cloud:8529
Username: TUTy0d4nq3jcidztw4rf5nyy
Password: TUTg7njua0hhwpfr1u2m2b2zc
Database: TUTc7mc78w0qlchle9za0opmc
--------------------

View the created graph here: https://tutorials.arangodb.cloud:8529/_db/TUTc7mc78w0qlchle9za0opmc/_admin/aardvark/index.html#graph/FakeHetero

View the original graph below:
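
To see what explicit_metagraph changes, the sketch below selects the node and edge types that would be exported in each mode, using the types from the output above. select_types is a hypothetical helper that only mirrors the behaviour described in the code comments. It is not the adapter's implementation.

```python
def select_types(all_node_types, all_edge_types, metagraph, explicit_metagraph):
    """Return the (node types, edge types) that would be exported."""
    if not explicit_metagraph:
        # Everything is exported; the metagraph only customizes listed types.
        return list(all_node_types), list(all_edge_types)

    # Only types that appear in the metagraph are exported.
    listed_nodes = metagraph.get("nodeTypes", {})
    listed_edges = metagraph.get("edgeTypes", {})
    return (
        [n for n in all_node_types if n in listed_nodes],
        [e for e in all_edge_types if e in listed_edges],
    )


node_types = ["v0", "v1"]
edge_types = [("v1", "e0", "v1"), ("v1", "e0", "v0"), ("v0", "e0", "v0")]
metagraph = {
    "nodeTypes": {"v0": {"x": "features"}},
    "edgeTypes": {("v0", "e0", "v0"): {"edge_attr": ["a", "b"]}},
}

print(select_types(node_types, edge_types, metagraph, explicit_metagraph=False))
print(select_types(node_types, edge_types, metagraph, explicit_metagraph=True))
```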

FakeHeterogeneous Graph with a user-defined ADBPyG Controller

PyG FakeHeteroDataset

adbpyg_adapter.adapter.pyg_to_arangodb()

Notes

The name parameter is used to name your ArangoDB graph.
The ADBPyG_Controller is an optional user-defined class for controlling how nodes & edges are handled when transitioning from PyG to ArangoDB. It is interpreted as the alternative to the metagraph parameter.

In [14]:

# Create the PyG graph
pyg_hetero_graph = FakeHeteroDataset(avg_num_nodes=30, edge_dim=2)[0] # 'edge_attr' property

name = "FakeHetero"

db.delete_graph(name, drop_collections=True, ignore_missing=True)

# Create a custom ADBPyG_Controller
class Custom_ADBPyG_Controller(ADBPyG_Controller):
    """ArangoDB-PyG controller.

    Responsible for controlling how nodes & edges are handled when
    transitioning from PyG to ArangoDB.

    You can derive your own custom ADBPyG_Controller.
    """

    def _prepare_pyg_node(self, pyg_node: dict, col: str) -> dict:
        """Optionally modify a PyG node object before it gets inserted into its designated ArangoDB collection.

        :param pyg_node: The PyG node object to (optionally) modify.
        :param col: The ArangoDB collection the PyG node belongs to.
        :return: The PyG Node object
        """
        pyg_node["foo"] = "bar"
        return pyg_node

    def _prepare_pyg_edge(self, pyg_edge: dict, edge_type: tuple) -> dict:
        """Optionally modify a PyG edge object before it gets inserted into its designated ArangoDB collection.

        :param pyg_edge: The PyG edge object to (optionally) modify.
        :param edge_type: The Edge Type of the PyG edge. Formatted
            as (from_collection, edge_collection, to_collection)
        :return: The PyG Edge object
        """
        pyg_edge["bar"] = "foo"
        return pyg_edge

# Instantiate new adapter & create the ArangoDB graph
adb_hetero_graph = ADBPyG_Adapter(db, Custom_ADBPyG_Controller()).pyg_to_arangodb(name, pyg_hetero_graph)

print('\n--------------------')
print("URL: " + con["url"])
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print(f"View the created graph here: {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}\n")

[2022/07/29 22:42:08 +0000] [58] [INFO] - adbpyg_adapter: Instantiated ADBPyG_Adapter with database 'TUTc7mc78w0qlchle9za0opmc'

[2022/07/29 22:42:10 +0000] [58] [INFO] - adbpyg_adapter: Created ArangoDB 'FakeHetero' Graph

--------------------
URL: https://tutorials.arangodb.cloud:8529
Username: TUTy0d4nq3jcidztw4rf5nyy
Password: TUTg7njua0hhwpfr1u2m2b2zc
Database: TUTc7mc78w0qlchle9za0opmc
--------------------

View the created graph here: https://tutorials.arangodb.cloud:8529/_db/TUTc7mc78w0qlchle9za0opmc/_admin/aardvark/index.html#graph/FakeHetero
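
The controller follows a simple hook pattern: the base class returns each object unchanged, and a subclass overrides only the hooks it cares about. The stand-alone sketch below shows that pattern with stand-in classes. BaseController, TaggingController and export_nodes are invented for illustration and are not the real ADBPyG_Controller or adapter internals.

```python
class BaseController:
    """Stand-in base class: every hook returns its input unchanged."""

    def _prepare_pyg_node(self, pyg_node, col):
        return pyg_node

    def _prepare_pyg_edge(self, pyg_edge, edge_type):
        return pyg_edge


class TaggingController(BaseController):
    """Overrides only the node hook; edges keep the default behaviour."""

    def _prepare_pyg_node(self, pyg_node, col):
        pyg_node["foo"] = "bar"
        pyg_node["source_collection"] = col
        return pyg_node


def export_nodes(nodes, col, controller):
    """Pass a copy of each node through the controller hook before export."""
    return [controller._prepare_pyg_node(dict(node), col) for node in nodes]


nodes = [{"_key": "0"}, {"_key": "1"}]
print(export_nodes(nodes, "v0", BaseController()))
print(export_nodes(nodes, "v0", TaggingController()))
```

Because each hook receives the collection name or edge type, a single controller can treat different collections differently.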

ArangoDB to PyG

In [15]:

# Start from scratch! (with a smaller graph)
data = FakeHeteroDataset(
    num_node_types=2,
    num_edge_types=3,
    avg_num_nodes=20,
    avg_num_channels=3,  # avg number of features per node
    edge_dim=2,  # number of features per edge
    num_classes=3,  # number of unique label values
)[0]

adbpyg_adapter.pyg_to_arangodb("FakeHetero", data, overwrite_graph=True, overwrite=True)

[2022/07/29 22:42:10 +0000] [58] [INFO] - adbpyg_adapter: Created ArangoDB 'FakeHetero' Graph

Out[15]:

<Graph FakeHetero>

Via ArangoDB Graph

PyG FakeHeteroDataset

adbpyg_adapter.adapter.arangodb_graph_to_pyg()

Notes

The name parameter in this case must point to an existing ArangoDB graph in your ArangoDB instance.
Due to risk of ambiguity, this method does not carry over ArangoDB attributes to PyG.

In [16]:

# Define graph name
graph_name = "FakeHetero"

# Create PyG graph from the ArangoDB graph
pyg_hetero_graph = adbpyg_adapter.arangodb_graph_to_pyg(graph_name)

# You can also pass valid python-arango AQL query options to the command above, like so:
# pyg_hetero_graph = adbpyg_adapter.arangodb_graph_to_pyg(graph_name, ttl=1000, stream=True)
# See the full parameter list at https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute

# Show graph data
print('\n--------------------')
print(pyg_hetero_graph)

[2022/07/29 22:42:10 +0000] [58] [INFO] - adbpyg_adapter: Created PyG 'FakeHetero' Graph

--------------------
HeteroData(
  v0={},
  v1={},
  (v0, e0, v0)={ edge_index=[2, 146] }
)
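
Going in this direction, the key step is translating ArangoDB document IDs into the contiguous integer indices that edge_index expects. A plain-Python sketch of that mapping follows. The sample documents are invented, and ids_to_edge_index is not part of the adapter.

```python
def ids_to_edge_index(node_ids, edges):
    """Map '_from'/'_to' document IDs to integer positions in COO form."""
    index_of = {node_id: i for i, node_id in enumerate(node_ids)}
    sources = [index_of[edge["_from"]] for edge in edges]
    targets = [index_of[edge["_to"]] for edge in edges]
    return [sources, targets]


# Invented sample documents
node_ids = ["v0/a", "v0/b", "v0/c"]
edges = [
    {"_from": "v0/a", "_to": "v0/b"},
    {"_from": "v0/c", "_to": "v0/a"},
]

edge_index = ids_to_edge_index(node_ids, edges)
print(edge_index)  # [[0, 2], [1, 0]]
```

An edge that references an unknown document raises a KeyError here, which makes dangling edges easy to spot.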

Via ArangoDB Collections

PyG FakeHeteroDataset

adbpyg_adapter.adapter.arangodb_collections_to_pyg()

Notes

The name parameter is purely for documentation purposes in this case.
The vertex_collections & edge_collections parameters must point to existing ArangoDB collections within your ArangoDB instance.
Due to risk of ambiguity, this method does not carry over ArangoDB attributes to PyG.

In [17]:

# Define collection names
v_cols = {"v0", "v1"}
e_cols = {"e0"}

# Create PyG graph from the ArangoDB collections
pyg_hetero_graph = adbpyg_adapter.arangodb_collections_to_pyg("FakeHetero", v_cols, e_cols)

# Show graph data
print('\n--------------------')
print(pyg_hetero_graph)

[2022/07/29 22:42:10 +0000] [58] [INFO] - adbpyg_adapter: Created PyG 'FakeHetero' Graph

--------------------
HeteroData(
  v1={},
  v0={},
  (v0, e0, v0)={ edge_index=[2, 146] }
)

Via ArangoDB-PyG metagraph 1

PyG FakeHeteroDataset

adbpyg_adapter.adapter.arangodb_to_pyg()

Notes

The name parameter is purely for documentation purposes in this case.
The metagraph parameter is an object defining vertex & edge collections to import to PyG, along with collection-level specifications to indicate which ArangoDB attributes will become PyG features/labels. It should contain collections & associated document attribute names that exist within your ArangoDB instance.

In [18]:

# Define the Metagraph that transfers ArangoDB attributes "as is",
# meaning the data is already formatted to PyG data standards
metagraph_v1 = {
    "vertexCollections": {
        # we instruct the adapter to create the "x" and "y" tensor data from the "x" and "y" ArangoDB attributes
        "v0": {"x": "x", "y": "y"},
        "v1": {"x": "x"},
    },
    "edgeCollections": {
        "e0": {"edge_attr": "edge_attr"},
    },
}

# Create PyG Graph
pyg_hetero_graph = adbpyg_adapter.arangodb_to_pyg("FakeHetero", metagraph_v1)

# Show graph data
print('\n--------------------')
print(pyg_hetero_graph)

[2022/07/29 22:42:11 +0000] [58] [INFO] - adbpyg_adapter: Created PyG 'FakeHetero' Graph

--------------------
HeteroData(
  v0={
    x=[19, 2],
    y=[19]
  },
  v1={ x=[16, 2] },
  (v0, e0, v0)={
    edge_index=[2, 146],
    edge_attr=[146, 2]
  }
)
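
Transferring attributes "as is" assumes that every document stores the attribute as a list of the same length. Before running the adapter on your own data, a quick check like the following can catch missing or ragged attributes. This is a sketch on plain dicts. stack_attribute is a hypothetical helper and the sample documents are invented.

```python
def stack_attribute(documents, attribute):
    """Collect one list-valued attribute from all documents into a matrix."""
    rows = []
    for doc in documents:
        if attribute not in doc:
            raise KeyError(
                f"Document {doc.get('_key')!r} has no '{attribute}' attribute"
            )
        rows.append(list(doc[attribute]))

    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(
            f"'{attribute}' has inconsistent lengths: {sorted(widths)}"
        )
    return rows


# Invented sample documents
documents = [
    {"_key": "0", "x": [0.5, 1.0], "y": 1},
    {"_key": "1", "x": [0.1, 0.3], "y": 0},
]

matrix = stack_attribute(documents, "x")
print(matrix)  # [[0.5, 1.0], [0.1, 0.3]]
```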

Via ArangoDB-PyG metagraph 2

ArangoDB IMDB Movie Dataset

Package methods used

adbpyg_adapter.adapter.arangodb_to_pyg()

Important notes

The name parameter is purely for documentation purposes in this case.
The metagraph parameter is an object defining vertex & edge collections to import to PyG, along with collection-level specifications to indicate which ArangoDB attributes will become PyG features/labels. In this example, we rely on user-defined encoders to build PyG-ready tensors (i.e., feature matrices) from ArangoDB attributes. See https://pytorch-geometric.readthedocs.io/en/latest/notes/load_csv.html for an example of using encoders with PyG.

In [19]:

# Define the Metagraph that transfers attributes via user-defined encoders
metagraph_v2 = {
    "vertexCollections": {
        "Movies": {
            "x": {  # Build a feature matrix from the "Action" & "Drama" document attributes
                "Action": IdentityEncoder(dtype=torch.long),
                "Drama": IdentityEncoder(dtype=torch.long),
            },
            "y": "Comedy",
        },
        "Users": {
            "x": {
                "Gender": CategoricalEncoder(),  # or pass an explicit mapping: CategoricalEncoder(mapping={"M": 0, "F": 1})
                "Age": IdentityEncoder(dtype=torch.long),
            }
        },
    },
    "edgeCollections": {
        "Ratings": {
            "edge_weight": "Rating"
        }
    },
}

# Create PyG Graph
pyg_imdb_graph = adbpyg_adapter.arangodb_to_pyg("IMDB", metagraph_v2)

# Show graph data
print('\n--------------------')
print(pyg_imdb_graph)

[2022/07/29 22:42:13 +0000] [58] [INFO] - adbpyg_adapter: Created PyG 'IMDB' Graph

--------------------
HeteroData(
  Movies={
    x=[1682, 2],
    y=[1682]
  },
  Users={ x=[943, 2] },
  (Users, Ratings, Movies)={
    edge_index=[2, 65499],
    edge_weight=[65499]
  }
)
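
To illustrate what the encoders contribute, here is a plain-Python sketch of the two ideas: a categorical encoder assigns each distinct value an integer, and an identity encoder passes numeric values through. The functions below are simplified stand-ins written for this guide, not the adapter's IdentityEncoder or CategoricalEncoder classes, and the sample users are invented.

```python
def categorical_encode(values, mapping=None):
    """Map each distinct value to an integer, in order of first appearance."""
    if mapping is None:
        mapping = {}
        for value in values:
            mapping.setdefault(value, len(mapping))
    return [[mapping[value]] for value in values], mapping


def identity_encode(values):
    """Wrap each value in a one-element row without changing it."""
    return [[value] for value in values]


def build_feature_matrix(*columns):
    """Concatenate encoded columns row by row into one feature matrix."""
    return [sum(parts, []) for parts in zip(*columns)]


# Invented sample documents
users = [
    {"Gender": "M", "Age": 24},
    {"Gender": "F", "Age": 53},
    {"Gender": "M", "Age": 23},
]

gender_col, gender_map = categorical_encode([u["Gender"] for u in users])
age_col = identity_encode([u["Age"] for u in users])
x = build_feature_matrix(gender_col, age_col)

print(gender_map)  # {'M': 0, 'F': 1}
print(x)  # [[0, 24], [1, 53], [0, 23]]
```

Each encoder yields one or more columns, and the columns are concatenated to form the "x" matrix, which is why Users ends up with two features per node above.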

Via ArangoDB-PyG metagraph 3

PyG FakeHeteroDataset

adbpyg_adapter.adapter.arangodb_to_pyg()

Notes

The name parameter is purely for documentation purposes in this case.
The metagraph parameter is an object defining vertex & edge collections to import to PyG, along with collection-level specifications to indicate which ArangoDB attributes will become PyG features/labels. In this example, we rely on user-defined functions to convert ArangoDB attributes into PyG features.

In [20]:

# Define the metagraph that transfers attributes via user-defined functions
def udf_v0_x(v0_df):
    # process v0_df here to return v0 "x" feature matrix
    # v0_df["x"] = ...
    return torch.tensor(v0_df["x"].to_list())


def udf_v1_x(v1_df):
    # process v1_df here to return v1 "x" feature matrix
    # v1_df["x"] = ...
    return torch.tensor(v1_df["x"].to_list())


metagraph_v3 = {
    "vertexCollections": {
        "v0": {
            "x": udf_v0_x,  # supports named functions
            "y": lambda df: torch.tensor(df["y"].to_list()),  # also supports lambda functions
        },
        "v1": {"x": udf_v1_x},
    },
    "edgeCollections": {
        "e0": {"edge_attr": (lambda df: torch.tensor(df["edge_attr"].to_list()))},
    },
}

# Create PyG Graph
pyg_hetero_graph = adbpyg_adapter.arangodb_to_pyg("FakeHetero", metagraph_v3)

# Show graph data
print('\n--------------------')
print(pyg_hetero_graph)

[2022/07/29 22:42:13 +0000] [58] [INFO] - adbpyg_adapter: Created PyG 'FakeHetero' Graph

--------------------
HeteroData(
  v0={
    x=[19, 2],
    y=[19]
  },
  v1={ x=[16, 2] },
  (v0, e0, v0)={
    edge_index=[2, 146],
    edge_attr=[146, 2]
  }
)
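
The "process v0_df here" placeholders in the functions above are where custom preprocessing would go. As one possible example, the sketch below min-max scales each feature column to the range [0, 1]. It operates on nested lists so that it runs without pandas or torch. Inside a real user-defined function you would apply it to v0_df["x"].to_list() and wrap the result in torch.tensor. min_max_scale is our own helper, not part of the adapter.

```python
def min_max_scale(rows):
    """Scale every feature column of a row-major matrix to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [
            # A constant column carries no information; map it to 0.0
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]
        for row in rows
    ]


scaled = min_max_scale([[1.0, 10.0], [3.0, 10.0], [2.0, 30.0]])
print(scaled)  # [[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]]
```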
