
Introducing the ArangoDB-PyG Adapter

source link: https://www.arangodb.com/2022/08/introducing-the-arangodb-pyg-adapter/

ArangoDB PyG Adapter Getting Started Guide


Version: 1.0.0

Objective: Export graphs from ArangoDB, the multi-model database for graph & beyond, to PyTorch Geometric (PyG), a Python package for graph neural networks, and vice versa.

Setup

In [1]:
%%capture
!pip install torch
!pip install adbpyg-adapter==1.0.0
!pip install adb-cloud-connector
!git clone -b 1.0.0 --single-branch https://github.com/arangoml/pyg-adapter.git

## For drawing purposes 
!pip install matplotlib
!pip install networkx
In [2]:
# All imports

import pandas
import torch
from torch_geometric.data import Data, HeteroData
from torch_geometric.datasets import FakeDataset, FakeHeteroDataset, KarateClub
from torch_geometric.utils import to_networkx
from torch_geometric.typing import EdgeType

from adbpyg_adapter import ADBPyG_Adapter, ADBPyG_Controller
from adbpyg_adapter.encoders import IdentityEncoder, CategoricalEncoder
from adbpyg_adapter.typings import Json, ADBMetagraph, PyGMetagraph

from arango import ArangoClient
from adb_cloud_connector import get_temp_credentials

import json
import logging

import matplotlib.pyplot as plt
import networkx as nx

Understanding PyG

(referenced from pytorch-geometric.readthedocs.io)

PyG (PyTorch Geometric) is a library built upon PyTorch to easily write and train Graph Neural Networks (GNNs) for a wide range of applications related to structured data.

At its core, PyG provides the following main features:

  1. Data Handling of Graphs
  2. Common Benchmark Datasets
  3. Mini-batches
  4. Data Transforms
  5. Learning Methods on Graphs
  6. Exercises

PyG represents a graph as an instance of torch_geometric.data.Data, which holds the following attributes by default:

  • data.x: Node feature matrix with shape [num_nodes, num_node_features]
  • data.edge_index: Graph connectivity in COO format with shape [2, num_edges] and type torch.long
  • data.edge_attr: Edge feature matrix with shape [num_edges, num_edge_features]
  • data.y: Target to train against (may have arbitrary shape), e.g., node-level targets of shape [num_nodes, *] or graph-level targets of shape [1, *]
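To make the COO layout concrete: each column of `edge_index` is one directed edge, with source node indices in the first row and target node indices in the second. The stdlib sketch below (the helper `to_coo` is a hypothetical name, not a PyG function) converts a list of `(source, target)` pairs into that two-row layout:

```python
def to_coo(edges):
    """Convert (source, target) pairs into a two-row COO layout [[sources], [targets]]."""
    sources = [src for src, _ in edges]
    targets = [dst for _, dst in edges]
    return [sources, targets]


# An undirected edge is stored as two directed edges, one per direction.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
coo = to_coo(edges)
print(coo)                    # [[0, 1, 1, 2], [1, 0, 2, 1]]
print(len(coo), len(coo[0]))  # 2 4, i.e. shape [2, num_edges]
```

Passing the result to `torch.tensor(coo, dtype=torch.long)` would give a tensor of shape `[2, num_edges]`, which is the same `edge_index` used in the example that follows.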

We show a simple example of an unweighted and undirected graph with three nodes and two undirected edges, stored as four directed edges (one per direction). Each node contains exactly one feature:

In [3]:
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]], dtype=torch.long)
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)

data = Data(x=x, edge_index=edge_index)
print(data)
Data(x=[3, 1], edge_index=[2, 4])

Besides holding a number of node-level, edge-level or graph-level attributes, Data provides a number of useful utility functions, e.g.:

In [4]:
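print(data.keys)  # names of the stored attributes (a method, data.keys(), in newer PyG releases)

print(data['x'])  # attributes can be accessed like dictionary entries

for key, item in data:  # iterate over (attribute name, value) pairs
    print(f'{key} found in data')

print('edge_attr' in data)  # membership test on attribute names
print(data.num_nodes)
print(data.num_edges)
print(data.num_node_features)
print(data.has_isolated_nodes())
print(data.has_self_loops())
print(data.is_directed())

# Transfer the data object to a GPU (requires a CUDA-enabled runtime, e.g. a GPU runtime in Colab)
# device = torch.device('cuda')
# data = data.to(device)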
['x', 'edge_index']
tensor([[-1.],
        [ 0.],
        [ 1.]])
x found in data
edge_index found in data
False
3
4
1
False
False
False
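The three boolean checks at the end of the output can be understood with a short stdlib sketch over the same COO edge list. This illustrates what the checks compute; it is not PyG's implementation, which operates on tensors, and the function names simply mirror the `Data` methods:

```python
def has_self_loops(edge_index):
    """True if any edge points from a node to itself."""
    return any(s == t for s, t in zip(*edge_index))


def is_directed(edge_index):
    """True unless every edge (s, t) has a matching reverse edge (t, s)."""
    edges = set(zip(*edge_index))
    return any((t, s) not in edges for s, t in edges)


def has_isolated_nodes(edge_index, num_nodes):
    """True if some node in range(num_nodes) touches no edge (self-loops ignored)."""
    connected = set()
    for s, t in zip(*edge_index):
        if s != t:
            connected.update((s, t))
    return len(connected) < num_nodes


edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]
print(has_isolated_nodes(edge_index, 3))  # False
print(has_self_loops(edge_index))         # False
print(is_directed(edge_index))            # False
```

Because every edge in the example has its reverse, the graph counts as undirected, which is why `is_directed()` returns `False`.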

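PyG also supports heterogeneous graphs via HeteroData, where each node type and each (source, relation, destination) edge type holds its own attributes: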

In [5]:
data = HeteroData()

data["user"].x = torch.tensor([[21], [16], [38], [64]])
data[("user", "follows", "user")].edge_index = torch.tensor([[0, 1], [1, 2]])
data[("user", "follows", "game")].edge_index = torch.tensor([[0, 1, 2], [0, 1, 2]])
data[("user", "plays", "game")].edge_index = torch.tensor([[3, 3], [1, 2]])
data[("user", "plays", "game")].edge_attr = torch.tensor([[3], [5]])

print(data)
print(data.node_types)
print(data.edge_types)
HeteroData(
  user={ x=[4, 1] },
  (user, follows, user)={ edge_index=[2, 2] },
  (user, follows, game)={ edge_index=[2, 3] },
  (user, plays, game)={
    edge_index=[2, 2],
    edge_attr=[2, 1]
  }
)
['user']
[('user', 'follows', 'user'), ('user', 'follows', 'game'), ('user', 'plays', 'game')]
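Notice that `data.node_types` lists only 'user', although two edge types point to 'game': the printed HeteroData has no 'game' entry, since no node-level attribute was ever assigned to it. A stdlib sketch (with a hypothetical helper name) that lists every node type referenced by the edge-type triples and reports which ones have no node entry:

```python
def referenced_node_types(edge_types):
    """Collect every source and destination node type from (src, rel, dst) triples."""
    types = set()
    for src, _rel, dst in edge_types:
        types.update((src, dst))
    return sorted(types)


# Edge types as printed by data.edge_types above
edge_types = [
    ("user", "follows", "user"),
    ("user", "follows", "game"),
    ("user", "plays", "game"),
]
node_types = ["user"]  # as printed by data.node_types above

print(referenced_node_types(edge_types))                         # ['game', 'user']
print(set(referenced_node_types(edge_types)) - set(node_types))  # {'game'}
```

Assigning something to the type, e.g. `data["game"].num_nodes = 3` (the edge indices above reference games 0 to 2), should make it appear in `node_types`.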

For more info, visit pytorch-geometric.readthedocs.io.

Create a Temporary ArangoDB Cloud Instance

In [6]:
# Request temporary instance from the managed ArangoDB Cloud Service.
con = get_temp_credentials()
print(json.dumps(con, indent=2))

# Connect to the db via the python-arango driver
db = ArangoClient(hosts=con["url"]).db(con["dbName"], con["username"], con["password"], verify=True)
Log: requesting new credentials...
Succcess: new credentials acquired
{
  "dbName": "TUTc7mc78w0qlchle9za0opmc",
  "username": "TUTy0d4nq3jcidztw4rf5nyy",
  "password": "TUTg7njua0hhwpfr1u2m2b2zc",
  "hostname": "tutorials.arangodb.cloud",
  "port": 8529,
  "url": "https://tutorials.arangodb.cloud:8529"
}

Feel free to use the above URL to check out the UI!

Data Import

For demo purposes, we will be using the ArangoDB IMDB example graph.

In [7]:
!chmod -R 755 pyg-adapter/
!./pyg-adapter/tests/tools/arangorestore -c none --server.endpoint http+ssl://{con["hostname"]}:{con["port"]} --server.username {con["username"]} --server.database {con["dbName"]} --server.password {con["password"]} --replication-factor 3  --input-directory "pyg-adapter/tests/data/adb/imdb_dump" --include-system-collections true
2022-07-29T22:41:50Z [437] INFO [05c30] {restore} Connected to ArangoDB 'http+ssl://tutorials.arangodb.cloud:8529'
2022-07-29T22:41:50Z [437] INFO [abeb4] {restore} Database name in source dump is 'TUTdit9ohpgz1ntnbetsjstwi'
2022-07-29T22:41:50Z [437] INFO [9b414] {restore} # Re-creating document collection 'Movies'...
2022-07-29T22:41:50Z [437] INFO [9b414] {restore} # Re-creating document collection 'Users'...
2022-07-29T22:41:51Z [437] INFO [9b414] {restore} # Re-creating edge collection 'Ratings'...
2022-07-29T22:41:51Z [437] INFO [6d69f] {restore} # Dispatched 3 job(s), using 2 worker(s)
2022-07-29T22:41:51Z [437] INFO [94913] {restore} # Loading data into document collection 'Movies', data size: 68107 byte(s)
2022-07-29T22:41:51Z [437] INFO [94913] {restore} # Loading data into document collection 'Users', data size: 16717 byte(s)
2022-07-29T22:41:51Z [437] INFO [6ae09] {restore} # Successfully restored document collection 'Users'
2022-07-29T22:41:51Z [437] INFO [94913] {restore} # Loading data into edge collection 'Ratings', data size: 1407601 byte(s)
2022-07-29T22:41:51Z [437] INFO [6ae09] {restore} # Successfully restored document collection 'Movies'
2022-07-29T22:41:56Z [437] INFO [75e65] {restore} # Current restore progress: restored 2 of 3 collection(s), read 9270558 byte(s) from datafiles, sent 3 data batch(es) of 881948 byte(s) total size, queued jobs: 0, workers: 2
2022-07-29T22:41:58Z [437] INFO [69a73] {restore} # Still loading data into edge collection 'Ratings', 10660073 byte(s) restored
2022-07-29T22:41:58Z [437] INFO [6ae09] {restore} # Successfully restored edge collection 'Ratings'
2022-07-29T22:41:58Z [437] INFO [a66e1] {restore} Processed 3 collection(s) in 7.461191 s, read 11542023 byte(s) from datafiles, sent 4 data batch(es) of 11542020 byte(s) total size
In [8]:
# Create the IMDB graph
db.create_graph(
    "imdb",
    edge_definitions=[
        {
            "edge_collection": "Ratings",
            "from_vertex_collections": ["Users"],
            "to_vertex_collections": ["Movies"],
        },
    ],
)
Out[8]:
<Graph imdb>
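In ArangoDB, an edge document stores its endpoints in the `_from` and `_to` attributes as document IDs of the form `collection/key`. The edge definition above declares that `Ratings` edges go from `Users` to `Movies`. A stdlib sketch that checks one edge against such a definition; the helper name and the sample edge are illustrative, and real keys in the IMDB dump may differ:

```python
def matches_definition(edge, definition):
    """Check that an edge's _from/_to collections match an edge definition."""
    from_col = edge["_from"].split("/", 1)[0]
    to_col = edge["_to"].split("/", 1)[0]
    return (
        from_col in definition["from_vertex_collections"]
        and to_col in definition["to_vertex_collections"]
    )


# Same structure as the edge definition passed to db.create_graph above
ratings_definition = {
    "edge_collection": "Ratings",
    "from_vertex_collections": ["Users"],
    "to_vertex_collections": ["Movies"],
}

# Hypothetical edge document
edge = {"_from": "Users/1", "_to": "Movies/42", "Rating": 5}
print(matches_definition(edge, ratings_definition))  # True
```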

Instantiate the Adapter

Connect the ArangoDB-PyG Adapter to our temporary ArangoDB cluster:

In [9]:
adbpyg_adapter = ADBPyG_Adapter(db)
[2022/07/29 22:41:58 +0000] [58] [INFO] - adbpyg_adapter: Instantiated ADBPyG_Adapter with database 'TUTc7mc78w0qlchle9za0opmc'

PyG to ArangoDB

Karate Graph

  • adbpyg_adapter.adapter.pyg_to_arangodb()

Notes

  • The name parameter is used to name your ArangoDB graph.
In [10]:
# Create the PyG graph & draw it
pyg_karate_graph = KarateClub()[0]
print(pyg_karate_graph)
nx.draw(to_networkx(pyg_karate_graph), with_labels=True)

name = "Karate"

# Delete the graph if it already exists
db.delete_graph(name, drop_collections=True, ignore_missing=True)

# Create the ArangoDB graph
adb_karate_graph = adbpyg_adapter.pyg_to_arangodb(name, pyg_karate_graph)

# You can also provide valid Python-Arango Import Bulk options to the command above, like such:
# adb_karate_graph = adbpyg_adapter.pyg_to_arangodb(name, pyg_karate_graph, batch_size=5, on_duplicate="replace")
# See the full parameter list at https://docs.python-arango.com/en/main/specs.html#arango.collection.Collection.import_bulk

print('\n--------------------')
print("URL: " + con["url"])
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print(f"View the created graph here: {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}\n")
print(f"View the original graph below:\n")
Data(x=[34, 34], edge_index=[2, 156], y=[34], train_mask=[34])
[2022/07/29 22:41:58 +0000] [58] [INFO] - adbpyg_adapter: Created ArangoDB 'Karate' Graph
--------------------
URL: https://tutorials.arangodb.cloud:8529
Username: TUTy0d4nq3jcidztw4rf5nyy
Password: TUTg7njua0hhwpfr1u2m2b2zc
Database: TUTc7mc78w0qlchle9za0opmc
--------------------

View the created graph here: https://tutorials.arangodb.cloud:8529/_db/TUTc7mc78w0qlchle9za0opmc/_admin/aardvark/index.html#graph/Karate

View the original graph below:
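Each cell in this section repeats the same block of `print` calls. If you adapt the notebook, a small helper can keep that in one place. `graph_viewer_url` and `print_graph_links` are hypothetical names, and the URL pattern is copied from the `print` statement in the cell above:

```python
def graph_viewer_url(con, name):
    """Build the web UI link for a named graph, using the pattern printed above."""
    return f"{con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}"


def print_graph_links(con, name):
    """Print the connection details and the link to the graph in the web UI."""
    print("\n--------------------")
    print("URL: " + con["url"])
    print("Username: " + con["username"])
    print("Password: " + con["password"])
    print("Database: " + con["dbName"])
    print("--------------------\n")
    print(f"View the created graph here: {graph_viewer_url(con, name)}\n")


# Placeholder credentials; in the notebook, `con` comes from get_temp_credentials()
con = {
    "url": "http://localhost:8529",
    "dbName": "mydb",
    "username": "user",
    "password": "secret",
}
print_graph_links(con, "Karate")
```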

FakeHomogeneous Graph

  • adbpyg_adapter.adapter.pyg_to_arangodb()

Notes

  • The name parameter is used to name your ArangoDB graph.
In [11]:
# Create the PyG graph & draw it
pyg_homo_graph = FakeDataset(avg_num_nodes=30, edge_dim=1)[0] # 'edge_weight' property
print(pyg_homo_graph)
nx.draw(to_networkx(pyg_homo_graph), with_labels=True)

name = "FakeHomo"

# Delete the graph if it already exists
db.delete_graph(name, drop_collections=True, ignore_missing=True)

# Create the ArangoDB graph
adb_homo_graph = adbpyg_adapter.pyg_to_arangodb(name, pyg_homo_graph)

print('\n--------------------')
print("URL: " + con["url"])
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print(f"View the created graph here: {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}\n")
print(f"View the original graph below:\n")
Data(y=[25], edge_index=[2, 346], x=[25, 64], edge_weight=[346])
[2022/07/29 22:42:00 +0000] [58] [INFO] - adbpyg_adapter: Created ArangoDB 'FakeHomo' Graph
--------------------
URL: https://tutorials.arangodb.cloud:8529
Username: TUTy0d4nq3jcidztw4rf5nyy
Password: TUTg7njua0hhwpfr1u2m2b2zc
Database: TUTc7mc78w0qlchle9za0opmc
--------------------

View the created graph here: https://tutorials.arangodb.cloud:8529/_db/TUTc7mc78w0qlchle9za0opmc/_admin/aardvark/index.html#graph/FakeHomo

View the original graph below:
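Conceptually, a homogeneous `Data` object maps onto one node collection and one edge collection: node `i` becomes a document holding row `i` of each node-level tensor, and column `j` of `edge_index` becomes an edge document with `_from`/`_to` IDs plus the `j`-th edge feature. The stdlib sketch below illustrates that mapping under assumed collection names (`Nodes`, `Edges`) and an assumed key scheme (`_key = str(i)`); the adapter's actual naming may differ, so check the collections it created in the web UI:

```python
def homogeneous_to_documents(x, edge_index, edge_weight=None,
                             node_col="Nodes", edge_col="Edges"):
    """Map node features and a COO edge list to ArangoDB-style documents.

    node_col, edge_col and the str(i) key scheme are illustrative assumptions.
    """
    nodes = [{"_key": str(i), "x": row} for i, row in enumerate(x)]
    edges = []
    for j, (src, dst) in enumerate(zip(*edge_index)):
        edge = {"_from": f"{node_col}/{src}", "_to": f"{node_col}/{dst}"}
        if edge_weight is not None:
            edge["edge_weight"] = edge_weight[j]  # one scalar per edge
        edges.append(edge)
    return {node_col: nodes, edge_col: edges}


docs = homogeneous_to_documents(
    x=[[-1.0], [0.0], [1.0]],
    edge_index=[[0, 1, 1, 2], [1, 0, 2, 1]],
    edge_weight=[0.5, 0.5, 0.25, 0.25],
)
print(docs["Nodes"][0])  # {'_key': '0', 'x': [-1.0]}
print(docs["Edges"][2])  # {'_from': 'Nodes/1', '_to': 'Nodes/2', 'edge_weight': 0.25}
```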

FakeHeterogeneous Graph

  • adbpyg_adapter.adapter.pyg_to_arangodb()

Notes

  • The name parameter is used to name your ArangoDB graph.
In [12]:
# Create the PyG graph
pyg_hetero_graph = FakeHeteroDataset(avg_num_nodes=30, edge_dim=2)[0] # 'edge_attr' property

name = "FakeHetero"

# Delete the graph if it already exists
db.delete_graph(name, drop_collections=True, ignore_missing=True)

# Create the ArangoDB graph
adb_hetero_graph = adbpyg_adapter.pyg_to_arangodb(name, pyg_hetero_graph)

print('\n--------------------')
print("URL: " + con["url"])
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print(f"View the created graph here: {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}\n")
[2022/07/29 22:42:03 +0000] [58] [INFO] - adbpyg_adapter: Created ArangoDB 'FakeHetero' Graph
--------------------
URL: https://tutorials.arangodb.cloud:8529
Username: TUTy0d4nq3jcidztw4rf5nyy
Password: TUTg7njua0hhwpfr1u2m2b2zc
Database: TUTc7mc78w0qlchle9za0opmc
--------------------

View the created graph here: https://tutorials.arangodb.cloud:8529/_db/TUTc7mc78w0qlchle9za0opmc/_admin/aardvark/index.html#graph/FakeHetero

FakeHeterogeneous Graph with a PyG-ArangoDB metagraph

  • adbpyg_adapter.adapter.pyg_to_arangodb()

Notes

  • The name parameter is used to name your ArangoDB graph.
  • The metagraph parameter is an optional object mapping the PyG keys of the node & edge data to strings, lists of strings, or user-defined functions.
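The three kinds of metagraph values can be illustrated with a stdlib sketch that applies one value to a feature matrix (a list of rows standing in for a tensor). `apply_metagraph_value` is a hypothetical helper mirroring the behavior described in the notes and in the code comments of the next cell, not the adapter's implementation; in particular, the adapter expects user-defined functions to return a Pandas DataFrame, whereas this sketch lets them return a list of dicts:

```python
def apply_metagraph_value(rows, key, value):
    """Turn a feature matrix into per-document attribute dicts.

    - str: store each row under a renamed attribute
    - list of str: split each row into one attribute per column
    - callable: delegate to user-defined handling
    """
    if isinstance(value, str):
        return [{value: row} for row in rows]
    if isinstance(value, list):
        if any(len(row) != len(value) for row in rows):
            raise ValueError(f"{key}: expected {len(value)} columns per row")
        return [dict(zip(value, row)) for row in rows]
    if callable(value):
        return value(rows)
    raise TypeError(f"{key}: unsupported metagraph value {value!r}")


label_map = {0: "Kiwi", 1: "Blueberry", 2: "Avocado"}


def y_to_labels(ys):
    """User-defined handling: keep the numeric label and add its string name."""
    return [{"label_num": y, "label_str": label_map[y]} for y in ys]


edge_attr = [[0.1, 0.2], [0.3, 0.4]]
print(apply_metagraph_value(edge_attr, "edge_attr", "features"))
# [{'features': [0.1, 0.2]}, {'features': [0.3, 0.4]}]
print(apply_metagraph_value(edge_attr, "edge_attr", ["a", "b"]))
# [{'a': 0.1, 'b': 0.2}, {'a': 0.3, 'b': 0.4}]
print(apply_metagraph_value([2, 0], "y", y_to_labels))
# [{'label_num': 2, 'label_str': 'Avocado'}, {'label_num': 0, 'label_str': 'Kiwi'}]
```

The list-of-strings form only works when the number of columns is known in advance, which is why the sketch rejects rows of the wrong width.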
In [13]:
# Create the PyG graph
pyg_hetero_graph = FakeHeteroDataset(
    num_node_types=2,
    num_edge_types=3,
    avg_num_nodes=20,
    avg_num_channels=3,  # avg number of features per node
    edge_dim=2,  # number of features per edge
    num_classes=3,  # number of unique label values
)[0]
print(pyg_hetero_graph)
nx.draw(to_networkx(pyg_hetero_graph.to_homogeneous()), with_labels=True)

# Define the metagraph
def y_tensor_to_2_column_dataframe(pyg_tensor):
    label_map = {0: "Kiwi", 1: "Blueberry", 2: "Avocado"}

    df = pandas.DataFrame(columns=["label_num", "label_str"])
    df["label_num"] = pyg_tensor.tolist()
    df["label_str"] = df["label_num"].map(label_map)

    return df

metagraph = {
    "nodeTypes": {
        "v0": {
            "x": "features",  # 1) you can specify a string value for attribute renaming
            "y": y_tensor_to_2_column_dataframe,  # 2) you can specify a function for user-defined handling, as long as the function returns a Pandas DataFrame
        },
    },
    "edgeTypes": {
        ("v0", "e0", "v0"): {
            # 3) you can specify a list of strings for tensor disassembly (if you know the number of node/edge features in advance)
            "edge_attr": ["a", "b"]
        },
    },
}

name = "FakeHetero"

db.delete_graph(name, drop_collections=True, ignore_missing=True)

# Create the ArangoDB graph with `explicit_metagraph=False`
adb_hetero_graph = adbpyg_adapter.pyg_to_arangodb(name, pyg_hetero_graph, metagraph, explicit_metagraph=False)

# Create the ArangoDB graph with `explicit_metagraph=True`
# With `explicit_metagraph=True`, the node & edge types omitted from the metagraph will NOT be converted to ArangoDB.
# Only 'v0' and ('v0', 'e0', 'v0') will be brought over (i.e., 'v1', ('v0', 'e0', 'v1'), ... are ignored):
## adb_hetero_graph_explicit = adbpyg_adapter.pyg_to_arangodb(name, pyg_hetero_graph, metagraph, explicit_metagraph=True)

print('\n--------------------')
print("URL: " + con["url"])
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print(f"View the created graph here: {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}\n")
print(f"View the original graph below:\n")
HeteroData(
  v0={
    x=[18, 2],
    y=[18]
  },
  v1={ x=[19, 3] },
  (v1, e0, v1)={
    edge_index=[2, 154],
    edge_attr=[154, 2]
  },
  (v1, e0, v0)={
    edge_index=[2, 141],
    edge_attr=[141, 2]
  },
  (v0, e0, v0)={
    edge_index=[2, 134],
    edge_attr=[134, 2]
  }
)
[2022/07/29 22:42:05 +0000] [58] [INFO] - adbpyg_adapter: Created ArangoDB 'FakeHetero' Graph
--------------------
URL: https://tutorials.arangodb.cloud:8529
Username: TUTy0d4nq3jcidztw4rf5nyy
Password: TUTg7njua0hhwpfr1u2m2b2zc
Database: TUTc7mc78w0qlchle9za0opmc
--------------------

View the created graph here: https://tutorials.arangodb.cloud:8529/_db/TUTc7mc78w0qlchle9za0opmc/_admin/aardvark/index.html#graph/FakeHetero

View the original graph below:
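The effect of `explicit_metagraph=True`, as described in the code comments of the cell above, amounts to filtering the node and edge types down to those named in the metagraph. A stdlib sketch of that filtering over the types printed above (`select_types` is a hypothetical name):

```python
def select_types(node_types, edge_types, metagraph, explicit):
    """Return the node/edge types to convert, given a PyG-ArangoDB metagraph."""
    if not explicit:
        # Every type is converted; metagraph entries only customize the ones they name.
        return list(node_types), list(edge_types)
    wanted_nodes = set(metagraph.get("nodeTypes", {}))
    wanted_edges = set(metagraph.get("edgeTypes", {}))
    return (
        [t for t in node_types if t in wanted_nodes],
        [t for t in edge_types if t in wanted_edges],
    )


node_types = ["v0", "v1"]
edge_types = [("v1", "e0", "v1"), ("v1", "e0", "v0"), ("v0", "e0", "v0")]
metagraph = {
    "nodeTypes": {"v0": {"x": "features"}},
    "edgeTypes": {("v0", "e0", "v0"): {"edge_attr": ["a", "b"]}},
}
print(select_types(node_types, edge_types, metagraph, explicit=True))
# (['v0'], [('v0', 'e0', 'v0')])
```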

FakeHeterogeneous Graph with a user-defined ADBPyG Controller

  • adbpyg_adapter.adapter.pyg_to_arangodb()

Notes

  • The name parameter is used to name your ArangoDB graph.
  • The ADBPyG_Controller is an optional user-defined class for controlling how nodes & edges are handled when transitioning from PyG to ArangoDB. It serves as an alternative to the metagraph parameter.
In [14]:
# Create the PyG graph
pyg_hetero_graph = FakeHeteroDataset(avg_num_nodes=30, edge_dim=2)[0] # 'edge_attr' property

name = "FakeHetero"

db.delete_graph(name, drop_collections=True, ignore_missing=True)

# Create a custom ADBPyG_Controller
class Custom_ADBPyG_Controller(ADBPyG_Controller):
    """ArangoDB-PyG controller.

    Responsible for controlling how nodes & edges are handled when
    transitioning from PyG to ArangoDB.

    You can derive your own custom ADBPyG_Controller.
    """

    def _prepare_pyg_node(self, pyg_node: dict, col: str) -> dict:
        """Optionally modify a PyG node object before it gets inserted into its designated ArangoDB collection.

        :param pyg_node: The PyG node object to (optionally) modify.
        :param col: The ArangoDB collection the PyG node belongs to.
        :return: The PyG Node object
        """
        pyg_node["foo"] = "bar"
        return pyg_node

    def _prepare_pyg_edge(self, pyg_edge: dict, edge_type: tuple) -> dict:
        """Optionally modify a PyG edge object before it gets inserted into its designated ArangoDB collection.

        :param pyg_edge: The PyG edge object to (optionally) modify.
        :param edge_type: The Edge Type of the PyG edge. Formatted
            as (from_collection, edge_collection, to_collection)
        :return: The PyG Edge object
        """
        pyg_edge["bar"] = "foo"
        return pyg_edge

# Instantiate new adapter & create the ArangoDB graph
adb_hetero_graph = ADBPyG_Adapter(db, Custom_ADBPyG_Controller()).pyg_to_arangodb(name, pyg_hetero_graph)

print('\n--------------------')
print("URL: " + con["url"])
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print(f"View the created graph here: {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}\n")
[2022/07/29 22:42:08 +0000] [58] [INFO] - adbpyg_adapter: Instantiated ADBPyG_Adapter with database 'TUTc7mc78w0qlchle9za0opmc'
[2022/07/29 22:42:10 +0000] [58] [INFO] - adbpyg_adapter: Created ArangoDB 'FakeHetero' Graph
--------------------
URL: https://tutorials.arangodb.cloud:8529
Username: TUTy0d4nq3jcidztw4rf5nyy
Password: TUTg7njua0hhwpfr1u2m2b2zc
Database: TUTc7mc78w0qlchle9za0opmc
--------------------

View the created graph here: https://tutorials.arangodb.cloud:8529/_db/TUTc7mc78w0qlchle9za0opmc/_admin/aardvark/index.html#graph/FakeHetero
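As the docstrings above describe, the controller methods are hooks: each PyG node or edge object passes through `_prepare_pyg_node` or `_prepare_pyg_edge` before it is inserted into ArangoDB. A stdlib sketch of that hook pattern, where `SketchController`, `TaggingController`, and `prepare_documents` are hypothetical stand-ins for the real classes:

```python
class SketchController:
    """Default hooks: return documents unchanged."""

    def _prepare_pyg_node(self, pyg_node: dict, col: str) -> dict:
        return pyg_node

    def _prepare_pyg_edge(self, pyg_edge: dict, edge_type: tuple) -> dict:
        return pyg_edge


class TaggingController(SketchController):
    """Override one hook to add an attribute, similar to Custom_ADBPyG_Controller."""

    def _prepare_pyg_node(self, pyg_node: dict, col: str) -> dict:
        pyg_node["collection_hint"] = col
        return pyg_node


def prepare_documents(controller, nodes, col, edges, edge_type):
    """Run every node and edge document through the controller's hooks.

    Each document is copied first so the caller's dicts are not mutated.
    """
    prepared_nodes = [controller._prepare_pyg_node(dict(n), col) for n in nodes]
    prepared_edges = [controller._prepare_pyg_edge(dict(e), edge_type) for e in edges]
    return prepared_nodes, prepared_edges


nodes = [{"_key": "0"}, {"_key": "1"}]
edges = [{"_from": "v0/0", "_to": "v0/1"}]
new_nodes, new_edges = prepare_documents(
    TaggingController(), nodes, "v0", edges, ("v0", "e0", "v0")
)
print(new_nodes)  # [{'_key': '0', 'collection_hint': 'v0'}, {'_key': '1', 'collection_hint': 'v0'}]
print(new_edges)  # [{'_from': 'v0/0', '_to': 'v0/1'}]
```

Subclassing and overriding only the hooks you need keeps the default behavior for everything else.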

ArangoDB to PyG

In [15]:
# Start from scratch! (with a smaller graph)
data = FakeHeteroDataset(
    num_node_types=2,
    num_edge_types=3,
    avg_num_nodes=20,
    avg_num_channels=3,  # avg number of features per node
    edge_dim=2,  # number of features per edge
    num_classes=3,  # number of unique label values
)[0]

adbpyg_adapter.pyg_to_arangodb("FakeHetero", data, overwrite_graph=True, overwrite=True)
[2022/07/29 22:42:10 +0000] [58] [INFO] - adbpyg_adapter: Created ArangoDB 'FakeHetero' Graph
Out[15]:
<Graph FakeHetero>

Via ArangoDB Graph

  • adbpyg_adapter.adapter.arangodb_graph_to_pyg()

Notes

  • The name parameter in this case must point to an existing ArangoDB graph in your ArangoDB instance.
  • Due to the risk of ambiguity, this method does not carry over ArangoDB attributes to PyG; only the graph structure (edge_index) is transferred.
In [16]:
# Define graph name
graph_name = "FakeHetero"

# Create PyG graph from the ArangoDB graph
pyg_hetero_graph = adbpyg_adapter.arangodb_graph_to_pyg(graph_name)

# You can also provide valid Python-Arango AQL query options to the command above, like such:
# pyg_hetero_graph = adbpyg_adapter.arangodb_graph_to_pyg(graph_name, ttl=1000, stream=True)
# See the full parameter list at https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute

# Show graph data
print('\n--------------------')
print(pyg_hetero_graph)
[2022/07/29 22:42:10 +0000] [58] [INFO] - adbpyg_adapter: Created PyG 'FakeHetero' Graph
--------------------
HeteroData(
  v0={},
  v1={},
  (v0, e0, v0)={ edge_index=[2, 146] }
)
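The `edge_index` above is derived from edge documents whose `_from`/`_to` attributes hold document IDs (`collection/key`). Conceptually, that requires assigning each vertex a consecutive integer index within its own collection. The stdlib sketch below shows the idea with made-up documents; `build_edge_index` is a hypothetical name and this is not the adapter's actual code:

```python
def build_edge_index(vertices_by_col, edges):
    """Map ArangoDB _id strings to per-collection integer indices and build a COO edge list.

    vertices_by_col: {collection: [key, ...]} in the order nodes should be indexed.
    edges: iterable of dicts with "_from" and "_to" document IDs.
    """
    index_of = {
        f"{col}/{key}": i
        for col, keys in vertices_by_col.items()
        for i, key in enumerate(keys)
    }
    sources, targets = [], []
    for edge in edges:
        sources.append(index_of[edge["_from"]])
        targets.append(index_of[edge["_to"]])
    return [sources, targets]


vertices = {"v0": ["a", "b", "c"]}
edges = [{"_from": "v0/a", "_to": "v0/b"}, {"_from": "v0/c", "_to": "v0/a"}]
print(build_edge_index(vertices, edges))  # [[0, 2], [1, 0]]
```

Because indices restart at 0 in every collection, an edge from `v1` to `v0` uses a `v1` index as its source and a `v0` index as its target, matching how PyG indexes each node type separately.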

Via ArangoDB Collections

  • adbpyg_adapter.adapter.arangodb_collections_to_pyg()

Notes

  • The name parameter is purely for documentation purposes in this case.
  • The vertex_collections & edge_collections parameters must point to existing ArangoDB collections within your ArangoDB instance.
  • Due to the risk of ambiguity, this method does not carry over ArangoDB attributes to PyG; only the graph structure (edge_index) is transferred.
In [17]:
# Define collection names
v_cols = {"v0", "v1"}
e_cols = {"e0"}

# Create PyG graph from the ArangoDB collections
pyg_hetero_graph = adbpyg_adapter.arangodb_collections_to_pyg("FakeHetero", v_cols, e_cols)

# Show graph data
print('\n--------------------')
print(pyg_hetero_graph)
[2022/07/29 22:42:10 +0000] [58] [INFO] - adbpyg_adapter: Created PyG 'FakeHetero' Graph
--------------------
HeteroData(
  v1={},
  v0={},
  (v0, e0, v0)={ edge_index=[2, 146] }
)

Via ArangoDB-PyG metagraph 1

  • adbpyg_adapter.adapter.arangodb_to_pyg()

Notes

  • The name parameter is purely for documentation purposes in this case.
  • The metagraph parameter is an object defining vertex & edge collections to import to PyG, along with collection-level specifications to indicate which ArangoDB attributes will become PyG features/labels. It should contain collections & associated document attribute names that exist within your ArangoDB instance.
In [18]:
# Define the Metagraph that transfers ArangoDB attributes "as is",
# meaning the data is already formatted to PyG data standards
metagraph_v1 = {
    "vertexCollections": {
        # we instruct the adapter to create the "x" and "y" tensor data from the "x" and "y" ArangoDB attributes
        "v0": { "x": "x", "y": "y"},  
        "v1": {"x": "x"},
    },
    "edgeCollections": {
        "e0": {"edge_attr": "edge_attr"},
    },
}

# Create PyG Graph
pyg_hetero_graph = adbpyg_adapter.arangodb_to_pyg("FakeHetero", metagraph_v1)

# Show graph data
print('\n--------------------')
print(pyg_hetero_graph)
[2022/07/29 22:42:11 +0000] [58] [INFO] - adbpyg_adapter: Created PyG 'FakeHetero' Graph
--------------------
HeteroData(
  v0={
    x=[19, 2],
    y=[19]
  },
  v1={ x=[16, 2] },
  (v0, e0, v0)={
    edge_index=[2, 146],
    edge_attr=[146, 2]
  }
)
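Transferring attributes "as is" only works when every document already stores the attribute in tensor-ready form, e.g. every `v0` document holds an `x` list of the same length. A stdlib pre-check you might run on a list of documents before conversion; `check_feature_rows` is a hypothetical helper, not part of the adapter:

```python
def check_feature_rows(docs, attr):
    """Return the rows of `attr` if every document has a list of the same length."""
    rows = []
    for doc in docs:
        value = doc.get(attr)
        if not isinstance(value, list):
            raise ValueError(f"document {doc.get('_key')!r}: '{attr}' is not a list")
        rows.append(value)
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"'{attr}' rows have inconsistent lengths: {sorted(widths)}")
    return rows


docs = [{"_key": "0", "x": [0.1, 0.2]}, {"_key": "1", "x": [0.3, 0.4]}]
print(check_feature_rows(docs, "x"))  # [[0.1, 0.2], [0.3, 0.4]]
```

A ragged set of rows cannot form a rectangular `[num_nodes, num_features]` matrix, so catching it early gives a clearer error than a failed tensor conversion.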

Via ArangoDB-PyG metagraph 2

  • adbpyg_adapter.adapter.arangodb_to_pyg()

Notes

  • The name parameter is purely for documentation purposes in this case.
  • The metagraph parameter is an object defining vertex & edge collections to import to PyG, along with collection-level specifications to indicate which ArangoDB attributes will become PyG features/labels. In this example, we rely on user-defined encoders to build PyG-ready tensors (i.e., feature matrices) from ArangoDB attributes. See https://pytorch-geometric.readthedocs.io/en/latest/notes/load_csv.html for an example of using encoders with PyG.
In [19]:
# Define the Metagraph that transfers attributes via user-defined encoders
metagraph_v2 = {
    "vertexCollections": {
        "Movies": {
            "x": {  # Build a feature matrix from the "Action" & "Drama" document attributes
                "Action": IdentityEncoder(dtype=torch.long),
                "Drama": IdentityEncoder(dtype=torch.long),
            },
            "y": "Comedy",
        },
        "Users": {
            "x": {
                "Gender": CategoricalEncoder(), # CategoricalEncoder(mapping={"M": 0, "F": 1}),
                "Age": IdentityEncoder(dtype=torch.long),
            }
        },
    },
    "edgeCollections": {
        "Ratings": {
            "edge_weight": "Rating"
        }
    },
}

# Create PyG Graph
pyg_imdb_graph = adbpyg_adapter.arangodb_to_pyg("IMDB", metagraph_v2)

# Show graph data
print('\n--------------------')
print(pyg_imdb_graph)
[2022/07/29 22:42:13 +0000] [58] [INFO] - adbpyg_adapter: Created PyG 'IMDB' Graph
--------------------
HeteroData(
  Movies={
    x=[1682, 2],
    y=[1682]
  },
  Users={ x=[943, 2] },
  (Users, Ratings, Movies)={
    edge_index=[2, 65499],
    edge_weight=[65499]
  }
)
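The printed shape `Users={ x=[943, 2] }` indicates that each of the two encoders for `Users` contributes one column here. A stdlib sketch of that column-wise construction, assuming (not confirmed by the library docs) that the categorical encoder assigns each distinct value an integer index in order of first appearance when no mapping is given. The helper names and sample users are made up:

```python
def identity_encode(values):
    """One numeric column, one row per document."""
    return [[v] for v in values]


def categorical_encode(values, mapping=None):
    """One integer-index column (assumed behavior); build the mapping if none is given."""
    if mapping is None:
        mapping = {}
        for v in values:
            mapping.setdefault(v, len(mapping))
    return [[mapping[v]] for v in values]


def build_feature_matrix(docs, encoders):
    """Encode each attribute separately, then concatenate the columns row by row."""
    columns = [encode([doc[attr] for doc in docs]) for attr, encode in encoders.items()]
    return [[value for col in columns for value in col[i]] for i in range(len(docs))]


users = [
    {"Gender": "M", "Age": 24},
    {"Gender": "F", "Age": 53},
    {"Gender": "M", "Age": 33},
]
x = build_feature_matrix(users, {"Gender": categorical_encode, "Age": identity_encode})
print(x)  # [[0, 24], [1, 53], [0, 33]]
```

Passing an explicit mapping, as in the commented-out `CategoricalEncoder(mapping={"M": 0, "F": 1})` above, makes the encoding independent of document order.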

Via ArangoDB-PyG metagraph 3

  • adbpyg_adapter.adapter.arangodb_to_pyg()

Notes

  • The name parameter is purely for documentation purposes in this case.
  • The metagraph parameter is an object defining vertex & edge collections to import to PyG, along with collection-level specifications to indicate which ArangoDB attributes will become PyG features/labels. In this example, we rely on user-defined functions to handle ArangoDB attribute to PyG feature conversion.
In [20]:
# Define the metagraph that transfers attributes via user-defined functions
def udf_v0_x(v0_df):
    # process v0_df here to return v0 "x" feature matrix
    # v0_df["x"] = ...
    return torch.tensor(v0_df["x"].to_list())


def udf_v1_x(v1_df):
    # process v1_df here to return v1 "x" feature matrix
    # v1_df["x"] = ...
    return torch.tensor(v1_df["x"].to_list())


metagraph_v3 = {
    "vertexCollections": {
        "v0": {
            "x": udf_v0_x,  # supports named functions
            "y": lambda df: torch.tensor(df["y"].to_list()),  # also supports lambda functions
        },
        "v1": {"x": udf_v1_x},
    },
    "edgeCollections": {
        "e0": {"edge_attr": (lambda df: torch.tensor(df["edge_attr"].to_list()))},
    },
}

# Create PyG Graph
pyg_hetero_graph = adbpyg_adapter.arangodb_to_pyg("FakeHetero", metagraph_v3)

# Show graph data
print('\n--------------------')
print(pyg_hetero_graph)
[2022/07/29 22:42:13 +0000] [58] [INFO] - adbpyg_adapter: Created PyG 'FakeHetero' Graph
--------------------
HeteroData(
  v0={
    x=[19, 2],
    y=[19]
  },
  v1={ x=[16, 2] },
  (v0, e0, v0)={
    edge_index=[2, 146],
    edge_attr=[146, 2]
  }
)
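The `# process v0_df here` placeholders in the UDFs above are where per-collection preprocessing would go. As an illustration, here is a stdlib sketch of column-wise min-max scaling over feature rows, the kind of step one might apply to `v0_df["x"].to_list()` before calling `torch.tensor(...)`. The scaling choice and the `min_max_scale` name are examples only, not something the adapter does:

```python
def min_max_scale(rows):
    """Scale each column of a row-major matrix to [0, 1]; constant columns become 0.0."""
    if not rows:
        return []
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (v - lo) / (hi - lo) if hi != lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


rows = [[1.0, 10.0], [3.0, 10.0], [2.0, 10.0]]
print(min_max_scale(rows))  # [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]]
```

Inside a UDF this would read roughly `return torch.tensor(min_max_scale(v0_df["x"].to_list()))`.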
