How to model a schema for a graph database?

Question

I am looking for a general approach to implementing a restrictive schema on a graph database.

I am familiar with relational database systems, where one defines tables with specific columns. All incoming data is then stored exactly as modeled in the database schema, and it is validated automatically, i.e. one cannot store a record with missing required column values.

Graph databases, like Neo4j, allow unrestricted, free-form storage of nodes and relations. I am wondering if there is something like a schema for graph databases. I am looking for an established notation / definition or general approach for modelling restrictive schemas in graph databases, one that corresponds to table schema definitions in relational, table-based databases.

An example restriction could be: A node representing a "User" always needs to have at least one relation to a node representing a "Department".

I am not particularly looking for a Neo4j way, but rather a general formalism or notation. Does something like this exist?

In the meantime I have found a suggestion to add nodes to the database that define a meta model here. But I hope to find answers that can point me to established best practices, research papers or mathematical definitions like, say, "A database schema is a subgraph of the overall graph and forms a bipartite graph with the nodes containing the actual data."

Imagine you create a user account in a social network app. Does it require you to have a friend (a relationship) when you create your account? It does require you to have a password, though, because that is a property rather than a relationship (edge). — jose_bacoy
I just told you the "formal definition" of restrictions is DBMS-dependent & per the DBMS documentation because there is no "formal definition" of "graph DB" or "graph DB schema"--they are generic terms like "tall". If you don't have a specific case in code then you are just asking us to rewrite the documentation. PS I can appreciate that you want to accomplish something that such a "schema" will be used for but you don't make that clear. PS Please don't insert EDITs/UPDATEs, just make your post the best presentation as of right now. Adding to something unclear doesn't make it clear. — philipxy
To enforce arbitrary constraints on a non-RDBMS you can always write queries that must return an empty set of violations after every update. (Presumably we relationally model relational metadata & the application & then map those tables & constraints to the non-RDBMS.) A difficulty is that RDBMSs are for generic querying while non-RDBMSs are for specialized querying of specialized structures but checking that structures satisfy invariants is expensive let alone implementing that using a DBMS specialized to applications that aren't invariant checkers. (But we can relationally model anything.) — philipxy
One problem at this point is that graph dbs are still relatively young in terms of standards. There are multiple graph dbs out there, all by different vendors, most with their own independent query languages, and there is little standardization. Even where there is some level of standardization (OpenCypher, Tinkerpop), constraint definition and implementation are often out of that scope and thus still differ. Sure, you can likely find papers that present all kinds of formal solutions, but that's irrelevant if they're not implemented or standardized. — InverseFalcon
As for Neo4j in particular, we are aware of the value of these kinds of constraints with respect to relationships between nodes (per type, to always have a relationship, never have a relationship, or something more complex), so I would say it's just a matter of time, and of prioritizing, before these get introduced into the product. — InverseFalcon
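
The "empty set of violations" idea from the comments, applied to the User/Department example in the question, can be sketched roughly as follows. This is a minimal illustration in plain Python, not tied to any graph DBMS: the `Graph` class, the `WORKS_IN` relationship type, and the function `users_without_department` are all hypothetical names made up for this sketch. In a real system the check would presumably be a query in the database's own language, run after each update.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    # Hypothetical in-memory graph: node id -> label, e.g. {"alice": "User"}
    labels: dict = field(default_factory=dict)
    # Directed edges stored as (source id, relationship type, target id)
    edges: set = field(default_factory=set)

    def add_node(self, node_id, label):
        self.labels[node_id] = label

    def add_edge(self, src, rel_type, dst):
        self.edges.add((src, rel_type, dst))


def users_without_department(graph):
    """Return the ids of User nodes that have no edge to any Department node.

    The constraint holds exactly when this returns an empty list.
    """
    linked_users = {
        src
        for (src, _rel_type, dst) in graph.edges
        if graph.labels.get(src) == "User"
        and graph.labels.get(dst) == "Department"
    }
    return sorted(
        node_id
        for node_id, label in graph.labels.items()
        if label == "User" and node_id not in linked_users
    )


g = Graph()
g.add_node("alice", "User")
g.add_node("bob", "User")
g.add_node("sales", "Department")
g.add_edge("alice", "WORKS_IN", "sales")

# "bob" has no Department yet, so the check reports a violation.
violations = users_without_department(g)
```

An application could run such a check inside the same transaction as the update and roll back whenever the result is non-empty. As the comment notes, checking invariants this way can be expensive on large graphs.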

djhallx · Accepted Answer · 2019-10-18T18:47:51

Not all graph databases are "schema-less". Objectivity/DB is an object-oriented and graph database that uses a schema. The schema is then used, as you would expect, to constrain the data that is assigned to the fields of an object. Nodes and edges are objects.
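
To make the idea of a schema constraining an object's fields concrete, here is a generic sketch in plain Python. It is not Objectivity/DB's API and makes no claim about how that product works internally; the `SCHEMA` mapping, the field names, and `validate_node` are assumptions invented for illustration.

```python
# Hypothetical schema: node type -> required field name -> expected Python type
SCHEMA = {
    "User": {"name": str, "password": str},
    "Department": {"name": str},
}


def validate_node(node_type, properties):
    """Return a list of schema violations; an empty list means the node is valid."""
    if node_type not in SCHEMA:
        return [f"unknown node type: {node_type}"]

    expected = SCHEMA[node_type]
    errors = []

    # Every declared field must be present and have the declared type.
    for field_name, field_type in expected.items():
        if field_name not in properties:
            errors.append(f"missing field: {field_name}")
        elif not isinstance(properties[field_name], field_type):
            errors.append(f"wrong type for field: {field_name}")

    # Fields that the schema does not declare are rejected.
    for field_name in properties:
        if field_name not in expected:
            errors.append(f"unexpected field: {field_name}")

    return errors


ok = validate_node("User", {"name": "alice", "password": "s3cret"})
missing = validate_node("User", {"name": "bob"})
```

Here `ok` is an empty list, while `missing` reports that the `password` field is absent. A store that refuses to write any node with a non-empty violation list behaves much like a relational table with NOT NULL columns.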

Because Objectivity/DB is used in large object and graph databases, the schema is also used to support a "placement model", i.e. where different objects are placed in the database. The placement model speeds retrievals because the system only looks for an object of type X in those placement locations where X's have been stored.

Also, Objectivity/DB supports "schema evolution" where defined schema types can be altered and the database understands how to interpret those alterations on existing data.
