5
votes

I am still thinking on how to translate the recursivity of a Datalog program into SQL, such as

P(x,y) <- Q(x,y).
Q(x,y) <- P(x,z), A(y).

where A/1 is an EDB predicate. This, there is a co-dependency between P and Q. For longer queries, how to solve this problem?

Moreover, is there any system completely implement the translation? If there is, may I know what system or which paper may I refer?

2
The example should be corrected. Note that variable y does not appear in the premises of the second rule, although it is used in the head. Datalog is restricted so that all programs terminate. Are you asking about ways to eliminate recursion to obtain (potentially) a single SQL query equivalent, or are you thinking about implementing recursion in a SQL context, which might be possible (with severe limitations) using stored procedures?hardmath
@hardmath: sorry, typo... thx for the correction. Yes, I would like to implement the recursion in SQL, is it possible?zfm
It is possible, but we are likely limited as to the depth of recursion that can be had directly. For example, MS SQL Server 2008 limits the nesting depth of stored procedure calls to 32. It's probably enough to demonstrate a concept but not for production applications. Note that MS SQL Server does allow for calls to external coded (extended stored procedures) in which the nesting of calls is not monitored.hardmath
You might be interested in an indirect approach to the recursion, in which "datalog" rules are applied repeatedly to infer new consequences, and these can be inserted into one or more tables, until no new deductions are possible.hardmath
@hardmath: that's my idea! where the recursivity will be done on the program, not on the SQL... However, this is still so unmanageable for me if there is a nontrivial dependency...zfm

2 Answers

1
votes

If you adopt an approach of "tabling" previous conclusions and forward-chain reasoning on these to infer new conclusions, no recursive "depth" is required.

Bear in mind that Datalog requires some restrictions on rules and variable that assure finite termination and hence finitely many conclusions. Variables must have a finite range of possible values, for example.

Let's assume your example refers to constants rather than to variables:

P(x,y) <- Q(x,y).
Q(x,y) <- P(x,z), A(y).

One wrinkle is that you want A/1 to be implemented as an extended stored procedure or external code. For that I would propose tabling all the results of calling A on all possible arguments (finitely many). These are after all among the conclusions (provable statements) of your system.

Once that is done the forward-chaining inference proceeds iteratively rather than recursively. At each step consider each rule, applying it with premises (right-hand sides) that are previously obtained (tabled) conclusions if it produces a new conclusion. If no rule produces a new conclusion in the current step, halt. The proof procedure is complete.

In your example the proofs stop after all the A facts are adduced, because there are no conclusions sufficient to apply either rule to get new conclusions.

0
votes

A possible approach is to use recursive CTEs in SQL, which provide the power of transitive closure. Relational algebra + transitive closure = Datalog.