Reverse Engineering Z3 SMT solver solutions

Question

I am using Z3 named SMT solver to generate a new set of random numbers from a given vector under some constraints. I am doing this in order to hide my input stream. The corresponding code can be found below:

from z3 import *
import sys
import io
import math


X0 = Real('X0')
X1 = Real('X1')
X2 = Real('X2')
X3 = Real('X3')
X4 = Real('X4')
X5 = Real('X5')
X6 = Real('X6')
X7 = Real('X7')
X8 = Real('X8')
X9 = Real('X9')
X10 = Real('X10')
X11 = Real('X11')
X12 = Real('X12')
X13 = Real('X13')
X14 = Real('X14')

DistinctParameter = [Distinct(X0 , X1 , X2 , X3 , X4 , X5 , X6 , X7 , X8 , X9 , X10 , X11 , X12 , X13 , X14 )]

maxPossibleValue = max(InputStream)

AggregateValue = 0
for x in InputStream:
    AggregateValue = AggregateValue + float(x)

S_Con_Comparison1 = [(X0 < maxPossibleValue)] 
S_Con_Comparison2 = [(X1 < maxPossibleValue)] 
S_Con_Comparison3 = [(X2 < maxPossibleValue)] 
S_Con_Comparison4 = [(X3 < maxPossibleValue)] 
S_Con_Comparison5 = [(X4 < maxPossibleValue)] 
S_Con_Comparison6 = [(X5 < maxPossibleValue)] 
S_Con_Comparison7 = [(X6 < maxPossibleValue)] 
S_Con_Comparison8 = [(X7 < maxPossibleValue)] 
S_Con_Comparison9 = [(X8 < maxPossibleValue)] 
S_Con_Comparison10 = [(X9 < maxPossibleValue)] 
S_Con_Comparison11 = [(X10 < maxPossibleValue)] 
S_Con_Comparison12 = [(X11 < maxPossibleValue)] 
S_Con_Comparison13 = [(X12 < maxPossibleValue)] 
S_Con_Comparison14 = [(X13 < maxPossibleValue)] 
S_Con_Comparison15 = [(X14 < maxPossibleValue)] 


S_Con_Comparison = S_Con_Comparison1 + S_Con_Comparison2 + S_Con_Comparison3 + S_Con_Comparison4 + S_Con_Comparison5 + S_Con_Comparison6 + S_Con_Comparison7 + S_Con_Comparison8 + S_Con_Comparison9 + S_Con_Comparison10 + S_Con_Comparison11 + S_Con_Comparison12 + S_Con_Comparison13 + S_Con_Comparison14 + S_Con_Comparison15

S_Con = [( X0 + X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 + X11 + X12 + X13 + X14 == AggregateValue)]

Solve = S_Con + S_Con_Comparison + DistinctParameter

s = Solver()

s.add(Solve)


x = Reals('x')
i = 0 
output =[0] * len(InputStream)

if s.check() == sat:
    m = s.model()
    for d in m.decls():
        location = int((repr(d).replace("X","")))
        x=round(float(m[d].numerator_as_long())/float(m[d].denominator_as_long()),5)
        output[location]= x

print(output)

Each of the values of the input stream can be taken from a possible set of size 2^25. As per my understanding, the only way to find the input stream is to do a brute force on the resulted stream. Given this circumstances, I want to know if it is possible to reverse engineer the input stream from the corresponding output stream.

Please post a simplified version of your code (perhaps only a few values?) and something that people can run on their own. The above is not a self-contained program. See here for guidelines on how to post to stack-overflow: stackoverflow.com/help/minimal-reproducible-example — alias
The thing to consider is: What makes you think the input stream is uniquely determined from your constraints? I see nothing in your description that would guarantee that. So, while you'd find an input-stream, it'd probably not be uniquely determined. You really have to be more specific about what you are trying to achieve here. — alias
I wouldn't rely on an SMT solver to generate anything even remotely random since they tend to be rather deterministic.. the randomness of an SMT solver is generally limited to an old-fashioned pseudo-random generator which is normally initialized with the same seed each time so as to make it easier to reproduce actual bugs. — Patrick Trentin
Is your input stream always larger or equal zero? Is it acceptable for the output stream to include negative values? can the attacker fiddle with your python script? The float -> Real -> float passages are going to introduce tiny but unavoidable errors.. are these acceptable for your application? — Patrick Trentin
You should probably edit your question to include the information you put in the comments (these may be deleted by S.O. at any time) and also to clarify the question in the same way you did in the comments. — Patrick Trentin

Patrick Trentin Patrick Trentin · Accepted Answer · 2020-03-27T12:55:04

As mentioned in the comments, SMT solvers should not be entrusted with the task of generating truly random models. Having said this, it doesn't look like you need such property to be guaranteed for your application.

I fixed your model so as to impose X_i >= 0, since this is a requirement in the comments.

from z3 import *
import sys
import io
import math

def obfuscate(input_stream):
    X_list = [Real('X_{0}'.format(idx)) for idx in range(0, len(input_stream))]
    assert len(X_list) == len(input_stream)

    max_input_value = max(input_stream)
    aggregate_value = sum(input_stream)

    distinct_cs = Distinct(X_list)
    lower_cs = [(0 <= Xi) for Xi in X_list]
    upper_cs = [(Xi < max_input_value) for Xi in X_list]
    same_sum_cs = (Sum(X_list) == aggregate_value)

    s = Solver()
    s.add(distinct_cs)
    s.add(lower_cs)
    s.add(upper_cs)
    s.add(same_sum_cs)

    status = s.check()

    if status == sat:
        r_ret = []
        fp_ret = []

        m = s.model()
        for Xi in X_list:
            r_value = m.eval(Xi)
            r_ret.append(r_value)

            num = r_value.numerator_as_long()
            den = r_value.denominator_as_long()

            fp_value = round(float(num) / float(den), 5)
            fp_ret.append(fp_value)

        return input_stream, aggregate_value, "sat", r_ret, fp_ret, sum(fp_ret)

    else:
        return input_stream, aggregate_value, "unsat", None, None, None


if __name__ == '__main__':
    print("Same-value inputs are all unsat")
    print(obfuscate([0.0, 0.0, 0.0]))
    print(obfuscate([1.0, 1.0, 1.0]))
    print(obfuscate([2.0, 2.0, 2.0]))

    print("\nRe-ordering input does not change output")
    print(obfuscate([1.0, 2.0, 3.0]))
    print(obfuscate([1.0, 3.0, 2.0]))
    print(obfuscate([3.0, 2.0, 1.0]))
    print("")
    print(obfuscate([0.1, 0.0, 0.0]))
    print(obfuscate([0.0, 0.1, 0.0]))
    print(obfuscate([0.0, 0.0, 0.1]))

    print("\nSame-sum input do not necessarily map to the same outputs")
    print(obfuscate([0.1, 0.9, 2.0]))
    print(obfuscate([1.1, 0.1, 1.8]))

    print("\nSame outputs may result from different inputs")
    print(obfuscate([0.6, 1.3, 1.1]))
    print(obfuscate([1.3, 0.7, 1.0]))

The output is:

Same-value inputs are all unsat
([0.0, 0.0, 0.0], 0.0, 'unsat', None, None, None)
([1.0, 1.0, 1.0], 3.0, 'unsat', None, None, None)
([2.0, 2.0, 2.0], 6.0, 'unsat', None, None, None)

Re-ordering input does not change output
([1.0, 2.0, 3.0], 6.0, 'sat', [5/2, 11/4, 3/4], [2.5, 2.75, 0.75], 6.0)
([1.0, 3.0, 2.0], 6.0, 'sat', [5/2, 11/4, 3/4], [2.5, 2.75, 0.75], 6.0)
([3.0, 2.0, 1.0], 6.0, 'sat', [5/2, 11/4, 3/4], [2.5, 2.75, 0.75], 6.0)

([0.1, 0.0, 0.0], 0.1, 'sat', [1/30, 1/15, 0], [0.03333, 0.06667, 0.0], 0.09999999999999999)
([0.0, 0.1, 0.0], 0.1, 'sat', [1/30, 1/15, 0], [0.03333, 0.06667, 0.0], 0.09999999999999999)
([0.0, 0.0, 0.1], 0.1, 'sat', [1/30, 1/15, 0], [0.03333, 0.06667, 0.0], 0.09999999999999999)

Same-sum input do not necessarily map to the same outputs
([0.1, 0.9, 2.0], 3.0, 'sat', [4/3, 5/3, 0], [1.33333, 1.66667, 0.0], 3.0)
([1.1, 0.1, 1.8], 3.0, 'sat', [7/5, 8/5, 0], [1.4, 1.6, 0.0], 3.0)

Same outputs may result from different inputs
([0.6, 1.3, 1.1], 3.0, 'sat', [23/20, 49/40, 5/8], [1.15, 1.225, 0.625], 3.0)
([1.3, 0.7, 1.0], 3.0, 'sat', [23/20, 49/40, 5/8], [1.15, 1.225, 0.625], 3.0)

This simple example allows us to make the following observations:

the output is determined by the values in input, but it is not affected by their order
the obfuscation procedure can be sensitive to variations in the input stream

Therefore, even if an attacker attempts to use rainbow tables to find the potential input multiset that generated an output sequence, they still cannot find the exact order of the values in the input stream.

Let's disregard the fact that building such rainbow tables is impractical due to the large number of input sequences of size 15 that can be generated with a pool of 2^25 candidate values (a loose upper-bound would be 2^375), and assume that we have a way to access it efficiently.

Given an output sequence O, generated with obfuscate(), we can look for a match M inside our rainbow table, where M is a list of multisets that, when used as input, would result in the same output O. Let M[i] be the i-th input set in M containing n elements, each with multiplicity m_i. Then the number of possible permutations of M[i] is (source: Wikipedia):

In the simplest scenario in which every value in the input stream is different from the others, there are up to 15! = 1.307.674.368.000 permutations for each candidate solution M[i] in the match M. In your application, would the attacker have the time to try all of them?

Reverse Engineering Z3 SMT solver solutions

1 Answers