I need to find the percentage of recipients who picked at least one call for a given program.

I have this schema. Calls are made to the users as part of certain programs, which fall under a unit of an organization.

My goal: of all the users who received a call (connected = yes) as part of program X, what percentage picked up at least one call (picked = yes)?
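To make the target metric concrete, here is a minimal plain-Python sketch (not Gremlin) over a hypothetical list of call records. The fields `connected` and `picked` mirror the edge properties described above; the fields `user` and `program`, the sample data, and the function name `picked_percentage` are assumptions made up for illustration. It also assumes that a picked call is always a connected one.

```python
# Hypothetical call records; each dict stands in for one call edge.
calls = [
    {"user": "u1", "program": "X", "connected": "yes", "picked": "yes"},
    {"user": "u1", "program": "X", "connected": "yes", "picked": "no"},
    {"user": "u2", "program": "X", "connected": "yes", "picked": "no"},
    {"user": "u3", "program": "Y", "connected": "yes", "picked": "yes"},
    {"user": "u4", "program": "X", "connected": "no", "picked": "no"},
]


def picked_percentage(calls, program):
    """Percentage of users with a connected call in `program` who picked at least one."""
    # Distinct users who received at least one connected call under the program.
    connected = {c["user"] for c in calls
                 if c["program"] == program and c["connected"] == "yes"}
    # Distinct users who picked at least one of those calls (assumed to be connected too).
    picked = {c["user"] for c in calls
              if c["program"] == program and c["connected"] == "yes"
              and c["picked"] == "yes"}
    if not connected:
        return 0.0  # no recipients at all: fall back to 0 instead of dividing by zero
    return 100.0 * len(picked) / len(connected)


print(picked_percentage(calls, "X"))  # 50.0
```

Both sets hold distinct users, so a user with several calls is counted once in the numerator and once in the denominator.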

To count the recipients who received at least one call:

g.V().hasLabel("User").filter(inE().has("connected","yes").filter(outV().in().has("Program","name","X")).count().is(gte(1))).count()

Similarly, to count the recipients who picked at least one call:

g.V().hasLabel("User").filter(inE().has("picked","yes").filter(outV().in().has("Program","name","X")).count().is(gte(1))).count()

I know I need to take the ratio of these values. I did my research and realized that I need to make two parallel traversals and use math() to find the percentage. I tried the following query:

g.V().hasLabel("User").as("a","b").
math("a/b * 100").
by(filter(inE().has("picked","yes").
filter(outV().in().has("program","name","X")).count().is(gte(1)))
.count()).
by(filter(inE().has("connected","yes").
filter(outV().in().has("program","name","X")).count().is(gte(1)))
.count())

But I get the following error:

Division by zero!
Type ':help' or ':h' for help.
Display stack trace? [yN]

What I think is happening is that this query is generating the percentage for each user (not what I intended) and for users who never received any call under program 'X', the denominator is understandably 0.
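That diagnosis can be illustrated with a plain-Python sketch (not Gremlin). It assumes, as the query suggests, that each by() modulator is evaluated once per user and yields 0 or 1; the per-user flags and both function names below are made up for illustration.

```python
# Hypothetical per-user flags: (picked at least one call, connected at least once).
# Each flag is 0 or 1, like filter(...).count() evaluated for a single user.
users = {
    "u1": (1, 1),  # received and picked a call under program X
    "u2": (0, 1),  # received calls but never picked one
    "u3": (0, 0),  # never called under program X
}


def per_user_percentages(users):
    """Evaluate a / b * 100 once per user; collect users whose denominator is 0."""
    results, failures = {}, []
    for name, (picked, connected) in users.items():
        try:
            results[name] = picked / connected * 100
        except ZeroDivisionError:
            failures.append(name)
    return results, failures


def overall_percentage(users):
    """Aggregate first, then divide once: the ratio the question is after."""
    picked_total = sum(p for p, _ in users.values())
    connected_total = sum(c for _, c in users.values())
    return picked_total / connected_total * 100 if connected_total else 0.0


print(per_user_percentages(users))  # ({'u1': 100.0, 'u2': 0.0}, ['u3'])
print(overall_percentage(users))    # 50.0
```

The per-user version fails for `u3`, whose denominator is 0, while the aggregated version divides only once, after summing over all users.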

How can I write a query that gives me the intended ratio? How can I make the right traversals?

1 Answer

You don't need two traversals. I assume that "connected" and "picked" are properties of the same edge.

g.withSack(0).
  V().hasLabel("User").
  inE().has("connected","yes").
  sack(sum).
    by(choose(has("picked","yes"),
                constant(1),
                constant(0))).
  outV().in().has("Program","name","X").
  union(sack().sum().project("picked"),
        count().project("called")).
  unfold().
  group().
    by(keys).
    by(select(values)).
  math('picked/called')

To prevent NullPointerExceptions (NPEs) when there are no matching calls, replace the math step with:

choose(unfold(),
         math('picked/called'),
         constant(0))

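This makes 0 the fallback return value. Note that math('picked/called') yields a ratio between 0 and 1; use math('picked/called * 100') if a percentage is needed.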
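The same guard can be sketched in plain Python (not Gremlin). It assumes the grouped result is a map with the keys `picked` and `called` that may be empty when nothing matched; the helper name `safe_ratio` is hypothetical. Unlike the choose() step above, this sketch also falls back when the denominator is 0.

```python
def safe_ratio(totals, fallback=0):
    """Return picked / called, or `fallback` if the map is empty or unusable."""
    picked = totals.get("picked")
    called = totals.get("called")
    # Fall back when a key is missing or the denominator is 0.
    if picked is None or not called:
        return fallback
    return picked / called


print(safe_ratio({"picked": 3, "called": 4}))  # 0.75
print(safe_ratio({}))                          # 0
print(safe_ratio({"called": 0}))               # 0
```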

Here is an example using TinkerPop's "modern" toy graph.

Fraction of the persons who worked on lop who are older than 30:

gremlin> g.V().has("name","lop").in().valueMap()
==>[name:[marko],age:[29]]
==>[name:[josh],age:[32]]
==>[name:[peter],age:[35]]
gremlin> g.withSack(0).
......1>   V().hasLabel("person").
......2>   outE("created").
......3>   sack(sum).
......4>     by(choose(outV().has("age",gt(30)),
......5>                 constant(1),
......6>                 constant(0))).
......7>   inV().has("software","name","lop").
......8>   union(sack().sum().project("oldGuys"),
......9>         count().project("total")).
.....10>   unfold().
.....11>   group().
.....12>     by(keys).
.....13>     by(select(values)).
.....14>   math('oldGuys/total')
==>0.6666666666666666
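To see why the result is 2/3, the traversal can be mirrored in plain Python (not Gremlin). The edge list reproduces the "created" edges of the modern graph; the function name `old_guys_ratio` and the tuple layout are assumptions chosen for this sketch.

```python
# The "created" edges of TinkerPop's modern graph, as (person, age, software).
created = [
    ("marko", 29, "lop"),
    ("josh", 32, "lop"),
    ("josh", 32, "ripple"),
    ("peter", 35, "lop"),
]


def old_guys_ratio(edges, software, min_age=30):
    """Tag every edge with 1 or 0 (the sack), keep only the edges that
    point at `software`, then divide the sum of the tags by their count."""
    # sack(sum).by(choose(...)): 1 if the creator is older than min_age, else 0.
    tagged = [(1 if age > min_age else 0, target) for _, age, target in edges]
    # inV().has("software", "name", software): drop traversers for other software.
    kept = [sack for sack, target in tagged if target == software]
    if not kept:
        return 0  # fallback when nothing matches
    return sum(kept) / len(kept)


print(old_guys_ratio(created, "lop"))  # 0.6666666666666666
```

The ratio is taken per edge, not per person. In the modern graph each person created lop at most once, so the two coincide; in the User/Program query a user with several matching calls would contribute several times, so a per-user figure may need an extra deduplication step.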