I am trying to use countApproxDistinctByKey in PySpark (1.4 and 1.5), but I cannot find it.
countApproxDistinctByKey in the Scala source:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L417
Am I missing something, or has it not been ported / wrapped yet?
Thanks
Nope, it hasn't been ported yet. As of 1.5, the only one available in PySpark is countApproxDistinct.
See countApproxDistinct in the source code for the Python RDD.
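In the meantime, a per-key distinct count can be put together from RDD methods that PySpark does have. The following is only a minimal sketch: the helper name count_distinct_by_key is made up for illustration, and it computes an exact count rather than the HyperLogLog-style estimate that the Scala countApproxDistinctByKey returns, so it may be noticeably more expensive on large data.

```python
def count_distinct_by_key(pair_rdd):
    """Return an RDD of (key, number of distinct values for that key).

    Hypothetical helper, not part of PySpark. It is exact, not approximate.
    Assumes pair_rdd is an RDD of hashable (key, value) tuples.
    """
    return (pair_rdd
            .distinct()                        # drop duplicate (key, value) pairs
            .map(lambda kv: (kv[0], 1))        # one marker per distinct value
            .reduceByKey(lambda a, b: a + b))  # sum the markers per key


# Usage, assuming an existing SparkContext named sc:
#   rdd = sc.parallelize([("a", 1), ("a", 1), ("a", 2), ("b", 3)])
#   count_distinct_by_key(rdd).collect()
```

The distinct() call shuffles the full set of pairs, which is the cost the approximate Scala version avoids.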