
Is there a way to also find the points discarded by the rdp algorithm in Python?

The algorithm:

from rdp import rdp

rdp([[1, 1], [2, 2], [3, 3], [4, 4]])

Gives the points remaining after compression:

[[1, 1], [4, 4]]

If I apply the algorithm to a large dataset, I also want to find the points it discarded. Is there a way to do that?


1 Answer


As the documentation specifies, rdp can return a mask of the remaining points. It also provides an interface for numpy arrays.

One solution is to combine the mask with numpy indexing to retrieve both the remaining and the discarded points:

import numpy as np
from rdp import rdp

arr = np.array([[1, 1], [2, 2], [3, 3], [4, 4]])
mask = rdp(arr, return_mask=True)

print("remaining: {}".format(arr[mask]))
print("discarded: {}".format(arr[~mask]))

Output

remaining: [[1 1]
 [4 4]]
discarded: [[2 2]
 [3 3]]

Note

The arr[mask] notation selects the points where the mask is True, and arr[~mask] selects the points where the mask is False.
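
For a large dataset it may be more convenient to keep the positions of the discarded points rather than the points themselves. The sketch below shows one way to do that with np.flatnonzero. To keep it runnable without the rdp package, the mask is written by hand and only stands in for the result of rdp(arr, return_mask=True); the names kept_idx and discarded_idx are chosen here for illustration.

```python
import numpy as np

arr = np.array([[1, 1], [2, 2], [3, 3], [4, 4]])

# Assumption: hand-written mask standing in for rdp(arr, return_mask=True)
mask = np.array([True, False, False, True])

# Row positions of the remaining and of the discarded points
kept_idx = np.flatnonzero(mask)
discarded_idx = np.flatnonzero(~mask)

print("discarded indices: {}".format(discarded_idx))
print("discarded points: {}".format(arr[discarded_idx]))
```

The index arrays can be used to look up the same rows in any other array aligned with arr, such as timestamps or labels.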