
Let's assume that I have a PCollection with the following format:

-----------------------------------------
|   sale_id |   product_id  |   amount  |
|-----------|---------------|-----------|
|     1     |       a       |     1     |
|-----------|---------------|-----------|
|     2     |       b       |     12    |
|-----------|---------------|-----------|
|     3     |       c       |     3     |
|-----------|---------------|-----------|
|     4     |       d       |     100   |
|-----------|---------------|-----------|
|     5     |       e       |     4     |
-----------------------------------------

My goal is to keep only the top X best-selling records, i.e. ORDER BY amount DESC LIMIT X.

What is the way to do this in Apache Beam?

Thanks!


1 Answer


Update:

Beam SQL [2] supports ORDER BY ... LIMIT, if you'd like to try it.


If you are using the Java SDK, you can use the built-in Top transform [1] to do ORDER BY ... LIMIT. Top supports both descending and ascending order.

If you are using an SDK without Top, you can refer to Top's implementation to write your own.
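The core idea behind a Top-style combine is that each partial result keeps only the X best elements it has seen in a bounded min-heap, and partial results are merged the same way, so no single worker ever sorts the whole collection. The sketch below is plain Python with no Beam dependency and is a simplified illustration, not Top's actual source. Its method names mirror Beam's CombineFn (create_accumulator, add_input, merge_accumulators, extract_output), so the class could be adapted into a beam.CombineFn subclass and applied with beam.CombineGlobally. TopN and SALES are hypothetical names; ascending order can be obtained by negating the key.

```python
import heapq
import itertools

SALES = [
    {"sale_id": 1, "product_id": "a", "amount": 1},
    {"sale_id": 2, "product_id": "b", "amount": 12},
    {"sale_id": 3, "product_id": "c", "amount": 3},
    {"sale_id": 4, "product_id": "d", "amount": 100},
    {"sale_id": 5, "product_id": "e", "amount": 4},
]


class TopN:
    """Keeps the n elements with the largest key(element) (hypothetical helper)."""

    def __init__(self, n, key):
        self.n = n
        self.key = key
        # Tie-breaker so the heap never has to compare the elements themselves.
        self._counter = itertools.count()

    def create_accumulator(self):
        # Min-heap of (key, tie_breaker, element); heap[0] is the smallest kept key.
        return []

    def add_input(self, heap, element):
        entry = (self.key(element), next(self._counter), element)
        if len(heap) < self.n:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # Heap is full: evict the current smallest in favour of the new element.
            heapq.heapreplace(heap, entry)
        return heap

    def merge_accumulators(self, heaps):
        # Re-insert every element of every partial heap into a fresh bounded heap.
        merged = self.create_accumulator()
        for heap in heaps:
            for _, _, element in heap:
                merged = self.add_input(merged, element)
        return merged

    def extract_output(self, heap):
        # Largest first, like ORDER BY ... DESC.
        ordered = sorted(heap, key=lambda entry: entry[0], reverse=True)
        return [element for _, _, element in ordered]


if __name__ == "__main__":
    top3 = TopN(3, key=lambda row: row["amount"])
    # Simulate two bundles processed independently, then merged.
    partials = []
    for bundle in (SALES[:2], SALES[2:]):
        acc = top3.create_accumulator()
        for row in bundle:
            acc = top3.add_input(acc, row)
        partials.append(acc)
    result = top3.extract_output(top3.merge_accumulators(partials))
    print([row["product_id"] for row in result])  # ['d', 'b', 'e']
```

Keeping at most X entries per accumulator is what makes this cheap: memory per worker is O(X) regardless of how many sales there are.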

[1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Top.java

[2] https://beam.apache.org/documentation/dsls/sql/overview/