I'm trying to implement a Gamma distribution calculation in Apache Beam. First, I read a CSV file using Apache Beam's TextIO class:

Pipeline p = Pipeline.create();  
p.apply(TextIO.read().from("gs://path/to/file.csv"));  

After that, I apply a transform that parses each row of the CSV file and returns an object. This is the step where I'm trying to perform the Gamma distribution operation:

.apply(ParDo.of(new DoFn<String, Entity>() {
@ProcessElement
public void processElement(ProcessContext c) {
    String[] strArr = c.element().split(",");
    ClassxNorms xn = new ClassxNorms();
    xn.setDuration(Double.parseDouble(strArr[0]));
    xn.setAlpha(Double.parseDouble(strArr[1]));
    xn.setBeta(Double.parseDouble(strArr[2]));
    GammaDistribution gdValue = new GammaDistribution(Double.parseDouble(strArr[0]), Double.parseDouble(strArr[1]), Double.parseDouble(strArr[2]));
    System.out.println("gdValue : " + gdValue);
    c.output(xn);
}
}));

I then create a BeamRecord, and in the next step I convert the BeamRecord into a string to write the final output to Google Cloud Storage:

PCollection<String> gs_output_final = xnorm_trig.apply(ParDo.of(new DoFn<BeamRecord, String>() {
    private static final long serialVersionUID = 1L;

    @ProcessElement
    public void processElement(ProcessContext c) {
        c.output(c.element().toString());
        System.out.println(c.element().toString());
    }
}));
gs_output_final.apply(TextIO.write().to("gs://output/op_1/Q40test111"));

I'm getting the output, but the Gamma distribution value does not appear in it. Any help would be really appreciated.

What is ClassxNorms doing? Also, you are creating your gamma distribution as gdValue, but I don't see you passing it to the next step. Note: on Google Cloud, output printed to the screen is not preserved when you run on Dataflow; use a logger, or run with the direct runner. - Haris Nadeem

1 Answer

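I was able to implement the Gamma distribution in Apache Beam. The fix was to evaluate the distribution's cumulative probability and store the result on the output object. Below is the code snippet for reference: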

.apply(ParDo.of(new DoFn<String, ClassxNorms>() {
    @ProcessElement
    public void processElement(ProcessContext c) throws ParseException {
      String[] strArr = c.element().split(",");
      // Parse the three columns the distribution needs once and reuse them.
      double duration = Double.parseDouble(strArr[6]);
      double alpha = Double.parseDouble(strArr[11]);
      double beta = Double.parseDouble(strArr[12]);
      // Evaluate the distribution at the duration instead of only constructing it.
      double sample = new GammaDistribution(alpha, beta).cumulativeProbability(duration);
      ClassxNorms xn = new ClassxNorms();
      xn.setDuration(duration);
      xn.setAlpha(alpha);
      xn.setBeta(beta);
      xn.setVolume(Double.parseDouble(strArr[13]));
      xn.setSpend(Double.parseDouble(strArr[14]));
      xn.setEfficiency(Double.parseDouble(strArr[15]));
      xn.setXnorm(Double.parseDouble(strArr[16]));
      xn.setKey(strArr[17]);
      // Store the result on the element so it reaches the next step.
      xn.setGamma(sample);
      c.output(xn);
    }
  }));
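
For anyone wondering what the cumulativeProbability call is computing: assuming GammaDistribution here is the Apache Commons Math class, the two-argument constructor takes a shape and a scale, and the call returns the gamma CDF at the given point. In that library the three-argument constructor used in the question takes an accuracy setting as its third argument, not a point to evaluate. Below is a rough stdlib-only sketch of the same quantity that can be used to sanity-check a few rows outside the pipeline. The class and method names (GammaCdfSketch, logGamma, regularizedGammaP, gammaCdf) are hypothetical and not part of Beam or Commons Math, and the series used is only suitable for moderate values of x / scale.

```java
// Stdlib-only sketch of the gamma CDF. All names here are hypothetical helpers,
// not the Commons Math implementation.
final class GammaCdfSketch {

    // Lanczos approximation of ln(Gamma(x)) for x > 0 (relative error around 1e-10).
    static double logGamma(double x) {
        double[] cof = {
            76.18009172947146, -86.50532032941677, 24.01409824083091,
            -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5
        };
        double y = x;
        double tmp = x + 5.5;
        tmp -= (x + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double c : cof) {
            y += 1.0;
            ser += c / y;
        }
        return -tmp + Math.log(2.5066282746310005 * ser / x);
    }

    // Regularized lower incomplete gamma function P(a, x), summed as a power series.
    // The series converges for every x > 0 but becomes slow and inaccurate when x
    // is much larger than a, so treat this as a check for moderate inputs only.
    static double regularizedGammaP(double a, double x) {
        if (a <= 0.0) {
            throw new IllegalArgumentException("shape must be positive");
        }
        if (x <= 0.0) {
            return 0.0; // the CDF is 0 at and below the origin
        }
        double ap = a;
        double term = 1.0 / a;
        double sum = term;
        for (int n = 0; n < 10_000; n++) {
            ap += 1.0;
            term *= x / ap;
            sum += term;
            if (Math.abs(term) < Math.abs(sum) * 1e-15) {
                break; // further terms no longer change the sum
            }
        }
        return sum * Math.exp(-x + a * Math.log(x) - logGamma(a));
    }

    // Gamma CDF with the shape/scale parameterization: F(x) = P(shape, x / scale).
    static double gammaCdf(double shape, double scale, double x) {
        if (scale <= 0.0) {
            throw new IllegalArgumentException("scale must be positive");
        }
        return regularizedGammaP(shape, x / scale);
    }

    public static void main(String[] args) {
        // With shape = 1 the gamma distribution reduces to the exponential
        // distribution, so this should agree with 1 - exp(-x / scale).
        System.out.println(gammaCdf(1.0, 2.0, 2.0));
        System.out.println(1.0 - Math.exp(-1.0));
    }
}
```

If a row's value from this sketch differs noticeably from the pipeline output, it is worth checking whether the alpha and beta columns really are shape and scale, since some sources define beta as a rate (1 / scale) instead.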