scala/spark different case class based on input data

Question

In Spark/Scala, how can case classes be instantiated in a data-driven way?

Explanation: Let's say we have an input dataset of contracts of some kind (e.g. telecom subscriptions), and those contracts need to be evaluated. The input dataset contains values such as date of creation, start of contract validity, end of validity, some amounts, additional options, family discount, etc. Not all of them have to be filled in (e.g. some contracts don't have additional options).

Does it make sense to model all types of contracts using case classes? One input row coming from the dataset could be a contract for a fixed line, a mobile number, or some other service. I'd then try to deduce as much detail as possible from the input row and instantiate the appropriate case class using match. Each of these case classes would have a function that returns the value of the contract based on this data and some static data coming from elsewhere (a lookup table, maybe a key-value map). This function would then be used in a call to the dataset's 'map'. Is there a better way to do this?
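A minimal sketch of the match-based idea described above, written in plain Scala without Spark so it runs standalone. All names here (`InputRow`, `FixedLine`, `Mobile`, `ContractFactory`, the `rates` lookup map) are hypothetical and only illustrate the shape of the approach; the pricing formulas are made up.

```scala
// Hypothetical input row; columns that may be empty are modeled as Option.
final case class InputRow(id: String, service: String, monthlyFee: Double, options: Option[Int])

sealed trait Contract {
  // Value of the contract given static lookup data (service name -> multiplier).
  def value(rates: Map[String, Double]): Double
}

final case class FixedLine(monthlyFee: Double) extends Contract {
  def value(rates: Map[String, Double]): Double =
    monthlyFee * rates.getOrElse("fixed", 1.0)
}

final case class Mobile(monthlyFee: Double, options: Int) extends Contract {
  def value(rates: Map[String, Double]): Double =
    (monthlyFee + options) * rates.getOrElse("mobile", 1.0)
}

object ContractFactory {
  // Choose the case class based on the data found in the row.
  def toContract(row: InputRow): Option[Contract] = row.service match {
    case "fixed"  => Some(FixedLine(row.monthlyFee))
    case "mobile" => Some(Mobile(row.monthlyFee, row.options.getOrElse(0)))
    case _        => None // unknown service type, left for the caller to handle
  }
}
```

On a Spark `Dataset[InputRow]`, the same function could presumably be used inside `map`, returning the computed `Double` rather than the `Contract` itself so that a standard encoder applies.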

Assuming the case class idea makes sense, each class could also run simulations on the same input data. E.g. if the customer downgrades their internet speed, what would the estimated income for this contract be? So for one input row, I'd have to return two new columns: the value of the contract and the simulated value of the contract. With 'what if' scenarios, it could also be that I run several scenarios for one input row (at once?), which would then return several rows (e.g. 1. what if the customer buys something more; 2. what if the customer downgrades; 3. what if the customer cancels all additional options on the contract).
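One way the "several scenarios per input row" idea could look, again as a plain-Scala sketch under assumptions: `Subscription`, `Scenario`, `Scenarios`, and the flat price of 5.0 per option are all hypothetical. Each scenario is a named transformation of the contract, and one row fans out into one output row per scenario.

```scala
// Hypothetical, simplified contract: a base fee plus a number of paid options.
final case class Subscription(id: String, baseFee: Double, options: Int) {
  def value: Double = baseFee + options * 5.0 // assumed flat price per option
}

// A scenario is a named transformation applied to the contract before valuation.
final case class Scenario(name: String, transform: Subscription => Subscription)

object Scenarios {
  val all: Seq[Scenario] = Seq(
    Scenario("buys_more", s => s.copy(options = s.options + 1)),
    Scenario("downgrades", s => s.copy(baseFee = s.baseFee * 0.5)),
    Scenario("cancels_options", s => s.copy(options = 0))
  )

  // One output row per scenario: (id, scenario name, current value, simulated value).
  def simulate(s: Subscription, scenarios: Seq[Scenario]): Seq[(String, String, Double, Double)] =
    scenarios.map(sc => (s.id, sc.name, s.value, sc.transform(s).value))
}
```

With this shape, the list of scenarios comes from configuration, and a single `flatMap(c => Scenarios.simulate(c, Scenarios.all))` over the collection (or Dataset) produces all rows in one pass instead of triggering one `map` per scenario.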

Is this even the right approach to the problem? How can these evaluations be made 'data driven', given that input values determine which case class applies and configuration/run options determine how many times a 'map' on the dataset should be triggered?

While I find the question interesting, and it also sounds to me as if your approaches could work, it would be nice if you could give some very concrete examples of rows, a few examples of possible contract types that can be created from those rows, and the concrete results that you expect to see. I think your problem should lead to a very interesting solution, but it would be nice if you invested a little more time into a little toy example to experiment with. — Andrey Tyukin
Would something like this make sense? I did this in a hurry and still have to think of a better example. Maybe I'll have to change the industry example from telco to something else. docs.google.com/spreadsheets/d/… — mispp

mispp mispp · Accepted Answer · 2018-06-22T09:14:16

Modeling a huge number of different product combinations as a class hierarchy is not pragmatic.

The solution that worked was to use nested classes.

From one input row, the columns are grouped into different objects that make sense together, and those objects become data members of the parent class.

I've tried this on banking contracts instead of telecom contracts (as used in the question): if a loan contract is delivered as one row in a dataframe, the columns of that row can be grouped into maturity information, interest information, etc. Each of these information groups has its own class and methods. Instances of these classes become data members of the parent Loan class.

This way I could model different interest behavior, maturity behavior, etc. and just call it in the .map from the Loan object itself.
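A rough sketch of what this composition could look like, in plain Scala so it runs without Spark. Only `Loan` is named in the answer; everything else (`LoanRow`, `Maturity`, `Interest`, `FixedInterest`, `FloatingInterest`, `LoanBuilder`, the column names, and the toy interest formula) is an assumption made for illustration.

```scala
// Hypothetical flat row as it might arrive from the dataframe.
final case class LoanRow(id: String, principal: Double, annualRate: Double,
                         rateType: String, monthsToMaturity: Int)

// Each group of columns gets its own class with its own behavior.
final case class Maturity(monthsRemaining: Int) {
  def yearsRemaining: Double = monthsRemaining / 12.0
}

sealed trait Interest { def annualRate: Double }
final case class FixedInterest(annualRate: Double) extends Interest
final case class FloatingInterest(spread: Double, referenceRate: Double) extends Interest {
  def annualRate: Double = referenceRate + spread
}

// The parent class holds the groups as data members and delegates to them.
final case class Loan(id: String, principal: Double, interest: Interest, maturity: Maturity) {
  // Simple, non-compounded interest until maturity (an assumed toy formula).
  def expectedInterest: Double =
    principal * interest.annualRate * maturity.yearsRemaining
}

object LoanBuilder {
  // Group the columns of one row into component objects and assemble the Loan.
  def fromRow(row: LoanRow, referenceRate: Double): Loan = {
    val interest: Interest = row.rateType match {
      case "floating" => FloatingInterest(row.annualRate, referenceRate)
      case _          => FixedInterest(row.annualRate)
    }
    Loan(row.id, row.principal, interest, Maturity(row.monthsToMaturity))
  }
}
```

Variation then lives in the small component classes rather than in a deep `Loan` hierarchy, and the call in `.map` stays a one-liner such as `rows.map(r => LoanBuilder.fromRow(r, 0.02).expectedInterest)`.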
