I have an RDD "labResults" of objects:
case class LabResult(patientID: String, date: Long, labName: String, value: String)
I want to transform this RDD so that it contains only one row for each patientID and labName combination. That row should be the latest one for the combination (I am only interested in the most recent date on which a patient had this lab). I do it this way:
// group rows by patient and lab and take only the last one
val cleanLab = labResults.groupBy(x => (x.patientID, x.labName)).map(_._2).map { events =>
  val latest_date = events.maxBy(_.date)
  val lab = events.filter(x => x.date == latest_date)
  lab.take(1)
}
Later I want to create the edges from this RDD:
val edgePatientLab: RDD[Edge[EdgeProperty]] = cleanLab
  .map({ lab =>
    Edge(lab.patientID.toLong, lab2VertexId(lab.labName), PatientLabEdgeProperty(lab).asInstanceOf[EdgeProperty])
  })
and I am getting an error:
value patientID is not a member of Iterable[edu.gatech.cse6250.model.LabResult]
[error] Edge(lab.patientID.toLong, lab2VertexId(lab.labName), PatientLabEdgeProperty(lab).asInstanceOf[EdgeProperty])
[error] ^
[error] /hw4/stu_code/src/main/scala/edu/gatech/cse6250/graphconstruct/GraphLoader.scala:94:53: value labName is not a member of Iterable[edu.gatech.cse6250.model.LabResult]
[error] Edge(lab.patientID.toLong, lab2VertexId(lab.labName), PatientLabEdgeProperty(lab).asInstanceOf[EdgeProperty])
[error] ^
[error] /hw4/stu_code/src/main/scala/edu/gatech/cse6250/graphconstruct/GraphLoader.scala:94:86: type mismatch;
[error] found : Iterable[edu.gatech.cse6250.model.LabResult]
[error] required: edu.gatech.cse6250.model.LabResult
[error] Edge(lab.patientID.toLong, lab2VertexId(lab.labName), PatientLabEdgeProperty(lab).asInstanceOf[EdgeProperty])
So it looks like the problem is that "cleanLab" is not an RDD of LabResult, as I expected, but an RDD of Iterable[edu.gatech.cse6250.model.LabResult].
How could I fix it?
Use lab.head instead of lab.take(1). – Shaido
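To make the suggestion concrete: take(1) on an Iterable returns another Iterable, which is why cleanLab ends up as RDD[Iterable[LabResult]], while head returns the element itself. One thing to watch for, though: events.maxBy(_.date) already returns the LabResult with the largest date, not the date, so the filter in the question compares a Long with a LabResult and should never match; calling head on that empty result would then fail at runtime. Returning the maxBy result directly avoids both problems. Below is a minimal sketch of that, plus an alternative using reduceByKey that avoids collecting each group. The object name LatestLabs and the helper names are made up for illustration, and LabResult is redeclared here only so the snippet stands alone (in the project it lives in edu.gatech.cse6250.model).

```scala
import org.apache.spark.rdd.RDD

// Redeclared so the sketch is self-contained; same fields as in the question.
case class LabResult(patientID: String, date: Long, labName: String, value: String)

// Hypothetical helper object, not part of the original code base.
object LatestLabs {

  // Returns whichever of the two results has the later date.
  // On equal dates the first argument is kept (an arbitrary choice).
  def newer(a: LabResult, b: LabResult): LabResult =
    if (a.date >= b.date) a else b

  // Closest to the question's code: group, then keep the element with
  // the largest date. maxBy returns a LabResult, so the result is
  // RDD[LabResult] rather than RDD[Iterable[LabResult]].
  def viaGroupBy(labResults: RDD[LabResult]): RDD[LabResult] =
    labResults
      .groupBy(x => (x.patientID, x.labName))
      .map(_._2)
      .map(events => events.maxBy(_.date))

  // Alternative: key each row by (patientID, labName) and reduce pairwise,
  // so no per-key Iterable has to be materialised.
  def viaReduceByKey(labResults: RDD[LabResult]): RDD[LabResult] =
    labResults
      .map(r => ((r.patientID, r.labName), r))
      .reduceByKey(newer)
      .values
}
```

With either version, something like val cleanLab = LatestLabs.viaReduceByKey(labResults) should give an RDD[LabResult], so lab.patientID and lab.labName resolve in the edge-building map. If several rows share the same latest date for one patient and lab, which of them survives is not specified here.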