Can you explain how to work with historical data in ArangoDB?

For example, I have 5 collections:

1. School [Type = Document] - Stores information about schools. I have 2 schools:
- BestSchool
- NotveryBestSchool
2. Class [Type = Document] - Stores information about classes. I have 4 classes:
- Best A Class
- Best B Class
- NotveryBest A Class
- NotveryBest B Class
3. Students [Type = Document] - Stores information about students:
- Timmi
- Lisa
- Kail
- Bart
4. ClassinSchool [Type = Edge] - Stores which classes belong to which school, where "parent" = School and "child" = Class:
_From: School/BestSchool _To: Class/Best A Class
_From: School/BestSchool _To: Class/Best B Class
_From: School/NotveryBestSchool _To: Class/NotveryBest A Class
_From: School/NotveryBestSchool _To: Class/NotveryBest B Class
And, finally:
5. StudentsinClass [Type = Edge] - Stores which students are in which class, where "parent" = Students and "child" = Class:
_From: Students/Timmi _To: Class/Best A Class
_From: Students/Lisa _To: Class/Best B Class
_From: Students/Kail _To: Class/NotveryBest A Class
_From: Students/Bart _To: Class/NotveryBest B Class

And here is the case. In 2017, Bart studied very well. At the end of the year he was transferred from "NotveryBest B Class" to a new class, "NotveryBest A Class". In 2018, he studied even better, and his parents decided to transfer him to another school where he could develop his talents. He was transferred from "NotveryBestSchool" to a new school, "BestSchool", and the class "Best B Class".

Assumption: Do I understand correctly that, in order to track Bart's movement between classes and schools, I must add dates to the edges? Should the main fields on these edges be StartDates and EndDates? Or should I store his movements in the attributes of the Bart document? The third option is to make a separate "History" collection and store everything there.

Which option should I choose?

1 Answer

So here is the way I see it:

  • Quick and dirty: Put everything in a history collection. This is easy to do, and you can put a GUI on top of this collection in no time. However, analyzing it later will be messy because of the multiple joins needed to get usable time-based data. Also, if you have a lot of students, this collection could grow quite large.

  • Student centric: Storing the history as a property on the student document works well if you have a lot of students and usually retrieve historical information one student at a time. Analyzing the whole student body to find patterns will be more complex, because you need to pull the data out of each individual student before you can analyze it.

  • Adding dates to edges: This takes a bit more effort to set up initially, but it is the most flexible way to store the data. You can also add extra information, such as the reason for the transfer. This setup allows the most analysis of student transfer patterns, the timing of transfers, and so on.
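
To make the third option more concrete, here is a minimal in-memory sketch in Python of what dated StudentsinClass edges could look like and how a "where was Bart on date X" lookup might work. It does not call ArangoDB at all; it only models the edge documents as plain dicts. The attribute names `start_date`, `end_date` and `reason`, the space-free keys such as `Class/NotveryBestB`, and all concrete dates are assumptions made for illustration (the question only says the transfers happened after 2017 and in or after 2018). Dates are stored as ISO 8601 strings, which compare correctly as plain strings, and each interval is treated as start-inclusive, end-exclusive, with `None` meaning "still current".

```python
from typing import Optional

# Hypothetical dated edges for Bart. All dates and the "reason" values are
# invented for illustration; they are not given in the question.
STUDENTS_IN_CLASS = [
    {"_from": "Students/Bart", "_to": "Class/NotveryBestB",
     "start_date": "2017-01-01", "end_date": "2018-01-01"},
    {"_from": "Students/Bart", "_to": "Class/NotveryBestA",
     "start_date": "2018-01-01", "end_date": "2019-01-01",
     "reason": "good grades"},
    {"_from": "Students/Bart", "_to": "Class/BestB",
     "start_date": "2019-01-01", "end_date": None,  # None = still enrolled
     "reason": "changed school"},
]


def is_active(edge: dict, on_date: str) -> bool:
    """True if the edge covers on_date (start inclusive, end exclusive)."""
    started = edge["start_date"] <= on_date
    not_ended = edge["end_date"] is None or on_date < edge["end_date"]
    return started and not_ended


def class_on(edges: list, student: str, on_date: str) -> Optional[str]:
    """Return the class the student belonged to on on_date, or None."""
    for edge in edges:
        if edge["_from"] == student and is_active(edge, on_date):
            return edge["_to"]
    return None


def history(edges: list, student: str) -> list:
    """Return the student's classes in chronological order."""
    own = [e for e in edges if e["_from"] == student]
    return [e["_to"] for e in sorted(own, key=lambda e: e["start_date"])]


if __name__ == "__main__":
    print(class_on(STUDENTS_IN_CLASS, "Students/Bart", "2017-06-15"))
    print(history(STUDENTS_IN_CLASS, "Students/Bart"))
```

In a real database, the same date comparison would presumably live in the filter of the query that walks the edges, so the current state is just the edge whose end date is empty. A transfer then means closing the old edge (setting its end date) and inserting a new one, rather than overwriting `_to`.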

In the end it all depends on what you are building, but I tend to gravitate toward the third option unless I have specific reasons not to do it that way.