0
votes

I've been trying to create my first star schema based on Google Classroom data for a week. I put a description of the tables from my most recent attempt below. I didn't list descriptive fields not relevant to my question.

I have a table visual that shows CourseName, StudentsEnrolled (it works)

StudentsEnrolled = CALCULATE(DISTINCTCOUNT(gc_FactSubmissions[StudentID]))

I am trying to create a table visual that shows StudentName, CourseWorkTitle, PointsEarned, MarkPct.

MarkPct = divide(sum(gc_FactSubmissions[PointsEarned]),sum(gc_DimCourseWork[MaxPoints]))

When I try to add StudentName to the visual, I end up with incorrect results (some blank student names and incorrect totals). When I try to use DAX Related(), I can only select fields in the Submissions table.

I’ve spent countless hours of Googling sites/pages like the following one and others: https://exceleratorbi.com.au/the-optimal-shape-for-power-pivot-data/

I think the problem is the gc_DimStudents table because it contains a student record for every student that is enrolled in a gc_DimCourses. Not all students enrolled have submitted assignments, so if I limited the gc_DimStudents to only the StudentIDs in gc_FactSubmissions, then I won’t be able to get a count of StudentsEnrolled in courses.

I’m not sure how to resolve this. Should gc_DimCourses also be made into a fact table? With a gc_DimCourseStudents and a gc_DimSubmissionStudents? Then I’d have to create a surrogate key to join gc_FactSubmissions to the new gc_FactCourses? If that is true, then as I add more fact tables to my model, is it normal to have many DimAnotherStudentList in many places in a Star Schema model?

I want to keep building on this star schema because we want reports/dashboards that relate things like online marks, to attendance, to disciplinary actions, etc., etc. So I want to get the relationships correct this time.

Any help is very much appreciated.

Thanks, JMC


gc_FactSubmissions (contains one record for every combination of the 4 ID fields, no blanks)

CourseID (many to 1 join to gc_Dimcourses.CourseID )

OwnerID (many to 1 join to gc_DimOwners.OwnerID)

CourseWorkID (many to 1 join to gc_DimCourseWork.CourseWorkID)

StudentID (many to 1 join to gc_DimStudents)

SubmissionID

SubmissionName

PointsEarned (int, default to sum)

(other descriptive fields)


gc_DimCourseWork (one CourseWorkID for each gc_FactSubmissions.CourseWorkID)

CourseWorkID (it is distinct, no blanks)

CourseWorkName

MaxPoints (int, default to sum)

(other descriptive fields)


gc_DimCourses (one CourseID for each gc_FactSubmissions.Course CourseID)

CourseID (it is distinct, no blanks)

CourseName

(other descriptive fields)


gc_DimOwners (one OwnerID for each gc_DimOwners.OwnerID)

OwnerID (it is distinct, no blanks)

OwnerName

(other descriptive fields)


gc_DimStudents (one StudentID for each gc_FactSubmissions.Course CourseID)

StudentID (distinct, no blanks)

StudentName

(other descriptive fields)

1
Have you missed the StudentUserID column from your description of the fact table? Can a student submit more than once for the same piece of coursework? Can a student be enrolled on a course but have yet to submit any coursework? If they can then this fact table won't show these students and your calculation will be off. If you want to report on enrolments then why not create an enrolment fact table?NickW
Yes, I'd forgotten to list it. I just added it and made it consistent (all called StudentID). A student can have multiple course work for each course, but only one submission for each course work. Yes, a student can be enrolled in a course, but have not yet submitted any course work for the course.bpnit
If I add a FactEnrollment table, would it contain only CourseID and StudentID and be joined to DimCourses using CourseID? Would that make it a snowflake schema instead of a star schema?bpnit

1 Answers

0
votes

A Snowflake Schema is one where Dimensions are related to each other directly rather than via a Fact table - so no, adding another fact table to your model doesn't make it a Snowflake.

An Enrolment fact would have FKs to any Dimensions that are relevant to Enrolments - so Course, Student, probably at least 1 date and whatever other enrolment attributes there may be.

As an additional comment, while there are many incorrect ways of modelling a star schema there can also be many correct ways of modelling it: there is rarely one correct answer. For example, for your Submissions Star you could denormalise your Course data into your CourseWork Dim and possibly also include the Owner data (I assume Owner is Owner of the course?). The fewer joins there are in any query the better the performance. If another fact, such as Enrolment, needed to be related to a Course Dim (rather than to Coursework) then you'd need to consider the trade-off in performance of having fewer joins to one fact and having to maintain the course data in two different Dims (Course and Coursework).

As a star schema is denormalised there is no issue with the same data appearing in multiple tables (within reason). The most common example is a Date Dim that has date, week, month and year attributes and a Month Dim that has just month and year attributes.