I previously asked the question on how to do a countifs in python across multiple data frames, just like you can do countifs on separate worksheets in Excel. somebody gave me a very creative answer:
python pandas countifs using multiple criteria AND multiple data frames
Thank you for that @AlexG--I tried it, and it worked superbly:
import pandas as pd
import numpy as np
import matplotlib as plt
#import the data
students = pd.read_csv("Student Detail stump.csv")
exams = pd.read_csv("Exam Detail stump.csv")
#get data parameters
student_info = students[['Student Number', 'Enrollment Date', 'Detail Date']].values
#prepare an empty list to hold the results
N_exams_passed = []
#count records in data set according to parameters
for s_id, s_enroll, s_qual in student_info:
N_exams_passed.append(len(exams[(exams['Student Number']==s_id) &
(exams['Exam Grade Date']>=s_enroll) &
(exams['Exam Grade Date']<=s_qual) &
(exams['Exam Grade']>=70)])
)
#add the results to the original data set
students['Exams Passed'] = N_exams_passed
HOWEVER, it only worked effectively on small data sets. When I ran the data with 100,000s of rows, it wouldn't even be done overnight. It doesn't seem very pythonic.
The SQL way you can do this in seconds is to use a correlated subquery, like this:
SELECT
s.*,
(SELECT COUNT(e.[Exam Grade])
FROM
exams AS e
WHERE
e.[Exam Grade] >= 65
AND e.[Student Number] = s.[Student Number]
AND e.[Exam Grade Date] >= s.[Enrollment Date]
AND e.[Exam Grade Date] <= s.[Detail Date])
AS ExamsPassed
FROM
students AS s;
How do I reproduce such a correlated subquery in pandas or some other pythonic way?
Here are the data frames:
#Students
Student Number Enroll Date Detail Date
1 1/1/2016 2/1/2016
1 1/1/2016 3/1/2016
2 2/1/2016 3/1/2016
3 3/1/2016 4/1/2016
#Exams
Student Number Exam Date Exam Grade
1 1/1/2016 50
1 1/15/2016 80
1 1/28/2016 90
1 2/5/2016 100
1 3/5/2016 80
1 4/5/2016 40
2 2/2/2016 85
2 2/3/2016 10
2 2/4/2016 100
Final data frame should look like this, with a count of 'Passed Exams' at the end:
#FinalResult
Student Number Enroll Date Detail Date Passed Exams
1 1/1/2016 2/1/2016 2
1 1/1/2016 3/1/2016 3
2 2/1/2016 3/1/2016 2
3 3/1/2016 4/1/2016 0