Create list from pandas dataframe

Question

I have a function that takes every (non-distinct) MatchId together with its paired xG_Team1 and xG_Team2 values and returns an array, which is then summed to give a single constant, the sse.

The problem is that the function iterates over every row, so each MatchId is processed more than once. I want to stop this.

For each distinct MatchId I need the corresponding home and away goal times as lists, i.e. Home_Goal and Away_Goal, built from the Home_Goal_Time and Away_Goal_Time columns of the dataframe and used in each iteration. The lists below don't seem to work.

    MatchId  Event_Id  EventCode         Team1       Team2         Team1_Goals
0   842079   2053      Goal Away         Huachipato  Cobresal      0
1   842079   2053      Goal Away         Huachipato  Cobresal      0
2   842080   1029      Goal Home         Slovan      lava          3
3   842080   1029      Goal Home         Slovan      lava          3
4   842080   2053      Goal Away         Slovan      lava          3
5   842080   1029      Goal Home         Slovan      lava          3
6   842634   2053      Goal Away         Rosario     Boca Juniors  0
7   842634   2053      Goal Away         Rosario     Boca Juniors  0
8   842634   2053      Goal Away         Rosario     Boca Juniors  0
9   842634   2054      Cancel Goal Away  Rosario     Boca Juniors  0

    Team2_Goals  xG_Team1  xG_Team2  CurrentPlaytime  Home_Goal_Time  Away_Goal_Time
0   2            1.79907   1.19893   2616183          0               87
1   2            1.79907   1.19893   3436780          0               115
2   1            1.70662   1.1995    3630545          121             0
3   1            1.70662   1.1995    4769519          159             0
4   1            1.70662   1.1995    5057143          0               169
5   1            1.70662   1.1995    5236213          175             0
6   2            0.82058   1.3465    2102264          0               70
7   2            0.82058   1.3465    4255871          0               142
8   2            0.82058   1.3465    5266652          0               176
9   2            0.82058   1.3465    5273611          0               0

For example, for MatchId = 842079: Home_Goal = [], Away_Goal = [87, 115]

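import numpy as np

x1 = [1, 0, 0]  # used for time slots with no goal
x2 = [0, 1, 0]  # used for time slots with a home goal
x3 = [0, 0, 1]  # used for time slots with an away goal
m = 1  # arbitrary constant used to optimise sse
k = 196
total_timeslot = 196
Home_Goal = []  # No Goal
Away_Goal = []  # No Goal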

def sum_squared_diff(x1, x2, x3, y):
    ssd = []
    for k in range(total_timeslot):  # k will take multiple values
        if k in Home_Goal:
            ssd.append(sum((x2 - y) ** 2))
        elif k in Away_Goal:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return ssd

def my_function(row):
    xG_Team1 = row.xG_Team1
    xG_Team2 = row.xG_Team2
    return np.array([1-(xG_Team1*m + xG_Team2*m)/k, xG_Team1*m/k, xG_Team2*m/k])

results = df.apply(lambda row: sum_squared_diff(x1, x2, x3, my_function(row)), axis=1)

results
sum(results.sum())

For the three matches above, the desired outcome should look like the following. If I need an individual sse, sum(sum_squared_diff(x1, x2, x3, y)) gives me these values.

MatchId =  842079   =  3.984053038520635
MatchId =  842080   =  7.882189570700502
MatchId =  842634   =  5.929085973050213

Given the size of the original data, what I am realistically after is the total sum of the sse. For the sample data above, simply adding up the values gives total sse = 17.79532858227135. Once I achieve this, I will try to optimise the sse based on this figure by updating the arbitrary value m.

Here are the lists I hoped the function would iterate over.

Home_scored = s.groupby('MatchId')['Home_Goal_Time'].apply(list)
Away_scored = s.groupby('MatchId')['Away_Goal_Time'].apply(list)
type(Home_scored)
pandas.core.series.Series

Then I convert them to lists.

Home_Goal = Home_scored.tolist()
Away_Goal = Away_scored.tolist()
type(Home_Goal)
 list

 Home_Goal
Out[303]: [[0, 0], [121, 159, 0, 175], [0, 0, 0, 0]]


Away_Goal 
Out[304]: [[87, 115], [0, 0, 169, 0], [70, 142, 176, 0]]

But the function still treats Home_Goal and Away_Goal as empty lists.

Dillon Dillon · Accepted Answer · 2018-06-18T12:46:56

If you only want to consider one MatchId at a time, you should .groupby('MatchId') first:

df.groupby('MatchId').apply(...)
