I'm looking for a solution to the following problem:

I want to create a NumPy array that stores the intercept and slope for each group, where the slope is the increase of the means over the years. I have found several ways to calculate an intercept and slope, but I can't work out how to get them into a new array. (I'm new to NumPy, so the logic is slowly getting there, but I've now been stuck for a day.)

I have an array that is structured like this:

x = np.array([(2000, 'A', '1',5), (2001, 'A', '1', 10),
              (2003, 'A', '1',15), (2004, 'A', '1', 20),
              (2000, 'A', '2',1), (2001, 'A', '2', 2),
              (2002, 'A', '2', 3), (2003, 'A', '2', 4)],
             dtype=[('year', 'i4'), ('group1', 'U2'), ('group2', 'U2'), ('means', 'i2')])

And I would like to end up with an array like this:

>>> desired_array
array([('A', '1', 5, 5),
       ('A', '2', 1, 1)],
       dtype=[('group1', '<U2'), ('group2', '<U2'), ('intercept', '<i2'), ('slope', '<i2')])

I have gotten to this point:

ans, indices = np.unique(x[['group1', 'group2']], return_inverse=True)
desired_array = np.empty(2, dtype=[('group1', 'U2'), ('group2', 'U2'), ('intercept', '<f8'),
                                   ('slope', '<f8')])
desired_array['group1'] = ans['group1']
desired_array['group2'] = ans['group2']
x = x[x['year'] == 2000]
desired_array['intercept'] = x['means']

It's a bit rough and I can still improve it, but the main point where I get stuck is how to add the slope of each regression line to the array.

It would be great if someone could help me out :)

What's desired_array['a'] at the end? It doesn't have an 'a' field as far as I can see. - Mercury
Ah yes, my mistake, I edited it to 'intercept'. Also note: I do know that I need to adjust the intercept calculation; this is just a quick example. The main problem is adding the slope. - Lotw

1 Answer


You can simply calculate your slopes and intercepts in lists and add them in.

import numpy as np

x = np.array([(2000, 'A', '1', 5), (2001, 'A', '1', 10),
              (2002, 'A', '1', 15), (2003, 'A', '1', 20),
              (2000, 'A', '2', 1), (2001, 'A', '2', 2),
              (2002, 'A', '2', 3), (2003, 'A', '2', 4)],
             dtype=[('year', 'i4'), ('group1', 'U2'), ('group2', 'U2'), ('means', 'i2')])

Note that I've changed the years of the third and fourth rows from 2003 and 2004 to 2002 and 2003; otherwise the points of that group would not lie on a straight line. I'm treating year as the x axis and means as the y axis in this example. The slope is then m = (y2 - y1)/(x2 - x1), and the intercept is c = y - m*x for any (x, y) pair on the corresponding line. Loop over each unique group pair and store its slope and intercept in two lists.

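# Unique (group1, group2) pairs, as a structured array
unique_groups = np.unique(x[['group1', 'group2']])

slopes, intercepts = [], []
for group in unique_groups:
    # Rows that belong to the current (group1, group2) pair
    current_group = x[x[['group1', 'group2']] == group]
    x_g = current_group['year']
    y_g = current_group['means']
    # Slope from the extreme values; this assumes the means rise with the year
    slope = (y_g.max() - y_g.min()) / (x_g.max() - x_g.min())
    # Intercept from one point on the line: c = y - m*x
    intercept = y_g[0] - slope * x_g[0]
    slopes.append(slope)
    intercepts.append(intercept)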
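
As a quick sanity check of the two formulas, here is the arithmetic for group ('A', '1') on its own. This is only a minimal sketch, and the variable names are made up for illustration. Note that c = y - m*x is the value of the line at year 0; if you want the value at the group's first year instead (which is what the desired output in the question seems to expect), measure x from that year.

```python
import numpy as np

# Years and means for group ('A', '1') from the corrected data
years = np.array([2000, 2001, 2002, 2003])
means = np.array([5, 10, 15, 20])

# Two-point slope: m = (y2 - y1) / (x2 - x1)
m = (means[-1] - means[0]) / (years[-1] - years[0])

# Intercept at year 0: c = y - m*x
c_year0 = means[0] - m * years[0]

# Intercept with x measured from the first year (2000 becomes 0)
c_first = means[0] - m * (years[0] - years[0])

print(m, c_year0, c_first)
```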

Then plug the calculated values into the desired array.

desired_array = np.empty(len(unique_groups),
                         dtype=[('group1', 'U2'), ('group2', 'U2'),
                                ('intercept', '<f8'), ('slope', '<f8')])
desired_array['group1'] = unique_groups['group1']
desired_array['group2'] = unique_groups['group2']
desired_array['intercept'] = intercepts
desired_array['slope'] = slopes
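
If the points of a group do not lie exactly on a line (as with the years 2000, 2001, 2003, 2004 in the question), a least-squares fit may suit better than the two-point formula. The following is a rough sketch of that alternative using np.polyfit; it is a different technique from the loop above, and the names `pairs` and `fit` are just illustrative. It assumes you want the intercept to be the fitted value at each group's first year, so the years are shifted before fitting.

```python
import numpy as np

x = np.array([(2000, 'A', '1', 5), (2001, 'A', '1', 10),
              (2002, 'A', '1', 15), (2003, 'A', '1', 20),
              (2000, 'A', '2', 1), (2001, 'A', '2', 2),
              (2002, 'A', '2', 3), (2003, 'A', '2', 4)],
             dtype=[('year', 'i4'), ('group1', 'U2'), ('group2', 'U2'), ('means', 'i2')])

# Unique (group1, group2) pairs as plain Python tuples, in sorted order
pairs = sorted(set(zip(x['group1'].tolist(), x['group2'].tolist())))

fit = np.empty(len(pairs), dtype=[('group1', 'U2'), ('group2', 'U2'),
                                  ('intercept', 'f8'), ('slope', 'f8')])

for i, (g1, g2) in enumerate(pairs):
    # Select the rows of this group with a boolean mask on both fields
    rows = x[(x['group1'] == g1) & (x['group2'] == g2)]
    # Years since the group's first year (assumption: intercept at first year)
    t = rows['year'] - rows['year'].min()
    # Degree-1 fit; polyfit returns the highest power first: (slope, intercept)
    slope, intercept = np.polyfit(t, rows['means'], 1)
    fit[i] = (g1, g2, intercept, slope)

print(fit)
```

With the data above, each group's points are exactly collinear, so the fitted slope should agree with the two-point slope up to floating-point rounding.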