I am working with a dataset that has purchases per date (called ItemNum) on multiple dates across 2800 individuals. Each Item is given its own line, so if an individual has purchased two items on a date, that date will appear twice. I don't care how many items were purchased on a date (with each date representing one trip), but rather the mean number of trips made across the 2800 individuals (For about 18230 lines of data). My data looks like this:
+---+----------+-------+---------------------- ---+
|ID | Date |ItemNum| ItemDescript |
| 1 |01/22/2010| 1 |Description of the item |
| 1 |01/22/2010| 2 |Description of other item |
| 1 |07/19/2013| 1 | |
| 2 |06/04/2012| 1 | |
| 2 |02/02/2013| 1 | |
| 2 |11/13/2013| 1 | |
+---+----------+-------+---------------------- ---+
In the above table, person 1 made two trips and three item purchases (because two dates are shown), person 2 made three trips. I am interested in the average number of trips across all people, but first I need to collapse it down to unique dates. So I know I need to collapse
on the date, but when I do
collapse (mean) ItemNum (first) Date, by(ID)
it just takes the first date that the ID shows up, not the first occurrence of each unique date.
The next issue is that once it's collapsed, I need to take the mean of the count of the dates, not the date itself, which is also where I seem to be getting tripped up.
collapse
. – Nick Cox