Context:
I have a data set for the weights of truck and trailer combinations coming into my site over the span of a few years. I have organized my data by seasons as I am trying to prove that the truck:trailers in winter are noticeably heavier due to ice, snow, and mud. The theory is, if the tare weight is higher in this season (the weight of the truck after it empties its load) than its Avg tare weight (which I need to calculate from the data) it can be deduced that the truck:trailer combinations are coming in with extra weight that we pay for in part as some snow/ice/mud falls off in the trailer emptying process.
What I've done so far:
- I've defined a custom date range for my seasons
- I've grouped Truck:Trailer by: count to get a duplicates column and, all rows to keep all my details
- I've filtered out every combination I've seen less than 50 times, as i want good representation for each truck:trailer combo so that I can better emphasize repeated patterns
- I've added an index column to better keep track of the individuals before expanding the details
What I need to do:
- I only want to work with truck:trailer combinations which have weighed in for all four seasons at least once
- I need to find the average tare weight of the truck:trailer combinations based over the extended range for both summer and autumn (the dry time of the year) while preserving the raw tare data for all seasons, as I need to eventually compare the winter tare values to this average.
When I'm finished I'd like the data to look something like this