Design: A single data-receive module accepts one trial of data, where a trial is a contiguous sequence of 8-bit samples. The module sums N such trials sample by sample: the samples at the same position in each of the N trials are added together, and the resulting sums, again stored contiguously, form a single summed trial. The module then sends this summed trial to the next module for further processing.
Fig. 1: Trial 1 [SAMPLE1-1, SAMPLE1-2, SAMPLE1-3, ...] + Trial 2 [SAMPLE2-1, SAMPLE2-2, SAMPLE2-3, ...] = Summed Trial [(SAMPLE1-1 + SAMPLE2-1), (SAMPLE1-2 + SAMPLE2-2), (SAMPLE1-3 + SAMPLE2-3), ...]
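As a software reference for Fig. 1 (not the RTL itself), here is a minimal Python sketch of the element-wise summation. The function names are hypothetical, and the samples are assumed to be unsigned, which the description above does not state. The second function shows how wide each summed sample must be to avoid overflow, since the adder width directly affects how much logic the adders need.

```python
SAMPLE_BITS = 8  # sample width stated in the design description


def sum_trials(trials):
    """Add trials element-wise, as in Fig. 1 (hypothetical helper)."""
    if not trials:
        raise ValueError("need at least one trial")
    length = len(trials[0])
    if any(len(t) != length for t in trials):
        raise ValueError("all trials must have the same number of samples")
    # zip(*trials) groups the samples that share a position across trials.
    return [sum(column) for column in zip(*trials)]


def summed_sample_bits(n_trials, sample_bits=SAMPLE_BITS):
    """Bits needed to hold the sum of n_trials samples without overflow.

    Assumes unsigned samples: the worst case is every sample at its
    maximum value, 2**sample_bits - 1.
    """
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    max_sum = n_trials * ((1 << sample_bits) - 1)
    return max_sum.bit_length()


trial_1 = [1, 2, 3]
trial_2 = [10, 20, 30]
print(sum_trials([trial_1, trial_2]))  # [11, 22, 33]
print(summed_sample_bits(2))           # 9
```

Under these assumptions, summing two trials of 8-bit samples needs 9 bits per summed sample, and summing four needs 10.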
Currently, my RTL uses a for loop inside a generate block to instantiate the adders required to add samples across trials. I wrote the adder module myself; it uses only the plain '+' operator, and I let the synthesis tool (Vivado) decide which primitives to use.
I am looking for techniques that perform this addition with the fewest CLBs and other logic resources, whether through optimizations in my RTL, by instantiating primitives directly, or by some other approach. Any suggestions would be greatly appreciated. Thanks!