I assume your adder has been inferred via sum = A + B;. For area optimisation, why not share a single adder unit? A+B in CLK1, SUM+C in CLK2, SUM+D in CLK3. Then you have nothing to disable or clock gate.
The majority of power is used when values change, so zeroing inputs when they are not in use can actually increase power by creating unnecessary toggles. As adders are combinational logic, all we can do to save power for a given architecture is hold values stable. This could be done with clock gate cells controlling/sequencing the clocks of the input and output flip-flops.
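A minimal sketch of the shared-adder idea, under some assumptions: the module name, the WIDTH parameter, the start/done handshake and the step counter are all hypothetical and not from the question. It also assumes that A, B, C and D are held stable for the three cycles of a calculation, and that one result every three clocks is acceptable.

```verilog
// Hypothetical sketch: one adder reused over three clock cycles.
// Names, widths and the start/done handshake are assumptions.
module shared_adder #(
  parameter WIDTH = 8
) (
  input  wire             clk,
  input  wire             rst_n,
  input  wire             start,      // pulse high for one cycle to begin
  input  wire [WIDTH-1:0] A, B, C, D, // assumed stable while a sum is in progress
  output reg  [WIDTH+1:0] sum,        // 2 extra bits so four operands cannot overflow
  output reg              done        // high for one cycle when sum is complete
);
  reg [1:0] step; // 0: idle (A + B on start), 1: add C, 2: add D

  // Operand muxes feeding the single adder.
  wire [WIDTH+1:0] op_x = (step == 2'd0) ? {2'b00, A} : sum;
  wire [WIDTH+1:0] op_y = (step == 2'd0) ? {2'b00, B} :
                          (step == 2'd1) ? {2'b00, C} : {2'b00, D};
  wire [WIDTH+1:0] add_out = op_x + op_y;

  always @(posedge clk or negedge rst_n) begin
    if (~rst_n) begin
      step <= 2'd0;
      sum  <= 'b0;
      done <= 1'b0;
    end
    else begin
      done <= 1'b0;
      case (step)
        2'd0: if (start) begin
                sum  <= add_out;      // CLK1: A + B
                step <= 2'd1;
              end
        2'd1: begin
                sum  <= add_out;      // CLK2: SUM + C
                step <= 2'd2;
              end
        2'd2: begin
                sum  <= add_out;      // CLK3: SUM + D
                step <= 2'd0;
                done <= 1'b1;
              end
        default: step <= 2'd0;
      endcase
    end
  end
endmodule
```

The trade-off is area against throughput and the extra operand muxes, so whether this wins depends on the adder width and the target library.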
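To illustrate holding values stable rather than zeroing them, here is a rough sketch. The module name, the enable port and the registered operands a_q and b_q are assumptions made for the example. The input flops only load when enable is high, so the adder inputs (and therefore its internal nodes) do not toggle while the unit is idle.

```verilog
// Hypothetical sketch: operands are held, not zeroed, when the adder is idle.
module held_operand_adder #(
  parameter WIDTH = 8
) (
  input  wire             clk,
  input  wire             rst_n,
  input  wire             enable, // assumed "adder in use" signal
  input  wire [WIDTH-1:0] A, B,
  output wire [WIDTH:0]   sum
);
  reg [WIDTH-1:0] a_q, b_q;

  always @(posedge clk or negedge rst_n) begin
    if (~rst_n) begin
      a_q <= 'b0;
      b_q <= 'b0;
    end
    else if (enable) begin // no else clause: flops keep their old value
      a_q <= A;
      b_q <= B;
    end
  end

  // Combinational adder only sees activity when a_q or b_q change.
  assign sum = a_q + b_q;
endmodule
```

Whether the enable ends up as a clock gate cell or a recirculating mux is up to the synthesis tool and its clock gating settings.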
Update
This update takes into account the information that a new calculation may be required every clock cycle, and that there is an enable signal called start. The question made reference to adding them serially, i.e.:
sum1 = A + B;
sum2 = sum1 + C;
sum3 = sum2 + D;
Since the result is potentially calculated every clock cycle, the adders are either all on or all off. The given serialisation (which is all executed in parallel, within one cycle) has 3 adders strung together, giving a ripple path of 3 adders. If we refactor to:
sum1 = A + B;
sum2 = C + D;
sum3 = sum1 + sum2;
Our ripple path is now only 2 adders deep, allowing a quicker settling time, which implies fewer ripples or transients to consume power.
I would be tempted to do this all on 1 line and allow the synthesis tool to optimise it:
sum3 = A + B + C + D;
For power saving I would turn on automatic clock gating when synthesising, and use a structure that works well with this technique:
always @(posedge clk or negedge rst_n) begin
  if (~rst_n) begin
    sum3 <= 'b0;
  end
  else begin
    if (start) begin // no else clause, so this signal can clock gate the flop
      sum3 <= A + B + C + D;
    end
  end
end