Other questions and answers on this site suggest that, to create an echo or delay effect, you need only add each audio sample to a stored (and attenuated) sample from the past. With that in mind, I wrote the following Java class:

public class DelayAMod extends AudioMod {

private int delay = 500;
private float decay = 0.1f;
private boolean feedback = false;

private int delaySamples;
private short[] samples;
private int rrPointer;

@Override
public void init() {
    this.setDelay(this.delay);
    this.samples = new short[44100];
    this.rrPointer = 0;
}

public void setDecay(final float decay) {
    this.decay = Math.max(0.0f, Math.min(decay, 0.99f));
}

public void setDelay(final int msDelay) {
    this.delay = msDelay;
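    // Multiply before dividing: 44100 / (1000 / delay) truncates the inner quotient
    // and throws ArithmeticException for delays above 1000 ms.
    this.delaySamples = (int) (44100L * msDelay / 1000L);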
    System.out.println("Delay samples:"+this.delaySamples);
}

@Override
public short process(short sample) {
    System.out.println("Got:"+sample);
    if (this.feedback) {
        //Delay should feed back into the loop:
        sample = (this.samples[this.rrPointer] = this.apply(sample));
    } else {
        //No feedback - store base data, then add echo:
        this.samples[this.rrPointer] = sample;
        sample = this.apply(sample);
    }
    ++this.rrPointer;
    if (this.rrPointer >= this.samples.length) {
        this.rrPointer = 0;
    }
    System.out.println("Returning:"+sample);
    return sample;
}

private short apply(short sample) {
    int loc = this.rrPointer - this.delaySamples;
    if (loc < 0) {
        loc += this.samples.length;
    }
    System.out.println("Found:"+this.samples[loc]+" at "+loc);
    System.out.println("Adding:"+(this.samples[loc] * this.decay));
    return (short)Math.max(Short.MIN_VALUE, Math.min(sample + (int)(this.samples[loc] * this.decay), (int)Short.MAX_VALUE));
}
}

It accepts one 16-bit sample at a time from an input stream, finds an earlier sample, and adds them together accordingly. However, the output is just horrible noisy static, especially when the decay is raised to a level that would actually cause any appreciable result. Reducing the decay to 0.01 barely allows the original audio to come through, but there's certainly no echo at that point.
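
One detail worth ruling out when converting a delay in milliseconds to a sample count is integer truncation. The sketch below is only an illustration: `DelayMath` and `msToSamples` are hypothetical names, not part of the class above, and the 44100 Hz rate is taken from the buffer size used there.

```java
public class DelayMath {
    // Hypothetical helper: converts a delay in milliseconds to a sample count.
    // Multiplying first (in long arithmetic) keeps the ratio from being truncated.
    static int msToSamples(int msDelay, int sampleRate) {
        return (int) ((long) sampleRate * msDelay / 1000L);
    }

    public static void main(String[] args) {
        System.out.println(msToSamples(500, 44100)); // 22050
        System.out.println(msToSamples(300, 44100)); // 13230
        // Nested integer division: 1000 / 300 == 3, so this prints 14700.
        System.out.println(44100 / (1000 / 300));
    }
}
```

For a 500 ms delay both forms agree, so this would not by itself explain the noise described here, but it does change the echo time for delays that do not divide 1000 evenly.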

Basic troubleshooting facts:

  • The audio stream sounds fine if this processing is skipped.
  • The audio stream sounds fine if decay is 0 (nothing to add).
  • The stored samples are indeed stored and accessed in the proper order and the proper locations.
  • The stored samples are being decayed and added to the input samples properly.
  • All numbers from the call of process() to return sample are precisely what I would expect from this algorithm, and remain so even outside this class.

The problem seems to arise from simply adding signed shorts together, and the resulting waveform is an absolute catastrophe. I've seen this specific method implemented in a variety of places - C#, C++, even on microcontrollers - so why is it failing so hard here?

EDIT: It seems I've been going about this entirely wrong. I don't know if it's FFmpeg/avconv, or some other factor, but I am not working with a normal PCM signal here. Through graphing of the waveform, as well as a failed attempt at a tone generator and the resulting analysis, I have determined that this is some version of differential pulse-code modulation; pitch is determined by change from one sample to the next, and halving the intended "volume" multiplier on a pure sine wave actually lowers the pitch and leaves volume the same. (Messing with the volume multiplier on a non-sine sequence creates the same static as this echo algorithm.) As this and other DSP algorithms are intended to work on linear pulse-code modulation, I'm going to need some way to get the proper audio stream first.
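
If the stream really were a plain first-order delta encoding, where each value is the raw difference from the previous linear sample, a running sum would recover linear PCM. That is a strong assumption: real DPCM and ADPCM codecs typically quantize the differences or adapt the step size, and the actual format here is not confirmed. `DeltaDecoder` is a hypothetical name used only for this sketch.

```java
public class DeltaDecoder {
    // Hypothetical helper. Assumes unquantized first-order deltas and a
    // starting level of 0; this is NOT a decoder for any specific codec.
    static short[] decode(short[] deltas) {
        short[] pcm = new short[deltas.length];
        int acc = 0; // running linear sample value
        for (int i = 0; i < deltas.length; i++) {
            acc += deltas[i];
            // Clamp to the 16-bit range so the accumulator cannot wrap around.
            acc = Math.max(Short.MIN_VALUE, Math.min(acc, Short.MAX_VALUE));
            pcm[i] = (short) acc;
        }
        return pcm;
    }

    public static void main(String[] args) {
        short[] pcm = decode(new short[] {100, 50, -30, -120});
        System.out.println(java.util.Arrays.toString(pcm)); // [100, 150, 120, 0]
    }
}
```

The echo and other LPCM algorithms would then run on the decoded array rather than on the raw stream.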

You say "The problem seems to arise from simply adding signed shorts together, and the resulting waveform is an absolute catastrophe." but you also say "All numbers from the call of process() to return sample are precisely what I would expect from this algorithm, and remain so even outside this class." Which is it, do you get the right results or not? Perhaps you can post the output from an impulse. - Bjorn Roche
@Bjorn Well, say I have a sample of -3672. And 10 milliseconds ago, it was 1912, which gets multiplied by 0.1 to be 191. The returned value will be -3481. Numerically, it does exactly what I'd expect. But hearing the sound, and literally looking at the waveforms through an analyzer, it is clear that the "right" numbers are not creating the "right" sound. Ergo, simple offset addition apparently isn't going to work. - DigitalMan

1 Answer

It should definitely work unless you have significant clipping.

For example, this is a text file with two columns. The first (leftmost) column is the 16-bit input. The second column is the sum of the first and a copy of it delayed by 4001 samples. The sample rate is 22 kHz.

Each sample in the second column is the result of summing x[k] and x[k-4001] (e.g. y[5000] = x[5000] + x[999] = -13840 + 9181 = -4659). You can clearly hear the echo when playing the samples in the second column.

Try this signal with your code and see if you get identical results.
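
The summation described above, y[k] = x[k] + x[k-4001], can be sketched offline on a whole array, which makes it easy to compare against the output of a streaming implementation. The names `OffsetSum` and `addDelayed` and the `gain` parameter are assumptions made for this example; a gain of 1 reproduces the plain sum, and samples before the start of the signal are treated as 0.

```java
public class OffsetSum {
    // y[k] = x[k] + gain * x[k - delay], with x[k - delay] taken as 0 when k < delay.
    static short[] addDelayed(short[] x, int delay, float gain) {
        short[] y = new short[x.length];
        for (int k = 0; k < x.length; k++) {
            int delayed = (k >= delay) ? (int) (x[k - delay] * gain) : 0;
            int sum = x[k] + delayed;
            // Clamp instead of letting the 16-bit value wrap, which would sound like noise.
            y[k] = (short) Math.max(Short.MIN_VALUE, Math.min(sum, Short.MAX_VALUE));
        }
        return y;
    }

    public static void main(String[] args) {
        // Impulse test: a single non-zero sample should reappear, scaled, 4 samples later.
        short[] x = new short[10];
        x[2] = 10000;
        short[] y = addDelayed(x, 4, 0.5f);
        System.out.println(java.util.Arrays.toString(y));
        // [0, 0, 10000, 0, 0, 0, 5000, 0, 0, 0]
    }
}
```

With an impulse as input, a correct echo shows exactly one attenuated copy at the delay offset; anything else in the output points at the sample handling rather than at the addition itself.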