
I've been trying to do some audio processing and I'm really stuck on a stereo-to-mono conversion. I searched the internet for how to do it.

As far as I know, I can take the left and right channels, sum them, and divide by 2. But when I dump the result into a WAV file again, I get a lot of foreground noise. I know that noise can appear when processing the data, for example if there is an overflow in a byte variable.

This is my class for retrieving byte[] data chunks from an MP3 file:

public class InputSoundDecoder {

private int BUFFER_SIZE = 128000;
private String _inputFileName;
private File _soundFile;
private AudioInputStream _audioInputStream;
private AudioFormat _audioInputFormat;
private AudioFormat _decodedFormat;
private AudioInputStream _audioInputDecodedStream;

public InputSoundDecoder(String fileName) throws UnsuportedSampleRateException{
    this._inputFileName = fileName;
    this._soundFile = new File(this._inputFileName);
    try{
        this._audioInputStream = AudioSystem.getAudioInputStream(this._soundFile);
    }
    catch (Exception e){
        e.printStackTrace();
        System.err.println("Could not open file: " + this._inputFileName);
        System.exit(1);
    }

    this._audioInputFormat = this._audioInputStream.getFormat();

    this._decodedFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 2, 1, 44100, false);
    this._audioInputDecodedStream = AudioSystem.getAudioInputStream(this._decodedFormat, this._audioInputStream);

    /** Supported sample rates */
    switch((int)this._audioInputFormat.getSampleRate()){
        case 22050:
                this.BUFFER_SIZE = 2304;
            break;

        case 44100:
                this.BUFFER_SIZE = 4608;
            break;

        default:
            throw new UnsuportedSampleRateException((int)this._audioInputFormat.getSampleRate());
    }

    System.out.println ("# Channels: " + this._decodedFormat.getChannels());
    System.out.println ("Sample size (bits): " + this._decodedFormat.getSampleSizeInBits());
    System.out.println ("Frame size: " + this._decodedFormat.getFrameSize());
    System.out.println ("Frame rate: " + this._decodedFormat.getFrameRate());

}

public byte[] getSamples(){
    byte[] abData = new byte[this.BUFFER_SIZE];
    int bytesRead = 0;

    try{
        bytesRead = this._audioInputDecodedStream.read(abData,0,abData.length);
    }
    catch (Exception e){
        e.printStackTrace();
        System.err.println("Error getting samples from file: " + this._inputFileName);
        System.exit(1);
    }

    if (bytesRead > 0)
        return abData;
    else
        return null;
}

}

This means that every time I call getSamples, it returns an array like:

buff = {Lchannel, Rchannel, Lchannel, Rchannel,Lchannel, Rchannel,Lchannel, Rchannel...}

The processing routine and the conversion to mono look like:

byte[] buff = null;
while ((buff = _input.getSamples()) != null) {

    /** Convert to mono */
    byte[] mono = new byte[buff.length / 2];

    for (int i = 0; i < mono.length / 2; ++i) {
        int left = (buff[i * 4] << 8) | (buff[i * 4 + 1] & 0xff);
        int right = (buff[i * 4 + 2] << 8) | (buff[i * 4 + 3] & 0xff);
        int avg = (left + right) / 2;
        short m = (short) avg; /* Mono is an average between 2 channels (stereo) */
        mono[i * 2] = (byte) ((short) (m >> 8));
        mono[i * 2 + 1] = (byte) (m & 0xff);
    }

}

I then write to the WAV file using:

     public static void writeWav(byte [] theResult, int samplerate, File outfile) {
        // now convert theResult into a wav file
        // probably should use a file if samplecount is too big!
        int theSize = theResult.length;


        InputStream is = new ByteArrayInputStream(theResult);
        //Short2InputStream sis = new Short2InputStream(theResult);

        AudioFormat audioF = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                samplerate,
                16,
                1,          // channels
                2,          // framesize
                samplerate,
                false
        );

        AudioInputStream ais = new AudioInputStream(is, audioF, theSize);

        try {
            AudioSystem.write(ais, AudioFileFormat.Type.WAVE, outfile);
        } catch (IOException ioe) {
            System.err.println("IO Exception; probably just done with file");
            return;
        }


    }

with 44100 as the sample rate.

Keep in mind that the byte[] array I get is already PCM, since the MP3 -> PCM conversion is done by specifying

 this._decodedFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 2, 1, 44100, false);
this._audioInputDecodedStream = AudioSystem.getAudioInputStream(this._decodedFormat, this._audioInputStream);

As I said, when writing to the WAV file I get a lot of noise. I intend to apply an FFT to every chunk of bytes, but I think the result is incorrect because of the noisy sound.

I'm taking two songs, one of which is a 20-second crop of the other, and when I compare the FFT result of the crop with the corresponding 20-second subset of the original, they don't match at all.

I think the reason is the incorrect stereo -> mono conversion.

I hope someone knows something about this,

Regards.

If it is caused by an overflow, why not divide by 2 and then sum? - James

You may be getting the endianness of the data wrong. Try doing something like reading and writing with no conversion, or better yet, put a known clean data source through it (perhaps a square wave using only 2 distinct amplitude values) and examine the raw bytes of the output. With a little experience, many types of problems can quickly be recognized if you graph the signal in audio software. - Chris Stratton

If I don't convert, all I have from an MP3 file is raw encoded bytes. Conversion is not an optional step; it has to be done in order to have real sound values in the array. Dividing and then summing gives the same result... - Mario

1 Answer


As pointed out in the comments, endianness may be wrong. Also, converting to a signed short and shifting it may cause the first byte to be 0xFF.

Try:

int HI = 0; int LO = 1;
int left = (buff[i * 4 + HI] << 8) | (buff[i * 4 + LO] & 0xff);
int right = (buff[i * 4 + 2 + HI] << 8) | (buff[i * 4 + 2 + LO] & 0xff);
int avg = (left + right) / 2;
mono[i * 2 + HI] = (byte)((avg >> 8) & 0xff);
mono[i * 2 + LO] = (byte)(avg & 0xff);

Then switch the values of HI and LO to see if it gets better.
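
For reference, here is a self-contained sketch of the same idea with the byte order as a parameter. The class name StereoToMono, the method downmix, and the validBytes parameter are hypothetical names chosen for illustration; they are not part of the question's code. Note that the decoded AudioFormat in the question passes false as its last (bigEndian) argument, which requests little-endian samples, while the original loop reads the high byte first, so false is probably the value to try first. The validBytes parameter is there because read() can return fewer bytes than the buffer holds, in which case the tail of the buffer should not be converted.

```java
final class StereoToMono {

    /**
     * Averages interleaved 16-bit stereo PCM (L, R, L, R, ...) into 16-bit mono.
     * bigEndian must match the AudioFormat the bytes were decoded with.
     * Only the first validBytes bytes are used; a trailing partial frame is ignored.
     */
    static byte[] downmix(byte[] stereo, int validBytes, boolean bigEndian) {
        int hi = bigEndian ? 0 : 1;   // offset of the most significant byte in a sample
        int lo = 1 - hi;              // offset of the least significant byte
        int frames = validBytes / 4;  // 2 channels * 2 bytes per sample
        byte[] mono = new byte[frames * 2];
        for (int i = 0; i < frames; i++) {
            // The high byte keeps its sign; the low byte is masked so it stays unsigned.
            int left = (stereo[i * 4 + hi] << 8) | (stereo[i * 4 + lo] & 0xff);
            int right = (stereo[i * 4 + 2 + hi] << 8) | (stereo[i * 4 + 2 + lo] & 0xff);
            int avg = (left + right) / 2; // computed in int, so the sum cannot overflow
            mono[i * 2 + hi] = (byte) (avg >> 8);
            mono[i * 2 + lo] = (byte) avg;
        }
        return mono;
    }

    public static void main(String[] args) {
        // Two little-endian frames: (L=1000, R=3000) and (L=-2000, R=-4000).
        byte[] stereo = {
            (byte) 0xE8, 0x03, (byte) 0xB8, 0x0B,
            0x30, (byte) 0xF8, 0x60, (byte) 0xF0
        };
        byte[] mono = downmix(stereo, stereo.length, false);
        for (int i = 0; i < mono.length; i += 2) {
            // Decode each little-endian mono sample; prints 2000, then -3000.
            System.out.println((short) ((mono[i + 1] << 8) | (mono[i] & 0xff)));
        }
    }
}
```

Feeding it a few hand-built samples like this, as suggested in the comments, makes a byte-order mistake obvious: with the wrong flag the decoded averages are nowhere near the expected values.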