3
votes

I'm attempting to capture video on an iPhone 5 for realtime upload and HLS streaming. I'm at the stage where I'm generating the video on the device (not yet uploading to the server). As these links on SO suggest, I've hacked together some code that swaps out AssetWriters every five seconds.

Right now, during development, I'm just saving the files to the device locally and pulling them out via the Xcode Organizer. I then run Apple's mediafilesegmenter to convert them to MPEG-2 TS (they're already under 10 seconds, so there's no actual segmenting happening - I assume they're just being remuxed to TS). I build the m3u8 by editing together the various index files created during this process (also manually at the moment).
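For reference, the hand-assembled playlist ends up looking roughly like this (segment names and durations are illustrative):

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:5
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:5.0,
segment0.ts
#EXTINF:5.0,
segment1.ts
#EXTINF:5.0,
segment2.ts
#EXT-X-ENDLIST
```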

When I put the assets on a server for testing, they mostly stream correctly, but I can tell when there's a segment switch because the audio briefly drops out (possibly the video too, but I can't tell for sure - it looks OK). This obviously doesn't happen for typical HLS streams segmented from a single input file. I'm at a loss as to what's causing this.

You can open my HLS stream on your iPhone here (you can hear the audio drop after 5 seconds, and again around 10 seconds).

Could something in my creation process (either on the device or in post-processing) be causing the brief audio drops? I don't think I'm dropping any sample buffers during the AssetWriter switch-outs (see code).

- (void)writeSampleBuffer:(CMSampleBufferRef)sampleBuffer ofType:(NSString *)mediaType
{
    if (!self.isStarted) {
        return;
    }

    @synchronized(self) {

        if (mediaType == AVMediaTypeVideo && !assetWriterVideoIn) {
            videoFormat = CMSampleBufferGetFormatDescription(sampleBuffer);
            CFRetain(videoFormat);
            assetWriterVideoIn = [self addAssetWriterVideoInput:assetWriter withFormatDesc:videoFormat];
            [tracks addObject:AVMediaTypeVideo];
            return;
        }

        if (mediaType == AVMediaTypeAudio && !assetWriterAudioIn) {
            audioFormat = CMSampleBufferGetFormatDescription(sampleBuffer);
            CFRetain(audioFormat);
            assetWriterAudioIn = [self addAssetWriterAudioInput:assetWriter withFormatDesc:audioFormat];
            [tracks addObject:AVMediaTypeAudio];
            return;
        }

        if (assetWriterAudioIn && assetWriterVideoIn) {
            recording = YES;
            if (assetWriter.status == AVAssetWriterStatusUnknown) {
                if ([assetWriter startWriting]) {
                    [assetWriter startSessionAtSourceTime:CMSampleBufferGetPresentationTimeStamp(sampleBuffer)];
                    if (segmentationTimer) {
                        [self setupQueuedAssetWriter];
                        [self startSegmentationTimer];
                    }
                } else {
                    [self showError:[assetWriter error]];
                }
            }

            if (assetWriter.status == AVAssetWriterStatusWriting) {
                if (mediaType == AVMediaTypeVideo) {
                    if (assetWriterVideoIn.readyForMoreMediaData) {
                        if (![assetWriterVideoIn appendSampleBuffer:sampleBuffer]) {
                            [self showError:[assetWriter error]];
                        }
                    }
                }
                else if (mediaType == AVMediaTypeAudio) {
                    if (assetWriterAudioIn.readyForMoreMediaData) {
                        if (![assetWriterAudioIn appendSampleBuffer:sampleBuffer]) {
                            [self showError:[assetWriter error]];
                        }
                    }
                }
            }
        }
    }
}

- (void)setupQueuedAssetWriter
{
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
        NSLog(@"Setting up queued asset writer...");
        queuedFileURL = [self nextFileURL];
        NSError *error = nil;
        queuedAssetWriter = [[AVAssetWriter alloc] initWithURL:queuedFileURL fileType:AVFileTypeMPEG4 error:&error];
        if (error) {
            [self showError:error];
            return;
        }
        if ([tracks objectAtIndex:0] == AVMediaTypeVideo) {
            queuedAssetWriterVideoIn = [self addAssetWriterVideoInput:queuedAssetWriter withFormatDesc:videoFormat];
            queuedAssetWriterAudioIn = [self addAssetWriterAudioInput:queuedAssetWriter withFormatDesc:audioFormat];
        } else {
            queuedAssetWriterAudioIn = [self addAssetWriterAudioInput:queuedAssetWriter withFormatDesc:audioFormat];
            queuedAssetWriterVideoIn = [self addAssetWriterVideoInput:queuedAssetWriter withFormatDesc:videoFormat];
        }
    });
}

- (void)doSegmentation
{
    NSLog(@"Segmenting...");
    AVAssetWriter *writer = assetWriter;
    AVAssetWriterInput *audioIn = assetWriterAudioIn;
    AVAssetWriterInput *videoIn = assetWriterVideoIn;
    NSURL *fileURL = currentFileURL;

    //[avCaptureSession beginConfiguration];
    @synchronized(self) {
        assetWriter = queuedAssetWriter;
        assetWriterAudioIn = queuedAssetWriterAudioIn;
        assetWriterVideoIn = queuedAssetWriterVideoIn;
    }
    //[avCaptureSession commitConfiguration];
    currentFileURL = queuedFileURL;

    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
        [audioIn markAsFinished];
        [videoIn markAsFinished];
        [writer finishWritingWithCompletionHandler:^{
            if (writer.status == AVAssetWriterStatusCompleted ) {
                [fileURLs addObject:fileURL];
            } else {
                NSLog(@"...WARNING: could not close segment");
            }
        }];
    });
}

3 Answers

1
votes

You can try inserting an #EXT-X-DISCONTINUITY tag between every segment in the m3u8, but I doubt this will work. There are a lot of things that could be going wrong here.
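If you do want to try it, here's a quick sketch of patching a playlist that way (the playlist text below is illustrative, not your actual index):

```python
def add_discontinuities(playlist):
    """Insert #EXT-X-DISCONTINUITY ahead of every segment after the first.

    Segments are introduced by their #EXTINF lines.
    """
    out, seen = [], False
    for line in playlist.splitlines():
        if line.startswith("#EXTINF"):
            if seen:
                out.append("#EXT-X-DISCONTINUITY")
            seen = True
        out.append(line)
    return "\n".join(out)

src = "#EXTM3U\n#EXTINF:5.0,\nseg0.ts\n#EXTINF:5.0,\nseg1.ts\n#EXT-X-ENDLIST"
patched = add_discontinuities(src)
```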

Assuming you are sampling audio at 44.1 kHz, there is a new audio sample every ~22.7 microseconds. During the time you are closing and reopening the file, you are definitely losing samples. If you concatenate the final waveform, it will play back slightly faster than real time due to this loss. In reality, this is probably not an issue.
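To put numbers on that (the 30 ms gap is a made-up figure purely for illustration):

```python
SAMPLE_RATE = 44100  # Hz

# Duration of one audio sample, in microseconds.
sample_period_us = 1e6 / SAMPLE_RATE  # ~22.7 µs

# Samples lost in a hypothetical 30 ms gap while swapping writers.
gap_ms = 30
samples_lost = int(gap_ms * 1e-3 * SAMPLE_RATE)
```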

As @vipw said, you will also have timestamp issues. Every time you start a new mp4, you are starting from timestamp zero, so the player gets confused because the timestamps keep getting reset.
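Conceptually, the fix is to rebase each segment's timestamps onto one shared timeline instead of letting each segment restart at zero. A sketch of the arithmetic, with illustrative values and an assumed 30 fps frame duration:

```python
def rebase_timestamps(segments):
    """Shift each segment's PTS values onto one continuous timeline.

    `segments` is a list of per-segment PTS lists (in seconds),
    each segment starting again from 0.
    """
    rebased, offset = [], 0.0
    for pts_list in segments:
        rebased.append([pts + offset for pts in pts_list])
        # Next segment starts where this one ended; assume the last
        # PTS plus one 30 fps frame marks the segment's end.
        offset = rebased[-1][-1] + 1 / 30
    return rebased

segments = [[0.0, 1.0, 2.0], [0.0, 1.0]]  # each restarts at zero
timeline = rebase_timestamps(segments)
```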

Then there is the transport stream format. TS encapsulates each frame into 'streams'. HLS typically has four (PAT, PMT, audio and video), and each stream is split into 188-byte packets with a 4-byte header. Each header carries a per-stream 4-bit continuity counter that wraps around on overflow. So, by running mediafilesegmenter on every mp4 separately, you are breaking the stream at every segment by resetting the continuity counter back to zero.
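For reference, the continuity counter is the low nibble of the fourth header byte of each 188-byte packet, incrementing mod 16 per PID. A toy checker like this (synthetic packets, not a real TS parser) shows how a counter reset at a segment boundary registers as a discontinuity:

```python
TS_PACKET_SIZE = 188

def continuity_errors(ts_bytes):
    """Count continuity-counter discontinuities per PID in a TS byte stream."""
    last_cc, errors = {}, 0
    for i in range(0, len(ts_bytes), TS_PACKET_SIZE):
        pkt = ts_bytes[i:i + TS_PACKET_SIZE]
        if len(pkt) < TS_PACKET_SIZE or pkt[0] != 0x47:  # 0x47 = sync byte
            continue
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        cc = pkt[3] & 0x0F  # low nibble: continuity counter
        if pid in last_cc and cc != (last_cc[pid] + 1) % 16:
            errors += 1
        last_cc[pid] = cc
    return errors

def packet(pid, cc):
    """Build a minimal 188-byte payload-only packet for testing."""
    return bytes([0x47, (pid >> 8) & 0x1F, pid & 0xFF, 0x10 | cc]) + bytes(184)

good = b"".join(packet(256, cc % 16) for cc in range(20))   # wraps cleanly
reset = good + packet(256, 0)  # counter jumps back to 0: discontinuity
```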

You need a tool that accepts mp4 input and creates a streaming output that maintains/rewrites the timestamps (PTS, DTS, CTS), as well as the continuity counters.

1
votes

Shifting Packets

We had trouble using older versions of ffmpeg's PTS filter to shift packets. The more recent ffmpeg 1.x and 2.x releases support time shifts for mpegts.

Here's an example of an ffmpeg call; note -t for the duration and -initial_offset for the shift at the end of each output spec. This produces segments with a 10-second shift (the command is broken across lines for readability):

/opt/ffmpeg -i /tmp/cameo/58527/6fc2fa1a7418bf9d4aa90aa384d0eef2244631e8 -threads 0 \
  -ss 10 -i /tmp/cameo/58527/79e684d793e209ebc9b12a5ad82298cb5e94cb54 \
  -codec:v libx264 -pix_fmt yuv420p -preset veryfast -strict -2 -bsf:v h264_mp4toannexb -flags -global_header \
  -crf 28 -profile:v baseline -x264opts level=3:keyint_min=24:keyint=24:scenecut=0 \
  -b:v 100000 -bt 100000 -bufsize 100000 -maxrate 100000 -r 12 -s 320x180 \
  -map 0:0 -map 1:0 -codec:a aac -strict -2 -b:a 64k -ab 64k -ac 2 -ar 44100 \
  -t 9.958333333333334 -segment_time 10.958333333333334 -f segment -initial_offset 10 \
  -segment_format mpegts -y /tmp/cameo/58527/100K%01d.ts \
  -codec:v libx264 -pix_fmt yuv420p -preset veryfast -strict -2 -bsf:v h264_mp4toannexb -flags -global_header \
  -crf 28 -profile:v baseline -x264opts level=3:keyint_min=24:keyint=24:scenecut=0 \
  -b:v 200000 -bt 200000 -bufsize 200000 -maxrate 200000 -r 12 -s 320x180 \
  -map 0:0 -map 1:0 -codec:a aac -strict -2 -b:a 64k -ab 64k -ac 2 -ar 44100 \
  -t 9.958333333333334 -segment_time 10.958333333333334 -f segment -initial_offset 10 \
  -segment_format mpegts -y /tmp/cameo/58527/200K%01d.ts \
  -codec:v libx264 -pix_fmt yuv420p -preset veryfast -strict -2 -bsf:v h264_mp4toannexb -flags -global_header \
  -crf 28 -profile:v baseline -x264opts level=3:keyint_min=24:keyint=24:scenecut=0 \
  -b:v 364000 -bt 364000 -bufsize 364000 -maxrate 364000 -r 24 -s 320x180 \
  -map 0:0 -map 1:0 -codec:a aac -strict -2 -b:a 64k -ab 64k -ac 2 -ar 44100 \
  -t 9.958333333333334 -segment_time 10.958333333333334 -f segment -initial_offset 10 \
  -segment_format mpegts -y /tmp/cameo/58527/364K%01d.ts \
  -codec:v libx264 -pix_fmt yuv420p -preset veryfast -strict -2 -bsf:v h264_mp4toannexb -flags -global_header \
  -crf 28 -profile:v baseline -x264opts level=3:keyint_min=24:keyint=24:scenecut=0 \
  -b:v 664000 -bt 664000 -bufsize 664000 -maxrate 664000 -r 24 -s 480x270 \
  -map 0:0 -map 1:0 -codec:a aac -strict -2 -b:a 64k -ab 64k -ac 2 -ar 44100 \
  -t 9.958333333333334 -segment_time 10.958333333333334 -f segment -initial_offset 10 \
  -segment_format mpegts -y /tmp/cameo/58527/664K%01d.ts \
  -codec:v libx264 -pix_fmt yuv420p -preset veryfast -strict -2 -bsf:v h264_mp4toannexb -flags -global_header \
  -crf 23 -profile:v baseline -x264opts level=3.1:keyint_min=24:keyint=24:scenecut=0 \
  -b:v 1264000 -bt 1264000 -bufsize 1264000 -maxrate 1264000 -r 24 -s 640x360 \
  -map 0:0 -map 1:0 -codec:a aac -strict -2 -b:a 64k -ab 64k -ac 2 -ar 44100 \
  -t 9.958333333333334 -segment_time 10.958333333333334 -f segment -initial_offset 10 \
  -segment_format mpegts -y /tmp/cameo/58527/1264K%01d.ts

There's also the adaptation of the C++ segmenter that I've updated on GitHub, but it has only been reasonably tested for video-only mpegts. AV still causes it some issues (I wasn't confident whether the first video packet or the first audio packet should be shifted to the new value, and opted for the first video packet). And, as you noted in your issue, it can have problems with certain media.

If I had more time on my hands, I'd like to debug your specific case and improve the C++ shifter. I hope the ffmpeg example above helps get your HTTP Live Streaming working; we've gone through our share of streaming trouble. We're currently working around an audio pop that occurs at shifted segments. The fix is to gather all the source media before splitting it into segmented streams (which we can do when we finalize a video, but it would slow us down during iterative builds).

0
votes

I think your ts files aren't being created on the same timeline. The ts files carry the presentation timestamps of their packets, and if each segment starts a new timeline, there will be a discontinuity at every boundary.

What might work is to concatenate the recorded segments together so that each new part is timestamped on the same timeline. Then segmenting should work properly, and the segment transitions should be smooth in the generated stream.

I think you need a process that always keeps the last part of the previous segment, so that the timestamps stay synchronized.