I am following this LABEL DETECTION TUTORIAL.
The code below does the following (after getting the response back):
Our response will contain result within an AnnotateVideoResponse, which consists of a list of annotationResults, one for each video sent in the request. Because we sent only one video in the request, we take the first segmentLabelAnnotations of the results. We then loop through all the labels in segmentLabelAnnotations. For the purpose of this tutorial, we only display video-level annotations. To identify video-level annotations, we pull segment_label_annotations data from the results. Each segment label annotation includes a description (segment_label.description), a list of entity categories (category_entity.description) and where they occur in segments by start and end time offsets from the beginning of the video.
segment_labels = result.annotation_results[0].segment_label_annotations
for i, segment_label in enumerate(segment_labels):
    print('Video label description: {}'.format(
        segment_label.entity.description))
    for category_entity in segment_label.category_entities:
        print('\tLabel category description: {}'.format(
            category_entity.description))
    for i, segment in enumerate(segment_label.segments):
        start_time = (segment.segment.start_time_offset.seconds +
                      segment.segment.start_time_offset.nanos / 1e9)
        end_time = (segment.segment.end_time_offset.seconds +
                    segment.segment.end_time_offset.nanos / 1e9)
        positions = '{}s to {}s'.format(start_time, end_time)
        confidence = segment.confidence
        print('\tSegment {}: {}'.format(i, positions))
        print('\tConfidence: {}'.format(confidence))
    print('\n')
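To keep the structure straight in my head, this is the rough shape of the response that the snippet above walks, with the seconds+nanos conversion pulled out as a small helper (my own sketch based on the tutorial code, not the full schema):

# Rough shape of the response navigated above (my sketch, not the full schema):
#
#   AnnotateVideoResponse
#     annotation_results[]           -> one entry per video in the request
#       segment_label_annotations[]  -> video-level labels
#         entity.description         -> e.g. 'urban area'
#         category_entities[]        -> e.g. 'city'
#         segments[]
#           segment.start_time_offset.{seconds, nanos}
#           segment.end_time_offset.{seconds, nanos}
#           confidence

def offset_to_seconds(offset):
    """Convert a {seconds, nanos} time offset into float seconds."""
    return offset.seconds + offset.nanos / 1e9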
So, it says "Each segment label annotation includes a description (segment_label.description), a list of entity categories (category_entity.description) and where they occur in segments by start and end time offsets from the beginning of the video."
But in the output, all the labels (urban area, traffic, vehicle, ...) have the same start and end time offsets, which are basically the start and the end of the whole video.
$ python label_det.py gs://cloud-ml-sandbox/video/chicago.mp4
Operation us-west1.4757250774497581229 started: 2017-01-30T01:46:30.158989Z
Operation processing ...
The video has been successfully processed.
Video label description: urban area
	Label category description: city
	Segment 0: 0.0s to 38.752016s
	Confidence: 0.946980476379
Video label description: traffic
	Segment 0: 0.0s to 38.752016s
	Confidence: 0.94105899334
Video label description: vehicle
	Segment 0: 0.0s to 38.752016s
	Confidence: 0.919958174229
...
Why is this happening? Why does the API return these offsets for all the labels instead of the start and end time offsets of the segment where that particular label (entity) actually appears? (I suspect it has something to do with the video-level annotation, but I am not sure.)

How can I get the start and end time offsets of the segments where the labels actually appear?
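In case it helps to show what I have been looking at: the AnnotateVideoResponse reference also lists a shot_label_annotations field next to segment_label_annotations, so I assume shot-level labels (one segment per detected shot) would be read roughly like this; the field names come from the API reference, and I have not confirmed this is the intended approach:

# Sketch, assuming annotation_results also carries shot-level labels in
# shot_label_annotations (per the AnnotateVideoResponse reference); each
# segment here would be one detected shot rather than the whole video.
shot_labels = result.annotation_results[0].shot_label_annotations
for shot_label in shot_labels:
    print('Shot label description: {}'.format(shot_label.entity.description))
    for i, segment in enumerate(shot_label.segments):
        start_time = (segment.segment.start_time_offset.seconds +
                      segment.segment.start_time_offset.nanos / 1e9)
        end_time = (segment.segment.end_time_offset.seconds +
                    segment.segment.end_time_offset.nanos / 1e9)
        print('\tShot segment {}: {}s to {}s (confidence: {})'.format(
            i, start_time, end_time, segment.confidence))

I also noticed a LabelDetectionConfig.label_detection_mode setting (SHOT_MODE, FRAME_MODE, SHOT_AND_FRAME_MODE) in the reference, but I am not sure whether it has to be set explicitly for these shot-level labels to show up.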