Dense Captioning of Video Demonstrating the Upgraded Boston Dynamics Atlas Robot
Artist and programmer Gene Kogan ran the Boston Dynamics video demonstrating the company's upgraded Atlas robot through DenseCap, a dense captioning system that tries to locate regions of an image and describe each one in natural language. The results are both impressive and at times wildly inaccurate: in the resulting video, the robot is labeled as a variety of incorrect things, such as a person skiing, a motorcycle, or a fire hydrant.
Captions are generated by densecap on individual video frames. The video is made by a python script which merges matching captions along sequences of consecutive frames with a set of (mostly greedy) heuristics. Presumably, it would be possible to caption sequences of regions directly rather than a naive merging algorithm, but I’m not sure how :)
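Kogan's actual script and heuristics are not shown here, but the general idea of greedily merging per-frame captions can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not his implementation: it assumes each frame yields a list of (caption, box) pairs, and that two detections "match" when their caption text is identical and their boxes overlap enough. The names `Track`, `iou`, `merge_captions`, and the `min_iou` threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout
Detection = Tuple[str, Box]              # (caption text, bounding box)


@dataclass
class Track:
    """One caption held over a run of consecutive frames."""
    caption: str
    start: int  # index of the first frame in the run
    end: int    # index of the last frame in the run
    box: Box    # most recent box, used to match against the next frame


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def merge_captions(frames: List[List[Detection]],
                   min_iou: float = 0.5) -> List[Track]:
    """Greedily extend each open track with the first matching detection."""
    finished: List[Track] = []
    active: List[Track] = []
    for idx, detections in enumerate(frames):
        unmatched = list(detections)
        still_active: List[Track] = []
        for track in active:
            # Greedy choice: take the first detection that matches,
            # without looking for a globally best assignment.
            match = next(
                (d for d in unmatched
                 if d[0] == track.caption and iou(d[1], track.box) >= min_iou),
                None,
            )
            if match is None:
                finished.append(track)  # the run ended on the previous frame
            else:
                unmatched.remove(match)
                track.end = idx
                track.box = match[1]
                still_active.append(track)
        # Every detection that extended nothing starts a new track.
        for caption, box in unmatched:
            still_active.append(Track(caption, idx, idx, box))
        active = still_active
    finished.extend(active)
    return sorted(finished, key=lambda t: (t.start, t.caption))


# Made-up detections for three frames, purely for illustration.
frames = [
    [("a person skiing", (10.0, 10.0, 50.0, 90.0))],
    [("a person skiing", (12.0, 10.0, 52.0, 90.0))],
    [("a red fire hydrant", (12.0, 10.0, 52.0, 90.0))],
]
for track in merge_captions(frames):
    print(track.caption, track.start, track.end)
```

Because matching here requires identical caption text, a region whose description flips from one frame to the next is split into separate short runs, which is one way a naive merge of this kind can produce flickering labels.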
interestingly, densecap never mentions robots. atlas is variously described as person, motorcycle, fire hydrant, etc pic.twitter.com/roInNdoKKM
— Gene Kogan (@genekogan) July 1, 2016