Dense Captioning of Video Demonstrating the Upgraded Boston Dynamics Atlas Robot

Artist and programmer Gene Kogan ran the Boston Dynamics video demonstrating their upgraded Atlas robot through the DenseCap captioning system, which tries to detect and describe the objects in each frame. The system is impressive and, at times, wildly inaccurate: in the resulting video it mislabels the robot as, among other things, a person skiing, a motorcycle, and a fire hydrant.

Captions are generated by densecap on individual video frames. The video is made by a python script which merges matching captions along sequences of consecutive frames with a set of (mostly greedy) heuristics. Presumably, it would be possible to caption sequences of regions directly rather than a naive merging algorithm, but I’m not sure how :)
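
For the curious, here is a rough idea of what that kind of greedy merging could look like. This is only a minimal sketch, not Kogan's actual script: it assumes each frame yields a list of (caption, box) pairs, and the names `iou`, `Track`, `merge_captions`, and the `min_iou` threshold are all hypothetical. A caption is extended into the next frame when the same text shows up on a box that overlaps enough with the previous one.

```python
from dataclasses import dataclass, field


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    """One caption followed across consecutive frames (hypothetical structure)."""
    caption: str
    start: int                      # index of the first frame
    boxes: list = field(default_factory=list)

    @property
    def end(self):
        return self.start + len(self.boxes) - 1


def merge_captions(frames, min_iou=0.5):
    """Greedily merge identical captions on overlapping boxes in consecutive frames.

    `frames` is a list (one entry per frame) of lists of (caption, box) pairs.
    The 0.5 overlap threshold is an assumed value, not taken from the original script.
    """
    finished, active = [], []
    for t, detections in enumerate(frames):
        unmatched = list(detections)
        still_active = []
        for track in active:
            # Greedy step: take the best-overlapping detection with the same caption.
            best, best_score = None, min_iou
            for det in unmatched:
                caption, box = det
                if caption == track.caption:
                    score = iou(track.boxes[-1], box)
                    if score >= best_score:
                        best, best_score = det, score
            if best is None:
                finished.append(track)      # caption disappeared, so close the track
            else:
                track.boxes.append(best[1])
                unmatched.remove(best)
                still_active.append(track)
        # Any detection that extended no track starts a new one.
        for caption, box in unmatched:
            still_active.append(Track(caption, t, [box]))
        active = still_active
    finished.extend(active)
    return finished
```

Because each track simply grabs its best match and never reconsiders, two nearby boxes with the same caption can steal each other's detections. That is the price of keeping it greedy.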

via Prosthetic Knowledge

Glen Tickle

Amelia's dad. Steph's husband. Writer, comedian, gentleman. Good at juggling, bad at chess.