Google Research scientists are working to develop a machine-learning system capable of automatically generating captions for images, a project that could eventually assist the visually impaired and improve online image search.
Many efforts to construct computer-generated natural-language descriptions of images propose combining current state-of-the-art techniques in both computer vision and natural language processing to form a complete image description approach. But what if we instead merged recent computer vision and language models into a single jointly trained system, taking an image and directly producing a human-readable sequence of words to describe it?
Images via Google Research