Capstone Design at UM (Jan. - May 2019):
Yi Wen, Chung-Hsun Wang, I-Le Wu, Shengyi Qian, Ray Cao, Linyi Jin, Xinyi Zheng
EECS 498-09: Conversational AI: Principles and Practice, Instructor: Jason Mars

Convision is a conversational AI built on the Clinc platform. Users talk to it to find out what is happening in an input image, and they can learn more about the people in the image through follow-up dialogue. It provides a starting point for bringing vision to people with visual impairment through conversation.


Bring vision to the blind or people with visual impairment through a conversational platform.

  • Tell users what is contained in an image
  • Give the number of people in an image, along with their gender, age, and emotion
  • Understand the scene in an image and describe it


A webcam captured the visual input. Cognitive Services on Microsoft Azure analyzed the visual information.

Clinc took the spoken question as input and interpreted the natural language. It also generated audio feedback once the textual answer was complete.

The Convision server coordinated the CV and NLP tasks. The user's interest, as expressed in the question, determined what information was needed from the image. Based on that information, a human-style sentence was generated.
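
The coordination step described above can be illustrated with a minimal Python sketch. It is not the project's actual code: the intent names ("count_people", "describe_people", "describe_scene"), the shape of the analysis dictionary, and the sentence templates are all hypothetical, and the sketch does not call Clinc or Azure. It only shows how a recognized intent might select which image information to use and turn it into a sentence.

```python
from typing import Any, Callable, Dict

# Hypothetical shape of the image analysis result (an assumption, not the
# real Azure response format):
# {"caption": str, "faces": [{"age": int, "gender": str, "emotion": str}]}
Analysis = Dict[str, Any]


def count_people(analysis: Analysis) -> str:
    """Answer a question about how many people are in the image."""
    n = len(analysis["faces"])
    if n == 0:
        return "I do not see anyone in the image."
    if n == 1:
        return "There is 1 person in the image."
    return f"There are {n} people in the image."


def describe_people(analysis: Analysis) -> str:
    """Answer a follow-up question about age, gender, and emotion."""
    faces = analysis["faces"]
    if not faces:
        return "I do not see anyone in the image."
    parts = [
        f"a {f['gender']} person, about {f['age']} years old, who looks {f['emotion']}"
        for f in faces
    ]
    return "I see " + "; ".join(parts) + "."


def describe_scene(analysis: Analysis) -> str:
    """Answer a general question about what the image shows."""
    return f"The image shows {analysis['caption']}."


# Hypothetical intent names mapped to the handler that needs that information.
HANDLERS: Dict[str, Callable[[Analysis], str]] = {
    "count_people": count_people,
    "describe_people": describe_people,
    "describe_scene": describe_scene,
}


def answer(intent: str, analysis: Analysis) -> str:
    """Pick the handler for the recognized intent and build the reply text."""
    handler = HANDLERS.get(intent)
    if handler is None:
        return "Sorry, I cannot answer that about the image."
    return handler(analysis)


if __name__ == "__main__":
    sample = {
        "caption": "two people sitting on a bench",
        "faces": [
            {"age": 30, "gender": "female", "emotion": "happy"},
            {"age": 34, "gender": "male", "emotion": "neutral"},
        ],
    }
    print(answer("count_people", sample))
    print(answer("describe_people", sample))
```

In this sketch a lookup table keeps the intent-to-information mapping in one place, so supporting a new kind of question would only mean adding one handler and one table entry. In the real system the intent would come from the NLP side and the analysis from the CV side.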