Introduction to Computer Vision

Human beings make sense of their surroundings and act accordingly using the information gathered through their senses. One of these is sight: our perception of what we see in the environment. Vision helps us identify and classify objects, find patterns in our environment, and make decisions accordingly. Similarly, when we give computers visual data (an image, a video, or even an icon) as input so that they can see and find patterns in that data, we call it computer vision. As shown in the picture below, to a computer a typical picture is just a grid of numbers, but every number represents some information.
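To make the "picture as numbers" idea concrete, here is a minimal sketch in plain Python. The 5x5 grayscale image, the 0 to 255 value range, and the `render` helper are all assumptions made up for illustration; real images are much larger and usually have three color channels.

```python
# A hypothetical 5x5 grayscale image: 0 is black, 255 is white.
image = [
    [0,   0,   0,   0,   0],
    [0, 255, 255, 255,   0],
    [0, 255,   0, 255,   0],
    [0, 255, 255, 255,   0],
    [0,   0,   0,   0,   0],
]

height = len(image)
width = len(image[0])


def render(img, threshold=128):
    """Draw bright pixels as '#' and dark pixels as '.' so the pattern is visible."""
    return "\n".join(
        "".join("#" if value >= threshold else "." for value in row)
        for row in img
    )


print(f"{width}x{height} image, {width * height} numbers in total")
print(render(image))
```

The computer only ever sees the 25 numbers; the square outline appears once we interpret each number as a brightness value.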

As with humans, the most common tasks that computers must accomplish in vision applications are shown in the picture below.

There are various techniques that we usually use to accomplish the tasks mentioned above. Without going into complex detail, it is worth mentioning that every image has different features. What we usually do is apply various filters, in different forms, either to extract information about those features or to change them. When we humans see a cat, we recognize it by features such as two ears, four legs, two eyes and so on. Similarly, to classify an object, computers need to know its features and the patterns in the data. The conventional approach to computer vision has the following characteristics:

  • Relies on selecting and connecting computational filters to abstract key features and correlate them with an object.
  • Works well with well-defined objects and controlled scenes.
  • Makes it difficult to predict the critical features when there is a large number of objects or the scenes vary.
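As an illustration of what applying a filter to extract a feature can look like, the sketch below slides a hand-picked 3x3 kernel over a tiny image. The image, the `apply_filter` helper, and the choice of a Sobel-style kernel are assumptions for this example, not a description of any particular product or library.

```python
def apply_filter(image, kernel):
    """Slide a square kernel over the image (no padding) and return the responses."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            # Multiply each pixel under the kernel by its weight and sum the results.
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x6 grayscale image: dark left half, bright right half.
image = [[0, 0, 0, 255, 255, 255] for _ in range(4)]

# Sobel-style kernel that responds to left-to-right changes in brightness,
# that is, to vertical edges.
vertical_edge_kernel = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

edges = apply_filter(image, vertical_edge_kernel)
for row in edges:
    print(row)
```

The response is large only where the window straddles the dark-to-bright boundary and zero in the flat regions, so the filter has extracted the "vertical edge" feature. In the conventional approach, a person chooses kernels like this one by hand.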

The conventional approach faces various challenges because it is very difficult to hard-code things such as viewpoint variation, scale variation, different lighting conditions, etc. The picture below shows the most common challenges we face in computer vision.

Thanks to great advances in deep learning and the significant computational power now available, we can use deep learning for computer vision tasks with very high efficiency, which was not possible before. To learn more about deep learning, refer to the post here. The deep learning approach can be characterized as follows:

  • Applies a large number of filters to an image to extract features.
  • Analyzes the features of the object(s) with the goal of associating each input image with an output node for each type of object.
  • Assigns a value to each output node representing the probability that the image shows the object associated with that node.
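The last point, one probability per output node, can be sketched with a softmax function, which is one common way to turn raw output scores into probabilities. The text above does not say which function is used, so softmax is an assumption here, and the labels and scores are invented for illustration.

```python
import math


def softmax(scores):
    """Convert raw output-node scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum keeps exp() numerically stable without changing the result.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical output nodes and raw scores for a single input image.
labels = ["cat", "dog", "bird"]
scores = [2.0, 1.0, 0.1]

probabilities = softmax(scores)
best = max(range(len(labels)), key=lambda i: probabilities[i])

for label, p in zip(labels, probabilities):
    print(f"{label}: {p:.3f}")
print("predicted:", labels[best])
```

Each output node ends up with a value between 0 and 1, and the node with the highest value is taken as the predicted object.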

To learn more about solutions that use computer vision to solve real-world use cases, visit Tech Data’s IoT solutions catalogue page here.

Naqqash Abbassi
Lead for AI and Vision
Data and IoT Team
Tech Data Europe