Computer Vision Basics

These are my notes from the Computer Vision Basics course on Coursera. I had some free time, so I decided to revise my hand-written notes while also digitising the same.

Computer Vision strives to make computers understand visual feed and images and unpack its information, intuitively and more elaborately compared to humans and other living beings. The human visual perception system comprising the eye and the brain is analogous to computational vision systems using a camera and a processing unit. Spatial and temporal visual information packed in videos are extremely to critical to most of the intelligent applications of CV in robotics, autonomous vehicles, monitoring and scene understanding tasks.

Computer vision reads still images or a sequence of images to automatically extract all useful information performing tasks like object detection, classification, depth estimation, motion estimation, scene understanding, predictions and more while being robust to different image qualities. Computer vision attempts to develop intelligent vision systems efficient as humans but also robust to problems of optical illusion and other biological limitations of the human vision system.

Several concepts of CV are based on geometry and statistics applied to traveling light.

Artificial Intelligence - Object Detection, Classification, Semantic Segmentation and more
Digital Signal Processing - Image compression, enhancement and
Neuroscience - Natural vision pipelines

Using good open-source tools and in-built libraries/packages helps accelerate computer vision learning curve towards solving more high-level task.

Evolution Of Computer Vision

Soon after photography was discovered and cameras became a common tool to capture the real world on a paper, naive computer vision approaches were developed to understand the conversion of photographs on a paper to the real-world equivalent. Several inventions followed, including the definition of pixels, connection of a camera with a computer and the discovery of color cameras.

Optical flow, motion estimation, edge detection, character recognition were the first few algorithms discovered in the early days of computer vision. Mathematical representation of different optical phenomena and their different applications to computer vision form the basis of the modern-age computer vision.

Camera calibration, projections and, skews were important for tasks like 3D reconstruction and image stitching. It also paved the way for stereovision, the basics of which hold true, even today. The Viola-Jones algorithm using Haar Cascades and HoG revolutionised face detection algorithms in the early 2000s. Robust image features like scale and orientation invariance contributed to lots of still image computer vision. Neural networks, especially convolutional neural networks after the 2012 ImageNet challenge led the true revolution in the field of modern-day computer vision applications.

Another breakthrough happened with the proposal of Generative Adversarial Networks (GANs), a relatively new yet very powerful method of making multiple networks compete against each other to eventually improve the learning.

Applications of Computer Vision

Industrial Monitoring
Robotics
Augmented Reality
Medical Imaging
Visual Suveillance

Computer vision setup

Any computer vision system needs a camera hooked up to a processor via communication cables. The type of camera, type of communication line between the digital signal and the processor and the processor itself are all critical to the performance and capabilities of the vision system. While poor resolution cameras means less information from the scene and thus low performance, a slow processor or limited bandwidth communicatoin pipeline is equally restricts the performance.

Light Sources

Light sources provide the illumination needed for the vision systems. Their luminosity, wavelength and color determines the behavior of the imaging system. For instance, a a yellow filament bulb light may make the camera incorrect capture a white object as yellow because of the light falling on it. Color can be affected by object surfaces, emittance and remittance properties and orientation. A common assumption for CV systems is no light scattering or inter reflection. The visible light wavelength spans from 400-700 nm