Computer vision is the discipline concerned with enabling machines to interpret and understand visual information from the world. It draws on mathematics, physics, and machine learning to solve problems that humans perform effortlessly: recognizing objects, estimating depth, reading text, tracking motion, and reconstructing three-dimensional scenes from photographs. This course — Visión por Computador (IEE3714 / IIC3724) at Pontificia Universidad Católica de Chile — is taught by Professor Domingo Mery. It covers the theory and practice of computer vision from geometric foundations through modern deep learning, and concludes with a frank discussion of ethics and the societal impact of AI.

What is computer vision?

At its core, computer vision asks: given one or more images, what can a machine infer about the physical world that produced them? That deceptively simple question leads to a rich set of sub-problems:
  • Recognition — identifying objects, faces, scenes, and text in images.
  • Reconstruction — recovering the 3D structure of a scene from 2D projections.
  • Detection and tracking — locating objects across space and time.
  • Segmentation — partitioning an image into semantically meaningful regions.
  • Generation — synthesizing new, realistic images (GANs, diffusion models).
Applications span medicine (X-ray analysis, surgical robotics), autonomous vehicles, industrial inspection, augmented reality, biometrics, and scientific imaging.

A brief history of the field

Computer vision has evolved over roughly six decades:
| Era | Key developments |
| --- | --- |
| 1960s–70s | Early edge detection, block-world scene understanding, early neural networks |
| 1980s | Scale-space theory, optical flow, stereo vision, Marr's computational framework |
| 1990s | Statistical shape models, SIFT-like features, structure from motion |
| 2000s | Face detection (Viola–Jones), SIFT, large-scale datasets |
| 2012–present | Deep learning revolution: AlexNet, YOLO, ResNet, GANs, Transformers, diffusion |
The course history lectures (Classes 2–4) trace this arc in detail, from Renaissance perspective machines through the ImageNet era and beyond. The Khan Academy notes on vanishing points and the perspective machine video provide accessible visual context for why geometry was the starting point of the field.

Course philosophy and approach

The course is built on three convictions:
  1. Geometry first. Understanding how cameras project the 3D world into 2D images — homogeneous coordinates, homographies, epipolar geometry — is prerequisite knowledge for everything else.
  2. Hands-on learning. Every theoretical topic is paired with a Google Colab notebook. You will implement calibration, RANSAC, CNNs, YOLO, UNet, GANs, and Transformers, not just read about them.
  3. Responsible practice. The final chapter of the course addresses bias, fairness, explainability, and data-protection law, because deploying vision systems in the real world carries real ethical weight.
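The "geometry first" conviction can be made concrete with a tiny example of what homogeneous coordinates buy you: projecting a 3D world point into a 2D image through a pinhole camera becomes a single matrix multiplication. This is a minimal sketch, not course code; the intrinsic matrix values and camera pose below are illustrative assumptions.

```python
import numpy as np

# Assumed intrinsics (focal length 800 px, principal point at 320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

R = np.eye(3)                        # camera aligned with world axes
t = np.array([[0.0], [0.0], [5.0]])  # world origin 5 units in front of camera
P = K @ np.hstack([R, t])            # 3x4 projection matrix P = K [R | t]

X = np.array([1.0, 0.5, 0.0, 1.0])   # 3D point in homogeneous coordinates
x = P @ X                            # homogeneous image point (u*w, v*w, w)
u, v = x[0] / x[2], x[1] / x[2]      # perspective divide gives pixel coords
print(u, v)                          # -> 480.0 320.0
```

The perspective divide in the last step is the algebraic form of the vanishing-point geometry discussed in the history lectures: points farther along the optical axis (larger w) are pulled toward the principal point.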
Primary tools are Python (via Google Colab) and MATLAB. No prior computer vision experience is assumed, but familiarity with linear algebra, calculus, and basic Python is expected.
All lecture slides, Colab notebooks, recorded classes, and practice exercises are available in the course GitHub repository at github.com/domingomery/vision.

Continue exploring

Course Overview

Full 28-class schedule, chapter structure, grading, and exam resources.

Bibliography & Resources

Core textbooks, supplementary videos, and reference materials.
