Computer vision is the discipline concerned with enabling machines to interpret and understand visual information from the world. It draws on mathematics, physics, and machine learning to solve problems that humans perform effortlessly: recognizing objects, estimating depth, reading text, tracking motion, and reconstructing three-dimensional scenes from photographs. This course, Visión por Computador (IEE3714 / IIC3724) at Pontificia Universidad Católica de Chile, is taught by Professor Domingo Mery. It covers the theory and practice of computer vision from geometric foundations through modern deep learning, and concludes with a frank discussion of ethics and the societal impact of AI.
What is computer vision?
At its core, computer vision asks: given one or more images, what can a machine infer about the physical world that produced them? That deceptively simple question leads to a rich set of sub-problems:
- Recognition: identifying objects, faces, scenes, and text in images.
- Reconstruction: recovering the 3D structure of a scene from 2D projections.
- Detection and tracking: locating objects across space and time.
- Segmentation: partitioning an image into semantically meaningful regions.
- Generation: synthesizing new, realistic images (GANs, diffusion models).
A brief history of the field
Computer vision has evolved over roughly six decades:
| Era | Key developments |
|---|---|
| 1960s–70s | Early edge detection, block-world scene understanding, early neural networks |
| 1980s | Scale-space theory, optical flow, stereo vision, Marr’s computational framework |
| 1990s | Statistical shape models, SIFT-like features, structure from motion |
| 2000s | Face detection (Viola–Jones), SIFT, large-scale datasets |
| 2012–present | Deep learning revolution — AlexNet, YOLO, ResNet, GANs, Transformers, diffusion |
Course philosophy and approach
The course is built on three convictions:
- Geometry first. Understanding how cameras project the 3D world into 2D images (homogeneous coordinates, homographies, epipolar geometry) is prerequisite knowledge for everything else.
- Hands-on learning. Every theoretical topic is paired with a Google Colab notebook. You will implement calibration, RANSAC, CNNs, YOLO, UNet, GANs, and Transformers, not just read about them.
- Responsible practice. The final chapter of the course addresses bias, fairness, explainability, and data-protection law, because deploying vision systems in the real world carries real ethical weight.
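The "geometry first" conviction rests on the pinhole camera model, in which a 3D point is projected to the image as x ~ K [R | t] X using homogeneous coordinates. As a minimal sketch (an illustrative assumption, not code from the course notebooks), the snippet below builds the projection matrix and divides by the last homogeneous coordinate to obtain pixel coordinates. The intrinsic matrix `K`, the identity pose, and the helper names `matmul` and `project` are hypothetical values chosen for illustration; plain Python lists are used so it runs without NumPy.

```python
# Illustrative pinhole-projection sketch (hypothetical helpers and values).

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def project(K, R, t, X):
    """Project 3D point X = (x, y, z) to pixel coordinates (u, v).

    K is the 3x3 intrinsic matrix, R a 3x3 rotation, t a length-3 translation.
    """
    # Extrinsic matrix [R | t] (3x4): append t as a fourth column.
    Rt = [R[i] + [t[i]] for i in range(3)]
    # Projection matrix P = K [R | t] (3x4).
    P = matmul(K, Rt)
    # Homogeneous 3D point as a 4x1 column vector.
    Xh = [[X[0]], [X[1]], [X[2]], [1.0]]
    x = matmul(P, Xh)
    w = x[2][0]
    if w == 0:
        raise ValueError("point lies on the camera's principal plane")
    # Dehomogenize: divide by the third coordinate.
    return x[0][0] / w, x[1][0] / w

# Hypothetical camera: focal length 800 px, principal point (320, 240).
K = [[800, 0, 320], [0, 800, 240], [0, 0, 1]]
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # camera aligned with the world axes
t = [0.0, 0.0, 0.0]                    # camera at the world origin

print(project(K, R, t, (0.5, -0.25, 2.0)))
print(project(K, R, t, (1.0, -0.5, 4.0)))
```

Both example points lie on the same ray through the camera center, so they land on the same pixel, (520.0, 140.0). The division by the homogeneous coordinate is what discards depth, which is why a single image alone cannot fix the scale of a scene.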
All lecture slides, Colab notebooks, recorded classes, and practice exercises are available in the course GitHub repository at github.com/domingomery/vision.
Continue exploring
Course Overview
Full 28-class schedule, chapter structure, grading, and exam resources.
Bibliography & Resources
Core textbooks, supplementary videos, and reference materials.
