The fastest path to a working prediction is to use the pre-trained model that ships with the repository. Because the weights file (kermit_finder.h5) is stored via Git LFS, no retraining is necessary — clone the repo, install the dependencies, and you are one command away from classifying any image or video as kermit or no-kermit. This guide walks through exactly that path.
1. Prerequisites

Make sure you have the following before you begin:
  • Python 3.5 or 3.6 (the evaluation script uses asyncio and type annotations that require Python 3.5+, and TensorFlow 1.12.0 publishes no wheels for Python 3.7 or later).
  • pip — used to install all Python dependencies.
  • Git LFS — required to download the pre-trained .h5 model that is tracked as a large file in the repository.
Install Git LFS for your platform:
macOS (Homebrew)
brew install git-lfs
Ubuntu / Debian
sudo apt-get install git-lfs
After installation, initialise Git LFS for your user account (only needed once):
git lfs install
Then clone the repository so that LFS objects are downloaded automatically:
git clone https://github.com/ilirosmanaj/detect_kermit.git
cd detect_kermit
2. Install dependencies

Install all required Python packages from the pinned requirements.txt:
pip install -r requirements.txt
The file pins the following versions to ensure reproducibility (google_images_download is the only entry without a pinned version):
Package                   Version / Source
tensorflow                1.12.0
tensorflow-gpu            1.12.0
keras                     2.2.4
opencv-python             4.0.0.21
imageai                   2.0.2 (GitHub wheel URL)
pillow                    5.4.1
scipy                     1.2.0
matplotlib                3.0.2
h5py                      2.9.0
pandas                    0.23.4
google_images_download    latest
ImageAI 2.0.2 is installed directly from GitHub because it is not published to PyPI:
https://github.com/OlafenwaMoses/ImageAI/releases/download/2.0.2/imageai-2.0.2-py3-none-any.whl
On macOS, the standard pip distribution of TensorFlow does not work. Install it from the official Google storage URL instead:
python3 -m pip install --upgrade https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.12.0-py3-none-any.whl
Run this command before pip install -r requirements.txt, or replace the tensorflow==1.12.0 line in requirements.txt with the URL above.
3. Verify the pre-trained model exists

After cloning with Git LFS enabled, confirm the two files the inference script depends on are present:
ls -lh data/images/models/kermit_finder.h5
ls -lh data/images/json/model_class.json
  • data/images/models/kermit_finder.h5 — the serialised ResNet weights produced by imageai_build_model.py. This file is tracked by Git LFS and should be several hundred megabytes.
  • data/images/json/model_class.json — the class index map used at inference time. Its contents are:
{
    "0": "kermit",
    "1": "no-kermit"
}
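To illustrate how an index map like this is typically consumed, the sketch below pairs each position of a probability vector with its label. The label_scores helper and the sample probabilities are hypothetical, and the JSON is inlined rather than read from disk; the actual script may use the file differently.

```python
import json

# Contents of data/images/json/model_class.json, inlined for illustration.
CLASS_MAP_JSON = '{"0": "kermit", "1": "no-kermit"}'


def label_scores(class_map, probabilities):
    """Pair each probability with its class label (hypothetical helper)."""
    # JSON object keys are strings, so index i is looked up as str(i).
    return {class_map[str(i)]: p for i, p in enumerate(probabilities)}


class_map = json.loads(CLASS_MAP_JSON)
scores = label_scores(class_map, [99.87, 0.13])  # made-up probabilities
print(scores)
```

Because the keys are strings, swapping the two entries in the JSON file would silently swap the labels, so the file should stay in step with the model it was generated for.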
If kermit_finder.h5 is only a few hundred bytes, Git LFS did not download the real file. Run git lfs pull to fetch it:
git lfs pull
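File size is a quick heuristic, but the check can also be made explicit: a Git LFS pointer stub is a small text file whose first line begins with "version https://git-lfs.github.com/spec/v1". The sketch below tests for that prefix. The is_lfs_pointer helper is hypothetical and is demonstrated on a throwaway file rather than on the real model.

```python
import os
import tempfile

# First bytes of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` is a Git LFS pointer stub."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


# Demonstration on a temporary file that mimics a pointer stub.
with tempfile.TemporaryDirectory() as tmp:
    stub = os.path.join(tmp, "kermit_finder.h5")
    with open(stub, "wb") as handle:
        handle.write(LFS_POINTER_PREFIX + b"\noid sha256:0000\nsize 123\n")
    stub_is_pointer = is_lfs_pointer(stub)

print(stub_is_pointer)  # True
```

In the repository, calling is_lfs_pointer("data/images/models/kermit_finder.h5") and getting True would mean git lfs pull still needs to be run.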
4. Run a prediction on an image

Use kermit_model_evaluation.py with -t image to classify a single image. Pass the file path with -f:
python kermit_model_evaluation.py -t image -f kermit.jpeg
Expected output:
Predicting the kermit.jpeg image
 kermit: 99.87 no-kermit: 0.13
The script prints the probability for each class. A kermit score near 100 confirms the model has high confidence that Kermit the Frog appears in the image.
You can also classify multiple images in a single run by passing a comma-separated list of file paths:
python kermit_model_evaluation.py -t image -f kermit.jpeg,frog.jpeg,scene.jpeg
Each image is predicted independently and its result is printed in turn.
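As a rough illustration of how a command line of this shape can be handled, the sketch below parses -t and -f with argparse and splits the comma-separated list. It is not the script's actual code: the parse_cli function and the target and files destination names are assumptions made for the example.

```python
import argparse


def parse_cli(argv):
    """Parse -t/-f as they appear in the commands above (assumed layout)."""
    parser = argparse.ArgumentParser(description="Kermit prediction (sketch)")
    parser.add_argument("-t", dest="target", choices=["image", "video"],
                        required=True)
    parser.add_argument("-f", dest="files", required=True,
                        help="one path, or several separated by commas")
    args = parser.parse_args(argv)
    # Split on commas and drop empty entries such as a trailing comma.
    paths = [p.strip() for p in args.files.split(",") if p.strip()]
    return args.target, paths


target, paths = parse_cli(["-t", "image", "-f", "kermit.jpeg,frog.jpeg,scene.jpeg"])
for path in paths:
    print("Predicting the {} image".format(path))
```

One consequence of comma-separated input is that a file name containing a comma cannot be passed this way; such a file would have to be renamed first.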
If TensorFlow throws an error about no available GPU devices or a CUDA_VISIBLE_DEVICES conflict, force CPU-only execution by exporting:
export CUDA_VISIBLE_DEVICES=''
Run this in the same shell session before executing the prediction script.
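The same setting can be applied from Python when exporting a shell variable is inconvenient, for example in a notebook or a wrapper script. This is a minimal sketch, assuming the variable is set before TensorFlow initialises its devices; placing it ahead of the import is the safe ordering.

```python
import os

# Hide all CUDA devices from this process. Do this before TensorFlow is
# imported so the setting is in place when the GPU runtime starts up.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# import tensorflow as tf  # import TensorFlow only after the line above
print(repr(os.environ["CUDA_VISIBLE_DEVICES"]))
```

Unlike export, this affects only the current Python process and any child processes it starts.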
5. Run a prediction on a video

To classify every second of a video file, use -t video:
python kermit_model_evaluation.py -t video -f MuppetsEpisode3.avi
The script performs the following steps automatically:
  1. Frame extraction — OpenCV seeks to each 1-second mark (CAP_PROP_POS_MSEC) and captures one frame per second for the full duration of the video.
  2. Async batch inference — each extracted frame is submitted as an asyncio coroutine so all frames are predicted concurrently rather than sequentially.
  3. Annotation — once all predictions are collected, the confidence scores for both classes are rendered as a red text banner at the top of each frame image using cv2.putText.
  4. Output — annotated frames are saved as JPEGs in the episode3_results/ directory, named sequentially (ep3_frame0.jpg, ep3_frame1.jpg, …).
You can flip through the output directory to review which seconds of the episode contain Kermit, or stitch the annotated frames back into a video for a quick visual audit.
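The flow of the steps above can be sketched without OpenCV or the model. The example computes one seek position per second, runs a stand-in predictor for every frame concurrently, and derives the output file names. Several parts are assumptions: predict_frame is a hypothetical placeholder for the real model call, the first frame is assumed to sit at 0 ms, and run_in_executor with a thread pool is used here as one common way to overlap blocking calls from asyncio. The script itself may be structured differently.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def frame_timestamps_ms(duration_seconds):
    """Seek positions in milliseconds, one per whole second of video."""
    return [second * 1000 for second in range(int(duration_seconds))]


def predict_frame(index):
    """Hypothetical stand-in for the real, blocking model call."""
    return {"kermit": 50.0, "no-kermit": 50.0}


async def predict_all(loop, frame_count, executor):
    # run_in_executor lets blocking calls overlap instead of running in turn.
    tasks = [loop.run_in_executor(executor, predict_frame, i)
             for i in range(frame_count)]
    # gather keeps submission order, so result i belongs to frame i.
    return await asyncio.gather(*tasks)


def run_sketch(duration_seconds):
    timestamps = frame_timestamps_ms(duration_seconds)
    loop = asyncio.new_event_loop()
    try:
        with ThreadPoolExecutor(max_workers=4) as executor:
            results = loop.run_until_complete(
                predict_all(loop, len(timestamps), executor))
    finally:
        loop.close()
    names = ["ep3_frame{}.jpg".format(i) for i in range(len(timestamps))]
    return list(zip(names, timestamps, results))


for name, position_ms, scores in run_sketch(3):
    print(name, position_ms, scores)
```

Because gather preserves order, the frame index in each file name corresponds directly to the second of video it was taken from, which makes it easy to map an annotated frame back to a timestamp.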

Next Steps

Prediction Guide

Deep-dive into image and video prediction: batch processing, understanding confidence scores, and interpreting edge cases like Kermit-lookalike characters.

Training Guide

Learn how to build your own labelled dataset, configure training epochs and batch size, and train the ResNet model with ImageAI.
