Python 010: Augmented Reality with OpenCV

This will help you:

Analyze images or videos to identify features (such as faces or eyes), and add images which follow the detected features.

Time: 2-4 hours / Level: C1

You should already:

Be familiar with Python functions and slice notation for indexing lists.

Install NumPy: run `pip install numpy` in the terminal.

Install OpenCV: OpenCV's official installation instructions can be hard to follow. Try running `pip install opencv-python` in the terminal instead.

Get the code and resources for this activity by clicking below. It will allow you to download the files from a Google Drive folder. Unzip the folder and save it in a sensible location.

Glossary

Capture - a stream of images from the camera, processed one by one.

Image - a multi-dimensional array containing data for each color channel at each x and y value. See Step 3 explanation.

Multi-dimensional array - a list is a 1D array. A grid can be represented by a 2D array. Images in OpenCV are generally 3D arrays.

Color channel - one part of color in a certain mode of representing color, for example, the red values in RGB color.

Slice indexing - indicating positions in an array using square brackets [] and indexes (locations) in the array. See here.

Classifier - data produced by pre-training a machine learning model to recognize a kind of image, so that the model can be re-created later to recognize new images.

Feature - a type of image or shape that a classifier is trained to recognize, under some visibility conditions and with some accuracy.

AR/augmented reality - taking sensory information from the real world and combining it with computer-created sensory information, in this case, adding computer images to a webcam feed.

Overlay/filter - in our usage, a computerized image placed over another image or video.
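
To make the glossary terms "multi-dimensional array", "color channel", and "slice indexing" more concrete, here is a minimal sketch that builds a tiny image by hand with NumPy. The array size and colors are made up for illustration. It assumes OpenCV's default channel order of blue, green, red (BGR).

```python
import numpy as np

# A tiny "image": 2 rows (y), 3 columns (x), 3 color channels.
# OpenCV stores color in blue, green, red (BGR) order by default.
image = np.zeros((2, 3, 3), dtype=np.uint8)

# Slice indexing: set the pixel at row 0, column 2 to pure red (B=0, G=0, R=255).
image[0, 2] = (0, 0, 255)

# Take one color channel: every row, every column, channel index 2 (red).
red_channel = image[:, :, 2]

print(image.shape)        # (2, 3, 3)
print(red_channel.shape)  # (2, 3)
print(red_channel[0, 2])  # 255
```

Note that the first index is the row (y) and the second is the column (x), which is the reverse of the usual (x, y) order.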

Step 1: Warm-up - Show a video

Open `show_vid.py` and read the comments to understand what is happening. Then run the program with `python show_vid.py` in the terminal.

If you want to read about one of the objects or functions used, here is the documentation:

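`cv2.VideoCapture`

`VideoCapture.read()`

`cv2.flip()` (see how the `flipCode` parameter is used to specify horizontal flipping)

`cv2.imshow()`

`cv2.waitKey()`

`VideoCapture.release()`

`cv2.destroyAllWindows()`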

Make sure you understand this code before moving on.

Step 2: Warm-up - Detect faces

Open `face_detection.py` and read the comments to understand what is happening. Then run the program with `python face_detection.py` in the terminal.

If you want to read about one of the objects or functions used, here is the documentation:

`cv2.rectangle()` or `cv2.ellipse()`

Some of the code in this activity comes from this tutorial on detecting faces. The classifier files included in this activity come from here.

Currently, there is a minimum size for detected features. If features are not showing up, you can decrease it. You can also set a maximum size using `maxSize=(w,h)`. These limits can help narrow down what is detected, but they may also cause you to miss detections.

Make sure you understand this code before moving on.

Step 3: Warm-up - Overlay images

Open `image_overlay.py` and read the comments to understand what is happening. Then run the program with `python image_overlay.py` in the terminal.

If you want to read about one of the objects or functions used, here is the documentation:

`cv2.imread()`

`cv2.resize()` (look at the types of interpolation to see what `cv2.INTER_AREA` means)

`cv2.bitwise_not()` (images are stored as 3-dimensional arrays indexed as [y, x, color channel], so array operations work on them)

`cv2.bitwise_and()`

`cv2.add()`

You can (and should) read more about image arithmetic here. It goes through an example of creating a mask and using it to overlay two images.

See what happens if you change the indices in `mask = overlay[:, :, 3]`, or swap some uses of `mask` and `mask_inv`. Try using `cv2.imshow("label", ______)` to show some of the intermediate images `roi`, `mask`, `mask_inv`, `overlay`, `img_bg`, `img_fg`, or `dst`.

Make sure you understand this code before moving on.

Step 4: Activity - Detect multiple features

Open `multi_feature.py` and read the comments to understand what is happening. Then run the program with `python multi_feature.py` in the terminal.

The main loop just does the routine things - loading classifier files, creating a video capture stream, reading the stream, calling the above function, and showing the modified video.

The code in this activity is based on this tutorial. It provides an explanation of how feature detection works in OpenCV.

Step 5: Activity - AR filters

Open `video_overlay.py` and read the comments to understand what is happening. Start near the bottom, after `return frame`. Everything above that line is the `detectAndOverlay()` function, which is fairly long because of the work needed to position the overlay.

After that, the routine video steps happen: create a feed, read frames, process them, and show the result. There is also code for creating a file to save the video, like in `show_vid.py`.

Now run the program with `python video_overlay.py` in the terminal.

The code in this activity is based on the tutorial here.
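
Much of the positioning work is arithmetic: deciding where the overlay goes relative to a detected feature, and making sure it does not run off the edge of the frame. The helper below is a hypothetical sketch of that arithmetic, not the logic used in `detectAndOverlay()`. The name `overlay_box()` and its `scale` and `y_shift` parameters are invented for this example.

```python
def overlay_box(face, frame_shape, scale=1.0, y_shift=0.0):
    """Return (x1, y1, x2, y2) for an overlay centered on a face, clipped to the frame.

    face is (x, y, w, h); frame_shape is (height, width, ...) as in frame.shape.
    scale sizes the overlay relative to the face width and height.
    y_shift moves it up (negative) or down (positive) as a fraction of face height.
    """
    x, y, w, h = face
    frame_h, frame_w = frame_shape[:2]

    overlay_w, overlay_h = int(w * scale), int(h * scale)
    center_x = x + w // 2
    center_y = y + h // 2 + int(h * y_shift)

    x1, y1 = center_x - overlay_w // 2, center_y - overlay_h // 2
    x2, y2 = x1 + overlay_w, y1 + overlay_h

    # Clip to the frame so that slicing frame[y1:y2, x1:x2] stays in bounds.
    return (max(x1, 0), max(y1, 0), min(x2, frame_w), min(y2, frame_h))


print(overlay_box((100, 100, 50, 50), (480, 640, 3)))               # (100, 100, 150, 150)
print(overlay_box((100, 100, 50, 50), (480, 640, 3), y_shift=-0.5))  # (100, 75, 150, 125)
print(overlay_box((0, 0, 40, 40), (480, 640, 3), scale=2.0))        # (0, 0, 60, 60)
```

A negative `y_shift` is useful for something like a hat, which sits above the face. If the box was clipped, the overlay image would need to be cropped by the same amount before it is combined with the frame.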

Step 6: Make it your own

Download your own PNG images to use as overlays, and add them to any video or image you want. You'll likely want to change what feature (or features) are being detected, using the included classifier files or ones you find online.

Step 7: Going further

Option 1: Combine this with the image processing activity to use your own filters and effects on the video, in addition to the AR overlays.

Option 2: Edit an existing video instead of your camera feed to make it funnier. Instead of passing 0 into `cv2.VideoCapture()`, pass in a string with a file path to a video. For example, `cap = cv2.VideoCapture('my/video/path.mp4')`.

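If the program runs too slowly, try processing only every 2nd or 3rd frame. Replace these lines:

    while True:
        ret, frame = cap.read()

With these:

    frame_count = 0  # initialize the frame counter
    while True:
        ret, frame = cap.read()
        frame_count += 1  # increment the frame counter
        if frame_count % 2 != 0:  # if the frame count is not divisible by 2,
            continue  # skip the rest of the loop body for this frame

The counter should be initialized right before the while loop, and the divisibility check should happen immediately after the frame is read from the capture. The higher the divisor (2 in this example), the more video frames will be skipped.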
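
To see which frames survive the divisibility check, the same counter logic can be tried without a camera. In this sketch a `for` loop stands in for the `while True` loop and the frame read; the function name `frames_to_process()` and the `every_n` parameter are made up for the example.

```python
def frames_to_process(total_frames, every_n=2):
    """Return the frame numbers (counting from 1) that would be processed."""
    processed = []
    frame_count = 0                      # initialized right before the loop
    for _ in range(total_frames):        # stands in for reading frames from the capture
        frame_count += 1
        if frame_count % every_n != 0:   # not divisible: skip this frame
            continue
        processed.append(frame_count)    # the real loop would detect and overlay here
    return processed


print(frames_to_process(6, every_n=2))  # [2, 4, 6]
print(frames_to_process(7, every_n=3))  # [3, 6]
```

With `every_n=2` half of the frames are processed, and with `every_n=3` only a third are, so the video gets choppier as the program gets faster.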
