Glossary
Capture - a stream of images from the camera, processed one by one.
Image - a multi-dimensional array containing data for each color channel at each x and y value. See Step 3 explanation.
Multi-dimensional array - a list is a 1D array. A grid can be represented by a 2D array. Images in OpenCV are generally 3D arrays.
Color channel - one part of color in a certain mode of representing color, for example, the red values in RGB color.
Slice indexing - indicating positions in an array using square brackets [] and indexes (locations) in the array. See here.
Classifier - data saved from pre-training a machine learning model to recognize a type of image, so that the model can be re-created to recognize images.
Feature - a type of image or shape which a classifier data set is trained to recognize, under certain visibility conditions and with some accuracy.
AR/augmented reality - taking sensory information from the real world and combining it with computer-created sensory information, in this case, adding computer images to a webcam feed.
Overlay/filter - in our usage, a computerized image placed over another image or video.
Step 1: Warm-up - Show a video
Open show_vid.py and read the comments to try to understand what is happening. Then, run the program by running python show_vid.py in the terminal.
If your computer has a built-in webcam which is properly configured, cv2.VideoCapture(0) should create a video capture feed from it. If you have a secondary camera, for example a USB webcam in addition to one built-in, you may pass in a different device number, such as cv2.VideoCapture(1). To stop the program, hit 'Q' on your keyboard (while focused on the display window).
If you want to read about one of the objects or functions used, here is the documentation:
cv2.flip() (see how the optional flipCode parameter is used to specify horizontal flipping)
There is a block of code responsible for (optionally) saving the video that you record. It assumes you are running the program in a folder with a subfolder named "output". It uses the mp4 video format by default, but if you can't view the output video, try a different format. It uses VideoCapture.get(), cv2.VideoWriter and VideoWriter.write(). Some of the code in this activity comes from this tutorial on showing and saving videos.
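To tie these pieces together, here is a minimal sketch of a capture-show-save loop like the one described above. The output file name, frame rate, and codec here are assumptions, not necessarily the exact values in show_vid.py:

import cv2

cap = cv2.VideoCapture(0)  # device 0 is the default webcam

# Read the capture's frame size so the writer matches it
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # mp4 format
out = cv2.VideoWriter('output/recording.mp4', fourcc, 20.0, (width, height))

while True:
    ret, frame = cap.read()
    if not ret:  # no frame returned, so stop
        break
    frame = cv2.flip(frame, 1)  # flipCode=1 flips horizontally (mirror view)
    out.write(frame)  # save the frame to the output file
    cv2.imshow('video', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # hit Q to quit
        break

cap.release()
out.release()
cv2.destroyAllWindows()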
Make sure you understand this code before moving on.
Step 2: Warm-up - Detect faces
Open face_detection.py and read the comments to try to understand what is happening. Then, run the program by running python face_detection.py in the terminal.
If you want to read about one of the objects or functions used, look up the OpenCV documentation for cv2.CascadeClassifier and its detectMultiScale() method.
Some of the code in this activity comes from this tutorial on detecting faces. The classifier files included in this activity come from here.
There is a face classifier file included in the classifiers folder with this activity. Download a different classifier online by searching "Haar cascade classifier ____" for whatever you want to classify. You should download an .xml file and put it in the classifiers folder. Then, change the file path in the code where it says classifier='classifiers/????.xml'. Run the program again and see if the classifier data you found is effective.
Currently, there is a minimum size for detected features. If features are not showing up, you can decrease it. You can also set a maximum size using maxSize=(w,h). These may help to narrow down what is detected, but they may cause you to miss detections.
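For reference, here is a minimal sketch of Haar cascade face detection; the classifier file name is an assumption, so substitute whichever .xml file you are using:

import cv2

# Load pre-trained classifier data from an .xml file
face_cascade = cv2.CascadeClassifier('classifiers/haarcascade_frontalface_default.xml')

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # detection runs on grayscale
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5,
                                          minSize=(30, 30))  # try adding maxSize=(w, h)
    for (x, y, w, h) in faces:  # one rectangle per detected face
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
    cv2.imshow('faces', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()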
Make sure you understand this code before moving on.
Step 3: Warm-up - Overlay images
Open image_overlay.py and read the comments to try to understand what is happening. Then, run the program by running python image_overlay.py in the terminal.
If you want to read about one of the objects or functions used, here is the documentation:
cv2.resize() (look at the types of interpolation to see what cv2.INTER_AREA means)
cv2.bitwise_not() (images are stored as 3-dimensional arrays {x, y, color channel}, so array operations work on them)
You can (and should) read more about image arithmetic here. It goes through an example of creating a mask and using it to overlay two images.
Images in OpenCV are treated as 3-dimensional arrays. Imagine a 3-dimensional spreadsheet, where each row represents a row of pixels in the image, each column represents a column of pixels, and the depth (or spreadsheet tab) indicates one of the color channels - like blue, green and red in BGR color, or red, green, blue and transparency in RGBA color. OpenCV allows indexing images by row, column, and channel: image[0, 0, 1], for example, gets the green value in the top-left pixel of an image. A bare : in slice indexing means "everything along this axis", and a slice like 20: means "everything from index 20 on", so image[20:, :, :] cuts off the top 20 rows of pixels. overlay[:, :, 3], as used in this code, gets all the x and y values, and only the 4th color channel, which is transparency for a PNG image. 0 is totally transparent, so wherever the image is transparent, the mask will exclude that region.
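As a quick illustration of this indexing (the file name here is an assumption; any PNG with a transparency channel will do):

import cv2

overlay = cv2.imread('overlay.png', cv2.IMREAD_UNCHANGED)  # keep the alpha channel
print(overlay.shape)              # (rows, columns, 4) for an image with transparency
print(overlay[0, 0, 1])           # green value of the top-left pixel
top_cropped = overlay[20:, :, :]  # everything except the top 20 rows of pixels
mask = overlay[:, :, 3]           # 2D array of transparency values; 0 = fully transparent
mask_inv = cv2.bitwise_not(mask)  # inverted mask: 255 becomes 0 and vice versa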
See what happens if you change the indices in mask = overlay[:, :, 3], or swap some uses of mask and mask_inv. Try using cv2.imshow("label", ______) to show some of the intermediate images roi, mask, mask_inv, overlay, img_bg, img_fg, or dst.
Make sure you understand this code before moving on.
Step 4: Activity - Detect multiple features
Open multi_feature.py and read the comments to try to understand what is happening. Then, run the program by running python multi_feature.py in the terminal.
This program defines a separate function, detectAndOverlay(), for processing a single frame. It does all the complicated work - it flips the frame, detects faces in it, and detects eyes within the region of interest (ROI) of the face. Then it draws ellipses over those features and returns the modified frame.
The main loop just does the routine things - loading classifier files, creating a video capture stream, reading the stream, calling the above function, and showing the modified video.
The code in this activity is based on this tutorial. It provides an explanation of how feature detection works in OpenCV.
Try modifying the code to detect a different facial feature using the included classifier files, or some you find on your own. You could try detecting just the mouth, or both eyes as one feature (currently, they're detected as multiple occurrences of one feature). If you're looking for a feature that should only occur once, add a break statement at the end of the innermost loop. The loop is convenient because the results come as a list, but sometimes you only want to display the first thing found. A sketch of this structure follows.
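Here is a minimal sketch of a detectAndOverlay()-style function; the drawing details are simplified and may not match the actual file exactly, and the break placement is shown as a comment:

import cv2

def detectAndOverlay(frame, face_cascade, eye_cascade):
    # Flip the frame, find faces, then search for eyes only inside each face.
    frame = cv2.flip(frame, 1)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        roi_gray = gray[y:y + h, x:x + w]    # face region of interest (ROI)
        roi_color = frame[y:y + h, x:x + w]
        for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(roi_gray):
            center = (ex + ew // 2, ey + eh // 2)
            cv2.ellipse(roi_color, center, (ew // 2, eh // 2), 0, 0, 360, (0, 255, 0), 2)
            # break  # uncomment if the feature should only be drawn once per face
    return frame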
Step 5: Activity - AR filters
Open video_overlay.py and read the comments to try to understand what is happening. Start near the bottom, after return frame - everything above that is the detectAndOverlay() function, and it's pretty long due to positioning the overlay.
The first section creates classifiers, just like face_detection.py. The next section loads an image and creates a mask, just like image_overlay.py. The background will come from the video, so we don't need to deal with it, and we're going to modify the mask to fit the video at each frame, so we add orig_ to the mask names. The step of combining the masks and images comes at the end of detectAndOverlay() (scroll up a bit).
After that, the routine video steps happen: create a feed, read frames, process, show. There's also code for creating a file to save the video, like in show_vid.py.
Now, run the program by running python video_overlay.py in the terminal.
Once you understand the program except for the frame processing step, scroll up and read through detectAndOverlay(). The first, outer loop just finds a rectangle with a face, selects it out of the frame by indexing the appropriate x and y coordinates, and then detects eyes in that region of interest. The inner loop deals with the eyes detected - in this case, they're detected as a single wide, rectangular region, rather than two. The rest of the code resizes and positions the overlay, making sure it fits in the face region of interest. Then the overlay, face region of the image, and masks are combined. The region of the face modified by the mask is pasted back in, and the modified image is returned.
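Here is a sketch of that resize-and-combine step as a standalone function; the function name and parameters are hypothetical, but the masking pattern matches the image arithmetic from Step 3:

import cv2

def paste_overlay(frame, orig_overlay, orig_mask, orig_mask_inv, x, y, w, h):
    # Resize the overlay and its masks to the detected rectangle, then
    # blend them into that region of the frame.
    overlay = cv2.resize(orig_overlay[:, :, :3], (w, h), interpolation=cv2.INTER_AREA)
    mask = cv2.resize(orig_mask, (w, h), interpolation=cv2.INTER_AREA)
    mask_inv = cv2.resize(orig_mask_inv, (w, h), interpolation=cv2.INTER_AREA)
    roi = frame[y:y + h, x:x + w]                         # region the overlay will cover
    img_bg = cv2.bitwise_and(roi, roi, mask=mask_inv)     # video where the overlay is transparent
    img_fg = cv2.bitwise_and(overlay, overlay, mask=mask) # overlay where it is opaque
    frame[y:y + h, x:x + w] = cv2.add(img_bg, img_fg)     # paste the combined region back in
    return frame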
The code in this activity is based on the tutorial here.
Step 6: Make it your own
Download your own PNG images to use as overlays, and add them to any video or image you want. You'll likely want to change what feature (or features) are being detected, using the included classifier files or ones you find online.
You'll probably need to change the scaling and alignment of the overlay. Do this where center_x and center_y are first set for the alignment, and where overlay_width and overlay_height are first set for the scaling. You may want to change which of width and height are computed first; just swap the words 'width' and 'height' in those two lines and the math will work out.
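For example, scaling from the width first might look like this (a sketch; orig_overlay here stands for whatever your loaded overlay image is named):

# Match the overlay's width to the detected feature, then compute the
# height from the image's own aspect ratio.
overlay_width = w
overlay_height = int(overlay_width * orig_overlay.shape[0] / orig_overlay.shape[1])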
Step 7: Going further
Option 1: Combine this with the image processing activity to use your own filters and effects on the video, in addition to the AR overlays.
Option 2: Edit an existing video instead of your camera feed to make it funnier. Instead of passing 0 into cv2.VideoCapture(), pass in a string with a file path to a video. For example, cap = cv2.VideoCapture('my/video/path.mp4').
If the program runs too slowly, try processing every 2nd or 3rd frame. Replace these lines:
while True:
    ret, frame = cap.read()
With these:
frame_count = 0  # initialize frame counter
while True:
    ret, frame = cap.read()
    frame_count += 1  # increment the frame counter
    if frame_count % 2 != 0:  # if the frame count is not divisible by 2,
        continue  # continue through the loop without processing this frame
The counter should be initialized right before the while loop, and the divisibility check should happen immediately after the video capture feed is read. The higher the number (2 in this example), the more video frames will be skipped.
continue is a Python command that works inside of loops and causes the loop to immediately jump to the end of its code. That means whenever you call continue, the rest of the loop body will be skipped over and it will start over with the next iteration. It doesn't stop the loop entirely - that's what break does.
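A tiny standalone example of the difference:

for i in range(5):
    if i == 2:
        continue  # skip only this iteration
    print(i)      # prints 0, 1, 3, 4

for i in range(5):
    if i == 2:
        break     # stop the loop entirely
    print(i)      # prints 0, 1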