Welcome back to our three-part series on computer vision. In the previous posts, we discussed convolutional neural networks (CNNs) and how transfer learning allows us to reuse pre-trained CNNs. This post assumes you have a basic understanding of both topics; we encourage you to reread the first two posts if you want a refresher on convolutional networks or transfer learning.

Introduction

We concluded the last post by talking about two of indico’s transfer learning APIs – Custom Collection and Image Features. To give you an idea of how you can use Custom Collection, we decided to show you how to make a smart security camera using Python and a Raspberry Pi. In case you haven’t yet explored indico’s Custom Collection API or missed the end of the last post, the gist is that it allows you to quickly train a custom classifier for images or text using your own dataset. What sets it apart from other training approaches is that it can make accurate predictions from as few as 5-10 examples, instead of the 50,000 to 100,000 typically required. As you can probably guess, the Custom Collection API uses a pre-trained deep CNN to extract features from images via transfer learning. Whereas the previous blog posts focused on the theory behind computer vision, this post will focus on how to build your own computer vision system.

Data Collection

To get started, let’s train a classifier to recognize different people in the office. As with most machine learning problems, the first step is collecting a dataset. We went around the office and took four different pictures of each person that we wanted the camera to recognize.
Training images for the algorithm of indico team members

(Algorithm training image from left to right) Background, Diana, Slater, Nathan, Luke

With the dataset ready, the next step is to write a script which uploads the photos to a collection and begins training it. I’ve written it in Python, but we support several other languages if you have another preference.

from indicoio.custom import Collection

security_collection = Collection("security_camera")
user_images = {
  'luke': ['luke1.jpg', 'luke2.jpg', 'luke3.jpg', 'luke4.jpg'],
  'nathan': ['nathan1.jpg', 'nathan2.jpg', 'nathan3.jpg', 'nathan4.jpg'],
  'diana': ['diana1.jpg', 'diana2.jpg', 'diana3.jpg', 'diana4.jpg'],
  'slater': ['slater1.jpg', 'slater2.jpg', 'slater3.jpg', 'slater4.jpg'],
  'background': ['background1.jpg', 'background2.jpg', 'background3.jpg', 'background4.jpg']
}

for user, images in user_images.items():
    for image in images: 
        image_data = open(image, 'rb').read().encode("base64")
        security_collection.add_data([image_data, user])
security_collection.train()

Now that we have a trained model, we’re ready to start making predictions. Before we do, however, it’s important to validate that the model is working. We collect a separate validation set, which the classifier has never seen before, and make sure that the classifier predicts accurately.
Validation image

(Validation image from left to right) Background, Diana, Slater, Nathan, Luke

from indicoio.custom import Collection

security_collection = Collection("security_camera")
validation_images = {
  'luke': 'luke_validation.jpg',
  'nathan': 'nathan_validation.jpg',
  'diana': 'diana_validation.jpg',
  'slater': 'slater_validation.jpg',
  'background': 'background_validation.jpg'
}

for user, image in validation_images.items():
    image_data = open(image, 'rb').read().encode("base64")
    prediction = security_collection.predict(image_data)
    print user, max(prediction.items(), key=lambda x:x[1])

>>>
... background ('background', 0.8732261163671003)
... diana ('diana', 0.7533351238106798)
... luke ('luke', 0.646517502291992)
... nathan ('nathan', 0.7010145157356292)
... slater ('slater', 0.66915308000605)
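With the per-image predictions printed above, a simple sanity check is to compute overall validation accuracy. Here is a minimal sketch; the `results` list stands in for the (true label, top prediction) pairs produced by the validation loop:

```python
# Each entry pairs a true label with the classifier's top prediction and
# its confidence, taken from the validation output above.
results = [
    ("background", ("background", 0.8732)),
    ("diana", ("diana", 0.7533)),
    ("luke", ("luke", 0.6465)),
    ("nathan", ("nathan", 0.7010)),
    ("slater", ("slater", 0.6692)),
]

# Count how many top predictions match the true label.
correct = sum(1 for true_label, (predicted, _) in results
              if true_label == predicted)
accuracy = correct / float(len(results))
print("validation accuracy: %.0f%%" % (accuracy * 100))  # all five correct
```

With a larger validation set, you would also want to look at per-class accuracy, since a single aggregate number can hide a class the model consistently misses.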

It appears that everything is working! All that’s left is writing a web app we can check to see whether someone we know is at the door.

Security Camera App

The security camera app has two components: a web server and a camera client. The camera client periodically uploads images to the web server. The web server in turn uploads the images to the indico API and displays the results via a web app.

camera_client.py

import base64
import time

import cv2
import requests

def run():
    cap = cv2.VideoCapture(0)
    while True:
        ret, frame = cap.read()
        frame = frame[190:890, 610:1220, :] # Crop the frame

        frame = cv2.imencode('.jpg', frame)[1]
        b64 = base64.encodestring(frame)
        try:
            requests.put("http://192.168.128.37:3000/last_update")
            requests.put("http://192.168.128.37:3000/security_camera",
                {"picture": b64}) # Send the frame to the server
        except requests.exceptions.RequestException as e:
            print "upload failed:", e
        time.sleep(5)

if __name__ == "__main__":
    run()

For the camera client, we set up a Raspberry Pi to take a photo every five seconds and send it to a server written in node.js. The user interface is a simple React app that shows which person the classifier believes is at the door. You can check out the code for the user interface on my GitHub.
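Our server is written in node.js, but to sketch what it does in the same language as the rest of this post, here is a hypothetical minimal equivalent using Flask. The endpoint paths match the camera client above; the indico prediction call is left as a comment since it requires an API key:

```python
import time

from flask import Flask, jsonify, request

app = Flask(__name__)
state = {"picture": None, "prediction": None, "last_update": None}

@app.route("/last_update", methods=["PUT"])
def last_update():
    # Heartbeat the camera client pings before each upload.
    state["last_update"] = time.time()
    return jsonify({"status": "ok"})

@app.route("/security_camera", methods=["PUT"])
def upload_frame():
    # The camera client PUTs a base64-encoded JPEG under the "picture" key.
    state["picture"] = request.form["picture"]
    # In the real app, this is where the frame would be sent to the
    # Custom Collection, e.g.:
    #   state["prediction"] = security_collection.predict(state["picture"])
    return jsonify({"status": "ok"})

@app.route("/security_camera", methods=["GET"])
def latest_frame():
    # The front end polls this endpoint to display the latest result.
    return jsonify(state)

# To serve: app.run(host="0.0.0.0", port=3000)
```

This is an illustration of the architecture rather than the code we deployed; the node.js server behaves the same way, holding the most recent frame and prediction for the front end to poll.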

user interface for security camera

The security camera web app

Other Ways You Can Use Collections

The power of the Custom Collection API extends beyond classifying faces; it can be used to quickly solve almost any image recognition task. For example, business owners could train a collection to detect the emotions of their users and track how happy their customers are. Alternatively, you could train a classifier on pictures of your pantry that alerts you when your favorite food runs out.

This post concludes our three part series on computer vision. Hopefully, you now understand how CNNs and transfer learning can be used to make robust image detection algorithms. Additionally, you too can build computer vision algorithms by using the Custom Collection API. If you have any questions or need help getting started, feel free to reach out to us at contact@indico.io!

Suggested Posts

Machine Learning So Easy, Even Your Cat Could Do It (Part 2): Text Tags

Exploiting Text Embeddings for Industry Contexts

Data Science Deployments with Docker