Post by mtriplett on Nov 28, 2020 17:22:11 GMT
In this installment, I will outline some practical use cases for neural nets in hobby robotics. I have personally tried or implemented most of these in Python on a LattePanda Alpha running Windows, and in some cases on an Intel Neural Compute Stick 2.
Replace Procedural Logic on a Bot with a Neural Net
Classify an Image (or a sub-image from it) from a List of 1000 or more Classes
Recognize Many Objects at the Same Time (from 20-80 Different Types of Objects) in Live Video
Answer a Question using a "BERT-based Transformer Model"
Recent History Example - "Where did I leave my keys?"
Personal Question Example - "What do you do?"
Factual Question Example - "Where was Bono born?"
Find Something to Say About a Subject
Find Something to Say Based on a Conversation
Some Other Use Cases I Have Either Tried or Hope to Try Soon:
Where You Can Find More Ideas, Demos, and Pre-Trained Models
Replace Procedural Logic on a Bot with a Neural Net
- Using a very basic 3-layer "fully connected" neural net that you can code yourself in Python, C#, or any other language, you can implement any number of robot behaviors, using one net or several.
- This could be used for obstacle avoidance, autonomous decision-making, roaming, etc.
- The advantage of this is code elimination: you avoid piles of if/then/else logic that become very hard to maintain over time as a bot gets more complex.
- Using this technique, you simply add more and more training data over time and retrain the network.
- Each record in the training data represents an "if this (input), then do that (output)" condition.
- Effectively, the network, once trained, will implement all the situations (conditions) you have added, without having any procedural code to write and maintain.
I recommend everyone try this once, just to learn and demystify the basics of a 3 or 4 layer fully connected neural net.
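To make the idea above concrete, here is a minimal sketch of a 3-layer fully connected net trained on a hand-made "if this, then do that" table for obstacle avoidance. The behavior table, layer sizes, and function names are all illustrative choices, not prescriptions.

```python
# A minimal 3-layer fully connected network in NumPy, trained on a
# hand-made "if this (input), then do that (output)" behavior table.
import numpy as np

rng = np.random.default_rng(0)

# Inputs: [left_clear, right_clear]; outputs: [forward, turn_left, turn_right]
X = np.array([[1, 1], [0, 1], [1, 0], [0, 0]], dtype=float)
Y = np.array([[1, 0, 0],   # both sides clear -> forward
              [0, 0, 1],   # left blocked     -> turn right
              [0, 1, 0],   # right blocked    -> turn left
              [0, 1, 0]],  # both blocked     -> turn left (spin)
             dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes: 2 inputs -> 6 hidden -> 3 outputs
W1 = rng.normal(scale=0.5, size=(2, 6))
b1 = np.zeros(6)
W2 = rng.normal(scale=0.5, size=(6, 3))
b2 = np.zeros(3)

lr = 1.0
for _ in range(3000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass (mean squared error, plain gradient descent)
    d_out = (out - Y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

def act(left_clear, right_clear):
    """Run the trained net and return the highest-scoring behavior."""
    h = sigmoid(np.array([left_clear, right_clear]) @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    return ["forward", "turn_left", "turn_right"][int(out.argmax())]

print(act(1, 1))  # both sides clear -> forward
```

To add a new behavior, you add a row to the table and retrain, rather than writing another if/then/else branch.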
Classify an Image (or a sub-image from it) from a List of 1000 or more Classes
- Using AlexNet or one of several other image classification models, your bot could recognize a thing it is looking directly at.
- This type of model has a major limitation: it can only assign a single class to the whole image.
- To get around this limitation, you could break up a video frame and classify parts of the image individually if there are sub-areas that can be segmented out for classification.
- I personally prefer to use one of the models in the next section.
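If you do go the tiling route described above, a minimal sketch might look like this; `classify` is a hypothetical stand-in for a real single-label model such as AlexNet.

```python
# Sketch of the tiling idea: split a video frame into a grid of sub-images
# and classify each one separately with a single-label model.
import numpy as np

def tile_frame(frame, rows, cols):
    """Split an HxWxC frame into rows*cols sub-images (any edge pixels
    beyond an even split are dropped)."""
    h, w = frame.shape[0] // rows, frame.shape[1] // cols
    return [frame[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in range(rows) for c in range(cols)]

def classify(sub_image):
    # Placeholder: a real implementation would run an image
    # classification model here and return its top label.
    return "unknown"

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in camera frame
tiles = tile_frame(frame, 3, 4)
labels = [classify(t) for t in tiles]
print(len(tiles))  # 12 sub-images
```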
Recognize Many Objects at the Same Time (from 20-80 Different Types of Objects) in Live Video
- Using YOLO or other models, your bot could recognize the presence of people, pets, plants, televisions, and many other common household objects in its field of view.
- This is a big upgrade over image classification, as a robot will typically not be looking directly at a single thing.
- The models will typically return multiple objects per video frame with a location, width, height, and probability score.
- This is one of my favorite visual techniques, and works well in a variety of lighting conditions.
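To give a feel for consuming that per-frame output, here is a small sketch; the dictionary layout and threshold are illustrative, since each model's real output format differs.

```python
# Sketch of consuming detector output: models like YOLO typically return,
# per video frame, a list of boxes with a label, position, size, and score.

def filter_detections(detections, min_score=0.5):
    """Keep only detections the model is reasonably confident about."""
    return [d for d in detections if d["score"] >= min_score]

frame_output = [  # made-up example output for one video frame
    {"label": "person", "x": 120, "y": 40,  "w": 80, "h": 200, "score": 0.93},
    {"label": "dog",    "x": 300, "y": 180, "w": 90, "h": 60,  "score": 0.81},
    {"label": "plant",  "x": 10,  "y": 10,  "w": 30, "h": 40,  "score": 0.22},
]

kept = filter_detections(frame_output)
print([d["label"] for d in kept])  # ['person', 'dog']
```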
Answer a Question using a "BERT-based Transformer Model"
- Using a special type of neural net, called a "Transformer", your bot can answer natural language questions. Given a body of text (a corpus) and a question, this type of model can provide surprisingly good answers to lots of questions.
- Often the questions can be ambiguous or use words not even present in the corpus.
- If I said "I went to the gym at 4:00." and later asked "When did I work out?", the network would understand the common-sense connection between a "gym" and "working out".
This is one of the truly exciting things about "Transformers": they learn these connections automatically.
- The important limitation to understand when implementing this is that you can only analyze around 500 words at a time. For longer texts, you can process chunks in a loop, or extract a relevant subset of your text for analysis.
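One simple way to work within the ~500-word limit is a sliding window over the corpus, asking the question against each chunk and keeping the best-scoring answer. Here `answer_question` is a hypothetical stand-in for a real QA model call.

```python
# Sketch of windowing a long corpus for a QA model with a ~500-word limit.

def window_words(text, size=400, overlap=50):
    """Yield overlapping chunks of at most `size` words, so an answer
    straddling a chunk boundary still appears whole in some window."""
    words = text.split()
    step = size - overlap
    for start in range(0, max(len(words) - overlap, 1), step):
        yield " ".join(words[start:start + size])

def answer_question(question, chunk):
    # Placeholder: a real implementation would run a QA transformer here
    # and return (answer_text, confidence_score).
    return ("", 0.0)

corpus = " ".join(f"word{i}" for i in range(1000))
chunks = list(window_words(corpus))
best = max((answer_question("Where are my keys?", c) for c in chunks),
           key=lambda pair: pair[1])
print(len(chunks))  # 3 overlapping chunks for a 1000-word corpus
```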
Recent History Example - "Where did I leave my keys?"
- If your bot kept a text record of what you said, this could be used as a corpus to answer this and many other questions.
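A minimal sketch of such a rolling text record might look like this (the class and method names are just illustrative):

```python
# Sketch of the "recent history" corpus: keep a rolling, timestamped log of
# what was said, and join it into one text block when a question comes in.
from collections import deque
from datetime import datetime

class Transcript:
    def __init__(self, max_lines=200):
        self.lines = deque(maxlen=max_lines)  # oldest lines fall off

    def record(self, speaker, text, when=None):
        when = when or datetime.now()
        self.lines.append(f"[{when:%H:%M}] {speaker}: {text}")

    def as_corpus(self):
        """Return the whole log as one text block for the QA model."""
        return "\n".join(self.lines)

log = Transcript()
log.record("user", "I left my keys on the kitchen counter.")
log.record("user", "I went to the gym at 4:00.")
print("keys" in log.as_corpus())  # True
```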
Personal Question Example - "What do you do?"
- Imagine if your bot kept a text record that represented its own identity, everything about it, its history, likes, etc. This text could then be used as a corpus to answer this and many other questions..."I do what I am told."
Factual Question Example - "Where was Bono born?"
- Imagine your bot has a wifi connection.
- Your bot could use this connection to download Wikipedia or other pages and extract text from them.
- This text could then be used as a corpus to answer questions.
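As a sketch of the extraction step, the standard library's HTML parser can strip a downloaded page down to plain text. A real bot would fetch the HTML first (e.g. with `urllib.request.urlopen(url).read()`); here the text is extracted from a sample string.

```python
# Sketch: strip HTML down to the plain text a QA model can use as a corpus.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    SKIP = {"script", "style"}  # elements whose contents are not prose

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

    def text(self):
        return " ".join(self.parts)

sample = "<html><body><h1>Bono</h1><p>Born in Dublin.</p><script>x=1</script></body></html>"
parser = TextExtractor()
parser.feed(sample)
print(parser.text())  # Bono Born in Dublin.
```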
Find Something to Say About a Subject
- Using a "Masking" transformer model, a bot can find something reasonable to say in a given situation.
- For example, if the topic is cats, you can provide the model with a bunch of sentences with one word "masked" or blanked out. The model will provide the most probable word to fit in the blank.
- Let's say you provide a pattern such as "I think cats are _____." or "I like to _____ cats." The model would likely return words such as "cute", "adorable", or "lazy" for the first one, and a different set of plausible words for the second.
- If you were to have a list of patterns and pick a random one to use for the masking, then your bot would have a variety of ways to express itself.
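A minimal sketch of that pattern-picking idea, with `fill_mask` as a stub standing in for a real masked-language-model call:

```python
# Sketch: pick a random masked pattern for a topic, then let a fill-mask
# model supply the missing word.
import random

PATTERNS = [
    "I think {topic} are [MASK].",
    "I like to [MASK] {topic}.",
]

def fill_mask(sentence):
    # Placeholder: a real implementation would run a masking transformer
    # here and return its most probable word for the [MASK] slot.
    return "interesting"

def say_something_about(topic, rng=random):
    pattern = rng.choice(PATTERNS).format(topic=topic)
    return pattern.replace("[MASK]", fill_mask(pattern))

random.seed(1)
print(say_something_about("cats"))
```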
Find Something to Say Based on a Conversation
- Using a "Text-Gen" transformer, a bot could find something to say at any given moment, using what was just said as a "seed".
- This is highly dependent on the corpus the model was trained on. If it was trained on Shakespeare, the results will come out sounding like Shakespeare.
- Be warned, this can lead to a lot of very unexpected and crazy results. If you want a bat-crazy bot, this is the way to go.
Some Other Use Cases I Have Either Tried or Hope to Try Soon:
I have tried some of these, but not all.
Many models can sound like they have a lot of promise, but each model is only as good as the training data that went into it, and the similarity between that data and the real world inputs your bot will encounter.
For this reason, expect that you will try several models before finding one that works for you and your bot.
- Estimate a Person's Age, Gender, Emotional State - I got mixed results on this one. I think the models needed more training data.
- Estimate a Person's Eye and Face Direction in 3D from Video Frames
- Estimate all of a Person's Joints (their pose) in 3D
- Get a List of Facial Landmarks
- Perform Speech-to-Text Locally Using a Neural Net
This list is far from complete; I have tried to give you ideas based mostly on techniques I have tried myself.
Where You Can Find More Ideas, Demos, and Pre-Trained Models
One place to look for more ideas on what is out there is the Intel Open Model Zoo and its demos. Note that you typically don't need Intel hardware to run these models.
docs.openvinotoolkit.org/2019_R1/_docs_Pre_Trained_Models.html
github.com/openvinotoolkit/open_model_zoo/blob/master/demos/README.md
For natural language "transformer" models, Hugging Face is the place to go:
huggingface.co/models