Creating an Intuitive Robotic Manipulator Control With a Myo Armband

Written by dcaffo98 | Published 2023/02/03
Tech Story Tags: robotics | machine-learning | artificial-intelligence | data-science | computer-science | hand-gesture-recognition | applied-machine-learning | myo-armband

TLDR: The Myo armband is an electronic bracelet that allows the owner to interact remotely with other devices. The device is equipped with an accelerometer, a gyroscope, and a magnetometer packed together inside a chip. The Myo is likely best known for its 8-channel EMG sensor, which lets it sense the muscular activity of the user's arm. One can then process this kind of data to recognize the current gesture of the hand.

Recently I was given a Myo armband, and this article aims to describe how such a device can be exploited to control a robotic manipulator intuitively. That is, we will move our arm as if it were the robot's own hand. The implementation relies on the Robot Operating System (ROS), in particular on the rospy API. I'll leave you the references to the main project repository, the gesture recognition dataset repository, and the dataset itself.


What’s in a Myo

The Myo armband is an electronic bracelet produced by Thalmic Labs, which allows the owner to interact remotely with other devices. Have a look at the Thalmic Labs repository to get an idea of what you can do with it.

The device is equipped with an accelerometer, a gyroscope, and a magnetometer packed together inside a chip called MPU-9150. These sensors provide valuable data which can be exploited to understand where the device is in space and even how it’s oriented.

But the Myo is likely best known for its 8-channel EMG sensor, which gives it the ability to sense the muscular activity of the arm of whoever is wearing it. One can then process this kind of data to recognize the current gesture of the user's hand. Machine learning is a popular choice here. For instance, people have built applications that let you switch channels on your smart TV by waving your hand in and out, or turn on your smart light by snapping your fingers.


Let’s get a robotic hand

Our goal is to control a robot using the Myo. So it seems like we need a robot, which is quite expensive nowadays :(

Fortunately, there exist simulators, and indeed we're gonna control a robot in a simulated environment using Gazebo. Which robot? We opted for a 7 degrees of freedom (DoF) manipulator by Franka Emika. That's a very popular choice in academic research, but it also finds space in some industrial environments. We stick with the Franka especially because, being a redundant manipulator, it's capable of very dexterous motions, making it suitable for our intuitive control.

Now that we somehow have a robot, we need a way to talk to it. For this purpose, there is the Robot Operating System (or just ROS), probably the de facto standard when it comes to open-source robotics projects. In particular, my code is based on rospy, which, as you might guess, is a Python package that lets you write code to interact with ROS. There are tons of libraries to work with the Franka through ROS, and you can find good working ones for the Myo armband as well. In particular, I've used ros_myo to "talk" with the bracelet and libfranka to interact with the Franka. Let's thank the authors: they saved us a lot of work.


How we control the robot

The manipulator moves according to the commands computed by a resolved-velocity controller. I don't want to dive deep into robotics theory, but the choice was primarily driven by the fact that it doesn't require the desired task-space position as input. The user should drive the robot to the target pose by moving their own arm, intuitively.

This controller imposes the orientation of the Myo (i.e. of our hand) on the manipulator, whereas hand extension and flexion, when detected, trigger a linear velocity of +0.1 m/s and -0.1 m/s respectively along the approach direction of the end-effector (i.e. the Franka gripper). Naturally, whenever the hand is steady in a neutral pose, the linear velocity imposed on the end-effector is zero.

In the above picture, you can see a block marked as 1/s. That's the inner control loop of the manipulator, which directly talks to the robot joints and makes them move. It is provided by the Franka API and is compliant with the ROS control interface. It is marked as 1/s because that's the symbol of an integrator in the frequency domain. Indeed, here we are assuming that the inner control loop works very well and behaves like a perfect integrator (the joint-space velocity, the dotted q, is integrated to provide the desired joint-space pose, q). Depending on the context, this assumption may be too optimistic.
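To make the scheme concrete, here is a minimal sketch of one resolved-velocity step under the assumptions above. This is not the project's actual code: the helpers get_jacobian and orientation_error are hypothetical placeholders that the caller must provide, and the gains are arbitrary.

```python
# Sketch of a single resolved-velocity step (assumptions labeled in comments).
import numpy as np

K_ORI = 2.0        # proportional gain on the orientation error (arbitrary)
LIN_SPEED = 0.1    # m/s along the end-effector approach axis

def resolved_velocity_step(q, q_myo, R_ee, gesture, get_jacobian, orientation_error):
    """Compute desired joint velocities from the Myo orientation and gesture.

    q        : current joint positions, shape (7,)
    q_myo    : desired orientation (unit quaternion) coming from the Myo
    R_ee     : current end-effector rotation matrix (3, 3) in the base frame
    gesture  : one of 'extension', 'flexion', 'neutral'
    get_jacobian, orientation_error : placeholder helpers supplied by the caller
    """
    # Linear velocity along the approach (z) axis of the gripper
    approach = R_ee[:, 2]
    if gesture == 'extension':
        v_lin = LIN_SPEED * approach
    elif gesture == 'flexion':
        v_lin = -LIN_SPEED * approach
    else:
        v_lin = np.zeros(3)

    # Angular velocity proportional to the orientation error (Myo vs. gripper)
    w_ang = K_ORI * orientation_error(q_myo, R_ee)

    # Resolved-velocity mapping: task-space twist -> joint velocities
    twist = np.concatenate([v_lin, w_ang])   # shape (6,)
    J = get_jacobian(q)                      # geometric Jacobian, shape (6, 7)
    return np.linalg.pinv(J) @ twist         # desired joint velocities, shape (7,)
```

The returned joint velocities are what get handed to the inner loop that we are modeling as a perfect integrator.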


Extract the orientation from the Myo

There are several methods to describe the orientation of a body in space with respect to a given reference frame. For instance, a popular choice is to rely on the roll, pitch, and yaw angles. This way, we have a triplet of numbers, each one representing the angle around the x, y, and z axes respectively. In robotics, and in ROS in particular, a common practice is to use unit quaternions to represent an orientation.
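For example, converting between the two representations is a one-liner with the transformations module that ships with ROS (angles in radians, quaternions in [x, y, z, w] order):

```python
# Roll-pitch-yaw vs. unit quaternion: the same orientation, two notations.
import numpy as np
from tf.transformations import quaternion_from_euler, euler_from_quaternion

roll, pitch, yaw = 0.0, np.pi / 4, np.pi / 2
quat = quaternion_from_euler(roll, pitch, yaw)   # -> [x, y, z, w]
print(quat)
print(euler_from_quaternion(quat))               # back to (roll, pitch, yaw)
```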

Given that the Myo has a built-in magnetometer, it can already give us its current orientation. Unfortunately, I found a big issue with that. Namely, it's not clear to which reference frame the provided orientation refers. Even worse, while recording multiple orientations from the same Myo pose, the results were sometimes different. It could be an issue with the firmware or with the magnetometer itself, since this kind of sensor can greatly suffer from the nearby presence of other electrical devices. The point is that I couldn't trust the orientation of the Myo this way.

To address the problem of reliable orientation, we exploit the imu_filter_madgwick ROS package. It provides an out-of-the-box implementation of Madgwick's filter, an algorithm that fuses angular velocities (from the gyroscope) and linear accelerations (from the accelerometer) to compute an orientation wrt an Earth-fixed frame. Briefly, the filter first computes an orientation by integrating the gyroscope readings. This alone would quickly accumulate errors over time, making it useless. For this reason, the filter also computes a second orientation with a gradient descent step applied to the accelerometer data. This second orientation serves to correct the long-term drift of the first one.
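In practice, once the filter node is running, the fused orientation can be consumed like any other ROS topic. Here is a minimal rospy sketch; the output topic name imu/data is the package's usual default and is an assumption here, since it may be remapped in your launch file.

```python
# Read the orientation fused by the Madgwick filter node.
# 'imu/data' is assumed to be the filter's output topic (check your remappings).
import rospy
from sensor_msgs.msg import Imu

def on_imu(msg):
    q = msg.orientation  # unit quaternion (x, y, z, w) wrt an Earth-fixed frame
    rospy.loginfo("Myo orientation: %.3f %.3f %.3f %.3f", q.x, q.y, q.z, q.w)

rospy.init_node('myo_orientation_listener')
rospy.Subscriber('imu/data', Imu, on_imu)
rospy.spin()
```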

Once we have a reliable source of orientation, we need to describe it wrt the base link of our manipulator. Here, we assume that we know the orientation of the base link frame wrt the Earth's global frame. Since we are working in a simulation, we are free to choose that the global frame of the virtual world coincides with the Earth's frame. In particular, given that our Gazebo world also has the base link frame aligned with the world frame, we can directly say that Madgwick's filter gives us the orientation of the Myo relative to the base link of the Franka. Exactly what we need.


Preprocessing for Madgwick’s filter

Because we aim to control the manipulator as if it were our arm, it's natural to think of our hand as the gripper. It follows that our forearm should be aligned with the approach direction (z-axis) of the Franka end-effector. The accelerometer and gyroscope of the Myo are oriented differently, with their z-axis orthogonal to our forearm. So we need a change of frame to account for that, as depicted below.
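As a toy illustration of such a change of frame, the raw vectors can simply be rotated with a constant matrix before any further processing. The matrix below (a 90° rotation about y) is an assumption for the sake of the example; the real one depends on how the Myo is worn and on your axis convention.

```python
# Illustrative only: re-express gyro/accel readings in a frame whose z-axis
# runs along the forearm. The rotation matrix is an assumed example.
import numpy as np

R_MYO_TO_EE = np.array([[0.0, 0.0, -1.0],
                        [0.0, 1.0,  0.0],
                        [1.0, 0.0,  0.0]])

def remap(gyro, accel):
    """Rotate raw IMU vectors (shape (3,)) into the forearm-aligned frame."""
    return R_MYO_TO_EE @ np.asarray(gyro), R_MYO_TO_EE @ np.asarray(accel)
```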

Then, we have to consider that both the gyroscope and the accelerometer are very noisy sensors. For this reason, I found it beneficial to filter their data using the Stance Hypothesis Optimal Estimation¹ (SHOE) detector before feeding them to Madgwick's algorithm. This is a simple formula that computes a score out of the angular velocity and linear acceleration readings, weighted by their variances. We say that the stance hypothesis is confirmed (i.e. the device is stationary) or rejected (i.e. the device is moving) depending on whether the score is below or above a given threshold. The idea is to feed data to Madgwick's filter only if we detect that the device (our arm) is moving. This prevents the algorithm from accumulating useless micro-changes in orientation due to involuntary movements or vibrations of our arm.
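Here is a minimal numpy sketch of the SHOE score as I read it from reference [1]. The noise variances and the threshold are placeholder values that must be tuned, not the ones used in the project.

```python
# SHOE detector sketch: a window is declared stationary when the score
# stays below a threshold. All constants below are assumed placeholders.
import numpy as np

G = 9.81           # gravity magnitude (m/s^2)
VAR_ACC = 1e-2     # accelerometer noise variance (assumed)
VAR_GYR = 1e-3     # gyroscope noise variance (assumed)
THRESHOLD = 1e5    # stance threshold (assumed, must be tuned)

def is_stationary(accel, gyro):
    """accel, gyro: arrays of shape (N, 3) covering a short window."""
    accel, gyro = np.asarray(accel, float), np.asarray(gyro, float)
    mean_acc = accel.mean(axis=0)
    grav_dir = mean_acc / np.linalg.norm(mean_acc)   # estimated gravity direction
    acc_term = np.sum((accel - G * grav_dir) ** 2, axis=1) / VAR_ACC
    gyr_term = np.sum(gyro ** 2, axis=1) / VAR_GYR
    score = np.mean(acc_term + gyr_term)
    return score < THRESHOLD
```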

In the above video, we are just logging the angular velocity and acceleration of the Myo to the screen. It's not important to read the values. What matters is that, as you can see, on the left side of the screen the data changes continuously, even when my arm is not moving significantly. On the contrary, the right side of the screen shows the same data filtered with the SHOE detector. This time the values are updated much more coherently with the true motions of my arm.


Gesture recognition

Quick recap: we want to exploit the EMG sensor of the Myo for gesture recognition. We need 2 different gestures to impose +0.1 m/s or -0.1 m/s velocities along the z-axis of the end-effector. Additionally, we want the gripper to close when we squeeze our hand into a fist. Finally, we need to recognize whenever our hand is in a neutral pose, in order not to send any of the previous commands to the robot.

This project uses the scikit-learn implementation of a Support Vector Machine (SVM) trained for gesture recognition. I tried several other machine learning classifiers, but the SVM turned out to be the best. Furthermore, it mostly involves dot products, a fast operation for today's machines to carry out. The speed of the algorithm is crucial for us, since we're gonna work in a real-time setup.

Of course, any machine learning algorithm requires a proper dataset to train on. There are even some good-quality datasets available online for gesture recognition with the Myo armband. Unfortunately, most of them assume EMG readings in the form of 8-bit unsigned integers (aka bytes), while my device provides data as 16-bit signed integers. It may be due to a different firmware version or to the ROS package used to interact with the bracelet. Anyway, I didn't achieve satisfactory results training on those datasets.

I ended up collecting my own small dataset by asking a bunch of friends and relatives to perform hand gestures while wearing the Myo. The protocol was very simple: for each candidate, I recorded 2 sessions for each of the three target gestures (fist, flexion, and extension). A single session lasted a minute, during which the candidate was asked to alternate between the neutral and the target gesture every 5 seconds. In the meanwhile, every EMG reading coming from the Myo was recorded. The final dataset was then chunked into samples in a sliding-window fashion. A single sample comprises 30 consecutive 8-channel EMG readings, and two subsequent samples overlap by 10 readings. This choice should enable approximately 2 gesture predictions per second.
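A sliding window of 30 readings with an overlap of 10 means a step of 20 readings between consecutive samples. A minimal numpy sketch of the chunking, assuming the raw stream of one session is already loaded as a (T, 8) array:

```python
# Chunk the raw EMG stream into overlapping windows of 30 readings (step 20).
import numpy as np

WINDOW, OVERLAP = 30, 10
STEP = WINDOW - OVERLAP   # 20 readings between the starts of two samples

def make_samples(emg):
    """emg: array of shape (T, 8) holding the raw EMG stream of one session."""
    emg = np.asarray(emg)
    starts = range(0, emg.shape[0] - WINDOW + 1, STEP)
    return np.stack([emg[s:s + WINDOW] for s in starts])   # (n_samples, 30, 8)
```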

Once I had the data, knowing the length of a session (60 seconds), the target gesture performed in each session, the span of each gesture (5 seconds), and the frequency of the EMG sensor (about 50 Hz), I could annotate every sample with the proper gesture label. This approach is theoretically sound, but in practice it isn't. The EMG does not always run at exactly 50 Hz, and a candidate is a human being rather than a perfect machine, so sometimes a gesture lasts more than 5 seconds, sometimes less. You therefore accumulate errors while annotating samples this way within a recording session.

It turned out that a better solution was to annotate the data with a clustering algorithm; in particular, I chose the popular K-means. While the SVM is a supervised machine learning classifier, this one belongs to the family of unsupervised learning algorithms, meaning that it can infer structure from data without a supervised signal (i.e. labels). So I simply ran K-means on the whole dataset, partitioning it into 4 different clusters. The label of a cluster was then set as the label of every one of its samples, and I trained the SVM on this dataset.
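The labeling step itself is tiny with scikit-learn. A sketch, assuming X holds one feature vector per sample (see the preprocessing described further below) and using 4 clusters for neutral, fist, flexion, and extension:

```python
# Annotate the whole dataset with K-means cluster ids instead of time-based labels.
from sklearn.cluster import KMeans

def kmeans_labels(X, n_clusters=4, seed=0):
    """X: array of shape (n_samples, n_features). Returns one cluster id per sample."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    return km.fit_predict(X)
```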

The dataset has been uploaded to Kaggle and is free to access. There, you will find a quick notebook where you can test the performance of an SVM on the data annotated both with the "hand-made" labels and with the labels provided by K-means. The test runs a 5-fold cross-validation. As you can see, using hand-made labels, the SVM performs quite well: we are around 0.9 both in accuracy and F1-score (it's better to consider the latter, since the dataset is highly unbalanced towards the neutral gesture class). But if we train the classifier on the very same dataset using the class labels provided by K-means, we raise both metrics to nearly 0.98. This gap shouldn't surprise us. The clustering annotates data based on patterns naturally found in the data itself, which is the way you're supposed to exploit machine learning. On the other hand, my labels only rely on timing, which, as we've seen, introduces errors, so the classifier ends up learning from bad data.
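The comparison boils down to a few lines. A sketch of the 5-fold cross-validation, assuming X, y_hand, and y_kmeans are already loaded from the dataset (those names are placeholders, not the notebook's actual variables):

```python
# 5-fold cross-validation of the same SVM on the two label sets.
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def evaluate(X, y, name):
    """Report mean accuracy and macro F1 of an RBF SVM over 5 folds."""
    clf = SVC(kernel='rbf')
    acc = cross_val_score(clf, X, y, cv=5, scoring='accuracy').mean()
    f1 = cross_val_score(clf, X, y, cv=5, scoring='f1_macro').mean()
    print(f'{name}: accuracy = {acc:.3f}, macro F1 = {f1:.3f}')

# evaluate(X, y_hand, 'time-based labels')
# evaluate(X, y_kmeans, 'k-means labels')
```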

Now I have to stress one thing: what I've done here, that is, using a clustering algorithm to annotate data for supervised learning, cannot be done most of the time. Machine learning would be a lot easier otherwise. The point is that the cluster labels provided by K-means have no semantic meaning at all! For instance, you know that sample x belongs to cluster "3", but you don't actually know which class "3" is. Fortunately, in this project, given that we have only 4 possible labels and the classifier works pretty well, it was straightforward to experimentally find the association between cluster labels and gestures. But in general, this cannot be done. Think if you had 1000 different clusters and had to match them to 1000 different class labels…

The following is an attempt to visualize the entire dataset. The 2 reddish circles represent class labels, and you may see that there are indeed differences between those provided by me (inner) and those given by K-means (outer). The bluish circle shows the EMG readings: the closer it gets to yellow, the stronger the recorded muscular activity.

Here we see only a subset of the dataset. This time you may better recognize that the bluish circle is chunked into many circular sectors, each one being a single sample. Notice also the orthogonal subdivision into 8 different parts: that's because the Myo EMG has 8 different channels.

As we previously said, a single sample consists of 30 consecutive EMG readings. If you are familiar with numpy, we can say that a sample is a numpy array of shape [30, 8]. We could flatten it into a one-dimensional array of shape [240] and feed it to the SVM. However, with just a little bit of preprocessing, we can reach better performance. Indeed, I didn't train the classifier on this raw data. Every sample undergoes a reduction over the time axis (the first dimension of the array). Specifically, I take the median of the time series represented by a sample, thus going from a [30, 8] array to just an [8] array. This is beneficial because we have less data to process and it helps the classifier generalize, making it more robust, especially wrt outliers. Furthermore, we also normalize the resulting feature vector between 0 and 1.
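A sketch of this feature extraction step; the per-sample min-max scaling below is one reading of "normalize between 0 and 1" and is an assumption on my part:

```python
# Collapse each (30, 8) sample to an 8-dimensional feature vector.
import numpy as np

def featurize(sample):
    """sample: array of shape (30, 8) of raw EMG readings."""
    feat = np.median(np.asarray(sample, float), axis=0)   # median over time -> (8,)
    lo, hi = feat.min(), feat.max()
    return (feat - lo) / (hi - lo + 1e-8)                  # scaled into [0, 1]
```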



About kinematics singularities

There's one last point to cover in our project. Recall that the Franka is a redundant manipulator: it can reach a desired end-effector pose in potentially more than one joint configuration. That's a desirable property in general, but it doesn't come for free. There are indeed some joint configurations that make further motion very difficult for our manipulator. Those are called kinematic singularities.

The resolved-velocity control implemented in this project comprises two different operating modes. The first one is the simplest and basically just ignores singularities. With this setup, the Franka will probably soon get stuck in one of these bad configurations, and we will need a little patience to get it out.

The second mode implements the revised resolved-velocity control loop presented in A purely-reactive manipulability-maximising motion controller². In short, it aims to maximize the measure of manipulability (MoM). You can think of it as a score of how easily the manipulator can move away from its current joint configuration. Thus, if we maximize this quantity inside our control loop, we keep the Franka far from its singularities. The images below may help you visualize what a kinematic singularity is.

To drive the robot into those poses, I made the very same motion with my arm, but it resulted in two different behaviors. On the left, we find the Franka in one of its singular configurations, with the 6th joint vertically aligned with the base link. On the right, the singularity has been avoided thanks to the advanced control loop, and the robot is ready to move to any other configuration.
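For reference, the quantity being maximized is, as I understand it from the paper, the classic Yoshikawa-style manipulability index computed from the geometric Jacobian. It drops toward zero as the arm approaches a singularity:

```python
# Manipulability measure: sqrt(det(J J^T)), with J the (6, 7) geometric Jacobian.
import numpy as np

def manipulability(J):
    J = np.asarray(J, float)
    return np.sqrt(np.linalg.det(J @ J.T))
```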


Conclusions and limitations

Of course, this project is far from perfect and by no means ready to be used in any real-world environment. It is supposed to be a proof of concept for further developments, although I confess that even now it's quite fun to use the Myo to control the Franka :) One of the most critical points is the difficulty of executing all the computations required to keep our code up and running.

To give some context, the nominal control frequency of the Franka manipulator is 1 kHz. By downgrading and disabling some features of the Gazebo physics engine, I'm able to run the program at around 100–200 Hz. That's while using the simple control loop, without singularity avoidance. Introducing the advanced resolved-velocity control, the frequency drops to around 5 Hz. That's 200 times less than the frequency the robot is supposed to work at.

The advanced algorithm involves solving a quadratic programming problem at every control step, and my machine is not able to handle it in real time. I made some attempts to improve this, but with no significant gain. One idea could be to employ the simple controller by default, resorting to the advanced one only as we get close to a singularity. More broadly, I think switching from Python to C++ could make a huge difference. Naturally, if you have a powerful machine, you may already get an enjoyable experience without even noticing these problems.

Anyway, if you've come this far, let me thank you. I'm aware that I've given just a brief description of the project, without discussing in detail either the code or the theory behind it. The purpose was just to spark your curiosity and give you an idea of what can be achieved with some robotics and data science background. If you have the time, the will, and a Myo armband at hand, feel free to extend this project!


References

[1] I. Skog, P. Handel, J. -O. Nilsson and J. Rantakokko, “Zero-Velocity Detection — An Algorithm Evaluation,” in IEEE Transactions on Biomedical Engineering, vol. 57, no. 11, pp. 2657–2666, Nov. 2010, doi: 10.1109/TBME.2010.2060723.

[2] Haviland, J. and Corke, P., 2020. A purely-reactive manipulability-maximising motion controller. arXiv preprint arXiv:2002.11901.


This post was originally published here.


Written by dcaffo98 | AI engineering master student
Published by HackerNoon on 2023/02/03