Advancing sEMG Hand Signal Classification
Surface Electromyography (sEMG) captures the tiny electrical signals generated by our muscles when they contract. These signals are at the heart of many prosthetic control systems and human–computer interfaces. In this report, we explore two main strategies:
- Windowed FFT-based feature extraction: breaking the signal into fixed-length segments and analyzing frequency components
- Non-windowed modeling: feeding raw signal samples directly into machine learning models
Our goal is to understand how each approach balances accuracy and processing speed when classifying hand gestures.
Accurate and fast gesture recognition is crucial for applications like real-time prosthetic devices, where even small delays or misclassifications can impact the user experience. By comparing a traditional signal-processing pipeline against newer, end-to-end learning methods, we aim to find the sweet spot between performance and complexity.
Contents
- Problem Statement
- Data Source
- Methodology
- Evaluation Technique
- Results
- Conclusion & Future Work
- References
Problem Statement
Surface Electromyography measures the electrical activity produced by muscle contractions. Windowing segments signals into fixed intervals for robust feature extraction but introduces computational overhead and fixed temporal boundaries. Non-windowed methods model each timestep directly, potentially reducing latency. This study evaluates both approaches to determine their impact on classification accuracy and inference time.
Data Source
The dataset utilized in this research is the “EMG Data for Gestures,” available through the UCI Machine Learning Repository. It consists of raw electromyographic (EMG) signals recorded from 36 subjects using a MYO Thalmic bracelet worn on the forearm. The bracelet’s eight evenly distributed sensors capture myographic signals transmitted via Bluetooth. Each subject performed two series of six or seven static hand gestures, each held for 3 seconds followed by a 3-second rest period.
The data files contain ten columns: time (in milliseconds), eight sEMG channels, and a class label indicating the gesture performed. Class labels are defined as:
- 0 — unmarked data
- 1 — hand at rest
- 2 — hand clenched in a fist
- 3 — wrist flexion
- 4 — wrist extension
- 5 — radial deviation
- 6 — ulnar deviation
- 7 — extended palm (performed by a subset of subjects)
For simplicity, data from all subjects were concatenated across recordings, focusing on a generalized classification problem rather than individual differences.
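As an illustration, per-file loading and concatenation could look like the following minimal pandas sketch. The column names and the tab-separated layout are assumptions based on the ten-column description above, and the file contents here are synthetic stand-ins for the real recordings:

```python
import io
import pandas as pd

# Hypothetical column layout matching the dataset description:
# time (ms), eight sEMG channels, and a gesture class label.
COLUMNS = ["time"] + [f"ch{i}" for i in range(1, 9)] + ["class"]

def load_recording(source) -> pd.DataFrame:
    """Read one tab-separated recording into a DataFrame."""
    return pd.read_csv(source, sep="\t", header=None, names=COLUMNS)

# Two tiny synthetic recordings standing in for real files on disk.
raw = ("1\t0.1\t0.2\t0.3\t0.4\t0.5\t0.6\t0.7\t0.8\t1\n"
       "2\t0.2\t0.1\t0.3\t0.4\t0.5\t0.6\t0.7\t0.8\t1\n")
df = pd.concat(
    [load_recording(io.StringIO(raw)), load_recording(io.StringIO(raw))],
    ignore_index=True,
)
print(df.shape)  # (4, 10): four samples, ten columns
```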
Methodology
This section details both pipelines and their mathematical formulations. We compare a windowing-based signal-processing pipeline against a non-windowing strategy that models raw samples directly.
1. Experimental Setup
All models are trained and evaluated on a MacBook M3 Pro using Metal Performance Shaders (MPS) for GPU acceleration. Data are standardized and random seeds fixed to ensure reproducibility.
2. Windowing Approach
a. Segmentation into Windows
Let the raw dataset be $\{(t_i, x_i, y_i)\}_{i=1}^n$, where
- $t_i$ is the timestamp (ms),
- $x_i\in\mathbb{R}^M$ is the $M$-channel sEMG vector, and
- $y_i$ is the gesture label.
Applying a sliding window of length $W$ yields:
\[X^{(j)} = [\,x_i, x_{i+1}, \dots, x_{i+W-1}\,], \quad Y^{(j)} = [\,y_i, y_{i+1}, \dots, y_{i+W-1}\,],\] for $j=1,\dots,N$, where $N$ is the number of windows.
(Figure: sliding-window segmentation. [Image source](https://www.danorlandoblog.com/use-the-sliding-window-pattern-to-solve-problems-in-javascript/))
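The segmentation above can be sketched with NumPy's `sliding_window_view`. The stride of $W/2$ (50% overlap) is an assumption for illustration; the report does not fix a stride:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Toy signal: n samples of an M-channel sEMG stream (synthetic values).
n, M, W = 100, 8, 16
x = np.random.randn(n, M)

# Take every (W/2)-th window for 50% overlap (an illustrative choice).
step = W // 2
windows = sliding_window_view(x, window_shape=W, axis=0)[::step]
print(windows.shape)  # (N, M, W) = (11, 8, 16)
```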
b. Frequency-Domain Transformation (FFT)
For each channel $m=1,\dots,M$, perform a Discrete Fourier Transform on each windowed signal $x_m^{(j)}(n)$:
\[X_m^{(j)}(k) = \sum_{n=0}^{W-1} x_m^{(j)}(n)\, e^{-i 2\pi k n / W}, \quad k = 0, \dots, K.\]
Record magnitude and phase of each bin:
\[A_m^{(j)}(k) = \bigl|X_m^{(j)}(k)\bigr|, \qquad \phi_m^{(j)}(k) = \arg X_m^{(j)}(k).\]
Stacking across $M$ channels yields a feature vector $H^{(j)}\in\mathbb{R}^{2M(K+1)}$, where $K=\lfloor W/2\rfloor$.
(Figure: in-phase, quadrature, and phase components of a complex number. [Image source](https://www.researchgate.net/figure/Three-Components-of-a-Complex-Number-In-Phase-Quadrature-and-Phase-Incoming-radar-wave_fig2_332511933))
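A minimal NumPy sketch of this feature construction, using the real-valued FFT so that exactly $K+1 = \lfloor W/2\rfloor + 1$ bins are kept per channel:

```python
import numpy as np

# One window: W samples across M channels (synthetic values).
W, M = 16, 8
window = np.random.randn(W, M)

K = W // 2                               # highest retained frequency bin
spectrum = np.fft.rfft(window, axis=0)   # complex array of shape (K + 1, M)
magnitude = np.abs(spectrum)
phase = np.angle(spectrum)

# Stack magnitude and phase across all channels into one feature vector.
features = np.concatenate([magnitude.ravel(), phase.ravel()])
print(features.shape)  # 2 * M * (K + 1) = 144
```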
c. CNN for Frequency-Domain Feature Extraction
Treating frequency bins as a “spatial” axis and channels as input planes, a 1D convolution with filters $W^{(\ell)}\in\mathbb{R}^{F\times C_{\mathrm{in}}\times C_{\mathrm{out}}}$ and bias $b^{(\ell)}$ computes
\[h^{(\ell)}_{c_{\mathrm{out}}}(k) = f\Bigl(\sum_{c_{\mathrm{in}}=1}^{C_{\mathrm{in}}} \sum_{u=1}^{F} W^{(\ell)}_{u,\,c_{\mathrm{in}},\,c_{\mathrm{out}}}\, h^{(\ell-1)}_{c_{\mathrm{in}}}(k+u-1) + b^{(\ell)}_{c_{\mathrm{out}}}\Bigr),\]
where $f$ is LeakyReLU:
\(f(x)=\max(0,x)+\alpha\,\min(0,x).\)
Each block also includes BatchNorm, MaxPool and Dropout.
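One such block could be sketched in PyTorch as follows. The channel counts, kernel size, pooling width, and dropout rate are illustrative choices, not values from the report:

```python
import torch
import torch.nn as nn

# One convolutional block: Conv1d -> LeakyReLU -> BatchNorm -> MaxPool -> Dropout.
block = nn.Sequential(
    nn.Conv1d(in_channels=16, out_channels=32, kernel_size=3, padding=1),
    nn.LeakyReLU(negative_slope=0.01),   # f(x) = max(0, x) + alpha * min(0, x)
    nn.BatchNorm1d(32),
    nn.MaxPool1d(kernel_size=2),
    nn.Dropout(p=0.3),
)

# Batch of 4 inputs: 16 input planes over 9 frequency bins.
x = torch.randn(4, 16, 9)
y = block(x)
print(y.shape)  # torch.Size([4, 32, 4]) after pooling halves the bin axis
```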
d. Deep Cross Network (DCN)
Given embedding $x^{(0)}$, each cross layer $\ell$ updates
\[x^{(\ell+1)} = x^{(0)} \circ \bigl(W^{(\ell)} x^{(\ell)} + b^{(\ell)}\bigr) + x^{(\ell)},\]
where $\circ$ is the Hadamard (elementwise) product.
(Figure: Deep Cross Network architecture. [Image source](https://arxiv.org/pdf/2008.13535))
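The cross-layer update can be written as a small PyTorch module. This is a sketch of the DCN-v2 form; the embedding dimension is illustrative:

```python
import torch
import torch.nn as nn

class CrossLayer(nn.Module):
    """One cross layer: x_{l+1} = x0 * (W x_l + b) + x_l (elementwise product)."""
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x0: torch.Tensor, xl: torch.Tensor) -> torch.Tensor:
        return x0 * self.linear(xl) + xl  # Hadamard product plus residual

layer = CrossLayer(dim=64)
x0 = torch.randn(4, 64)
out = layer(x0, x0)  # the first layer uses x^(0) for both arguments
print(out.shape)     # torch.Size([4, 64])
```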
e. Multi-Layer Perceptron (MLP)
The final DCN output $x^{(L)}$ is passed through fully connected layers to produce class probabilities:
\[\hat y = \mathrm{softmax}\bigl(W_2\, f(W_1\, x^{(L)} + b_1) + b_2\bigr).\]
f. Model Architecture Overview
The complete architecture for the windowing approach is: windowed sEMG → FFT magnitude/phase features $H^{(j)}$ → stacked 1D CNN blocks (Conv, LeakyReLU, BatchNorm, MaxPool, Dropout) → Deep Cross Network → MLP → softmax over gesture classes.
3. Non-Windowing Approach
a. Raw Feature Input
Each instantaneous vector $x\in\mathbb{R}^M$ is fed to the model directly, without any segmentation.
b. Random Forest Classifier
As a baseline, average class probabilities from $T$ trees:
\[P(y=c\mid x) = \frac{1}{T} \sum_{t=1}^T P_t(y=c\mid x).\]
(Figure: Random Forest architecture. [Image source](https://www.researchgate.net/figure/Architecture-of-the-Random-Forest-algorithm_fig1_337407116))
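A short scikit-learn sketch showing that `RandomForestClassifier.predict_proba` is exactly this per-tree average. The data here are synthetic stand-ins for instantaneous 8-channel sEMG samples:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 200 instantaneous 8-channel samples, 3 gesture labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = rng.integers(0, 3, size=200)

clf = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Average per-tree class probabilities, matching the formula above.
manual = np.mean([tree.predict_proba(X) for tree in clf.estimators_], axis=0)
print(np.allclose(manual, clf.predict_proba(X)))
```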
c. Outer-Product Neural Network (OPNN)
Compute the outer product $O = x\,x^\top\in\mathbb{R}^{M\times M}$, flatten to $z\in\mathbb{R}^{M^2}$, then
\[\hat y = \mathrm{softmax}\bigl(W_2\,f(W_1\,z + b_1) + b_2\bigr).\]
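A sketch of the OPNN in PyTorch; the hidden width and class count are illustrative, not values from the report:

```python
import torch
import torch.nn as nn

M, HIDDEN, CLASSES = 8, 32, 7  # illustrative sizes

class OPNN(nn.Module):
    """Outer-product network sketch: O = x x^T, flattened, then a 2-layer MLP."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(M * M, HIDDEN)
        self.fc2 = nn.Linear(HIDDEN, CLASSES)
        self.act = nn.LeakyReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outer = torch.einsum("bi,bj->bij", x, x)  # batched outer product (B, M, M)
        z = outer.flatten(start_dim=1)            # (B, M^2)
        return torch.softmax(self.fc2(self.act(self.fc1(z))), dim=1)

probs = OPNN()(torch.randn(4, M))
print(probs.shape)  # (4, 7); each row is a distribution summing to 1
```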
4. Loss & Optimization
All deep models use cross-entropy loss:
\[\mathcal{L}(\theta) = -\sum_{c} y_c \,\log \hat y_c,\] optimized with Adam.
(See the [PyTorch Adam optimizer documentation](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html).)
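A minimal training-loop sketch of this loss and optimizer in PyTorch. The toy data, model size, learning rate, and step count are all illustrative assumptions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy separable task: the label is the sign of the first feature.
X = torch.randn(256, 8)
y = (X[:, 0] > 0).long()

model = nn.Linear(8, 2)                  # stand-in for the deep models above
opt = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()          # cross-entropy, as in the report

first_loss = None
for step in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    if first_loss is None:
        first_loss = loss.item()
    loss.backward()
    opt.step()

print(first_loss, loss.item())  # loss drops on this separable toy task
```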
Evaluation Technique
Accuracy and inference time (ms/sample) are measured on an M3 Pro with Metal Performance Shaders. Accuracy is defined as:
\[\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.\]
Results
Conclusion & Future Work
Non-windowed deep learning offers higher accuracy and lower latency, suggesting that time-frequency segmentation may not be necessary for static gesture recognition. Future work should explore dynamic gestures and cross-user generalization.
References
- Olmo & Domingo (2020). EMG Characterization. Materials, 13(24), 5815.
- Raez et al. (2006). EMG Signal Analysis. Biological Procedures Online, 8, 11–35.
- Rani et al. (2023). sEMG & AI. IEEE Access, 11, 105140–105169.
- Asogbon et al. (2018). Window Conditioning in EMG. IEEE CBS.
- Krilova et al. (2018). EMG Data for Gestures. UCI Repository.