Activity Recognition with XGBoost
19 human physical activities classified from smartphone and smartwatch sensor data. Final accuracy reached 85.6% after tuning an XGBoost model with randomised search.
Overall accuracy: 85.6%
Activity classes: 19
Sensor placements: 2
F1 on Sitting / Writing: ≥96%
Overview
Human activity recognition (HAR) matters for health monitoring, fitness tracking, elderly care, and sports analytics. This project built a multi-class classifier to distinguish 19 daily physical activities using raw inertial measurement unit (IMU) data captured simultaneously from a smartphone and a smartwatch.
XGBoost was selected for its strong handling of tabular sensor features, robustness to outliers, and efficiency on medium-scale datasets. Hyperparameter tuning via randomised search pushed accuracy past the 85% threshold.
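As a rough illustration of what randomised search does, the sketch below draws hyperparameter combinations at random and keeps the best-scoring one. It is a minimal standard-library sketch, not the project's tuning code: the parameter names are real XGBoost parameters, but the ranges, `SEARCH_SPACE`, `randomized_search`, and `toy_score` are hypothetical. In practice `score_fn` would train a model and return cross-validated accuracy, and scikit-learn's `RandomizedSearchCV` would typically handle this loop.

```python
import random

# Hypothetical search space. The keys are real XGBoost parameter names, but
# the ranges are illustrative and not the ones used in this project.
SEARCH_SPACE = {
    "n_estimators": lambda rng: rng.randrange(100, 801, 50),
    "max_depth": lambda rng: rng.randint(3, 10),
    "learning_rate": lambda rng: 10 ** rng.uniform(-2.0, -0.5),  # log-uniform
    "subsample": lambda rng: rng.uniform(0.6, 1.0),
    "colsample_bytree": lambda rng: rng.uniform(0.6, 1.0),
}


def sample_params(rng):
    """Draw one random configuration from the search space."""
    return {name: draw(rng) for name, draw in SEARCH_SPACE.items()}


def randomized_search(score_fn, n_iter=20, seed=0):
    """Evaluate n_iter random configurations and return the best one.

    score_fn is an assumed callback: it takes a parameter dict and returns
    a score where higher is better (e.g. cross-validated accuracy).
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = sample_params(rng)
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Stand-in for model evaluation so the sketch runs without training."""
    return -abs(params["max_depth"] - 6) - abs(params["learning_rate"] - 0.1)


best_params, best_score = randomized_search(toy_score, n_iter=50, seed=1)
```

Unlike a grid search, the cost here is fixed by `n_iter` rather than by the size of the grid, which is why the approach scales to several continuous parameters at once.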
Dataset & Activities
Sensor readings (three-axis accelerometer and three-axis gyroscope) were collected from two placements: a smartphone and a smartwatch. The dataset covers 19 labelled activity classes:
Methodology
Feature Engineering
Model
Hyperparameter Tuning
Evaluation
Best Hyperparameters
Per-class Performance (Phone)
Stationary activities (sitting, writing) were classified near-perfectly. High-motion activities with similar kinematics (stairs vs. walking, kicking vs. dribbling) remained the hardest to separate.
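To make the per-class view concrete, the following sketch computes an F1 score per class and lists the most frequently confused (true, predicted) pairs. It uses only the standard library, and the labels in `y_true` and `y_pred` are toy values for illustration, not results from this project. The function names are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def top_confusions(y_true, y_pred, k=3):
    """Return the k most common (true, predicted) error pairs with counts."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(k)


# Toy labels only (A = Walking, C = Stairs, D = Sitting, Q = Writing).
y_true = ["A", "A", "A", "C", "C", "D", "D", "Q"]
y_pred = ["A", "A", "C", "A", "C", "D", "D", "Q"]

scores = per_class_f1(y_true, y_pred)
confused = top_confusions(y_true, y_pred)
```

Sorting `scores` by value gives the hardest classes first, and `confused` shows which specific pairs account for the errors.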
Key Findings
- Overall accuracy reached 85.59% after hyperparameter tuning, an improvement over the same model with default settings.
- Sitting (D) and Writing (Q) achieved ~96% F1, confirming that low-motion, highly distinctive activities are easy to identify.
- Jogging (B) was frequently confused with Walking (A); the two share similar limb kinematics and are hard to separate without gait-specific features.
- Stairs (C) showed lower performance due to its overlap with walking in stride pattern and acceleration profile.
- Smartwatch and smartphone placements yielded different confusion profiles, suggesting sensor fusion could push accuracy further.
- On the watch, "Teeth (G)" and "Soup (H)" were often confused with Standing (E). Wrist movements alone are too ambiguous to separate fine-grained hand-to-mouth activities.
What I'd do differently
If I were to redo this project, I'd fuse the phone and watch features into a single model from the start instead of evaluating them separately. The confusion patterns were clearly complementary: watch data struggled with eating activities while phone data handled them better, and vice versa for some motion classes. I left that on the table.
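A minimal sketch of what that fusion could look like is below. It assumes each device already produces one feature dict per window and that windows from both devices can be aligned on a shared key (for example subject and window index). That alignment, the row layout, and the name `fuse_features` are all assumptions for illustration.

```python
def fuse_features(phone_rows, watch_rows):
    """Join per-window feature rows from both devices on a shared key.

    Each row is assumed to look like:
        {"key": ..., "label": ..., "features": {name: value, ...}}
    Windows missing from either device are dropped.
    """
    watch_by_key = {row["key"]: row for row in watch_rows}
    fused = []
    for phone in phone_rows:
        watch = watch_by_key.get(phone["key"])
        if watch is None:
            continue  # keep only windows recorded on both devices
        row = {"key": phone["key"], "label": phone["label"]}
        # Prefix feature names so phone and watch columns stay distinct.
        row.update({f"phone_{k}": v for k, v in phone["features"].items()})
        row.update({f"watch_{k}": v for k, v in watch["features"].items()})
        fused.append(row)
    return fused
```

The fused rows double the feature count per window, so a single classifier can weigh phone and watch evidence against each other for every class.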
I'd also skip the manual feature engineering and go straight to a CNN-LSTM on raw windowed signals. XGBoost was a solid baseline, but the temporal structure in IMU data is exactly what sequence models are built for. The 85.6% accuracy is respectable, but I think a well-tuned deep model could push past 90% without much more data.
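For reference, "raw windowed signals" means cutting the sensor stream into fixed-length, possibly overlapping segments that a sequence model consumes directly. The sketch below shows one way to do that. The window length and step are placeholders, since the sampling rate and window size are not stated here, and `sliding_windows` is a hypothetical helper.

```python
def sliding_windows(samples, window_size, step):
    """Split a sequence of sensor samples into fixed-length windows.

    samples: list of per-timestep readings, e.g. (ax, ay, az, gx, gy, gz).
    window_size: number of samples per window (placeholder value below).
    step: hop between window starts; step < window_size gives overlap.
    A trailing segment shorter than window_size is discarded.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    last_start = len(samples) - window_size
    return [
        samples[start:start + window_size]
        for start in range(0, last_start + 1, step)
    ]


# Toy stream of 10 six-axis samples; values are synthetic.
stream = [(float(i),) * 6 for i in range(10)]
windows = sliding_windows(stream, window_size=4, step=2)
```

Each window keeps the sample order intact, which is the temporal structure that hand-crafted summary statistics discard.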
Finally, I didn't think enough about deployment. An activity recognition model that can't run on a phone in real time is an academic exercise. I'd benchmark inference latency on actual mobile hardware next time.
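A latency benchmark along those lines could start from something like the sketch below. It times repeated single-sample predictions and reports the median, 95th percentile, and maximum in milliseconds. `predict_fn` is an assumed callable wrapping whatever model is under test, and numbers measured on a development machine would only be a rough proxy for mobile hardware.

```python
import statistics
import time


def benchmark_latency(predict_fn, sample, n_runs=200, warmup=20):
    """Time predict_fn(sample) repeatedly and summarise latency in ms."""
    for _ in range(warmup):
        predict_fn(sample)  # warm caches before measuring

    timings_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        predict_fn(sample)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    timings_ms.sort()
    p95_index = min(len(timings_ms) - 1, int(0.95 * len(timings_ms)))
    return {
        "median_ms": statistics.median(timings_ms),
        "p95_ms": timings_ms[p95_index],
        "max_ms": timings_ms[-1],
    }
```

Reporting the tail (p95, max) alongside the median matters for real-time use, because occasional slow predictions are what a user actually notices.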
Limitations & Future Work
- Each model used a single sensor placement (phone or watch), which limits robustness; fusing phone and watch features in one model is a natural next step.
- Larger, more diverse participant pools would improve generalisability across body types and movement styles.
- Deep learning approaches (LSTM, CNN-LSTM) could automatically learn temporal patterns without manual feature engineering.
- Real-time inference on-device was not evaluated. Latency and memory constraints of XGBoost on embedded hardware still need investigation.