Releasing the Balabit Mouse Dynamics Challenge Data Set

Published on 26 August 2016

Winning a gold medal at the Olympics is probably the single greatest achievement one can reach in sports. In data science, however, we believe there is a challenge even greater than winning a competition: designing an engaging one.

For a couple of months earlier this year, we were busy integrating behavioral biometric capabilities into our User Behavior Analysis product. To detect unauthorized use of user accounts, we employ methods that analyze mouse and keystroke dynamics. This way, even if attackers try to mimic the owner of an account they have hacked, they will fail, because no one is really capable of imitating how a particular person uses the mouse and keyboard.

Developing such algorithms was something that we truly enjoyed. We also discovered that biometric authentication is an active area of research pursued by professionals in academia and in the security industry. Sharing our excitement and data with the community seemed to be a perfect opportunity to fulfill our dream of designing a data science challenge. This is how the idea of the Balabit Mouse Dynamics Challenge was born.

We formulated the task and provided the data set for solving it. The goal of the challenge was to protect a set of users from unauthorized use of their accounts by learning the characteristics of how they use their mice. From March to May, several teams tried their best to estimate the anomalousness of test audit trails based on the trail files provided for training the models.

For our cursor movement data set and a more comprehensive description please visit: https://github.com/balabit/Mouse-Dynamics-Challenge.

We hope you will have as much fun exploring our data as we did. You can even evaluate your solution against a subset of the labels for the test trails.
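If you want to check a solution against the published labels, something along these lines may help. This is only a minimal sketch: this post does not say which metric the challenge used, so ROC AUC is an assumption here, chosen because it is a common way to judge a ranking of anomaly scores. The function name `roc_auc` and the sample labels and scores are hypothetical, with 1 standing for an illegal session and 0 for a legal one.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC of anomaly scores against binary labels.

    Computed as the fraction of (illegal, legal) session pairs in which
    the illegal session received the higher score; ties count as half.
    """
    illegal = [s for l, s in zip(labels, scores) if l == 1]
    legal = [s for l, s in zip(labels, scores) if l == 0]
    if not illegal or not legal:
        raise ValueError("need at least one session of each class")

    wins = 0.0
    for p in illegal:
        for n in legal:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(illegal) * len(legal))


# Hypothetical example: 1 = illegal session, 0 = legal session.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
auc = roc_auc(labels, scores)
print(f"AUC: {auc:.3f}")
```

A value near 1.0 means illegal sessions are consistently ranked above legal ones, while 0.5 is what random scores would give. The pairwise loop is quadratic, which is fine for a few hundred sessions; for larger sets a rank-based implementation would be a better fit.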

Did we succeed in delivering an exciting challenge? We received favorable feedback from the contestants. We managed to set up the task such that it was demanding but, having seen the performance of the models, not impossible. So far no one has reported any kind of information leakage that would lead to an unintended significant improvement in performance (which obviously does not imply that there is none). On the whole, we believe it is a decent piece of work that we are happy to share. Enjoy!

by Arpad Fulop

Árpád is a data scientist at Balabit working on Privileged Account Analytics, part of Balabit's PAM solution. He applies machine learning and other analytical methods to computer network data in order to detect anomalies and discover security issues.
