Introducing the “One-Stop Anomaly Shop” (OSAS)
Authors: Tiberiu Boros, Data Scientist & Machine Learning Engineer; Andrei Cotaie, Technical Lead, Security Intelligence Team; Vivek Malik, Security Engineer; Kumar Vikramjeet, Security Engineer

OSAS is a new Adobe Security open-source project. It provides a security intelligence toolset aimed at discovering anomalies in a given dataset. The tool implements and combines several of the Adobe Security Intelligence Team’s previous research, white papers, and other open-source projects.
OSAS runs “out-of-the-box” and enables researchers to: (a) experiment with datasets; (b) control how a dataset is processed and how its features are combined; and (c) shorten the path to finding a balanced solution for detecting security threats.
So, why OSAS?
Logs are not always straightforward. Security-related logs are even more heterogeneous and verbose, often presenting a large feature space due to the unbounded nature of attribute values. When machine learning (ML) algorithms and models are applied, this large feature space often leads to an adverse effect known as data sparsity. This means that most supervised and unsupervised ML algorithms will struggle to find structure within the data, and are likely to overfit and handle previously unseen examples poorly.
About OSAS
The “One-Stop Anomaly Shop” (OSAS) project helps reduce this effect by implementing a two-step approach to data processing:
1. Initially, raw data is consumed and labeled (tagged) using standard recipes for different field types such as multinomial, text, and numeric values. In this phase, complex features are also created by combining different attributes, just as in the well-established feature engineering process for classical ML. Additionally, the tagging process allows rule-based assignment of labels to leverage in-domain knowledge from experts.
2. Next, the labels are used as input features for generic (unsupervised) or targeted (supervised) ML algorithms. For this step, OSAS offers three standard options, and we intend to add more.
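
To make the two steps concrete, here is a minimal Python sketch of the general idea. It is not the OSAS implementation: the function names, the label names (such as BYTES_OUTLIER), the thresholds, and the scoring rule are all assumptions chosen for illustration. Step 1 turns raw field values into a small set of labels. Step 2 scores an event from its labels alone, so the scorer never sees the unbounded raw values.

```python
from collections import Counter
from statistics import mean, pstdev


def fit_tagger(events, numeric_fields, categorical_fields):
    """Step 1 (training): learn simple per-field statistics from raw events."""
    stats = {"numeric": {}, "categorical": {}}
    for field in numeric_fields:
        values = [e[field] for e in events]
        stats["numeric"][field] = (mean(values), pstdev(values))
    for field in categorical_fields:
        counts = Counter(e[field] for e in events)
        stats["categorical"][field] = {v: c / len(events) for v, c in counts.items()}
    return stats


def tag_event(event, stats, z_threshold=2.0, rare_threshold=0.05):
    """Step 1 (tagging): map raw values to a small, bounded set of labels."""
    labels = []
    for field, (mu, sigma) in stats["numeric"].items():
        # Hypothetical recipe: flag values far from the training mean.
        if sigma > 0 and abs(event[field] - mu) / sigma > z_threshold:
            labels.append(f"{field.upper()}_OUTLIER")
    for field, freqs in stats["categorical"].items():
        # Hypothetical recipe: flag values never or rarely seen in training.
        freq = freqs.get(event[field], 0.0)
        if freq == 0.0:
            labels.append(f"{field.upper()}_UNSEEN")
        elif freq < rare_threshold:
            labels.append(f"{field.upper()}_RARE")
    return labels


def fit_label_frequencies(label_lists):
    """Step 2 (training): fraction of training events carrying each label."""
    counts = Counter(label for labels in label_lists for label in labels)
    return {label: c / len(label_lists) for label, c in counts.items()}


def anomaly_score(labels, label_freq):
    """Step 2 (scoring): rarer labels contribute more to the score."""
    return sum(1.0 - label_freq.get(label, 0.0) for label in labels)


# Toy data, invented for this example.
train = [{"user": "alice" if i % 2 else "bob", "bytes": 100 + i} for i in range(30)]
stats = fit_tagger(train, ["bytes"], ["user"])
label_freq = fit_label_frequencies([tag_event(e, stats) for e in train])

suspicious = tag_event({"user": "mallory", "bytes": 100000}, stats)
print(suspicious, anomaly_score(suspicious, label_freq))
```

In this toy run, the event from an unseen user with an unusually large byte count receives two labels, while an ordinary event receives none. A rule written by a domain expert could append further labels in tag_event in the same way.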

(You can find more information on “Hubble,” another Adobe open-source project, on GitHub.)
The automatic learning/tagging function allows OSAS to be used for a diverse range of datasets and projects. The “Expert Knowledge Based” tagging component makes it highly efficient at targeting security threats and shifts the underlying operation from unsupervised learning towards a semi-supervised one, similar to a “Risk Based Alerting” model.
From the technical perspective, OSAS is a command-line interface (CLI) toolset that provides the following functionality:
1. Automatically generates a custom pipeline configuration file by guessing field types and creating complex features in certain cases.
2. Creates a pre-trained model for the custom pipeline, based on the configuration file and raw dataset.
3. Applies the previously created model to previously unseen data.
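
As an illustration of the first item, the sketch below shows one way field types could be guessed from sample data. It is a hypothetical heuristic, not the logic or configuration format that OSAS actually uses: the function names, the 0.5 distinct-value ratio, and the shape of the resulting dictionary are assumptions. Only the three type names (numeric, multinomial, text) come from the description above.

```python
def guess_field_type(values, max_distinct_ratio=0.5):
    """Guess a field type from sample values (hypothetical heuristic)."""
    present = [v for v in values if v not in (None, "")]
    if not present:
        return "text"
    try:
        # If every value parses as a number, treat the field as numeric.
        for v in present:
            float(v)
        return "numeric"
    except (TypeError, ValueError):
        pass
    # Few distinct values relative to the sample size suggests a category.
    if len(set(present)) / len(present) <= max_distinct_ratio:
        return "multinomial"
    return "text"


def build_config(rows):
    """Map each field name (taken from the first row) to a guessed type."""
    return {field: guess_field_type([r.get(field) for r in rows]) for field in rows[0]}


# Toy rows, invented for this example.
rows = [
    {"user": "alice", "bytes": "120", "command": "ls -la /tmp"},
    {"user": "bob", "bytes": "98.5", "command": "cat /etc/passwd"},
    {"user": "alice", "bytes": "4096", "command": "curl example.com"},
    {"user": "bob", "bytes": "77", "command": "whoami"},
]
config = build_config(rows)
print(config)
```

A generated configuration of this kind is only a starting point. The guesses depend on the sample, so reviewing and adjusting them before training is the natural place to apply expert knowledge.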
The open-source repository contains the full source code of the project. We also provide a Dockerized version, equipped with a web UI and integrated with Open Distro for Elasticsearch, for fast visual analysis of the results.
If you are interested in learning more about OSAS or contributing to its further development, we encourage you to read our technical whitepaper and to check out the project’s GitHub repository.