Torpeda is a framework for building labelled web traffic datasets.
The aim of Torpeda is to provide a collection of datasets for evaluating and comparing the effectiveness of Web Application Firewalls (WAFs).
All datasets included in Torpeda:
- are public and available to the community,
- are composed exclusively of labelled HTTP requests,
- are useful in the evaluation of WAFs (both anomaly-based and signature-based).
Torpeda is open to contributions of new datasets built by other researchers and collaborators.
Evaluation is one of the main problems researchers face when proposing a new system for web attack detection. Performance measurements depend heavily on the specific data used for the evaluation, which makes it difficult to compare systems that were evaluated on different datasets.
Some public datasets are available to the community. However, very few of them are designed specifically to test WAFs. Torpeda aims to fill this gap by offering a common structure for creating new web-based datasets to test and compare different detectors.
A Torpeda dataset is presented as an XML document.
A dataset contains an arbitrary number of samples (the more samples, the better), each identified by a unique id. The document looks like this.
Each sample represents a labelled HTTP request and contains two major parts: captured data and labelling data.
The captured data section describes the request itself. It contains only data that can be observed with a web sniffer. The request is described by its different components:
- Query (only GET requests).
- Body (only POST requests).
The labelling data section describes how the sample is classified by an expert. It contains data that the detector is supposed to predict. The following labels describe this section:
- Request type. The sample can be labelled as normal, anomalous or attack.
- Attack category. If the sample is labelled as attack, this section specifies the attack category.
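The two-part structure of a sample can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the element and attribute names used here (`dataset`, `sample`, `id`, `request`, `query`, `body`, `label`, `type`, `attack`), the `Sample` class, and the `parse_dataset` function are all hypothetical and are not taken from the official Torpeda schema. The sketch shows how captured data and labelling data might be read into one record per sample, with a check that ids are unique and that the request type is one of the three labels listed above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical layout: every element and attribute name below is an
# assumption made for illustration, not the official Torpeda schema.
EXAMPLE_XML = """\
<dataset>
  <sample id="1">
    <request>
      <query>id=42</query>
    </request>
    <label>
      <type>normal</type>
    </label>
  </sample>
  <sample id="2">
    <request>
      <body>user=admin' OR '1'='1</body>
    </request>
    <label>
      <type>attack</type>
      <attack>SQL injection</attack>
    </label>
  </sample>
</dataset>
"""

# The three request types named in the labelling data section.
REQUEST_TYPES = {"normal", "anomalous", "attack"}


@dataclass
class Sample:
    sample_id: str
    # Captured data: what a sniffer can observe.
    query: Optional[str]            # present for GET requests
    body: Optional[str]             # present for POST requests
    # Labelling data: what a detector is supposed to predict.
    request_type: str
    attack_category: Optional[str]  # only meaningful for attacks


def parse_dataset(xml_text: str) -> List[Sample]:
    """Read all samples from a dataset document in the assumed layout."""
    root = ET.fromstring(xml_text)
    samples: List[Sample] = []
    seen_ids = set()
    for node in root.findall("sample"):
        sample_id = node.get("id")
        if sample_id is None or sample_id in seen_ids:
            raise ValueError(f"missing or duplicate sample id: {sample_id!r}")
        seen_ids.add(sample_id)

        request_type = node.findtext("label/type")
        if request_type not in REQUEST_TYPES:
            raise ValueError(f"unknown request type: {request_type!r}")

        samples.append(Sample(
            sample_id=sample_id,
            query=node.findtext("request/query"),   # None if absent
            body=node.findtext("request/body"),     # None if absent
            request_type=request_type,
            attack_category=node.findtext("label/attack"),
        ))
    return samples
```

Calling `parse_dataset(EXAMPLE_XML)` yields one `Sample` per request. A detector under evaluation would be given only the captured fields (`query`, `body`), and its output would then be compared against `request_type` and `attack_category`.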
Collaboration is central to Torpeda. Our goal is to provide not only a specification and tools that facilitate the construction of new datasets, but also a way to share them and make them public.
To this end, we expect to collect as many datasets as possible and make them available to the community. This process is dynamic. Since web attacks (and detection systems) are continuously changing, it becomes necessary to have more specific traffic datasets available.
If you want to contribute your data, contact us.