Please download all the example datasets here.


Dataset description

Three NetFlow datasets (NetFlow data has the following schema: TBD):

  1. UGR16 dataset consists of traffic (including attacks) from NetFlow v9 collectors in a Spanish ISP network. We used data from the third week of March 2016.
  2. CIDDS dataset emulates a small business environment with several clients and servers (e.g., email, web) into which malicious traffic was injected. Each NetFlow entry is recorded with a label (benign/attack) and an attack type (DoS, brute force, port scan).
  3. TON dataset represents telemetry from IoT sensors. We use a sub-dataset (“Train_Test_datasets”) for evaluating cybersecurity-related ML algorithms; of its 461,013 records, 300,000 (65.07%) are normal, and the rest (34.93%) combine nine evenly-distributed attack types (e.g., backdoor, DDoS, injection, MITM).
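
As a quick sanity check on the TON class split quoted above, the percentages can be recomputed from the two record counts. This is only an illustrative sketch: the function name `class_split` is hypothetical and not part of this repository, and the code uses only the counts stated in the description, not the dataset files themselves.

```python
# Record counts quoted in the TON description (not read from the dataset).
TOTAL_RECORDS = 461_013
NORMAL_RECORDS = 300_000


def class_split(total, normal):
    """Return (attack_count, normal_pct, attack_pct), percentages rounded to 2 places."""
    attack = total - normal
    normal_pct = round(100 * normal / total, 2)
    attack_pct = round(100 * attack / total, 2)
    return attack, normal_pct, attack_pct


attack_records, normal_pct, attack_pct = class_split(TOTAL_RECORDS, NORMAL_RECORDS)
print(attack_records, normal_pct, attack_pct)  # 161013 65.07 34.93
```

The same check can be repeated after loading the actual sub-dataset to confirm that the downloaded copy matches these counts.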

Three PCAP datasets:

  1. CAIDA contains anonymized traces from high-speed monitors on a commercial backbone link. Our subset is from the New York collector in March 2018. (A CAIDA account is required to download the data.)
  2. DC dataset is a packet capture from the "UNI1" data center studied in the IMC 2010 paper.
  3. CA dataset consists of traces from the U.S. National CyberWatch Mid-Atlantic Collegiate Cyber Defense Competitions from March 2012.

Zeek: Zeek logs have the following schema: TBD

Wikipedia: The Wikipedia web page view logs have the following schema: TBD