2021 IEEE International Conference on Cyber Security and Resilience

Summary:

Cyber security professionals increasingly need tools that automatically extract Cyber Threat Intelligence (CTI) from publicly available sources such as relevant blogs and news sites. When such sources are used, an important part of the CTI extraction process is content selection, in which pages that do not contain CTI-related information are filtered out. For this task, we apply supervised machine learning-based text classification techniques, trained on a new dataset created for the purposes of this work. Furthermore, we demonstrate in practice the importance of good content selection in a commonly used CTI extraction pipeline by inspecting the results of the named entity recognition (NER) step that normally follows.
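
As a rough illustration of the content selection step described above, the following minimal sketch trains a supervised text classifier and uses it to discard pages that are not CTI-related before any later step such as NER. The summary does not name the models or the dataset used, so everything here is an assumption: the multinomial Naive Bayes classifier is a simple stand-in, and the class name NaiveBayesFilter, the helper select_cti_pages, the labels "cti" and "other", and the four training sentences are hypothetical and invented for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesFilter:
    """Hypothetical stand-in classifier: multinomial Naive Bayes
    with add-one smoothing. Not the model used in the paper."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training pages
        self.vocab = set()

    def fit(self, pages, labels):
        for text, label in zip(pages, labels):
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def select_cti_pages(pages, classifier):
    # Content selection: keep only pages predicted as CTI-related.
    return [p for p in pages if classifier.predict(p) == "cti"]


# Invented toy training data; a real dataset would be far larger.
train_pages = [
    "new ransomware exploits vulnerability in vpn appliance",
    "malware campaign uses phishing emails to steal credentials",
    "company announces quarterly earnings and new office",
    "review of the latest smartphone camera and battery life",
]
train_labels = ["cti", "cti", "other", "other"]

classifier = NaiveBayesFilter().fit(train_pages, train_labels)

crawled = [
    "phishing campaign delivers ransomware",
    "smartphone review covers camera and battery",
]
selected = select_cti_pages(crawled, classifier)
print(selected)
```

In a pipeline of this kind, only the pages in selected would be passed on to the NER step, which is one way to read the summary's point that irrelevant pages left in the input can degrade the entities extracted downstream.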

Author(s):

Panos Panagiotou    
Information Technologies Institute, CERTH, Thessaloniki
Greece

Christos Iliou    
Information Technologies Institute, CERTH, Thessaloniki
Greece

Konstantinos Apostolou    
Information Technologies Institute, CERTH, Thessaloniki
Greece

Theodora Tsikrika    
Information Technologies Institute, CERTH, Thessaloniki
Greece

Stefanos Vrochidis    
Information Technologies Institute, CERTH, Thessaloniki
Greece

Periklis Chatzimisios    
School of Science & Technology, International Hellenic University, Thessaloniki
Greece

Ioannis Kompatsiaris    
Information Technologies Institute, CERTH, Thessaloniki
Greece

 


Copyright © 2021 SUMMIT-TEC GROUP LTD