Computer vision can help detect cyber threats with surprising accuracy
This article is part of our reviews of AI research articles, a series of articles that explore the latest findings in artificial intelligence.
The surge of interest in deep learning over the past decade was sparked by the proven ability of neural networks in computer vision tasks. If you train a neural network on enough labeled photos of cats and dogs, it will find recurring patterns in each category and classify unseen images with decent accuracy.
What else can you do with an image classifier?
In 2019, a group of cybersecurity researchers wondered if they could treat security threat detection as an image classification problem. Their intuition proved well-placed: they were able to create a machine learning model capable of detecting malware from images generated from the contents of application files. A year later, the same technique was used to develop a machine learning system that detects phishing websites.
The combination of binary visualization and machine learning is a powerful technique that can provide new solutions to old problems. It shows promise in cybersecurity, but it could be applied to other areas as well.
Detecting malware with deep learning
The traditional way to detect malware is to scan files for known signatures of malicious payloads. Malware detectors maintain a database of virus definitions that include opcode sequences or snippets of code, and they scan new files for the presence of those signatures. Unfortunately, malware developers can easily bypass these detection methods, for example by obfuscating their code or by using polymorphism to mutate it at runtime.
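To make the weakness concrete, here is a minimal sketch of signature scanning and of how a trivial transformation defeats it. The signature names and byte patterns are made up for illustration, and the single-byte XOR is only a stand-in for the far more sophisticated obfuscation real malware uses.

```python
from typing import Dict, List

# Hypothetical signature database: name -> byte pattern (invented for this sketch).
SIGNATURES: Dict[str, bytes] = {
    "demo-dropper": bytes.fromhex("deadbeef90909090"),
    "demo-macro": b"AutoOpen_EvilPayload",
}


def scan_bytes(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the names of all signatures whose byte pattern occurs in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


def xor_obfuscate(data: bytes, key: int) -> bytes:
    """Toy obfuscation: XOR every byte with a one-byte key."""
    return bytes(b ^ key for b in data)


sample = b"header" + bytes.fromhex("deadbeef90909090") + b"trailer"
print(scan_bytes(sample))                        # ['demo-dropper']
print(scan_bytes(xor_obfuscate(sample, 0x5A)))   # []
```

The payload is the same in both cases, but the scanner only matches exact bytes, so the obfuscated copy passes unnoticed.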
Dynamic scanning tools attempt to detect malicious behavior at runtime, but they are slow and require you to set up a sandbox environment in which to test suspicious programs.
In recent years, researchers have also tried a range of machine learning techniques to detect malware. These ML models have made progress on some of the challenges of malware detection, including code obfuscation. But they present new challenges of their own, including the need to learn too many features and the need for a virtual environment in which to analyze target samples.
Binary visualization can redefine malware detection by turning it into a computer vision problem. In this methodology, files are run through algorithms that transform their binary and ASCII values into color codes.
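The idea can be sketched in a few lines. The color scheme below (one color per byte category) and the simple row-by-row pixel layout are assumptions made for illustration. The article does not specify either, and real binary visualization tools may arrange pixels differently.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]  # (red, green, blue)


def byte_to_color(b: int) -> Color:
    """Map one byte to a color by category (assumed scheme, not from the paper)."""
    if b == 0x00:
        return (0, 0, 0)          # null byte: black
    if b == 0xFF:
        return (255, 255, 255)    # 0xFF: white
    if 0x20 <= b <= 0x7E:
        return (0, 0, 255)        # printable ASCII: blue
    if b < 0x20 or b == 0x7F:
        return (0, 255, 0)        # ASCII control characters: green
    return (255, 0, 0)            # extended (non-ASCII) bytes: red


def bytes_to_image(data: bytes, width: int = 16) -> List[List[Color]]:
    """Lay the colored bytes out row by row, padding the last row with black."""
    pixels = [byte_to_color(b) for b in data]
    remainder = len(pixels) % width
    if remainder:
        # Padding is black, so it is indistinguishable from null bytes.
        pixels += [(0, 0, 0)] * (width - remainder)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


image = bytes_to_image(b"MZ\x00\x90\xff", width=4)
print(len(image), len(image[0]))  # 2 4
```

A file that mixes many byte categories produces a multicolored image under this scheme, while a plain text file would come out almost uniformly blue.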
In a paper published in 2019, researchers at the University of Plymouth and the University of the Peloponnese showed that when benign and malicious files are visualized using this method, new patterns emerge that separate malicious files from safe files. These differences would have gone unnoticed with traditional malware detection methods.
According to the paper, “Malicious files tend to often include ASCII characters from different categories, presenting a colorful image, while benign files have a sharper image and distribution of values.”
When you have such detectable patterns, you can train an artificial neural network to tell the difference between malicious files and safe files. The researchers created a visualized binary file dataset that included both benign and malicious files. The dataset contained a variety of malicious payloads (viruses, worms, Trojans, rootkits, etc.) and file types (.exe, .doc, .pdf, .txt, etc.).
The researchers then used the images to train a neural network classifier. The architecture they used is the Self-Organizing Incremental Neural Network (SOINN), which is fast and particularly good at dealing with noisy data. They also used an image preprocessing technique to reduce the binary images to 1,024-dimensional feature vectors, which makes it easier and cheaper for the network to learn the patterns in the input data.
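The article does not describe the preprocessing step itself. One simple way to get a 1,024-dimensional vector from an image, shown here purely as an assumption, is to average pixel blocks down to a 32 x 32 grayscale grid and flatten it (32 x 32 = 1,024). The function name and parameters are hypothetical.

```python
from typing import List


def to_feature_vector(image: List[List[float]], side: int = 32) -> List[float]:
    """Block-average a 2D grayscale image down to side x side, then flatten it."""
    height, width = len(image), len(image[0])
    if height % side or width % side:
        raise ValueError("image dimensions must be multiples of side")
    block_h, block_w = height // side, width // side
    features: List[float] = []
    for r in range(side):
        for c in range(side):
            # Collect every pixel that falls into this output cell.
            block = [
                image[r * block_h + i][c * block_w + j]
                for i in range(block_h)
                for j in range(block_w)
            ]
            features.append(sum(block) / len(block))
    return features


# A synthetic 256 x 256 grayscale image stands in for a visualized file.
image = [[(x + y) % 256 for x in range(256)] for y in range(256)]
vector = to_feature_vector(image)
print(len(vector))  # 1024
```

Whatever method is used, the point is the same: the classifier sees a short fixed-length vector rather than every pixel of the original image.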
The resulting neural network was efficient enough to process a training dataset of 4,000 samples in 15 seconds on a personal workstation with an Intel Core i5 processor.
The researchers’ experiments showed that the deep learning model was particularly effective in detecting malware in .doc and .pdf files, which are the preferred medium for ransomware attacks. The researchers suggested that the model’s performance can be improved if it is adjusted to take file type as one of its learning dimensions. Overall, the algorithm achieved an average detection rate of around 74%.
Detecting phishing websites with deep learning
Phishing attacks are a growing problem for organizations and individuals. Many phishing attacks trick victims into clicking a link to a malicious website that poses as a legitimate service, where they end up entering sensitive information such as credentials or financial details.
Traditional approaches to detecting phishing websites revolve around blacklisting malicious domains or whitelisting safe domains. The first method misses new phishing websites until someone falls victim to them, and the second is too restrictive and requires considerable effort to provide access to all safe domains.
Other detection methods rely on heuristics. These methods are more accurate than blacklists, but they still do not provide optimal detection.
In 2020, a group of researchers from the University of Plymouth and the University of Portsmouth used binary visualization and deep learning to develop a new method of detecting phishing websites.
The technique uses binary visualization libraries to transform website markup and source code into color values.
As is the case with benign and malicious application files, when websites are visualized, unique patterns emerge that separate safe and malicious websites. The researchers write: “The legitimate site has a more detailed RGB value because it would be built from additional characters from licenses, hyperlinks and detailed data entry forms.
“While the phishing counterpart would typically contain a single or no CSS reference, multiple images rather than forms, and a single login form with no security scripts. This would create a smaller data input string when scraped.”
[Figure: visual representation of the code of a legitimate PayPal login page versus a fake PayPal phishing site.]
The researchers created a dataset of images representing the code of legitimate and malicious websites and used it to train a classification machine learning model.
The architecture they used is MobileNet, a lightweight convolutional neural network (CNN) that is optimized to run on user devices instead of high capacity cloud servers. CNNs are particularly suited to computer vision tasks, including image classification and object detection.
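As background that is not taken from the article: much of MobileNet's small size comes from replacing standard convolutions with depthwise separable ones. The arithmetic below compares the weight counts of the two layer types. The layer sizes are arbitrary values picked for illustration, and biases are ignored.

```python
def standard_conv_weights(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard convolution: one kernel x kernel filter per (input, output) pair."""
    return kernel * kernel * c_in * c_out


def separable_conv_weights(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise separable convolution.

    Depthwise step: one kernel x kernel filter per input channel.
    Pointwise step: a 1 x 1 convolution that mixes channels.
    """
    return kernel * kernel * c_in + c_in * c_out


# Arbitrary example layer: 3 x 3 kernel, 128 input channels, 256 output channels.
standard = standard_conv_weights(3, 128, 256)
separable = separable_conv_weights(3, 128, 256)
print(standard)   # 294912
print(separable)  # 33920
```

For this example layer the separable version needs roughly one ninth of the weights, which is the kind of saving that makes on-device inference practical.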
Once the model is trained, it is plugged into a phishing detection tool. When the user visits a new website, the tool first checks whether the URL is in its database of malicious domains. If the domain is not in the database, the site's code is transformed by the visualization algorithm and run through the neural network to check whether it shows the patterns of a malicious website. This two-step architecture combines the speed of blacklist lookups with the smarter detection of the neural network-based technique.
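The two-step flow can be sketched as follows. Every name here is hypothetical: `fetch` stands in for downloading the page source, and `classify` stands in for the visualization step plus the trained neural network. The toy classifier is not a real model, it only lets the sketch run end to end.

```python
from typing import Callable, Dict, Set
from urllib.parse import urlparse


def make_detector(
    blacklist: Set[str],
    fetch: Callable[[str], bytes],
    classify: Callable[[bytes], float],
    threshold: float = 0.5,
) -> Callable[[str], bool]:
    """Build a checker that tries the blacklist first, then the classifier."""

    def is_phishing(url: str) -> bool:
        domain = (urlparse(url).hostname or "").lower()
        if domain in blacklist:
            return True  # step 1: fast lookup, the page is never fetched
        # Step 2: unknown domain, so score the page with the classifier.
        return classify(fetch(url)) >= threshold

    return is_phishing


# Canned page sources stand in for real HTTP requests.
PAGES: Dict[str, bytes] = {
    "https://new-site.example/login": b"<form><input name='password'></form>",
    "https://shop.example/": b"<html>" + b"<a href='/terms'>terms</a>" * 40 + b"</html>",
}


def toy_classifier(page: bytes) -> float:
    """Placeholder score: treats very short pages as suspicious. Not a real model."""
    return 1.0 if len(page) < 200 else 0.0


detector = make_detector({"known-bad.example"}, PAGES.__getitem__, toy_classifier)
print(detector("https://known-bad.example/login"))  # True
print(detector("https://new-site.example/login"))   # True
print(detector("https://shop.example/"))            # False
```

Putting the lookup first means known-bad domains are rejected without fetching or classifying anything, so the more expensive model only runs on domains the database has never seen.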
The researchers’ experiments showed that the technique could detect phishing websites with an accuracy of 94%. “Using visual representation techniques provides insight into the structural differences between legitimate and phishing web pages. Based on our first experimental results, the method seems promising and capable of quickly detecting a phishing attacker with great precision. In addition, the method learns from classification errors and improves its efficiency,” the researchers wrote.
I recently spoke to Stavros Shiaeles, professor of cybersecurity at the University of Portsmouth and co-author of both papers. According to Shiaeles, the researchers are now preparing the technique for adoption in real-world applications.
Shiaeles is also exploring the use of binary visualization and machine learning to detect malware traffic in IoT networks.
As machine learning continues to advance, it will provide scientists with new tools to tackle cybersecurity challenges. Binary visualization shows that with enough creativity and rigor, we can find new solutions to old problems.
This article was originally published by Ben Dickson on TechTalks, a publication that examines technological trends, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new technology, and what to look out for.