Building a Web Vulnerability Scanner

Written by LouisS | Published 2019/11/11
Tech Story Tags: vulnerabilities | cybersecurity | security | startups | fintech | devops | latest-test-stories | cybercrime

TLDR: The Security Audit Tool scans a given URL, detects the software in use and matches it against the last six months of vulnerabilities. It works by scraping the website and looking for clues that give away the software used. It serves as a reminder of the wide range of software that may harbour dormant vulnerabilities, from front-end JavaScript libraries to back-end caching layers. The best defense is to keep software up to date and stay alert to new vulnerabilities that could affect your stack. The tool is available on SecAlerts, a free security product that sends subscribers a customised report of vulnerabilities and security news relevant to their software stack.

In May this year I was part of a team that launched SecAlerts, a free security product that sends subscribers a customised weekly report of vulnerabilities and security news relevant to their software stack. The service was deliberately designed as a low-barrier-to-entry way to keep users informed and, as it nears 1,000 subscribers, the decision to 'keep it simple' appears to have merit.
At the end of August I added a new tool to SecAlerts. The Security Audit Tool scans a given URL, detects the software in use and matches it against the last six months of vulnerabilities. It serves as a reminder of the wide range of software that may harbour dormant vulnerabilities, from front-end JavaScript libraries to back-end caching layers.
I built the Security Audit Tool using an instance of Wappalyzer to detect the software from a given URL. This works by scraping the website and looking for little clues that give away the software used. For instance, a WordPress blog will likely have wp-content in its HTML, and Nginx will usually respond with a Server: nginx response header. From these clues we build up a list of the software that likely powers the website.
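To make the idea concrete, here is a minimal sketch of this kind of regex fingerprinting in TypeScript. It is not Wappalyzer's actual rule engine, and both rules are simplified versions of the clues above:

type Rule = {
  name: string;
  html?: RegExp;                    // pattern matched against the page body
  headers?: Record<string, RegExp>; // patterns matched against response headers
};

const rules: Rule[] = [
  { name: "WordPress", html: /wp-content/ },
  { name: "Nginx", headers: { server: /^nginx/i } },
];

// Return the name of every rule that matches the fetched page.
function detect(html: string, headers: Record<string, string>): string[] {
  return rules
    .filter((rule) => {
      if (rule.html && rule.html.test(html)) return true;
      return Object.entries(rule.headers ?? {}).some(([name, pattern]) =>
        pattern.test(headers[name.toLowerCase()] ?? "")
      );
    })
    .map((rule) => rule.name);
}

// detect("<link href='/wp-content/style.css'>", { server: "nginx/1.17.5" })
// -> ["WordPress", "Nginx"]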
An example of the software detected when looking at https://reddit.com: it identified Varnish, Python and Webpack. How did it know that? The rules in the Wappalyzer source show that the check for Webpack looks for any JavaScript containing the string webpackJsonp.
To detect Varnish, it looks for specific response headers and in this case it finds:
Via: varnish
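Expressed in the same illustrative rule format as the sketch above (reusing its Rule type), these two checks might look roughly like this; the real Wappalyzer patterns differ in detail:

const moreRules: Rule[] = [
  // Webpack: Wappalyzer actually tests for a webpackJsonp JavaScript global;
  // matching the page source for the string is a simplification.
  { name: "Webpack", html: /webpackJsonp/ },
  // Varnish: a Via response header that mentions varnish
  { name: "Varnish", headers: { via: /varnish/i } },
];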
Once there is a collection of software detected from a URL, each item is converted to what's known as a CPE (Common Platform Enumeration). A CPE is a structured naming scheme used in public vulnerability records to list the affected software and versions. The CPE for Varnish looks like this:
cpe:2.3:a:varnish-cache:varnish:*:*:*:*:*:*:*:*. 
With a lookup table we generate the CPE and search our vulnerability database for matching vulnerabilities from the last six months.
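In code, that lookup step might look like the sketch below. The cpeTable, the VulnDatabase interface and its search method are hypothetical stand-ins for illustration, not SecAlerts' actual schema:

interface VulnDatabase {
  search(query: { cpes: string[]; publishedAfter: Date }): Promise<unknown[]>;
}

// Map detected software names to CPEs (two illustrative entries).
const cpeTable: Record<string, string> = {
  Varnish: "cpe:2.3:a:varnish-cache:varnish:*:*:*:*:*:*:*:*",
  Nginx: "cpe:2.3:a:nginx:nginx:*:*:*:*:*:*:*:*",
};

const SIX_MONTHS_MS = 182 * 24 * 60 * 60 * 1000;

async function findRecentVulnerabilities(detected: string[], db: VulnDatabase) {
  const cpes = detected
    .map((name) => cpeTable[name])
    .filter((cpe): cpe is string => cpe !== undefined);
  return db.search({
    cpes,
    publishedAfter: new Date(Date.now() - SIX_MONTHS_MS),
  });
}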
This method of vulnerability scanning, while simple for the user, has its downsides: the detected software may be incorrect, some software might be impossible to detect, and it can rarely detect versions accurately. Let's go over the pros and cons of different vulnerability scanners...
Remote scanning
A remote scanner has very limited access to the system. It starts with an IP address or URL and has to learn as much as it can from what the system or network reveals. Remote scanning is generally limited to remote detection, like our Security Audit Tool, and remote attacks: some remote scanners attempt to detect the software, then run a set of benign attacks from public exploit databases.
This is generally not very useful unless your infrastructure runs severely outdated software, because public exploits are released infrequently and are often out of date, and many are not remotely exploitable (only 15% of the exploits on Exploit-DB are remotely exploitable).
Some remote scanners will attempt an automated pentest by running basic heuristic checks: SQL injection attempts in input fields, XSS by entering scripts into inputs, looking for hidden URLs in robots.txt, validating HTTP security headers, and guessing common subdomains and paths (/admin, /wp-admin).
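Two of these checks, sketched in TypeScript using the global fetch API (available in Node 18+); a real scanner covers far more cases:

async function basicChecks(base: string): Promise<string[]> {
  const findings: string[] = [];

  // 1. robots.txt often reveals paths the site would rather hide
  const robots = await fetch(new URL("/robots.txt", base));
  if (robots.ok) {
    const disallowed = (await robots.text())
      .split("\n")
      .filter((line) => line.toLowerCase().startsWith("disallow:"));
    if (disallowed.length > 0) {
      findings.push(`robots.txt lists ${disallowed.length} hidden path(s)`);
    }
  }

  // 2. Missing HTTP security headers are a common, easy finding
  const page = await fetch(base);
  for (const header of [
    "strict-transport-security",
    "content-security-policy",
    "x-frame-options",
  ]) {
    if (!page.headers.get(header)) findings.push(`missing ${header} header`);
  }

  return findings;
}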
Depending on the capabilities of the scanner this can be worth the effort, though frequent scans are rarely necessary: an automated scanner runs largely the same tests over and over again, whereas a manual pentester would try different paths and techniques.
Local scanning
A local scanner has a much better chance of finding vulnerabilities, as it is installed on the system and can look through the file system to find the installed and running software. A great middle ground is known as "agentless scanning", where a scanner does not need to be installed on the target machine; it simply uses an SSH connection to gather the information.
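As a rough sketch of the agentless approach, the snippet below shells out to ssh and lists installed packages on a Debian/Ubuntu target. It assumes key-based authentication, and the host argument is a placeholder:

import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// List installed package names and versions over SSH.
async function listInstalledPackages(host: string): Promise<Map<string, string>> {
  // Single quotes stop the remote shell from expanding ${Package} itself;
  // dpkg-query prints one "name version" pair per line.
  const { stdout } = await run("ssh", [
    host,
    "dpkg-query -W -f '${Package} ${Version}\\n'",
  ]);
  const packages = new Map<string, string>();
  for (const line of stdout.trim().split("\n")) {
    const [name, version] = line.split(" ");
    if (name && version) packages.set(name, version);
  }
  return packages;
}

Each detected package version can then be mapped to a CPE and matched against the vulnerability database, just as in the URL-based flow above.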
While scanners can be useful in case a major public exploit affects you or unknown services are listening on open ports, the best defense is to keep software up to date and stay alert to new vulnerabilities that could affect your stack.

Written by LouisS | Co-founder of software security startup, SecAlerts - secalerts.co
Published by HackerNoon on 2019/11/11