NodeJS: File Upload and Virus Scan
In this guide, I'm going to upload a file using the HTML file input element and scan the uploaded file for viruses using the npm module clamscan.
Prerequisites
- The clamscan module requires clamav or clamdscan to be installed on your machine (or the server). If you are not familiar with the clamav engine, I highly recommend you read about it first. Additionally, you can follow my guide on how to install clamav and/or clamdscan on Ubuntu.
- A Node.js module to parse multipart form data (I've used formidable; multiparty, busboy, and multer are some of the well-known alternatives).
Uploading the file
There are no surprises in this part, to be honest. We will use the standard HTML file input to capture the user's file. The markup looks like this:
<form method="POST" action="/scan-file" enctype="multipart/form-data">
  <input type="file" id="field-someDocument" name="someDocument" accept="image/png,image/jpeg,application/pdf">
  <input type="submit" value="submit" />
</form>
As you can see, submitting this form hits the /scan-file Express route, which we will get into later.
Handling Multipart-form data
When the above form is submitted, the form data is not sent via the usual application/x-www-form-urlencoded but as multipart/form-data. Traditional parsers like body-parser cannot handle multipart form data. Hence, as I mentioned earlier, we need a dedicated parsing library such as formidable, multer, or multiparty that can handle multipart (in my example, I have used formidable).
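To make the distinction concrete, here is a minimal sketch of how a request could be checked for multipart content before it reaches a parser. The helper names isMultipart and requireMultipart are my own for illustration and are not part of Express or formidable; the check only looks at the Content-Type request header.

```javascript
// Returns true when the Content-Type header announces multipart/form-data.
// `req` is any object with a Node-style (lowercase) `headers` map.
function isMultipart(req) {
  const contentType = (req.headers && req.headers['content-type']) || '';
  return /^multipart\/form-data\b/i.test(contentType.trim());
}

// Hypothetical Express-style guard: answer 415 (Unsupported Media Type)
// for anything that is not multipart, otherwise hand over to the next middleware.
function requireMultipart(req, res, next) {
  if (!isMultipart(req)) {
    res.statusCode = 415;
    return res.end('Expected multipart/form-data');
  }
  return next();
}
```

A guard like this is optional; the multipart parser will also fail on other content types, but an early check gives the client a clearer error.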
Let's now write a middleware that handles multipart/form-data and copies formidable's files object onto the Express req object, so we can use it later in the subsequent middlewares.
I have created a middleware for modularity and reusability purposes. You can easily skip the middleware and parse the form data directly within the route method (app.post('/scan-file', ...)); the choice is yours!
By default, formidable uploads the file to a temporary folder, which can be changed by listening to the fileBegin event.
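Here is a minimal sketch of what such a middleware could look like. It assumes the formidable v1 API (IncomingForm, and file.path / file.name on each uploaded file); newer major versions rename those properties to filepath and originalFilename. The factory name createMultipartParser and the uploadDir parameter are my own illustrative choices, and the formidable module is passed in as an argument so the sketch does not depend on how you load it.

```javascript
const path = require('path');

// Builds an Express middleware around a formidable-style module.
// `formidable` is expected to be require('formidable') (v1 API assumed).
// `uploadDir` is optional; when omitted, formidable's default temp folder is used.
function createMultipartParser(formidable, uploadDir) {
  return function parseMultipart(req, res, next) {
    const form = new formidable.IncomingForm();

    if (uploadDir) {
      // fileBegin fires before the file is written, so the target path can still change.
      form.on('fileBegin', (fieldName, file) => {
        // basename() drops directory parts from the client-supplied file name.
        // Note: files with the same name would overwrite each other in this sketch.
        file.path = path.join(uploadDir, path.basename(file.name));
      });
    }

    form.parse(req, (err, fields, files) => {
      if (err) return next(err);
      // Expose the parsed data to the middlewares and routes that follow.
      req.fields = fields;
      req.files = files;
      return next();
    });
  };
}

// Wiring (assumes formidable v1 is installed):
// const parseMultipart = createMultipartParser(require('formidable'));
```

Passing the module in keeps the middleware easy to test with a stub; requiring formidable directly inside the file works just as well.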
Anti-Virus Scanning
Now we can use the above middleware in our Express route and access the file details through the req.files object within the route method (app.post), as shown below.
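What follows is a sketch of such a route rather than a definitive implementation. It assumes the clamscan v1.x API with its snake_case names (scan_file, is_infected, config_file); as far as I know, later major versions switched to camelCase. The function registerScanRoute, the response messages, and the paths in the wiring comment are illustrative placeholders, and parseMultipart stands for the multipart middleware described earlier.

```javascript
// Scans a single file and throws if ClamAV reports it as infected.
// `clamscanReady` is the promise returned by `new NodeClam().init(...)`.
async function scanFile(clamscanReady, filePath) {
  const clamscan = await clamscanReady;
  const { is_infected, viruses } = await clamscan.scan_file(filePath);
  if (is_infected) {
    throw new Error(`Infected file detected: ${viruses.join(', ')}`);
  }
  return filePath;
}

// Registers POST /scan-file on an Express-style `app`.
function registerScanRoute(app, parseMultipart, clamscanReady) {
  app.post('/scan-file', parseMultipart, async (req, res, next) => {
    // `someDocument` matches the name attribute of the file input.
    const upload = req.files && req.files.someDocument;
    if (!upload) return res.status(400).send('No file uploaded');
    try {
      await scanFile(clamscanReady, upload.path);
      return res.send('File is clean');
    } catch (err) {
      console.error(err.message); // log, then let Express error handling respond
      return next(err);
    }
  });
}

// Wiring sketch (placeholder paths; use the values from your own clamav install):
// const NodeClam = require('clamscan');
// const clamscanReady = new NodeClam().init({
//   clamdscan: {
//     socket: '/var/run/clamav/clamd.ctl',
//     config_file: '/etc/clamav/clamd.conf',
//     path: '/usr/bin/clamdscan',
//   },
//   preference: 'clamdscan',
// });
// registerScanRoute(app, parseMultipart, clamscanReady);
```

Initializing the scanner once and reusing the resulting promise avoids re-running the module's setup on every request.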
There are a lot of things going on here, so let's go through them one at a time.
Firstly, new NodeClam().init is used to initialize the module with the supplied configuration. The values of the socket, config_file, and path properties come from the server (or your local machine) where clamdscan is installed.
Again, clamav and clamdscan configuration is beyond the scope of this guide. It is covered in my separate guide on how to install clamav and/or clamdscan on Ubuntu.
Moving on, the next point of interest is the local scanFile function, which holds the main antivirus check code using the clamscan module. The usage of clamscan.scan_file is pretty standard, and you can read about it in the module's docs.
The .scan_file function takes one parameter: the full path of the file to be scanned (this is the path where formidable uploads the file). The full path can be obtained and passed to the local scanFile function using req.files.someDocument.path (or files.someDocument.path if you have not used the middleware), where someDocument is the value of the HTML name attribute.
Finally, the .scan_file function returns a Promise that resolves to an object containing an is_infected flag and a viruses array listing any viruses found while scanning. In our example, if the file is found to be infected, we log and return an Error object.
That’s about it! Hope you guys found that useful. 😀
Conclusion
I understand that setting up and understanding the clamav engine might involve a bit of a learning curve, but trust me, it's well worth it! It's open source, highly performant (even faster with clamav-daemon), and versatile, and it makes sure the files uploaded to your application are screened before they enter your server.