13 Oct, 2015

Secure File Uploads

by Jonn Callahan

Implementing secure file uploads is something a lot of developers struggle with. Not because they’re bad developers, but because of how difficult it can be to do correctly. This post is going to cover a few different methods for handling this common functionality and the possible pitfalls that come with each. Sample code snippets leveraging Python+Flask for each implementation are also provided. Additionally, there is a general checklist at the end which should help developers bring their apps up to a decent security level.

Database Storage

Although it’s an easy way to prevent possibly malicious files from touching your file system, saving uploads within a database comes with its own issues. An obvious one to worry about is the possibility of SQL injection. That really shouldn’t be much of a concern, though; everybody parameterizes their queries these days (well, a guy can dream).
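Here is a minimal sketch of what parameterization looks like. It uses Python's built-in sqlite3 module rather than the SQLAlchemy setup in the sample below, where the ORM binds parameters for you. The uploads table and the save_upload() helper are hypothetical and only illustrate the idea:

```python
import sqlite3


def save_upload(conn, user_id, filename, data):
    # The ? placeholders make the driver bind each value separately, so user
    # input is never spliced into the SQL text itself.
    conn.execute(
        "INSERT INTO uploads (user_id, filename, data) VALUES (?, ?, ?)",
        (user_id, filename, data),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uploads (user_id INTEGER, filename TEXT, data BLOB)")
# A hostile filename ends up stored as an ordinary string.
save_upload(conn, 1, "x'); DROP TABLE uploads; --", b"file contents")
```

The filename is bound as a parameter, so the DROP TABLE text is stored as data and the table survives.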

A less obvious potential issue has to do with performance bottlenecks. While web servers are specifically designed to handle large quantities of traffic, in most architectures database servers are not (at least, not to the same degree). If an attacker is able to simultaneously upload several large files, he could tie up every available connection in the database connection pool.

A second denial-of-service vector is disk space exhaustion. If old file uploads are not removed when a user uploads a new version of a file, the database may eventually run out of space.
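For the one-upload-per-user case, the simplest guard is to make the user ID the primary key, so a new upload can only replace the old row. Below is a minimal sketch with Python's sqlite3 module. The table and the store_upload() helper are hypothetical. INSERT OR REPLACE is SQLite syntax, and other databases have their own upsert forms:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per user: the PRIMARY KEY on user_id caps storage at one upload per user.
conn.execute("CREATE TABLE uploads (user_id INTEGER PRIMARY KEY, data BLOB NOT NULL)")


def store_upload(conn, user_id, data):
    # INSERT OR REPLACE removes any existing row with the same user_id before
    # inserting, so a new upload overwrites the old one instead of piling up.
    conn.execute(
        "INSERT OR REPLACE INTO uploads (user_id, data) VALUES (?, ?)",
        (user_id, data),
    )
    conn.commit()


store_upload(conn, 7, b"first version")
store_upload(conn, 7, b"second version")
```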

The final possible issue with this strategy is Insecure Direct Object Reference (IDOR). If each user’s uploaded files are supposed to be private, ensure that whatever logic is used to fetch these files can’t be manipulated to return uploads a user shouldn’t be able to access.
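Here is a minimal sketch of an IDOR-resistant lookup. It filters on the owner as well as the upload ID, and it takes the owner from the server-side session rather than from the request. The schema and fetch_upload() are hypothetical, and sqlite3 stands in for whatever database you use:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uploads (id INTEGER PRIMARY KEY, owner_id INTEGER NOT NULL, data BLOB)")
conn.execute("INSERT INTO uploads (id, owner_id, data) VALUES (1, 100, ?)", (b"alice's file",))


def fetch_upload(conn, upload_id, current_user_id):
    # Filter on the owner as well as the id: a guessed or tampered upload_id that
    # belongs to someone else matches no row instead of leaking their file.
    # current_user_id must come from the server-side session, not the request.
    row = conn.execute(
        "SELECT data FROM uploads WHERE id = ? AND owner_id = ?",
        (upload_id, current_user_id),
    ).fetchone()
    return row[0] if row else None
```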

import base64
from flask import Flask, request, jsonify
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///uploads.db'  # example URI
db = SQLAlchemy(app)

# validateExtension() and validateFileSize() are defined in the file system example below;
# getUserID() stands in for however your app identifies the logged-in user.

# SQLAlchemy object for user uploads
class Uploads(db.Model):
    __tablename__ = 'uploads'
    # one upload per user so a userID is sufficient for a primary key. if there are multiple places to upload
    # a file, you can leverage a composite primary key using the user's ID and the name of the upload type
    # (assumes a User model whose table is named 'user')
    userID = db.Column(db.Integer, db.ForeignKey('user.id'), primary_key=True)
    # base64-encoded file contents; Text has no fixed length limit, unlike String(40)
    data = db.Column(db.Text, nullable=False)

    def __init__(self, userID, data):
        self.userID = userID
        self.data = data

@app.route('/postFileToDatabase', methods=['POST'])
def uploadToDatabase():
    file = request.files.get('file')
    success = False

    # verify the file exists, has an allowed extension, and is under our max file size
    if file and validateExtension(file.filename) and validateFileSize(file):
        # validateFileSize() reads the stream, so rewind before reading the contents
        file.seek(0)
        encoded = base64.b64encode(file.read()).decode('ascii')
        # search the database via SQLAlchemy for an upload associated with our user
        upload = Uploads.query.filter_by(userID=getUserID()).first()
        # if the user already has an upload, overwrite it in the database
        if upload is not None:
            upload.data = encoded
        # otherwise, create a new upload database object
        else:
            upload = Uploads(userID=getUserID(), data=encoded)
            db.session.add(upload)
        # save our user's upload to the database
        db.session.commit()
        success = True

    return jsonify({"Success": success})

File System Storage

Storing user uploads on the file system can be a bit risky but shouldn’t pose too much of a problem if done correctly. While all files should be validated to minimize the chance of malicious code being saved (see the checklist below), this is most important when storing uploads on the file system.

If you are truly paranoid, you could even base64-encode the entire file before writing it. Even if an attacker did find a way to execute files, he would have to decode the data first. And if an attacker is able to do that, you have bigger issues to attend to than designing file upload functionality.
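Here is a minimal sketch of that idea using Python's standard base64 module. write_encoded() and read_decoded() are hypothetical helpers, and the temporary directory stands in for your real upload folder:

```python
import base64
import tempfile
from pathlib import Path


def write_encoded(path, data):
    # Store only the base64 text, so the raw payload never sits on disk as-is.
    Path(path).write_bytes(base64.b64encode(data))


def read_decoded(path):
    # Decode when the application reads the file back; validate=True rejects
    # input containing characters outside the base64 alphabet.
    return base64.b64decode(Path(path).read_bytes(), validate=True)


# Usage: round-trip a script-like payload through a temporary directory.
upload_dir = Path(tempfile.mkdtemp())
payload = b"#!/bin/sh\necho pwned\n"
write_encoded(upload_dir / "42_upload.txt", payload)
```

Base64 adds roughly a third to the size of every stored file, so factor this into any disk quotas.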

The same denial-of-service vectors exist in this context as well. Depending on how the implementation handles multiple sequential and concurrent uploads, an attacker may be able to exhaust disk space after saving files, or memory while processing them.
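To keep memory use flat, an upload can be copied to its destination in fixed-size chunks and aborted once it passes the size limit. This also merges the size check and the save into a single pass. This is a minimal sketch. copy_with_limit(), UploadTooLarge and the 64 KB chunk size are assumptions, not part of Flask or Werkzeug:

```python
import io

CHUNK_SIZE = 64 * 1024  # read 64 KB at a time so memory use stays flat


class UploadTooLarge(Exception):
    pass


def copy_with_limit(src, dst, max_bytes):
    """Copy src to dst in fixed-size chunks, aborting once max_bytes is exceeded.

    Only one chunk is held in memory at a time, and an oversized upload is
    rejected after reading at most max_bytes + CHUNK_SIZE bytes.
    """
    total = 0
    while True:
        chunk = src.read(CHUNK_SIZE)
        if not chunk:
            return total
        total += len(chunk)
        if total > max_bytes:
            raise UploadTooLarge(f"upload exceeds {max_bytes} bytes")
        dst.write(chunk)


# Usage: a small in-memory "upload" copied under a 1000-byte limit.
out = io.BytesIO()
copied = copy_with_limit(io.BytesIO(b"a" * 100), out, max_bytes=1000)
```

If the limit trips midway, the caller should delete the partially written destination file.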

The truly dangerous issues with this method stem from how files are saved. Ensure that users are not able to influence where files are saved on disk via path traversal sequences such as ../. If the functionality doesn’t check whether a file already exists before saving, a user may be able to overwrite critical files, weakening the overall security posture of the server, or simply overwrite files owned by others.
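Here is a minimal sketch of both checks using only the standard library. It resolves the final path and confirms it is still inside the upload directory, then opens the file in exclusive-create mode so an existing file is never overwritten. safe_path() and save_new_file() are hypothetical helpers:

```python
import os
import tempfile


def safe_path(base_dir, filename):
    """Resolve filename inside base_dir, refusing anything that escapes it."""
    base = os.path.realpath(base_dir)
    # realpath collapses "../" sequences and follows symlinks before the check;
    # an absolute filename such as "/etc/passwd" also lands outside base.
    candidate = os.path.realpath(os.path.join(base, filename))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"refusing to write outside {base}: {filename!r}")
    return candidate


def save_new_file(base_dir, filename, data):
    path = safe_path(base_dir, filename)
    # Mode "xb" fails with FileExistsError if the file is already there,
    # so an upload can never silently overwrite an existing file.
    with open(path, "xb") as f:
        f.write(data)
    return path


# Usage: save one file into a temporary upload directory.
upload_dir = tempfile.mkdtemp()
save_new_file(upload_dir, "report.txt", b"hello")
```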

Even if the destination directory cannot be influenced, an attacker may be able to overwrite files saved by other users. The easiest way to solve this issue is to generate filenames instead of using ones provided by users. Additionally, the directory in which files are saved should be non-executable and not web accessible. Instead of serving files directly, access functionality should fetch files according to a provided object reference. As with database storage, though, ensure that this functionality is not vulnerable to IDOR.
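The following hedged sketch puts those pieces together. The server generates random on-disk names, hands clients an opaque reference, and checks ownership on every read. The dictionary stands in for a database table, and save_generated() and load_by_reference() are hypothetical names:

```python
import os
import secrets
import tempfile

# Hypothetical stand-in for a database table: opaque reference -> (owner_id, path on disk).
upload_index = {}


def save_generated(upload_dir, owner_id, data):
    # The server chooses the on-disk name; nothing from the client's filename is used,
    # so one user's upload cannot collide with or overwrite another's.
    path = os.path.join(upload_dir, secrets.token_hex(16))
    with open(path, "xb") as f:  # "x" refuses to overwrite, as a final safety net
        f.write(data)
    ref = secrets.token_urlsafe(16)
    upload_index[ref] = (owner_id, path)
    return ref


def load_by_reference(ref, current_user_id):
    # Clients only ever see the reference, never a path; the owner check blocks IDOR.
    entry = upload_index.get(ref)
    if entry is None or entry[0] != current_user_id:
        return None
    with open(entry[1], "rb") as f:
        return f.read()


# Usage: user 1 stores a file and gets back a reference to it.
upload_dir = tempfile.mkdtemp()
ref = save_generated(upload_dir, owner_id=1, data=b"avatar bytes")
```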

import os
from flask import Flask, request, jsonify

app = Flask(__name__)

app.config['ALLOWED_EXTENSIONS'] = {'txt'}
# keep uploads outside the web root so the web server never serves or executes them directly
app.config['UPLOAD_FOLDER'] = '/srv/myapp/upload-dir/'
app.config['MAX_FILE_SIZE'] = 1000000  # 1MB limit

# getUserID() stands in for however your app identifies the logged-in user.

# Validate file size by reading the file in chunks until EOF or until MAX_FILE_SIZE is exceeded,
# whichever comes first. This prevents DoS attacks that would otherwise force the server to read
# the entirety of a very large file just to learn its total size.
# This means the file is read twice, however: once for the size check and once to save
# the file data. Combine both steps to improve efficiency.
def validateFileSize(file):
    chunk = 4096  # chunk size to read per loop iteration; 4KB
    size = 0

    # keep reading until out of data
    while True:
        data = file.read(chunk)
        if not data:
            break
        size += len(data)
        # return False as soon as the data read so far exceeds MAX_FILE_SIZE
        if size > app.config['MAX_FILE_SIZE']:
            return False
    # rewind so the caller can read or save the file from the beginning
    file.seek(0)
    return True

def validateExtension(filename):
    # only the text after the last dot counts, compared case-insensitively
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in app.config['ALLOWED_EXTENSIONS']

@app.route('/postFileToFileSystem', methods=['POST'])
def uploadToFileSystem():
    # fetch our file from the request
    file = request.files.get('file')

    success = False
    # verify a file was uploaded, its extension is in our whitelist, and it is under our max file size
    if file and validateExtension(file.filename) and validateFileSize(file):
        # generate a new filename based on the user's ID instead of trusting the client's filename
        newfilename = '%s_upload.%s' % (getUserID(), file.filename.rsplit('.', 1)[1].lower())
        # save data to the file system
        file.save(os.path.join(app.config['UPLOAD_FOLDER'], newfilename))
        success = True

    return jsonify({"Success": success})

Amazon S3 Storage

Far and away the safest method for storing file uploads, S3 storage almost completely removes the security burden from the developer and places it on a third party. However, it does come with a monetary cost that a small startup may or may not consider cheap. Amazon has a pricing page that lists costs depending on storage and bandwidth requirements. The largest issue arising from this pay-per-use model is that an attacker could use the app to make a large number of requests to Amazon, driving up the monthly bill.

Ensure rate limiting is built into the app to prevent an attacker from sequentially uploading or downloading large amounts of data stored on S3. I know you must be tired of hearing this, but I still come across this issue on a regular basis: ensure whatever functionality is created to store or return user uploads is not vulnerable to IDOR. While directory traversal is still possible using S3 storage, it’s not nearly as severe. An attacker could only retrieve files from buckets that the app’s AWS credentials can read.
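Here is a minimal sketch of a per-user rate limiter that uses a sliding window of timestamps. This is one common technique; the post does not prescribe a specific algorithm. SlidingWindowLimiter and its limit/window parameters are hypothetical, and the injectable clock exists only to make the example testable:

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` S3 operations per user within `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behaviour can be tested deterministically
        self.events = defaultdict(deque)  # user_id -> timestamps of recent operations

    def allow(self, user_id):
        now = self.clock()
        q = self.events[user_id]
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # caller should answer HTTP 429 and skip the S3 call
        q.append(now)
        return True


# Usage with a fake clock: three operations per minute for each user.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=3, window=60, clock=lambda: fake_now[0])
results = [limiter.allow("user-1") for _ in range(4)]
```

State lives in process memory here, so an app running several workers would need a shared store to enforce one limit across all of them.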

import os
import boto  # boto (version 2) API, as in the original; boto3 is its maintained successor
from flask import Flask, request, jsonify

app = Flask(__name__)

# don't store secrets in source!!
app.config['AWSAccessKey'] = os.environ['AWSAccessKey']
app.config['AWSSecret'] = os.environ['AWSSecret']
bucketName = 'flask-s3saver-test'
uploadDir = 'upload-dir'

# validateExtension() and validateFileSize() are the helpers from the file system example;
# getUserID() stands in for however your app identifies the logged-in user.

@app.route('/postFileToS3', methods=['POST'])
def uploadToS3():
    file = request.files.get('file')
    success = False

    if file and validateExtension(file.filename) and validateFileSize(file):
        # generate our new file name
        newFileName = '%s_upload.%s' % (getUserID(), file.filename.rsplit('.', 1)[1].lower())
        # open a connection to S3
        conn = boto.connect_s3(app.config['AWSAccessKey'], app.config['AWSSecret'])
        # retrieve our bucket
        bucket = conn.get_bucket(bucketName)

        # create our file on S3
        key = bucket.new_key('/'.join([uploadDir, newFileName]))
        # save the file contents, starting from the beginning of the stream
        file.seek(0)
        key.set_contents_from_file(file.stream)
        # set an appropriate ACL: uploads should not be publicly readable
        key.set_acl('private')

        success = True

    return jsonify({"Success": success})

if __name__ == "__main__":
    app.run()

General Checklist

No matter which method you leverage for securely handling file uploads from a user, the checklist below is an easy way to make sure you’re “mostly” secure. Obscure and unusual use cases may still have their own set of problems, but this is a good starting point. Many of the details behind these recommendations are also contextual and depend on individual use cases and business requirements. For example, while limiting file sizes to 100KB may be good for one use case, another use case may require a 50MB limit.

  • Limit the size of uploaded files to something reasonable for your use case. 99% of the time, 10MB will be sufficient, and even that may be overkill in some circumstances (such as uploading an avatar).
  • Limit the number of uploaded files. If a user is allowed to upload a series of files, ensure that there is a maximum number set. Without this check, it would be very easy for an attacker to exhaust hard drive space.
  • Delete “old” uploads. When a user uploads a new file, ensure older versions are either overwritten or explicitly deleted.
  • Validate file extensions. Review the business requirements for file upload functionality and create a whitelist of all allowed file types. Double-check the logic for fetching the extension to ensure “malicious.pdf.exe”-like workarounds are not possible.
  • Validate MIME type via magic bytes. Not many applications seem to implement this logic, but it can be incredibly useful in limiting the number of viable payloads available to an attacker. The “magic bytes” at the beginning of a file give an indication of the type of file. For example, the bytes “\x7F\x45\x4C\x46” indicate the file is an ELF binary. Most common languages and frameworks have a getMIMEType()-esque function, or a library (often built on libmagic), that uses magic bytes to determine a file’s MIME type.
  • Run an anti-virus scanner. The last step for the truly paranoid is to ensure all provided files are actively scanned by an anti-virus tool.
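The extension and magic-byte items can be combined into one check. This is a hedged sketch with a small hand-written signature table; the ALLOWED_TYPES mapping and validate_upload() are hypothetical, and a library built on libmagic covers far more formats. Matching magic bytes narrows what an attacker can upload but does not prove a file is harmless, so the other layers still matter:

```python
# Hypothetical whitelist: extension -> magic-byte signatures accepted for that type.
ALLOWED_TYPES = {
    "png": [b"\x89PNG\r\n\x1a\n"],
    "jpg": [b"\xff\xd8\xff"],
    "jpeg": [b"\xff\xd8\xff"],
    "pdf": [b"%PDF-"],
    "gif": [b"GIF87a", b"GIF89a"],
}


def get_extension(filename):
    # Only the text after the LAST dot counts, lower-cased, so "report.pdf.exe"
    # yields "exe" (rejected) rather than "pdf".
    if "." not in filename:
        return None
    return filename.rsplit(".", 1)[1].lower()


def validate_upload(filename, head):
    """Accept only whitelisted extensions whose leading bytes match the claimed type.

    `head` is the first few bytes of the upload (16 is enough for these signatures).
    """
    signatures = ALLOWED_TYPES.get(get_extension(filename))
    if signatures is None:
        return False
    return any(head.startswith(sig) for sig in signatures)
```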

Remember that security is like an onion: it’s all about the layers. No single item in this post is sufficient in securing file upload functionality, but together they serve as a formidable hurdle for even the most experienced attackers.