

Saving Webcompat images as a microservice
source link: https://www.otsukare.info/2019/11/20/saving-images-microservices

Update: You may want to fast forward to the latest part… of this blog post. (Head explodes).
Thinking out loud on separating our images into a separate service. The initial goal was to push the images to the cloud, but I think we could probably have a first step. We could keep the images on our server, but instead of the current save, we could send them to another service, let's say upload.webcompat.com, with an HTTP PUT. And this service would save them locally.
That would allow us to do two things:
- Virtualize the core app on Heroku if needed
- Replace the microservice with another cloud hosting solution when we are ready.
All of this is mainly thinking for now.
Anatomy of our environment
config/environment.py defines:
UPLOADS_DEFAULT_DEST = os.environ.get('PROD_UPLOADS_DEFAULT_DEST')
UPLOADS_DEFAULT_URL = os.environ.get('PROD_UPLOADS_DEFAULT_URL')
The maximum size for images is defined in __init__.py. Currently, views.py also has a route for serving uploads on localhost.
# set limit of 5.5MB for file uploads
# in practice, this is ~4MB (5.5 / 1.37)
# after the data URI is saved to disk
app.config['MAX_CONTENT_LENGTH'] = 5.5 * 1024 * 1024
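The ~4MB figure in that comment is simple arithmetic on the encoding overhead. Here is a rough check, assuming the image arrives base64-encoded in a data URI. The helper name is hypothetical, the 1.37 factor is the one quoted in the comment, and plain base64 on its own grows data by 4/3.

```python
import base64

MAX_CONTENT_LENGTH = 5.5 * 1024 * 1024  # same value as the app config above


def max_decoded_size(limit, overhead=1.37):
    """Approximate the largest image that fits under `limit` once encoded.

    `overhead` is the encoded/decoded size ratio. 1.37 is the factor
    quoted in the comment above; plain base64 would be 4/3.
    """
    return limit / overhead


# Roughly 4 MiB of image data fits in a 5.5 MiB request.
approx_mib = max_decoded_size(MAX_CONTENT_LENGTH) / (1024 * 1024)

# Plain base64 turns every 3 bytes into 4 characters.
raw = b"\x00" * 3000
encoded = base64.b64encode(raw)
ratio = len(encoded) / len(raw)
```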
The localhost part would probably not change much. It is only used for reading the image URLs.
if app.config['LOCALHOST']:
    @app.route('/uploads/<path:filename>')
    def download_file(filename):
        """Route just for local environments to send uploaded images.

        In production, nginx handles this without needing to touch the
        Python app.
        """
        return send_from_directory(
            app.config['UPLOADS_DEFAULT_DEST'], filename)
Then the API for uploads is defined in api/uploads.py. This is where the production route is defined.
@uploads.route('/', methods=['POST'])
def upload():
    '''Endpoint to upload an image.

    If the image asset passes validation, it's saved as:
    UPLOADS_DEFAULT_DEST + /year/month/random-uuid.ext

    Returns a JSON string that contains the filename and url.
    '''
    …
    # cut some stuff.
    try:
        upload = Upload(imagedata)
        upload.save()
        data = {
            'filename': upload.get_filename(upload.image_path),
            'url': upload.get_url(upload.image_path),
            'thumb_url': upload.get_url(upload.thumb_path)
        }
        return (json.dumps(data), 201, {'content-type': JSON_MIME})
    except (TypeError, IOError):
        abort(415)
    except RequestEntityTooLarge:
        abort(413)
upload.save() is basically the call we should replace with an HTTP PUT to a microservice.
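To make the idea concrete, here is a minimal sketch of what the client side of that PUT could look like, using only the Python standard library. Everything in it is an assumption for illustration: the upload.webcompat.com endpoint does not exist yet, the function names are hypothetical, and a real privileged service would also need some form of authentication, which is left out here.

```python
import urllib.request

# Hypothetical endpoint, taken from the upload.webcompat.com idea above.
UPLOAD_SERVICE = "https://upload.webcompat.com"


def build_put_request(image_bytes, image_path, content_type="image/png"):
    """Build (but do not send) a PUT request carrying the image bytes.

    image_path is the year/month/random-uuid.ext path chosen by the core app.
    """
    url = "{}/{}".format(UPLOAD_SERVICE, image_path.lstrip("/"))
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="PUT",
        headers={"Content-Type": content_type},
    )


def send_image(image_bytes, image_path):
    """Send the image and return the HTTP status code of the service."""
    request = build_put_request(image_bytes, image_path)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping the request construction separate from the network call makes it possible to test the path and header logic without a running service.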
What is Amazon S3 doing?
In these musings, I wonder if we could mimic the way Amazon S3 operates at a very high level. No need to replicate everything. We just need to save some bytes into a folder structure.
Boto 3 has documentation for uploading files.
import logging

import boto3
from botocore.exceptions import ClientError


def upload_file(file_name, bucket, object_name=None):
    """Upload a file to an S3 bucket

    :param file_name: File to upload
    :param bucket: Bucket to upload to
    :param object_name: S3 object name. If not specified then file_name is used
    :return: True if file was uploaded, else False
    """
    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name

    # Upload the file
    s3_client = boto3.client('s3')
    try:
        response = s3_client.upload_file(file_name, bucket, object_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True
We could keep the image validation on the webcompat.com side, and once the naming and checking are done, we could save the image to a service the same way AWS does.
So our privileged service could accept images and save them locally in the same folder structure, as a separate Flask app. And later on, we could adjust it to use S3.
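On the receiving side, the storage part of such a service could be as small as the sketch below. It only covers writing bytes into the year/month/random-uuid.ext layout mentioned in the upload() docstring. The function name, the zero-padded month, and the `today` parameter (there to make the function testable) are assumptions, and the web framework wiring is left out.

```python
import datetime
import os
import uuid


def save_image(image_bytes, extension, root, today=None):
    """Save image bytes as root/year/month/random-uuid.ext.

    Returns the full path of the file that was written.
    """
    today = today or datetime.date.today()
    # Assumption: months are zero-padded (e.g. 2019/01).
    folder = os.path.join(root, str(today.year), "{:02d}".format(today.month))
    os.makedirs(folder, exist_ok=True)
    filename = "{}.{}".format(uuid.uuid4(), extension.lstrip("."))
    path = os.path.join(folder, filename)
    with open(path, "wb") as image_file:
        image_file.write(image_bytes)
    return path
```

Swapping this function for an S3 call later would leave the rest of the service untouched, which is the point of isolating it.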
Surprise. Surprise.
I just found out that each time you put an image in an issue or a comment, GitHub makes a private copy of this image. Not sure if it's borderline with regard to property.
If you enter:
![I'm root](http://www.la-grange.net/2019/01/01/2535-misere)
Then it creates this markup.
<p><a target="_blank"
rel="noopener noreferrer"
href="https://camo.githubusercontent.com/a285646de4a7c3b3cdd3e82d599e46607df8d3cc/687474703a2f2f7777772e6c612d6772616e67652e6e65742f323031392f30312f30312f323533352d6d6973657265"><img
src="https://camo.githubusercontent.com/a285646de4a7c3b3cdd3e82d599e46607df8d3cc/687474703a2f2f7777772e6c612d6772616e67652e6e65742f323031392f30312f30312f323533352d6d6973657265"
alt="I'm root"
data-canonical-src="http://www.la-grange.net/2019/01/01/2535-misere"
style="max-width:100%;"></a></p>
And we can notice that the img src is pointing to… GitHub?
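The last path segment of that camo URL is not random: it is the original image URL encoded as hexadecimal, which is easy to check. The sketch below decodes it. The first segment is 40 hex characters long, which looks like a digest used by the proxy to validate the URL, but that interpretation is an assumption. The function name is hypothetical.

```python
from urllib.parse import urlsplit

# The camo URL from the markup above, split only for readability.
CAMO_URL = (
    "https://camo.githubusercontent.com/"
    "a285646de4a7c3b3cdd3e82d599e46607df8d3cc/"
    "687474703a2f2f7777772e6c612d6772616e67652e6e65742f"
    "323031392f30312f30312f323533352d6d6973657265"
)


def decode_camo_url(camo_url):
    """Return (digest, original_url) from a camo-style URL."""
    digest, encoded = urlsplit(camo_url).path.strip("/").split("/")
    return digest, bytes.fromhex(encoded).decode("utf-8")


digest, original = decode_camo_url(CAMO_URL)
print(original)  # → http://www.la-grange.net/2019/01/01/2535-misere
```

The decoded value matches the data-canonical-src attribute in the markup.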
I checked in my server logs to be sure. And I found…
140.82.115.251 - - [20/Nov/2019:06:44:54 +0000] "GET /2019/01/01/2535-misere HTTP/1.1" 200 62673 "-" "github-camo (876de43e)"
That will seriously challenge the OKR for this quarter.
Update: 2019-11-21 So I tried to decipher what was really happening. It seems GitHub acts as a proxy using camo, but it also has a caching system that keeps a real copy of the images instead of just proxying them. And this can become a problem in the context of webcompat.com.
Early on, we had added s3.amazonaws.com to our connect-src since we had uses that were making requests to https://s3.amazonaws.com/github-cloud. However, this effectively opened up our connect-src to any Amazon S3 bucket. We refactored our URL generation and switched all call sites and our connect-src to use https://github-cloud.s3.amazonaws.com to reference our bucket.
GitHub is hosting the images on Amazon S3.
Otsukare!