We were bitten late last week by our email processing microservice getting completely locked up, memory ballooning, and eventually dying (I'm guessing an OOM).
Once we'd tracked the issue down to the email at fault, I worked out that the culprit was a text/plain; format=flowed reply from Mozilla Thunderbird to an HTML email containing ~10 MB of pictures.
What appears to have happened is that Thunderbird converted the HTML to plain text, which is then quoted (>) in the reply, and included the images as massive blobs of base64-encoded text at the end of the reply body.
FlowedDecoder is storing all the chunks, splitting on newline, doing its reflow processing, then joining again - and in our case there were >130k lines of base64 in the body.
We've got a temporary workaround tested where we just turn FlowedDecoder into a passthrough:
// Temporary workaround: skip flowed decoding entirely and pass every chunk straight through.
const FlowedDecoder = require("mailsplit/lib/flowed-decoder.js");
FlowedDecoder.prototype._transform = function _transform(chunk, enc, cb) { return cb(null, chunk); };
FlowedDecoder.prototype._flush = function _flush(cb) { return cb(); };
I'm hoping to start working on a proper fix this week, where FlowedDecoder decodes incrementally as a stream instead of buffering the whole body. I'm intending to keep most of the logic in FlowedDecoder itself because libmime seems to only work with strings. Would this be acceptable, or would you rather libmime be made stream-aware instead?
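To make the idea concrete, here is a minimal sketch of what line-by-line flowed decoding could look like. It is not mailsplit's or libmime's actual code: the class name StreamingFlowedDecoder and the delSp option are hypothetical, it assumes the usual format=flowed rules (a line ending in a space is a soft break, "-- " is a signature separator, one leading space is stuffing), and it does not do anything special for quote depth. Bytes are round-tripped through latin1 so that multi-byte characters split across chunks are not corrupted.

```javascript
const { Transform } = require("stream");

// Hypothetical streaming decoder: only the trailing partial line is kept in
// memory between chunks, instead of the whole message body.
class StreamingFlowedDecoder extends Transform {
    constructor(options = {}) {
        super();
        this.delSp = Boolean(options.delSp); // assumed to mirror the DelSp=yes parameter
        this.remainder = "";   // incomplete last line of the previous chunk
        this.joinNext = false; // true if the previous line ended with a soft break
        this.first = true;     // no separator is needed before the very first line
    }

    // Decode one complete line (without its "\n") and return the text to emit.
    decodeLine(line) {
        if (line.endsWith("\r")) line = line.slice(0, -1);  // tolerate CRLF input
        if (line.startsWith(" ")) line = line.slice(1);     // remove space stuffing
        const separator = this.first || this.joinNext ? "" : "\n";
        this.first = false;
        // A trailing space marks a soft break, except on the signature separator.
        this.joinNext = line.endsWith(" ") && line !== "-- ";
        if (this.joinNext && this.delSp) line = line.slice(0, -1);
        return separator + line;
    }

    _transform(chunk, encoding, callback) {
        const lines = (this.remainder + chunk.toString("latin1")).split("\n");
        this.remainder = lines.pop(); // may be "" if the chunk ended on a newline
        const out = lines.map((line) => this.decodeLine(line)).join("");
        callback(null, Buffer.from(out, "latin1"));
    }

    _flush(callback) {
        // Whatever is left is the final line of the body.
        callback(null, Buffer.from(this.decodeLine(this.remainder), "latin1"));
    }
}
```

With something like this, the 130k lines of base64 would be emitted as they arrive rather than accumulated, so memory use stays proportional to the longest single line. Output lines are joined with "\n" here; whether to preserve the original line endings is an open design choice.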
Oh and thank you for a great set of email processing libs!