Error Decoding msgpack data: invalid byte sequence in UTF-8

My new app uses msgpack to encode data before sending it to the server. The server
side is a Sinatra app that decodes the data and stores it in a database.

The app worked fine until I pushed real data. With real data, the app crashed with the error
"invalid byte sequence in UTF-8".

After a lengthy investigation, I found that the data I sent to the server was being decoded
incorrectly. The offending code looked like this:

unpacked = MessagePack.unpack(data)

What could possibly have gone wrong?

It turns out, as discussed here, that
msgpack is a binary serialization format, and it expects to unpack from a raw binary string. You
need to force the data string (from the HTTP POST request) to binary encoding:

MessagePack.unpack(data.force_encoding(Encoding::BINARY))

Now msgpack unpacks the data properly.
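
To see why the encoding tag matters, here is a minimal sketch that uses only core Ruby, so it runs without the msgpack gem. The payload is a hand-written msgpack byte sequence for a one-entry map, made up for illustration, and binary_copy is a hypothetical helper name, not part of msgpack or Sinatra. It shows that force_encoding changes only the encoding label on the string, never the bytes themselves.

```ruby
# Hypothetical helper (not part of msgpack or Sinatra): return a copy of
# the string tagged as binary. force_encoding changes the string in place,
# so we dup first to leave the caller's string untouched.
def binary_copy(data)
  data.dup.force_encoding(Encoding::BINARY)
end

# Hand-written msgpack bytes for illustration: a map with one entry whose
# key is the string "a" and whose value is two raw bytes (0xff, 0xfe).
payload = [0x81, 0xa1, 0x61, 0xc4, 0x02, 0xff, 0xfe].pack("C*")

# Simulate a request body that arrived labelled as UTF-8.
mis_tagged = payload.dup.force_encoding(Encoding::UTF_8)
puts mis_tagged.valid_encoding?            # false: not valid UTF-8 text

fixed = binary_copy(mis_tagged)
puts fixed.encoding == Encoding::BINARY    # true
puts fixed.valid_encoding?                 # true: any byte is valid in binary
puts fixed.bytes == mis_tagged.bytes       # true: the bytes are unchanged
```

Copying before re-tagging is a design choice in this sketch: it also works when the incoming string is frozen, at the cost of one extra allocation. The one-liner above, which calls force_encoding directly, re-tags the original string in place.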

P.S. If you use JRuby and msgpack-jruby, beware of another issue: msgpack-jruby
behaves differently from the MRI version. It does not use the default_external
encoding, so you need to specify the encoding explicitly when unpacking (as
discussed here).