Artificial truth

The more you see, the less you believe.


Serving utf8 text files with nginx
Thu 22 September 2016

I recently changed the theme of my blog (and likely hammered your RSS/Atom reader in the process). The new theme is more lightweight, readable and responsive, has a banner with a glitch effect in pure CSS, and makes the articles available in Markdown, so you can download them to read outside of your web browser or to archive them, instead of having to print web pages.

The latest article contained snippets of radare2's output, with fancy utf8-powered arrows. Unfortunately, despite the presence of the charset utf8; directive in my nginx configuration, the rendering was quite ugly:

[0x080485ba]> pd 20 @ sub.memcpy_8cb
â•’ (fcn) sub.memcpy_8cb 491
│           0x080488cb      83ec5c         sub esp, 0x5c
│           0x080488ce      c744244c0000.  mov dword [esp + 0x4c], 0
│           0x080488d6      c74424480000.  mov dword [esp + 0x48], 0
│           0x080488de      8b442464       mov eax, dword [esp + 0x64]
│           0x080488e2      83e001         and eax, 1
│           0x080488e5      85c0           test eax, eax
│       ┌─< 0x080488e7      751b           jne 0x8048904
│       │   0x080488e9      837c246477     cmp dword [esp + 0x64], 0x77
│      ┌──< 0x080488ee      7614           jbe 0x8048904
│      ││   0x080488f0      8b442464       mov eax, dword [esp + 0x64]
│      ││   0x080488f4      83e00f         and eax, 0xf
│      ││   0x080488f7      85c0           test eax, eax
│     ┌───< 0x080488f9      7509           jne 0x8048904
│     │││   0x080488fb      8b442468       mov eax, dword [esp + 0x68]
│     │││   0x080488ff      833800         cmp dword [eax], 0
│    ┌────< 0x08048902      750d           jne 0x8048911
│    │└└└─> 0x08048904      c74424100000.  mov dword [esp + 0x10], 0
│    │  ┌─< 0x0804890c      e99d010000     jmp 0x8048aae
│    └────> 0x08048911      8b442460       mov eax, dword [esp + 0x60]
│       │   0x08048915      8a00           mov al, byte [eax]
[0x080485ba]> 

The trick is that nginx appends charset=utf8 to the Content-Type header only if the MIME type is one of text/html, text/xml, text/plain, text/vnd.wap.wml, application/javascript or application/rss+xml, and I'm serving the source of my articles with the type text/markdown. Without a declared charset, the browser typically falls back to a legacy single-byte encoding, hence the garbled arrows.

I just had to add text/markdown to the charset_types directive to get properly rendered utf8-powered documents:

[0x080485ba]> pd 20 @ sub.memcpy_8cb
╒ (fcn) sub.memcpy_8cb 491
│           0x080488cb      83ec5c         sub esp, 0x5c
│           0x080488ce      c744244c0000.  mov dword [esp + 0x4c], 0
│           0x080488d6      c74424480000.  mov dword [esp + 0x48], 0
│           0x080488de      8b442464       mov eax, dword [esp + 0x64]
│           0x080488e2      83e001         and eax, 1
│           0x080488e5      85c0           test eax, eax
│       ┌─< 0x080488e7      751b           jne 0x8048904
│       │   0x080488e9      837c246477     cmp dword [esp + 0x64], 0x77
│      ┌──< 0x080488ee      7614           jbe 0x8048904
│      ││   0x080488f0      8b442464       mov eax, dword [esp + 0x64]
│      ││   0x080488f4      83e00f         and eax, 0xf
│      ││   0x080488f7      85c0           test eax, eax
│     ┌───< 0x080488f9      7509           jne 0x8048904
│     │││   0x080488fb      8b442468       mov eax, dword [esp + 0x68]
│     │││   0x080488ff      833800         cmp dword [eax], 0
│    ┌────< 0x08048902      750d           jne 0x8048911
│    │└└└─> 0x08048904      c74424100000.  mov dword [esp + 0x10], 0
│    │  ┌─< 0x0804890c      e99d010000     jmp 0x8048aae
│    └────> 0x08048911      8b442460       mov eax, dword [esp + 0x60]
│       │   0x08048915      8a00           mov al, byte [eax]
[0x080485ba]> 
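
For reference, the change described above could look roughly like the fragment below. This is a sketch, not my exact configuration: it assumes the directives sit in the http block (they are also accepted in server and location blocks) and that .md files are already served as text/markdown. Setting charset_types replaces the default list instead of extending it, so the default types are repeated.

```nginx
http {
    # Charset appended to the Content-Type header of matching responses.
    charset utf8;

    # Defaults repeated, with text/markdown added at the end.
    # text/html is left out on purpose: nginx 1.5.4 and later always
    # include it, and listing it again may trigger a duplicate-type warning.
    charset_types text/xml text/plain text/vnd.wap.wml
                  application/javascript application/rss+xml
                  text/markdown;
}
```

After reloading nginx, the response headers for a Markdown file should show a Content-Type of text/markdown with the charset parameter appended.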

You can now enjoy the source of the articles with a real charset.