I’ve spent the last two days fixing Djiggit’s handling of international feeds, especially Cyrillic ones.

Here are three things I’ve learned:

  1. Read and follow the instructions in docs/unicode.txt, and talk to people on the django-users list
  2. Use the latest database adapters
  3. Beware of your database encoding

Long story.

It took me several hours to trace why Cyrillic text was being stored as ‘????’ in the database, working from the top (my code) down to the database itself.
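
A minimal Python 3 sketch of how this kind of damage can happen (an illustration, not a trace of the actual bug): when text is forced through a latin1 layer that has no Cyrillic letters, each character it cannot map is replaced by a question mark. The sample string is made up for the example.

```python
# Illustration only: Cyrillic text forced through a latin1 encoder.
text = "привет"  # sample string, not taken from a real feed

# latin1 has no Cyrillic letters, so every character is replaced with "?"
damaged = text.encode("latin-1", errors="replace")
print(damaged)  # b'??????'

# utf8 can represent the same text and round-trips without loss
restored = text.encode("utf-8").decode("utf-8")
print(restored == text)  # True
```

Once the question marks are stored, the original characters are gone, which is why the fix has to happen before the data reaches the database.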

Distilled down for MySQL:

  • make sure you’re using the latest python-MySQLdb: earlier versions have known problems with utf8
  • make sure that your db server is started with the utf8 character set configured
  • back up your database as json using

     manage.py dumpdata > .json
  • recreate your database, making sure it has the “utf8” character set and collation, then check the settings:

    mysql> show variables like 'ch%';
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | latin1                     |
    | character_set_connection | latin1                     |
    | character_set_database   | utf8                       |
    | character_set_filesystem | binary                     |
    | character_set_results    | latin1                     |
    | character_set_server     | utf8                       |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    +--------------------------+----------------------------+
    8 rows in set (0.00 sec)
  • load your data back from the json dump:

    manage.py loaddata 
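
The check in the “show variables” step can also be scripted. Below is a rough Python sketch, assuming you have already collected the variables into a dict (for example from the rows your database adapter returns for that query). The helper name non_utf8_settings and the choice to check only the database and server settings are my assumptions for illustration, not part of Django or MySQLdb.

```python
# Hypothetical helper: expects {variable_name: value}, e.g. built from the
# rows returned by "show variables like 'ch%'".
def non_utf8_settings(variables,
                      names=("character_set_database", "character_set_server")):
    """Return the checked settings whose value does not start with 'utf8'."""
    return {
        name: variables.get(name)
        for name in names
        if not str(variables.get(name, "")).startswith("utf8")
    }


# Values copied from the output shown above
variables = {
    "character_set_client": "latin1",
    "character_set_connection": "latin1",
    "character_set_database": "utf8",
    "character_set_filesystem": "binary",
    "character_set_results": "latin1",
    "character_set_server": "utf8",
    "character_set_system": "utf8",
}

problems = non_utf8_settings(variables)
print(problems)  # {} means both checked settings are utf8
```

An empty result is the state you want before loading the dump back in; anything listed there is worth fixing first.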

Happy hacking!