As you know these escape sequences are valid HTML escaping of unicode characters. The general principal is always that you store the text in the database as characters i.e. not escaped at all and you apply any escaping needed when serving this content to a client. So it appears you need to convert these escaped characters into something you can store in your 8bit database.
Now in general I would suggest using unicode in which case you can just make sure the data being sent to you is correctly converted into unicode characters and then you just store the characters in the database. This would then work with any characters and not just the few you are having problems with. However it sounds like you do not want to move from 8bit to unicode. If that is the case anything you do will be something of a hack, but you can just use $replace on the data coming in to convert say "%u2019" to "'" before you store it in the database to 'normalize' the input. This solution will only cover a few characters where you can find a suitable replacement but it may be enough to get by for the short term while you investigate moving to unicode as a permanent solution.
- Log in to post comments