character encoding 101 - CodingForum

character encoding 101


  • character encoding 101

    We have a form. It's in a Flash file. The Flash file posts the form data to an ASP script, which writes it into an email and also into a SQL Server database.

    Finnish people are filling in our form. Our form does not like Finnish names.

    Let's say a Finnish person (on Windows) clicks in a form field and types Alt-0230. This will put an "æ" (Unicode U+00E6, ANSI code 0230) character in there. By the time it's been sent by the Flash, and read by the ASP, and written to the database and to the email body, it's turned into two characters.

    The first will always be an "Ã" (Unicode U+00C3, ANSI code 0195) character. The second depends on what character code went in originally. If, as in this case, it was ANSI code 0230, I'll get a "¦" (Unicode U+00A6, ANSI code 0166). If I subtract one from the input ANSI code, and enter an "å" (Unicode U+00E5, ANSI code 0229), I'll get the usual "Ã" followed by a "¥" (Unicode U+00A5, ANSI code 0165).

    It can hardly be coincidence that when the ANSI code I enter increments or decrements, so does the ANSI code of the second character that comes out, and that the translated code is always the original one minus 64 (0230 becomes 0166, 0229 becomes 0165). But what's the mechanism here? Why am I getting two characters coming out? Why is the first one always ANSI code 0195? What's the correlation?
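
    The pattern described here is what you would see if the text were encoded as UTF-8 and the resulting bytes were then read back as single-byte ANSI (Windows-1252) text. The post does not confirm where in the Flash, ASP, email and database chain that would happen, so treat the following as a minimal Python sketch under that assumption. The function names are made up for illustration.

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Hand-rolled UTF-8 for U+0080..U+07FF: 110xxxxx 10xxxxxx."""
    assert 0x80 <= code_point <= 0x7FF
    lead = 0xC0 | (code_point >> 6)      # top 5 bits; 0xC3 (195) for U+00C0..U+00FF
    trail = 0x80 | (code_point & 0x3F)   # low 6 bits; code_point - 64 in that range
    return bytes([lead, trail])


def misread_as_ansi(text: str) -> str:
    """Encode as UTF-8, then (wrongly) decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


for ch in "æå":
    raw = utf8_two_byte(ord(ch))
    assert raw == ch.encode("utf-8")     # matches Python's own encoder
    print(ch, ord(ch), "->", list(raw), "->", misread_as_ansi(ch))
```

    For "æ" (230) the two bytes are 195 and 166, which display as "Ã¦". For "å" (229) they are 195 and 165, which display as "Ã¥". The lead byte stays at 195 for every character from U+00C0 to U+00FF, and the second byte is the original code minus 64, which matches the numbers above.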


    Last edited by Spudhead; Oct 12, 2006, 10:30 AM.

  • #2
    Well, I'm still no wiser.

    I've solved the immediate problem by writing the whole lot to an ASCII text file and attaching that to the email. Hardly elegant, but it works.

    I just wish I knew why.
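
    If the garbling really does come from UTF-8 bytes being decoded as Windows-1252 (an assumption, not something the thread confirms), text that has already been stored that way can often be repaired by reversing the two steps. A small Python sketch, with a hypothetical helper name:

```python
def repair_mojibake(garbled: str) -> str:
    """Undo 'UTF-8 read as Windows-1252': re-encode to the original bytes,
    then decode them as the UTF-8 they were meant to be.

    Raises UnicodeError if the text was not garbled in this particular way.
    """
    return garbled.encode("cp1252").decode("utf-8")


print(repair_mojibake("Ã¦"))   # the two characters from the first post
print(repair_mojibake("Ã¥"))
```

    This only helps with data that is already damaged. The longer-term fix would be to make every stage agree on one encoding, wherever the mismatch turns out to be.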
