Characters in SMS messages


Textual, non-Unicode SMS messages are always encoded using the default alphabet specified in the GSM 03.38 specification. This is a limitation imposed by the GSM specifications and does not depend on the transfer encoding used in the APIs (e.g. UTF-8, URL encoding, etc.).
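As a minimal sketch, assuming Python and a hand transcription of the GSM 03.38 default alphabet, the check below tests whether a text can be sent as a textual, non-Unicode SMS at all. The function name `is_gsm_encodable` is hypothetical and not part of the Messaging API, and the gateway's own validation may differ. Only the nine escaped characters listed on this page are treated as extension characters.

```python
# Basic GSM 03.38 default alphabet, 16 characters per row, positions 0x00-0x7F.
# Position 0x1B is the escape character (ESC), not a printable character.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)

# Printable characters of the basic alphabet (ESC itself is excluded).
GSM_BASIC_SET = set(GSM_BASIC) - {"\x1b"}

# The characters this page lists as requiring escaping (two positions each).
GSM_ESCAPED = set("|^€{}[~]\\")


def is_gsm_encodable(text: str) -> bool:
    """Return True if every character is in the basic alphabet or the escaped set."""
    return all(ch in GSM_BASIC_SET or ch in GSM_ESCAPED for ch in text)


print(is_gsm_encodable("Hello, world!"))  # True
print(is_gsm_encodable("Price: 5 €"))     # True
print(is_gsm_encodable("naïve"))          # False (ï is not in the alphabet)
```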

Certain characters require so-called escaping: each one occupies two character positions in the message instead of one (an escape character followed by the character itself). The gateway handles this automatically, but it must be taken into account when counting SMS message lengths. The escaped characters are:

Character   Description                     Unicode code point
|           VERTICAL BAR                    0x007C
^           CIRCUMFLEX ACCENT               0x005E
€           EURO SIGN                       0x20AC
{           LEFT CURLY BRACKET              0x007B
}           RIGHT CURLY BRACKET             0x007D
[           LEFT SQUARE BRACKET             0x005B
~           TILDE                           0x007E
]           RIGHT SQUARE BRACKET            0x005D
\           REVERSE SOLIDUS (BACKSLASH)     0x005C
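When counting message lengths, each escaped character counts as two positions. A minimal sketch of that count follows; it assumes the input has already been checked to contain only GSM 03.38 characters, and the function names `gsm_length` and `fits_single_sms` are hypothetical. The 160-character limit used here is the usual capacity of a single GSM 7-bit SMS, not a value taken from this gateway's documentation.

```python
# Characters that this page lists as escaped: each takes two positions.
ESCAPED = set("|^€{}[~]\\")

# Usual capacity of a single GSM 7-bit SMS, in character positions (septets).
SINGLE_SMS_LIMIT = 160


def gsm_length(text: str) -> int:
    """Count character positions, assuming text is already GSM-encodable."""
    return sum(2 if ch in ESCAPED else 1 for ch in text)


def fits_single_sms(text: str) -> bool:
    """Return True if the text fits in one SMS after escaping."""
    return gsm_length(text) <= SINGLE_SMS_LIMIT


print(gsm_length("Price: 5 €"))   # 11
print(gsm_length("{a|b}"))        # 8
print(fits_single_sms("~" * 80))  # True (80 escaped characters = 160 positions)
print(fits_single_sms("~" * 81))  # False
```

Note that a 160-character text containing even one escaped character no longer fits in a single message.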

The Euro sign needs special attention when it is passed through the APIs. The Latin-1 (ISO-8859-1) character set does not include the Euro sign, so it cannot be passed using Latin-1. Use UTF-8 or ISO-8859-15 instead.
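To illustrate the difference, the Python snippet below encodes a message containing the Euro sign with each character set and percent-encodes it as it might appear in a URL-encoded request. How the gateway is told which character set was used is not covered on this page, so only the byte-level encoding is shown; the sample text is arbitrary.

```python
from urllib.parse import quote

text = "Price: 5 €"

# UTF-8: the Euro sign becomes three bytes (E2 82 AC).
utf8_bytes = text.encode("utf-8")

# ISO-8859-15 (Latin-9): the Euro sign is the single byte A4.
latin9_bytes = text.encode("iso-8859-15")

# Latin-1 (ISO-8859-1) has no Euro sign, so encoding fails.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Percent-encoding for a URL-encoded request; quote() uses UTF-8 by default.
url_utf8 = quote(text)
url_latin9 = quote(text, encoding="iso-8859-15")

print(utf8_bytes)    # b'Price: 5 \xe2\x82\xac'
print(latin9_bytes)  # b'Price: 5 \xa4'
print(latin1_ok)     # False
print(url_utf8)      # Price%3A%205%20%E2%82%AC
print(url_latin9)    # Price%3A%205%20%A4
```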
