Character encoding schemes map characters to binary data. They are fundamental both to file formats and to how programs represent character and string data.

Character Encoding Schemes

ZhiZe recommends using an encoding scheme from Unicode’s UTF family:

  • Unicode

The following schemes are compatible with or related to Unicode:

  • ASCII
  • ISO 8859
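
As a rough illustration of what "compatible" can mean here, the Python sketch below checks two well-known relationships: ASCII text encodes to the same bytes in UTF-8, and an ISO 8859-1 (Latin-1) byte value equals the Unicode code point of the character it represents. The variable names are chosen for this example only.

```python
# ASCII is a strict subset of UTF-8: pure ASCII text yields identical bytes.
text = "Hello"
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
same_for_ascii = ascii_bytes == utf8_bytes

# ISO 8859-1 byte values coincide with the first 256 Unicode code points.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
latin1_byte = ch.encode("latin-1")
latin1_matches_code_point = latin1_byte[0] == ord(ch)

# The same character takes two bytes in UTF-8, so the byte streams differ
# outside the ASCII range even though the code point is the same.
utf8_of_ch = ch.encode("utf-8")

print(same_for_ascii, latin1_matches_code_point, utf8_of_ch)
```

So ISO 8859-1 data still has to be transcoded to UTF-8, while pure ASCII data is already valid UTF-8.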

The following encoding schemes can be replaced by Unicode or are deprecated:

  • Big5
  • GB Character Encoding Scheme Series
  • IBM PC Code Page 437
  • JIS Character Encoding Scheme Series
  • KS X Character Encoding Scheme Series

File Format

Files without extensions are treated as plain text files by default. If you want to add an extension to a plain text file, use .txt.

Plain text files use UTF-8 encoding by default. Files in the other UTF encodings (UTF-16, UTF-32) begin with a BOM (Byte Order Mark) by default, so they can be distinguished from UTF-8.
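
The rule above can be sketched as a small detection routine. This is a minimal sketch, not a reference implementation: the function name guess_text_encoding and the table _BOMS are hypothetical, and it assumes that any file without a UTF-16 or UTF-32 BOM is UTF-8.

```python
import codecs

# Check the 4-byte UTF-32 BOMs before the 2-byte UTF-16 BOMs, because the
# UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_text_encoding(data: bytes) -> str:
    """Return an encoding name based only on a leading BOM (hypothetical helper)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # Assumption taken from the text above: no BOM means UTF-8.
    return "utf-8"


print(guess_text_encoding("hi".encode("utf-8")))
print(guess_text_encoding(codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")))
```

The check is only a heuristic. For example, UTF-16 LE text whose first character is U+0000 starts with the same four bytes as the UTF-32 LE BOM and would be misclassified.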

Character or String Representation in Programming Models

Different programming languages use different encoding schemes for characters and strings. The UTF variant a language uses by default is determined by its language specification.

Generally speaking, even if a programming language uses UTF-16 or UTF-32 as its default character and string encoding scheme, it does not store a BOM at the beginning of an in-memory character or string value. Byte order is instead determined by the language’s compiler or runtime environment.
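
The following Python sketch illustrates where a BOM does and does not appear. Note the assumption: Python exposes strings as sequences of code points rather than as UTF-16 or UTF-32 data, so this shows the serialization boundary, not a language with a UTF-16 string type. The variable names are for illustration only.

```python
import codecs

s = "A"

# The in-memory string holds only the character itself; there is no BOM.
contains_bom_char = "\ufeff" in s

# The generic "utf-16" codec writes a BOM when serializing to bytes.
with_bom = s.encode("utf-16")
has_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# Naming the byte order explicitly produces the same code units with no BOM.
without_bom = s.encode("utf-16-le")

print(contains_bom_char, has_bom, len(with_bom), without_bom)
```

The BOM only appears once the string leaves the program as bytes, which is the point at which a reader can no longer rely on the runtime to know the byte order.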