An Introduction to Using Java's native2ascii

Posted by jopen 9 years ago | 19K views | Tags: Java development, native2ascii

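native2ascii converts text containing characters in a native (non-Unicode) encoding into ASCII text in which those characters are written as Unicode escapes (\uXXXX). It is typically used when internationalizing Java applications.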

Syntax: native2ascii [options] [inputfile [outputfile]]


Description: If outputfile is not specified, the result is written to standard output. If inputfile is not specified, input is read from standard input.

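Options
-reverse
Performs the reverse operation: converts text containing Unicode escapes back into characters in the native encoding.

-encoding encoding_name
Specifies the character encoding used for the conversion. If this option is omitted, the default encoding is taken from the system property file.encoding. The tables below list the supported character encodings; encoding_name should be a name from the first column of a table.

-Joption
Passes option to the Java virtual machine at startup. This is rarely needed. For example, -J-Xms48m sets the initial heap size of the virtual machine to 48 MB.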
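To illustrate why the -encoding option matters, here is a minimal Java sketch that decodes the same bytes with two different charsets. The class and method names (EncodingChoiceSketch, decode) are made up for this illustration, and the sketch only mimics the decoding step; it is not how the tool itself is implemented. The test string is written with Unicode escapes so the source file stays pure ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingChoiceSketch {
    // Decodes the bytes with the given charset, similar in spirit to
    // what choosing an encoding name on the command line selects.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Two CJK characters, encoded as six UTF-8 bytes.
        byte[] utf8Bytes = "\u6d4b\u8bd5".getBytes(StandardCharsets.UTF_8);

        // The charset the JVM falls back to when none is named explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Decoding with the matching charset restores the two characters.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8).length());

        // Decoding with a mismatched single-byte charset yields one
        // unrelated character per byte instead.
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1).length());
    }
}
```

If the encoding passed to the tool does not match the actual encoding of the input file, the escapes it produces will describe the wrong characters, so it is usually safer to name the encoding explicitly than to rely on the default.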

Example 1:
native2ascii test.txt test_unicode.txt

Contents of test.txt: native2ascii测试

Contents of test_unicode.txt: native2ascii\u6d4b\u8bd5
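As a rough illustration of what Example 1 does, the following Java sketch replaces every character above 0x7F with a lowercase \uXXXX escape and leaves ASCII characters untouched. The names Native2AsciiSketch and toAscii are hypothetical, and the sketch is an assumption based on the tool's observable output, not a copy of its source. It works on an in-memory string and skips the file reading and decoding that the real tool performs.

```java
class Native2AsciiSketch {
    // Escapes every non-ASCII UTF-16 code unit as a backslash, the
    // letter u, and four lowercase hex digits; ASCII passes through.
    static String toAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c > 0x7f) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Same text as test.txt in Example 1; the two CJK characters are
        // written as escapes so this source file stays pure ASCII.
        String input = "native2ascii\u6d4b\u8bd5";
        System.out.println(toAscii(input));
    }
}
```

Running the sketch prints the same line as test_unicode.txt in Example 1.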

Example 2:
native2ascii -reverse test_unicode.txt test_gbk.txt

Contents of test_gbk.txt: native2ascii测试
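The reverse direction of Example 2 can be sketched in a similar way. The code below scans for a backslash followed by u and four hex digits and replaces each such sequence with the character it names. ReverseEscapeSketch, isHex4, and toNative are hypothetical names, and the sketch stops at the in-memory string: writing the result to a file in the native encoding (for example GBK) is left out.

```java
class ReverseEscapeSketch {
    // Returns true when the four characters starting at 'from' are hex digits.
    static boolean isHex4(String s, int from) {
        if (from + 4 > s.length()) {
            return false;
        }
        for (int i = from; i < from + 4; i++) {
            if (Character.digit(s.charAt(i), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    // Replaces each well-formed escape with the character it encodes and
    // copies everything else, including malformed escapes, unchanged.
    static String toNative(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            boolean escape = c == '\\'
                    && i + 1 < text.length()
                    && text.charAt(i + 1) == 'u'
                    && isHex4(text, i + 2);
            if (escape) {
                out.append((char) Integer.parseInt(text.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Same text as test_unicode.txt in Example 1.
        String escaped = "native2ascii\\u6d4b\\u8bd5";
        String restored = toNative(escaped);
        // Compare against the expected text instead of printing it, so the
        // result does not depend on the console encoding.
        System.out.println(restored.equals("native2ascii\u6d4b\u8bd5"));
    }
}
```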

Basic Encoding Set (contained in lib/rt.jar)
Supported by java.nio, java.io and java.lang APIs

<table>
<thead>
<tr> <th>Canonical Name for java.nio API</th> <th>Canonical Name for java.io and java.lang API</th> <th>Description</th> </tr>
</thead>
<tbody>
<tr> <td>US-ASCII</td> <td>ASCII</td> <td>American Standard Code for Information Interchange</td> </tr>
<tr> <td>windows-1250</td> <td>Cp1250</td> <td>Windows Eastern European</td> </tr>
<tr> <td>windows-1251</td> <td>Cp1251</td> <td>Windows Cyrillic</td> </tr>
<tr> <td>windows-1252</td> <td>Cp1252</td> <td>Windows Latin-1</td> </tr>
<tr> <td>windows-1253</td> <td>Cp1253</td> <td>Windows Greek</td> </tr>
<tr> <td>windows-1254</td> <td>Cp1254</td> <td>Windows Turkish</td> </tr>
<tr> <td>windows-1257</td> <td>Cp1257</td> <td>Windows Baltic</td> </tr>
<tr> <td>ISO-8859-1</td> <td>ISO8859_1</td> <td>ISO 8859-1, Latin Alphabet No. 1</td> </tr>
<tr> <td>ISO-8859-2</td> <td>ISO8859_2</td> <td>Latin Alphabet No. 2</td> </tr>
<tr> <td>ISO-8859-4</td> <td>ISO8859_4</td> <td>Latin Alphabet No. 4</td> </tr>
<tr> <td>ISO-8859-5</td> <td>ISO8859_5</td> <td>Latin/Cyrillic Alphabet</td> </tr>
<tr> <td>ISO-8859-7</td> <td>ISO8859_7</td> <td>Latin/Greek Alphabet</td> </tr>
<tr> <td>ISO-8859-9</td> <td>ISO8859_9</td> <td>Latin Alphabet No. 5</td> </tr>
<tr> <td>ISO-8859-13</td> <td>ISO8859_13</td> <td>Latin Alphabet No. 7</td> </tr>
<tr> <td>ISO-8859-15</td> <td>ISO8859_15</td> <td>Latin Alphabet No. 9</td> </tr>
<tr> <td>KOI8-R</td> <td>KOI8_R</td> <td>KOI8-R, Russian</td> </tr>
<tr> <td>UTF-8</td> <td>UTF8</td> <td>Eight-bit UCS Transformation Format</td> </tr>
<tr> <td>UTF-16</td> <td>UTF-16</td> <td>Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark</td> </tr>
<tr> <td>UTF-16BE</td> <td>UnicodeBigUnmarked</td> <td>Sixteen-bit Unicode Transformation Format, big-endian byte order</td> </tr>
<tr> <td>UTF-16LE</td> <td>UnicodeLittleUnmarked</td> <td>Sixteen-bit Unicode Transformation Format, little-endian byte order</td> </tr>
<tr> <td>Not available</td> <td>UnicodeBig</td> <td>Sixteen-bit Unicode Transformation Format, big-endian byte order, with byte-order mark</td> </tr>
<tr> <td>Not available</td> <td>UnicodeLittle</td> <td>Sixteen-bit Unicode Transformation Format, little-endian byte order, with byte-order mark</td> </tr>
</tbody>
</table>
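The two name columns above refer to the same encodings under different spellings. As a small sketch, assuming a standard Java runtime, Charset.forName accepts the java.io/java.lang style names as aliases and reports the canonical java.nio name through name(). The class and method names (CharsetNameSketch, nioName) are made up for this illustration.

```java
import java.nio.charset.Charset;

class CharsetNameSketch {
    // Resolves any accepted name or alias to the canonical java.nio name.
    static String nioName(String anyName) {
        return Charset.forName(anyName).name();
    }

    public static void main(String[] args) {
        // Names taken from the java.io and java.lang column of the table.
        String[] ioNames = {"ASCII", "Cp1252", "ISO8859_1", "UTF8", "UnicodeBigUnmarked"};
        for (String ioName : ioNames) {
            System.out.println(ioName + " -> " + nioName(ioName));
        }
    }
}
```

A lookup like this is a quick way to check whether a given encoding_name is recognized by the runtime before passing it to a tool or API.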

Extended Encoding Set (contained in lib/charsets.jar)
Supported by java.nio, java.io and java.lang APIs

<table>
<thead>
<tr> <th>Canonical Name for java.nio API</th> <th>Canonical Name for java.io and java.lang API</th> <th>Description</th> </tr>
</thead>
<tbody>
<tr> <td>windows-1255</td> <td>Cp1255</td> <td>Windows Hebrew</td> </tr>
<tr> <td>windows-1256</td> <td>Cp1256</td> <td>Windows Arabic</td> </tr>
<tr> <td>windows-1258</td> <td>Cp1258</td> <td>Windows Vietnamese</td> </tr>
<tr> <td>ISO-8859-3</td> <td>ISO8859_3</td> <td>Latin Alphabet No. 3</td> </tr>
<tr> <td>ISO-8859-6</td> <td>ISO8859_6</td> <td>Latin/Arabic Alphabet</td> </tr>
<tr> <td>ISO-8859-8</td> <td>ISO8859_8</td> <td>Latin/Hebrew Alphabet</td> </tr>
<tr> <td>windows-31j</td> <td>MS932</td> <td>Windows Japanese</td> </tr>
<tr> <td>EUC-JP</td> <td>EUC_JP</td> <td>JISX 0201, 0208 and 0212, EUC encoding Japanese</td> </tr>
<tr> <td>x-EUC-JP-LINUX</td> <td>EUC_JP_LINUX</td> <td>JISX 0201, 0208, EUC encoding Japanese</td> </tr>
<tr> <td>Shift_JIS</td> <td>SJIS</td> <td>Shift-JIS, Japanese</td> </tr>
<tr> <td>ISO-2022-JP</td> <td>ISO2022JP</td> <td>JIS X 0201, 0208, in ISO 2022 form, Japanese</td> </tr>
<tr> <td>x-mswin-936</td> <td>MS936</td> <td>Windows Simplified Chinese</td> </tr>
<tr> <td>GB18030</td> <td>GB18030</td> <td>Simplified Chinese, PRC standard</td> </tr>
<tr> <td>x-EUC-CN</td> <td>EUC_CN</td> <td>GB2312, EUC encoding, Simplified Chinese</td> </tr>
<tr> <td>GBK</td> <td>GBK</td> <td>GBK, Simplified Chinese</td> </tr>
<tr> <td>ISCII91</td> <td>ISCII91</td> <td>ISCII91 encoding of Indic scripts</td> </tr>
<tr> <td>x-windows-949</td> <td>MS949</td> <td>Windows Korean</td> </tr>
<tr> <td>EUC-KR</td> <td>EUC_KR</td> <td>KS C 5601, EUC encoding, Korean</td> </tr>
<tr> <td>ISO-2022-KR</td> <td>ISO2022KR</td> <td>ISO 2022 KR, Korean</td> </tr>
<tr> <td>x-windows-950</td> <td>MS950</td> <td>Windows Traditional Chinese</td> </tr>
<tr> <td>x-MS950-HKSCS</td> <td>MS950_HKSCS</td> <td>Windows Traditional Chinese with Hong Kong extensions</td> </tr>
<tr> <td>x-EUC-TW</td> <td>EUC_TW</td> <td>CNS11643 (Plane 1-3), EUC encoding, Traditional Chinese</td> </tr>
<tr> <td>Big5</td> <td>Big5</td> <td>Big5, Traditional Chinese</td> </tr>
<tr> <td>Big5-HKSCS</td> <td>Big5_HKSCS</td> <td>Big5 with Hong Kong extensions, Traditional Chinese</td> </tr>
<tr> <td>TIS-620</td> <td>TIS620</td> <td>TIS620, Thai</td> </tr>
</tbody>
</table>
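Because the extended set ships separately from the basic set, a given runtime may not include every encoding listed above. The following sketch, under that assumption, checks at run time whether a charset is present and whether it can encode a piece of text. ExtendedCharsetSketch and canEncode are hypothetical names introduced for this example.

```java
import java.nio.charset.Charset;

class ExtendedCharsetSketch {
    // Returns true only if the named charset is available in this runtime,
    // supports encoding at all, and can represent every character in 'text'.
    static boolean canEncode(String charsetName, String text) {
        if (!Charset.isSupported(charsetName)) {
            return false;
        }
        Charset charset = Charset.forName(charsetName);
        return charset.canEncode() && charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Two simplified Chinese characters, written as escapes so the
        // source file stays pure ASCII.
        String sample = "\u6d4b\u8bd5";
        String[] names = {"GBK", "GB18030", "Big5", "EUC-KR", "Shift_JIS", "US-ASCII"};
        for (String name : names) {
            System.out.println(name
                    + " supported=" + Charset.isSupported(name)
                    + " canEncodeSample=" + canEncode(name, sample));
        }
    }
}
```

The output depends on which charsets the local runtime provides, so no fixed result is shown here.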

Extended Encoding Set (contained in lib/charsets.jar)
Supported by java.io and java.lang APIs

<table>
<thead>
<tr> <th>Canonical Name</th> <th>Description</th> </tr>
</thead>
<tbody>
<tr> <td>Big5_Solaris</td> <td>Big5 with seven additional Hanzi ideograph character mappings for the Solaris zh_TW.BIG5 locale</td> </tr>
<tr> <td>Cp037</td> <td>USA, Canada (Bilingual, French), Netherlands, Portugal, Brazil, Australia</td> </tr>
<tr> <td>Cp273</td> <td>IBM Austria, Germany</td> </tr>
<tr> <td>Cp277</td> <td>IBM Denmark, Norway</td> </tr>
<tr> <td>Cp278</td> <td>IBM Finland, Sweden</td> </tr>
<tr> <td>Cp280</td> <td>IBM Italy</td> </tr>
<tr> <td>Cp284</td> <td>IBM Catalan/Spain, Spanish Latin America</td> </tr>
<tr> <td>Cp285</td> <td>IBM United Kingdom, Ireland</td> </tr>
<tr> <td>Cp297</td> <td>IBM France</td> </tr>
<tr> <td>Cp420</td> <td>IBM Arabic</td> </tr>
<tr> <td>Cp424</td> <td>IBM Hebrew</td> </tr>
<tr> <td>Cp437</td> <td>MS-DOS United States, Australia, New Zealand, South Africa</td> </tr>
<tr> <td>Cp500</td> <td>EBCDIC 500V1</td> </tr>
<tr> <td>Cp737</td> <td>PC Greek</td> </tr>
<tr> <td>Cp775</td> <td>PC Baltic</td> </tr>
<tr> <td>Cp838</td> <td>IBM Thailand extended SBCS</td> </tr>
<tr> <td>Cp850</td> <td>MS-DOS Latin-1</td> </tr>
<tr> <td>Cp852</td> <td>MS-DOS Latin-2</td> </tr>
<tr> <td>Cp855</td> <td>IBM Cyrillic</td> </tr>
<tr> <td>Cp856</td> <td>IBM Hebrew</td> </tr>
<tr> <td>Cp857</td> <td>IBM Turkish</td> </tr>
<tr> <td>Cp858</td> <td>Variant of Cp850 with Euro character</td> </tr>
<tr> <td>Cp860</td> <td>MS-DOS Portuguese</td> </tr>
<tr> <td>Cp861</td> <td>MS-DOS Icelandic</td> </tr>
<tr> <td>Cp862</td> <td>PC Hebrew</td> </tr>
<tr> <td>Cp863</td> <td>MS-DOS Canadian French</td> </tr>
<tr> <td>Cp864</td> <td>PC Arabic</td> </tr>
<tr> <td>Cp865</td> <td>MS-DOS Nordic</td> </tr>
<tr> <td>Cp866</td> <td>MS-DOS Russian</td> </tr>
<tr> <td>Cp868</td> <td>MS-DOS Pakistan</td> </tr>
<tr> <td>Cp869</td> <td>IBM Modern Greek</td> </tr>
<tr> <td>Cp870</td> <td>IBM Multilingual Latin-2</td> </tr>
<tr> <td>Cp871</td> <td>IBM Iceland</td> </tr>
<tr> <td>Cp874</td> <td>IBM Thai</td> </tr>
<tr> <td>Cp875</td> <td>IBM Greek</td> </tr>
<tr> <td>Cp918</td> <td>IBM Pakistan (Urdu)</td> </tr>
<tr> <td>Cp921</td> <td>IBM Latvia, Lithuania (AIX, DOS)</td> </tr>
<tr> <td>Cp922</td> <td>IBM Estonia (AIX, DOS)</td> </tr>
<tr> <td>Cp930</td> <td>Japanese Katakana-Kanji mixed with 4370 UDC, superset of 5026</td> </tr>
<tr> <td>Cp933</td> <td>Korean Mixed with 1880 UDC, superset of 5029</td> </tr>
<tr> <td>Cp935</td> <td>Simplified Chinese Host mixed with 1880 UDC, superset of 5031</td> </tr>
<tr> <td>Cp937</td> <td>Traditional Chinese Host mixed with 6204 UDC, superset of 5033</td> </tr>
<tr> <td>Cp939</td> <td>Japanese Latin Kanji mixed with 4370 UDC, superset of 5035</td> </tr>
<tr> <td>Cp942</td> <td>IBM OS/2 Japanese, superset of Cp932</td> </tr>
<tr> <td>Cp942C</td> <td>Variant of Cp942</td> </tr>
<tr> <td>Cp943</td> <td>IBM OS/2 Japanese, superset of Cp932 and Shift-JIS</td> </tr>
<tr> <td>Cp943C</td> <td>Variant of Cp943</td> </tr>
<tr> <td>Cp948</td> <td>OS/2 Chinese (Taiwan) superset of 938</td> </tr>
<tr> <td>Cp949</td> <td>PC Korean</td> </tr>
<tr> <td>Cp949C</td> <td>Variant of Cp949</td> </tr>
<tr> <td>Cp950</td> <td>PC Chinese (Hong Kong, Taiwan)</td> </tr>
<tr> <td>Cp964</td> <td>AIX Chinese (Taiwan)</td> </tr>
<tr> <td>Cp970</td> <td>AIX Korean</td> </tr>
<tr> <td>Cp1006</td> <td>IBM AIX Pakistan (Urdu)</td> </tr>
<tr> <td>Cp1025</td> <td>IBM Multilingual Cyrillic: Bulgaria, Bosnia, Herzegovina, Macedonia (FYR)</td> </tr>
<tr> <td>Cp1026</td> <td>IBM Latin-5, Turkey</td> </tr>
<tr> <td>Cp1046</td> <td>IBM Arabic - Windows</td> </tr>
<tr> <td>Cp1097</td> <td>IBM Iran (Farsi)/Persian</td> </tr>
<tr> <td>Cp1098</td> <td>IBM Iran (Farsi)/Persian (PC)</td> </tr>
<tr> <td>Cp1112</td> <td>IBM Latvia, Lithuania</td> </tr>
<tr> <td>Cp1122</td> <td>IBM Estonia</td> </tr>
<tr> <td>Cp1123</td> <td>IBM Ukraine</td> </tr>
<tr> <td>Cp1124</td> <td>IBM AIX Ukraine</td> </tr>
<tr> <td>Cp1140</td> <td>Variant of Cp037 with Euro character</td> </tr>
<tr> <td>Cp1141</td> <td>Variant of Cp273 with Euro character</td> </tr>
<tr> <td>Cp1142</td> <td>Variant of Cp277 with Euro character</td> </tr>
<tr> <td>Cp1143</td> <td>Variant of Cp278 with Euro character</td> </tr>
<tr> <td>Cp1144</td> <td>Variant of Cp280 with Euro character</td> </tr>
<tr> <td>Cp1145</td> <td>Variant of Cp284 with Euro character</td> </tr>
<tr> <td>Cp1146</td> <td>Variant of Cp285 with Euro character</td> </tr>
<tr> <td>Cp1147</td> <td>Variant of Cp297 with Euro character</td> </tr>
<tr> <td>Cp1148</td> <td>Variant of Cp500 with Euro character</td> </tr>
<tr> <td>Cp1149</td> <td>Variant of Cp871 with Euro character</td> </tr>
<tr> <td>Cp1381</td> <td>IBM OS/2, DOS People's Republic of China (PRC)</td> </tr>
<tr> <td>Cp1383</td> <td>IBM AIX People's Republic of China (PRC)</td> </tr>
<tr> <td>Cp33722</td> <td>IBM-eucJP - Japanese (superset of 5050)</td> </tr>
<tr> <td>ISO2022_CN_CNS</td> <td>CNS11643 in ISO 2022 CN form, Traditional Chinese (conversion from Unicode only)</td> </tr>
<tr> <td>ISO2022_CN_GB</td> <td>GB2312 in ISO 2022 CN form, Simplified Chinese (conversion from Unicode only)</td> </tr>
<tr> <td>JISAutoDetect</td> <td>Detects and converts from Shift-JIS, EUC-JP, ISO 2022 JP (conversion to Unicode only)</td> </tr>
<tr> <td>MS874</td> <td>Windows Thai</td> </tr>
<tr> <td>MacArabic</td> <td>Macintosh Arabic</td> </tr>
<tr> <td>MacCentralEurope</td> <td>Macintosh Latin-2</td> </tr>
<tr> <td>MacCroatian</td> <td>Macintosh Croatian</td> </tr>
<tr> <td>MacCyrillic</td> <td>Macintosh Cyrillic</td> </tr>
<tr> <td>MacDingbat</td> <td>Macintosh Dingbat</td> </tr>
<tr> <td>MacGreek</td> <td>Macintosh Greek</td> </tr>
<tr> <td>MacHebrew</td> <td>Macintosh Hebrew</td> </tr>
<tr> <td>MacIceland</td> <td>Macintosh Iceland</td> </tr>
<tr> <td>MacRoman</td> <td>Macintosh Roman</td> </tr>
<tr> <td>MacRomania</td> <td>Macintosh Romania</td> </tr>
<tr> <td>MacSymbol</td> <td>Macintosh Symbol</td> </tr>
<tr> <td>MacThai</td> <td>Macintosh Thai</td> </tr>
<tr> <td>MacTurkish</td> <td>Macintosh Turkish</td> </tr>
<tr> <td>MacUkraine</td> <td>Macintosh Ukraine</td> </tr>
</tbody>
</table>

Source: http://blog.csdn.net/love_xsq/article/details/41911681
