C++ Internationalization: UTF-8 CPP

Posted by fmms 13 years ago | 9K views | C/C++

A simple, compact, cross-platform generic library for handling UTF-8 encoded strings.

#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include "utf8.h"

using namespace std;

int main(int argc, char** argv) {
if (argc != 2) {
    cout << "\nUsage: docsample filename\n";
    return 0;
}

const char* test_file_path = argv[1];
// Open the test file (contains UTF-8 encoded text)
ifstream fs8(test_file_path);
if (!fs8.is_open()) {
    cout << "Could not open " << test_file_path << endl;
    return 0;
}

unsigned line_count = 1;
string line;
// Play with all the lines in the file
while (getline(fs8, line)) {
    // Check for invalid UTF-8 (for a simple yes/no check, there is also the utf8::is_valid function)
    string::iterator end_it = utf8::find_invalid(line.begin(), line.end());
    if (end_it != line.end()) {
        cout << "Invalid UTF-8 encoding detected at line " << line_count << "\n";
        cout << "This part is fine: " << string(line.begin(), end_it) << "\n";
    }

    // Get the line length (at least for the valid part)
    int length = utf8::distance(line.begin(), end_it);
    cout << "Length of line " << line_count << " is " << length <<  "\n";

    // Convert it to utf-16
    vector<unsigned short> utf16line;
    utf8::utf8to16(line.begin(), end_it, back_inserter(utf16line));

    // And back to utf-8
    string utf8line; 
    utf8::utf16to8(utf16line.begin(), utf16line.end(), back_inserter(utf8line));

    // Confirm that the conversion went OK:
    if (utf8line != string(line.begin(), end_it))
        cout << "Error in UTF-16 conversion at line: " << line_count << "\n";        

    line_count++;
}
return 0;

}

Project page: http://utfcpp.sourceforge.net/
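
The sample reports a line's length in code points rather than bytes, and round-trips each line through UTF-16. To make those two steps concrete, here is a minimal sketch in plain standard C++. The helpers `count_code_points` and `encode_utf16` are hypothetical names invented for illustration; they are not part of UTF8-CPP and are not how the library is implemented. The first assumes its input is already valid UTF-8, and the second assumes a code point no greater than 0x10FFFF that is not itself a surrogate.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of UTF8-CPP).
// Counts code points in a string assumed to be valid UTF-8 by counting
// every byte that is NOT a continuation byte (bit pattern 10xxxxxx).
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or ASCII byte: starts a new code point
        }
    }
    return n;
}

// Hypothetical helper (not part of UTF8-CPP).
// Encodes one code point as UTF-16 code units. Code points below
// 0x10000 fit in one unit; the rest are split into a surrogate pair.
// Assumes cp <= 0x10FFFF and cp is not in the range 0xD800..0xDFFF.
std::vector<unsigned short> encode_utf16(unsigned long cp) {
    std::vector<unsigned short> out;
    if (cp < 0x10000) {
        out.push_back(static_cast<unsigned short>(cp));
    } else {
        cp -= 0x10000;  // leaves a 20-bit value
        out.push_back(static_cast<unsigned short>(0xD800 + (cp >> 10)));    // high surrogate
        out.push_back(static_cast<unsigned short>(0xDC00 + (cp & 0x3FF)));  // low surrogate
    }
    return out;
}
```

For instance, `count_code_points("\xE4\xB8\xAD\xE6\x96\x87")` sees six bytes but only two code points, and `encode_utf16(0x1F600)` yields the surrogate pair 0xD83D, 0xDE00. This is why the sample stores UTF-16 in a `vector<unsigned short>` whose size can differ from the code point count. Unlike this sketch, the library calls in the sample also detect invalid input, so prefer them for real data.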
