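How to Write a JSON Parser
A JSON parser is essentially a function: its input is a string containing JSON, and its output is the corresponding data structure in the host language.
Compared with XML, JSON has a very simple structure and only a handful of data types. Taking Java as an example, they map as follows:
- "string": Java String;
- number: Java Long or Double;
- true/false: Java Boolean;
- null: Java null;
- [array]: Java List<Object> or Object[];
- {"key":"value"}: Java Map<String, Object>.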
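To make the mapping concrete, the sketch below builds by hand the structure a parser would be expected to return for one small document. The sample document and the class name JsonMappingExample are made up for illustration; only standard collections are used.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JsonMappingExample {
    // Hand-built equivalent of the (illustrative) document:
    // {"name":"Bob","age":12,"score":95.5,"vip":true,"tags":["a","b"],"ext":null}
    static Map<String, Object> expected() {
        List<Object> tags = new ArrayList<>();    // JSON array  -> List<Object>
        tags.add("a");
        tags.add("b");

        Map<String, Object> root = new LinkedHashMap<>(); // JSON object -> Map<String, Object>
        root.put("name", "Bob");        // string          -> String
        root.put("age", 12L);           // integer number  -> Long
        root.put("score", 95.5);        // fractional number -> Double
        root.put("vip", Boolean.TRUE);  // true/false      -> Boolean
        root.put("tags", tags);
        root.put("ext", null);          // null            -> null
        return root;
    }
}
```

A LinkedHashMap is used here only so that keys keep their document order; any Map implementation fits the mapping described above.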
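Parsing JSON is similar to parsing XML: the end result is an object in memory. For efficiency, streaming is almost the only sensible choice: the parser scans the JSON string once from start to finish and builds the complete data structure in that single pass.
At its core the parser is a state machine; it only has to implement the state transitions correctly according to the format JSON defines (see http://www.json.org). To keep the code simple, though, there is no need to implement a full character-by-character state machine.
The parser's input should be a character stream, so the first step is to obtain a Reader from which the next character can be read repeatedly.
During parsing we often decide the next state based on the upcoming character, which raises the problem of backing up: sometimes we cannot consume the character with next() and instead need peek(), which returns the next character without advancing the stream position. The plain Reader interface does not offer this, so we wrap it in a CharReader that provides:
- char next(): read the next character and advance the Reader position;
- char peek(): read the next character without advancing the Reader position;
- String next(int size): read the given number of characters and advance the position;
- boolean hasMore(): report whether any input remains.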
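One possible shape for such a wrapper is sketched below. It is not the author's implementation: the one-character lookahead buffer, the choice of unchecked exceptions, and the meaning given to the readed counter (a name borrowed from the parse() code further down) are all assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

class CharReader {
    private static final int EMPTY = -2;   // marker: nothing buffered yet
    private final Reader reader;
    private int buffered = EMPTY;          // lookahead char, or -1 at end of stream
    int readed = 0;                        // characters consumed so far (assumed meaning)

    CharReader(Reader reader) {
        this.reader = reader;
    }

    // Ensures one character of lookahead is buffered and returns it (-1 at end).
    private int fill() {
        if (buffered == EMPTY) {
            try {
                buffered = reader.read();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return buffered;
    }

    boolean hasMore() {
        return fill() != -1;
    }

    // Returns the next character without consuming it.
    char peek() {
        int c = fill();
        if (c == -1) {
            throw new IllegalStateException("Unexpected end of input.");
        }
        return (char) c;
    }

    // Returns the next character and consumes it.
    char next() {
        char c = peek();
        buffered = EMPTY;
        readed++;
        return c;
    }

    // Reads exactly size characters and consumes them.
    String next(int size) {
        StringBuilder sb = new StringBuilder(size);
        for (int i = 0; i < size; i++) {
            sb.append(next());
        }
        return sb.toString();
    }
}
```

With new CharReader(new java.io.StringReader("true,1")), two calls to peek() both return 't', next(4) returns "true", and the following next() returns ','.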
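JSON is simpler to parse than many other text formats because the kind of any JSON value can be determined from the next character alone. Summarizing carefully, the character returned by peek() tells us what to expect:
- {: expect a JSON object;
- :: expect the value of a JSON object entry;
- ,: expect the next key-value pair of a JSON object, or the next element of a JSON array;
- [: expect a JSON array;
- t: expect true;
- f: expect false;
- n: expect null;
- ": expect a string;
- 0~9 or -: expect a number.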
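The table above amounts to a single switch on the peeked character. The sketch below writes it out; the class name ValueKind and the returned labels are hypothetical and serve only to show the dispatch.

```java
class ValueKind {
    // Returns a label for what the parser should expect after peeking c.
    static String expect(char c) {
        switch (c) {
            case '{': return "object";
            case '[': return "array";
            case ':': return "object value";
            case ',': return "next pair or element";
            case 't': return "true";
            case 'f': return "false";
            case 'n': return "null";
            case '"': return "string";
            default:
                // A JSON number starts with a digit or a minus sign.
                if (c == '-' || (c >= '0' && c <= '9')) {
                    return "number";
                }
                throw new IllegalArgumentException("Unexpected char: " + c);
        }
    }
}
```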
Matching states against individual characters leaves too many cases, however, so the character stream is first turned into tokens. The following token types are enough:
- END_DOCUMENT: end of the JSON document;
- BEGIN_OBJECT: start of a JSON object;
- END_OBJECT: end of a JSON object;
- BEGIN_ARRAY: start of a JSON array;
- END_ARRAY: end of a JSON array;
- SEP_COLON: a colon;
- SEP_COMMA: a comma;
- STRING: a string;
- BOOLEAN: true or false;
- NUMBER: a number;
- NULL: null.
Then CharReader is wrapped once more in a TokenReader, which provides the following interface:
- Token readNextToken(): read the next token;
- boolean readBoolean(): read a boolean;
- Number readNumber(): read a number;
- String readString(): read a string;
- void readNull(): read a null.
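A simplified sketch of such a TokenReader follows. To stay self-contained it reads from a String rather than from a CharReader, nests the Token enum inside the class, and cuts corners that a real implementation must not: readString() ignores escape sequences and readNumber() does not validate the full number grammar. It assumes that readNextToken() consumes structural characters but only peeks at values, leaving them to the matching readXxx() call, which is how the parse() code below uses it.

```java
class TokenReader {
    enum Token {
        END_DOCUMENT, BEGIN_OBJECT, END_OBJECT, BEGIN_ARRAY, END_ARRAY,
        SEP_COLON, SEP_COMMA, STRING, BOOLEAN, NUMBER, NULL
    }

    private final String src;
    private int pos = 0;

    TokenReader(String src) {
        this.src = src;
    }

    Token readNextToken() {
        // Skip JSON whitespace: space, tab, CR, LF.
        while (pos < src.length() && " \t\r\n".indexOf(src.charAt(pos)) >= 0) {
            pos++;
        }
        if (pos >= src.length()) {
            return Token.END_DOCUMENT;
        }
        char c = src.charAt(pos);
        switch (c) {
            case '{': pos++; return Token.BEGIN_OBJECT;
            case '}': pos++; return Token.END_OBJECT;
            case '[': pos++; return Token.BEGIN_ARRAY;
            case ']': pos++; return Token.END_ARRAY;
            case ':': pos++; return Token.SEP_COLON;
            case ',': pos++; return Token.SEP_COMMA;
            case '"': return Token.STRING;            // value left for readString()
            case 't': case 'f': return Token.BOOLEAN; // value left for readBoolean()
            case 'n': return Token.NULL;              // value left for readNull()
            default:
                if (c == '-' || (c >= '0' && c <= '9')) {
                    return Token.NUMBER;              // value left for readNumber()
                }
                throw new IllegalStateException("Unexpected char '" + c + "' at " + pos);
        }
    }

    boolean readBoolean() {
        if (src.startsWith("true", pos)) { pos += 4; return true; }
        if (src.startsWith("false", pos)) { pos += 5; return false; }
        throw new IllegalStateException("Expected true or false at " + pos);
    }

    void readNull() {
        if (!src.startsWith("null", pos)) {
            throw new IllegalStateException("Expected null at " + pos);
        }
        pos += 4;
    }

    // Simplified: collects number characters, then lets Long/Double parse them.
    Number readNumber() {
        int start = pos;
        while (pos < src.length() && "+-0123456789.eE".indexOf(src.charAt(pos)) >= 0) {
            pos++;
        }
        String text = src.substring(start, pos);
        boolean isFloat = text.contains(".") || text.contains("e") || text.contains("E");
        return isFloat ? (Number) Double.valueOf(text) : (Number) Long.valueOf(text);
    }

    // Simplified: no escape handling; reads up to the next double quote.
    String readString() {
        int end = src.indexOf('"', pos + 1);
        if (end < 0) {
            throw new IllegalStateException("Unterminated string at " + pos);
        }
        String s = src.substring(pos + 1, end);
        pos = end + 1;
        return s;
    }
}
```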
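Because JSON objects and arrays can nest, a stack is needed to hold the objects and arrays being built. Whenever we read BEGIN_OBJECT we create a Map and push it; whenever we read BEGIN_ARRAY we create a List and push it; whenever we read END_OBJECT or END_ARRAY we pop the top element and, depending on the new top of the stack, either attach it to its parent or, if the stack is now empty, push it back as the root. Object keys must also be pushed; once the following value has been read, the key is popped and the key-value pair is put into the Map on top of the stack.
If exactly one element remains on the stack when END_DOCUMENT is read, parsing succeeded: that element is returned and reading ends. If more than one element remains, the JSON document is malformed.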
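The stack and its typed entries might look like the sketch below. The class and method names (Stack, StackValue, newJsonObject, pop(type), peek(type), getTopValueType and so on) are taken from the parse() code in this article, but their bodies and the numeric type constants are assumptions; the author's real classes may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class StackValue {
    // Type tags; the numeric values are assumed.
    static final int TYPE_OBJECT = 0;
    static final int TYPE_OBJECT_KEY = 1;
    static final int TYPE_ARRAY = 2;
    static final int TYPE_SINGLE = 3;

    final int type;
    final Object value;

    private StackValue(int type, Object value) {
        this.type = type;
        this.value = value;
    }

    static StackValue newJsonObject(Map<String, Object> map) { return new StackValue(TYPE_OBJECT, map); }
    static StackValue newJsonObjectKey(String key) { return new StackValue(TYPE_OBJECT_KEY, key); }
    static StackValue newJsonArray(List<Object> list) { return new StackValue(TYPE_ARRAY, list); }
    static StackValue newJsonSingle(Object obj) { return new StackValue(TYPE_SINGLE, obj); }

    String valueAsKey() { return (String) value; }

    @SuppressWarnings("unchecked")
    Map<String, Object> valueAsObject() { return (Map<String, Object>) value; }

    @SuppressWarnings("unchecked")
    List<Object> valueAsArray() { return (List<Object>) value; }
}

class Stack {
    private final List<StackValue> items = new ArrayList<>();

    boolean isEmpty() { return items.isEmpty(); }

    void push(StackValue v) { items.add(v); }

    StackValue pop() {
        if (items.isEmpty()) {
            throw new IllegalStateException("Stack is empty.");
        }
        return items.remove(items.size() - 1);
    }

    // Pops the top entry after checking that it has the expected type.
    StackValue pop(int type) {
        checkTop(type);
        return pop();
    }

    // Returns the top entry, without removing it, after checking its type.
    StackValue peek(int type) {
        checkTop(type);
        return items.get(items.size() - 1);
    }

    int getTopValueType() {
        if (items.isEmpty()) {
            throw new IllegalStateException("Stack is empty.");
        }
        return items.get(items.size() - 1).type;
    }

    private void checkTop(int type) {
        if (getTopValueType() != type) {
            throw new IllegalStateException("Unexpected type on top of stack.");
        }
    }
}
```

For the document {"k":[1]}, the sequence is: push an object, push the key "k", push an array, add 1 to the array on top, pop the array at ], pop the key, put the pair into the object now on top, and finally pop the object as the result.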
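Finally, parse(), the core of JsonReader, repeatedly reads tokens from TokenReader, acts according to the current status, and then sets the status expected for the next token; a token that does not match the expected status means the JSON is invalid. The initial status is STATUS_EXPECT_SINGLE_VALUE | STATUS_EXPECT_BEGIN_OBJECT | STATUS_EXPECT_BEGIN_ARRAY, that is, a single value, { or [ is expected. The loop exits when END_DOCUMENT is read.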
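Because several expectations can hold at once, the status is naturally a bitmask with one bit per STATUS_EXPECT_* flag. The sketch below shows one way to define the flags; the constant names come from the parse() code, while the bit values, the class name ParseStatus, and the two-argument form of hasStatus are assumptions made to keep the example self-contained.

```java
class ParseStatus {
    // One bit per expectation; the concrete bit positions are assumed.
    static final int STATUS_EXPECT_END_DOCUMENT = 1 << 0;
    static final int STATUS_EXPECT_BEGIN_OBJECT = 1 << 1;
    static final int STATUS_EXPECT_END_OBJECT   = 1 << 2;
    static final int STATUS_EXPECT_OBJECT_KEY   = 1 << 3;
    static final int STATUS_EXPECT_OBJECT_VALUE = 1 << 4;
    static final int STATUS_EXPECT_COLON        = 1 << 5;
    static final int STATUS_EXPECT_COMMA        = 1 << 6;
    static final int STATUS_EXPECT_BEGIN_ARRAY  = 1 << 7;
    static final int STATUS_EXPECT_END_ARRAY    = 1 << 8;
    static final int STATUS_EXPECT_ARRAY_VALUE  = 1 << 9;
    static final int STATUS_EXPECT_SINGLE_VALUE = 1 << 10;

    // The status the parser starts in: a single value, '{' or '['.
    static final int INITIAL = STATUS_EXPECT_SINGLE_VALUE
            | STATUS_EXPECT_BEGIN_OBJECT
            | STATUS_EXPECT_BEGIN_ARRAY;

    // True if the expected flag is set in status.
    static boolean hasStatus(int status, int expected) {
        return (status & expected) != 0;
    }
}
```

Combining flags with | and testing them with & keeps each case of the parse loop down to a single comparison.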
    public class JsonReader {

        TokenReader reader;

        // Bitmask of STATUS_EXPECT_* flags describing which tokens may come next.
        // Kept as a field so that hasStatus() can see it.
        int status;

        // Helper not shown in the original excerpt; assumed to test the bitmask.
        boolean hasStatus(int expectedStatus) {
            return (status & expectedStatus) != 0;
        }

        public Object parse() {
            Stack stack = new Stack();
            status = STATUS_EXPECT_SINGLE_VALUE | STATUS_EXPECT_BEGIN_OBJECT | STATUS_EXPECT_BEGIN_ARRAY;
            for (;;) {
                Token currentToken = reader.readNextToken();
                switch (currentToken) {
                case BOOLEAN:
                    if (hasStatus(STATUS_EXPECT_SINGLE_VALUE)) {
                        // single boolean:
                        Boolean bool = reader.readBoolean();
                        stack.push(StackValue.newJsonSingle(bool));
                        status = STATUS_EXPECT_END_DOCUMENT;
                        continue;
                    }
                    if (hasStatus(STATUS_EXPECT_OBJECT_VALUE)) {
                        // value of an object entry: pop its key and store the pair
                        Boolean bool = reader.readBoolean();
                        String key = stack.pop(StackValue.TYPE_OBJECT_KEY).valueAsKey();
                        stack.peek(StackValue.TYPE_OBJECT).valueAsObject().put(key, bool);
                        status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_OBJECT;
                        continue;
                    }
                    if (hasStatus(STATUS_EXPECT_ARRAY_VALUE)) {
                        // array element: append to the array on top of the stack
                        Boolean bool = reader.readBoolean();
                        stack.peek(StackValue.TYPE_ARRAY).valueAsArray().add(bool);
                        status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_ARRAY;
                        continue;
                    }
                    throw new JsonParseException("Unexpected boolean.", reader.reader.readed);

                case NULL:
                    if (hasStatus(STATUS_EXPECT_SINGLE_VALUE)) {
                        // single null:
                        reader.readNull();
                        stack.push(StackValue.newJsonSingle(null));
                        status = STATUS_EXPECT_END_DOCUMENT;
                        continue;
                    }
                    if (hasStatus(STATUS_EXPECT_OBJECT_VALUE)) {
                        reader.readNull();
                        String key = stack.pop(StackValue.TYPE_OBJECT_KEY).valueAsKey();
                        stack.peek(StackValue.TYPE_OBJECT).valueAsObject().put(key, null);
                        status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_OBJECT;
                        continue;
                    }
                    if (hasStatus(STATUS_EXPECT_ARRAY_VALUE)) {
                        reader.readNull();
                        stack.peek(StackValue.TYPE_ARRAY).valueAsArray().add(null);
                        status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_ARRAY;
                        continue;
                    }
                    throw new JsonParseException("Unexpected null.", reader.reader.readed);

                case NUMBER:
                    if (hasStatus(STATUS_EXPECT_SINGLE_VALUE)) {
                        // single number:
                        Number number = reader.readNumber();
                        stack.push(StackValue.newJsonSingle(number));
                        status = STATUS_EXPECT_END_DOCUMENT;
                        continue;
                    }
                    if (hasStatus(STATUS_EXPECT_OBJECT_VALUE)) {
                        Number number = reader.readNumber();
                        String key = stack.pop(StackValue.TYPE_OBJECT_KEY).valueAsKey();
                        stack.peek(StackValue.TYPE_OBJECT).valueAsObject().put(key, number);
                        status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_OBJECT;
                        continue;
                    }
                    if (hasStatus(STATUS_EXPECT_ARRAY_VALUE)) {
                        Number number = reader.readNumber();
                        stack.peek(StackValue.TYPE_ARRAY).valueAsArray().add(number);
                        status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_ARRAY;
                        continue;
                    }
                    throw new JsonParseException("Unexpected number.", reader.reader.readed);

                case STRING:
                    if (hasStatus(STATUS_EXPECT_SINGLE_VALUE)) {
                        // single string:
                        String str = reader.readString();
                        stack.push(StackValue.newJsonSingle(str));
                        status = STATUS_EXPECT_END_DOCUMENT;
                        continue;
                    }
                    if (hasStatus(STATUS_EXPECT_OBJECT_KEY)) {
                        // object key: push it and wait for the colon
                        String str = reader.readString();
                        stack.push(StackValue.newJsonObjectKey(str));
                        status = STATUS_EXPECT_COLON;
                        continue;
                    }
                    if (hasStatus(STATUS_EXPECT_OBJECT_VALUE)) {
                        String str = reader.readString();
                        String key = stack.pop(StackValue.TYPE_OBJECT_KEY).valueAsKey();
                        stack.peek(StackValue.TYPE_OBJECT).valueAsObject().put(key, str);
                        status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_OBJECT;
                        continue;
                    }
                    if (hasStatus(STATUS_EXPECT_ARRAY_VALUE)) {
                        String str = reader.readString();
                        stack.peek(StackValue.TYPE_ARRAY).valueAsArray().add(str);
                        status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_ARRAY;
                        continue;
                    }
                    throw new JsonParseException("Unexpected char '\"'.", reader.reader.readed);

                case SEP_COLON: // :
                    if (status == STATUS_EXPECT_COLON) {
                        status = STATUS_EXPECT_OBJECT_VALUE | STATUS_EXPECT_BEGIN_OBJECT | STATUS_EXPECT_BEGIN_ARRAY;
                        continue;
                    }
                    throw new JsonParseException("Unexpected char ':'.", reader.reader.readed);

                case SEP_COMMA: // ,
                    if (hasStatus(STATUS_EXPECT_COMMA)) {
                        if (hasStatus(STATUS_EXPECT_END_OBJECT)) {
                            // inside an object: the next token must be a key
                            status = STATUS_EXPECT_OBJECT_KEY;
                            continue;
                        }
                        if (hasStatus(STATUS_EXPECT_END_ARRAY)) {
                            // inside an array: the next token must be an element
                            status = STATUS_EXPECT_ARRAY_VALUE | STATUS_EXPECT_BEGIN_ARRAY | STATUS_EXPECT_BEGIN_OBJECT;
                            continue;
                        }
                    }
                    throw new JsonParseException("Unexpected char ','.", reader.reader.readed);

                case END_ARRAY:
                    if (hasStatus(STATUS_EXPECT_END_ARRAY)) {
                        StackValue array = stack.pop(StackValue.TYPE_ARRAY);
                        if (stack.isEmpty()) {
                            // root array:
                            stack.push(array);
                            status = STATUS_EXPECT_END_DOCUMENT;
                            continue;
                        }
                        int type = stack.getTopValueType();
                        if (type == StackValue.TYPE_OBJECT_KEY) {
                            // key: [ CURRENT ] ,}
                            String key = stack.pop(StackValue.TYPE_OBJECT_KEY).valueAsKey();
                            stack.peek(StackValue.TYPE_OBJECT).valueAsObject().put(key, array.value);
                            status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_OBJECT;
                            continue;
                        }
                        if (type == StackValue.TYPE_ARRAY) {
                            // xx, xx, [CURRENT] ,]
                            stack.peek(StackValue.TYPE_ARRAY).valueAsArray().add(array.value);
                            status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_ARRAY;
                            continue;
                        }
                    }
                    throw new JsonParseException("Unexpected char: ']'.", reader.reader.readed);

                case END_OBJECT:
                    if (hasStatus(STATUS_EXPECT_END_OBJECT)) {
                        StackValue object = stack.pop(StackValue.TYPE_OBJECT);
                        if (stack.isEmpty()) {
                            // root object:
                            stack.push(object);
                            status = STATUS_EXPECT_END_DOCUMENT;
                            continue;
                        }
                        int type = stack.getTopValueType();
                        if (type == StackValue.TYPE_OBJECT_KEY) {
                            // key: { CURRENT } ,}
                            String key = stack.pop(StackValue.TYPE_OBJECT_KEY).valueAsKey();
                            stack.peek(StackValue.TYPE_OBJECT).valueAsObject().put(key, object.value);
                            status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_OBJECT;
                            continue;
                        }
                        if (type == StackValue.TYPE_ARRAY) {
                            // xx, xx, {CURRENT} ,]
                            stack.peek(StackValue.TYPE_ARRAY).valueAsArray().add(object.value);
                            status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_ARRAY;
                            continue;
                        }
                    }
                    throw new JsonParseException("Unexpected char: '}'.", reader.reader.readed);

                case END_DOCUMENT:
                    if (hasStatus(STATUS_EXPECT_END_DOCUMENT)) {
                        // exactly one element must remain: the parsed result
                        StackValue v = stack.pop();
                        if (stack.isEmpty()) {
                            return v.value;
                        }
                    }
                    throw new JsonParseException("Unexpected EOF.", reader.reader.readed);

                case BEGIN_ARRAY:
                    if (hasStatus(STATUS_EXPECT_BEGIN_ARRAY)) {
                        stack.push(StackValue.newJsonArray(this.jsonArrayFactory.createJsonArray()));
                        status = STATUS_EXPECT_ARRAY_VALUE | STATUS_EXPECT_BEGIN_OBJECT | STATUS_EXPECT_BEGIN_ARRAY | STATUS_EXPECT_END_ARRAY;
                        continue;
                    }
                    throw new JsonParseException("Unexpected char: '['.", reader.reader.readed);

                case BEGIN_OBJECT:
                    if (hasStatus(STATUS_EXPECT_BEGIN_OBJECT)) {
                        stack.push(StackValue.newJsonObject(this.jsonObjectFactory.createJsonObject()));
                        status = STATUS_EXPECT_OBJECT_KEY | STATUS_EXPECT_BEGIN_OBJECT | STATUS_EXPECT_END_OBJECT;
                        continue;
                    }
                    throw new JsonParseException("Unexpected char: '{'.", reader.reader.readed);
                }
            }
        }
    }
For the complete source, see https://github.com/michaelliao/jsonstream.
Source: http://www.liaoxuefeng.com/article/0014211269349633dda29ee3f29413c91fa65c372585f23000