
Time for action – implementing an XML parser for player data
In this exercise, we are going to create a parser to fill data that represents players and their inventory in an RPG game:
struct InventoryItem {
    enum Type { Weapon, Armor, Gem, Book, Other } type;
    QString subType;
    int durability;
};

struct Player {
    QString name;
    QString password;
    int experience;
    int hitPoints;
    QList<InventoryItem> inventory;  // element type must match the struct declared above
    QString location;
    QPoint position;
};

struct PlayerInfo {
    QList<Player> players;
};
Save the following document somewhere. We will use it to test whether the parser can read it:
<PlayerInfo>
    <Player hp="40" exp="23456">
        <Name>Gandalf</Name>
        <Password>mithrandir</Password>
        <Inventory>
            <InvItem type="weapon" durability="3">
                <SubType>Long sword</SubType>
            </InvItem>
            <InvItem type="armor" durability="10">
                <SubType>Chain mail</SubType>
            </InvItem>
        </Inventory>
        <Location name="room1">
            <Position x="1" y="0"/>
        </Location>
    </Player>
</PlayerInfo>
Let's create a class called PlayerInfoReader that will wrap QXmlStreamReader and expose a parser interface for PlayerInfo instances. The class will contain two private members: the reader itself and a PlayerInfo instance that acts as a container for the data that is currently being read. We'll provide a result() method that returns this object once the parsing is complete, as shown in the following code:
class PlayerInfoReader {
public:
    PlayerInfoReader(QIODevice *);
    inline const PlayerInfo& result() const { return m_pinfo; }
private:
    QXmlStreamReader reader;
    PlayerInfo m_pinfo;
};
The class constructor accepts a QIODevice pointer that the reader is going to use to retrieve data as it needs it. The constructor is trivial, as it simply passes the device to the reader object:
PlayerInfoReader::PlayerInfoReader(QIODevice *device) {
    reader.setDevice(device);
}
Before we go into parsing, let's prepare some code to help us with the process. First, let's add an enumeration type to the class that will list all the possible tokens—tag names that we want to handle in the parser:
enum Token {
    T_Invalid = -1,
    T_PlayerInfo,                                /* root tag */
    T_Player,                                    /* in PlayerInfo */
    T_Name, T_Password, T_Inventory, T_Location, /* in Player */
    T_Position,                                  /* in Location */
    T_InvItem                                    /* in Inventory */
};
To use these tags, we'll add a static method to the class that returns the token type based on its textual representation:
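// Declare this method as static inside the class; the keyword must not be
// repeated on the out-of-class definition.
PlayerInfoReader::Token PlayerInfoReader::tokenByName(const QStringRef &r) {
    // The order of the names must match the order of the Token enumeration.
    static const QStringList tokenList = QStringList()
        << "PlayerInfo" << "Player" << "Name" << "Password"
        << "Inventory" << "Location" << "Position" << "InvItem";
    int idx = tokenList.indexOf(r.toString());  // -1 when the name is unknown
    return (Token)idx;
}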
Notice that we are using a class called QStringRef. It represents a string reference (a substring of an existing string) and is implemented in a way that avoids expensive string construction; therefore, it is very fast. We're using this class here because that's how QXmlStreamReader reports tag names. Inside this static method, we convert the string reference to a real string and try to match it against a list of known tags. If the matching fails, -1 is returned, which corresponds to our T_Invalid token.
Now, let's add an entry point to start the parsing process. Add a public read method that initializes the data structure and performs initial checks on the input stream:
bool PlayerInfoReader::read() {
    m_pinfo = PlayerInfo();
    if(reader.readNextStartElement() && tokenByName(reader.name()) == T_PlayerInfo) {
        return readPlayerInfo();
    } else {
        return false;
    }
}
After clearing the data structure, we call readNextStartElement() on the reader to make it find the starting tag of the first element, and if it is found, we check whether the root tag of the document is what we expect it to be. If so, we call the readPlayerInfo() method and return its result, denoting whether the parsing was successful. Otherwise, we bail out, reporting an error.
Parsers built on QXmlStreamReader usually follow the same pattern. Each parsing method first checks whether it operates on a tag that it expects to find. Then, it iterates over all the starting elements, handling those it knows and ignoring all others. Such an approach lets us maintain forward compatibility, since all tags introduced in newer versions of the document are silently skipped by an older parser.
Now, let's implement the readPlayerInfo method:
bool PlayerInfoReader::readPlayerInfo() {
    if(tokenByName(reader.name()) != T_PlayerInfo)
        return false;
    while(reader.readNextStartElement()) {
        if(tokenByName(reader.name()) == T_Player) {
            Player p = readPlayer();
            m_pinfo.players.append(p);
        } else
            reader.skipCurrentElement();
    }
    return true;
}
After verifying that we are working on a PlayerInfo tag, we iterate over all the starting subelements of the current tag. For each of them, we check whether it is a Player tag and call readPlayer() to descend into the level of parsing data for a single player. Otherwise, we call skipCurrentElement(), which fast-forwards the stream until a matching ending element is encountered.
The structure of readPlayer() is similar; however, it is more complicated as we also want to read data from attributes of the Player tag itself. Let's take a look at the function piece by piece:
Player PlayerInfoReader::readPlayer() {
    if(tokenByName(reader.name()) != T_Player)
        return Player();
    Player p;
    const QXmlStreamAttributes& playerAttrs = reader.attributes();
    p.hitPoints = playerAttrs.value("hp").toString().toInt();
    p.experience = playerAttrs.value("exp").toString().toInt();
After checking for the right tag, we get the list of attributes associated with the opening tag and ask for values of the two attributes that we are interested in. After this, we loop over all child tags and fill the Player structure based on the tag names. By converting tag names to tokens, we can use a switch statement to neatly structure the code in order to extract information from different tag types, as shown in the following code:
    while(reader.readNextStartElement()) {
        Token t = tokenByName(reader.name());
        switch(t) {
        // The case labels must use the T_ prefixed names from the Token enumeration.
        case T_Name:
            p.name = reader.readElementText();
            break;
        case T_Password:
            p.password = reader.readElementText();
            break;
        case T_Inventory:
            p.inventory = readInventory();
            break;
If we are interested in the textual content of the tag, we can use readElementText() to extract it. This method reads until it encounters the closing tag and returns the text contained within it. For the Inventory tag, we call the dedicated readInventory() method.
For the Location tag, the code is more complex than before as we again descend into reading child tags, extracting the required information and skipping all unknown tags:
        case T_Location: {
            p.location = reader.attributes().value("name").toString();
            while(reader.readNextStartElement()) {
                if(tokenByName(reader.name()) == T_Position) {
                    const QXmlStreamAttributes& attrs = reader.attributes();
                    p.position.setX(attrs.value("x").toString().toInt());
                    p.position.setY(attrs.value("y").toString().toInt());
                    reader.skipCurrentElement();
                } else
                    reader.skipCurrentElement();
            }
        } break;
        default:
            reader.skipCurrentElement();
        }
    }
    return p;
}
The last method, readInventory(), is similar in structure to readPlayer(): iterate over all the tags, skip everything that we don't want to handle (everything that is not an inventory item), fill the inventory item data structure, and append the item to the list of already parsed items, as shown in the following code:
QList<InventoryItem> PlayerInfoReader::readInventory() {
    QList<InventoryItem> inventory;
    while(reader.readNextStartElement()) {
        if(tokenByName(reader.name()) != T_InvItem) {
            reader.skipCurrentElement();
            continue;
        }
        InventoryItem item;
        const QXmlStreamAttributes& attrs = reader.attributes();
        item.durability = attrs.value("durability").toString().toInt();
        // Map the textual type attribute to the enumeration.
        QStringRef typeRef = attrs.value("type");
        if(typeRef == "weapon") {
            item.type = InventoryItem::Weapon;
        } else if(typeRef == "armor") {
            item.type = InventoryItem::Armor;
        } else if(typeRef == "gem") {
            item.type = InventoryItem::Gem;
        } else if(typeRef == "book") {
            item.type = InventoryItem::Book;
        } else
            item.type = InventoryItem::Other;
        // Read the children of InvItem; only SubType is of interest.
        while(reader.readNextStartElement()) {
            if(reader.name() == "SubType")
                item.subType = reader.readElementText();
            else
                reader.skipCurrentElement();
        }
        inventory << item;
    }
    return inventory;
}
In main() of your project, write some code that will check whether the parser works correctly. You can use qDebug() statements to output the sizes of lists and contents of variables. Take a look at the following code for an example:
// Assumes that reader is a PlayerInfoReader on which read() returned true.
const PlayerInfo &playerInfo = reader.result();
qDebug() << "Count:" << playerInfo.players.count();
qDebug() << "Size of inventory:" << playerInfo.players.first().inventory.size();
qDebug() << "Room: " << playerInfo.players.first().location
         << playerInfo.players.first().position;
What just happened?
The code you just wrote implements a full top-down parser of the XML data. First, the data goes through a tokenizer, which returns identifiers that are much easier to handle than strings. Then, each method can easily check whether the token it receives is an acceptable input for the current parsing stage. Based on the child token, the next parsing function is determined and the parser descends to a lower level until there is nowhere to descend to. Then, the flow goes back up one level and processes the next child. If at any point an unknown tag is found, it gets ignored. This approach supports a situation when a new version of software introduces new tags to the file format specification, but an old version of software can still read the file by skipping all the tags that it doesn't understand.
Have a go hero – an XML serializer for player data
Now that you know how to parse XML data, you can create the complementary part: a module that will serialize PlayerInfo structures into XML documents using QXmlStreamWriter. Use methods such as writeStartDocument(), writeStartElement(), writeCharacters(), and writeEndElement() for this. Verify that the documents saved with your code can be parsed with what we implemented together.
JSON files
JSON stands for JavaScript Object Notation, which is a popular lightweight textual format that is used to store object-oriented data in a human-readable form. It comes from JavaScript where it is the native format used to store object information; however, it is commonly used across many programming languages and a popular format for web data exchange. A simple JSON-formatted definition looks as follows:
{
    "name": "Joe",
    "age": 14,
    "inventory": [
        { "type": "gold", "amount": "144000" },
        { "type": "short_sword", "material": "iron" }
    ]
}
JSON can express two kinds of compound entities: objects (enclosed in braces) and arrays (enclosed in square brackets). An object is a set of key-value pairs and an array is an ordered list of values; in both cases, a value can be a string, a number, a Boolean, null, an object, or an array. In the previous example, we had an object containing three properties: name, age, and inventory. The first two properties are simple values (a string and a number), and the last property is an array that contains two objects with two properties each.
Qt can create and read JSON descriptions using the QJsonDocument class. A document can be created from UTF-8-encoded text using the QJsonDocument::fromJson() static method and can later be stored in a textual form again using toJson(). Since the structure of JSON closely resembles that of QVariant (which can also hold key-value pairs using QVariantMap and arrays using QVariantList), conversion to and from this class is also available through the fromVariant() and toVariant() calls. Once a JSON document is created, you can check whether it represents an object or an array using the isArray() and isObject() calls. Then, the content of the document can be retrieved as QJsonArray or QJsonObject using the array() and object() methods of QJsonDocument.
QJsonObject is an iterable type that can be queried for a list of keys (using keys()) or asked for the value of a specific key (with the value() method). Values are represented using the QJsonValue class, which can store a simple value, an array, or an object. New properties can be added to the object using the insert() method, which takes the key as a string and the value as a QJsonValue, and existing properties can be removed using remove().
QJsonArray is also an iterable type with a classic list API: it provides methods such as append(), insert(), removeAt(), at(), and size() to manipulate entries in the array, again working on QJsonValue as the item type.