Write a Simple XML Parser
Have you ever run into a problem where data is delivered as XML but the
client has to be lightweight and cannot afford the baggage of
DOM or SAX parsers? One example is an applet
running on a web page. Most widely accepted applets target
Java version 1.1.x. What do you do when you need to parse an XML
document in such an applet? My answer: write a
lightweight XML parser yourself. Let's look at the kinds of parsers and
decide on a design for our lightweight parser.
Types of parsers
DOM: This kind of parser reads the complete XML file into
memory and creates a tree structure. Because the whole XML file is held in
memory, this can become memory intensive for large documents.
SAX: These parsers read the XML file sequentially as a stream and
generate an event each time an element is found. The application
handles these events and acts on the elements it needs.
SAX parsers are stream-based, event-based push parsers. A SAX parser
is much more lightweight than a DOM parser but is still
overkill for small apps and applets.
Poll Parser: This kind of parser looks for elements in an
XML file and drills down to the specific elements that the
application is interested in. This is very effective when the size
of XML is not very large and no XML validation is required.
In our design, we will exploit the way XML is structured and
assume certain things for the time being. Once we have a simple XML
parser that works for us, we can extend it to handle special cases
later. Let's examine an XML file structure first:
Sample XML file:
<!DOCTYPE STUDENTS [
<!ELEMENT STUDENTS (STUDENT*)>
<!ELEMENT STUDENT (NAME, AGE, CLASS)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT AGE (#PCDATA)>
<!ELEMENT CLASS (#PCDATA)>
]>
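Only the DTD appears above. As a stand-in for a document that follows it, here is a hypothetical instance written as a Java string constant. The class name SampleData, the constant name and the two student records are invented purely for illustration.

```java
// Hypothetical sample data; the names and values are invented for illustration.
class SampleData {
    // A document that follows the STUDENTS DTD: zero or more STUDENT
    // elements, each holding NAME, AGE and CLASS in that order.
    static final String STUDENTS_XML =
        "<STUDENTS>"
      + "<STUDENT><NAME>Asha</NAME><AGE>15</AGE><CLASS>10</CLASS></STUDENT>"
      + "<STUDENT><NAME>Ravi</NAME><AGE>14</AGE><CLASS>9</CLASS></STUDENT>"
      + "</STUDENTS>";

    public static void main(String[] args) {
        System.out.println(STUDENTS_XML);
    }
}
```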
Notice that in an XML file, everything exists as a tree structure.
That is, each element either contains child elements or holds data. For example, the
STUDENT tag contains three child nodes, namely NAME, AGE and CLASS.
Similarly, the STUDENTS tag contains zero or more STUDENT tags. Due to
this tree structure, it is easy to drill down to a particular
element. We will exploit this structure to write our parser.
Let's also lay down certain limitations. Since XML can get much
more complex than the example selected, we need to know what our
parser will not handle. Our parser has the following limitations:
Support only complete XML tags, i.e. it doesn't support empty
elements that are terminated within the start tag itself, e.g. <AGE />.
This is equivalent to writing <AGE></AGE>. Due to
the simple nature of our parser, let's only support the latter form.
No support for attributes: XML tags may have associated
attributes, which we will not support in this parser, e.g. <CLASS
division="B">10</CLASS>. Because the implementation later in this article
searches for the literal start tag <CLASS>, a tag written this way is not
matched, so the parser returns neither the value 10 nor the division
attribute. It has no notion of attributes.
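Both limitations can be seen in a minimal sketch, assuming start tags are located by searching for the literal text <TAG> (which is what the implementation later in this article does). The class and method names here are hypothetical.

```java
// Sketch of why the two limitations arise: a search for the literal
// string "<TAG>" misses any other spelling of the start tag.
class LimitationDemo {
    // Returns true if the literal start tag "<tagName>" occurs in xml.
    static boolean hasPlainStartTag(String xml, String tagName) {
        return xml.indexOf("<" + tagName + ">") != -1;
    }

    public static void main(String[] args) {
        // Complete tag: found.
        System.out.println(hasPlainStartTag("<AGE>15</AGE>", "AGE"));   // true
        // Empty element: not found.
        System.out.println(hasPlainStartTag("<AGE />", "AGE"));         // false
        // Tag with an attribute: not found.
        System.out.println(
            hasPlainStartTag("<CLASS division=\"B\">10</CLASS>", "CLASS")); // false
    }
}
```

If both the XML producer and the client are under your control, the simplest workaround is to emit only plain start tags and move attribute data into child elements.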
Now that we have laid down the limitations and had a look at a
sample XML to parse, we can start the design and implementation of
our parser.
We need to implement just one method. Yes, that's right, just ONE.
If we look closely at the way XML is structured, we are always only
concerned with the value of a given element or elements. For example, in
the above XML file, if I look at the value of the tag STUDENTS, it
contains a list of STUDENT records. Therefore, the value of the
STUDENTS tag is the text enclosed within <STUDENTS> and
</STUDENTS>. Once we have this text, each STUDENT record is the
text that comes in between <STUDENT> and </STUDENT> and
so on to the leaf nodes NAME, AGE and CLASS.
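The idea that a tag's value is simply the text between its start and end tag can be sketched in a few lines. This is only an illustration that handles the first occurrence of a tag. The helper name firstValue and the sample data are assumptions, not part of the parser developed in this article.

```java
// Minimal sketch: the "value" of a tag is the text between its start and end tag.
class TagValueSketch {
    // Returns the text between the first <tagName> and the next </tagName>,
    // or null if either tag is missing. Hypothetical helper for illustration.
    static String firstValue(String xml, String tagName) {
        String begin = "<" + tagName + ">";
        String end = "</" + tagName + ">";
        int start = xml.indexOf(begin);
        if (start == -1) return null;
        int stop = xml.indexOf(end, start + begin.length());
        if (stop == -1) return null;
        return xml.substring(start + begin.length(), stop);
    }

    public static void main(String[] args) {
        String xml = "<STUDENTS><STUDENT><NAME>Asha</NAME>"
                   + "<AGE>15</AGE></STUDENT></STUDENTS>";
        // Drill down one level at a time: STUDENTS, then STUDENT, then NAME.
        String students = firstValue(xml, "STUDENTS");
        String student = firstValue(students, "STUDENT");
        System.out.println(firstValue(student, "NAME")); // prints "Asha"
    }
}
```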
Therefore, in order to parse an XML file, we need a single
function that returns the text between the start
and end of any given tag. By applying it repeatedly, we can narrow down to the
value contained in the tag we want, which is the data we can process.
Let's define the
prototype of the function:
public static Vector getXMLTagValue(String xmlFileString, String tagName);
The above function takes the XML file, or a subset of an XML
file, as a String, along with the name of the tag to extract. The values of all tags in
the XML String that match the given tag name are extracted and
added to the Vector as Strings. For example, to extract
the list of STUDENT tags in the XML file, write
the following code:
// First get the XML file in a string.
String xmlFile = getXMLFile("x.xml");
Vector v = getXMLTagValue(xmlFile, "STUDENT");
This will give us the complete list of STUDENT tag values as Strings
filled in the Vector. To drill down to a particular STUDENT tag, we
need to extract each element of the Vector. Each element of the
Vector can now be used as the xmlFile string for further calls to
getXMLTagValue, in order to drill down further.
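The snippet above calls getXMLFile, which this article does not define. One possible sketch, assuming the XML lives in a local file and using only classes available since Java 1.1, is shown here. The class name XMLFileReader is hypothetical, and an applet would more likely read from a URL stream than from the local file system.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper class; not part of the parser itself.
class XMLFileReader {
    // Reads the whole file into one String, ending every line with '\n'.
    static String getXMLFile(String fileName) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(fileName));
        StringBuffer sb = new StringBuffer();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } finally {
            // Release the file handle even if reading fails.
            in.close();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(getXMLFile(args[0]));
        }
    }
}
```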
Now suppose I want to extract the age of the first STUDENT. I can
do the following:
// elementAt returns Object, so cast each Vector element back to String.
Vector ageV = getXMLTagValue((String) v.elementAt(0), "AGE");
System.out.println("Age is : " + ageV.elementAt(0));
As you can see, this function is powerful enough to extract all
values from an XML file, within the limitations we have decided
on. Let's get on to the code of the function.
import java.util.Vector;

public class XMLParser
{
    /**
     * Returns the values of all tags with the given name, as Strings.
     * Pass only the name of the section, for example "QUESTION".
     */
    public static Vector getXMLTagValue(String xml, String section) throws Exception
    {
        String xmlString = xml;
        Vector v = new Vector();
        String beginTagToSearch = "<" + section + ">";
        String endTagToSearch = "</" + section + ">";
        // Look for the first occurrence of begin tag
        int index = xmlString.indexOf(beginTagToSearch);
        while(index != -1)
        {
            // Look for end tag
            // DOES NOT HANDLE <section Blah />
            int lastIndex = xmlString.indexOf(endTagToSearch);
            // Make sure there is no error
            if((lastIndex == -1) || (lastIndex < index))
                throw new Exception("Parse Error");
            // extract the substring
            String subs = xmlString.substring((index + beginTagToSearch.length()), lastIndex);
            // Add it to our list of tag values
            v.addElement(subs);
            // Try it again. Narrow down to the part of string which is not
            // processed yet. substring() yields "" once the end is reached.
            xmlString = xmlString.substring(lastIndex + endTagToSearch.length());
            // Start over again by searching the first occurrence of the begin tag
            // to continue the loop.
            index = xmlString.indexOf(beginTagToSearch);
        }
        return v;
    }
}
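To see the function at work on a whole document, here is a small sketch that walks every STUDENT record and prints its name and age. So that it compiles on its own, it carries a compact restatement of getXMLTagValue that throws an unchecked exception instead of Exception. The class name StudentReport, the report method and the sample data are all hypothetical.

```java
import java.util.Vector;

class StudentReport {
    // Compact restatement of the article's getXMLTagValue so this example
    // compiles on its own; it throws an unchecked exception for brevity.
    static Vector getXMLTagValue(String xml, String tag) {
        Vector values = new Vector();
        String begin = "<" + tag + ">";
        String end = "</" + tag + ">";
        int index = xml.indexOf(begin);
        while (index != -1) {
            int lastIndex = xml.indexOf(end);
            if (lastIndex == -1 || lastIndex < index) {
                throw new IllegalArgumentException("Parse Error");
            }
            values.addElement(xml.substring(index + begin.length(), lastIndex));
            // Continue with the part of the string not processed yet.
            xml = xml.substring(lastIndex + end.length());
            index = xml.indexOf(begin);
        }
        return values;
    }

    // Builds one "name (age)" line per STUDENT record.
    static String report(String xml) {
        StringBuffer out = new StringBuffer();
        Vector students = getXMLTagValue(xml, "STUDENT");
        for (int i = 0; i < students.size(); i++) {
            String student = (String) students.elementAt(i);
            String name = (String) getXMLTagValue(student, "NAME").elementAt(0);
            String age = (String) getXMLTagValue(student, "AGE").elementAt(0);
            out.append(name).append(" (").append(age).append(")\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<STUDENTS>"
            + "<STUDENT><NAME>Asha</NAME><AGE>15</AGE><CLASS>10</CLASS></STUDENT>"
            + "<STUDENT><NAME>Ravi</NAME><AGE>14</AGE><CLASS>9</CLASS></STUDENT>"
            + "</STUDENTS>";
        System.out.print(report(xml));
    }
}
```

Note that the sketch assumes every STUDENT record contains NAME and AGE. If one is missing, elementAt(0) fails on an empty Vector, so real code should check size() first.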
This simple XML parser can be used to extract values from an XML
file when the file is small and the application cannot take
the baggage of the SAX or DOM parsers. It is a good fit
for applets that receive information in XML format. Because of its
limitations, this parser does not work with every XML file, so
it is most effective when both the XML file and the client
application are written by the same developer or team.
Please let me know if there are any more limitations to this
parser or if you are successful in using this in your project. I can
be contacted at anandh@JavaReference.com
About the Author
Anand is a Senior Software Engineer at Veritas Software Corporation. He has a Master's degree in Computer Science from the University of Pune, India. Anand started his career as a C++ programmer, later shifting focus to Java. He is a server-side buff, and believes server-side Java is it. Over the years, Anand has designed and implemented numerous projects in Java, C++ and Visual Basic. His primary interests are networking and server-side technologies, primarily J2EE. Anand is also interested in teaching and uses his spare time to teach Java. He can be reached at anandh@JavaReference.com.