Introduction to XML: XML (eXtensible Markup Language) is a markup language used to describe data. It is a flexible way to create information formats and to share structured data electronically over the public internet. Put simply, it is a software- and hardware-independent tool for storing and transporting data.
It is a computer language which uses markup and is capable of being extended. That is:
- Markup:
- Notations or symbols that are used to annotate text and indicate how it should be structured or displayed are called markup (i.e. symbols like <, >, ? etc.)
- Markup refers to the sequence of characters or other symbols that are inserted at certain places in a text file to indicate how the file should look when it is printed or displayed
- Extensible:
- A program or language is extensible when it is designed so that users and developers can expand or add to its capabilities. In XML, this means you can define your own tags.
Example: Consider the sample XML <name>QAFox</name>.
Points to understand from this XML:
- The tags <name> and </name> are markup
- The text QAFox between the tags is normal text, i.e. the data that needs to be displayed or transferred
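To make the distinction between markup and data concrete, here is a minimal sketch that parses the article's <name>QAFox</name> sample. It assumes Python and its standard xml.etree.ElementTree module, which the article itself does not use; any XML parser would show the same split.

```python
import xml.etree.ElementTree as ET

# The sample element used in this article: markup wraps the data.
document = "<name>QAFox</name>"

element = ET.fromstring(document)

markup_name = element.tag  # taken from the <name> ... </name> markup
data = element.text        # the ordinary text between the tags

print("markup:", markup_name)
print("data:", data)
```

The parser hands back the tag name and the text separately, which is exactly the markup/data split described in the points above.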
Components in XML: The following are the different components which together form an XML document.
- Start tag: Every element begins with a start tag. In the following line, <name> is the start tag:
- <name> QAFox </name>
- End tag: Every start tag is closed by a matching end tag. In the following line, </name> is the end tag:
- <name> QAFox </name>
The / in the end tag is known as a slash or solidus.
Elements: Element names in XML must start with a letter or underscore and can contain letters, digits, periods, hyphens, and underscores, but not white space. The following is an example of an element:
<name>sample</name>
Here 'name' is the element name, whereas 'sample' is the element value.
The following rules must be kept in mind while writing XML elements:
- XML element names are case sensitive, so <Name> and <name> are different elements
- An XML document must have exactly one root element
- Elements must be properly nested and must not overlap
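The naming and nesting rules above can be checked mechanically. The following is a small sketch, assuming Python's standard xml.etree.ElementTree parser; the helper is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: one valid document and one violation per rule.
checks = {
    "valid": "<company><name>QAFox</name></company>",
    "case mismatch": "<Name>QAFox</name>",
    "two roots": "<name>QAFox</name><name>Other</name>",
    "overlap": "<a><b>text</a></b>",
    "name starts with digit": "<1name>QAFox</1name>",
    "space in name": "<first name>QAFox</first name>",
}

results = {label: is_well_formed(xml) for label, xml in checks.items()}

for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

Only the first sample is accepted; each of the others breaks one of the rules listed above.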
(Figure: structure of XML elements)
Understanding XML declaration or prolog:
<?xml version="1.0" encoding="UTF-8"?> is the XML declaration, or prolog. When present, it is the very first statement in the XML document, and no space is allowed between <? and xml. Its parts are explained in detail below:
version="1.0": the version of XML
There are two versions of XML: 1.0 and 1.1. Version 1.0 is by far the most widely used. Both versions support non-ASCII characters; 1.1 is needed only in special cases, such as a wider range of characters in element names or certain control characters. A table of non-ASCII characters can be found here: http://rbutterworth.nfshost.com/Tables/common
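UTF: An encoding family whose name stands for Unicode (or Universal Character Set) Transformation Format.
- The UTF encodings most relevant to XML are UTF-8 and UTF-16.
- 8 and 16 are the size in bits of one code unit. A single character can take more than one unit: one to four bytes in UTF-8, and two or four bytes in UTF-16.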
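The 8 and 16 describe the size of one code unit, and a character may need more than one unit. A short sketch in Python (an assumed language; the helper byte_lengths is hypothetical) counts the bytes each encoding uses for a few characters:

```python
def byte_lengths(character):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    utf8 = len(character.encode("utf-8"))
    # "utf-16-le" is used so that no byte-order mark is added to the count.
    utf16 = len(character.encode("utf-16-le"))
    return utf8, utf16


# A, e-acute, euro sign, and an emoji outside the Basic Multilingual Plane.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    utf8, utf16 = byte_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8} bytes, UTF-16 = {utf16} bytes")
```

Plain ASCII text is therefore smaller in UTF-8, which is one reason it is the usual choice for XML documents.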
Other encodings can be declared as well. The available encodings include:
- UTF-8 and UTF-16
- Many others. The full list of registered character sets is here: https://www.iana.org/assignments/character-sets/character-sets.xml
The XML declaration statement <?xml version="1.0" encoding="UTF-8"?> does not have an end tag; hence it is not an element and is called a declaration.
The following rules apply when writing the above declaration:
- It is optional. If it appears in a document, it must be the very first thing in it. Not even whitespace or a comment may come before it
- All XML parsers are required to support UTF-8 and UTF-16
- It is case sensitive: xml, version, and encoding must be written in lowercase
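These declaration rules can be tried out with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper parses and the sample byte strings are illustrative only.

```python
import xml.etree.ElementTree as ET


def parses(data):
    """Return True if the data is accepted as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


body = b"<name>QAFox</name>"
declaration = b'<?xml version="1.0" encoding="UTF-8"?>'

with_declaration = declaration + body
without_declaration = body                          # the declaration is optional
leading_space = b" " + declaration + body           # whitespace before it
leading_comment = b"<!-- note -->" + declaration + body
upper_case = b'<?xml VERSION="1.0" encoding="UTF-8"?>' + body

for label, data in [
    ("with declaration", with_declaration),
    ("without declaration", without_declaration),
    ("leading space", leading_space),
    ("leading comment", leading_comment),
    ("upper-case VERSION", upper_case),
]:
    print(label, "->", "accepted" if parses(data) else "rejected")
```

The first two documents are accepted; the remaining three each violate one of the rules above and are rejected.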
XML attributes: A property of an XML element is called an XML attribute. It consists of a name and a value separated by an equal sign. The following example illustrates an XML attribute:
The text name="mine" is an XML attribute. Here 'name' is the attribute name and 'mine' is the attribute value, which must always be enclosed in single or double quotes.
A few rules of XML attributes to keep in mind are:
- An attribute may be placed only inside a start tag (or an empty-element tag), separated from the element name by white space
- An attribute name may appear only once in the same start tag, with a single value
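The attribute rules can be sketched as follows, assuming Python's standard xml.etree.ElementTree module. The element name person is hypothetical; only name="mine" comes from the text above.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Double and single quotes are both allowed around the value.
double_quoted = ET.fromstring('<person name="mine"/>').get("name")
single_quoted = ET.fromstring("<person name='mine'/>").get("name")

# Each of these breaks one of the rules and is rejected by the parser.
unquoted_ok = parses("<person name=mine/>")
duplicate_ok = parses('<person name="mine" name="yours"/>')
in_end_tag_ok = parses('<person>text</person name="mine">')

print(double_quoted, single_quoted)
print(unquoted_ok, duplicate_ok, in_end_tag_ok)
```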
XML references: A reference is used to insert text or markup into an XML document where typing the character directly would cause an error. For example, <, &, and > can be written as &lt;, &amp;, and &gt;. References start with & (ampersand) and end with ; (semicolon).
References are of two types:
- Entity References
- Character References
Entity references: An entity reference starts with & (ampersand) and ends with ; (semicolon). The example below illustrates an entity reference:
Writing <salary>salary < 1000</salary> causes an XML error, because < is read as the start of a tag. It can be fixed with an entity reference: <salary>salary &lt; 1000</salary>.
- Here, '&lt;' is the equivalent of the '<' symbol
- A list of entity references is shown here: http://xml.silmaril.ie/specials.html
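The salary example can be reproduced with a parser. This is a sketch assuming Python's standard library: xml.etree.ElementTree for parsing and xml.sax.saxutils.escape for producing the references automatically.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "<salary>salary < 1000</salary>"          # bare "<" inside the text
fixed = "<salary>salary &lt; 1000</salary>"     # "<" written as &lt;

try:
    ET.fromstring(raw)
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False

# The parser turns the entity reference back into the real character.
salary_text = ET.fromstring(fixed).text

# escape() replaces &, < and > with their entity references.
escaped = escape("salary < 1000 & rising")

print(raw_accepted)
print(salary_text)
print(escaped)
```

In practice it is safer to let a helper such as escape, or an XML writer, produce the references than to type them by hand.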
Character references: A character reference starts with &# and ends with ; (semicolon). The number in between is the character's code.
- Examples: &#97; → a, &#65; → A, &#90; → Z etc.
- A list of character references is shown here: http://myhandbook.info/codes_htmlchr.html
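Character references can be decoded and generated in a few lines. The sketch below assumes Python's standard xml.etree.ElementTree module; the element name letters and the helper char_ref are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Decimal references use &#NN; and hexadecimal references use &#xNN;.
decoded = ET.fromstring("<letters>&#97;&#65;&#x5A;</letters>").text


def char_ref(character):
    """Build the decimal character reference for a single character."""
    return f"&#{ord(character)};"


print(decoded)
print(char_ref("a"), char_ref("A"), char_ref("Z"))
```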
XML Comments: The syntax of an XML comment is shown below:
<!-- comment goes here -->
XML comments are like comments in any other language: notes added to explain the purpose of the code. Comments can also be used to include links to terms and conditions or other related material. They may appear almost anywhere in the document.
Follow the instructions below while writing comments in XML:
- Comments may be placed anywhere except inside a tag or before the XML declaration
- XML comments cannot be nested
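A parser enforces these comment rules. The sketch below assumes Python's standard xml.etree.ElementTree module; the note and to elements are hypothetical sample names.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


valid = "<note><!-- comment goes here --><to>QAFox</to></note>"
nested = "<note><!-- outer <!-- inner --> --><to>QAFox</to></note>"
inside_tag = "<note <!-- comment -->><to>QAFox</to></note>"
before_declaration = '<!-- comment --><?xml version="1.0"?><note/>'

# By default this parser skips comments, so only <to> shows up as a child.
child_tags = [child.tag for child in ET.fromstring(valid)]

print(child_tags)
print(parses(nested), parses(inside_tag), parses(before_declaration))
```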
Defining Empty XML elements: An XML element which has no content is known as an empty XML element. It can be written in either of two equivalent forms:
<element></element> or <element/>
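That the two spellings mean the same thing can be confirmed with a parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree module:

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<element></element>")
short_form = ET.fromstring("<element/>")

# Neither form has text or child elements.
both_empty = (
    long_form.text is None
    and short_form.text is None
    and len(long_form) == 0
    and len(short_form) == 0
)

# Serializing both parsed elements gives identical output.
same_output = ET.tostring(long_form) == ET.tostring(short_form)

print(both_empty, same_output)
```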
XML Namespaces: Namespaces provide a way to avoid element name conflicts.
A name conflict occurs when you mix XML documents from different applications that use the same element name for different things. To avoid such situations, we can add a prefix to the element names.
When using these prefixes in XML, a namespace must be defined for each prefix. It is defined with the 'xmlns' attribute in the start tag of an element. The syntax is shown below:
xmlns:prefix="URI"
Example: <h:book xmlns:h="http://www.w3.org/TR/html4/"> declares the prefix h for the namespace http://www.w3.org/TR/html4/.
When a prefix is declared with xmlns, element names that use that prefix are qualified by the namespace. Namespaces can also be declared in the XML root element. Defining a default namespace for an element, with xmlns="URI" and no prefix, saves us from using prefixes in all the child elements.
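The effect of prefixes and of a default namespace can be seen by parsing a small document. This sketch assumes Python's standard xml.etree.ElementTree module; the h namespace URI is the one used later in this article, while the f prefix, its example.com URI, and the element contents are invented for illustration.

```python
import xml.etree.ElementTree as ET

H_URI = "http://www.w3.org/TR/html4/"
F_URI = "https://example.com/furniture"  # hypothetical namespace URI

# Two different <table> elements kept apart by their prefixes.
document = f"""
<root xmlns:h="{H_URI}" xmlns:f="{F_URI}">
  <h:table><h:td>Book1</h:td></h:table>
  <f:table><f:name>Coffee table</f:name></f:table>
</root>
"""

root = ET.fromstring(document)
prefixes = {"h": H_URI, "f": F_URI}

html_table = root.find("h:table", prefixes)
furniture_table = root.find("f:table", prefixes)

# The parser stores each name together with its namespace URI.
print(html_table.tag)
print(furniture_table.tag)

# With a default namespace, child elements need no prefix.
default_doc = ET.fromstring(f'<table xmlns="{H_URI}"><td>Book1</td></table>')
print(default_doc[0].tag)
```

The parser identifies an element by the namespace URI plus the local name, so the prefix itself is only shorthand.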
Points to note about XML:
- XML was not designed to display data like HTML.
- XML doesn't have predefined tags.
Uses of XML:
- XML can separate data from HTML.
- XML is used to exchange data
- XML can be used to share data
- XML can be used to store data
- XML can make your data more useful
- XML can be used to create new *ML languages
Elements vs attributes:
- Attributes cannot contain multiple values; child elements can
- Attributes are not easily expandable for future changes
- Attributes cannot describe structures; child elements can
- Attributes are more difficult to manipulate by program code
- Attribute values are not easy to test against a Document Type Definition (DTD), which is used to define the legal elements of an XML document.
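The first and third points can be illustrated with a short sketch, assuming Python's standard xml.etree.ElementTree module. The book element follows the example used later in this article; the second author is a made-up placeholder.

```python
import xml.etree.ElementTree as ET

# An attribute holds one plain string; child elements can repeat.
book = ET.fromstring(
    '<book category="children">'
    "<author>J K. Rowling</author>"
    "<author>Second Author</author>"  # hypothetical extra author
    "</book>"
)

category = book.get("category")
authors = [author.text for author in book.findall("author")]

print(category)
print(authors)
```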
XML Example using all the components explained above:
<?xml version="1.0" encoding="UTF-8"?>
<!-- It is a sample XML document which includes components of XML -->
<!DOCTYPE note [
<!ENTITY writer SYSTEM "https://www.w3schools.com/entities.dtd">
<!ENTITY copyright SYSTEM "https://www.w3schools.com/entities.dtd">
<author>J K. Rowling</author>
The above example illustrates all the components of XML described so far. A brief summary is given below:
- Tags, e.g. <bookstore>, <author>, <price>, <h:book>, </h:book>, </bookstore>
- An element is the combination of a start tag, an end tag, and data, e.g. '<h:td>Book1</h:td>', '<title>Harry Potter</title>'
- Attributes are placed inside the start tag, e.g. <book category="children">, where category is the attribute name and children is the attribute value.
- References appear as '&writer;&copyright;'; these are declared in the DOCTYPE declaration at the top of the XML
- Comments appear as <!-- It is a sample XML document which includes components of XML -->
- Namespaces appear in <h:book xmlns:h="http://www.w3.org/TR/html4/">, where xmlns:h="http://www.w3.org/TR/html4/" is the namespace declaration.
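As a recap, the sketch below parses a small document that combines these components. It is not the article's original example: it assumes Python's standard xml.etree.ElementTree module, replaces the external SYSTEM entities with internal ones so that the parser can expand them without network access, and uses a made-up footer element, price, and entity texts.

```python
import xml.etree.ElementTree as ET

# Hypothetical recap document; entity texts, <footer> and the price are
# illustrative values, not taken from the original example.
document = """<?xml version="1.0"?>
<!-- A sample document combining the components described above -->
<!DOCTYPE bookstore [
<!ENTITY writer "Writer: J K. Rowling. ">
<!ENTITY copyright "Copyright: QAFox.">
]>
<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <price>29.99</price>
    <footer>&writer;&copyright;</footer>
  </book>
  <h:book xmlns:h="http://www.w3.org/TR/html4/">
    <h:td>Book1</h:td>
  </h:book>
</bookstore>
"""

root = ET.fromstring(document)
prefixes = {"h": "http://www.w3.org/TR/html4/"}

category = root.find("book").get("category")        # attribute
title = root.find("book/title").text                # element value
footer = root.find("book/footer").text              # expanded entities
cell = root.find("h:book/h:td", prefixes).text      # namespaced element

print(category, title, cell)
print(footer)
```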