XMLUnit for Java provides a set of supporting classes (e.g. Diff,  DetailedDiff, Transform, SimpleXpathEngine, Validator,NodeTest) that allow assertions to be made about:
  • The differences between two pieces of XML
  • The outcome of transforming a piece of XML using XSLT
  • The evaluation of an XPath expression on a piece of XML
  • The validity of a piece of XML
  • Individual nodes in a piece of XML that are exposed by DOM Traversal
XMLUnit for Java can also treat HTML content (even badly-formed HTML) as valid XML to allow these assertions to be made about the content of web pages too.



I've used XMLUnit in 2 situations: assert that 2 xml files are semantically the same regardless of structural differences (eg. different formatting, different namespace prefixes, usage of opening and closing tag vs. empty tag...) and finding real differences between two xml files regardless of prior mentioned differences in structure.

The equality of two xml files can be compared in two ways. We can check if two documents are identical. Two documents are considered to be "identical" if they contain the same elements and attributes in the same order. But we can check if two documents are similar, where those two documents must contain the same elements and attributes regardless of order. In most cases you will checking similarity and that is what is checked when XMLAssert.assertXMLEqual methdo is called.


There are only few classes that needs to be known. Not so complicated but not so easy when compared to ordinary Assert implementations (JUnit, TestNG or FEST).
XMLUnit class is used as a static configuration class. If there is need for some general configuration change (eg. to ignore whitespaces, to normalize text nodes, to ignore xml comments...) this is a place to do it before calling asserts. Also it is possible to change parsers and engines used by XMLUnit.

DifferenceEngine is comparing Nodes (just Nodes of type: ELEMENT_NODE, CDATA_SECTION_NODE, COMMENT_NODE, DOCUMENT_TYPE_NODE, PROCESSING_INSTRUCTION_NODE and TEXT_NODE) and when difference is found the responsibility gets delegated to DifferenceListener interface which implementation is the final judge if difference is significant. DifferenceEngine is not used directly but in some not so simple use cases difference listener should be set.
There are several implementations of DifferenceListener, most useful are Diff and DetailedDiff.

Diff is used for simple comparison where only the state of comparison is needed (identical, similar or different) and the comparison is halted as soon as the status is known with certainty.

If you need o collect all the differences and have all details DetaildedDiff should be used. It is just a decorator for other DifferenceListeners.

MatchTracker interface gets notified when a match between test and actual node is found. It is useful to implement listener that logs matches so that you get insight of comparison process. This interface has no implementation.

ElementQualifier Interface used by the DifferenceEngine to determine which elements can be compared within a NodeList of child nodes. It controls which two nodes from test and actual XML document should be compared.

There are few implementations of this interface: ElementNameQualifier which is default, compares element nodes whose tag name is same. ElementNameAndAttributeQualifier uses tag name and attribute, ElementNameAndTextQualifier uses tag name and text content, RecursiveElementNameAndTextQualifier compares all Element and Text nodes in two pieces of XML. Allows elements of complex, deeply nested types that are returned in different orders but have the same content to be recognized as comparable. 

Every single compare is done using ElementNameQualifier. I had a special case where I needed recursive ElementQualifier but which compares single nodes not just by tag name but also by attributes. So I have tweaked RecursiveElementNameAndTextQualifier to be just a recursive decorator around existing ElementQualifier implementation.


I've crated example at https://github.com/aurelije/xmlunit I had an use case where I needed to refactor one complex wsdl file. I wanted to sort all operations, types and messages, so I can easily compare that sorted Old API wsdl file with New API wsdl file and see what is changed and what is added. So I needed some proof that the original and sorted wsdl are semantically the same. And then I could also use this test to list all differences between Old API and New API wsdl.