Applications that need strict XML parsing will use either the JAXP API or the SAX API to create their parsers. Applications that need to parse HTML will generally instantiate their own parsers.
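As a rough illustration of the strict XML path through JAXP, the sketch below parses a string into a DOM tree using only the standard JDK classes (javax.xml.parsers and org.w3c.dom). It does not use the parser classes described in this document. The class name JaxpDomExample and its helper methods are hypothetical, chosen only for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; relies on whatever JAXP implementation the JDK provides.
final class JaxpDomExample {

    // Parses a well-formed XML string into a DOM Document.
    // Malformed input is reported as an IllegalArgumentException.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Returns the tag name of the document's root element.
    static String rootTagName(String xml) {
        return parse(xml).getDocumentElement().getTagName();
    }

    // Returns the concatenated text content of the root element.
    static String rootText(String xml) {
        return parse(xml).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) {
        String str = "<em>test xml doc</em>";
        System.out.println(rootTagName(str));
        System.out.println(rootText(str));
    }
}
```

Because this path is strict, input that is not well-formed XML (typical of real-world HTML) is rejected rather than repaired, which is why HTML applications generally instantiate their own parsers instead.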
There are four parser flavors with the same API: Xml, LooseXml, Html, and LooseHtml.
You can parse a document into a DOM tree or you can use the SAX callback API. The core of the API is documented in AbstractParser.
DOM parsing looks like:
Document doc = new Html().parseDocument("test.html");
Parsing directly from a string looks like:
String str = "<em>test html doc</em>";
Document doc = new Html().parseDocumentString(str);
SAX parsing looks like:
Html html = new Html();
html.setContentHandler(myContentHandler);
html.parse("test.html");
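The SAX snippet above assumes a content handler already exists. The sketch below shows one possible shape for such a handler: it records element names and accumulates character data. To stay self-contained it is driven by the JDK's own SAXParserFactory rather than the Html class, so it only accepts well-formed input. The names SaxHandlerExample, EventCollector, and describe are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; uses the JDK's SAX parser, not the Html parser.
final class SaxHandlerExample {

    // Collects start-tag names in document order and all character data.
    static final class EventCollector extends DefaultHandler {
        final List<String> elements = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            elements.add(qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may deliver one text run in several chunks, so append.
            text.append(ch, start, length);
        }
    }

    // Parses the input with a SAX parser and summarizes what the handler saw.
    static String describe(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            EventCollector handler = new EventCollector();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return "elements=" + handler.elements + " text=" + handler.text;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("<p>Hello <em>world</em></p>"));
    }
}
```

DefaultHandler implements org.xml.sax.ContentHandler, so a handler of this kind should be usable wherever a ContentHandler is expected, presumably including the setContentHandler call shown above.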