The XML C parser and toolkit of Gnome

Note: this is the flat content of the web site

libxml, a.k.a. gnome-xml

"Programming with libxml2 is like the thrilling embrace of an exotic stranger." Mark Pilgrim

Libxml2 is the XML C parser and toolkit developed for the Gnome project (but usable outside of the Gnome platform). It is free software available under the MIT License. XML itself is a metalanguage to design markup languages, i.e. text languages where semantics and structure are added to the content using extra "markup" information enclosed between angle brackets. HTML is the most well-known markup language. Though the library is written in C, a variety of language bindings make it available in other environments.

Libxml2 is known to be very portable. The library should build and work without serious trouble on a variety of systems (Linux, Unix, Windows, Cygwin, MacOS, MacOS X, RISC OS, OS/2, VMS, QNX, MVS, VxWorks, ...).

Libxml2 implements a number of existing standards related to markup languages:

In most cases libxml2 tries to implement the specifications in a relatively strictly compliant way. As of release 2.4.16, libxml2 passed all 1800+ tests from the OASIS XML Test Suite.

To some extent libxml2 provides support for the following additional specifications but doesn't claim to implement them completely:

A partial implementation of XML Schema Part 1: Structures is being worked on, but it would be far too early to make any conformance statement about it at the moment.

Separate documents:

Logo designed by Marc Liyanage.

Introduction

This document describes libxml, the XML C parser and toolkit developed for the Gnome project. XML is a standard for building tag-based structured documents/data.

Here are some key points about libxml:

Warning: unless you are forced to because your application links with a Gnome-1.X library requiring it, do not use libxml1; use libxml2.

FAQ

Table of Contents:

License(s)

  1. Licensing Terms for libxml

    libxml2 is released under the MIT License; see the file Copyright in the distribution for the precise wording

  2. Can I embed libxml2 in a proprietary application ?

    Yes. The MIT License allows you to keep the changes you made to libxml proprietary, but it would be gracious to send back bug fixes and improvements as patches for possible incorporation in the main development tree.

Installation

  1. Do Not Use libxml1, use libxml2
  2. Where can I get libxml ?

    The original distribution comes from xmlsoft.org or gnome.org

    Most Linux and BSD distributions include libxml; this is probably the safest way for end-users to use libxml.

    David Doolin provides precompiled Windows versions at www.ce.berkeley.edu/~doolin/code/libxmlwin32/

  3. I see libxml and libxml2 releases, which one should I install ?
  4. I can't install the libxml package, it conflicts with libxml0

    You probably have an old libxml0 package used to provide the shared library for libxml.so.0; you can probably safely remove it. The libxml packages provided on xmlsoft.org provide libxml.so.0.

  5. I can't install the libxml(2) RPM package due to failed dependencies

    The most generic solution is to re-fetch the latest src.rpm and rebuild it locally with

    rpm --rebuild libxml(2)-xxx.src.rpm

    If everything goes well it will generate two binary rpm packages (one providing the shared libs and xmllint, and the other one, the -devel package, providing includes, static libraries and scripts needed to build applications with libxml(2)) that you can install locally.

Compilation

  1. What is the process to compile libxml2 ?

    Like most UNIX libraries, libxml2 follows the "standard" procedure:

    gunzip -c xxx.tar.gz | tar xvf -

    cd libxml-xxxx

    ./configure --help

    to see the options, then the compilation/installation proper

    ./configure [possible options]

    make

    make install

    At that point you may have to rerun ldconfig or a similar utility to update your list of installed shared libs.

  2. What other libraries are needed to compile/install libxml2 ?

    Libxml2 does not require any other library; the standard ANSI C API should be sufficient (please report any violation of this rule you may find).

    However, if they are found at configuration time, libxml2 will detect and use the following libs:

  3. Make check fails on some platforms

    Sometimes the regression tests' results don't completely match the value produced by the parser, and the makefile uses diff to print the delta. On some platforms the non-zero return value of diff breaks the compilation process; if the diff is small this is probably not a serious problem.

    Sometimes (especially on Solaris) make check fails due to limitations in make. Try using GNU make instead.

  4. I use the SVN version and there is no configure script

    The configure script (and other Makefiles) are generated. Use the autogen.sh script to regenerate the configure script and Makefiles, like:

    ./autogen.sh --prefix=/usr --disable-shared

  5. I have troubles when running make tests with gcc-3.0

    It seems the initial release of gcc-3.0 has a problem with the optimizer which miscompiles the URI module. Please use another compiler.

Developer corner

  1. Troubles compiling or linking programs using libxml2

    Usually the problem is that the compiler doesn't get the right compilation or linking flags. The small shell script xml2-config, installed as part of the usual libxml2 install process, provides those flags. Use

    xml2-config --cflags

    to get the compilation flags and

    xml2-config --libs

    to get the linker flags. Usually this is done directly from the Makefile as:

    CFLAGS=`xml2-config --cflags`

    LIBS=`xml2-config --libs`

  2. I want to install my own copy of libxml2 in my home directory and link my programs against it, but it doesn't work

    There are many different ways to accomplish this. Here is one way to do this under Linux. Suppose your home directory is /home/user. Then:

  3. xmlDocDump() generates output on one line.

    Libxml2 will not invent spaces in the content of a document since all spaces in the content of a document are significant. If you build a tree from the API and want indentation:

    1. the correct way is to generate those yourself too.
    2. the dangerous way is to ask libxml2 to add those blanks to your content, modifying the content of your document in the process. The result may not be what you expect. There is NO way to guarantee that such a modification won't affect other parts of the content of your document. See xmlKeepBlanksDefault() and xmlSaveFormatFile().
  4. Extra nodes in the document:

    For an XML file as below:

    <?xml version="1.0"?>
    <PLAN xmlns="www.argus.ca/autotest/1.0/">
    <NODE CommFlag="0"/>
    <NODE CommFlag="1"/>
    </PLAN>

    after parsing it with the function pxmlDoc=xmlParseFile(...);

    I want to get the content of the first node (the node with CommFlag="0"),

    so I did the following:

    xmlNodePtr pnode;
    pnode=pxmlDoc->children->children;

    but it does not work. If I change it to

    pnode=pxmlDoc->children->children->next;

    then it works. Can someone explain it to me.

    In XML all characters in the content of the document are significant including blanks and formatting line breaks.

    The extra nodes you are wondering about are just that: text nodes holding the formatting spaces, which are part of the document but which people tend to forget. There is a function, xmlKeepBlanksDefault(), to remove those at parse time, but that's a heuristic, and its use should be limited to cases where you are certain there is no mixed content in the document.

  5. I get compilation errors in existing code when accessing the root or child fields of nodes.

    You are compiling code developed for libxml version 1 and using a libxml2 development environment. Either switch back to libxml v1 devel or even better fix the code to compile with libxml2 (or both) by following the instructions.

  6. I get compilation errors about non existing xmlRootNode or xmlChildrenNode fields.

    The source code you are using has been upgraded to be able to compile with both libxml and libxml2, but you need to install a more recent version: libxml(-devel) >= 1.8.8 or libxml2(-devel) >= 2.1.0

  7. Random crashes in threaded applications

    Read and follow all the advice on the thread safety page, and make 100% sure you never call xmlCleanupParser() while the library or an XML document might still be in use by another thread.

  8. The example provided in the web page does not compile.

    It's hard to maintain the documentation in sync with the code <grin/> ...

    Check points 1/ and 2/ above, and please send patches.

  9. Where can I get more examples and information than provided on the web page?

    Ideally a libxml2 book would be nice. I have no such plan ... But you can:

  10. What about C++ ?

    libxml2 is written in pure C in order to allow easy reuse on a number of platforms, including embedded systems. I don't intend to convert to C++.

    There is however a C++ wrapper which may fulfill your needs:

  11. How to validate a document a posteriori ?

    It is possible to validate documents which had not been validated at initial parsing time or documents which have been built from scratch using the API. Use the xmlValidateDtd() function. It is also possible to simply add a DTD to an existing document:

    xmlDocPtr doc; /* your existing document */
    xmlDtdPtr dtd = xmlParseDTD(NULL, filename_of_dtd); /* parse the DTD */

    if (dtd != NULL) { /* xmlParseDTD() returns NULL on failure */
        dtd->name = xmlStrdup((const xmlChar *)"root_name"); /* use the given root */

        /* attach the DTD as internal subset, ahead of any existing content */
        doc->intSubset = dtd;
        if (doc->children == NULL) xmlAddChild((xmlNodePtr)doc, (xmlNodePtr)dtd);
        else xmlAddPrevSibling(doc->children, (xmlNodePtr)dtd);
    }
              
  12. So what is this funky "xmlChar" used all the time?

    It is a null-terminated sequence of UTF-8 characters. And only UTF-8! You need to convert strings in any other encoding to UTF-8 before passing them to the API. This can be accomplished with the iconv library, for instance.
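
    As an illustration of the iconv route, here is a small sketch that converts an ISO-8859-1 string to UTF-8 using the POSIX iconv interface. The helper name latin1_to_utf8 and the choice of ISO-8859-1 as source encoding are assumptions made for the example. The result is returned as unsigned char *, which is what xmlChar is a typedef for, so it can be cast to const xmlChar * when calling the API.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Convert a NUL-terminated ISO-8859-1 string into a newly allocated,
 * NUL-terminated UTF-8 string. Returns NULL on failure; the caller
 * releases the result with free(). */
static unsigned char *latin1_to_utf8(const char *in)
{
    size_t in_left = strlen(in);
    /* Each Latin-1 byte needs at most two bytes in UTF-8. */
    size_t out_left = 2 * in_left;
    char *out = malloc(out_left + 1);
    char *src = (char *)in;  /* iconv() wants char **, input is not modified */
    char *dst = out;
    iconv_t cd;

    if (out == NULL)
        return NULL;

    cd = iconv_open("UTF-8", "ISO-8859-1");  /* (to, from) */
    if (cd == (iconv_t)-1) {
        free(out);
        return NULL;
    }
    if (iconv(cd, &src, &in_left, &dst, &out_left) == (size_t)-1) {
        iconv_close(cd);
        free(out);
        return NULL;
    }
    iconv_close(cd);
    *dst = '\0';  /* dst points just past the last converted byte */
    return (unsigned char *)out;
}
```

    On systems where iconv is not part of the C library, linking with -liconv may be needed.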

  13. etc ...

Developer Menu

There are several on-line resources related to using libxml:

  1. Use the search engine to look up information.
  2. Check the FAQ.
  3. Check the extensive documentation automatically extracted from code comments.
  4. Look at the documentation about libxml internationalization support.
  5. This page provides a global overview and some examples on how to use libxml.
  6. Code examples
  7. John Fleck's libxml2 tutorial: html or pdf.
  8. If you need to parse large files, check the xmlReader API tutorial
  9. James Henstridge wrote some nice documentation explaining how to use the libxml SAX interface.
  10. George Lebl wrote an article for IBM developerWorks about using libxml.
  11. Check the TODO file.
  12. Read the 1.x to 2.x upgrade path description. If you are starting a new project using libxml you should really use the 2.x version.
  13. And don't forget to look at the mailing-list archive.

Reporting bugs and getting help

Well, bugs or missing features are always possible, and I will make a point of fixing them in a timely fashion. The best way to report a bug is to use the Gnome bug tracking database (make sure to file it under the "libxml2" module name). I look at reports there regularly and it's good to have a reminder when a bug is still open.

For small problems you can try to get help on IRC: the #xml channel on irc.gnome.org (port 6667) usually has a few people around who may help (but there is no guarantee, and if a real issue is raised it should go to the mailing-list for archival).

There is also a mailing-list xml@gnome.org for libxml, with an on-line archive (old). To subscribe to this list, please visit the associated Web page and follow the instructions. Do not send code, I won't debug it (but patches are really appreciated!).

Please note that with the current amount of viruses and spam, sending mail to the list without being subscribed won't work. There are *far too many bounces* (on the order of a thousand a day!) and I cannot approve them manually anymore. If your mail to the list bounced waiting for administrator approval, it is LOST! Repost it and fix the problem triggering the error. Also please note that emails with a legal warning asking not to copy or freely redistribute the information they contain are NOT acceptable for the mailing-list. Such mail will be discarded automatically as far as possible, and is less likely to be answered if it does make it to the list. DO NOT post to the list from an email address where such legal notices are added automatically; get private paid support if you can't share information.

Check the following before posting:

Then send the bug, with the information needed to reproduce it, to the xml@gnome.org list; if it's really libxml related I will approve it. Please do not send mail to me directly: it makes things really hard to track, and in some cases I am not the best person to answer a given question. Ask on the list.

To be really clear about support:

Of course, bugs reported with a suggested patch for fixing them will probably be processed faster than those without.

If you're looking for help, a quick look at the list archive may actually provide the answer. I usually send source samples when answering libxml2 usage questions. The auto-generated documentation is not as polished as I would like (I need to learn more about DocBook), but it's a good starting point.

How to help

You can help the project in various ways. The best thing to do first is to subscribe to the mailing-list as explained before, and check the archives and the Gnome bug database:

  1. Provide patches when you find problems.
  2. Provide the diffs when you port libxml2 to a new platform. They may not be integrated in all cases, but they help pinpoint portability problems.
  3. Provide documentation fixes (either as patches to the code comments or as HTML diffs).
  4. Provide new documentation pieces (translations, examples, etc ...).
  5. Check the TODO file and try to close one of the items.
  6. Take one of the points raised in the archive or the bug database and provide a fix. Get in touch with me beforehand to avoid synchronization problems and to check that the suggested fix will fit in nicely :-)

Downloads

The latest versions of libxml2 can be found on the xmlsoft.org server (FTP and rsync are available). There are also mirrors: one in France, and Antonin Sprinzl provides one in Austria. (NOTE that if you are using RPMs, you need both the libxml(2) and libxml(2)-devel packages installed to compile applications using libxml.)

You can find all the history of libxml(2) and libxslt releases in the old directory. The precompiled Windows binaries made by Igor Zlatovic are available in the win32 directory.

Binary ports:

If you know other supported binary ports, please contact me.

Snapshot:

Contributions:

I do accept external contributions, especially builds for another platform; get in touch with the list to upload the package. Wrappers for various languages have been provided and can be found in the bindings section.

Libxml2 is also available from GIT:

Releases

The change log describes the recent commits to the GIT code base.

Here is the list of public releases:

2.9.0: Sep 11 2012

2.8.0: May 23 2012