Storing HTML data in XML files in a PHP application

Recently, I have been working on a PHP website with no SQL support (yeah, don’t ask), but I still needed to maintain a small database of pages and posts that could be updated via a WYSIWYG web interface. I ended up going with the following configuration:

  1. Store data in XML files.
  2. Handle XML data in PHP using the SimpleXMLElement class.
  3. Allow WYSIWYG editing with CKEditor, a JavaScript-based open-source text editor.
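To make the first two points a little more concrete, here is a minimal sketch of how several pages and posts might live in one XML file and be read back with SimpleXMLElement. The element names (site, page, post, title, content) and the id attribute are hypothetical choices for illustration, not the layout the demo below uses, and the content is plain text so the escaping rules in the notes below don't come into play yet.

```php
<?php
// Hypothetical layout: <site> holds <page> and <post> elements, each with an
// id attribute plus <title> and <content> children.
$site = new SimpleXMLElement('<site></site>');

$page = $site->addChild('page');
$page->addAttribute('id', 'about');
$page->addChild('title', 'About us');
$page->addChild('content', 'Welcome to the site.');

$post = $site->addChild('post');
$post->addAttribute('id', 'first-post');
$post->addChild('title', 'First post');
$post->addChild('content', 'Hello, world.');

// Serialize the whole tree; a real site would write this string to a file.
$xml = $site->asXML();

// Parse it again and walk over every <post> element.
$loaded = simplexml_load_string($xml);
foreach ($loaded->post as $entry) {
    // Attributes are read with array syntax, child elements with ->.
    echo $entry['id'], ': ', $entry->title, PHP_EOL; // first-post: First post
}

// Find one page by its id attribute with an XPath query.
$matches = $loaded->xpath('//page[@id="about"]');
echo (string) $matches[0]->title, PHP_EOL; // About us
```

Keeping everything in one file keeps the sketch short; one file per page or post would work just as well and avoids rewriting the whole site on every save.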

This is a very simple set-up, but there are a few things to note:

  • To store HTML data in XML, the special characters <, > and & need to be escaped; otherwise, HTML elements will be read as XML elements, and a bare & will make the XML invalid. So, run the content through htmlspecialchars() first, or through filter_var() with the FILTER_SANITIZE_FULL_SPECIAL_CHARS filter if you have a newer version of PHP (5.3.3 or later).
  • When XML files are loaded with simplexml_load_file(), the parser decodes the escaped entities again, so you get plain HTML back! This makes displaying stored HTML data really easy (you can literally just echo the content retrieved from the XML file), but remember to escape this data again if you need to load it into a form for web editing!
  • With the default CKEditor settings, JavaScript mysteriously disappears after you save it. Where did it all go? It turns out that CKEditor disallows script and noscript tags by default. To allow these tags, open the config.js file in the root folder of your CKEditor installation, and add the following line to the CKEDITOR.editorConfig function:
    config.extraAllowedContent = 'noscript script[*]';
    This means “allow noscript and script tags, with any attributes ([*])”. Note that [*] applies to both noscript and script tags. If you wanted [*] to only apply to script, you would do this:
    config.extraAllowedContent = 'noscript; script[*]';
  • Content from CKEditor must be escaped before it’s stored in XML. Even though CKEditor looks like a real word processor, it is really just a JavaScript interface placed on top of a <textarea> element, and we all know that user input in regular HTML form elements doesn’t escape itself.
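Putting the escaping notes together, here is a minimal round-trip sketch: escape some HTML, store it with addChild(), serialize it with asXML(), parse it back, and re-escape it for a <textarea>. The sample HTML is made up, simplexml_load_string() stands in for simplexml_load_file() so nothing touches the disk, and passing ENT_QUOTES and 'UTF-8' explicitly is an assumption of sensible settings rather than something the notes above require.

```php
<?php
// Made-up HTML, as CKEditor might submit it (note the &amp; entity).
$html = '<p class="intro">Fish &amp; chips</p>';

// 1. Escape before storing.
$escaped = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');

// 2. Store it. addChild() decodes entity references in its value, so the
//    text node holds the original markup as plain characters.
$node = new SimpleXMLElement('<article></article>');
$node->addChild('content', $escaped);

// 3. Serialize. The markup appears escaped (&lt;p ...) in the XML, so it is
//    not treated as XML elements.
$xml = $node->asXML();
echo $xml;

// 4. Parse it back: the parser decodes the entities, giving raw HTML that
//    could be echoed straight into a page.
$loaded = simplexml_load_string($xml);
$roundTrip = (string) $loaded->content;
echo $roundTrip, PHP_EOL; // <p class="intro">Fish &amp; chips</p>

// 5. Escape again before putting it inside a <textarea> for editing.
$forTextarea = htmlspecialchars($roundTrip, ENT_QUOTES, 'UTF-8');
echo '<textarea name="editor1">', $forTextarea, '</textarea>', PHP_EOL;
```

Note that the &amp; in the original HTML survives intact: it is stored as &amp;amp; and comes back as &amp;, which is exactly why the content is escaped before it goes into the XML.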

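Here’s a small demo. Grab the following code and save it into a PHP file. It reads and writes sample.xml in the same folder, using the structure <article><content>...</content></article>; the file is created the first time you submit the form. CKEditor is optional, so feel free to remove the <script> tags if you don’t want to play with CKEditor.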

<?php

    /**
     * Loads HTML data from an XML file 'sample.xml' and displays it in a
     * form for editing. On submit (via post), saves the updated HTML data
     * back into 'sample.xml'.
     */

    if (isset($_POST['editor1'])) {
        // Get text from the editor and escape it.
        // I'm using htmlspecialchars for backwards compatibility.
        // Use the newer filter_var if you can. ENT_QUOTES and the charset are
        // passed explicitly because the defaults vary between PHP versions.
        $text = htmlspecialchars($_POST['editor1'], ENT_QUOTES, 'UTF-8');

        // Make a new SimpleXMLElement with 'article' as the root node.
        $node = new SimpleXMLElement('<article></article>');

        // Add a 'content' child node to 'article'. This holds all our html data.
        // In an actual implementation, you would add more child nodes to 'article'
        // like this, for title, author, date, etc.
        $node->addChild('content', $text);

        // Export the node to an XML string.
        $xml = $node->asXML();

        // Write the XML string to file.
        file_put_contents('sample.xml', $xml);
    }

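    // Load XML data from file into a SimpleXMLElement. Until the form has
    // been submitted once, sample.xml does not exist, so fall back to an
    // empty editor instead of triggering warnings.
    $text = '';
    if (is_file('sample.xml')) {
        $xml = simplexml_load_file('sample.xml');

        // Get the text from the 'content' child node and escape it. We'll be
        // echoing this into the form to allow editing.
        if ($xml !== false && isset($xml->content)) {
            $text = htmlspecialchars((string) $xml->content, ENT_QUOTES, 'UTF-8');
        }
    }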
?>

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <title>HTML data: Storing in XML, editing with CKEditor</title>
        <script type="text/javascript" src="ckeditor/ckeditor.js"></script>
    </head>

    <body>
        <!-- Our page editing form -->
        <form action="" method="post">
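            <!-- Echo our escaped HTML data into the textarea. The PHP tag sits
                 directly between the textarea tags because any whitespace there
                 becomes part of the submitted text and would pile up with each save. -->
            <textarea name="editor1"><?php echo $text; ?></textarea>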
            <input type="submit" value="Submit">
        </form>

        <script>
            // Add CKEditor to textarea with name 'editor1'.
            CKEDITOR.replace( "editor1" );
        </script>
    </body>
</html>
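One rough edge in the demo is the very first run: until something has been saved, sample.xml does not exist, and simplexml_load_file() emits a warning and returns false. Here is a minimal sketch of load/save helpers that guard against that and also use the asXML($filename) shortcut mentioned in the comments. The function names load_content() and save_content() are hypothetical, the helpers assume the <article><content>...</content></article> layout the demo writes, and the type declarations need PHP 7 or later.

```php
<?php
// Hypothetical helpers for the demo's file layout:
//   <article><content>escaped HTML</content></article>

function load_content(string $path): string
{
    // Before the first save the file does not exist; return an empty string
    // instead of letting simplexml_load_file() warn and return false.
    if (!is_file($path)) {
        return '';
    }
    $xml = simplexml_load_file($path);
    if ($xml === false || !isset($xml->content)) {
        return '';
    }
    // The parser has already decoded the entities, so this is raw HTML.
    return (string) $xml->content;
}

function save_content(string $path, string $html): bool
{
    $node = new SimpleXMLElement('<article></article>');
    $node->addChild('content', htmlspecialchars($html, ENT_QUOTES, 'UTF-8'));
    // With a filename argument, asXML() writes the file itself and returns
    // true or false, so file_put_contents() is not needed.
    return $node->asXML($path);
}

// Usage with a throwaway file in the system temp directory.
$path = sys_get_temp_dir() . '/sample-' . uniqid() . '.xml';
var_dump(load_content($path));                 // string(0) ""
save_content($path, '<p>Hello &amp; welcome</p>');
echo load_content($path), PHP_EOL;             // <p>Hello &amp; welcome</p>
unlink($path);
```

The page would still run the loaded HTML through htmlspecialchars() before echoing it into the textarea, as in the demo.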
  • RickMick

    This is what I have been looking for, thank you. I tried your code and the editor appears, but it does not echo the XML text. I changed the name of my XML file to sample.xml and placed it in the same folder as the script.

  • pat

    Thanks, this helps a lot.

  • Kasey

    Apologies in advance for my noobery; I’m very new to coding in general. I’m wondering if this would work for an integration I’m working on. The current state of the project is that a standard HTML input form collects username/password/site. Upon submitting, it checks the given site for enabled services via an XML API and returns those in a series of select dropdowns. What I want is for the user to be able to make their selections from the dropdowns and submit those as defaults. The issue here is that the service my integration is talking to has no way to accept these defaults, so I need them to be stored locally for later use. Doable with this?

  • Jeff Majors

    Just a note: you don’t need the “file_put_contents()” function to save XML. Just pass a filename to asXML(), as in “$xml->asXML($filename)”, and it will write the XML string to that file.