Storing HTML data in XML files, in a PHP application

By Caroline Liu

Recently, I have been working on a PHP website with no SQL support (yeah, don’t ask), but I needed to maintain a small database of pages and posts that could be updated via a WYSIWYG web interface. I ended up going with the following configuration:

  1. Store data in XML files.
  2. Handle XML data in PHP using the SimpleXMLElement class.
  3. Allow WYSIWYG editing with CKEditor, a JavaScript-based open-source text editor.

This is a very simple set-up, but there are a few things to note:

  • To store HTML data in XML, the markup characters (the angle brackets < > and the ampersand &) need to be properly escaped; otherwise, HTML elements can end up being read as XML elements, and a stray & (as in &nbsp;) can leave you with a broken file. So, run the content through htmlspecialchars() first, or through filter_var() with the FILTER_SANITIZE_FULL_SPECIAL_CHARS flag if you have a newer version of PHP (>5.2).

  • When XML files are loaded with simplexml_load_file(), the escaped entities are decoded back into the original characters. This makes displaying stored HTML data really easy (you can literally just echo the content retrieved from the XML file), but remember to escape this data again if you load it into a form for web editing; otherwise, markup in the content (a closing </textarea> tag, for example) will break the form.

  • With the default CKEditor settings, JavaScript mysteriously disappears after you save it. Where did it all go? It turns out that CKEditor disallows script and noscript tags by default. To allow these tags, open the config.js file in the root folder of your CKEditor installation, and add the following line to the CKEDITOR.editorConfig function:

    config.extraAllowedContent = 'noscript script[*]';

    This means “allow noscript and script tags, with any attributes ([*])”. Note that [*] applies to both noscript and script tags. If you wanted [*] to apply only to script, you would separate the two rules with a semicolon:

    config.extraAllowedContent = 'noscript; script[*]';
  • Content from CKEditor must be escaped before it’s stored in XML. Even though CKEditor looks like a real word processor, it is really just a JavaScript interface placed on top of a <textarea> element, and, as with any regular HTML form element, the submitted input is not escaped for you.
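
The escaping rules above can be checked with a short round trip. This is an illustrative sketch rather than part of the demo: the sample HTML string is made up, and simplexml_load_string() is used as the in-memory counterpart of simplexml_load_file() so that no file is needed. It escapes the HTML before addChild(), reads it back (the XML parser decodes the entities), and escapes it once more for a textarea.

```php
<?php
// Made-up editor content, including an entity, a tag and a script.
$original = '<p>Fish &amp; chips <b>today</b></p><script>alert("hi");</script>';

// 1. Escape before storing. addChild() treats "&" as the start of an
//    entity reference, so unescaped content is not safe to pass in.
$node = new SimpleXMLElement('<article></article>');
$node->addChild('content', htmlspecialchars($original));
$xmlString = $node->asXML();

// 2. Read it back. The parser decodes the entities, so the node's text
//    is the original HTML again and can be echoed straight into a page.
$loaded = simplexml_load_string($xmlString);
$display = (string) $loaded->content;

// 3. Escape again before placing it inside a <textarea>, so markup in the
//    content cannot close the form field early.
$forTextarea = htmlspecialchars($display);

echo $xmlString, "\n";
var_dump($display === $original); // bool(true)
```

Where filter_var() with FILTER_SANITIZE_FULL_SPECIAL_CHARS is available, it can take the place of the first htmlspecialchars() call; the PHP manual describes it as equivalent to htmlspecialchars() with ENT_QUOTES set.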

Here’s a small demo. Grab the following code and save it into a PHP file. CKEditor is optional, so feel free to remove the <script> tags if you don’t want to play with CKEditor.

<?php

/**
* Loads HTML data from an XML file 'sample.xml' and displays it in a
* form for editing. On submit (via post), saves the updated HTML data
* back into 'sample.xml'.
*/


if (isset($_POST['editor1'])) {
    // Get text from the editor and escape it.
    // I'm using htmlspecialchars for backwards compatibility.
    // Use the newer filter_var if you can.
    $text = htmlspecialchars($_POST['editor1']);

    // Make a new SimpleXMLElement with 'article' as the root node.
    $node = new SimpleXMLElement('<article></article>');

    // Add a 'content' child node to 'article'. This holds all our HTML data.
    // In an actual implementation, you would add more child nodes to 'article'
    // like this, for title, author, date, etc.
    $node->addChild('content', $text);

    // Export the node to an XML string.
    $xml = $node->asXML();

    // Write the XML string to file.
    file_put_contents('sample.xml', $xml);
}

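// Load XML data from file into a SimpleXMLElement. On the very first run
// there is no 'sample.xml' yet, so start with empty content.
$text = '';
if (is_file('sample.xml')) {
    $xml = simplexml_load_file('sample.xml');

    // Get the text from the 'content' child node and escape it. We'll be
    // echoing this into the form to allow editing.
    if ($xml !== false) {
        $text = htmlspecialchars((string) $xml->content);
    }
}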
?>


<html>
<head>
<title>HTML data: Storing in XML, editing with CKEditor</title>
<script type="text/javascript" src="ckeditor/ckeditor.js"></script>
</head>

<body>
<!-- Our page editing form -->
<form action="" method="post">
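<textarea name="editor1"><?php
// Echo our escaped HTML data into the textarea. Keeping the PHP block flush
// against both textarea tags avoids adding a trailing newline on every save.
echo $text;
?></textarea>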
<input type="submit" value="Submit">
</form>

<script>
// Add CKEditor to textarea with name 'editor1'.
CKEDITOR.replace( "editor1" );
</script>
</body>
</html>
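
The demo stores a single 'content' node, and its comments suggest adding title, author and date nodes in a real implementation. Below is a minimal sketch of that idea, assuming PHP 7.1 or later. save_article() and load_article() are hypothetical helper names, the field names and values are examples only, and every field name must be a valid XML element name.

```php
<?php
// Hypothetical helper: write several fields of one article to an XML file.
function save_article(string $path, array $fields): bool
{
    $node = new SimpleXMLElement('<article></article>');
    foreach ($fields as $name => $value) {
        // Escape every value, exactly as the demo does for 'content'.
        $node->addChild($name, htmlspecialchars((string) $value));
    }
    // With a filename argument, asXML() writes the file and returns a bool.
    return $node->asXML($path);
}

// Hypothetical helper: read the fields back as decoded strings.
function load_article(string $path): ?array
{
    if (!is_file($path)) {
        return null; // nothing has been saved yet
    }
    $xml = simplexml_load_file($path);
    if ($xml === false) {
        return null; // unreadable or malformed file
    }
    $fields = [];
    foreach ($xml->children() as $name => $child) {
        // Casting to string yields the unescaped text of each node.
        $fields[$name] = (string) $child;
    }
    return $fields;
}

$path = sys_get_temp_dir() . '/article-demo.xml';
save_article($path, [
    'title'   => 'Fish & chips',
    'author'  => 'A. Writer',
    'date'    => '2024-01-31',
    'content' => '<p>Open <b>today</b>&nbsp;only</p>',
]);
$article = load_article($path);
unlink($path); // clean up the demo file

echo $article['content'], "\n"; // prints <p>Open <b>today</b>&nbsp;only</p>
```

Returning null for a missing file mirrors the first-run situation in the demo, where sample.xml does not exist until the form has been submitted once.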