
I'm just learning the ropes and have a question about Symphony's publishing/content delivery.

I'm interested in applying the CMS to an existing non-blog site -- but I want to avoid increasing per-request server processing for current content, or producing content that is dependent on Symphony's specific content delivery system (PHP, mod_rewrite, et al.). In short, I'd like to adjust Symphony's publishing routines to build flat HTML files (via XML/XSLT) directly from the CMS upon an author-triggered publish event.

My searches haven't turned up much, but I suspect I've missed relevant pieces of documentation. Any advice or links to docs, extensions, etc., would be appreciated.

Thanks!

You could change the mod_rewrite rules but you're going to need PHP to transform the XML with XSLT using Symphony.

What exactly are you trying to do? It doesn't sound like you want to use a CMS.

I don't know what the browser support is like for transforming XML with XSLT client-side, but I think that would be your only option.

Thanks for the feedback. To be clear, I'm distinguishing here between content management (creating, editing, publishing content) and content delivery (serving content to site visitors on request, and performing whatever dynamic assembly of disparate content components is required). By now, most major CMS apps provide some degree of content delivery for user convenience, but it's nevertheless secondary to the primary function of a CMS. Just about every CMS I've worked with (with the notable exception of ColdFusion) allows the decoupling of CMS and content delivery, freeing the user to serve content in whatever way they prefer (e.g., as flat files -- or via PHP, Mason, CGI, ASP, etc.).

I need a CMS which facilitates flexible, highly customizable content management, which Symphony does. And I very much like the idea of storing data and templates as XML and XSLT. But I need to publish the data components into fully assembled HTML files before they are served to site visitors. In other words, the work that Symphony, by default, is configured to do every time a visitor requests a content page, I want to do once -- via a 'publish' event triggered by a CMS user -- in order to accommodate the existing content architecture requirements of the site that I want to manage with Symphony. (Such a publishing system has other benefits as well, such as reducing per-request server load.)

I expect Symphony already supports this -- or that other users have modified it to do so -- since that would be necessary to apply Symphony to certain existing sites already ensconced in other content delivery frameworks/engines. In any case, I want to be sure no such support or modifications exist before wading into the code to add the functionality.

I can recall an earlier discussion about this, where somebody ended up using a sitemap script which writes out the entire website as static files. So it seems you will not have to write it yourself.

However, I can't find the discussion anymore. Maybe your search skills are better than mine ;)

That's an interesting approach -- I'll have to see if I can find the article, or else research the sitemap approach for this purpose.

There are complications, though, which may render Sitemap (or similar solutions) not applicable to my circumstances. In fact, I need to write raw PHP, Perl, and Ruby files to manage the sites in question.

I'm thinking the solution should be fairly straightforward -- all that's seemingly needed is a publish script that approximates the XML->XSLT content packaging that currently happens per content request. So even if no one's created a publishing feature that works quite as I'm describing, the essential functionality must already be in place for repurposing.
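For illustration, the core of such a publish script could be just a few lines of PHP using the built-in XSLTProcessor (the same LibXML/XSLT machinery Symphony itself relies on). A minimal sketch, with placeholder file names:

    <?php
    // Minimal sketch of a one-off publish step: transform a page's XML with
    // its XSLT and write flat HTML. File names are placeholders.
    $xml = new DOMDocument();
    $xml->load('data/page.xml');

    $xsl = new DOMDocument();
    $xsl->load('templates/page.xsl');

    $proc = new XSLTProcessor();
    $proc->importStylesheet($xsl);

    // Write the fully assembled page as a static file
    file_put_contents('output/index.html', $proc->transformToXML($xml));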

Is it this discussion? Output site to flat file

Hi Charlie and welcome :-)

This is a great question, and hopefully half of the work has been done for you. Stuff I'm going to mention here includes proactive vs. reactive caching, and time vs. event expiry/purging operations.

There are presently two extensions which approach the caching problem, happily from opposing directions. I'll weigh up the pros and cons here, and maybe you'll decide one of these fits your needs in its current guise.

CacheLite

The first is CacheLite. CacheLite is a PHP class built for buffering page output and caching it to disk, and it's included in a Symphony extension of the same name. A visitor's request hits your site, and if the page has not been cached before, the page is built in the usual way (a full Symphony process) and served to the user in the usual way -- only, the generated HTML is also saved to a text file. Subsequent requests to this URL serve up the text file rather than building an entire Symphony page.

This is made possible by Symphony's delegates (explanation and full list). These are "hooks" (method calls) peppered across the system to which extensions subscribe, providing their own callback methods. There happens to be a delegate that executes just before the built page is returned to the user. Callback methods of this delegate are handed the page XSL, XML and generated HTML. CacheLite subscribes, receives the HTML string, and saves it to the /manifest/cache/ folder, using a hashed string of the URL as the filename.
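To illustrate, a stripped-down extension driver along these lines might look like the sketch below. The delegate name and context keys are from memory, so treat them as assumptions and verify against the delegate list:

    <?php
    // extension.driver.php -- stripped-down sketch of the CacheLite idea.
    // Delegate name and context keys are quoted from memory; verify them
    // against the delegate list for your Symphony version.
    class Extension_HTMLCacheSketch extends Extension
    {
        public function getSubscribedDelegates()
        {
            return array(
                array(
                    'page'     => '/frontend/',
                    'delegate' => 'FrontendOutputPostGenerate', // fires once the page HTML is built
                    'callback' => 'savePageToDisk'
                )
            );
        }

        public function savePageToDisk($context)
        {
            // $context['output'] holds the generated HTML string
            $file = MANIFEST . '/cache/' . md5($_SERVER['REQUEST_URI']) . '.html';
            file_put_contents($file, $context['output']);
        }
    }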

Subsequent requests still go through mod_rewrite, and still instantiate Symphony, because of the need to first establish that the page URL is valid. However no database queries for building content are performed; only one to build your session (Symphony stores session references in its database), another one or two to check the page URL against your site page structure, and probably a few other queries for housekeeping. This performance gain is significant, and you will be able to handle much more traffic. But, of course, it still puts each request through PHP, albeit much lighter processing (CacheLite uses the first available delegate in Symphony's page lifecycle to serve the cached file before exiting the lifecycle entirely).

CacheLite text files are purged in two ways:

Firstly by time (reactive) — the extension provides a configuration panel (Symphony > System > Preferences) from which you define the expiry TTL (time to live) in seconds. Could be 10 seconds, could be 604800 for a week. (It also lets you define URLs to ignore, in case you need parts of your site to remain dynamic.)

Secondly, we recently added the ability to purge the cache based on content updates themselves (proactive). CacheLite can optionally maintain a reference of which content (entries) appears on which URLs (this is ascertained when the page is built, with entry IDs read from the page XML as the CacheLite text file is written). The extension subscribes to further delegates (in the backend this time) so that when an entry is created or updated, any URLs (text files) rendering this entry are purged. Their cache will be rebuilt when the URL is next requested by a visitor.
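Conceptually, the bookkeeping amounts to something like this (the storage format here is invented for illustration; CacheLite's actual implementation differs):

    // Illustration only -- the storage format is invented; CacheLite's real
    // bookkeeping differs. When a cache file is written, record which entry
    // IDs (read from the page XML) appeared at which URL:
    $map = json_decode(@file_get_contents($mapFile), true) ?: array();
    foreach ($entryIds as $id) {
        $map[$id][] = $url;
    }
    file_put_contents($mapFile, json_encode($map));

    // Then, in a backend delegate fired when an entry is saved, purge every
    // cached URL on which that entry appeared:
    if (isset($map[$savedEntryId])) {
        foreach ($map[$savedEntryId] as $url) {
            @unlink(MANIFEST . '/cache/' . md5($url) . '.html');
        }
    }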

The benefits of reactive caching are obvious: install the extension and you're done. No additional workflow for authors, and immediate benefits to your users. Also, it can cache things like search results, filters (when you have potentially infinite combinations of URL parameters), pagination etc, since each resolves to a unique URL. Now that you can purge the cache based on author activity in the Symphony backend, you get the benefit of both the reactive (frontend) and proactive (backend) approaches.

Static Site Exporter

The second extension is the Static Site Exporter (related discussion). It's not fully released, as it's more a proof of concept than a polished application. It was born from the need to use Symphony on a private staging environment, but publish the website to a production environment that did not have the required software (no PHP or MySQL).

In short, the Static Site Exporter provides an interface in the backend (a button...) which instantiates a web crawler process that indexes all pages on your site. Starting from the homepage, hyperlinks are parsed from your page HTML and followed one by one. Pages are saved as static HTML documents in a directory structure that mirrors the mod_rewrite rewritten URLs. An author makes their content changes, navigates to the Static Site Exporter page in the backend, clicks "Crawl" and once that is complete clicks a "Publish" button to FTP the changes. (That's the theory at least.)
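To give a flavour of the approach, here's a rough sketch of the crawl-and-mirror idea -- not SSE's actual code, and the host name is a placeholder:

    <?php
    // Rough sketch of a crawl-and-mirror process, not SSE's actual code.
    // Starting from the homepage, save each page as <path>/index.html and
    // queue any internal links found in its HTML.
    $base    = 'http://staging.example.com'; // placeholder host
    $queue   = array('/');
    $visited = array();

    while ($queue) {
        $path = array_shift($queue);
        if (isset($visited[$path])) continue;
        $visited[$path] = true;

        $html = @file_get_contents($base . $path);
        if ($html === false) continue;

        // Mirror the clean URL as a directory containing index.html
        $dir = 'export' . rtrim($path, '/');
        if (!is_dir($dir)) mkdir($dir, 0755, true);
        file_put_contents($dir . '/index.html', $html);

        // Queue internal (root-relative) links found on this page
        $doc = new DOMDocument();
        @$doc->loadHTML($html);
        foreach ($doc->getElementsByTagName('a') as $a) {
            $href = $a->getAttribute('href');
            if (strpos($href, '/') === 0) $queue[] = $href;
        }
    }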

This is a purely proactive approach and means your published site is entirely decoupled from the CMS. The assumption, therefore, is that you cannot do anything dynamic on your site: no filtering, searching or receiving of user content (comments etc). In reality you can get filtering/pagination working, so long as your URLs are nice and clean (/articles/1/ rather than /articles/?page=1). The downside is that it's a manual process to publish content.

Hybrid?

If neither of these meet your needs then you could marry the two together. You could take the proactive part of the CacheLite extension (subscribe to backend delegates, to purge/rebuild pages on content editing/creation), but use the static directory structure publishing of Static Site Exporter. It's technically possible, but a couple of challenges spring to mind:

Firstly, Symphony is an object-based CMS rather than a page-based CMS. Authors create content (entries) in silos (sections) which are then rendered in whatever ways necessary on pages, whereas other CMSes make a 1:1 relationship between a page (a URL) and its content. Should you decide to cache an entry's URL when it is saved in the backend, you will need some sort of mapping of entries to URLs, since an entry itself is not aware of the pages/URLs on which it could be displayed.

I created the Entry URL Field which could be useful here. Add the field to a section and define an XPath expression which, when evaluated against the entry's built XML, creates a string. For example, you might give the field the XPath /news/article/{title/@handle}/, and provided the entry has a Title text input field, upon saving the Entry URL Field would substitute the title handle into the string (e.g. /news/article/my-new-article/). The original idea behind the extension was to keep URL definitions at an entry level, rather than littering XSLT with URL references. So your delegate callback, listening for an entry create or entry edit event, would look for the existence of an Entry URL Field in the entry, and would cache the page at this URL (perhaps a cURL or file_get_contents() request) into a directory structure of the same value.
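A rough sketch of that callback might look like this, with a made-up getEntryUrl() helper standing in for reading the Entry URL Field value, and a placeholder host:

    // Hypothetical backend delegate callback: on entry save, fetch the
    // entry's URL and mirror it into a static directory structure.
    // getEntryUrl() is a made-up helper standing in for reading the Entry
    // URL Field value; the host name is a placeholder.
    public function publishEntryPage($context)
    {
        $path = $this->getEntryUrl($context['entry']); // e.g. /news/article/my-new-article/
        $html = @file_get_contents('http://staging.example.com' . $path);
        if ($html === false) return;

        // Mirror the clean URL as a directory containing index.html
        $dir = DOCROOT . '/static' . rtrim($path, '/');
        if (!is_dir($dir)) mkdir($dir, 0755, true);
        file_put_contents($dir . '/index.html', $html);
    }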

Secondly, this approach only works for pages that represent an entry. What about "aggregate" pages such as a homepage or an index/category listing page? These pages aren't defined by a single entry, so how would you know to rebuild them? You could use the CacheLite mechanism of maintaining a list of pages on which a new entry should appear. For example, if you have a section Blog, then CacheLite will purge pages in two ways:

  • when an existing entry is updated, pages currently featuring this entry will be purged (and rebuilt on next request)
  • when a new entry is created, all pages selecting content from this entry's section (Blog) will be purged

This is "pessimistic" purging: we need to purge the cache for all pages that make up the blog, because our new entry might now appear on any one of them (the top of the blog homepage, the date-based blog archive, paginated archive pages, the "latest posts" list on every other blog post page etc). You could perform the same operation in your extension, but it is expensive. CacheLite just needs to purge (delete a text file) for every URL that is to be reactively regenerated. This is a cheap operation, even for hundreds of pages at a time. But if you are proactively caching then you will need to not only purge, but also rebuild (CURL requests) all of these URLs. If you've got several dozen or more, then it's going to take more than a few seconds, and authors are going to hesitate before clicking the "Save Entry" button!

With a bit of luck one of the two above will satisfy your needs. For very simple sites the proactive Static Site Exporter is worth a shot, but for anything more complex the reactive CacheLite affords you greater flexibility. And if neither is spot on, then at least pull CacheLite apart (see the extension.driver.php file, which includes all delegate subscriptions and their callback methods) -- it's a perfect starting point.

Let us know how you get on!

woah… Comments such as the above make me ♥ the Symphony community. Kudos Nick!

Great explanation, Nick.
About CacheLite: I had an issue regarding purging the cache when updating and deleting an entry...
Can anybody confirm this issue?

Thanks for the very thorough response, Nick -- you addressed a good number of the considerations I'm trying to account for here. Many ideas for me to reflect on in your post. I'll have to ruminate a bit and try a few things -- will update soon to let the board know progress.

I'm a bit pressed at the moment and must be brief, but in short, some details of the sites to which I'm looking to apply Symphony are muddying the view of the best way forward. Among them is something you touched on: applying Symphony to websites with private staging and production environments. In fact, I need to accommodate multi-staged development, staging, and production environments spread across different networks (an overcomplicated situation owing to legacy considerations), and here the requirement is that files managed in the CMS be published to several environments simultaneously. But it's worse than that: several technologies are in play -- PHP, Ruby, Python, etc. -- and in many instances the CMS is required to publish raw PHP, Python, etc., files and components into these various environments. In the circumstances I can imagine no practical remedy for this.

I've never seen anything quite like Symphony -- its flexibility and the organization of its UI, in particular, make it appealing to me. I also like the use of XML/XSLT, though this is new for me; I'm just starting to explore the implications. Symphony seems like a good way forward for our new content items and evolving requirements generally, though it may be pissing in the wind to hope for any one system to fully replace the Frankenstein's monster our sites have become. At the moment my goal is to implement an incremental move to Symphony for the bulk of our content needs, the first step of which is, of course, figuring out just how far I can bend it to fit the current circumstances.

Again, thanks for all the info, Nick -- you've offered much for me to think about. Thanks also to CreativeDutchman, moonoo2, and everyone else who replied. I'll be spending the next few days experimenting and will post updates as things progress.

In light of this discussion: does anyone have any experience with MemCache in combination with Symphony?

Not directly. On a few sites I've worked on I've seen developers modify data sources so the XML response is cached using APC though. Very similar to Cacheable Datasource (another discussion), which uses flat files.
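The gist is a small memoiser around the data source's XML-building step, using the classic APC API (apc_fetch/apc_store). The key, TTL and build callback here are placeholders, not Symphony API:

    // Sketch using the classic APC API (apc_fetch/apc_store); the key, TTL
    // and build callback are placeholders, not Symphony API.
    function cached_xml($key, $ttl, $build)
    {
        $xml = apc_fetch($key, $hit);
        if ($hit) {
            return $xml;             // cache hit: skip the database entirely
        }
        $xml = $build();             // expensive: queries + XML assembly
        apc_store($key, $xml, $ttl); // cache the XML string for $ttl seconds
        return $xml;
    }

    // Hypothetical use inside a data source's XML-building step:
    // $articlesXml = cached_xml('ds-articles', 300, function () use ($ds) {
    //     return $ds->buildArticlesXML(); // made-up method name
    // });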

We should start a new thread specifically about APC/memcache to discuss further if you fancy ;-)

XML response is cached using APC

I'd love to hear more about this.

I am trying to achieve a similar thing: publishing my locally hosted site pages to Google App Engine as a static website. So all I need is a nice directory structure and XHTML files.
Since the Static Site Exporter has some problems, I thought about using curl (I am on a Mac). But I really like the idea of CacheLite updating based on content updates:
What if, rather than using a script to crawl all the pages, you used a script that visits all the pages just once, upon setup? Then all pages are in the cache, and new pages will be too. You'd then somehow transform the hashed page files into files in nested directories... Then again, I'd still need to add a cron event to publish the new state to Google so it updates. And if a cron job is needed per updated entry anyway, it might be silly to automate the update of the files in the first place.
So instead of semi-manually updating I might be better off using CirruxCache. Maybe the topic starter can find some inspiration in that app?
Or maybe he can push his Symphony pages to GitHub and then use GitHub post-receive hooks to push them further to all his sites in the formats needed, like DryDrop or the paid GitHub Pages feature does?

newnomad, for your requirements the Static Site Exporter is exactly what you need. Have you tried it? I think the only issue is a minor one of following relative rather than absolute URLs. It should be easily fixed if you have the inclination, and would save you having to visit every page or faff around with cURL.

If you want to use it and find any bugs, please report them and we'll try to fix it up.

Well, for my scenario (trying to limit command line use, as a designer) I believe the best approach would be Symphony hosted locally, indeed with the (fixed) Static Site Exporter, and with Git for Mac (or another GUI Git app) pushing my static pages to GitHub Pages as a CDN, which is free now. Possibly also push PDFs and big images elsewhere. Search and discussion can be replaced by free Bing search and Disqus: features offloaded to services. Others might find inspiration in those links; Amazon also has a cache service now.
I'll be revisiting SSE again.

Moved to the APC/memcache thread.
