
Currently working on a client's website that is hosted on a server shared with other domains/web services they run. And obviously they are not too happy with the corporate website taking some time to load.

When other services are under full load I have even had to wait over 20 seconds sometimes for some pages to load. While I have asked them to upgrade/separate their servers and give the site some priority, this is bound to take time, more likely months rather than days, and I've been asked to make the website as efficient as possible.

I am already using DS caching on some of the most-used data sources; however I would also like to speed up the XSLT transformation. I cannot really use something like CacheLite because they have dynamic data coming from different feeds and sources, so some pages, such as the homepage, change quite frequently.

I was thinking of something like a partial HTML cacher, where one would be able to, say, cache a section or block of markup, like the footer and header of the page (assuming these do not change), and then replace each block with the cached version.
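Something along these lines, as a very rough sketch (all names made up):

    <?php
    // Sketch only: a tiny fragment cache. Render a block once, store the
    // HTML on disk, and reuse it until the TTL expires.
    function cachedFragment($name, $ttlSeconds, $render)
    {
        $file = sys_get_temp_dir() . '/fragment_' . md5($name) . '.html';

        // Serve the stored copy while it is still fresh.
        if (file_exists($file) && (time() - filemtime($file)) < $ttlSeconds) {
            return file_get_contents($file);
        }

        // Otherwise render the block and store it for next time.
        $html = $render();
        file_put_contents($file, $html);
        return $html;
    }

    // Usage: cache the footer for an hour. renderFooter() is hypothetical.
    echo cachedFragment('footer', 3600, function () {
        return renderFooter();
    });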

I would like to hear some ideas on whether you think this would be a good idea, and if it would have any value to the Symphony community.

When other services are under full load I have even had to wait over 20 seconds sometimes for some pages to load

Yikes!

however I would also like to speed up the XSLT transformation

I think this could be a useful idea, but are you sure that it is your XSLT transformations that are causing slow page loads? Would you mind providing the output of a page ?profile (data sources, page generation times etc) and we'll take a look.

The reason I ask is because I have only ever rarely found XSLT to be the bottleneck. If you're processing many thousands of lines of XML in complex ways, then yes, but on the vast majority of Symphony pages I've found the XSLT to be mighty fast. Tweaking this layer would be micro-optimisation only. Although I haven't seen the structure of your site, I'm going to suggest that it's the data sources that need optimisation and not the XSLT:

  • do you have complex Symphony data sources that take a long time to run (queries from the database)?
  • do you have dynamic XML data sources pulling content from another site, and are these causing the bottlenecks?

If the latter, then you can solve this by pulling the content asynchronously. Dynamic XML data sources grab new content at runtime (when a user loads the page the DS is attached to), and if the origin server takes a long time to respond, then your own page will take a long time too. To prevent this being the case, you can write a small PHP script that grabs and caches this XML to disk, and execute this script via a scheduled cron task. You then load the static, local XML file in your XSLT, and it'll be much faster; the user never has to wait for the content to refresh since it happens asynchronously to your site's page loads.
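A minimal sketch of such a script (the URL and paths are placeholders):

    <?php
    // fetch-feed.php: sketch only. Grab the remote XML and cache it to
    // disk so that page loads never wait on the origin server.
    // Schedule via cron, e.g.: */5 * * * * php /var/www/site/fetch-feed.php
    $url  = 'http://example.com/feed.xml';             // placeholder feed
    $dest = '/var/www/site/workspace/cache/feed.xml';  // placeholder path

    // Give up after 10 seconds rather than hanging the cron job.
    $context = stream_context_create(array('http' => array('timeout' => 10)));
    $xml = file_get_contents($url, false, $context);

    // Only overwrite the cached copy if the fetch succeeded.
    if ($xml !== false) {
        file_put_contents($dest, $xml, LOCK_EX);
    }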

And obviously they are not too happy with the corporate website taking some time to load.

Is the corporate website taking content from the Symphony site? In which case you should implement caching at both ends.

Thanks for the quick reply. This is a sample from when the site is working properly and is pretty fast. They suffer most issues between something like 7-11am, when their website and their other services are under heavy load. Only during that time do these enormous load-time issues occur.

Engine Initialisation   0.0000 s
Page creation process started   0.0754 s
XML Built   1.3312 s
XML Generation  0.1868 s
Page Built  3.7144 s
XSLT Transformation 0.7399 s
Page creation complete  4.5298 s

This page in particular has no external live data; it's just a news section in Symphony. On the other pages that require live data there is an additional 0.7s in DS generation due to a fetch from an external source. Where possible I am trying to cache and avoid the external call, but that's not always possible: this is a financial firm, and on some pages they require data that is live, not cached every few minutes...

The corporate website is their Symphony one. The rest of their websites/services run on plain PHP/Java if I remember correctly, and their hosting department is blaming Symphony while I ask them to upgrade, so I was asked to devise a list of possible improvements... Unfortunately this is one of the few websites I can optimize from my side; the rest I have no control over.

I am assuming it is a server issue, since I once noticed around 10s on "Page creation process started", which I don't think should involve any heavy computation... As for optimization, what I could think of from my end was to have all data sources extend CacheableDatasource while optimizing some of them, as they do have a massive DS that is included on all pages for translation purposes (even though this is already cached using CacheableDatasource). This is what I think could have an impact on the XSLT transformation, due to the large number of translations required and used in-text.

Would you be able to post the profile output of data sources, and your query list too (removing any sensitive data if it exists)?

I remember Allen saying some time ago that he once found a shared host who was throttling MySQL queries. Symphony might require several hundred per page load, and if the database is throttling these, they might get queued up and take longer.

I'm not sure why XSLT transformation takes 0.7s yet an overall page takes 4.5s. Not sure where the bottleneck is precisely...

Yes, sure, here it is:

i18n                                0.0027 s from 0 queries
monthly_archives                    0.0607 s from 1 query
page_hierarchy                      0.0069 s from 0 queries
post                                0.4846 s from 171 queries
post_categories                     0.0185 s from 8 queries
post_category_filter                0.0091 s from 3 queries
post_comment_counts                 0.0624 s from 37 queries
post_id                             0.1534 s from 3 queries
post_single                         0.0393 s from 3 queries
recent_posts                        0.0576 s from 28 queries
recent_posts_market_commentaries    0.2317 s from 46 queries
widgets                             0.1349 s from 19 queries
post_comments                       0.1249 s from 13 queries
signature                           0.0073 s from 2 queries

This is for the same page as the output before. post might be a bit big due to the large number of entries (950+) and a regexp:{$post-handle:.+} filter running on the title to check whether it is empty, since not all posts are translated into all languages. The two largest DSs are cached, hence 0 queries. I am eventually planning on changing all data sources to extend CacheableDatasource, which should make loading time much lower.

I'm also copying an uncached version of the DS profile in case it helps:

i18n                                0.3718 s from 9 queries
monthly_archives                    0.0723 s from 1 query
page_hierarchy                      1.4339 s from 402 queries
post                                0.6572 s from 171 queries
post_categories                     0.0412 s from 8 queries
post_category_filter                0.0130 s from 3 queries
post_comment_counts                 0.0700 s from 37 queries
post_id                             0.1642 s from 3 queries
post_single                         0.0491 s from 3 queries
recent_posts                        0.0812 s from 28 queries
recent_posts_market_commentaries    0.4731 s from 46 queries
widgets                             0.1222 s from 16 queries
post_comments                       0.1219 s from 13 queries
signature                           0.0083 s from 2 queries

A small note: page_hierarchy is the DS that generates all the menu/sitemap links and is attached to all the various page types. It is required since pages are built on top of the homepage using parameters for each language, so that a menu hierarchy can be output; I doubt I can improve that by much...

Would I be correct if I say that the only things I can optimize from my end are the data sources and their efficiency? Plus potentially partial HTML caching (I was not too convinced of the gain from this, but am considering it since I was asked to speed things up a bit). I'd like to point you to the website, however they insisted on not being mentioned.

@Nick I am also considering extending CacheableDatasource because of the infrastructure. I am currently having issues purging the cache, as their host doesn't allow me to purge the cache on the separate servers, so they have to be hit one by one.

My idea is to set a new variable, fetched from the DB, stating the last time the cache was purged, and then use this variable to decide whether or not to clear the local cache. In this way I can purge all three servers at once. This would then allow me to extend the purge to become automatic on entry save for some of the data sources.
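Roughly what I have in mind (table and column names are just made up for illustration):

    <?php
    // Sketch only: decide whether the local file cache is stale by
    // comparing it against a purge timestamp shared between all web
    // servers via MySQL. 'tbl_cache_purge' is a hypothetical table.
    $pdo = new PDO('mysql:host=dbhost;dbname=symphony', 'user', 'pass');

    // Fetch the last global purge time (set from any one server).
    $stmt = $pdo->query("SELECT UNIX_TIMESTAMP(purged_at) FROM tbl_cache_purge WHERE id = 1");
    $lastPurge = (int)$stmt->fetchColumn();

    $cacheFile = '/path/to/cache/post.xml';

    // If the cached file predates the global purge, this server rebuilds it.
    if (!file_exists($cacheFile) || filemtime($cacheFile) < $lastPurge) {
        // ... regenerate the data source output and rewrite $cacheFile ...
    }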

Do you think any of that would make sense? I know I will be adding a DB call when I am trying to reduce them, but it's the only thing I can think of that is shared between the servers. Plus, it's my first experience of such an environment; I usually work on single-server systems...

gunglien, take a look at CloudFlare

It does not make Symphony run any faster, but it does provide you with caching, a CDN, security and all kinds of great stuff. For free! All you need is the ability to change something in the DNS record of your client's domain (the CNAME record for www).

I know I will be adding a DB call when I am trying to reduce them, but it's the only thing I can think of that is shared between the servers.

A simpler solution would be to store the cache in the database anyway, rather than the file system. You could modify the CacheableDatasource class so that it persists the cache into Symphony's own sym_cache table (or maybe just your own custom table). This gets around the problem of multiple application/file servers and means you have a central place from which to purge should you wish.

Granted, it's a database query to retrieve the cache, but the speed difference should be negligible compared to the existing time to read a file from disk. One query is still better than dozens!
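As a very rough sketch of the idea, assuming a custom table (the real CacheableDatasource internals will differ):

    <?php
    // Sketch only: persist data source output in MySQL instead of on
    // disk, so every web server reads the same cache. The table is
    // hypothetical:
    //   CREATE TABLE tbl_ds_cache (
    //       hash CHAR(32) PRIMARY KEY, data MEDIUMTEXT, expires DATETIME
    //   );
    function cacheRead(PDO $pdo, $hash)
    {
        $stmt = $pdo->prepare(
            "SELECT data FROM tbl_ds_cache WHERE hash = ? AND expires > NOW()"
        );
        $stmt->execute(array($hash));
        return $stmt->fetchColumn(); // false on a cache miss
    }

    function cacheWrite(PDO $pdo, $hash, $xml, $ttlSeconds)
    {
        $stmt = $pdo->prepare(
            "REPLACE INTO tbl_ds_cache (hash, data, expires)
             VALUES (?, ?, NOW() + INTERVAL ? SECOND)"
        );
        $stmt->execute(array($hash, $xml, $ttlSeconds));
    }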

From your stats above I think you need to focus on these data sources. Between them they account for ~3s of page load.

i18n                                0.3718 s from 9 queries
page_hierarchy                      1.4339 s from 402 queries
post                                0.6572 s from 171 queries
recent_posts_market_commentaries    0.4731 s from 46 queries

Do they contain XML that you don't need? Do you have the "Include a count of entries in associated sections" checkbox ticked on these data sources when you don't need the value?

402 queries for a navigation is incredibly high, so I would suggest refactoring this. Perhaps rewriting it as a custom query specific to your site. How is your PHP/SQL?
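For illustration only (the field IDs and section ID below are hypothetical; Symphony stores each field in its own sym_entries_data_N table), something in this direction collapses the navigation into a single query:

    <?php
    // Sketch only: fetch every navigation entry in one go instead of
    // letting the data source issue queries per entry. Field 12 (title)
    // and field 13 (parent) are made-up IDs.
    $sql = "
        SELECT e.id AS entry_id,
               title.value  AS title,
               title.handle AS handle,
               parent.relation_id AS parent_id
          FROM sym_entries e
          JOIN sym_entries_data_12 title  ON title.entry_id  = e.id
     LEFT JOIN sym_entries_data_13 parent ON parent.entry_id = e.id
         WHERE e.section_id = 5
    ";
    $rows = Symphony::Database()->fetch($sql);

    // Build the nested menu XML from $rows in PHP afterwards.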

Similarly i18n is only 9 queries but the time is high. Is this a normal section? Any extensions? How many entries? Perhaps MySQL doesn't have table indexes set up properly. That's one avenue to go down.

Have you also checked with the host that MySQL queries are not being throttled?

post might be a bit big due to the large number of entries (950+) and a regexp:{$post-handle:.+} filter running on the title to check whether it is empty, since not all posts are translated into all languages

Is there scope for refactoring this with something more efficient perhaps?

@zimmen unfortunately that's not an option; they have sensitive data that cannot go through something like CloudFlare.

@nick

Granted, i18n and page_hierarchy need some optimizing. However, I do feel that I am partially limited in the gains I can make in both data sources, which is presumably why they were cached by the previous developer.

i18n at the moment contains around 330 entries: a simple two-field section which uses a Multilingual Field on one of the fields, used as translations for template-related text. Most of this is tied to things like the header/footer and widgets that appear all over the various pages; a few others show only on particular pages with certain parameter combinations. This DS takes up a quarter of my page XML when debugging certain pages, talking about 1300 lines of XML. To insert the translated text, the previous developer made use of a template that takes a key and finds the translation in the XML data (hence why I wanted partial HTML caching).

Similarly, page_hierarchy contains over 112 entries with a much more complicated data structure, but it is required on every page to show both the top menu and a sitemap down the bottom of the page. I am looking to reduce this as well; it takes 2000 lines in my XML when I debug, which I do think is a bit too much. But since they were not pressing about site speed before, I did not look into optimizing these, since I didn't create them in the first place.

As regards throttling, I am still waiting for a concrete answer; however, I doubt this is the cause, as the MySQL server should be dedicated (or so they say), though it might still suffer from heavy traffic since it has 3 servers querying it.

I will try to cache in the DB then, if you think it's not that big an impact. I was slightly worried due to the size of the DSs, but I will try to streamline these as much as I can, and will probably have to use some custom queries/filters to reduce the data depending on parameters, as the DSs are attached regardless of the page they are on.

Nick, I'm currently working on the DS; it's nearly all up and running. Cache in DB and flushing the cache from the back-end are done. Now I am trying to flush entries when saved, i.e. if I change an entry it would update all related data sources.

You wouldn't know if it would be possible to check whether a field's value has been changed? It doesn't make sense to flush, e.g., a DS containing titles when only the content was changed.

Hopefully, if this works out properly, I will try to release it as a new extension inspired by Nick & CacheableDatasource, as I don't think it would be a straight upgrade since it changes from file to database caching.

Now I am trying to flush entries when saved, i.e. if I change an entry it would update all related data sources

Take a look at CacheLite, which implements this behaviour. It has a database table storing which entries appear in which page. When an entry is saved, those pages are purged from the cache. It doesn't check per-field, just per-entry; I wouldn't know where to begin checking field values individually!
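The per-entry purge boils down to a delegate callback along these lines (a sketch only; the extension handle and the purgeCacheForEntry() helper are hypothetical):

    <?php
    // Sketch only: purge cached pages when an entry is saved, in the
    // spirit of CacheLite's per-entry behaviour.
    class extension_entry_cache_purge extends Extension
    {
        public function getSubscribedDelegates()
        {
            return array(
                array(
                    'page'     => '/publish/',
                    'delegate' => 'EntryPostEdit', // fires after a save
                    'callback' => 'entrySaved'
                )
            );
        }

        public function entrySaved($context)
        {
            // $context['entry'] is the Entry object that was just saved.
            $entryId = $context['entry']->get('id');

            // Look up which cached pages contain this entry and delete
            // those cache rows. Hypothetical helper.
            purgeCacheForEntry($entryId);
        }
    }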

Ah OK, no big deal to be honest. I have got that part implemented: storing in the DB a set of parameters that a user can enter to match a change, say a page title, and then matching it against the actual value prior to flushing.

I will go through the extension again to make sure it works on my local setup, and then hopefully share my first piece of work with other Symphonists :)

Nick, I've got this one nearly done... I just tried to implement it in some other environments before pushing this extension live, and I noticed some minor issues (not sure I will run into these on live, but I would like to clarify).

On some items I have text fields as titles, and wherever an & is included I get xmlParseEntityRef: no name in Entity. I assume this is because the & is not encoded. But it is strange that the XML seemed valid before enabling the Cacheable extension yet invalid afterwards; if I disable the check in the extension, Symphony also throws an error... any clue on a fix?

Also, would I be allowed to fork your Cacheable extension on GitHub to release a DB-enhanced version?

Hmm, which version of the extension are you using? I think I had the same problem myself and subsequently fixed it in this commit.
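If you're stuck on an older version for any reason, the usual workaround is to escape bare ampersands before the cached string is parsed, something like:

    <?php
    // Sketch only: escape ampersands that don't already begin an entity
    // (&amp;, &#123;, &#x1F; etc.) so the cached fragment parses cleanly.
    $xml = preg_replace('/&(?!(?:[a-zA-Z]+|#\d+|#x[0-9a-fA-F]+);)/', '&amp;', $xml);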

Also, would I be allowed to fork your Cacheable extension on GitHub to release a DB-enhanced version?

It's open source! You can do what you want :-)

Maybe you'd like to include the option of using file or database, so this remains one extension?

Yep, that was the fix... I forgot to check for updates before I started working on it... Since we were only using it on textareas and other stuff before, we never came across the issue in Cacheable.

Hmm, about the file/DB option: I can check if I can re-implement it. I am using 2 tables at the moment, one for caching purposes and another for flushing purposes, where I keep a param list & section ID so I can easily compare on update what has to be flushed. It seems to work OK so far in my tests, but I'm quite sure it can be improved.

I also added a new variable, Lastupdate, which tracks when the file should last have been updated; if the cache is older than this date, the DS is flushed. Over the weekend I will start playing with GitHub, as I am quite new to it, and I'll see if I'll be able to do it the proper way :)

Nick, all is nearly done and the results look quite good, so hopefully soon enough I will find time to play around with GitHub and push the new extension.

I just noticed a minor issue: in some instances, when I have the cache enabled and I try debugging the page, the XML of some data sources appears on a single line. Only a small group of data sources have this strange behaviour, and everything works OK if I flush my DS to get a fresh copy or disable the cache.

I tried to have a look at both my code and the Debug Devkit and couldn't really understand what's going on. Just asking in case you had the same issue. (PS: on Debug Devkit v1.1 I was getting no XML at all; when I updated to v1.2.1 the XML appeared for all DSs, but a couple of them are squeezed onto a single line.)

For sure this seemed to work; paired with a few more changes, page loads now average below 1s for most pages :)

I try debugging the page, the XML of some data sources appears on a single line

This is a bug with the Debug Devkit. A Symphony data source can return either an XMLElement object or a string that represents an XML fragment. The cached data sources return a string (read from the cache file), and Symphony doesn't quite parse this properly.
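In other words, a contrived sketch of the two shapes involved:

    <?php
    // Sketch only: the two shapes a data source result can take.

    // 1. A normal data source builds a tree of XMLElement objects,
    //    which the Devkit can indent and pretty-print:
    $result = new XMLElement('posts');
    $result->appendChild(new XMLElement('entry', 'Hello world'));

    // 2. A cached data source returns the raw string read from the
    //    cache, which passes through as-is; that would explain the
    //    single-line output:
    $result = file_get_contents('/path/to/cache/posts.xml');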

For sure this seemed to work; paired with a few more changes, page loads now average below 1s for most pages :)

Great!

This is a bug with the Debug Devkit. A Symphony data source can return either an XMLElement object or a string that represents an XML fragment. The cached data sources return a string (read from the cache file), and Symphony doesn't quite parse this properly.

Ahh OK, I thought I had messed something up since it was inconsistent :) I will try to play with GitHub later today/tomorrow then, since everything else looks pretty good :)
