“Datasource caching” – Forum Thread – Discuss

- creativedutchmen
- 25 Nov 09, 1:15 am
- Comment #1

I really like the cachelite extension, but for a lot of websites datasource caching would be a better solution.

Is there an extension that does just that, or should I start writing my own?

- creativedutchmen
- 25 Nov 09, 1:20 am
- Comment #2

The way I want to implement this is the following:

If a user adds/edits/removes an entry, the extension sets a flag on the datasource.

Then, when the frontpage (which loads from the datasource) is refreshed, it re-queries the database for the new data, and generates the cache files.
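
A minimal sketch of the flag idea described above, assuming PHP 7.4+ and an in-memory store instead of cache files for brevity. The class `FlaggedCache` and its methods are hypothetical names for illustration; nothing here is part of Symphony.

```php
<?php
// Hypothetical sketch: none of these names exist in Symphony.
final class FlaggedCache
{
    /** @var array<string, bool> data source handle => needs rebuild */
    private array $stale = [];
    /** @var array<string, string> data source handle => cached XML */
    private array $xml = [];

    // Call this when an entry is added, edited or removed.
    public function markStale(string $handle): void
    {
        $this->stale[$handle] = true;
    }

    // Returns the cached XML; runs $query only when no copy exists or the flag is set.
    public function get(string $handle, callable $query): string
    {
        if (!isset($this->xml[$handle]) || !empty($this->stale[$handle])) {
            $this->xml[$handle] = $query();
            $this->stale[$handle] = false;
        }
        return $this->xml[$handle];
    }
}

$runs = 0;
$query = function () use (&$runs): string {
    $runs++;
    return "<articles build=\"$runs\"/>";
};

$cache = new FlaggedCache();
$first = $cache->get('articles', $query);  // no cached copy yet, so the query runs
$second = $cache->get('articles', $query); // served from the cache
$cache->markStale('articles');             // an entry was edited
$third = $cache->get('articles', $query);  // flag is set, so the query runs again
```

The database is only queried on the first request after a change, which is the behaviour the post asks for. Random ordering and linked sections are not addressed by this sketch.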

I don’t yet know how to handle random orders and linked sections. Any ideas?

- nickdunn
- 25 Nov 09, 1:58 am
- Comment #3

We’ve implemented Data Source caching in Symphony on several occasions (fragment caching) but abandoned the idea just short of building it into the core. There isn’t sufficient access to Data Sources via delegates to make a good job of this (which is fair enough, delegates are more for UI workflow than system/data workflow). As such, each site has had slightly different requirements and I’ve ended up customising the core.

Our basic working was:

- added a dsParamCACHE member to a Data Source, which holds the cache frequency in seconds
- added a text field to the DS Editor to enter in this time and save it to the generated DS PHP file
- every time a DS is executed, check to see whether it has expired or not (by checking the last modified date of its cached file)
- if it has expired, run the DS and cache the XML output to a static .xml file in the /manifest/cache directory (use a hash of the public properties of the DS as a unique file name)
- if it has not expired, serve the static XML file instead of running the Data Source
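
The steps above can be sketched roughly as follows. This is not the code from the Pastie linked further down; `cachePathFor()` and `fetchCached()` are hypothetical helper names, the data source is a plain object with illustrative properties, and a temporary directory stands in for /manifest/cache.

```php
<?php
// Hypothetical helpers for illustration; not Symphony core code.

// Builds a cache path from a hash of the data source's public properties.
function cachePathFor(object $ds, string $dir): string
{
    // Called from outside the class, get_object_vars() sees only public properties.
    $props = get_object_vars($ds);
    ksort($props);
    return rtrim($dir, '/') . '/' . md5(get_class($ds) . serialize($props)) . '.xml';
}

// Serves the cached file while it is younger than $ttl seconds;
// otherwise runs the data source via $run and rewrites the file.
function fetchCached(object $ds, string $dir, int $ttl, callable $run): string
{
    $path = cachePathFor($ds, $dir);
    if (is_file($path) && (time() - filemtime($path)) < $ttl) {
        return file_get_contents($path);
    }
    $xml = $run($ds);
    file_put_contents($path, $xml, LOCK_EX);
    return $xml;
}

// Stand-in data source: a plain object with illustrative public properties.
$ds = new stdClass();
$ds->dsParamLIMIT = '20';
$ds->dsParamSTARTPAGE = '1';

$dir = sys_get_temp_dir() . '/ds-cache-' . uniqid();
mkdir($dir);

$calls = 0;
$run = function (object $ds) use (&$calls): string {
    $calls++;
    return '<entries page="' . $ds->dsParamSTARTPAGE . '"/>';
};

$a = fetchCached($ds, $dir, 60, $run); // miss: runs the data source
$b = fetchCached($ds, $dir, 60, $run); // hit: served from the file
$ds->dsParamSTARTPAGE = '2';
$c = fetchCached($ds, $dir, 60, $run); // different properties, so a second file
$fileCount = count(glob($dir . '/*.xml'));
```

Because the file name is derived from the property values, every distinct combination of parameters produces its own cache file.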

It worked relatively well until you think of the variations a DS can have. Imagine a Data Source that allows for dynamic sorting and paging of entries. Every single combination of parameters would result in a new cache file, which could run into the thousands.

How about dependencies between Data Sources? If you are chaining DSs using Output Parameters, if a “parent” DS providing an Output Param changes, how do you trigger the update of all “children”? And if the DS provides Output Params itself, you would need to cache these parameters along with the XML output, and push them back into the parameter pool at runtime. We kind of solved this last problem (putting the Output Params back into the pool) but I’m not 100% convinced of its implementation.
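
One way to picture the last point (caching the Output Params with the XML and restoring them later) is sketched below. The JSON envelope and the functions `writeEnvelope()` and `readEnvelope()` are assumptions made for illustration, not the implementation described in this post.

```php
<?php
// Hypothetical sketch; the envelope format and function names are assumptions.

// Stores the XML together with the output parameters the data source produced.
function writeEnvelope(string $path, string $xml, array $outputParams): void
{
    $payload = json_encode(['xml' => $xml, 'params' => $outputParams]);
    file_put_contents($path, $payload, LOCK_EX);
}

// Reads the envelope, pushes the saved parameters back into the pool
// and returns the cached XML.
function readEnvelope(string $path, array &$paramPool): string
{
    $data = json_decode(file_get_contents($path), true);
    $paramPool = array_merge($paramPool, $data['params']);
    return $data['xml'];
}

$path = tempnam(sys_get_temp_dir(), 'ds');
writeEnvelope($path, '<authors/>', ['ds-authors' => ['3', '7']]);

$pool = ['today' => '2009-11-25'];
$xml = readEnvelope($path, $pool);
unlink($path);
```

This restores the parameters for "child" Data Sources on a cache hit, but it does nothing about invalidating the children when the parent changes.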

Here are two functions that were added to class.datasource.php: http://pastie.org/712879

We then modified the __processDatasources function inside class.frontendpage.php. It contains a loop that calls grab() on each DS; change this to call fetch() instead. fetch() is one of the functions added with the code above: it checks the cache and calls the original grab() only if the cache has expired.

In a DS that implements caching we would add:

public $dsParamCACHE = '1440';

What about several Data Sources that select entries from the same section? Your delegate callback will need to modify one or many Data Sources when content is edited.

More recently I have been modifying Max’s CacheLite extension (page output caching) to implement more closely what you are looking for — as each page is cached, it stores a reference of which entries and sections are used to compile the page. Using a delegate call for the backend, and an Event filter for the frontend, the cache of specific pages can be expired when an entry is updated. This should be released in a few days.

I suppose the same method would work for fragment caching.

But we abandoned the idea because at every turn there would be a new use-case such that our implementation would be flawed. So we roll our own solution as and when we need it on client sites. I was talking to Alistair about this only today and he came to the same conclusion…

But don’t let it deter you!

In fact, this could be made a little easier by adding two delegates to the Data Source execution procedure: one before a DS runs (to check the cache) and another after it has run (to get the XML result). This only solves the problem of not having to modify the core, but not the numerous other problems we encountered with randomising/dependencies/refreshing etc.
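
A rough sketch of what such a pair of hooks could look like, assuming PHP 7.4+. The `DataSourceRunner` class and the hook names `before` and `after` are hypothetical; they are not existing Symphony delegates.

```php
<?php
// Hypothetical sketch: the hook names and this runner are illustrative only.
final class DataSourceRunner
{
    /** @var array<string, callable[]> */
    private array $listeners = ['before' => [], 'after' => []];

    public function on(string $hook, callable $listener): void
    {
        $this->listeners[$hook][] = $listener;
    }

    public function run(string $handle, callable $grab): string
    {
        // "before" hook: the first listener that returns a string short-circuits the run.
        foreach ($this->listeners['before'] as $listener) {
            $cached = $listener($handle);
            if (is_string($cached)) {
                return $cached;
            }
        }
        $xml = $grab();
        // "after" hook: listeners receive the fresh XML, e.g. to store it.
        foreach ($this->listeners['after'] as $listener) {
            $listener($handle, $xml);
        }
        return $xml;
    }
}

// A caching extension would register one listener on each hook.
$store = [];
$runner = new DataSourceRunner();
$runner->on('before', function (string $handle) use (&$store) {
    return $store[$handle] ?? null;
});
$runner->on('after', function (string $handle, string $xml) use (&$store): void {
    $store[$handle] = $xml;
});

$grabs = 0;
$grab = function () use (&$grabs): string {
    $grabs++;
    return '<news/>';
};
$first = $runner->run('news', $grab);  // runs the data source and stores the result
$second = $runner->run('news', $grab); // answered by the "before" listener
```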

Good luck :-)

- nickdunn
- 25 Nov 09, 2:05 am
- Comment #4

I also created a tool for managing the cache. It listed the Data Sources that were cached and how many instances of each existed (e.g. a DS with pagination that spreads over three pages would create three instances), and it let you purge the cache manually.

Cache Purger

I want to re-write and release this extension as a way to manage the Symphony cache in general:

- cached Dynamic XML Data Sources
- cached images from the JIT extension
- cached pages from the CacheLite extension
- [insert your fragment caching extension here too…]

- creativedutchmen
- 25 Nov 09, 2:28 am
- Comment #5

Nick, you are on fire!

I have already made a little mind-map on the subject, and this is what I came up with:

In the cases I met before, there were three different types of caching involved:

1. Not at all. A site search, for instance, would not need any caching.
2. Cached, but not quite. All the examples you mentioned (sorting, linked sources, pagination, filters) would fall under this category. In the projects I have done in the past, this was mainly solved by giving the datasource filter capabilities on a cached data object.
3. Cached all the way. Previous output from a DS is stored as an XML file, and can be used over and over again.

Of course, 2 is the one where the trouble begins..

What kind of problems do you see with this approach (apart from the core modifications it would take)?
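
To make the second category concrete, here is a small sketch of "filter capabilities on a cached data object": one cached copy of the entries, with sorting and paging applied in PHP rather than in the query. The function `sliceCached()` and the sample entries are made up for illustration.

```php
<?php
// Hypothetical sketch: sorting and paging are applied to one cached array of entries.
function sliceCached(array $entries, string $sortField, string $direction, int $page, int $perPage): array
{
    usort($entries, function (array $a, array $b) use ($sortField, $direction): int {
        $cmp = $a[$sortField] <=> $b[$sortField];
        return $direction === 'desc' ? -$cmp : $cmp;
    });
    return array_slice($entries, ($page - 1) * $perPage, $perPage);
}

// Stand-in for the cached data object (sample data).
$cached = [
    ['id' => 1, 'title' => 'Beta',  'date' => '2009-11-20'],
    ['id' => 2, 'title' => 'Alpha', 'date' => '2009-11-25'],
    ['id' => 3, 'title' => 'Gamma', 'date' => '2009-11-22'],
];

$newestFirst = sliceCached($cached, 'date', 'desc', 1, 2);   // first page, newest first
$byTitlePage2 = sliceCached($cached, 'title', 'asc', 2, 2);  // second page, by title
```

Only one cache entry is needed however many sort and page combinations are requested, at the cost of holding the whole set in PHP memory.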

- creativedutchmen
- 25 Nov 09, 2:30 am
- Comment #6

Oh, and I generally dislike the idea of a time-controlled cache for the usual reasons..

- creativedutchmen
- 25 Nov 09, 2:54 am
- Comment #7

In response to my own posts:

This only works if the getting of the data is the most time-consuming.

It assumes the datasets are small enough for php to handle without any problems. (If the database contains hundreds of raw images the allocated memory to php would soon be used-up..)

Also, the steps above will likely speed up the datasource, but there is a possibility they are slower! (MySQL was built for data + logic, PHP was not).

- nickdunn
- 25 Nov 09, 3:34 am
- Comment #8

It assumes the datasets are small enough for php to handle without any problems. (If the database contains hundreds of raw images the allocated memory to php would soon be used-up..)

One of Symphony’s hogs is the use of its XMLElement class instead of a native DOMDocument object to build its XML. This increases the memory footprint somewhat. I’d be concerned that storing these in memory might rapidly eat up resources?

I’ve done some work for the forthcoming 2.1 release with query optimisation, keeping the number of database queries to a minimum. I found I had some Data Sources with huge query counts (several thousand) but after the query optimisations this was reduced to fewer than one hundred.

Reading from static files performed extremely well. The file system was very quick indeed, and Data Sources that previously took 2–5 seconds (when the server was receiving a spike) would be instant from static XML.

Are there specific examples where you’ve found Data Source bottlenecks and found caching to be a necessity?

- ashooner
- 25 Nov 09, 5:48 am
- Comment #9

One of Symphony’s hogs is the use of its XMLElement class instead of a native DOMDocument object to build its XML.

I always thought that was a little funky, but I didn’t want to say anything. Does 2.1 still use it?

- creativedutchmen
- 25 Nov 09, 9:27 am
- Comment #10

Are there specific examples where you’ve found Data Source bottlenecks and found caching to be a necessity?

Not really, but for an upcoming project I am afraid I might.. The website is estimated to peak at 50,000 hits per minute, and the hosting budget isn’t really huge.

To continue the discussion on caching: what do you think is more usable: fragment caching, or datasource caching? (Not really sure how to implement fragment caching efficiently, but anyway)

- nickdunn
- 25 Nov 09, 9:37 am
- Comment #11

I consider DS caching and fragment caching to be the same thing — caching a fragment of a page. Where possible I would favour page output caching over fragment caching, since it gives the biggest performance gain. You don’t need to configure the cache frequency of each DS (fragment), since the entire page is handled as one and no DSs run at all. With fragment caching you may choose to cache only certain Data Sources, leaving others to run (which may suffer under load).

I used to run a CSS Gallery aggregation site (Classic ASP and Microsoft Access database!) to which I added page output caching and it survived Slashdot and Digg running on a cheap shared Windows host.

I favour page output caching because it’s easier to debug. I can kill the cache for an entire page and instantly see the result, without needing to debug which DS my content is coming from.

However we needed fragment caching in the past because the sites have been dynamic in that they allow user logins. At the top of each page the user’s name is shown “Hi Nick | Log out”, which meant that page output caching would not work. Fragment caching of the biggest Data Sources was the only option. Using ?profile I could figure out which Data Sources were using the most queries and taking the longest to execute.

Have you considered server-level caching, such as Apache’s mod_cache, or an in-memory store such as memcached?

- nickdunn
- 25 Nov 09, 9:39 am
- Comment #12

(XMLElement) I always thought that was a little funky, but I didn’t want to say anything. Does 2.1 still use it?

Symphony 2.1 will still use XMLElement because of the complexities of changing it.

- creativedutchmen
- 25 Nov 09, 11:50 pm
- Comment #13

I consider DS caching and fragment caching to be the same thing — caching a fragment of a page.

Hmmm. I would say fragment caching caches a part of the output (so (x)html, or whatever format you output) while DS caching caches the XML used to create that output.

The biggest benefit of fragment caching vs DS caching is that it could be even faster (it skips the XSLT part). The biggest downside of fragment caching is the pain of implementing it. In every single template file, parts have to be identified as cacheable or uncacheable. (The step from no caching to fragment caching is a big one)

Since with most of the websites I build, the loading time issue arises after I finish the website, my preferred method would be to cache DS’es.

For something completely different: Are the datasource classes going to be rewritten in the new version in any way? (I really dislike the include in the middle of the ds..) If not, would you mind reading my proposal on the rewrite?

- nickdunn
- 26 Nov 09, 5:44 pm
- Comment #14

Ta for the clarification on fragment vs. DS caching. I had used the terms interchangeably but it’s a sensible distinction.

Fragment caching (of HTML blocks) would be incredibly difficult to achieve! Not sure I’d ever want to get into that one ;-)

But DS caching is still an attainable goal I think. I’ve been thinking some more, and it’s probably doable without modifying the core at all. You’d have to modify the Data Source itself (thereby rendering it un-editable by the Data Source Editor) but that’s not the end of the world. I’ll see if I can get a basic proof of concept working as there are a few ideas swimming around my head.

For something completely different: Are the datasource classes going to be rewritten in the new version in any way? (I really dislike the include in the middle of the ds..) If not, would you mind reading my proposal on the rewrite?

I believe not. With Symphony 2 (and the forthcoming 2.1) most changes need to remain backwards compatible, so no drastic changes to the innards can be made. I would imagine Symphony 3 would be a complete core rewrite so the concept of DSs might change entirely. But I think in S2 this might be easier said than done.

I recall your post but couldn’t find the thread again. Could you link to it here?

- creativedutchmen
- 26 Nov 09, 7:17 pm
- Comment #15

I recall your post but couldn’t find the thread again. Could you link to it here?

I made a few comments scattered around the forum, but none of them really explained my solution.

Every ds at this moment extends a parent class (Datasource). However, the grab function itself is not extended in an OOP way (the actual code is included from a file).

What I would suggest is putting that code in the class.datasource.php file, then extend that, and call the function using parent::grab($param_pool);

Because there are multiple possible parents (authors, sections etc), a datasource should possibly extend a more dedicated datasource (authors, etc) which extends the core datasource.

If all of this doesn’t make any sense, I’ll fork symphony and give an example.
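
The proposal above could look roughly like the sketch below, assuming PHP 7.4+. The classes `CoreDatasource`, `SectionDatasource` and `ArticlesDatasource` are simplified stand-ins invented for illustration; they are not Symphony's real classes and the XML they build is a toy.

```php
<?php
// Hypothetical sketch: simplified stand-ins, not Symphony's real classes.
class CoreDatasource
{
    protected string $root = 'data';

    // Behaviour common to every data source lives in the core parent.
    public function grab(array &$paramPool): string
    {
        return '<' . $this->root . '>' . $this->entries($paramPool) . '</' . $this->root . '>';
    }

    protected function entries(array &$paramPool): string
    {
        return '';
    }
}

// Dedicated parent holding the section logic that is currently pulled in with an include.
class SectionDatasource extends CoreDatasource
{
    protected array $rows = [];

    protected function entries(array &$paramPool): string
    {
        $xml = '';
        foreach ($this->rows as $id => $title) {
            $xml .= '<entry id="' . $id . '">' . htmlspecialchars($title) . '</entry>';
        }
        return $xml;
    }
}

// A concrete data source sets its configuration and wraps the parent call.
class ArticlesDatasource extends SectionDatasource
{
    protected string $root = 'articles';
    protected array $rows = [12 => 'Hello', 13 => 'Caching & you'];

    public function grab(array &$paramPool): string
    {
        $xml = parent::grab($paramPool);
        $paramPool['ds-articles'] = array_keys($this->rows);
        return $xml;
    }
}

$pool = [];
$xml = (new ArticlesDatasource())->grab($pool);
```

With grab() defined on the parents, a caching layer could override it in one place instead of patching each generated file.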

- nickdunn
- 26 Nov 09, 7:35 pm
- Comment #16

Yep, all makes perfect sense. I’ve found this to be pretty messy too. I imagine it’s a legacy thing that the guys simply never had the chance to refactor.

Back on the topic of DS caching, I’ve implemented a proof of concept by making cached DSs extend a new CacheableDatasource class.

http://gist.github.com/243364

Then to make a DS cacheable, edit the specific DS PHP file in the following ways:

- require the CacheableDatasource file, e.g. require_once(WORKSPACE . '/class.cacheabledatasource.php'); (depending on where you saved the class)
- make your DS extend CacheableDatasource instead of Datasource
- add a dsParamCACHE property with the timeout in minutes e.g. public $dsParamCACHE = '10';
- remove the grab() function from your DS
- set the allowEditorToParse() function to return false (so you can no longer edit the DS through the Data Source Editor, thereby overwriting your customisation)

Credit to yourheropaul for most of the code which I’ve refactored and simplified from the Pastie a few posts above.

There’s no functionality for purging the cache when entries are updated, but I’ve written that logic for the updated CacheLite extension so it could be lifted. Essentially you’ll need a database table to store a row for each cached DS storing the filename (hash of the DS instance), its section ID and the entry IDs it returns.
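
As a rough illustration of that bookkeeping, the sketch below uses an in-memory array in place of the database table (PHP 7.4+). The `CacheIndex` class, its methods and the sample file names are hypothetical, and purging by section when a new entry is created is an assumption rather than something stated in this thread.

```php
<?php
// Hypothetical sketch: an array stands in for the database table; all names are assumptions.
final class CacheIndex
{
    /** @var array<string, array{section: int, entries: int[]}> filename => row */
    private array $rows = [];

    // One row per cached DS instance: its file name, section ID and returned entry IDs.
    public function record(string $filename, int $sectionId, array $entryIds): void
    {
        $this->rows[$filename] = ['section' => $sectionId, 'entries' => $entryIds];
    }

    // Cache files to purge when an existing entry is edited or deleted.
    public function filesForEntry(int $entryId): array
    {
        return array_keys(array_filter($this->rows, function (array $row) use ($entryId): bool {
            return in_array($entryId, $row['entries'], true);
        }));
    }

    // Cache files to purge when a new entry is created in a section (assumed policy).
    public function filesForSection(int $sectionId): array
    {
        return array_keys(array_filter($this->rows, function (array $row) use ($sectionId): bool {
            return $row['section'] === $sectionId;
        }));
    }
}

$index = new CacheIndex();
$index->record('a1f3.xml', 4, [10, 11]);
$index->record('9c2e.xml', 4, [11, 12]);
$index->record('77b0.xml', 5, [30]);

$afterEdit = $index->filesForEntry(11);     // both section 4 caches contain entry 11
$afterCreate = $index->filesForSection(5);  // only the section 5 cache
```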

I believe Symphony 2.1 will add multiple output parameters from a single DS (so you can choose several fields to output, not just a single field) so my proof of concept will likely break. When 2.1 is out perhaps we can extend this further for a more bulletproof implementation?

- creativedutchmen
- 26 Nov 09, 7:46 pm
- Comment #17

Yep, all makes perfect sense. I’ve found this to be pretty messy too. I imagine it’s a legacy thing that the guys simply never had the chance to refactor.

Yeah, I guess so.. If I were to change that bit, how would I submit my changes? Should I fork symphony, then make the changes, push my changes and submit a pull request, or is there a better way?

When 2.1 is out perhaps we can extend this further for a more bulletproof implementation?

Sounds brilliant!

I would really have to dive into your code to see where this could possibly go wrong, but it looks promising!

- nickdunn
- 26 Nov 09, 8:41 pm
- Comment #18

Yeah, I guess so.. If I were to change that bit, how would I submit my changes? Should I fork symphony, then make the changes, push my changes and submit a pull request, or is there a better way?

Yep, through Github is the best way. But I would recommend holding off until Symphony 2.1 is released, since Alistair has done work on Output Parameters I believe; so it’d be best to get 2.1 out and then tweak things. There’s a danger of cramming too much into a single release.

but it looks promising!

Please do take a look and give it a try. We’ve used versions of the above code on some large sites and it’s worked very well indeed.

- creativedutchmen
- 26 Nov 09, 8:47 pm
- Comment #19

But I would recommend holding off until Symphony 2.1 is released

Darn, I just finished it.. (mostly)

Please do take a look and give it a try. We’ve used versions of the above code on some large sites and it’s worked very well indeed.

Will do! I’ll keep you posted on my findings!

- creativedutchmen
- 26 Nov 09, 9:31 pm
- Comment #20

Most of the changes to the DS I made were just pasting the class headers to the file, and replacing the function calls with $this function calls.

The only hard one is the static XML, as it basically pastes the xml in the function call.

As the changes I made are relatively small (and easy to do as well), I wouldn’t mind doing them again for v2.1
