
Hi guys,

These last few weeks I've been working on a datasource caching extension, and once I was relatively happy with it I tried it on one of my clients' production servers, as it was providing very good performance; the website is relatively large and complex.

The extension is similar to Nick Dunn's Cacheable Datasource extension, however instead of saving to file it saves to the database, due to some requirements on our side. I had opted to insert my data into sym_cache, as this was also suggested by Nick, since that table is checked and optimized by Symphony itself.

However I ran into a bit of a problem. One of the (external) datasources is quite big, and we had set the cache to an indefinite period to gain as much as possible; this had led our cache to grow to nearly 1GB in size, comprised of around 80,000 entries. Earlier this week Symphony decided to optimize this table, locking it for quite some time; after roughly 15 minutes we had to restart the db and empty the table.

If I had to store this into my own cache table, would I still risk having my table optimized? (PS: the cache had been running fine for around 2 weeks and I was about to package it for public use...) but I think I should resolve this possible issue before going public :)

Also, how often does optimize run on sym_cache? I'd like to know whether something happened to my db that made it break, or whether it was indeed this optimize that locked my db and caused everyone to get an out-of-connections error.

> Also, how often does optimize run on sym_cache? I'd like to know whether something happened to my db that made it break, or whether it was indeed this optimize that locked my db and caused everyone to get an out-of-connections error.

The clean() function (which calls optimise()) is called when the Cacheable->check($hash) function fails to find a valid cache object. So it really depends on how long your Cache objects are set to live before expiry, and how often you are calling check().

It is possible that the OPTIMIZE caused the lock, and this is partly due to the fact that sym_cache is a MyISAM table, which locks the entire table to perform operations. In your case, 80,000 rows is going to be a bit of a headache ;). You could try (and this could perhaps be considered a Symphony bug) swapping the table type to InnoDB, which supports row-level locking. The table size will be slightly larger, but hopefully it will prevent the issue from occurring again (you can test just by running OPTIMIZE TABLE sym_cache; in your favourite SQL IDE).
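If you want to give the engine swap a go, it's just a couple of statements (assuming your install uses the default sym_ table prefix):

```sql
-- Switch the cache table from MyISAM to InnoDB (row-level locking),
-- then re-run the optimize that previously locked the table
-- (on InnoDB this maps to a rebuild + analyze).
ALTER TABLE `sym_cache` ENGINE = InnoDB;
OPTIMIZE TABLE `sym_cache`;
```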

If you went with your own table route, depending on what values you are actually storing, switching to MEMORY tables might result in even better performance. A MEMORY table loses all data when the server is rebooted, but as the data is stored in RAM it has pretty quick access times. It doesn't support TEXT columns, which is why you can't swap over sym_cache to use it (and why I mentioned 'it depends what you are storing').
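For illustration only, a MEMORY-backed cache table could look something like this (the columns here are made up, and the data column has to be a sized VARCHAR/VARBINARY because MEMORY can't hold TEXT or BLOB):

```sql
-- Hypothetical in-memory cache table; contents vanish when the server restarts.
CREATE TABLE `sym_cache_memory` (
  `hash`     VARCHAR(32)     NOT NULL,  -- e.g. md5 of the cached resource
  `creation` INT             NOT NULL,  -- UNIX timestamp
  `expiry`   INT             NOT NULL,  -- UNIX timestamp
  `data`     VARBINARY(8192) NOT NULL,  -- sized column, since MEMORY forbids TEXT/BLOB
  PRIMARY KEY (`hash`)
) ENGINE = MEMORY;
```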

This probably isn't the place, but I've done lots of testing and playing around with making Symphony use InnoDB tables (mainly for FK support) and it's something I can see the core tables moving to in the (near?) future.

Hope that helps and please let me know how you go!

Looking through the Cacheable class, the table is optimised every time there is a cache "miss", i.e. when a request for a cached object finds stale data.

You're right that optimising a 1GB table is going to be a big process. If 80,000 rows amount to 1GB, that's roughly 12-13KB per row. Do you need to store this amount of data, or can you process what you're caching first and only store exactly what you need?

> If I had to store this into my own cache table, would I still risk having my table optimized?

No, not if you weren't using the Cacheable class mentioned above.

> This probably isn't the place, but I've done lots of testing and playing around with making Symphony use InnoDB tables (mainly for FK support) and it's something I can see the core tables moving to in the (near?) future.

Would be great to have InnoDB support!

@nick 1GB is really a lot of data - however, that was the size if I calculated properly. I've since reduced the amount of data I was caching in relation to the archives, as some of the data we were getting from the feed was so inefficient that some pages were crashing because the data was too big to store. So the size should now be smaller than that. I'm also working on optimizing this data, and I think the XML output will be reduced by around half.

However I estimate that this external data can produce over 120k entries, albeit of a smaller size (roughly 20-40k each), plus an additional 200 entries/day, so the cache size will indeed continue to grow over time.

I am setting my cache as indefinite, so I am probably not calling the clean function since the data is always there. I also took the precaution of writing directly to the database instead of using the Cacheable class, to skip the clean for now. However I assume some other things still use Cacheable, so optimize will eventually run and lock my tables again.

@brendo I am not too sure about doing an ALTER TABLE to switch the engine on installation of the extension; if it had to go public I don't know if everyone would be happy with being forced to change tables. But it could be a feasible option. If I had to store this data in a separate table (which would make more sense for flushing purposes, so I can flush by datasource type etc.), I would imagine compression and decompression would still be suggested, in the way the Cacheable class does using gzip.
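Just to illustrate what I mean by compressing: if it happened at the database layer instead of in PHP it would be something like MySQL's built-in COMPRESS()/UNCOMPRESS() (the table and column names here are only placeholders, not what the Cacheable class actually uses):

```sql
-- Store the XML compressed, read it back decompressed.
INSERT INTO `sym_ds_cache` (`hash`, `data`, `expiry`)
VALUES ('abc123', COMPRESS('<data>...</data>'), UNIX_TIMESTAMP() + 86400);

SELECT UNCOMPRESS(`data`) AS `xml`
FROM `sym_ds_cache`
WHERE `hash` = 'abc123';
```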

Memory tables would be cool actually, but I guess I would have to convince the others to move this website to a dedicated server, which will not be that easy. However I think if I build an optimized table for this external data and have it parsed using a cron, it should be just as fast.

Most probably I will go with MyISAM for a quick solution then, as I have some holidays ahead and don't feel like having them interrupted to fix the issue while I am away :) Thanks for the feedback; I will post results.

Just another thing - for the extension itself, if made public, would it make sense to just store everything, or would you prefer something like storing only the most-used data in the cache? Have some fancy computation calculating which data is used most etc. and removing the other entries from the cache/db (e.g. keeping the max size to no more than 5-10,000 entries)?

It sounds like what you're after is more of a post-processed store of data rather than a cache. A cache normally means that it's ok if the cache is cleared; it will just be regenerated (hence why MEMORY tables are useful in a caching situation, because it doesn't matter if the server restarts, the cache data will just be refilled).

> I am setting my cache as indefinite, so I am probably not calling the clean function since the data is always there. I also took the precaution of writing directly to the database instead of using the Cacheable class, to skip the clean for now. However I assume some other things still use Cacheable, so optimize will eventually run and lock my tables again.

Nah, from a project search, the Cacheable class is the only thing that will fire the optimize function.

> Have some fancy computation calculating which data is used most etc. and removing the other entries from the cache/db (e.g. keeping the max size to no more than 5-10,000 entries)?

It's interesting you say that, because if the data had expiry times, this would happen automatically over time. As data expires, it will trigger a miss, causing the clean() & optimise() functions to run. The clean() function goes through the database and removes all the data that has also expired, so this only keeps the data that is frequently called in the cache. Of course it also means that you are generating the data more often.
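In SQL terms the clean-up boils down to something like the following (I'm going from memory on the sym_cache columns, so treat it as a sketch rather than the exact query the class runs):

```sql
-- Remove every entry whose expiry time has passed, then reclaim the space.
DELETE FROM `sym_cache`
WHERE `expiry` IS NOT NULL
  AND `expiry` > 0
  AND `expiry` < UNIX_TIMESTAMP();

OPTIMIZE TABLE `sym_cache`;
```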

> I am not too sure about doing an ALTER TABLE to switch the engine on installation of the extension; if it had to go public I don't know if everyone would be happy with being forced to change tables.

True, which is why I said it's potentially a Symphony bug, as we may have used a less optimal table type. InnoDB doesn't seem to be the table of choice when the majority of operations are reads, although apparently newer versions of MySQL close the performance gap. The reason I suggested it was that it'd be interesting to see if the table change prevented the locking issue; then we could look at the performance/storage implications.

This whole discussion is interesting, especially with the large amount of data you are working with!

> It's interesting you say that, because if the data had expiry times, this would happen automatically over time. As data expires, it will trigger a miss, causing the clean() & optimise() functions to run. The clean() function goes through the database and removes all the data that has also expired, so this only keeps the data that is frequently called in the cache. Of course it also means that you are generating the data more often.

Thing is that, e.g., a 1-day expiry does not really make sense, because having over 120k entries of data clearing up on a daily basis does not make sense, plus getting the data from the feed takes an estimated 0.7s (which makes the site a bit slower, to be honest). Maybe if I keep a table that calculates/estimates requests/day, I would be able to keep a more optimized cache, where requests which happen less than, say, 5 times a day are never cached or eventually removed, whilst other stuff stays cached; it would still appear pretty fast unless you access one of the items which is less often used. (PS: this data does not change, which is why it's stored forever, so to say.)

> It sounds like what you're after is more of a post-processed store of data rather than a cache. A cache normally means that it's ok if the cache is cleared; it will just be regenerated (hence why MEMORY tables are useful in a caching situation, because it doesn't matter if the server restarts, the cache data will just be refilled).

I am in fact working on importing this external data into a custom Symphony table using a cron and some custom datasources, but it's not an overnight process (testing etc.), and I used the cache as an initial solution which obviously needs more work. (PS: we needed the cache for some other large Symphony datasources too.) I had no clue that this external datasource would grow so big... especially as it's in 3 languages at this point, so everything expands a bit more than possibly intended. I also have no problem if the cache is cleared, which is why I created a cache :) Thing is, I was asked to obtain maximum juice/performance out of it, so I probably tried to take too much out of it by pushing everything into the same cache.

The new datasource reductions I am working on would reduce the XML output to about half of the current size, but I am interested in how you guys would cache such massive amounts of data. It's my first really big job using Symphony and I've been charged with the website speedup, so I've been doing quite a lot of optimizations and this is one of them. Hopefully next week I will put the code of this extension up on Symphony for scrutiny. From the feedback, I think it would be best to go for a separate table rather than sym_cache, and maybe have an option in the back-end to run optimize if required.

> Maybe if I keep a table that calculates/estimates requests/day, I would be able to keep a more optimized cache, where requests which happen less than, say, 5 times a day are never cached or eventually removed.

Perhaps this logic could figure out 'hot' cache requests and increase their expiry times. All data would start with a 24-hour expiry, and then the more it's requested (per your algorithm), its expiry time is increased exponentially?
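As a very rough sketch of that idea, with a hypothetical hits column on your cache table (the doubling factor and the 30-day cap are completely arbitrary):

```sql
-- On every cache hit, extend the expiry exponentially (capped at 30 days)
-- and bump the counter, so 'hot' entries stay cached longer and cold ones expire.
UPDATE `sym_ds_cache`
SET `expiry` = UNIX_TIMESTAMP() + LEAST(86400 * POW(2, `hits`), 86400 * 30),
    `hits`   = `hits` + 1
WHERE `hash` = 'abc123';
```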

When you say 120k entries, is that a single datasource that displays that to the screen? Or just that, through the course of navigating the site, up to 120k entries are created in the cache table?

> The new datasource reductions I am working on would reduce the XML output to about half of the current size, but I am interested in how you guys would cache such massive amounts of data.

How does it reduce the XML output by half?

For that amount of data I'd be looking into the APC/Memcache/Redis arenas, especially if instant performance is the goal. If you absolutely have to use the database, I would bypass the Symphony abstraction and talk directly via mysql/mysqli functions, which I'd anticipate would have better performance and use less memory to generate pages.

I'm looking forward to seeing your extension code :)

Hmm, the idea of incremental cache time is not bad at all; I would have to find a way to incorporate it with the other functionality. Currently I have options for either limited-time cache, unlimited cache or time-based expiry, so adding an option for optimal performance should be possible.

For the 'hot' feature I could possibly do something along the lines of total requests / weekly requests / monthly requests, which would probably align the data in some decent order.
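Something as simple as a separate stats table keyed by hash and day would probably do for that bookkeeping (names are just placeholders at this point):

```sql
-- One row per cached item per day; summing over 7 or 30 days gives the
-- weekly/monthly figures used to decide what stays in the cache.
CREATE TABLE `sym_ds_cache_stats` (
  `hash`     VARCHAR(32) NOT NULL,
  `day`      DATE        NOT NULL,
  `requests` INT         NOT NULL DEFAULT 0,
  PRIMARY KEY (`hash`, `day`)
) ENGINE = InnoDB;

-- Record a hit.
INSERT INTO `sym_ds_cache_stats` (`hash`, `day`, `requests`)
VALUES ('abc123', CURDATE(), 1)
ON DUPLICATE KEY UPDATE `requests` = `requests` + 1;
```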

The only issue I would have with that model, considering my data, is this: my data is generated on a daily basis by an external source, which produces bulky XML with some empty nodes and some translation nodes that I do not really need (that is where I can gain around 50% if I transform the data). The data can be obtained in 2 ways, mainly: 1st by going to mysite.com/cooldata/, or else by going to mysite.com/cooldata/{$today}/, so basically I know that people are most likely to visit today's data far more often than that of yesterday or last week.

The 120k entries are basically all the different entries from the archives, most of which will not be directly accessed by any person unless they want information on, say, the financial events of that particular day. However a bot which is indexing/looking for data is most likely to scrape these more often, driving up the size of the cache. The reason we wanted the cache on most pages was that we were getting a really bad speed rating; we were one of the worst in our sector, so most likely losing clients. After we started using this cache we jumped to being one of the fastest, thanks to Symphony's simplicity.

Today I will try to put some work into wrapping up the extension, using my own table for caching, and will see what you guys say. I'm quite sure there will be plenty to improve and bugs to fix :)

Sorry for a very late update - I spent a week away from my machine. Anyway, I found some time to put this thing on GitHub. It's a fork of Cacheable Datasource; in fact there is plenty of stuff (tags) from Nick's original code. I need to find some time to properly update the docs. Current features:

  1. Cache into a new custom db table
  2. Automatic flushing on section update (optional param setting)
  3. Ability to set an expiry time - good for datasources that expire daily, e.g. archives
  4. Total control of cache and purging (see the SQL sketch below the repo link)
  5. Calculation of size in db and uncompressed size (discovered compression really saves a lot)

https://github.com/jonmifsud/cacheabledbdatasource
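As an example of what point 4 makes possible, flushing everything cached for one datasource is conceptually just a targeted delete against the extension's own table, roughly along these lines (exact column names may differ; as I said, the docs still need a proper update):

```sql
-- Purge every cached result belonging to one datasource,
-- e.g. after the section it reads from has been updated.
DELETE FROM `sym_cacheabledbdatasource`
WHERE `datasource` = 'archive_events';
```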

Any feedback before I make this extension public on Symphony would be greatly appreciated.

In the README I see that the extension saves the cache to files, not to the database. So what exactly is the difference between your extension and Nick's? Can you put it in simple words?

I suggest a different name if you're going to release this to the public, as to avoid any confusion.

@michael-e sorry mate, I didn't have time to properly go through the documentation; I just edited some bits of Nick's. It does store into the database instead of a file; at the moment it creates a table named sym_cacheabledbdatasource.

@nickdunn renaming makes sense, any suggestions on a proper name?

Advanced cacheable datasource :)

sounds cool :)

There are some advanced features in there that require some knowledge... Will go ahead with that if there aren't any objections :)

I'd also guess it would make sense to delete all tags related to cacheabledatasource to avoid confusion? (Sorry guys, first time I am forking a repo on GitHub.)

Perhaps you should create a brand new repository called advanced_cacheable_datasource and follow the steps from there. After all, it's a new extension. It doesn't matter that you started from someone else's work.

Maybe we can find a better name. "Advanced" doesn't say anything about the inner workings.

DB Datasource Cache extension? Cacheable Datasource to DB extension?

Let's not call things "Advanced" or "Plus", can we think of something more meaningful that describes the application? If the extension caches to a database then why not Cache Datasource to Database or something.

I might be planning on adding features to my own Cacheable Datasource extension which are considered "advanced" ;-)

Touché!

:-)
