Search

STATUS UPDATE

Owing to time and project constraints, this extension has been discontinued. I do not have any active projects with search requirements, and do not have the capacity to support such a complex extension these days.

I have been playing with ElasticSearch which is far superior to anything I could ever write myself. If you want a good search engine, I strongly advise using ElasticSearch, Solr or Sphinx. I may find time to release a basic integration of ElasticSearch and Symphony, for those who are confident enough to install and maintain ElasticSearch themselves, but this won't be until I have the need for the extension myself.

Hmm, this is quite a shame. We've had a lot of problems with Google search lately (old results for pages that have moved, new data taking ages to be added to the index, etc.). And, after the amazing talk you gave at the Symposium, I promised my client this would all be over when their site was rebuilt using the Search Index.

My search requirements are quite complex (ranking based on type, splitting results into media and articles, for example), but these are things that were quite possible using your extension.

Now, for my final question: what do you recommend I do? Should I use the Search Index, fix as many bugs as I can find, and never update Symphony? Or should I go for the more advanced route of ElasticSearch or Lucene (neither of which I have any experience with)?

Any advice is welcome!

Same here. I have been deeply impressed by Search Index, and I've had big plans...

I'm so disappointed that the project is being discontinued, but I understand there's only so much time, especially to reinvent wheels.

Ah, that news is really depressing!

But the extension seemed to work somehow when you showed it in Cologne – so which parts are missing in the currently available codebase?

Nick, I understand your comments, and want to thank you for a great extension which helped me out greatly on my last Symphony project.

If you want to hand this over to the Symphonists, maybe it can be continued? It's totally up to you though.

I'll email you about it.

I was amazed by the talk you gave back in October and implemented search in my next project straight away. It seems to be working well and I'm totally satisfied with it. Most of the Symphony extensions I use haven't been updated for quite a while (except for API changes), but their value hasn't changed over time.

Search Index provides a lot of features already. I feel like most users (myself included) would be pretty satisfied with it and wouldn't need everything the other software mentioned offers.

So I wouldn't take your announcement with disappointment: what you gave us is far more than an incomplete, featureless extension. I understand how much effort it would take to close the gap with the big names, and I'm perfectly fine with your decision, as most of us don't need that.

Hopefully it will be taken over by the Symphonists for future updates, but I still see this extension as a must-have, even if it isn't developed any further.

What would be the simplest way for a newcomer to implement search with 2.2.5 (assuming that this extension won't work)?

I don't have a specific purpose in mind, but I'd like to have an idea of what my options are if the need were to arise.

Thanks, D

edit: commented in the wrong place. I've got it all to do, today

@nickdunn, this is a fabulous extension. I totally understand that you don't have time to keep this one up. I agree with @designermonkey, if it's possible, would this be a candidate to move over to the Symphonists?

I think I tried to make Search Index v2 far too complex. I have a project coming up that needs search so I am forging ahead with the ElasticSearch extension. Lucene is genuinely the best text analysis tool there is, so it really doesn't make sense to reinvent something that's already well established.

However, once that is complete and released I will look at simplifying Search Index and passing maintenance over to Symphonists so it can continue. I may end up removing some of the more advanced features such as stop words and word stemming, since this has caused many, many headaches. If you want a complex search engine, use something like Solr or ElasticSearch. I don't think it makes sense to emulate these advanced features in a sub-standard way.

As you can imagine, building a search engine for one site is very difficult. Building a generic search engine that people can switch on and instantly use for any site is even more difficult!

I've been using Search Index v2. It's worked great for me. In fact, I found it easier to manage, and I actually got it to work. For some odd reason, I could never get v1 to work properly. With that said, thank you so very much for all the work you have done for this community. Most of your extensions listed above have helped so many and frankly enhanced what we've been able to do. Thanks for sharing your skills and being so willing to help users in this forum. You're awesome!

Thanks @nickdunn!


Also, regarding ElasticSearch: could users not on Apache use it? What about shared hosting users? Just curious. This is the first time I've heard about ElasticSearch. It looks quite interesting.

@nickdunn, why not go all in with ElasticSearch? Is there a need for both solutions?

Uggh, I guess one of the main negatives of Elastic Search is...

ElasticSearch is built using Java, and requires Java 6 in order to run. The version of Java that will be used can be set by setting the JAVA_HOME environment variable.

http://www.elasticsearch.org/guide/reference/setup/installation.html

Uggh, I guess one of the main negatives of Elastic Search is... [ElasticSearch is built using Java, and requires Java 6 in order to run]

Why? It runs fine on Mac OS X and an Ubuntu web server where it installs in just a few minutes. I don't see it as a negative just because it doesn't use a language I know. It doesn't really matter what language it uses (PHP, Java, Ruby, Python) so long as it works on the operating systems we all use.

Could users not on Apache use that?

Sure. Don't be confused by the fact that ElasticSearch uses Apache Lucene under the hood. "Apache" here refers to the Apache Software Foundation, which also maintains the Apache HTTP Server. The two projects aren't linked. If you use nginx or any other web server, you can still use ElasticSearch.

Why not go all in with ElasticSearch? Is there a need for both solutions?

It depends on your restrictions. Using ElasticSearch requires that you have root access to your server to install it yourself. So if you're on shared hosting, then forget it. However my argument is that if you're building a site complex enough to require fulltext search, then the chances are you have moved past entry-level shared hosting and are using a VPS or dedicated box.

Because ElasticSearch runs as a RESTful service over HTTP, you could have a VPS set up just to run the search server and nothing else. Just configure the extension to point to the hostname of the server you want. In time there will be third-party services offering hosted ES servers (just as WebSolr and SolrHQ do with Solr now), which means you can run your site on a crowded shared server and still use ES.
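
To make the "separate search server" idea concrete, here is a minimal Python sketch of building a search request against a remote ES host. The hostname `search.example.com`, the index name `articles` and the helper `build_search_request` are assumptions made up for illustration; they are not part of the extension. Port 9200 is ES's default HTTP port and `_search` with a `q` parameter is its URI search endpoint.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(host, index, keywords, port=9200):
    """Build (but do not send) a GET request for ES's URI search endpoint."""
    # Hypothetical helper: host and index are whatever your own setup uses.
    query = urlencode({"q": keywords})
    url = f"http://{host}:{port}/{index}/_search?{query}"
    return Request(url, method="GET")


request = build_search_request("search.example.com", "articles", "symphony cms")
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it, and ES answers with a JSON document. The web server in front of your site plays no part in this exchange, which is why the search box can live on a different machine.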

The other requirement is that you are technical enough to read the ES documentation and understand the concept of analysers (tokenisers and filters) which will let you do advanced things like stop words, synonyms, ASCII character collapsing, word stemming and so on. These features are not enabled by default and require you to configure them. My extension will allow you to configure these (by writing a JSON document) and it posts them to ES (which is how ES works). ES and the extension will work "out of the box" to get you up and running with a basic search, but if you want to tweak performance and results, you will need to configure ES yourself. (It is far more powerful than Search Index in this regard).
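
As a rough sketch of what such a JSON document can look like, the Python below builds index settings for one custom analyser with lowercasing, ASCII folding, stop words and stemming. The names `site_text`, `site_stop`, `site_stemmer` and the helper `build_index_settings` are hypothetical, and this is not necessarily the document the extension itself generates; the tokenizer and filter types (`standard`, `lowercase`, `asciifolding`, `stop`, `snowball`) are ones ES provides. Note that ES uses the American spelling `analyzer` in its settings.

```python
import json


def build_index_settings(stop_words):
    """Return index settings defining one custom analyser as a JSON-ready dict."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Custom stop-word filter built from the supplied list.
                    "site_stop": {"type": "stop", "stopwords": list(stop_words)},
                    # Snowball stemmer configured for English.
                    "site_stemmer": {"type": "snowball", "language": "English"},
                },
                "analyzer": {
                    # Hypothetical analyser name; filters run in list order.
                    "site_text": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": [
                            "lowercase",
                            "asciifolding",
                            "site_stop",
                            "site_stemmer",
                        ],
                    }
                },
            }
        }
    }


body = json.dumps(build_index_settings(["the", "and", "of"]), indent=2)
print(body)
```

A document like this is typically sent as the request body when the index is created. The filter order matters: lowercasing and folding run first so that stop words and stemming see normalised tokens.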

ElasticSearch is built for scale and speed. I'm running queries on thousands of documents and they return in a few milliseconds. The mailing list has people running ES with many millions of documents. Rolling our own solution with MySQL is not as performant.

ElasticSearch has nice features such as the ability to search attached documents such as plain text, HTML, PDF, MS Word and more, plus it also natively supports geographic queries so you can perform range-based searches with lat/lon. Equally it supports facets, so that you can build faceted search UIs really easily, which are otherwise very difficult or impossible without custom code in Symphony.
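
For a feel of how a geographic filter and per-section counts combine in one request, here is a hedged Python sketch of a search body. It assumes a more recent ES release, where the "facets" API described above was replaced by "aggregations" (`aggs`). The field names `body`, `location` (which would need to be mapped as a `geo_point`) and `section`, and the helper `build_geo_faceted_query`, are hypothetical.

```python
import json


def build_geo_faceted_query(keywords, lat, lon, distance="10km"):
    """Full-text query limited to a radius, with a count of hits per section."""
    return {
        "query": {
            "bool": {
                # Score documents by how well they match the keywords.
                "must": {"match": {"body": keywords}},
                # Keep only documents within `distance` of the given point.
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        "location": {"lat": lat, "lon": lon},
                    }
                },
            }
        },
        # One bucket per distinct value of the hypothetical `section` field.
        "aggs": {"by_section": {"terms": {"field": "section"}}},
    }


query = build_geo_faceted_query("coffee", 51.5, -0.12, distance="5km")
print(json.dumps(query, indent=2))
```

The bucket counts returned for `by_section` are what a faceted UI would render as "Articles (12), Media (3)" style links next to the results.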

The other nice thing is that you can query across sections with ease. That means ES can be used as a super-fast Union Datasource, should you need one.

As I said, search means different things to different people, and Search Index has failed in some regards (in my opinion) because it tries to offer too much, and therefore does many things poorly rather than a few things well.

Nick, thank you very much for the detailed explanation. I am really looking forward to this extension!

Nick, thanks for the explanation!

@bzerangue

Be careful using this extension, there are a lot of XSS vulnerabilities in it.

The user agent can be spoofed to trigger XSS, and so can the search keyword itself.
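
The general fix for this class of bug is to escape user-supplied values at the point they are written into HTML. The sketch below shows the idea in Python with the standard library's `html.escape`; it is an illustration only, not the extension's code (which is PHP, where `htmlspecialchars` does the same job), and `render_search_heading` is a hypothetical helper.

```python
from html import escape


def render_search_heading(keywords):
    """Return an HTML heading that echoes the search keywords safely."""
    # escape() converts &, <, > and quotes into entities, so injected
    # markup is displayed as text instead of being run by the browser.
    return f"<h2>Results for: {escape(keywords, quote=True)}</h2>"


print(render_search_heading('<script>alert("xss")</script>'))
```

The same treatment applies to any request-derived value that ends up in a page, including the user agent string.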

Be careful using this extension, there are a lot of XSS vulnerabilities in it.

I'm sure Nick would welcome a pull request to fix the issue.

He would. I just talked to him about it. I wouldn't know how to fix it myself though.
