
Closed #514: Real 404 response is missed

The problem here is that Symphony always redirects pages from “page” to “page/” with a 301 response. It’s easy to test with telnet. Among other things, this means most bots get a 301 response instead of a 404, so non-existent pages end up indexed by search engines.
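As an alternative to the telnet test mentioned above, here is a minimal Python sketch that sends a single request and reports the first status code without following redirects. The function names `fetch_status` and `describe` are made up for illustration, and the host and path you pass in are your own; nothing here is part of Symphony.

```python
import http.client


def fetch_status(host, path, port=80, timeout=10):
    """Send one HEAD request and return (status, Location header).

    http.client never follows redirects on its own, so a 301 is
    reported as a 301 rather than as the status of the final page.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def describe(status, location):
    """Turn a status code into a short human-readable summary."""
    if status in (301, 302, 307, 308):
        return f"redirect {status} to {location}"
    if status in (404, 410):
        return f"missing ({status})"
    return f"status {status}"


# Hypothetical usage (performs a real network request):
#   status, location = fetch_status("example.com", "/no-such-page")
#   print(describe(status, location))
```

Requesting the same path with and without the trailing slash shows whether the first response is the 301 described in this report or a direct 404.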

The root of the problem is the .htaccess file produced by Symphony:

RewriteRule ^(.*)$ $1/ [L,R=301]

This rule automatically redirects every page with a 301 response. Removing R=301 solves the problem. I’ve tested my site without R=301 and haven’t found any problems so far. However, I don’t know why this redirection is needed at all, or what problems it could be masking. Either way, Symphony should return correct response codes for error pages, not 301.
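To make the behaviour under discussion concrete, here is a toy Python model of the request flow. It is only a sketch under stated assumptions: it assumes the redirect applies to any path without a trailing slash, and that the front controller returns 404 for unknown pages once the slash is present. Neither assumption is confirmed in this thread, and `first_response` and `follow` are hypothetical names.

```python
def first_response(path, existing_pages, redirect=True):
    """Return (status, target) for a single request in the toy model."""
    if redirect and not path.endswith("/"):
        # Models: RewriteRule ^(.*)$ $1/ [L,R=301]
        return 301, path + "/"
    # Assumption: the front controller decides between 200 and 404.
    page = path.strip("/")
    if page in existing_pages:
        return 200, path
    return 404, path


def follow(path, existing_pages, redirect=True, max_hops=5):
    """Follow redirects and return the list of (status, requested path) hops."""
    hops = []
    for _ in range(max_hops):
        status, target = first_response(path, existing_pages, redirect)
        hops.append((status, path))
        if status != 301:
            break
        path = target
    return hops


pages = {"news"}  # hypothetical set of existing pages
missing_hops = follow("old-page", pages)                 # 301 first, then 404
direct_hops = follow("old-page", pages, redirect=False)  # models dropping R=301
```

In this model a client that stops at the first response sees only the 301 for a missing page, while a client that follows the redirect reaches the 404 on the second request. Passing `redirect=False` models the site running without R=301, where the 404 is the first response.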

In my eyes, 301 is the correct response code for this kind of redirect. The HTTP Status Code Definitions say:

301 Moved Permanently

The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs.

I have never seen search engines indexing these pages twice. Do you have any examples for this?

Unfortunately, yes. I ran into this problem with Google recently: some non-existent pages were still indexed about a year after they were removed. It’s quite difficult to provide concrete links. However, you can check the following: soft 404 errors on Google, 301 vs 404, and of course Google results.

Your link in comment #4 points to a Google page which says that:

  • In order to avoid duplicate content you should rewrite to canonical URLs.
  • 301 rewrites are the right choice.

So who do you trust? Google or some superstitious folks?

Hm, sorry, but the main point of the link in comment #4 is “The 301 status code means that a page has permanently moved to a new location.” That is definitely not about 404.

Moreover, my first link in comment #3 is also from Google, and it clearly states that “Returning a code other than 404 or 410 for a non-existent page (or redirecting users to another page, such as the homepage, instead of returning a 404) can be problematic.” and “We recommend that you always return a 404 (Not found) or a 410 (Gone) response code in response to a request for a non-existing page.” So I’ll definitely believe Google, and according to Google the response should be 404.

And, as I wrote previously, I first ran into this problem with Google too.

If you could show me any examples of content which was indexed twice (although using 301 rewrites) I might re-think my opinion on this.

At the moment I am sure that 301 response codes are the right way to deal with canonical URLs. Imagine a visitor typing http://example.com/news — would you like to serve him a 404 error just because he missed the trailing slash?

(Your example is about non-existing pages, not about canonical URLs. This is not the same thing.)

I vote for closing this issue.

I vote for closing this issue.

Me too.

I see some misunderstanding here. Sorry if it wasn’t clear, but I’m not talking about the double-indexing problem; I’m talking about pages that are still indexed by Google even though they have already been removed. Such pages produce crawl errors, which are visible in Google Webmaster Tools. According to Google’s help, this is caused by an incorrect server response (301 instead of 404/410).

Also, I don’t quite understand why Symphony explicitly uses R=301 in that rewrite rule. I’ve checked Drupal and Joomla: they use the same redirection, but without R=301, and it doesn’t seem to be a big problem there. Right now I’m running my site without R=301 and everything seems OK.

As I understand it, for canonical URLs Symphony has the following in .htaccess:

RewriteRule ^(.*/?)$ index.php?symphony-page=$1&%{QUERY_STRING}    [L]

And it’s usually enough.

However, canonical URLs are not the topic of this bug. The bug was opened about the missing 404 response, which causes Google’s crawlers to report errors.

System redirection from page to page/ with a 301 response is the correct choice, because this is exactly what the concept of canonical URLs is for.

As I understand it, for canonical URLs Symphony has the following in .htaccess: RewriteRule ^(.*/?)$ index.php?symphony-page=$1&%{QUERY_STRING} [L] And it’s usually enough.

You could argue that it is not the right thing to do, but “usually enough” is not an adequate argument; it can always be better. So far I am not convinced that providing trailing-slash redirects for canonical URL purposes is the wrong thing to do from a behavioural perspective.

If, on the other hand, this functionality causes erroneous behaviour, duplicate content, or anything else that falls outside of how a 301 redirect should behave, then there are grounds to rectify this.

However, canonical URLs are not the topic of this bug. The bug was opened about the missing 404 response, which causes Google’s crawlers to report errors.

This is a valid concern, but so far no concrete examples have been provided. dushakov, if you could provide some concrete examples, the Symphony Working Group will diligently investigate this issue and will reconsider its stance on the matter.

This issue is marked closed on the grounds that insufficient evidence has been provided to support the claim that the 301 redirect produces errors with search engines.

Comment removed: I didn’t read the first thread properly!

This issue is closed.
