Stephen Bau published an XSLT utility to convert HTML to Markdown text. While it is a great source of inspiration, his utility is limited in several respects, which may not be a big deal if you are using it exclusively for frontend article editing in Symphony.

However, I needed much more than that. I am working on a newsletter (mass mailer) system in Symphony which should send HTML emails including an alternative plain-text version. Since the content of a newsletter is written in Markdown and/or HTML (like articles on the website), it turns out that the alternative text is best generated directly from the HTML output. I found that Stephen’s utility would not do this job when things get complicated (e.g. nested lists) or when you need optimized email output. So I developed my own utility, learning that:

  • one central issue in parsing HTML is whitespace handling
  • configuration options are needed if you want to generate either code for frontend editing or email text (or something in between)

Due to its special whitespace handling, this utility should now work properly with:

  • nested lists, nested blockquotes, even nested lists in nested blockquotes, inline elements with or without surrounding whitespace…
  • …and generally source code containing lots of (potentially senseless) whitespace.

Configuration is done using global parameters. All parameters are explained in the comments at the top of the file. Each parameter generally has a default behaviour that applies when it is left empty. Those defaults should be fine for Symphony frontend editing. As you set more and more parameters to special values, you will get rather nice email text (which may drop some unparseable content, so beware of this).
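
As a rough illustration of changing a default without editing the utility file, the sketch below overrides one parameter from the importing page. It rests on several assumptions: that the utility declares `unparseables` as a top-level parameter (the name and the value `'strip'` are taken from the v1.0 excerpt later in this thread), that the file is saved as `html2markdowntext.xsl` in `workspace/utilities`, and that it is pulled in with `xsl:import`. The `data/newsletter/entry/body` path is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- xsl:import has to come before any other top-level element.
         The relative path is an assumption about your workspace layout. -->
    <xsl:import href="../utilities/html2markdowntext.xsl"/>

    <!-- Plain text output, e.g. for the text part of an email -->
    <xsl:output method="text" encoding="UTF-8"/>

    <!-- A declaration in the importing stylesheet has higher import
         precedence, so it replaces the utility's default for this page. -->
    <xsl:param name="unparseables" select="'strip'"/>

    <xsl:template match="/">
        <!-- Hypothetical node path; point it at your own HTML content -->
        <xsl:apply-templates select="data/newsletter/entry/body" mode="markdown"/>
    </xsl:template>

</xsl:stylesheet>
```

This only works with `xsl:import`. With `xsl:include` both declarations would have the same precedence, and a duplicate top-level parameter is an error.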

During development I used very complicated source code to test the XSL. However, it seems impossible to test for every possible situation. So if you find any mistakes or have ideas for improvement, please post your comments here.

It took several days to develop this utility, which has more than 500 lines of code, so I decided to publish it under the MIT licence. I hope that the community, and especially Stephen (who inspired it), do not mind finding a licence at all. If you do mind, I will think about it.

I appreciate any feedback.

Usage:

<xsl:apply-templates select="path/to/your/node" mode="markdown"/>

Version 1.1

Version 1.1 should eliminate problems connected to the “universal scope” of the ninja templates in v1.0. Now these templates have mode="html". So there shouldn’t be any problems as long as the modes markdown, html and break (the latter being used for breaking up HTML tables into separate lines) are not used in any other template. Please do not forget to update the utility’s configuration if you switch to the new version.
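
To illustrate why the modes matter, here is a sketch of a page that renders the same node twice: once as HTML through its own ninja-style templates (which have no mode), and once as Markdown through the utility. Only the mode names come from the description above; the import path, the `data/entry/text` node and the page's own templates are assumptions made for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Assumed location of the utility -->
    <xsl:import href="../utilities/html2markdowntext.xsl"/>

    <xsl:output method="xml" omit-xml-declaration="yes" indent="no"/>

    <!-- The page's own ninja-style templates. They have no mode, so they
         are never picked up by apply-templates calls that use the modes
         markdown, html or break. -->
    <xsl:template match="*">
        <xsl:element name="{name()}">
            <xsl:apply-templates select="* | @* | text()"/>
        </xsl:element>
    </xsl:template>

    <xsl:template match="@*">
        <xsl:attribute name="{name()}">
            <xsl:value-of select="."/>
        </xsl:attribute>
    </xsl:template>

    <xsl:template match="/">
        <div class="editor">
            <!-- HTML preview: default mode, handled by the templates above -->
            <div class="preview">
                <xsl:apply-templates select="data/entry/text/*"/>
            </div>
            <!-- Markdown source: handled only by the utility's templates -->
            <textarea name="fields[text]">
                <xsl:apply-templates select="data/entry/text" mode="markdown"/>
            </textarea>
        </div>
    </xsl:template>

</xsl:stylesheet>
```

The design point is simply that a template is only a candidate for an `xsl:apply-templates` call carrying the same mode, so the two sets of rules cannot see each other.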

Grab it here.

A newsletter manager would be a great extension that could make Symphony shine. Up to now I have been using pommo, which is more Symphony-like and flexible than phpmailinglist, but it is not quite there yet, and development has slowed down. Would you be using swiftmailer for the heavy lifting? I sure would like to be kept updated on your progress.

At the time of writing, I am planning to use phpmailer for actually sending emails, which works with PHP > 5.0. Swiftmailer requires PHP 5.2.x, which could be a show stopper for some people. This would not be a problem once the Symphony requirements change to 5.2.x as well... (I remember Alistair thinking aloud about this.)

Do you think that Swiftmailer is the better choice? If you have any experience or information, I would be glad to hear it.

I also implemented Pommo for some clients, but find it very hard to send high quality email code. Once again the problem is WYSIWYG editors, plus a user interface completely different from the Symphony system (used on those websites)... If you are used to the code control you find in Symphony, it is hard to find your newsletter quality being, well, let's say "inconsistent".

I would love to build a newsletter system working with Symphony section content, putting code control (for pure text and HTML content) completely in Symphony pages. I think this is the best way to go in Symphony.

Michael, Swiftmailer is far better than phpmailer. Give it a go; it's very transparent and well documented, so it shouldn't take much time. It's clean and clever OO, like Symphony. (5.2: I would always code for the future, not for now ;-) I haven't used it yet but have been lurking for years, since I totally agree with your remarks on pommo and the need for Symphony-integrated newsletter management. Also, pommo doesn't offer tracking and bounce management. Phpmailinglist is old school. The only other good alternatives are zendmail (Zend Framework) and djangomail using Gmail SMTP, but let's stick with Symphony and pure PHP on this forum ;-)

Also, there is a discussion here on how to pull InDesign XML into Symphony. Even in a simpler form, it's interesting to give the client the option to fill out some fields, upload some pics, and have a monthly recurring newsletter. Possibly even outputting it as PDF for a print version (parsers are mentioned here too).

PS: pommo will pick up development again, but it has been quiet for a long time.

I spent some time looking at swiftmailer. I agree with you, it is indeed much better, and I already implemented it successfully.

Well, tracking and bounce management are interesting stuff, but first I will focus on actually sending content (using Symphony pages, see above -- this is the part that is already working pretty cool), double-opt-in procedures, newsletter and recipient management. And documentation, of course. There are some things to know if you want to do mass mailings successfully.

Thank you very much, newnomad. I will try and keep you updated.

You're welcome, Michael, looking forward to it. Indeed, email newsletters are an art of their own; as for coding, you need to forget everything you know about standards and code the 1990s way, with tables and inline styles, as you know.

Sending via SMTP through Google is limited to a certain amount per month, but it is something worth exploring, also to limit spam flagging.

Some resources: campaignmonitor (free testing), mailchimp (more paid testing).

Michael, I need your advice on 'isolating' this utility, to make sure it doesn't interact with other templates. I need it to work on a node that is also used for displaying the text as (ninja-modified) HTML on the same page. The problem is that your utility seems to also modify other nodes, beyond the path it is given. I wonder whether the ninja template you start with should also have a mode, because I use a similar ninja template too. At least that is what I think. This is the error it produces:

XSLTProcessor::transformToXml(): runtime error: file       /opt/local/apache2/htdocs/nameofsite/workspace/utilities/html2markdowntext.xsl line 81 element element

It also barks at all media namespaces. I do use those on the page, but not in relation to the Markdown display, so I think your utility also works on stuff beyond the specified path.

XSLTProcessor::transformToXml(): xsl:element: The QName 'media:credit' has no namespace binding in scope in the stylesheet; this is an error, since the namespace was not specified by the instruction itself.
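
A likely reading of this second message, given the v1.0 ninja template quoted later in this thread, is that `<xsl:element name="{name()}">` receives a prefixed name such as `media:credit` while the `media` prefix is not declared in the utility's stylesheet. The following is a sketch of a namespace-safe variant of such templates. It is not the utility's actual code, and it is shown without a mode, as in v1.0.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Copy an element under its original name. Passing namespace-uri()
         gives the processor the namespace directly, so the prefix found in
         name() does not have to be declared in the stylesheet. -->
    <xsl:template match="*">
        <xsl:element name="{name()}" namespace="{namespace-uri()}">
            <xsl:apply-templates select="* | @* | text()"/>
        </xsl:element>
    </xsl:template>

    <!-- Same idea for attributes; unprefixed attributes have an empty
         namespace URI and stay in no namespace. -->
    <xsl:template match="@*">
        <xsl:attribute name="{name()}" namespace="{namespace-uri()}">
            <xsl:value-of select="."/>
        </xsl:attribute>
    </xsl:template>

</xsl:stylesheet>
```

This would only silence the namespace error. It does not change the fact that unmoded `match="*"` templates apply to every node the page processes in the default mode.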

Basically, since I have a

<xsl:template match="text" >

in my templates, I think either my or your templates get run first, but the second run doesn't have the original nodes anymore, only the modified ones. So I need to find a way of running the two templates separately.

I tried adding a mode to all my templates, with no success.

Are you on IRC sometimes?

OK, I have added my selected node (text) to your ninja matches, and that solves it.

 <xsl:template match="text/*">

 <xsl:template match="text/@*">

Maybe it would be useful to add a param with the root name to the apply, and use it to specify those ninja matches?

Any other advice? Do you recognize/understand my problems? Maybe you have another approach?

PS: Any progress on the mass mailer extension?

Hey, newnomad, I guess you are the first one to really use this XSL! Makes me happy.

Indeed, I wondered if I should specify a (root) node in my XSL (to prevent the problems you have run into), but in my use case I did not need it, so I kept it as it was. Bad practice, maybe.

Your solution is the best I can think of at the moment. But I will think about a general solution for this. Indeed I love this "simply call the template and you are done" behaviour, which would be gone if you had to specify a root node. I am not sure if adding a mode for the basic ninja rules will work.

The mass mailer extension will take another three or four weeks, because I have a big (Symphony) project I am working on.

No, I'm not on IRC. My business does not allow spending too much time on other things. (I will have more time in summer.) But I visit the forum and the bug tracker at least once a day.

I used this utility on a recent project where users were editing Markdown text through front-end entry forms. Because the content in the DS to populate the textarea was HTML I used this to convert to Markdown.

I isolated the templates by adding a mode attribute to all of the match templates in the utility (i.e. all non-named templates). I use Allen's HTML manipulation technique for other applications as well, so needed to isolate the Markdown instance.

e.g. line 79 became:

<xsl:template match="*" mode="markdown">

My apply-templates call to use the utility then becomes:

<textarea name="fields[profile]">
    <xsl:apply-templates select="critic/entry/profile" mode="markdown"/>
</textarea>

The ninja templates will only be applied when called with the specific mode :-)

Nick, I believe all the templates already have mode markdown, and the example apply too. So you are saying that adding mode markdown to those first 2 ninja templates in this utility did the trick?

Ah, I remembered incorrectly then. If the others already had a mode, I just added mode="markdown" to the two ninja templates as well. That will do the trick.

It means that other utilities can also use the ninja technique, but with their own modes.

Nope. Simply adding mode="markdown" to the ninja templates won't help. You won't be able to output for example "unparseable" HTML content (but only the "root" HTML element of your code block) if you set the parameter to be empty.

I will have to think about it a bit more. I am sure there must be a simple solution for this.

Correct, the two solutions to your own templates interfering with this utility are:

  1. add a specific path (your node) to the utility's first two match templates

  2. add mode="markdown" to these templates

Option 2 is universal and will be added to the utility I assume?

I've come up with a perfect solution, I think.

The ninja templates' intention is to care for HTML stuff like unparseable elements or image, anchor and table elements (depending on your configuration). This is why they now will have mode="html". All we need then is to call those ninja templates with the right mode, which at the same time simplifies the code and improves readability.

In version 1.0 we found things like:

<xsl:if test="$unparseables != 'strip'">
    <xsl:element name="{name()}">
        <xsl:apply-templates select="* | @* | text()"/>
    </xsl:element>
    <xsl:text>&#xA;&#xA;</xsl:text>
</xsl:if>

Which now can be written like this:

<xsl:if test="$unparseables != 'strip'">
    <xsl:apply-templates select="." mode="html"/>
    <xsl:text>&#xA;&#xA;</xsl:text>
</xsl:if>
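
For readers without the file at hand, the `mode="html"` ninja templates that this call relies on could look roughly like the following. This is a sketch reconstructed from the v1.0 excerpt above, not a copy of the utility. The `namespace` attributes and the small root template (there only to make the sketch runnable on its own) are additions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" omit-xml-declaration="yes"/>

    <!-- Demo entry point only: copy the document element as HTML -->
    <xsl:template match="/">
        <xsl:apply-templates select="*" mode="html"/>
    </xsl:template>

    <!-- Ninja template for elements, reachable only in mode "html" -->
    <xsl:template match="*" mode="html">
        <xsl:element name="{name()}" namespace="{namespace-uri()}">
            <!-- The mode is passed along, so children are copied too
                 instead of being handed to templates of another mode. -->
            <xsl:apply-templates select="* | @* | text()" mode="html"/>
        </xsl:element>
    </xsl:template>

    <!-- Ninja template for attributes, same mode -->
    <xsl:template match="@*" mode="html">
        <xsl:attribute name="{name()}" namespace="{namespace-uri()}">
            <xsl:value-of select="."/>
        </xsl:attribute>
    </xsl:template>

</xsl:stylesheet>
```

Text nodes need no template of their own here, because the built-in rule copies them in every mode. Keeping the copy logic in its own mode is presumably also why merely adding `mode="markdown"` was not enough: children of an unparseable element would then be matched by the conversion templates rather than copied.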

I am very happy, since my tests did not show any parsing differences. The "scope" problems should be gone. Ufff.

(I really hope that mode="html" will not interfere with your templates! If it does, I could think about a different name for it.)

I have added version 1.1 to the first post (see above). Please make sure that you update the utility's configuration if you switch to the new version.

@Michael-e, any progress on the mass mailer extension?

The answer is in this thread.

Short version: This may still take a while.
