I'm using the Liquid templating engine to display a summarised series of posts - at the moment I have something along these lines:

{% for page in site.posts %}
  {{page.content | truncatewords: 100}}
{% endfor %}

The page content contains HTML, so truncatewords can cut it off in the middle of an element and leave invalid HTML in the output. I don't want to strip all of the HTML from the content (embedded videos and images should remain visible); ideally, the appropriate closing tags would simply be added.
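
To make the failure concrete, here is a minimal Ruby sketch that approximates word-based truncation. The `naive_truncate_words` helper is a hypothetical stand-in written for this illustration, not Liquid's actual implementation; it only assumes that the filter keeps the first N whitespace-separated words and appends an ellipsis.

```ruby
# Hypothetical stand-in for word-based truncation: keep the first
# `count` whitespace-separated words and append an ellipsis.
# This is an approximation for illustration, not Liquid's own code.
def naive_truncate_words(input, count)
  words = input.split
  return input if words.length <= count

  words.first(count).join(' ') + '...'
end

html = '<div class="highlight"><pre><code>puts "hello world"</code></pre></div><p>More text here.</p>'

# The cut lands inside <code>, so <div>, <pre> and <code> are never closed.
puts naive_truncate_words(html, 3)
```

Because tags and attributes are counted as "words" and nothing tracks which elements are open, every element that was open at the cut point stays open for the rest of the page.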

Plain truncation clearly isn't going to achieve this, so my question is: how can I truncate my HTML with Liquid so that the output is still valid markup?

Update

A very specific problem is that I have a code sample that's marked up using Pygments. If the truncation occurs in the middle of the code sample, it leaves several tags open, messing up the rest of the page. I'm looking for a way to truncate these posts without removing the whole code sample: just truncate, then close all open tags in the content body.

1 Answer

OK, so after failing to find much on the web about how to do this, I cooked up my own solution using Nokogiri and a depth-first traversal of the parsed HTML node tree.

TruncateHTML is a simple script that allows a snippet of HTML to be truncated while preserving a valid structure.
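
The script itself isn't reproduced here. As a rough illustration of the underlying idea (stop at a word limit, then close whatever is still open), here is a minimal sketch that uses only the Ruby standard library. To be clear, this is not TruncateHTML and it does not use Nokogiri: it swaps the parsed node tree for a regex tokenizer plus a stack of open tags, which tracks the same "what is currently open" information a depth-first traversal would. The names `truncate_html` and `VOID_TAGS` are made up for this example, and the tokenizer assumes reasonably tidy markup (no `>` inside attribute values or comments, no bare `<` in text).

```ruby
# Elements that never take a closing tag.
VOID_TAGS = %w[area base br col embed hr img input link meta source track wbr].freeze

# Hypothetical helper: keep at most `max_words` words of text, leave tags
# intact, and close every element that is still open at the cut point.
def truncate_html(html, max_words, omission = '...')
  open_tags = []          # stack of currently open element names
  out = String.new
  remaining = max_words

  # Tokenize into tags (<...>) and the runs of text between them.
  html.scan(/<[^>]+>|[^<]+/) do |token|
    if token.start_with?('<')
      out << token
      name = token[%r{\A</?\s*([A-Za-z][A-Za-z0-9]*)}, 1]
      next if name.nil?   # comment, doctype, etc.

      name = name.downcase
      if token.start_with?('</')
        # Pop back to the matching open tag, if there is one.
        idx = open_tags.rindex(name)
        open_tags.slice!(idx..-1) if idx
      elsif !VOID_TAGS.include?(name) && !token.end_with?('/>')
        open_tags.push(name)
      end
    else
      count = token.scan(/\S+/).length
      if count <= remaining
        out << token
        remaining -= count
      else
        # Keep only the words that still fit, preserving their whitespace.
        out << token[/\A(?:\s*\S+){#{remaining}}/] << omission
        break
      end
    end
  end

  # Close anything left open, innermost element first.
  out << open_tags.reverse.map { |t| "</#{t}>" }.join
end

sample = '<div><pre><span class="k">puts</span> "hello world" again</pre></div><p>Next paragraph</p>'
puts truncate_html(sample, 2)
```

Only text counts towards the limit here, so images and embedded videos that appear before the cut point are kept as they are. For messy real-world HTML, a proper parser such as Nokogiri is the safer choice, since it repairs malformed markup before you walk it.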