
This should be easy, but I'm struggling to get it to work.

I'm using gatsby-source-wordpress to pull blog content into my Gatsby site. The problem is that HTML entities are not decoded, so instead of the '&' character I see something like '&8466;'.

I want to implement a normalizer function inside my gatsby-config.js file.

I found this function online that is supposed to remedy the HTML entities issue:

const decodeHTML = ({input}) => {
    let txt = document.createElement('textarea');
    txt.innerHTML = input;
    return txt.value;
}

I've tried importing it into my gatsby-config.js from a separate file; I've also placed the function directly in the gatsby-config.js file. Ideally, I'd like to import this function from a separate project file but that's not the main issue.

To get this function working, I've inlined it directly into my config file:

{
  resolve: `gatsby-source-wordpress`,
  options: {
    baseUrl: `peakwebsites.ca`,
    protocol: `https`,
    useACF: false,
    verboseOutput: false,
    hostingWPCOM: false,
    normalizer: function decodeHTML({entities}) {
      let txt = document.createElement('textarea');
      txt.innerHTML = entities;
      return txt.value;
    }
  }
},

But I'm running into this error:

success Downloading remote files - 7.158s - 205/205 28.64/s

 ERROR #11321  PLUGIN

"gatsby-source-wordpress" threw an error while running the sourceNodes lifecycle:

Cannot read property 'forEach' of undefined

  299 |       createNode,
  300 |       createContentDigest
> 301 |     }) => normalize.createNodesFromEntities({
      |                     ^
  302 |       entities,
  303 |       createNode,
  304 |       createContentDigest


 File: node_modules\gatsby-source-wordpress\gatsby-node.js:301:21

I'm not super familiar with the internals of the WordPress source plugin, so there might be some object property that I need to parse. I'm really not sure.

Does anyone have a solution for decoding HTML entities through a normalizer function in gatsby-config.js?

Thanks,


1 Answer


I'm not familiar with gatsby-source-wordpress, but I feel like I have enough information to put you on the right path:

  1. You won't be able to access document from gatsby-config.js because it runs in a Node.js environment, where there is no DOM. Use a Node library instead; a quick search turned up he.
const he = require('he')

const decode = input => he.decode(input)
  2. A quick look at the gatsby-source-wordpress docs shows that entities is an array. I don't know what its shape looks like, so you'd have to log it to find out. Once you know how to access your HTML text, you can use the decode function above:
{
  resolve: 'gatsby-source-wordpress',
  options: {
    // ...other options
    normalizer: ({ entities }) => entities.map(entity => {
      /* access your raw html somehow, this is just a guess */
      if (entity.__type === 'wordpress_post_or_something') {
        entity.content = decode(entity.content)
      }
      return entity
    })
  } 
}
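
If you'd rather keep the decoding logic in a separate file, something like the following might work. It's only a sketch: makeDecodingNormalizer is a name I made up (it is not part of gatsby-source-wordpress), and I'm assuming the fields you care about (title here) are already plain strings by the time the normalizer runs, so log your entities to confirm. The decoder is passed in as an argument, so he.decode can be swapped for anything else.

```javascript
// Hypothetical helper (not part of gatsby-source-wordpress): builds a
// normalizer that runs `decode` over selected string fields of each entity.
// The field names are assumptions; log your entities to confirm them.
const makeDecodingNormalizer = (decode, fields = ['title']) =>
  ({ entities }) =>
    entities.map(entity => {
      // Copy the entity so the input array is left untouched.
      const decoded = { ...entity }
      for (const field of fields) {
        // Only touch fields that exist and are plain strings.
        if (typeof decoded[field] === 'string') {
          decoded[field] = decode(decoded[field])
        }
      }
      return decoded
    })

// In gatsby-config.js, with `he` installed (npm install he):
//   const he = require('he')
//   ...
//   normalizer: makeDecodingNormalizer(he.decode),
```

The important part is that the normalizer always returns an array of entities; the plugin iterates over whatever comes back. I'd be careful about decoding fields that hold full HTML (like content), since turning &lt; back into < could change the markup.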


Alternatively, you can also add a new field to the nodes created by gatsby-source-wordpress via createNodeField in onCreateNode or createSchemaCustomization. The idea would be to get the html content from the node, decode it, then add it back to that node as a new field.
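
A rough sketch of the onCreateNode variant, with the same caveats: makeOnCreateNode and the decodedTitle field name are made up for illustration, and 'wordpress__POST' is my guess at the node type, so check yours in GraphiQL. createNodeField({ node, name, value }) is the regular Gatsby action.

```javascript
// Hypothetical factory: returns an onCreateNode handler that adds a decoded
// copy of `title` as a new field. 'wordpress__POST' and 'decodedTitle' are
// assumptions; adjust them to match your own nodes.
const makeOnCreateNode = decode => ({ node, actions }) => {
  const { createNodeField } = actions
  // Skip nodes of other types and nodes without a string title.
  if (node.internal.type === 'wordpress__POST' && typeof node.title === 'string') {
    createNodeField({ node, name: 'decodedTitle', value: decode(node.title) })
  }
}

// In gatsby-node.js, with `he` installed:
//   const he = require('he')
//   exports.onCreateNode = makeOnCreateNode(he.decode)
```

Fields added this way show up under fields in GraphQL, so you'd query fields { decodedTitle } instead of title.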

I'll be happy to remove/update this answer if someone more familiar with wordpress can help.