I'm looking at using GitHub Pages to host my blog and Jekyll to present it.
Presumably, whatever I commit will appear at <yourname>.github.io through Jekyll and at https://github.com/<yourname>/<yourname>.github.io in rawer form. See this page showing links to live sites and to the source repos used to construct them.
SEO advice suggests that duplicating content within and across domains is bad practice. See this Google support page on duplication and this Moz page on issues with duplication, both of which also offer possible solutions.
My question is two-fold:
- Is content duplication actually a problem for GitHub Pages in practice?
- If so, how does one apply solutions like canonical linking or noindex to the GitHub repo so that search engines know that your Jekyll site is the canonical one?
Update:
It might be worth noting that I uploaded a "hello world" index file to my GitHub Pages repo and then checked the source for the page on GitHub. The GitHub source already contains a canonical link:
<link rel="canonical" href="https://github.com/guypursey/guypursey.github.io/blob/master/index.html" data-pjax-transient>
I assume it's this that would need changing for each file to point to the Jekyll version of the site, but I can't see a setting in GitHub to handle it.
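In case anyone wants to repeat the check without reading page source by hand, here is a minimal sketch of how one might pull the canonical link out of a page's HTML using Python's standard `html.parser`. The names `CanonicalFinder` and `find_canonicals` are my own, and the `sample` string is a cut-down stand-in for the GitHub page source rather than the real markup.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens; valueless attributes are None
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonicals(html):
    """Return a list of canonical URLs declared in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Simplified stand-in for the GitHub page source (assumption, not the full page)
sample = (
    "<html><head>"
    '<link rel="canonical" '
    'href="https://github.com/guypursey/guypursey.github.io/blob/master/index.html" '
    "data-pjax-transient>"
    "</head><body>hello world</body></html>"
)

print(find_canonicals(sample))
```

Running the same function over the HTML served by the Jekyll site would show whether that page declares a canonical URL of its own, and if so, where it points.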