XML sitemaps are read by search engines to learn which pages on a site are available for crawling. The format is a simple, standard one, defined by the sitemaps.org protocol. Having an XML sitemap is good practice, particularly for large websites, because it helps ensure that crawlers discover all the pages on the site.
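To make the format concrete, here is a minimal sketch in Python (used only for illustration; the Umbraco template itself is Razor) that emits a sitemap following the sitemaps.org protocol: a `urlset` root in the sitemap namespace, containing one `url` entry per page with a `loc` and a `lastmod`. The `build_sitemap` helper and the example pages are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, date]]) -> str:
    """Return sitemap XML for a list of (absolute URL, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # should be an absolute URL
        # A plain date such as 2014-05-01 is a valid W3C Datetime value.
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    # ElementTree escapes characters such as "&" in the text nodes for us.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        ("http://sitename.com/", date(2014, 5, 1)),
        ("http://sitename.com/about/", date(2014, 4, 20)),
    ]))
```

Using the standard library's XML builder rather than string concatenation means special characters in URLs are escaped correctly.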
In the past I have relied on the Cultiv Search Engine Sitemap package to provide dynamically generated XML sitemaps for Umbraco sites. The original version was implemented in XSLT, and a Razor version was released in 2011. Although the Razor version reportedly still works in Umbraco 7 with a minor tweak, I decided to convert Cultiv’s Razor script into my own template for generating dynamic XML sitemaps. It isn’t packaged as an installable Umbraco package, but the steps for setting it up are fairly simple.
Create a new Template
First, I create a new template named “XML Sitemap” with no parent layout. I create the template in the Umbraco back office, then add its .cshtml file to the project so I can edit it in Visual Studio. I’ve modified the original script so that it works as a view.
The template code is available at https://gist.github.com/alindgren/1439022194a472d83ddf
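The gist’s Razor code isn’t reproduced here. As a rough, language-neutral illustration of what a template like this does, the Python sketch below walks a content tree depth first and produces an entry for every node whose hide flag is not set. The `Node` class is a hypothetical stand-in for an Umbraco content node (it is not Umbraco’s API), and the sketch assumes update dates are already stored in UTC.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for an Umbraco content node."""
    url: str                            # absolute URL of the page
    update_date: datetime               # assumed to be in UTC
    hide_in_xml_sitemap: bool = False   # mirrors the hideInXmlSitemap property
    children: List["Node"] = field(default_factory=list)


def sitemap_entries(node: Node) -> Iterator[Tuple[str, str]]:
    """Yield (loc, lastmod) pairs for a node and its descendants, depth first."""
    if not node.hide_in_xml_sitemap:
        # W3C Datetime with an explicit offset; "+00:00" relies on the UTC assumption.
        yield node.url, node.update_date.strftime("%Y-%m-%dT%H:%M:%S+00:00")
    # Hidden nodes are skipped, but their children are still visited.
    for child in node.children:
        yield from sitemap_entries(child)


if __name__ == "__main__":
    home = Node("http://sitename.com/", datetime(2014, 5, 1, 9, 30), children=[
        Node("http://sitename.com/about/", datetime(2014, 4, 20, 12, 0)),
        Node("http://sitename.com/sitemap/", datetime(2014, 5, 2, 8, 0), hide_in_xml_sitemap=True),
    ])
    for loc, lastmod in sitemap_entries(home):
        print(loc, lastmod)
```

Design choice: this sketch excludes only the flagged node itself. If hiding a node should exclude its whole branch, move the `for` loop inside the `if` block.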
Create a new Document Type
Next, I create a new Document Type, also named “XML Sitemap”. I deselect the option to create a matching template, since I already created one, and set “XML Sitemap” as both the only allowed template and the default template. I also add a property called “Hide in XML Sitemap” (alias: hideInXmlSitemap) to the XML Sitemap Document Type and to any other Document Types I might want to exclude from the sitemap. As a finishing touch, I select the sitemap icon for the Document Type.
Create the Sitemap node
Before creating the content node, I modify the Home Document Type so that XML Sitemap is an allowed child node type. Then I create a node of type XML Sitemap under Home (named “Sitemap” here, which gives it the URL /sitemap/) and check its “Hide in XML Sitemap” property so that the sitemap doesn’t list itself.
Update robots.txt to specify the XML Sitemap location
The sitemap should now load at sitename.com/sitemap/. However, unless the sitemap is at the conventional /sitemap.xml location, Google and other search crawlers may not know where to find it. To make it discoverable, add a Sitemap line like the following to robots.txt:
Sitemap: http://sitename.com/sitemap/
Note that, according to the sitemaps.org protocol, this must be a full URL, including the scheme (http:// or https://).
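A quick way to sanity-check the directive, sketched in Python under the assumption that sitename.com stands in for your real domain: build the line with a helper that rejects relative URLs, then confirm that the standard library’s `urllib.robotparser` picks it up. `sitemap_directive` is a hypothetical helper; `RobotFileParser.site_maps()` requires Python 3.8 or later.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def sitemap_directive(sitemap_url: str) -> str:
    """Return a robots.txt Sitemap line, refusing anything that isn't a full URL."""
    parts = urlparse(sitemap_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Sitemap location must be a full URL, got {sitemap_url!r}")
    return f"Sitemap: {sitemap_url}"


robots_txt = "\n".join([
    "User-agent: *",
    "Disallow:",
    sitemap_directive("http://sitename.com/sitemap/"),
])

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.site_maps())  # ['http://sitename.com/sitemap/']
```

A relative value such as `/sitemap/` raises `ValueError` here, which matches the protocol’s requirement for a complete URL.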