In case you didn't know already, sitemaps are very important to any SEO strategy. Keeping your site updated and telling the search engines about those changes is essential. The good news is that generating a sitemap is a breeze. There are thousands of free tools out there - just google "sitemap generator". Open Sitemap Generator is one of those tools and, as the name suggests, it is open-source. Running a tool by hand gets very challenging, though, as your website grows and you start to get hundreds if not thousands of links. In my case, I run http://www.put3.com, which is a classified ads website. It is starting to grow and the links are starting to pile up. I don't want to run a tool manually every time I need to update Google about the changes.
Luckily for me, automation is built in to open-classifieds! Open-classifieds uses a script from smart-consulting, which is also open-source. This script lets open-classifieds generate a sitemap and then submit it to Google! Sweet!! However, nothing really comes easy. Sitemaps have to follow strict rules, and if yours doesn't, the sitemap becomes invalid and your Google webmaster stats show an error and no URLs being indexed. That is probably not true, but do you really want to take the chance? The error says to validate the sitemap before re-submitting. Great - easy enough, but how the heck do you validate? So I googled it and found numerous sites about validating your sitemap, but I didn't find any site that would help me identify where the problems are, and I knew there was more than one. The good ones are not free so fuck them. I had to figure this out some other way.
After some research into this, I learned that sitemaps have to be UTF-8 compliant. I had no idea what that meant. As far as I was concerned at the time, it stood for Ur Totally Fucked! - 8 times!! After analyzing the error some more, I kinda figured that the character ";" (a semicolon - I was calling it a dot-comma) is frowned upon by this UTF shit. The real problem is that the links on the website are built from the titles users give their postings. When they put a dash in a title, the script converts the title to a URL and the dash becomes "ndash;" (most likely the HTML entity &ndash;, which plain XML does not define, so strictly speaking this is an XML problem more than a UTF-8 one), hence the sitemap goes down! It's easy enough to open the sitemap in a text editor and search and replace those ";", but the next sitemap generated will have the same errors again. A more permanent fix is needed. Also, how do you know that is the only thing causing the error?
After reading more about it, I found out that Internet Explorer is great at this. I opened my sitemap in IE and sure enough, it pointed to the errors. I was then able to identify ALL the errors and make the corrections in the database. Once those corrections were made, the script was able to create UTF-compliant XML links and IE saved the day!
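If you'd rather not depend on IE, the same kind of check can be roughed out in a few lines of Python. This is only a sketch under my own assumptions, not what the open-classifieds script or IE actually does: the function names and the example.com URLs are made up for illustration, and it assumes the culprits are named HTML entities like &ndash; that XML does not know about. The first function lists every such entity with its line number; the second asks Python's built-in XML parser where the first error is, which is roughly the hint IE gives you.

```python
import re
import xml.etree.ElementTree as ET

# The only named entities XML understands without a DTD.
XML_ENTITIES = {"amp", "lt", "gt", "apos", "quot"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def find_undefined_entities(xml_text):
    """Return (line_number, entity) for every named entity XML does not define."""
    problems = []
    for line_no, line in enumerate(xml_text.splitlines(), start=1):
        for match in ENTITY_RE.finditer(line):
            if match.group(1) not in XML_ENTITIES:
                problems.append((line_no, match.group(0)))
    return problems


def first_parse_error(xml_text):
    """Return (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return err.position
    return None


# Hypothetical sitemap with two bad links (example.com stands in for a real site).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/ads/bike-for-sale</loc></url>
  <url><loc>http://www.example.com/ads/sofa-&ndash;-like-new</loc></url>
  <url><loc>http://www.example.com/ads/desk-&ndash;-oak</loc></url>
</urlset>
"""

if __name__ == "__main__":
    for line_no, entity in find_undefined_entities(SAMPLE):
        print(f"line {line_no}: {entity} is not defined in XML")
    print("first parser error at (line, column):", first_parse_error(SAMPLE))
```

The parser stops at the first problem, which is why the scan for entities is useful on its own: it shows every offending line in one pass, so you know which postings to fix in the database.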
There you go. If you are looking for a FREE sitemap validator, look no further than your desktop or laptop! That is, if you are using Windows of course. If you are on a Mac, well then UTF-8... ;-)