Not sure if this belongs here or in the Sitemap Protocol section.
What do you guys think of adding a new directive to the sitemap?
Something like <index_exclusive>url</index_exclusive>.
Including this in the sitemap would direct the bot to only index the
pages listed and ignore any other internal links found on the pages
listed.
It would have to be understood by the user that including this
directive would mean they are taking responsibility for telling the
bot explicitly which URLs to index and that anything left out would
not be indexed.
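To make the proposal concrete, here is a minimal Python sketch of how a crawler might read it. This is not part of the Sitemap protocol: the <index_exclusive> element is the hypothetical directive suggested above, and treating its content as a URL prefix that scopes the restriction is my own assumption, as are the example.com URLs and the helper names parse_sitemap and should_index.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard Sitemap files.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical sitemap: <index_exclusive> is NOT a real Sitemap element.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <index_exclusive>http://www.example.com/forum/</index_exclusive>
  <url><loc>http://www.example.com/forum/viewtopic.php?t=1</loc></url>
  <url><loc>http://www.example.com/forum/viewtopic.php?t=2</loc></url>
</urlset>
"""


def parse_sitemap(xml_text):
    """Return (set of listed URLs, exclusive scope prefix or None)."""
    root = ET.fromstring(xml_text)
    listed = {loc.text.strip() for loc in root.iter(f"{{{NS}}}loc")}
    scope_el = root.find(f"{{{NS}}}index_exclusive")
    scope = None
    if scope_el is not None and scope_el.text:
        scope = scope_el.text.strip()
    return listed, scope


def should_index(url, listed, scope):
    """Inside the exclusive scope, only URLs listed in the sitemap qualify."""
    if scope is None or not url.startswith(scope):
        return True  # no directive, or URL outside its scope: normal crawling
    return url in listed


listed, scope = parse_sitemap(SITEMAP)
for candidate in (
    "http://www.example.com/forum/viewtopic.php?t=1",   # listed
    "http://www.example.com/forum/viewtopic.php?p=17",  # same forum, not listed
):
    print(candidate, should_index(candidate, listed, scope))
```

Under these assumptions the second URL is skipped even though a listed page links to it, which is the behaviour described above.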
I came up with this “bright idea” while trying to use nofollow to
eliminate the duplicate-content “errors” that Webmaster Tools (WMT)
reports for a phpBB forum, but it would apply to almost any forum or
blog.
So rather than peppering nofollow all over the place, one entry in
the sitemap would direct the bot to ignore all of the different ways
a forum gives the user to jump to the same content, and so eliminate
the duplicate title and meta tag “errors”.
That's something we've considered before and pretty much dropped. The
big problem is that it's just too easy to break things completely with
something like that. If you forget URLs in your Sitemap file or if you
forget to update your Sitemap file or even if you forget that you have
a Sitemap file, you could accidentally limit your site's indexing
without knowing it. I do however agree that the problem you mentioned
(duplicate content through URL parameters) is an important one - and
it's one that we (and all other search engines) are always working on
improving.
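As an illustration of the duplicate-content problem mentioned here, the following Python sketch groups URLs that differ only in parameters that do not change the page. It is only a rough sketch, not how any search engine actually does it: the IGNORED_PARAMS set (a phpBB-style session id plus two made-up display parameters), the example.com URLs, and the function names are all assumptions for the example.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameters that do not change which content is shown.
IGNORED_PARAMS = {"sid", "highlight", "view"}


def canonicalize(url):
    """Drop ignored parameters, sort the rest, and strip any fragment."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls):
    """Map each canonical URL to the crawled variants that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonicalize(url)].append(url)
    # Only canonical URLs reached through more than one variant are duplicates.
    return {canon: found for canon, found in groups.items() if len(found) > 1}


crawled = [
    "http://www.example.com/forum/viewtopic.php?t=5",
    "http://www.example.com/forum/viewtopic.php?t=5&sid=abc123",
    "http://www.example.com/forum/viewtopic.php?sid=xyz789&t=5&highlight=sitemap",
    "http://www.example.com/forum/viewtopic.php?t=6",
]
for canon, variants in group_duplicates(crawled).items():
    print(canon, len(variants))
```

A check like this, run against the site's own crawl, would also show which URLs a Sitemap file is missing before any restriction is put in place.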
Thank you. I had figured this wasn’t the first time this came up; the
idea was hatched from thinking “there’s gotta be a better way”.
It would be complicated to implement and fraught with peril for the
uninformed user, I agree.
Don’t suppose I could persuade you guys to take a look at what I did
with my forum? Basically there are some global restrictions in
robots.txt, and then nofollow is used to limit the “path” for the bot
to the sitemap URLs generated with a modified version of
GSiteCrawler’s phpBB sample project.
It seems to be working for the bot, but I have been there before.
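One sanity check for a setup like this is to confirm that none of the sitemap URLs are blocked by the robots.txt rules. Below is a small Python sketch using the standard urllib.robotparser module. The rules and URLs are invented for the example and are not my actual files, and the helper name blocked_sitemap_urls is just illustrative.

```python
from urllib.robotparser import RobotFileParser

# Assumption: example global restrictions, not the real robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/posting.php
Disallow: /forum/search.php
Disallow: /forum/memberlist.php
"""


def blocked_sitemap_urls(robots_txt, sitemap_urls, user_agent="*"):
    """Return the sitemap URLs that the robots.txt rules would block."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]


sitemap_urls = [
    "http://www.example.com/forum/viewtopic.php?t=1",
    "http://www.example.com/forum/posting.php?mode=reply&t=1",
]
print(blocked_sitemap_urls(ROBOTS_TXT, sitemap_urls))
```

Note that urllib.robotparser only does simple prefix matching, so rules that rely on wildcards would need a different parser.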