Beginning on or about October 15, Googlebot appears to be appending
"index.html" to references to our site that contain only the domain
name (e.g., http://www.rlgsc.com).
This is clearly erroneous, as there are a variety of default names for
home pages. The result is registered as a "Crawl error -- page not found".
The URL as present in the referring web page would have been correct if
"index.html" had not been appended to it. In fact, the home page on the
sites that we build is typically "default.html".
> What referring page does it say has that url on it?
> On Oct 30, 6:35 am, Bob Gezelter wrote:
> Perhaps it had found that url in a previously cached copy of that or
> other pages from that site.
webado,
Unlikely. I am familiar with most of these pages from earlier
curiosity; they were never a problem until now, and never had any
filename/type in the URL. Also, there was never a page at
http://www.rlgsc.com/index.html for them to have linked to in any
event.
I am having this same problem, and my ranking in Google has suddenly
tanked.
I found a bunch of http://www.acidfanatic.com//index.html "not found"
errors from Googlebot in the webmaster tools.
You are killing my site, Google.
www.acidfanatic.com has been around since 2001. Do a search for the
word acidfanatic and there are over 5,000 references to my site, and
yet it is being buried in the search listings for the keywords it is
most relevant for, "acid loops" and "acid music". Your search
algorithm is not working, and your results are not relevant if they
exclude one of the most popular sites for its genre from being found.
Looking at your site, I don't see any technical issues which would
result in your site having trouble with regards to crawling, indexing
or ranking. In particular, the tests for /index.html absolutely do not
impact anything. Every site has lots of missing URLs (many external
links are broken for lots of sites, but we try to crawl those URLs
just in case). I wouldn't worry about us accessing /index.html; if
your site does not use it, we won't count it against you (that
wouldn't be very reasonable :-)).
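For anyone who wants to confirm what their own server answers for these probes, here is a minimal sketch, assuming Python 3 and only its standard http.client module. The helper name status_for is made up for illustration. A 404 for /index.html alongside a 200 for / is the normal, harmless case described above.

```python
import http.client

def status_for(host, path, port=80, timeout=10):
    """Return the HTTP status code the server sends for a HEAD request.

    http.client does not follow redirects, so a 301 or 302 is reported
    as such rather than being silently resolved.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()

# Example use (needs network access; host taken from this thread):
#   status_for("www.rlgsc.com", "/")
#   status_for("www.rlgsc.com", "/index.html")
```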
John - it would be nice to see something comprehensive on the
index.html issue.
For example, there's the comment in the Google sitemap generator
writeup about the use of the subdirectory's date-last-modified for
index.html if index.html is not explicitly entered in the sitemap.
I'd like to know, for instance, what value for index.html's lastmod is
assumed if there's no explicit specification in a sitemap.
I've read the same thing in your message in another recent thread here,
and I wonder why there should be any connection between a few 404
errors (odd as they are) and the site being "killed" (downgraded?) ...
I don't know whether you really believe that yourself, but if you do,
think about it again.
Having said that, I admit that it IS nasty behaviour for Google to
'invent' addresses just to show them as "errors" in the webmaster
tools ... what is this good for ^^ (if 'index.html' is not there,
neither are 'foo.html', 'foo_2.html' and 'foo_3.html' ...). Where
does this stop?
You are doing a meta refresh from index.html to default.html.
Since there is no HTML link in the body matching the meta refresh
destination, Googlebot is left assuming (rightly) that index.html
exists.
So you have now replaced a 404 with a 200 for an empty page with no
connection to the rest of the site.
Redirections need to be done server-side to be any good. And you must
not "fix" one problem (a 404, which is a natural response for something
that doesn't exist) by introducing another.
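To illustrate what "server-side" means here, below is a rough sketch using Python's standard http.server module. It is not how this particular site is configured; on a real site the same thing would normally be done in the web server's own configuration. The names HOME_PAGE, ALIASES, RedirectingHandler and make_server are assumptions made up for the example. The alias gets a 301 with a Location header, and anything that really does not exist still gets an honest 404.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

HOME_PAGE = "/default.html"   # the real home page name mentioned in this thread
ALIASES = {"/index.html"}     # hypothetical set of names to redirect to it

class RedirectingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in ALIASES:
            # Server-side permanent redirect: the crawler sees a 301
            # status and a Location header, not a 200 for an empty page.
            self.send_response(301)
            self.send_header("Location", HOME_PAGE)
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path in ("/", HOME_PAGE):
            body = b"<html><body>home page</body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # Anything else does not exist, so say so with a 404.
            self.send_error(404)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet

def make_server(port=0):
    """Build a local test server; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), RedirectingHandler)
```

Calling make_server(8000).serve_forever() would run it locally. If you would rather not redirect at all, removing "/index.html" from ALIASES makes that URL fall through to the plain 404, which is also a perfectly good answer.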