Message from discussion python sitemap_gen.py MemoryError
From: cristina
Date: Wed, 2 Jul 2008 09:46:15 -0700 (PDT)
Local: Thurs, Jul 3 2008 2:46 am
Subject: Re: python sitemap_gen.py MemoryError
Can you set the verbose attribute of
the <site> node in config.xml to 3
(the highest level of diagnostic output
when you run the sitemap generator)
to see whether you get more diagnostic messages?
In particular, check whether a diagnostic message
names the directory being walked
just before the MemoryError raised in walk.

Are you using the directory nodes of
config.xml to walk your server file system?
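
If you are walking the file system, a rough sketch like the one below
may help you see which part of the tree contributes most of the files.
It is not part of sitemap_gen.py: the function name is made up for
illustration, and it assumes a recent Python rather than the 2.4
shown in your traceback.

```python
import os
from collections import Counter


def count_files_per_top_dir(root):
    """Count files under root, grouped by first-level subdirectory.

    Hypothetical helper, not part of sitemap_gen.py. Files that sit
    directly in root are counted under the key ".".
    """
    counts = Counter()
    # os.walk iterates the tree and does not follow symlinks by default.
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        counts[top] += len(filenames)
    return counts
```

Calling count_files_per_top_dir() on the path you gave in the directory
node and printing the result of .most_common(10) should list the ten
largest subtrees, which is where I would look first.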

Are you running the latest version (1.5)
of the Python sitemap generator, available from the
link 'Read more about the Sitemap Generator' at
http://www.google.com/support/webmasters/bin/answer.py?answer=34634&t...

Another thing: 2 million+ URLs is quite a lot.
Are you sure there are no duplicate URLs,
and that you want to list all of these URLs
in your sitemaps?
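
Your traceback shows the generator keeping one entry per URL in
self._urls, so duplicates and unwanted URLs cost memory directly.
If you can dump your URLs to a text file, one per line, a sketch like
this could tell you how many are duplicates. The function name and the
choice of hash are my own assumptions, not what sitemap_gen.py does,
and it needs Python 3.6 or later.

```python
import hashlib


def count_duplicates(urls):
    """Return (unique, duplicates) for an iterable of URL strings.

    Hypothetical helper. Stores a 16-byte digest per URL instead of
    the URL itself, to keep the memory footprint modest.
    """
    seen = set()
    duplicates = 0
    for url in urls:
        data = url.strip().encode("utf-8")
        digest = hashlib.blake2b(data, digest_size=16).digest()
        if digest in seen:
            duplicates += 1
        else:
            seen.add(digest)
    return len(seen), duplicates
```

An open file object can be passed in directly, since it yields one
line at a time and the helper strips the trailing newline.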

Cristina.

On Jul 2, 4:15 pm, BadXAsh wrote:

> Hello all,

> I have something of a problem I was hoping the gods of the web that
> reside here could help me with. I'm making my sitemap for Google, and
> my site is rather large (2 million+ pages). When I run my Python
> script it starts off without a hitch and works beautifully, that is,
> until it hits sitemap54.xml.gz... then without fail it crashes. Below
> is the message I get. (I cut the file path down to save space, as you
> don't need to see the huge file path it goes through.)

> ---
> Writing Sitemap file "(file path)/sitemap53.xml.gz" with 50000 URLs
> Sorting and normalizing collected URLs.
> Writing Sitemap file "(file path)/sitemap54.xml.gz" with 50000 URLs
> Traceback (most recent call last):
>   File "sitemap_gen.py", line 2208, in ?
>     sitemap.Generate()
>   File "sitemap_gen.py", line 1780, in Generate
>     input.ProduceURLs(self.ConsumeURL)
>   File "sitemap_gen.py", line 979, in ProduceURLs
>     os.path.walk(self._path, PerDirectory, None)
>   File "/usr/lib/python2.4/posixpath.py", line 298, in walk
>     walk(name, func, arg)
>   File "/usr/lib/python2.4/posixpath.py", line 298, in walk
>     walk(name, func, arg)
>   File "/usr/lib/python2.4/posixpath.py", line 298, in walk
>     walk(name, func, arg)
>   File "/usr/lib/python2.4/posixpath.py", line 298, in walk
>     walk(name, func, arg)
>   File "/usr/lib/python2.4/posixpath.py", line 298, in walk
>     walk(name, func, arg)
>   File "/usr/lib/python2.4/posixpath.py", line 290, in walk
>     func(arg, top, names)
>   File "sitemap_gen.py", line 974, in PerDirectory
>     PerFile(dirpath, name)
>   File "sitemap_gen.py", line 959, in PerFile
>     consumer(url, False)
>   File "sitemap_gen.py", line 1841, in ConsumeURL
>     self._urls[hash] = 1
> MemoryError
> ---

> Does anyone have any insight or workarounds for how I can free up the
> memory that is apparently gummed up by this process? Any help is
> gratefully appreciated!

> THANK YOU!

