Message from discussion python sitemap_gen.py MemoryError
From: BadXAsh
Date: Wed, 2 Jul 2008 13:39:16 -0700 (PDT)
Local: Thurs, Jul 3 2008 6:39 am
Subject: Re: python sitemap_gen.py MemoryError
Man, I was very confident that would work. I changed the verbose
attribute of the site node in config.xml to 3, and it does say
something about the directory being walked at the very beginning of
the process, but then around sitemap 54 I received this message:
---
URL:  loc=[http://www.diyautoparts.com/search/parts/1985/dodge/aries/
air-check-valve.shtml]  lastmod=[2008-01-17T16:25:38Z]  changefreq=[]
priority=[]
URL:  loc=[http://www.diyautoparts.com/search/parts/1985/dodge/aries/
air-conditioning-accumulator.shtml]  lastmod=[2008-01-17T16:25:38Z]
changefreq=[]  priority=[]
Traceback (most recent call last):
  File "sitemap_gen.py", line 2206, in ?
    sitemap.Generate()
  File "sitemap_gen.py", line 1778, in Generate
    input.ProduceURLs(self.ConsumeURL)
  File "sitemap_gen.py", line 979, in ProduceURLs
    os.path.walk(self._path, PerDirectory, None)
  File "/usr/lib/python2.4/posixpath.py", line 298, in walk
    walk(name, func, arg)
  File "/usr/lib/python2.4/posixpath.py", line 298, in walk
    walk(name, func, arg)
  File "/usr/lib/python2.4/posixpath.py", line 298, in walk
    walk(name, func, arg)
  File "/usr/lib/python2.4/posixpath.py", line 298, in walk
    walk(name, func, arg)
  File "/usr/lib/python2.4/posixpath.py", line 298, in walk
    walk(name, func, arg)
  File "/usr/lib/python2.4/posixpath.py", line 290, in walk
    func(arg, top, names)
  File "sitemap_gen.py", line 974, in PerDirectory
    PerFile(dirpath, name)
  File "sitemap_gen.py", line 959, in PerFile
    consumer(url, False)
  File "sitemap_gen.py", line 1839, in ConsumeURL
    self._urls[hash] = 1
MemoryError
---
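
The last frame of the traceback shows ConsumeURL storing a hash for every URL in an in-memory dict (self._urls), which grows with the size of the site. As a rough illustration of one possible workaround, and not sitemap_gen.py's own code, the sketch below walks a directory lazily and groups URLs into batches of 50000, so only one batch is held in memory at a time. The names iter_urls and batched are hypothetical, it is written for a current Python 3, and URL escaping, duplicate detection, and lastmod values are left out.

```python
import os
from itertools import islice

# Matches the "with 50000 URLs" batches seen in the generator's log output.
MAX_URLS_PER_SITEMAP = 50000


def iter_urls(root, base_url):
    """Yield one URL per file under root without collecting them all first."""
    prefix = base_url.rstrip("/") + "/"
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            # Assumption: file paths map directly onto URL paths (no escaping).
            yield prefix + rel.replace(os.sep, "/")


def batched(iterable, size):
    """Yield lists of at most `size` items taken from iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

Each batch from batched(iter_urls(path, url), MAX_URLS_PER_SITEMAP) could then be written to its own sitemap file before the next batch is read.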

I am using the directory node. This is what I have:

---
<directory
     path="(My Site Path)"
     url="http://www.diyautoparts.com/"
     default_file="index.html"
  />
---

(My Site Path) is the long path that I actually have typed in, but I'm
so paranoid I took it out just in case, haha. Anyway, I do believe my
Python is up to date, as my web hosting company takes care of that
server-side software.

And yes, those pages are all different, one for each type of part we
carry, which is a lot, so they are not the same page over and over, if
that's what you mean.
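
One way to double-check the "no duplicates" assumption is to count repeated URLs directly. The sketch below is only an illustration: count_duplicates is a hypothetical helper, not part of sitemap_gen.py, and it takes any iterable of URL strings. It keeps a truncated SHA-1 digest per URL instead of the URL itself, which still means one small entry per URL in memory, so on a tightly limited host it may run into the same limit.

```python
import hashlib


def count_duplicates(urls):
    """Return how many URLs in the iterable repeat an earlier URL."""
    seen = set()
    duplicates = 0
    for url in urls:
        # Assumption: 8 bytes of SHA-1 is enough to tell a few million URLs
        # apart; a false match is extremely unlikely but not impossible.
        digest = hashlib.sha1(url.encode("utf-8")).digest()[:8]
        if digest in seen:
            duplicates += 1
        else:
            seen.add(digest)
    return duplicates
```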

Thanks for your help so far!

On Jul 2, 12:46 pm, cristina wrote:

> Can you set the verbose attribute of
> the <site> node in config.xml to 3
> (highest level of diagnostic data for
> when you run the sitemap generator)
> to see if you get more diagnostic messages,
> and check if you get a diagnostic message
> about the directory being walked at the time
> before that error message in walk?

> Are you using the directory nodes of
> config.xml to walk your server file system?

> Are you running the latest version 1.5
> of the Python sitemap generator from the
> link 'Read more about the Sitemap Generator' in http://www.google.com/support/webmasters/bin/answer.py?answer=34634&t...

> Another thing is that 2 million+ URLs are quite a lot,
> are you sure there are no duplicate URLs,
> and that you want to list all these URLs
> in your sitemaps?

> Cristina.

> On Jul 2, 4:15 pm, BadXAsh wrote:

> > Hello all,

> > I have something of a problem I was hoping the gods of the web that
> > reside here could help me with. I'm making my sitemap for Google, and
> > my site is rather large (2 million+ pages). When I run my Python
> > script, it starts off without a hitch. It works beautifully, that is,
> > until it hits sitemap54.xml.gz... then without fail it crashes. Below
> > is the message I get. (I cut the file path down to save space, as you
> > don't need to see the huge file path it goes through.)

> > ---
> > Writing Sitemap file "(file path)/sitemap53.xml.gz" with 50000 URLs
> > Sorting and normalizing collected URLs.
> > Writing Sitemap file "(file path)/sitemap54.xml.gz" with 50000 URLs
> > Traceback (most recent call last):
> > File "sitemap_gen.py", line 2208, in ?
> > sitemap.Generate()
> > File "sitemap_gen.py", line 1780, in Generate
> > input.ProduceURLs(self.ConsumeURL)
> > File "sitemap_gen.py", line 979, in ProduceURLs
> > os.path.walk(self._path, PerDirectory, None)
> > File "/usr/lib/python2.4/posixpath.py", line 298, in walk
> > walk(name, func, arg)
> > File "/usr/lib/python2.4/posixpath.py", line 298, in walk
> > walk(name, func, arg)
> > File "/usr/lib/python2.4/posixpath.py", line 298, in walk
> > walk(name, func, arg)
> > File "/usr/lib/python2.4/posixpath.py", line 298, in walk
> > walk(name, func, arg)
> > File "/usr/lib/python2.4/posixpath.py", line 298, in walk
> > walk(name, func, arg)
> > File "/usr/lib/python2.4/posixpath.py", line 290, in walk
> > func(arg, top, names)
> > File "sitemap_gen.py", line 974, in PerDirectory
> > PerFile(dirpath, name)
> > File "sitemap_gen.py", line 959, in PerFile
> > consumer(url, False)
> > File "sitemap_gen.py", line 1841, in ConsumeURL
> > self._urls[hash] = 1
> > MemoryError
> > ---

> > Anyone have any insight or workarounds for how I can free up the
> > apparent memory that is gummed up by this process? Any help is
> > gratefully appreciated!

> > THANK YOU!

