From: BadXAsh
Date: Tue, 8 Jul 2008 06:30:59 -0700 (PDT)
Local: Tues, Jul 8 2008 11:30 pm
Subject: Re: python sitemap_gen.py MemoryError
Well, I ran just the '85 Dodge Aries folder and it ran perfectly; it
finished in the blink of an eye. I also tried to run just the /parts/
directory, which is the single largest directory on my site: a page
for each and every part for every vehicle from 1965 through 2007, so
as you can imagine it's rather large. Sitemapping that directory on
its own also failed with a MemoryError. It got past the '85 Dodge
Aries this time, but it crashed in the '86 year. One thing that was
similar, though: it crashed midway through page 55 of the Sitemaps.

So I'm wondering: can I set up a filter to break down the /parts/
directory and map it in sections, say in 10- to 20-year increments,
e.g. /parts/1965 - 1985? I'm not too clear on the FILTER rules; it
seems like I can't really specify which directories I want filtered.
Alternatively, once I have /parts/ mapped, can I leave it out so I
can map the rest of the site without it?

Or should I just abandon all hope?? Hehe. Thank you for your help so
far, Cristina!
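On the filter question, candidate patterns can be checked locally before they go into config.xml. The sketch below is a hypothetical helper, not part of sitemap_gen.py. It assumes (my reading of sitemap_gen.py, so treat it as an assumption) that each <filter> node's action ("pass" or "drop"), type ("wildcard" or "regexp") and pattern are tested against the full URL, that the first matching filter decides, and that URLs matching no filter are kept. The sample URLs and both filter lists are made up for illustration; it is written for Python 3.

```python
import fnmatch
import re

# Hypothetical sample URLs, modeled on the paths quoted in this thread.
SAMPLE_URLS = [
    "http://www.diyautoparts.com/search/parts/1985/dodge/aries/air-check-valve.shtml",
    "http://www.diyautoparts.com/search/parts/1986/dodge/aries/air-check-valve.shtml",
    "http://www.diyautoparts.com/search/parts/1965/ford/mustang/index.shtml",
    "http://www.diyautoparts.com/index.shtml",
]

# Hypothetical filter lists as (action, type, pattern) tuples, mirroring the
# action/type/pattern attributes of <filter> nodes in config.xml.
# Keep parts pages for 1965..1985, drop other parts years (non-parts URLs untouched):
FILTERS_1965_TO_1985 = [
    ("pass", "regexp", r"/search/parts/(196[5-9]|197[0-9]|198[0-5])/"),
    ("drop", "wildcard", "*/search/parts/*"),
]
# Map everything except the parts tree:
FILTERS_SKIP_PARTS = [
    ("drop", "wildcard", "*/search/parts/*"),
]


def matches(kind, pattern, url):
    """True if url matches a wildcard (fnmatch) or regexp (re.search) pattern."""
    if kind == "wildcard":
        return fnmatch.fnmatchcase(url, pattern)
    if kind == "regexp":
        return re.search(pattern, url) is not None
    raise ValueError("unknown filter type: %r" % kind)


def is_kept(url, filters):
    """Assumed semantics: the first matching filter decides; no match means keep."""
    for action, kind, pattern in filters:
        if matches(kind, pattern, url):
            return action == "pass"
    return True


if __name__ == "__main__":
    for url in SAMPLE_URLS:
        print("KEEP" if is_kept(url, FILTERS_1965_TO_1985) else "DROP", url)
```

Even if filters like these work, the directory walk would presumably still visit every file under /parts/: as far as I can tell, filtering happens per URL in ConsumeURL, after the walk has already reached the file. Splitting the <directory> nodes themselves avoids walking the excluded years at all.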

On Jul 2, 6:31 pm, cristina wrote:

> Can you run the sitemap generator more than once,
> with different config.xml files that have different
> settings for the <directory> node,
> just to split the sitemaps by sub-folder?
> That would check whether the problem is indeed
> a memory leak caused by the large number of URLs,
> and not some problem with walking the file system.

> For example, first run the sitemap generator
> only for the directory where you got the error,
> to check that this directory can be walked OK:

>  <directory
>      path="/main_path/search/parts/1985/dodge/aries"
>      url="http://www.diyautoparts.com/search/parts/1985/dodge/aries/"
>      default_file="index.html"
>   />

> Change default_file to index.shtml
> if the default home page is index.shtml.
> After that, run the sitemap generator for the other,
> non-overlapping directories.
> If you want, you can also use <sitemap>
> nodes to aggregate the sitemaps
> (you can use <sitemap> nodes in version 1.4;
> I am not sure whether you can use them in version 1.5).

> It is not a great solution, just a way to check
> that the problems are indeed memory leaks
> caused by the large number of URLs.

> Cristina.
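Cristina's suggestion of separate runs over non-overlapping <directory> nodes can be scripted. This is a minimal sketch under several assumptions: SITE_ROOT is the "/main_path" placeholder from her example, every year directory in the chosen ranges exists under search/parts/, pages use index.shtml as the default file, and sitemap_gen.py is started as `python sitemap_gen.py --config=<file>`. The year ranges, config file names and output paths are hypothetical placeholders; base_url, store_into and verbose are the <site> attributes I know of from the sample config. It is written for Python 3.

```python
import subprocess
from xml.sax.saxutils import quoteattr

# Hypothetical values: SITE_ROOT follows the "/main_path" placeholder in the
# example <directory> node; replace both with the real paths.
SITE_ROOT = "/main_path"
BASE_URL = "http://www.diyautoparts.com/"

# Hypothetical, non-overlapping year ranges (inclusive), one run per range.
RANGES = [(1965, 1974), (1975, 1984), (1985, 1994), (1995, 2007)]


def build_config(first_year, last_year, store_into):
    """Return config.xml text with one <directory> node per year directory."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<site base_url=%s store_into=%s verbose="1">'
        % (quoteattr(BASE_URL), quoteattr(store_into)),
    ]
    for year in range(first_year, last_year + 1):
        path = "%s/search/parts/%d" % (SITE_ROOT, year)
        url = "%ssearch/parts/%d/" % (BASE_URL, year)
        lines.append('  <directory path=%s url=%s default_file="index.shtml" />'
                     % (quoteattr(path), quoteattr(url)))
    lines.append("</site>")
    return "\n".join(lines) + "\n"


def main():
    for first, last in RANGES:
        config_path = "config_parts_%d_%d.xml" % (first, last)
        # Each run needs its own output file, or it overwrites the previous one.
        store_into = "%s/sitemap_parts_%d_%d.xml.gz" % (SITE_ROOT, first, last)
        with open(config_path, "w") as f:
            f.write(build_config(first, last, store_into))
        # A separate process per range, so each run starts with fresh memory.
        # "python" is assumed to be the interpreter that normally runs sitemap_gen.py.
        subprocess.check_call(["python", "sitemap_gen.py", "--config=" + config_path])


if __name__ == "__main__":
    main()
```

`check_call` raises as soon as one run exits with an error, so a range that still hits a MemoryError stops the loop and shows where the range needs to be narrowed.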

> On Jul 2, 9:39 pm, BadXAsh wrote:

> > Man, I was very confident that would work. I changed the verbose
> > attribute of the site node in config.xml to 3, and it does say
> > something about the directory being walked at the very beginning of
> > the process, but then around sitemap 54 I received this message:
> > ---
> > URL:  loc=[http://www.diyautoparts.com/search/parts/1985/dodge/aries/air-check-valve.shtml]  lastmod=[2008-01-17T16:25:38Z]  changefreq=[]  priority=[]
> > URL:  loc=[http://www.diyautoparts.com/search/parts/1985/dodge/aries/air-conditioning-accumulator.shtml]  lastmod=[2008-01-17T16:25:38Z]  changefreq=[]  priority=[]
> > Traceback (most recent call last):
> >   File "sitemap_gen.py", line 2206, in ?
> >     sitemap.Generate()
> >   File "sitemap_gen.py", line 1778, in Generate
> >     input.ProduceURLs(self.ConsumeURL)
> >   File "sitemap_gen.py", line 979, in ProduceURLs
> >     os.path.walk(self._path, PerDirectory, None)
> >   File "/usr/lib/python2.4/posixpath.py", line 298, in walk
> >     walk(name, func, arg)
> >   File "/usr/lib/python2.4/posixpath.py", line 298, in walk
> >     walk(name, func, arg)
> >   File "/usr/lib/python2.4/posixpath.py", line 298, in walk
> >     walk(name, func, arg)
> >   File "/usr/lib/python2.4/posixpath.py", line 298, in walk
> >     walk(name, func, arg)
> >   File "/usr/lib/python2.4/posixpath.py", line 298, in walk
> >     walk(name, func, arg)
> >   File "/usr/lib/python2.4/posixpath.py", line 290, in walk
> >     func(arg, top, names)
> >   File "sitemap_gen.py", line 974, in PerDirectory
> >     PerFile(dirpath, name)
> >   File "sitemap_gen.py", line 959, in PerFile
> >     consumer(url, False)
> >   File "sitemap_gen.py", line 1839, in ConsumeURL
> >     self._urls[hash] = 1
> > MemoryError
> > ---

> > I am using the directory node; this is what I have:

> > ---
> > <directory
> >      path="(My Site Path)"
> >      url="http://www.diyautoparts.com/"
> >      default_file="index.html"
> >   />
> > ---

> > (My Site Path) is my long path, which I actually have typed in, but
> > I'm so paranoid I took it out just in case, haha. Anyway, I believe
> > my Python is up to date, as my web hosting company takes care of that
> > server-side software.

> > And yes, those pages are all different, one for each type of part
> > we carry (which is a lot), so they are not the same page over and
> > over, if that's what you mean.

> > Thanks for your help so far!
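The traceback ends at `self._urls[hash] = 1` inside ConsumeURL, which looks like a dictionary holding a hash for every URL accepted so far in the run, so memory use presumably grows with the number of URLs handled by a single run. Before choosing split points, it may help to know how many pages each year contributes. This is a minimal sketch assuming the /main_path/search/parts/<year>/... layout from this thread and pages ending in .shtml or .html; `count_files_per_year`, `group_years` and the per-run budget are hypothetical names and numbers, not part of sitemap_gen.py.

```python
import os


def count_files_per_year(parts_dir, extensions=(".shtml", ".html")):
    """Map each immediate subdirectory name (e.g. '1985') to the number of
    files anywhere under it whose names end with one of the extensions."""
    counts = {}
    for entry in sorted(os.listdir(parts_dir)):
        year_dir = os.path.join(parts_dir, entry)
        if not os.path.isdir(year_dir):
            continue
        total = 0
        for _dirpath, _dirnames, filenames in os.walk(year_dir):
            total += sum(1 for name in filenames if name.endswith(extensions))
        counts[entry] = total
    return counts


def group_years(counts, max_urls):
    """Greedily group consecutive year directories so each group stays at or
    under max_urls files (a single oversized year gets a group of its own)."""
    groups, current, current_total = [], [], 0
    for year in sorted(counts):
        n = counts[year]
        if current and current_total + n > max_urls:
            groups.append(current)
            current, current_total = [], 0
        current.append(year)
        current_total += n
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Hypothetical path, following the example <directory> node in this thread.
    counts = count_files_per_year("/main_path/search/parts")
    for year, n in sorted(counts.items()):
        print(year, n)
    # 200000 is an arbitrary, hypothetical per-run budget.
    print(group_years(counts, 200000))
```

The resulting groups could then serve as the year ranges for separate runs, one config.xml per group.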

