how to remove the same words in the paragraph
Newsgroups: comp.lang.python
From: kylin <huili.s...@gmail.com>
Date: Tue, 3 Nov 2009 14:13:45 -0800 (PST)
Local: Wed, Nov 4 2009 9:13 am
Subject: how to remove the same words in the paragraph
I need to remove a word if it appears in the paragraph twice. Could
someone give me a clue, or point me to a useful function in Python?

Newsgroups: comp.lang.python
From: Andre Engels <andreeng...@gmail.com>
Date: Tue, 3 Nov 2009 23:33:59 +0100
Local: Wed, Nov 4 2009 9:33 am
Subject: Re: how to remove the same words in the paragraph

On Tue, Nov 3, 2009 at 11:13 PM, kylin <huili.s...@gmail.com> wrote:
> I need to remove the word if it appears in the paragraph twice. could
> some give me some clue or some useful function in the python.

Well, it depends a bit on what you call 'the same word' (In the
paragraph "Fly fly, fly!" does the word fly occur 0, 1, 2 or 3
times?), but the split() function seems a logical choice to use
whatever the answer to that question.
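
To make the ambiguity concrete, here is a minimal sketch (Python 3; the variable names are illustrative, not from the thread) that counts "fly" under three possible definitions of "the same word":

```python
import string

text = "Fly fly, fly!"

# Definition 1: a word is whatever split() returns, punctuation and all.
raw_tokens = text.split()

# Definition 2: strip leading/trailing punctuation, but keep case.
stripped_tokens = [token.strip(string.punctuation) for token in raw_tokens]

# Definition 3: strip punctuation and ignore case.
folded_tokens = [token.lower() for token in stripped_tokens]

print(raw_tokens.count("fly"))       # 0
print(stripped_tokens.count("fly"))  # 2
print(folded_tokens.count("fly"))    # 3
```

Whichever definition applies, split() is the first step in all three.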

--
André Engels, andreeng...@gmail.com


Newsgroups: comp.lang.python
Followup-To: comp.lang.python
From: Peter Otten <__pete...@web.de>
Date: Tue, 03 Nov 2009 23:40:26 +0100
Local: Wed, Nov 4 2009 9:40 am
Subject: Re: how to remove the same words in the paragraph

kylin wrote:
> I want to remove all the punctuation and no need words form a string
> datasets for experiment.
> I need to remove the word if it appears in the paragraph twice. could
> some give me some clue or some useful function in the python.

>>> para = u"""I need to remove the word if it appears in the paragraph twice. could
... some give me some clue or some useful function in the python.
... """
>>> print "\n".join(sorted(set(para.translate(dict.fromkeys(map(ord, ".:,-"))).split())))
I
appears
clue
could
function
give
if
in
it
me
need
or
paragraph
python
remove
some
the
to
twice
useful
word
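
The session above is Python 2 (print statement, u"" literal). As a rough sketch of the same idea in Python 3, split into steps, something like the following should behave the same way; the function name unique_words and its punctuation parameter are made up for illustration. Note that going through set() and sorted() discards the original word order.

```python
def unique_words(text, punctuation=".:,-"):
    """Return the distinct words of text, sorted, with punctuation removed."""
    # A translate table mapping a code point to None deletes that character.
    delete_table = dict.fromkeys(map(ord, punctuation))
    words = text.translate(delete_table).split()
    # set() drops duplicates; sorted() imposes an order (uppercase sorts first).
    return sorted(set(words))


para = """I need to remove the word if it appears in the paragraph twice. could
some give me some clue or some useful function in the python.
"""

print("\n".join(unique_words(para)))
```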

Newsgroups: comp.lang.python
From: Tim Chase <python.l...@tim.thechases.com>
Date: Tue, 03 Nov 2009 16:57:14 -0600
Local: Wed, Nov 4 2009 9:57 am
Subject: Re: how to remove the same words in the paragraph

kylin wrote:
> I need to remove the word if it appears in the paragraph twice. could
> some give me some clue or some useful function in the python.

Sounds like homework.  To fail your class, use this one:

 >>> p = "one two three four five six seven three four eight"
 >>> s = set()
 >>> print ' '.join(w for w in p.split() if not (w in s or s.add(w)))
one two three four five six seven eight

which is absolutely horrible because it mutates the set within
the generator expression.  The passable solution would use a
for-loop to iterate over each word in the paragraph, emitting it
if it hadn't already been seen.  Maintain those words in a set, so
your words know how not to be seen. ("Mr. Nesbitt, would you
please stand up?")
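
A minimal sketch of that for-loop version, in Python 3 (the function and variable names are illustrative, not from the post):

```python
def dedupe_words(paragraph):
    """Keep the first occurrence of each whitespace-separated word, in order."""
    seen = set()   # words already emitted
    kept = []      # output words, in their original order
    for word in paragraph.split():
        if word not in seen:
            seen.add(word)
            kept.append(word)
    return " ".join(kept)


print(dedupe_words("one two three four five six seven three four eight"))
# one two three four five six seven eight
```

Unlike a sorted set, this keeps the surviving words in their original order. Like the one-liner, it collapses runs of whitespace to single spaces.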

This also assumes your paragraph consists only of words and
whitespace.  But since you posted your previous homework-sounding
question on stripping out non-word/whitespace characters, you'll
want to look into using a regexp built on "[\w\s]" (for instance,
substituting away its negation, "[^\w\s]") to clean up the
cruft in the paragraph.  Neither solution above preserves
non-whitespace/non-word characters, for which I'd recommend using
re.sub() with a callback.  Such a callback class might look
something like

 >>> class Dedupe:
...     def __init__(self):
...             self.s = set()
...     def __call__(self, m):
...             w = m.group(0)
...             if w in self.s: return ''
...             self.s.add(w)
...             return w
...
 >>> r.sub(Dedupe(), p)

where I leave the definition of "r" to the student.  Also beware
of case-differences for which you might have to normalize.
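
For readers who want to see the pieces together, here is one possible completion in Python 3. The pattern r"\w+" is only an assumption about what "r" could be, and lowercasing is just one way to normalize case; neither comes from the post.

```python
import re


class Dedupe:
    """re.sub() callback: blank out any word that has already been seen."""

    def __init__(self):
        self.seen = set()

    def __call__(self, match):
        word = match.group(0)
        key = word.lower()  # assumption: case-insensitive comparison
        if key in self.seen:
            return ""
        self.seen.add(key)
        return word


# Assumption: a "word" is a run of \w characters.
word_pattern = re.compile(r"\w+")


def dedupe_keep_punctuation(text):
    # A fresh Dedupe per call, so the seen-set does not leak between texts.
    return word_pattern.sub(Dedupe(), text)


print(dedupe_keep_punctuation("Fly fly, fly!"))  # Fly , !
```

Only the words themselves are replaced, so the punctuation and spacing around a removed word stay behind; tidying those up would need a further pass.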

You'll also want to use more descriptive variable names than my
one-letter tokens.

-tkc


Newsgroups: comp.lang.python
From: Tim Chase <python.l...@tim.thechases.com>
Date: Wed, 04 Nov 2009 09:09:12 -0600
Local: Thurs, Nov 5 2009 2:09 am
Subject: Re: how to remove the same words in the paragraph

>   Can we use inp_paragraph.count(iter_word) to make it simple ?

It would work, but the performance will drop off sharply as the
length of the paragraph grows, and you'd still have to keep track
of which words you already printed so you can correctly print the
first one.  So you might as well not bother with counting.
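
A rough Python 3 sketch of what a count-based version ends up looking like (names are illustrative). The seen-set is still required, and once it is there the count() call no longer decides anything:

```python
def dedupe_with_count(paragraph):
    words = paragraph.split()
    seen = set()
    kept = []
    for word in words:
        # list.count() rescans the whole list for every word, so the total
        # work grows quadratically with the number of words.
        is_repeated = words.count(word) > 1
        if is_repeated and word in seen:
            continue  # a later copy of a word that was already emitted
        seen.add(word)
        kept.append(word)
    return " ".join(kept)


print(dedupe_with_count("one two three four five six seven three four eight"))
# one two three four five six seven eight

# Separate caveat: str.count() on the raw paragraph counts substrings,
# not whole words.
print("the other".count("the"))  # 2
```

Any word found in the seen-set is by definition repeated, so dropping the is_repeated test gives the same output with far less work.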

-tkc


Newsgroups: comp.lang.python
From: Tim Chase <python.l...@tim.thechases.com>
Date: Mon, 09 Nov 2009 06:13:30 -0600
Local: Mon, Nov 9 2009 11:13 pm
Subject: Re: how to remove the same words in the paragraph

> I think simple regex may come handy,

>   p=re.compile(r'(.+) .*\1')    #note the space
>   s=p.search("python and i love python")
>   s.groups()
>   (' python',)

> But that matches for only one double word.Someone else could light up here
> to extract all the double words.Then they can be removed from the original
> paragraph.

This has multiple problems:

 >>> p = re.compile(r'(.+) .*\1')
 >>> s = p.search("python one two one two python")
 >>> s.groups()
('python',)
 >>> s = p.search("python one two one two python one")
 >>> s.groups() # guess what happened to the 2nd "one"...
('python one',)

and even once you have the list of theoretical duplicates (by
changing the regexp to r'\b(\w+)\b.*?\1' perhaps), you still have
to worry about emitting the first instance but not subsequent
instances.
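
As a hedged illustration in Python 3: the lookahead pattern below is an alternative sketch, not the regexp suggested above, and the helper name is made up. It lists every word that occurs again later in the text, which still leaves the "keep the first, drop the rest" step to be done separately.

```python
import re

# The quoted pattern: the group can swallow several words at once.
greedy = re.compile(r"(.+) .*\1")
print(greedy.search("python one two one two python one").groups())
# ('python one',)

# Sketch: match a whole word only if the same whole word appears again later.
# The lookahead consumes nothing, so every word gets examined in turn.
later_repeat = re.compile(r"\b(\w+)\b(?=.*?\b\1\b)", re.DOTALL)


def repeated_words(text):
    """Return the set of words that occur more than once in text."""
    return set(later_repeat.findall(text))


print(sorted(repeated_words("python one two one two python one")))
# ['one', 'python', 'two']
```

Each lookahead rescans the rest of the text, so on long paragraphs this is slower than a single pass with a set.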

-tkc

