
Mailing List Archive: Python: Python

How can I create customized classes that have similar properties as 'str'?

 

 



fanglicheng at gmail

Nov 24, 2007, 2:31 AM

Post #1 of 33 (250 views)
Permalink
How can I create customized classes that have similar properties as 'str'?

I mean, all instances of the class that are equal to each other should be
reduced to a single instance, which means that for instances of this
class there's no difference between a is b and a == b.

Thank you.
--
http://mail.python.org/mailman/listinfo/python-list


fanglicheng at gmail

Nov 24, 2007, 2:36 AM

Post #2 of 33 (243 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

I find myself frequently in need of classes like this for two reasons.
First, it's efficient in memory. Second, when two instances are
compared for equality, only their pointers are compared. (I think
that's how Python compares 'str's.)

On Nov 24, 6:31 pm, Licheng Fang <fanglich...@gmail.com> wrote:
> I mean, all the class instances that equal to each other should be
> reduced into only one instance, which means for instances of this
> class there's no difference between a is b and a==b.
>
> Thank you.



usenet-mail-0306.20.chr0n0ss at spamgourmet

Nov 24, 2007, 3:00 AM

Post #3 of 33 (243 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

Licheng Fang wrote:
> I mean, all the class instances that equal to each other should be
> reduced into only one instance, which means for instances of this
> class there's no difference between a is b and a==b.

If you only want that if "a == b" is True also "a is b" is True,
overload the is_ attribute of your class. Personally, I don't see
any advantage in this.

Regards,


Björn

--
BOFH excuse #352:

The cables are not the same length.



usenet-mail-0306.20.chr0n0ss at spamgourmet

Nov 24, 2007, 3:05 AM

Post #4 of 33 (242 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

Licheng Fang wrote:
> I find myself frequently in need of classes like this for two
> reasons. First, it's efficient in memory.

Are you using millions of objects, or MB size objects? Otherwise,
this is no argument.

BTW, what happens if you, by some operation, make a == b, and
afterwards change b so another object instance must be created?
This instance management is quite a runtime overhead.

> Second, when two instances are compared for equality only their
> pointers are compared.

I'd argue that the object management will often cost more performance
than equality testing, unless you have a huge number of equal
objects. If that is the case, you should rethink your program
design.

> (I think that's how Python compares 'str's.

Generally not. In CPython, just very short strings are created only
once.

>>> a=" "
>>> b=" "
>>> a is b
True
>>> a="  "
>>> b="  "
>>> a is b
False

Regards,


Björn

--
BOFH excuse #430:

Mouse has out-of-cheese-error



fanglicheng at gmail

Nov 24, 2007, 3:44 AM

Post #5 of 33 (242 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

On Nov 24, 7:05 pm, Bjoern Schliessmann <usenet-
mail-0306.20.chr0n...@spamgourmet.com> wrote:
> Licheng Fang wrote:
> > I find myself frequently in need of classes like this for two
> > reasons. First, it's efficient in memory.
>
> Are you using millions of objects, or MB size objects? Otherwise,
> this is no argument.

Yes, millions. In my natural language processing tasks, I almost
always need to define patterns, identify their occurrences in a huge
data set, and count them. Say, I have a big text file, consisting of
millions of words, and I want to count the frequency of trigrams:

trigrams([1,2,3,4,5]) == [(1,2,3),(2,3,4),(3,4,5)]

I can save the counts in a dict D1. Later, I may want to recount the
trigrams, with some minor modifications, say, doing it on every other
line of the input file, and the counts are saved in dict D2. Problem
is, D1 and D2 have almost the same set of keys (trigrams of the text),
yet the keys in D2 are new instances, even though these keys probably
have already been inserted into D1. So I end up with unnecessary
duplicates of keys. And this can be a great waste of memory with huge
input data.
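
The counting task described above can be sketched as follows. This is a minimal illustration in modern Python 3, not the poster's actual code: only the name `trigrams` and its expected result come from the post, while the sample sentence and the use of `collections.Counter` are assumptions.

```python
from collections import Counter

def trigrams(seq):
    """Return every run of three consecutive items as a tuple."""
    return [tuple(seq[i:i + 3]) for i in range(len(seq) - 2)]

# Hypothetical input; the real data would be millions of words from a file.
words = "the cat sat on the cat sat".split()

# Each distinct trigram becomes a dict key; the value is its frequency.
counts = Counter(trigrams(words))
```

Every call to `trigrams` builds fresh tuples, so counting the same text a second time into another dict creates a second set of equal key objects. That is the duplication the post is concerned about.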

>
> BTW, what happens if you, by some operation, make a == b, and
> afterwards change b so another object instance must be created?
> This instance management is quite a runtime overhead.
>

I probably need this class to be immutable.

> > Second, when two instances are compared for equality only their
> > pointers are compared.
>
> I state that the object management will often eat more performance
> than equality testing. Except you have a huge number of equal
> objects. If the latter was the case you should rethink your program
> design.
>

Yeah, counting is all about equal or not.

> > (I think that's how Python compares 'str's.
>
> Generally not. In CPython, just very short strings are created only
> once.
>
> >>> a=" "
> >>> b=" "
> >>> a is b
> True
> >>> a="  "
> >>> b="  "
> >>> a is b
>

Wow, I didn't know this. But how exactly does Python manage these strings?
My interpreter gave me these results:

>>> a = 'this'
>>> b = 'this'
>>> a is b
True
>>> a = 'this is confusing'
>>> b = 'this is confusing'
>>> a is b
False


> False
>
> Regards,
>
> Björn
>
> --
> BOFH excuse #430:
>
> Mouse has out-of-cheese-error



usenet-mail-0306.20.chr0n0ss at spamgourmet

Nov 24, 2007, 4:40 AM

Post #6 of 33 (237 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

Licheng Fang wrote:
> On Nov 24, 7:05 pm, Bjoern Schliessmann <usenet-

>> BTW, what happens if you, by some operation, make a == b, and
>> afterwards change b so another object instance must be created?
>> This instance management is quite a runtime overhead.
>
> I probably need this class to be immutable.

IMHO you don't need a change of Python, but simply a special
implementation (probably using metaclasses/singletons).

> Wow, I didn't know this. But exactly how Python manage these
> strings?

I don't know (use the source, Luke). :) Or perhaps there is a Python
Elder here that knows?

Regards,


Björn

--
BOFH excuse #165:

Backbone Scoliosis



samwyse at gmail

Nov 24, 2007, 4:54 AM

Post #7 of 33 (237 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

On Nov 24, 5:44 am, Licheng Fang <fanglich...@gmail.com> wrote:
> Yes, millions. In my natural language processing tasks, I almost
> always need to define patterns, identify their occurrences in a huge
> data, and count them. [...] So I end up with unnecessary
> duplicates of keys. And this can be a great waste of memory with huge
> input data.

create a hash that maps your keys to themselves, then use the values
of that hash as your keys.

>>> store = {}
>>> def atom(str):
        global store
        if str not in store:
            store[str] = str
        return store[str]

>>> a='this is confusing'
>>> b='this is confusing'
>>> a == b
True
>>> a is b
False
>>> atom(a) is atom(b)
True


bj_666 at gmx

Nov 24, 2007, 5:42 AM

Post #8 of 33 (238 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

On Sat, 24 Nov 2007 13:40:40 +0100, Bjoern Schliessmann wrote:

> Licheng Fang wrote:
>> On Nov 24, 7:05 pm, Bjoern Schliessmann <usenet-
>
>> Wow, I didn't know this. But exactly how Python manage these
>> strings?
>
> I don't know (use the source, Luke). :) Or perhaps there is a Python
> Elder here that knows?

AFAIK strings of length 1 and strings that would be valid Python
identifiers are treated this way.

Ciao,
Marc 'BlackJack' Rintsch


fanglicheng at gmail

Nov 24, 2007, 8:35 AM

Post #9 of 33 (236 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

On Nov 24, 9:42 pm, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> On Sat, 24 Nov 2007 13:40:40 +0100, Bjoern Schliessmann wrote:
> > Licheng Fang wrote:
> >> On Nov 24, 7:05 pm, Bjoern Schliessmann <usenet-
>
> >> Wow, I didn't know this. But exactly how Python manage these
> >> strings?
>
> > I don't know (use the source, Luke). :) Or perhaps there is a Python
> > Elder here that knows?
>
> AFAIK strings of length 1 and strings that would be valid Python
> identifiers are treated this way.
>
> Ciao,
> Marc 'BlackJack' Rintsch

Thanks. Then, is there a way to make python treat all strings this
way, or any other kind of immutable objects?


bdesth.quelquechose at free

Nov 24, 2007, 9:15 AM

Post #10 of 33 (236 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

Licheng Fang a écrit :
> I mean, all the class instances that equal to each other should be
> reduced into only one instance, which means for instances of this
> class there's no difference between a is b and a==b.

Here's a Q&D attempt - without any guarantee, and to be tailored to your
needs.

_values = {}     # id(instance) => value mapping
_instances = {}  # hash(value) => instance mapping

class Value(object):
    def __new__(cls, value):
        try:
            return _instances[hash(value)]
        except KeyError:
            instance = object.__new__(cls)
            _values[id(instance)] = value
            _instances[hash(value)] = instance
            return instance

    @apply
    def value():
        def fget(self):
            return _values[id(self)]
        def fset(self, ignore):
            raise AttributeError("%s.value is read only" % type(self))
        def fdel(self):
            raise AttributeError("%s.value is read only" % type(self))
        return property(**locals())


HTH
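
A variation on the class above, offered only as a sketch in Python 3. It is not the poster's code and makes two assumptions of its own: instances are looked up by the value itself rather than by `hash(value)`, so two different values whose hashes happen to collide are not merged, and the registry is a `weakref.WeakValueDictionary`, so an instance disappears from it once nothing else references it.

```python
import weakref

class Value:
    """Immutable wrapper; equal (hashable) values share a single instance."""
    __slots__ = ("_value", "__weakref__")

    # value -> live instance; entries vanish when the instance is collected.
    _instances = weakref.WeakValueDictionary()

    def __new__(cls, value):
        instance = cls._instances.get(value)
        if instance is None:
            instance = super().__new__(cls)
            # Bypass our own read-only __setattr__ for the one-time setup.
            object.__setattr__(instance, "_value", value)
            cls._instances[value] = instance
        return instance

    @property
    def value(self):
        return self._value

    def __setattr__(self, name, val):
        raise AttributeError("%s is read only" % type(self).__name__)

a = Value((1, 2, 3))
b = Value((1, 2, 3))
```

Here `a is b` holds because the second call finds the instance registered by the first.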


samwyse at gmail

Nov 24, 2007, 9:23 AM

Post #11 of 33 (236 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

On Nov 24, 10:35 am, Licheng Fang <fanglich...@gmail.com> wrote:
> Thanks. Then, is there a way to make python treat all strings this
> way, or any other kind of immutable objects?

The word generally used is 'atom' when referring to strings that are
set up such that 'a == b' implies 'a is b'. This is usually an
expensive process, since you don't want to do it to strings that are,
e.g., read from a file. Yes, it could be done only for string
constants, and some languages (starting with LISP) do this, but that
isn't what you (or most people) want. Whether you realize it or not,
you want control over the process; in your example, you don't want to
do it for the lines read from your file, just the trigrams.

The example that I gave does exactly that. It adds a fixed amount of
storage for each string that you 'intern' (the usual name given to the
process of generating such a string). Let's look at my code again:

>>> store = {}
>>> def atom(str):
        global store
        if str not in store:
            store[str] = str
        return store[str]

Each string passed to 'atom' already exists. We look to see if a copy
already exists in 'store'; if so, we can discard the latest instance and
use that copy henceforth. If a copy does not exist, we save the string
inside 'store'. Since the string already exists, we're just increasing
its reference count by two (so it won't be garbage collected) and
increasing the size of 'store' by (an amortized) pair of pointers to
that same string.


samwyse at gmail

Nov 24, 2007, 9:38 AM

Post #12 of 33 (236 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

On Nov 24, 5:44 am, Licheng Fang <fanglich...@gmail.com> wrote:
> Yes, millions. In my natural language processing tasks, I almost
> always need to define patterns, identify their occurrences in a huge
> data, and count them. Say, I have a big text file, consisting of
> millions of words, and I want to count the frequency of trigrams:
>
> trigrams([1,2,3,4,5]) == [(1,2,3),(2,3,4),(3,4,5)]

BTW, if the components of your trigrams are never larger than a byte,
then encode the tuples as integers and don't worry about pointer
comparisons.

>>> def encode(s):
        return (ord(s[0])*256+ord(s[1]))*256+ord(s[2])

>>> def trigram(s):
        return [ encode(s[i:i+3]) for i in range(0, len(s)-2)]

>>> trigram('abcde')
[6382179, 6447972, 6513765]
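
The packing above can be written with an explicit range check and an inverse, as in the Python 3 sketch below. The `decode` helper and the `ValueError` are additions for illustration and are not part of the original post.

```python
def encode(s):
    """Pack three characters, each with code point < 256, into one int."""
    a, b, c = (ord(ch) for ch in s)
    if max(a, b, c) > 255:
        raise ValueError("each character must fit in one byte")
    return (a * 256 + b) * 256 + c

def decode(n):
    """Invert encode(): unpack the three bytes back into a string."""
    return chr(n >> 16) + chr((n >> 8) & 0xFF) + chr(n & 0xFF)

def trigram(s):
    """Encode every run of three consecutive characters of s."""
    return [encode(s[i:i + 3]) for i in range(len(s) - 2)]
```

For example, `trigram('abcde')` gives `[6382179, 6447972, 6513765]` as in the session above, and `decode(6382179)` gives `'abc'`.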


steve at REMOVE-THIS-cybersource

Nov 24, 2007, 1:59 PM

Post #13 of 33 (235 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

On Sat, 24 Nov 2007 03:44:59 -0800, Licheng Fang wrote:

> On Nov 24, 7:05 pm, Bjoern Schliessmann <usenet-
> mail-0306.20.chr0n...@spamgourmet.com> wrote:
>> Licheng Fang wrote:
>> > I find myself frequently in need of classes like this for two
>> > reasons. First, it's efficient in memory.
>>
>> Are you using millions of objects, or MB size objects? Otherwise, this
>> is no argument.
>
> Yes, millions.


Oh noes!!! Not millions of words!!!! That's like, oh, a few tens of
megabytes!!!!1! How will a PC with one or two gigabytes of RAM cope?????

Tens of megabytes is not a lot of data.

If the average word size is ten characters, then one million words takes
ten million bytes, or a little shy of ten megabytes. Even if you are
using four-byte characters, you've got 40 MB, still a moderate amount of
data on a modern system.
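
The estimate above counts character data only. A rough way to see what one object actually occupies is `sys.getsizeof`, sketched here in Python 3. The sample word is an assumption, and the sizes reported depend on the interpreter, version and platform, so no figures are quoted.

```python
import sys

word = "processing"           # a ten-character word
trigram = (word, word, word)  # a tuple stores references, not copies

payload = len(word)                  # bytes of character data for ASCII text
str_size = sys.getsizeof(word)       # whole string object, header included
tuple_size = sys.getsizeof(trigram)  # the tuple alone, items not included
```

Comparing `str_size` with `payload` shows the fixed per-object overhead, which adds up when millions of small keys are kept alive.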


> In my natural language processing tasks, I almost always
> need to define patterns, identify their occurrences in a huge data, and
> count them. Say, I have a big text file, consisting of millions of
> words, and I want to count the frequency of trigrams:
>
> trigrams([1,2,3,4,5]) == [(1,2,3),(2,3,4),(3,4,5)]
>
> I can save the counts in a dict D1. Later, I may want to recount the
> trigrams, with some minor modifications, say, doing it on every other
> line of the input file, and the counts are saved in dict D2. Problem is,
> D1 and D2 have almost the same set of keys (trigrams of the text), yet
> the keys in D2 are new instances, even though these keys probably have
> already been inserted into D1. So I end up with unnecessary duplicates
> of keys. And this can be a great waste of memory with huge input data.

All these keys will almost certainly add up to only a few hundred
megabytes, which is a reasonable size of data but not excessive. This
really sounds to me like a case of premature optimization. I think you
are wasting your time solving a non-problem.



[snip]
> Wow, I didn't know this. But exactly how Python manage these strings? My
> interpretator gave me such results:
>
>>>> a = 'this'
>>>> b = 'this'
>>>> a is b
> True
>>>> a = 'this is confusing'
>>>> b = 'this is confusing'
>>>> a is b
> False


It's an implementation detail. You shouldn't use identity testing unless
you actually care that two names refer to the same object, not because
you want to save a few bytes. That's poor design: it's fragile,
complicated, and defeats the purpose of using a high-level language like
Python.




--
Steven.


steve at REMOVE-THIS-cybersource

Nov 24, 2007, 2:17 PM

Post #14 of 33 (235 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

On Sat, 24 Nov 2007 04:54:43 -0800, samwyse wrote:

> create a hash that maps your keys to themselves, then use the values of
> that hash as your keys.
>
>>>> store = {}
>>>> def atom(str):
>     global store
>     if str not in store:
>         store[str] = str
>     return store[str]

Oh lordy, that's really made my day! That's the funniest piece of code
I've seen for a long time! Worthy of being submitted to the DailyWTF.

Samwyse, while I applaud your willingness to help, I think you actually
need to get some programming skills before doing so. Here's a hint to get
you started: can you think of a way to optimize that function so it does
less work?



--
Steven.


steve at REMOVE-THIS-cybersource

Nov 24, 2007, 2:19 PM

Post #15 of 33 (235 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

On Sat, 24 Nov 2007 12:00:25 +0100, Bjoern Schliessmann wrote:

> Licheng Fang wrote:
>> I mean, all the class instances that equal to each other should be
>> reduced into only one instance, which means for instances of this class
>> there's no difference between a is b and a==b.
>
> If you only want that if "a == b" is True also "a is b" is True,
> overload the is_ attribute of your class. Personally, I don't see any
> advantage in this.

No advantage? That's for sure. There is no is_ attribute of generic
classes, and even if there was, it would have no special meaning.

Identity testing can't be overloaded. If it could, it would no longer be
identity testing.
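
A short Python 3 sketch of the distinction: equality can be customized through `__eq__`, while `is` (and `operator.is_`, which is a plain function in the `operator` module) always compares object identity. The `Word` class is a made-up example.

```python
import operator

class Word:
    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Equality is ours to define.
        return isinstance(other, Word) and self.text == other.text

    def __hash__(self):
        return hash(self.text)

a = Word("spam")
b = Word("spam")

equal = (a == b)                 # True: uses Word.__eq__
identical = operator.is_(a, b)   # False: two separate objects
```

No method on `Word` can change the second result. To get `a is b`, the class has to hand out the same object, which is what the caching approaches in this thread do.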


--
Steven.


george.sakkis at gmail

Nov 24, 2007, 2:58 PM

Post #16 of 33 (227 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

On Nov 24, 4:59 pm, Steven D'Aprano

<st...@REMOVE-THIS-cybersource.com.au> wrote:
> On Sat, 24 Nov 2007 03:44:59 -0800, Licheng Fang wrote:
> > On Nov 24, 7:05 pm, Bjoern Schliessmann <usenet-
> > mail-0306.20.chr0n...@spamgourmet.com> wrote:
> >> Licheng Fang wrote:
> >> > I find myself frequently in need of classes like this for two
> >> > reasons. First, it's efficient in memory.
>
> >> Are you using millions of objects, or MB size objects? Otherwise, this
> >> is no argument.
>
> > Yes, millions.
>
> Oh noes!!! Not millions of words!!!! That's like, oh, a few tens of
> megabytes!!!!1! How will a PC with one or two gigabytes of RAM cope?????
>

Comments like these make one wonder if your real life experience with
massive data matches even one tenth of your self-importance and
need to be snarky in most of your posts.

To the OP: yes, your use case is quite valid; the keyword you are
looking for is "memoize". You can find around a dozen recipes in
the Cookbook and posted on this list; here's one starting point:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/413717.

HTH,
George


usenet-mail-0306.20.chr0n0ss at spamgourmet

Nov 24, 2007, 3:02 PM

Post #17 of 33 (227 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

Steven D'Aprano wrote:

> No advantage? That's for sure. There is no is_ attribute of
> generic classes, and even if there was, it would have no special
> meaning.

Argl, I confused the operator module's attributes with objects ;)

Regards,


Björn

--
BOFH excuse #378:

Operators killed by year 2000 bug bite.



hniksic at xemacs

Nov 24, 2007, 4:38 PM

Post #18 of 33 (227 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

samwyse <samwyse[at]gmail.com> writes:

> create a hash that maps your keys to themselves, then use the values
> of that hash as your keys.

The "atom" function you describe already exists under the name
"intern".


steve at REMOVE-THIS-cybersource

Nov 24, 2007, 4:42 PM

Post #19 of 33 (227 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

On Sat, 24 Nov 2007 14:58:50 -0800, George Sakkis wrote:

> On Nov 24, 4:59 pm, Steven D'Aprano
>
> <st...@REMOVE-THIS-cybersource.com.au> wrote:
>> On Sat, 24 Nov 2007 03:44:59 -0800, Licheng Fang wrote:
>> > On Nov 24, 7:05 pm, Bjoern Schliessmann <usenet-
>> > mail-0306.20.chr0n...@spamgourmet.com> wrote:
>> >> Licheng Fang wrote:
>> >> > I find myself frequently in need of classes like this for two
>> >> > reasons. First, it's efficient in memory.
>>
>> >> Are you using millions of objects, or MB size objects? Otherwise,
>> >> this is no argument.
>>
>> > Yes, millions.
>>
>> Oh noes!!! Not millions of words!!!! That's like, oh, a few tens of
>> megabytes!!!!1! How will a PC with one or two gigabytes of RAM
>> cope?????
>>
>>
> Comments like these make one wonder if your real life experience with
> massive data matches even the one tenth of your self-importance and need
> to be snarky in most of your posts.

I cheerfully admit to never needing to deal with "massive data".

However, I have often needed to deal with tens and hundreds of megabytes
of data, which IS NOT MASSIVE amounts of data to deal with on modern
systems. Which was my point.


> To the OP: yes, your use case is quite valid; the keyword you are
> looking for is "memoize". You can find around a dozen of recipes in the
> Cookbook and posted in this list; here's one starting point:
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/413717.

This has nothing, absolutely NOTHING, to do with memoization. Memoization
trades off memory for time, allowing slow functions to return results
faster at the cost of using more memory. The OP wants to save memory, not
use more of it.



--
Steven.


steve at REMOVE-THIS-cybersource

Nov 24, 2007, 5:12 PM

Post #20 of 33 (227 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

On Sun, 25 Nov 2007 01:38:51 +0100, Hrvoje Niksic wrote:

> samwyse <samwyse[at]gmail.com> writes:
>
>> create a hash that maps your keys to themselves, then use the values of
>> that hash as your keys.
>
> The "atom" function you describe already exists under the name "intern".

Not really. intern() works very differently, because it can tie itself to
the Python internals. Samwyse's atom() function doesn't, and so has no
purpose.


In any case, I'm not sure that intern() actually will solve the OP's
problem, even assuming it is a real and not imaginary problem. According
to the docs, intern()'s purpose is to speed up dictionary lookups, not to
save memory. I suspect that if it does save memory, it will be by
accident.

From the docs:
http://docs.python.org/lib/non-essential-built-in-funcs.html

intern( string)
Enter string in the table of ``interned'' strings and return the interned
string - which is string itself or a copy. Interning strings is useful to
gain a little performance on dictionary lookup - if the keys in a
dictionary are interned, and the lookup key is interned, the key
comparisons (after hashing) can be done by a pointer compare instead of a
string compare. Normally, the names used in Python programs are
automatically interned, and the dictionaries used to hold module, class
or instance attributes have interned keys. Changed in version 2.3:
Interned strings are not immortal (like they used to be in Python 2.2 and
before); you must keep a reference to the return value of intern() around
to benefit from it.


Note the words "which is string itself or a copy". It would be ironic if
the OP uses intern to avoid having copies of strings, and ends up with
even more copies than if he didn't bother.

I guess he'll actually need to measure his memory consumption and see
whether he actually has a memory problem or not, right?


--
Steven.


hniksic at xemacs

Nov 24, 2007, 5:48 PM

Post #21 of 33 (228 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

Steven D'Aprano <steve[at]REMOVE-THIS-cybersource.com.au> writes:

> On Sun, 25 Nov 2007 01:38:51 +0100, Hrvoje Niksic wrote:
>
>> samwyse <samwyse[at]gmail.com> writes:
>>
>>> create a hash that maps your keys to themselves, then use the values of
>>> that hash as your keys.
>>
>> The "atom" function you describe already exists under the name "intern".
>
> Not really. intern() works very differently, because it can tie itself to
> the Python internals.

The exact implementation mechanism is subtly different, but
functionally intern is equivalent to the "atom" function.

> In any case, I'm not sure that intern() actually will solve the OP's
> problem, even assuming it is a real and not imaginary
> problem. According to the docs, intern()'s purpose is to speed up
> dictionary lookups, not to save memory. I suspect that if it does
> save memory, it will be by accident.

It's not by accident, it follows from what interning does. Interning
speeds up comparisons by returning the same string object for the same
string contents. If the strings you're working with tend to repeat,
interning will save some memory simply by preventing storage of
multiple copies of the same string. Whether the savings would make
any difference for the OP is another question.

> From the docs:
> http://docs.python.org/lib/non-essential-built-in-funcs.html
>
> intern( string)
> Enter string in the table of ``interned'' strings and return the interned
> string - which is string itself or a copy. [...]
>
> Note the words "which is string itself or a copy". It would be ironic if
> the OP uses intern to avoid having copies of strings, and ends up with
> even more copies than if he didn't bother.

That's a frequently misunderstood sentence. It doesn't mean that
intern will make copies; it simply means that the string you get back
from intern can be either the string you passed it or another
(previously interned) string object that is guaranteed to have the
same contents as your string (which makes it technically a "copy" of
the string you passed to intern).
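
The behaviour described here can be checked with a few lines. The sketch uses Python 3, where the built-in `intern` discussed in this thread lives at `sys.intern`. The strings are assembled at run time so that the compiler cannot share a single constant between them.

```python
import sys

a = " ".join(["this", "is", "confusing"])
b = " ".join(["this", "is", "confusing"])

ia = sys.intern(a)   # a itself, or an equal string interned earlier
ib = sys.intern(b)   # the same object as ia, whichever one that was
```

`ia is ib` holds, and both compare equal to `a` and `b`. No extra copy is made: the call returns either its argument or the equal string that was interned before it.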


george.sakkis at gmail

Nov 24, 2007, 9:55 PM

Post #22 of 33 (227 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

On Nov 24, 7:42 pm, Steven D'Aprano

> > To the OP: yes, your use case is quite valid; the keyword you are
> > looking for is "memoize". You can find around a dozen of recipes in the
> > Cookbook and posted in this list; here's one starting point:
> >http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/413717.
>
> This has nothing, absolutely NOTHING, to do with memoization. Memoization
> trades off memory for time, allowing slow functions to return results
> faster at the cost of using more memory. The OP wants to save memory, not
> use more of it.

If you bothered to click on that link you would learn that memoization
can be used to save space too and matches OP's case exactly; even the
identity tests work. Self-importance is bad enough by itself, even
without the ignorance, but you seem to do great in both.
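
As an illustration of the point that memoizing a constructor makes equal arguments yield one shared object, here is a Python 3 sketch. It is not the code from the linked recipe. `Pattern` and `pattern` are hypothetical names, and `functools.lru_cache` stands in for a hand-written memoize decorator.

```python
from functools import lru_cache

class Pattern:
    def __init__(self, words):
        self.words = words

@lru_cache(maxsize=None)
def pattern(words):
    """Memoized factory: equal (hashable) arguments give the same object."""
    return Pattern(words)

p = pattern(("a", "b", "c"))
q = pattern(tuple(["a", "b", "c"]))   # equal argument, built separately
```

`p is q` is true here. One trade-off of this particular sketch is that the cache keeps every `Pattern` alive for the life of the program, so it only saves memory when equal patterns recur often.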

George


__peter__ at web

Nov 25, 2007, 1:39 AM

Post #23 of 33 (220 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

Steven D'Aprano wrote:

>>>>> store = {}
>>>>> def atom(str):
>>     global store
>>     if str not in store:
>>         store[str] = str
>>     return store[str]
>
> Oh lordy, that's really made my day! That's the funniest piece of code
> I've seen for a long time! Worthy of being submitted to the DailyWTF.

Here's a script to show atom()'s effect on memory footprint:

$ cat atom.py
import sys
data = [1]*1000
items = []
cache = {}
if "-a" in sys.argv:
    def atom(obj):
        try:
            return cache[obj]
        except KeyError:
            cache[obj] = obj
            return obj
else:
    def atom(obj):
        return obj
try:
    while 1:
        items.append(atom(tuple(data)))
except MemoryError:
    print len(items)
$ ulimit -v 5000
$ python atom.py
226
$ python atom.py -a
185742

So if you are going to submit Sam's function make sure to bundle it with
this little demo...

Peter
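
A variant of the demo above that does not rely on `ulimit`, sketched in Python 3 with the standard `tracemalloc` module. The `measure` helper and the sizes chosen are assumptions for illustration; the exact byte counts depend on the interpreter.

```python
import tracemalloc

def atom(obj, cache):
    """Return the canonical copy of obj, storing obj if it is the first."""
    return cache.setdefault(obj, obj)

def measure(use_atom, n=200, width=1000):
    """Keep n equal tuples alive and return (bytes allocated, the items)."""
    data = [1] * width
    cache = {}
    tracemalloc.start()
    items = []
    for _ in range(n):
        t = tuple(data)   # a fresh, equal tuple on every pass
        items.append(atom(t, cache) if use_atom else t)
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current, items

plain, _ = measure(False)      # n separate tuples stay alive
shared, items = measure(True)  # one tuple, referenced n times
```

With `use_atom` enabled every element of `items` is the same object, and `shared` comes out far smaller than `plain`.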


fanglicheng at gmail

Nov 25, 2007, 2:42 AM

Post #24 of 33 (219 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

On Nov 25, 5:59 am, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:
> On Sat, 24 Nov 2007 03:44:59 -0800, Licheng Fang wrote:
> > On Nov 24, 7:05 pm, Bjoern Schliessmann <usenet-
> > mail-0306.20.chr0n...@spamgourmet.com> wrote:
> >> Licheng Fang wrote:
> >> > I find myself frequently in need of classes like this for two
> >> > reasons. First, it's efficient in memory.
>
> >> Are you using millions of objects, or MB size objects? Otherwise, this
> >> is no argument.
>
> > Yes, millions.
>
> Oh noes!!! Not millions of words!!!! That's like, oh, a few tens of
> megabytes!!!!1! How will a PC with one or two gigabytes of RAM cope?????
>
> Tens of megabytes is not a lot of data.
>
> If the average word size is ten characters, then one million words takes
> ten million bytes, or a little shy of ten megabytes. Even if you are
> using four-byte characters, you've got 40 MB, still a moderate amount of
> data on a modern system.

I mentioned trigram counting as an illustrative case. In fact, you'll
often need to define patterns more complex than that, and tens of
megabytes of text may generate millions of them. I've observed them
eating up the 8 GB of memory on a workstation within a few minutes.
Manipulating these patterns can be tricky, and you can easily waste a
lot of memory if you don't take extra care. I just thought that if I
defined my pattern class with this 'atom' property, coding would be
easier later.

>
> > In my natural language processing tasks, I almost always
> > need to define patterns, identify their occurrences in a huge data, and
> > count them. Say, I have a big text file, consisting of millions of
> > words, and I want to count the frequency of trigrams:
>
> > trigrams([1,2,3,4,5]) == [(1,2,3),(2,3,4),(3,4,5)]
>
> > I can save the counts in a dict D1. Later, I may want to recount the
> > trigrams, with some minor modifications, say, doing it on every other
> > line of the input file, and the counts are saved in dict D2. Problem is,
> > D1 and D2 have almost the same set of keys (trigrams of the text), yet
> > the keys in D2 are new instances, even though these keys probably have
> > already been inserted into D1. So I end up with unnecessary duplicates
> > of keys. And this can be a great waste of memory with huge input data.
>
> All these keys will almost certainly add up to only a few hundred
> megabytes, which is a reasonable size of data but not excessive. This
> really sounds to me like a case of premature optimization. I think you
> are wasting your time solving a non-problem.
>
> [snip]
>
> > Wow, I didn't know this. But exactly how Python manage these strings? My
> > interpretator gave me such results:
>
> >>>> a = 'this'
> >>>> b = 'this'
> >>>> a is b
> > True
> >>>> a = 'this is confusing'
> >>>> b = 'this is confusing'
> >>>> a is b
> > False
>
> It's an implementation detail. You shouldn't use identity testing unless
> you actually care that two names refer to the same object, not because
> you want to save a few bytes. That's poor design: it's fragile,
> complicated, and defeats the purpose of using a high-level language like
> Python.
>
> --
> Steven.



steve at REMOVE-THIS-cybersource

Nov 25, 2007, 2:53 AM

Post #25 of 33 (219 views)
Permalink
Re: How can I create customized classes that have similar properties as 'str'? [In reply to]

On Sun, 25 Nov 2007 10:39:38 +0100, Peter Otten wrote:

> So if you are going to submit Sam's function make sure to bundle it with
> this little demo...

Well Peter, I was going to reply with a comment about not changing the
problem domain (tuples of ints to trigrams from a text file for natural
language processing, that is, three character alphanumeric strings), and
that if you re-did your test with strings (as I did) you would see
absolutely no difference. What I was going to say was "Tuples aren't
interned. Short strings that look like identifiers are. Jumping through
hoops to cache things which are already cached is not productive
programming."

But then I dug a little deeper, and disassembled the code I was running,
and discovered that I was being fooled by the Python compiler's constant-
folding, and if I took steps to defeat the optimizer, the effect I was
seeing disappeared, and I got the same results as you.

Well. So I've learned something new: Python doesn't intern strings in the
way I thought it did. I don't quite know *how* it decides which strings
to intern and which ones not to, but at least I've learnt that what I
thought was true is not true.
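
One way an effect like this can arise is sketched below in Python 3. It is an illustration only, not a reconstruction of the test described above. Equal literals compiled together may end up as one shared constant, strings built at run time usually do not, and explicit interning gives a single object either way. The first two results are implementation details and may differ between interpreters; `same_object` is a hypothetical helper.

```python
def same_object(source):
    """Run source, which must bind names a and b, and report `a is b`."""
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["a"] is namespace["b"]

# One compilation unit: the compiler may store a single shared constant.
literals = same_object("a = 'this is confusing'\n"
                       "b = 'this is confusing'")

# Built at run time: there is no constant for the compiler to share.
runtime = same_object("a = ' '.join(['this', 'is', 'confusing'])\n"
                      "b = ' '.join(['this', 'is', 'confusing'])")

# Explicit interning yields one object however the strings were made.
interned = same_object(
    "import sys\n"
    "a = sys.intern(' '.join(['this', 'is', 'confusing']))\n"
    "b = sys.intern(' '.join(['this', 'is', 'confusing']))")
```

Only `interned` is guaranteed to be true, which is why a test that compares literals can give a misleading picture of what happens to data read from a file.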

So I offer my apology to Samwyse, the caching code isn't as redundant and
silly as it appears, and humbly tuck into this nice steaming plate of
crow.

Somebody pass the salt please.



--
Steven.
