It is currently Thu Sep 09, 2010 5:43 pm

Registration
Login Help

Advertise Here

Set.insert vs array << x unless array.include?(x)

FAQMembersRegisterLogin

Set.insert vs array << x unless array.include?(x)

Postby thomthom on Fri Nov 20, 2009 1:04 pm

I was curious of the difference between Set.insert vs array << x unless array.include?(x) when you want a collection with a unique set of values.

So I did some tests:

Test 1
A 10,000,000 times I tried to insert a number between 0-10.

t=Time.now;a=[];10000000.times{r=rand(10);a<<r unless a.include?(r)};puts Time.now - t
Result: 12.297

t=Time.now;a=Set.new;10000000.times{a.insert(rand(10))};puts Time.now - t
Result: 15.719

The array was faster here by three seconds. But that's with ten million iterations!


Test 2
But, when your collection of unique values increases:
t=Time.now;a=[];100000.times{r=rand(10000);a<<r unless a.include?(r)};puts Time.now - t
Result: 45.129

t=Time.now;a=Set.new;100000.times{a.insert(rand(10000))};puts Time.now - t
Result: 0.391

When your collection include a much higher number of elements - then the Array quickly starts to crumble. In this case, when the possible unique set was up to 100,000 the difference was 44.738 seconds!!!

I tried this in some of my scrips which on occasions require a unique set of entites (like collecting all vertices) and the speed difference was immense on larger models.

It doesn't take too many number of unique elements before Set becomes faster then array.include?.


Scaling
Just to illustrate how they scale:
Array.include?
t=Time.now;a=[];1000.times{r=rand(100);a<<r unless a.include?(r)};puts Time.now - t
0.0
t=Time.now;a=[];10000.times{r=rand(1000);a<<r unless a.include?(r)};puts Time.now - t
0.485
t=Time.now;a=[];100000.times{r=rand(10000);a<<r unless a.include?(r)};puts Time.now - t
42.906
t=Time.now;a=[];1000000.times{r=rand(100000);a<<r unless a.include?(r)};puts Time.now - t
Took too long. I guess it'd take ~4000.0s


Set.insert
t=Time.now;a=Set.new;1000.times{a.insert(rand(100))};puts Time.now - t
0.0
t=Time.now;a=Set.new;10000.times{a.insert(rand(1000))};puts Time.now - t
0.047
t=Time.now;a=Set.new;100000.times{a.insert(rand(10000))};puts Time.now - t
0.422
t=Time.now;a=Set.new;1000000.times{a.insert(rand(100000))};puts Time.now - t
4.937


Conclusion (not!)
Use the Set class when you require a unique set of values - unless your set of unique values is very small.
Set class is the rule - Array.include? is the exception.


Update
None of the above is the fastest. A better way has appeared. Collect everything into an array and apply .unique! at the end.
viewtopic.php?f=180&t=23760&p=222231#p222221
Thomas Thomassen — SketchUp Monkey
My Plugins
User avatar
thomthom
Global Moderator
 
Posts: 8578
Joined: Tue Nov 13, 2007 12:47 pm
Location: Trondheim, Norway
Name: Thomas Thomassen
Operating system: Windows7/Vista/XP/OSX10.4/10.5
SketchUp version: 6.4&7.1&8 Pro

Re: Set.insert vs array << x unless array.include?(x)

Postby thomthom on Fri Nov 20, 2009 1:10 pm

Also, Set.include? is faster than Array.include?
Thomas Thomassen — SketchUp Monkey
My Plugins
User avatar
thomthom
Global Moderator
 
Posts: 8578
Joined: Tue Nov 13, 2007 12:47 pm
Location: Trondheim, Norway
Name: Thomas Thomassen
Operating system: Windows7/Vista/XP/OSX10.4/10.5
SketchUp version: 6.4&7.1&8 Pro

Re: Set.insert vs array << x unless array.include?(x)

Postby Fredo6 on Fri Nov 20, 2009 7:00 pm

Tom,

very useful actually.
It really looks that the processsing in C is a lot faster than in Ruby.

Thanks

Fredo
User avatar
Fredo6
 
Posts: 955
Joined: Mon Nov 12, 2007 9:07 pm

Re: Set.insert vs array << x unless array.include?(x)

Postby thomthom on Fri Nov 20, 2009 7:07 pm

Fredo6 wrote:Tom,

very useful actually.
It really looks that the processsing in C is a lot faster than in Ruby.

Thanks

Fredo

Set class makes more processing in C?
What I'm wondering is, the Set class in SU Ruby is not the same Set class you get in Ruby 1.8.0. The Set class in Ruby 1.8.0 includes Enumerable and the set.rb looks to be a pure ruby class that mixes arrays and hashes. From my Ruby 1.8.0. installation, the Set class seem to be a pure Ruby implementation. But maybe the Set class that comes with SU is a C implementation...? Maybe that's why they replaced it?
Thomas Thomassen — SketchUp Monkey
My Plugins
User avatar
thomthom
Global Moderator
 
Posts: 8578
Joined: Tue Nov 13, 2007 12:47 pm
Location: Trondheim, Norway
Name: Thomas Thomassen
Operating system: Windows7/Vista/XP/OSX10.4/10.5
SketchUp version: 6.4&7.1&8 Pro

Re: Set.insert vs array << x unless array.include?(x)

Postby thomthom on Fri Nov 20, 2009 7:09 pm

I thought the speed difference was that the Set class used a more efficient Hash lookup as oppose to the Array functions that has to iterate the array every time... but that's just guesswork...
Thomas Thomassen — SketchUp Monkey
My Plugins
User avatar
thomthom
Global Moderator
 
Posts: 8578
Joined: Tue Nov 13, 2007 12:47 pm
Location: Trondheim, Norway
Name: Thomas Thomassen
Operating system: Windows7/Vista/XP/OSX10.4/10.5
SketchUp version: 6.4&7.1&8 Pro

Re: Set.insert vs array << x unless array.include?(x)

Postby Fredo6 on Fri Nov 20, 2009 9:08 pm

thomthom wrote:I thought the speed difference was that the Set class used a more efficient Hash lookup as oppose to the Array functions that has to iterate the array every time... but that's just guesswork...


You're right.
I personally use Hash whenever I want to store lists that have unique elements.
The list itself is obtained via Hash.values.

Maybe, as you are on this, you extend your benchmark to Hash and see how it compares with Set.

Fred
User avatar
Fredo6
 
Posts: 955
Joined: Mon Nov 12, 2007 9:07 pm

Re: Set.insert vs array << x unless array.include?(x)

Postby Chris Fullmer on Fri Nov 20, 2009 10:10 pm

Great info Thom! I had played around with sets and arrays before and I was not impressed with sets. But I did not experiment with amount of unique objects, which apparently has an adverse effect on arrays. Thanks,

Chris
Your intelligence will never fully compensate for my stupidity.
Cal Poly Pomona - BS Landscape Architecture
UC Berkeley - Masters of Landscape Architecture
All my Plugins I've written
chrisfullmer.com - Art Blog
User avatar
Chris Fullmer
Global Moderator
 
Posts: 4799
Joined: Wed Nov 21, 2007 3:21 am
Location: Davis, CA
Name: Chris Fullmer
Operating system: Windows Vista
SketchUp version: 7.1x

Re: Set.insert vs array << x unless array.include?(x)

Postby thomthom on Sat Nov 21, 2009 3:09 am

[quote="Fredo6"
Maybe, as you are on this, you extend your benchmark to Hash and see how it compares with Set.[/quote]
Will have a look at that.
Thomas Thomassen — SketchUp Monkey
My Plugins
User avatar
thomthom
Global Moderator
 
Posts: 8578
Joined: Tue Nov 13, 2007 12:47 pm
Location: Trondheim, Norway
Name: Thomas Thomassen
Operating system: Windows7/Vista/XP/OSX10.4/10.5
SketchUp version: 6.4&7.1&8 Pro

Re: Set.insert vs array << x unless array.include?(x)

Postby Nazz78 on Fri Feb 05, 2010 1:17 pm

How about using Array.uniq! method:

Test 1
t=Time.now;a=[];10000000.times{r=rand(10);a<<r unless a.include?(r)};puts Time.now - t
Result: 12.297
t=Time.now;a=Set.new;10000000.times{a.insert(rand(10))};puts Time.now - t
Result: 15.719
t=Time.now;a=[];10000000.times{r=rand(10);a<<r};a.uniq!; puts Time.now - t
Result: 7.753

Test 2
t=Time.now;a=[];100000.times{r=rand(10000);a<<r unless a.include?(r)};puts Time.now-t
Result: 40.97
t=Time.now;a=Set.new;100000.times{a.insert(rand(10000))};puts Time.now-t
Result: 0.377
t=Time.now;a=[];100000.times{r=rand(10000);a<<r};a.uniq!;puts Time.now-t
Result: 0.087
User avatar
Nazz78
 
Posts: 31
Joined: Thu Dec 20, 2007 6:08 pm
Name: Jernej Vidmar
Operating system: Vista x64
SketchUp version: 7

Re: Set.insert vs array << x unless array.include?(x)

Postby thomthom on Fri Feb 05, 2010 1:22 pm

I'll be damned!
Very interesting Jernej.

...looks like I need to do some more testing of my script and possibly refactor again.

So while the Array.include? is dead slow - the overhead of hash look-up is still faster than just adding everything into one big pile and so a single filtering afterwards...
Thomas Thomassen — SketchUp Monkey
My Plugins
User avatar
thomthom
Global Moderator
 
Posts: 8578
Joined: Tue Nov 13, 2007 12:47 pm
Location: Trondheim, Norway
Name: Thomas Thomassen
Operating system: Windows7/Vista/XP/OSX10.4/10.5
SketchUp version: 6.4&7.1&8 Pro

Re: Set.insert vs array << x unless array.include?(x)

Postby thomthom on Fri Feb 05, 2010 1:24 pm

looking at the .uniq! source code: http://ruby-doc.org/core/classes/Array.src/M002215.html

Code: Select all
/*
*  call-seq:
*     array.uniq! -> array or nil

*  Removes duplicate elements from _self_.
*  Returns <code>nil</code> if no changes are made (that is, no
*  duplicates are found).
*     
*     a = [ "a", "a", "b", "b", "c" ]
*     a.uniq!   #=> ["a", "b", "c"]
*     b = [ "a", "b", "c" ]
*     b.uniq!   #=> nil
*/

static VALUE
rb_ary_uniq_bang(ary)
    VALUE ary;
{
    VALUE hash, v, vv;
    long i, j;

    hash = ary_make_hash(ary, 0);

    if (RARRAY(ary)->len == RHASH(hash)->tbl->num_entries) {
        return Qnil;
    }
    for (i=j=0; i<RARRAY(ary)->len; i++) {
        v = vv = rb_ary_elt(ary, i);
        if (st_delete(RHASH(hash)->tbl, (st_data_t*)&vv, 0)) {
            rb_ary_store(ary, j++, v);
        }
    }
    RARRAY(ary)->len = j;

    return ary;
}
Thomas Thomassen — SketchUp Monkey
My Plugins
User avatar
thomthom
Global Moderator
 
Posts: 8578
Joined: Tue Nov 13, 2007 12:47 pm
Location: Trondheim, Norway
Name: Thomas Thomassen
Operating system: Windows7/Vista/XP/OSX10.4/10.5
SketchUp version: 6.4&7.1&8 Pro

Re: Set.insert vs array << x unless array.include?(x)

Postby thomthom on Fri Feb 05, 2010 1:26 pm

Jernej: how about larger iterations and higher number of random values?
Thomas Thomassen — SketchUp Monkey
My Plugins
User avatar
thomthom
Global Moderator
 
Posts: 8578
Joined: Tue Nov 13, 2007 12:47 pm
Location: Trondheim, Norway
Name: Thomas Thomassen
Operating system: Windows7/Vista/XP/OSX10.4/10.5
SketchUp version: 6.4&7.1&8 Pro

Re: Set.insert vs array << x unless array.include?(x)

Postby Nazz78 on Fri Feb 05, 2010 1:42 pm

thomthom wrote:Jernej: how about larger iterations and higher number of random values?


t=Time.now;a=Set.new;10000000.times{a.insert(rand(10000))};puts Time.now - t
Result: 37.911
t=Time.now;a=[];10000000.times{r=rand(10000);a<<r};a.uniq!; puts Time.now - t
Result: 8.282

Still a winner?
User avatar
Nazz78
 
Posts: 31
Joined: Thu Dec 20, 2007 6:08 pm
Name: Jernej Vidmar
Operating system: Vista x64
SketchUp version: 7

Re: Set.insert vs array << x unless array.include?(x)

Postby thomthom on Fri Feb 05, 2010 1:49 pm

It's refactoring time!

Nice find! :thumb:
Thomas Thomassen — SketchUp Monkey
My Plugins
User avatar
thomthom
Global Moderator
 
Posts: 8578
Joined: Tue Nov 13, 2007 12:47 pm
Location: Trondheim, Norway
Name: Thomas Thomassen
Operating system: Windows7/Vista/XP/OSX10.4/10.5
SketchUp version: 6.4&7.1&8 Pro

Re: Set.insert vs array << x unless array.include?(x)

Postby RickW on Mon Feb 08, 2010 9:34 pm

That's all great (using .uniq!) until you start dealing with Point3d objects :)
In that case, always use Set.
RickW
 
Posts: 746
Joined: Fri Nov 16, 2007 6:38 am
Location: Wichita, KS
Name: Rick Wilson
Operating system: WinXP
SketchUp version: >2

SketchUcation One-Liner Adverts

by Ad Machine on 5 minutes ago

Sculpteo makes your SketchUp designs real with 3D Printing
Ad Machine
Robot
 
Posts: 2010

Next

 

Advertise Here

Who is online

Users browsing this forum: No registered users and 1 guest