mono performance, 20x differential with Java (what am i doing wrong)

Jonathan Shore
Hi,

I'm quite familiar with both the .NET and Java development environments, but only recently have begun to experiment with mono, so forgive me if I'm not clued-in.   

I specialize in numerical work that often involves a lot of large-scale array manipulation for linear algebra, timeseries, etc. My main production platforms are OSX and Linux. I've been doing most of my work on the JVM over the past few years, though I spent a couple of years with .NET when it was pre-release / pre-1.0.

My main interest is in OCaml, particularly the F# variant, as the basis for my numerical work.

One of the first things I do when considering a platform is run benchmarks, as performance is critical for what I do.    Starting with C# I wrote a test to gauge the array-access overhead associated with the platform.  Without knowing how to tweak the mono runtime to turn on any particular optimisations, the results were quite poor for this specific test (see code at the end of this posting).


The test on my MacPro 2.6 GHz / Snow Leopard with mono 2.6.1 gave the result of:

16 sec, 130 ms for 1000 iterations

The same code, modified only for I/O etc., on the Java VM (without -server) gave a runtime of:

 0 sec, 831 ms

Changing the number of iterations to higher amounts did nothing to improve the ratio. Java is about 20x faster in this benchmark.

I could not find any documentation concerning settings for the -optimize flag on the mono VM, so perhaps there is a setting I should be using.   

Secondly, I saw the posting concerning the optional use of LLVM. I have not been able to build mono on OSX as I am having problems building glib. I'm wondering whether anyone has a packaged-up version of glib or, better, a packaged-up version of mono with LLVM enabled.

I have heard only good things about LLVM performance, so I am hoping that this will help address this gap. Hopefully I am doing something wrong here and the real performance is much closer. Test code below ...

regards

Jonathan
--


using System;

namespace Performance
{

    public class ArrayTest
    {
        public static double test1 (double[] vec)
        {
            double sum = 0;
            for (int i = 8 ; i < vec.Length ; i++)
            {
                vec[i] = 2*vec[i] - vec[i-1];
                for (int j = 1 ; j < 8 ; j++)
                    sum += 1.3 * vec[j-1];
            }
            return sum;
        }

        public static void Main (string[] argv)
        {
            int iterations = argv.Length > 0 ? int.Parse(argv[0]) : 1000;
            double[] vec = new double[100000];
            for (int i = 0 ; i < vec.Length ; i++)
                vec[i] = i;
            DateTime Tstart = DateTime.Now;
            Console.WriteLine ("starting performance test on " + iterations + " iterations");
            double sum = 0;
            for (int i = 0 ; i < iterations ; i++)
                sum += test1 (vec);
            DateTime Tend = DateTime.Now;
            TimeSpan Tspan = Tend - Tstart;
            Console.WriteLine ("ending performance test on " + iterations + " iterations, time: " + Tspan.Seconds + ":" + Tspan.Milliseconds);

            Console.WriteLine ("result: " + sum);
        }
    }
}
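
A side note on the timing code in this test: TimeSpan.Seconds and TimeSpan.Milliseconds are only the seconds and milliseconds components of the span (0-59 and 0-999), so a run longer than a minute would be misreported, and DateTime.Now can have coarse resolution on some platforms. Below is a minimal sketch of the same harness using System.Diagnostics.Stopwatch. The class name ArrayTestTimed and the helper TimeRuns are made up for illustration; test1 is unchanged from the posting.

```csharp
using System;
using System.Diagnostics;

namespace Performance
{
    public class ArrayTestTimed
    {
        // Same kernel as test1 in the original posting.
        public static double test1 (double[] vec)
        {
            double sum = 0;
            for (int i = 8 ; i < vec.Length ; i++)
            {
                vec[i] = 2*vec[i] - vec[i-1];
                for (int j = 1 ; j < 8 ; j++)
                    sum += 1.3 * vec[j-1];
            }
            return sum;
        }

        // Hypothetical helper: runs test1 `iterations` times and reports
        // the total elapsed wall-clock time in milliseconds.
        public static double TimeRuns (double[] vec, int iterations, out long elapsedMs)
        {
            Stopwatch watch = Stopwatch.StartNew ();
            double sum = 0;
            for (int i = 0 ; i < iterations ; i++)
                sum += test1 (vec);
            watch.Stop ();
            elapsedMs = watch.ElapsedMilliseconds;
            return sum;
        }

        public static void Main (string[] argv)
        {
            int iterations = argv.Length > 0 ? int.Parse (argv[0]) : 1000;
            double[] vec = new double[100000];
            for (int i = 0 ; i < vec.Length ; i++)
                vec[i] = i;

            long elapsedMs;
            double sum = TimeRuns (vec, iterations, out elapsedMs);
            Console.WriteLine ("iterations: " + iterations + ", total ms: " + elapsedMs);
            Console.WriteLine ("result: " + sum);
        }
    }
}
```

ElapsedMilliseconds is the total elapsed time, so it does not wrap at one minute the way the Seconds component does.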





_______________________________________________
Mono-list maillist  -  [hidden email]
http://lists.ximian.com/mailman/listinfo/mono-list

Re: mono performance, 20x differential with Java (what am i doing wrong)

Diego Frata
Hello Jonathan,

I'm working on a computer that has an Intel Core 2 Duo CPU T5250 1.5 GHz (way slower than yours). I've tried your code on .NET 4 Beta 2 (shame on me, my other computer died some days ago and I didn't install Mono) and at first I got worse results than you.

My first setup was the default one for VS2010. Release x86

starting performance test on 1000 iterations
ending performance test on 1000 iterations, time: 43:733
result: 2729781599,99818

Oops, I'm running a 64 bit OS, so I've compiled my application again targeting Release x64

starting performance test on 1000 iterations
ending performance test on 1000 iterations, time: 10:813
result: 2729781599,99818

That's a lot better, but I can speed things up a little by introducing some unsafeness into the code:

        public unsafe static double test1(double* vec, int size)
        {
            double sum = 0;
            for (int i = 8; i < size; i++)
            {
                vec[i] = 2 * vec[i] - vec[i - 1];
                for (int j = 1; j < 8; j++)
                    sum += 1.3 * vec[j - 1];
            }

            return sum;
        }

        public static void Main(string[] argv)
        {
            int iterations = argv.Length > 0 ? int.Parse(argv[0]) : 1000;

            unsafe
            {
                int size = 100000;
                double* vec = stackalloc double[size];
                for (int i = 0; i < size; i++)
                    vec[i] = i;

                DateTime Tstart = DateTime.Now;
                Console.WriteLine("starting performance test on " + iterations + " iterations");

                double sum = 0;

                for (int i = 0; i < iterations; i++)
                    sum += test1(vec, size);


                DateTime Tend = DateTime.Now;
                TimeSpan Tspan = Tend - Tstart;
                Console.WriteLine("ending performance test on " + iterations + " iterations, time: " + Tspan.Seconds + ":" + Tspan.Milliseconds);

                Console.WriteLine("result: " + sum);
                Console.Read();
            }
        }


starting performance test on 1000 iterations
ending performance test on 1000 iterations, time: 5:571
result: 2729781599,99818

That's the best I could extract from a single threaded computation without changing your logic.

Try taking a look at these things; maybe Mono shows the same behavior as .NET.
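
If changing the logic were allowed, one observation about the kernel may be useful: the inner loop reads only vec[0] through vec[6], and the outer loop never writes below index 8, so the inner sum is the same for every i. Below is a sketch of a hand-hoisted variant for comparison. The names HoistedTest and test1Hoisted are introduced here for illustration, and because the additions are grouped differently the result can differ from test1 in the last few digits. Whether any particular JIT performs this transformation on its own is not something this thread establishes.

```csharp
using System;

public class HoistedTest
{
    // Original kernel from the first posting, kept for reference.
    public static double test1 (double[] vec)
    {
        double sum = 0;
        for (int i = 8 ; i < vec.Length ; i++)
        {
            vec[i] = 2*vec[i] - vec[i-1];
            for (int j = 1 ; j < 8 ; j++)
                sum += 1.3 * vec[j-1];
        }
        return sum;
    }

    // Illustrative variant: the invariant inner sum is computed once.
    public static double test1Hoisted (double[] vec)
    {
        // The original never enters its loops for such short arrays.
        if (vec.Length <= 8)
            return 0;

        // vec[0..6] is never written by the outer loop (i starts at 8).
        double inner = 0;
        for (int j = 1 ; j < 8 ; j++)
            inner += 1.3 * vec[j-1];

        double sum = 0;
        for (int i = 8 ; i < vec.Length ; i++)
        {
            vec[i] = 2*vec[i] - vec[i-1];
            sum += inner;
        }
        return sum;
    }

    public static void Main (string[] argv)
    {
        double[] a = new double[100000];
        double[] b = new double[100000];
        for (int i = 0 ; i < a.Length ; i++)
        {
            a[i] = i;
            b[i] = i;
        }
        Console.WriteLine ("original: " + test1 (a));
        Console.WriteLine ("hoisted:  " + test1Hoisted (b));
    }
}
```

Timing both variants on the same runtime would show how much of the measured time goes to the inner loop as opposed to the array update itself.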


Sorry if all this was unhelpful and off-topic ;)

Diego Frata
[hidden email]


On Fri, Jan 29, 2010 at 12:00 AM, Jonathan Shore <[hidden email]> wrote:
[...]


Re: mono performance, 20x differential with Java (what am i doing wrong)

Jonathan Shore
Diego,  Thanks for your suggestions.   I adjusted to use an unsafe declaration around test1(), but get the same performance results.    I am wondering whether there is some optimisation mode I need to enable in the mono VM.    Anyone have an idea?

I did:

mcs -optimize -unsafe *.cs
mono ArrayTest.exe 1000

Result: 
starting performance test on 1000 iterations
ending performance test on 1000 iterations, time: 16:919

On Java VM this is < 1 second.

Are there some flags I can use on the mono VM to speed this up?    Would the LLVM version do significantly better?

Jonathan
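
For what it's worth, marking a method unsafe does not by itself change how a double[] is indexed; the array bounds checks are bypassed only when the elements are accessed through a pointer. Below is a sketch that keeps the heap-allocated array and pins it with fixed, rather than using stackalloc (which places 100000 doubles, about 800 KB, on the stack). The names PinnedTest and test1Pinned are illustrative, and the file has to be compiled with -unsafe, as in the mcs line above.

```csharp
using System;

public class PinnedTest
{
    // Same arithmetic as test1, but indexing through a raw pointer.
    public static unsafe double test1Pinned (double[] arr)
    {
        double sum = 0;
        // Pin the managed array so the GC cannot move it while
        // the raw pointer is in use.
        fixed (double* vec = arr)
        {
            int size = arr.Length;
            for (int i = 8 ; i < size ; i++)
            {
                vec[i] = 2*vec[i] - vec[i-1];
                for (int j = 1 ; j < 8 ; j++)
                    sum += 1.3 * vec[j-1];
            }
        }
        return sum;
    }

    public static void Main (string[] argv)
    {
        int iterations = argv.Length > 0 ? int.Parse (argv[0]) : 1000;
        double[] vec = new double[100000];
        for (int i = 0 ; i < vec.Length ; i++)
            vec[i] = i;

        double sum = 0;
        for (int i = 0 ; i < iterations ; i++)
            sum += test1Pinned (vec);
        Console.WriteLine ("result: " + sum);
    }
}
```

If this version times the same as the plain double[] version, bounds checking is probably not where the time is going.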

On Jan 28, 2010, at 10:33 PM, Diego Frata wrote:

[...]


Re: mono performance, 20x differential with Java (what am i doing wrong)

Stifu

I could be wrong, but intensive operations like these may run faster with the
upcoming new garbage collector (coming in Mono 2.8).
I don't know if the new GC is currently stable enough for you to try it.



--
View this message in context: http://old.nabble.com/mono-performance%2C-20x-differential-with-Java-%28what-am-i-doing-wrong%29-tp27366241p27372716.html
Sent from the Mono - General mailing list archive at Nabble.com.


Re: mono performance, 20x differential with Java (what am i doing wrong)

Jonathan Shore
Stifu, I am not (or should not be) creating any garbage during the test (except the array, which is thrown away once at the end). The test simply exercises reading from and writing to an array.

Now I know that Java and C# both use guards to ensure that array access is bounded. In the case of Java, it is able to avoid the guards via range analysis. With C# and unsafe, this should not even be a factor. I am then perplexed as to what mono is doing. What operations beyond array access and basic arithmetic are being done such that it is so much slower?

Is there a command line tool I could use to see what code was emitted? Does anyone have experience with LLVM on this sort of test? (I've not been successful in compiling glib, and therefore mono, with that enabled.)
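
One thing that might be worth ruling out before looking at emitted code: the update vec[i] = 2*vec[i] - vec[i-1] is applied in place on every pass, so the values grow quickly (vec[8] alone is 7 + 2^k after k passes), and with enough passes elements can overflow to infinity or turn into NaN. How a given runtime and FPU handle such values may differ, and this thread does not establish whether that matters here. Below is a small sketch that counts non-finite entries after the runs; FiniteCheck and CountNonFinite are names made up for illustration.

```csharp
using System;

public class FiniteCheck
{
    // Same kernel as test1 in the original posting.
    public static double test1 (double[] vec)
    {
        double sum = 0;
        for (int i = 8 ; i < vec.Length ; i++)
        {
            vec[i] = 2*vec[i] - vec[i-1];
            for (int j = 1 ; j < 8 ; j++)
                sum += 1.3 * vec[j-1];
        }
        return sum;
    }

    // Hypothetical helper: number of elements that are NaN or +/- infinity.
    public static int CountNonFinite (double[] vec)
    {
        int count = 0;
        for (int i = 0 ; i < vec.Length ; i++)
            if (double.IsNaN (vec[i]) || double.IsInfinity (vec[i]))
                count++;
        return count;
    }

    public static void Main (string[] argv)
    {
        int iterations = argv.Length > 0 ? int.Parse (argv[0]) : 1000;
        double[] vec = new double[100000];
        for (int i = 0 ; i < vec.Length ; i++)
            vec[i] = i;

        for (int i = 0 ; i < iterations ; i++)
            test1 (vec);

        Console.WriteLine ("non-finite elements after " + iterations + " passes: "
            + CountNonFinite (vec) + " of " + vec.Length);
    }
}
```

If the count is non-zero, re-initialising the array before each pass (outside the timed region) would keep the values finite and make the comparison between runtimes easier to interpret.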


On Jan 29, 2010, at 9:28 AM, Stifu wrote:

>
> I could be wrong, but intensive operations like these may run faster with the
> upcoming new garbage collector (coming in Mono 2.8).
> I don't know if the new GC is currently stable enough for you to try it.
>
>
> Jonathan Shore wrote:
>>
>> Diego,  Thanks for your suggestions.   I adjusted to use an unsafe
>> declaration around test1(), but get the same performance results.    I am
>> wondering whether there is some optimisation mode I need to enable in the
>> mono VM.    Anyone have an idea?
>>
>> I did:
>>
>> mcs -optimize -unsafe *.cs
>> mono ArrayTest.exe 1000
>>
>> Result:
>> starting performance test on 1000 iterations
>> ending performance test on 1000 iterations, time: 16:919
>>
>> On Java VM this is < 1 second.
>>
>> Are there some flags I can use on the mono VM to speed this up?    Would
>> the LLVM version do significantly better?
>>
>> Jonathan
>>
>> On Jan 28, 2010, at 10:33 PM, Diego Frata wrote:
>>
>>> Hello Jonathan,
>>>
>>> I'm working on a computer that has a Intel Core 2 Duo CPU T5250 1.5 GHz
>>> (way slower than yours). I've tried the code below on .NET 4 Beta 2
>>> (shame on me, my other computer died some days ago and I didn't install
>>> Mono) and I got worst results than you at a very first moment.
>>>
>>> My first setup was the default one for VS2010. Release x86
>>>
>>> starting performance test on 1000 iterations
>>> ending performance test on 1000 iterations, time: 43:733
>>> result: 2729781599,99818
>>>
>>> Oops, I'm running a 64 bit OS, so I've compiled my application again
>>> targeting Release x64
>>>
>>> starting performance test on 1000 iterations
>>> ending performance test on 1000 iterations, time: 10:813
>>> result: 2729781599,99818
>>>
>>> That's a lot better, but I can speed up things a little bit introducing
>>> some unsafeness into the code:
>>>
>>>        public unsafe static double test1(double* vec, int size)
>>>        {
>>>            double sum = 0;
>>>            for (int i = 8; i < size; i++)
>>>            {
>>>                vec[i] = 2 * vec[i] - vec[i - 1];
>>>                for (int j = 1; j < 8; j++)
>>>                    sum += 1.3 * vec[j - 1];
>>>            }
>>>
>>>            return sum;
>>>        }
>>>
>>>        public static void Main(string[] argv)
>>>        {
>>>            int iterations = argv.Length > 0 ? int.Parse(argv[0]) : 1000;
>>>
>>>            unsafe
>>>            {
>>>                int size = 100000;
>>>                double* vec = stackalloc double[size];
>>>                for (int i = 0; i < size; i++)
>>>                    vec[i] = i;
>>>
>>>                DateTime Tstart = DateTime.Now;
>>>                Console.WriteLine("starting performance test on " +
>>> iterations + " iterations");
>>>
>>>                double sum = 0;
>>>
>>>                for (int i = 0; i < iterations; i++)
>>>                    sum += test1(vec, size);
>>>
>>>
>>>                DateTime Tend = DateTime.Now;
>>>                TimeSpan Tspan = Tend - Tstart;
>>>                Console.WriteLine("ending performance test on " +
>>> iterations + " iterations, time: " + Tspan.Seconds + ":" +
>>> Tspan.Milliseconds);
>>>
>>>                Console.WriteLine("result: " + sum);
>>>                Console.Read();
>>>            }
>>>        }
>>>
>>>
>>> starting performance test on 1000 iterations
>>> ending performance test on 1000 iterations, time: 5:571
>>> result: 2729781599,99818
>>>
>>> That's the best I could extract from a single threaded computation
>>> without changing your logic.
>>>
>>> Try take a look at these things, maybe Mono is presenting the same
>>> behavior as .NET.
>>>
>>>
>>> Sorry if all this was unhelpful and off-topic ;)
>>>
>>> Diego Frata
>>> [hidden email]
>>>
>>>
>>> On Fri, Jan 29, 2010 at 12:00 AM, Jonathan Shore
>>> <[hidden email]> wrote:
>>> Hi,
>>>
>>> I'm quite familiar with both the .NET and Java development environments,
>>> but only recently have begun to experiment with mono, so forgive me if
>>> I'm not clued-in.  
>>>
>>> I specialize in numerical work that often involves a lot large-scale
>>> array manipulation for linear algebra, timeseries, etc.    My main
>>> production platforms are OSX and Linux.   I've been doing most of my work
>>> on the JVM over the past few years, though spent a couple of years with
>>> .NET when it was pre-release / pre-1.0.  
>>>
>>> My main interest is in Ocaml, particularly the F# variant as the basis
>>> for my numerical work.
>>>
>>> One of the first things I do when considering a platform is run
>>> benchmarks, as performance is critical for what I do.    Starting with C#
>>> I wrote a test to gauge the array-access overhead associated with the
>>> platform.  Without knowing how to tweak the mono runtime to turn on any
>>> particular optimisations, the results were quite poor for this specific
>>> test (see code at the end of this posting).
>>>
>>>
>>> The test on my MacPro 2.6 Ghz / Snow Leopard with mono 2.6.1 gave the
>>> result of:
>>>
>>> 16 sec, 130 ms for 1000 iterations
>>>
>>> the same code, modified just for IO, etc on the Java VM (without -server)
>>> gave a runtime of:
>>>
>>> 0 sec, 831 ms
>>>
>>> changing the # of iterations to higher amounts did nothing to improve the
>>> ratio.   Java is 20x faster in this benchmark.
>>>
>>> I could not find any documentation concerning settings for the -optimize
>>> flag on the mono VM, so perhaps there is a setting I should be using.  
>>>
>>> Secondly, I saw the posting concerning the optional use of LLVM.  I have

_______________________________________________
Mono-list maillist  -  [hidden email]
http://lists.ximian.com/mailman/listinfo/mono-list

Re: mono performance, 20x differential with Java (what am i doing wrong)

Rolf Bjarne Kvinge
In reply to this post by Jonathan Shore

Hi,

Can you post the source code for the Java sample? I have a hard time
understanding how the sample can run so fast with Java. Here I get the
following results:

C#: 3.346 seconds
C (compiled with gcc -O3, see attached source code): 1.132 seconds

Rolf



a.c (752 bytes) Download Attachment

Re: mono performance, 20x differential with Java (what am i doing wrong)

Rodrigo Kumpera
In reply to this post by Jonathan Shore
By default, the Mono JIT doesn't perform array bounds check elimination, as it's a pretty expensive computation.

You can try running it with -O=abcrem, which enables it.

To look at the generated code of a given method, use the MONO_VERBOSE_METHOD environment variable. Set it to the name of the method you want.



Re: mono performance, 20x differential with Java (what am i doing wrong)

Jon Harrop
In reply to this post by Jonathan Shore
On Friday 29 January 2010 02:00:07 Jonathan Shore wrote:
> My main interest is in Ocaml, particularly the F# variant as the basis for
> my numerical work.

Note that F# uses ILX that Mono does not implement correctly, e.g. TCO. So F#
code is not yet reliable on Mono.

> One of the first things I do when considering a platform is run benchmarks,
> as performance is critical for what I do.

One of the first things I do when considering a platform is run tests, as
correctness is critical for what I do. Mono failed so I use .NET.

> I have heard only good things about LLVM performance, so hoping that this
> will help address this gap.

To really benefit from LLVM you need to design the VM properly from the ground
up. My HLVM project aims to do this:

  http://www.ffconsultancy.com/ocaml/hlvm/

I haven't benchmarked it against Mono but it is already thrashing Java on
numerical benchmarks:

http://flyingfrogblog.blogspot.com/2010/01/hlvm-on-ray-tracer-language-comparison.html

HLVM fully supports TCO and has an accurate GC.

> using System;
>
> namespace Performance
> {
>
> public class ArrayTest
> {
>
> public static double test1 (double[] vec)
> {
> double sum = 0;
> for (int i = 8 ; i < vec.Length ; i++)
> {
> vec[i] = 2*vec[i] - vec[i-1];

The above line is dead code. The JVM is probably eliminating it and .NET does
not. Removing this dead code by hand, I obtain the same result from .NET in
the same time that the JVM takes.

> for (int j = 1 ; j < 8 ; j++)
> sum += 1.3 * vec[j-1];
> }
>
> return sum;
> }

Porting solutions between languages can be bad science in the context of
benchmarks because it taints your results with the original language.

To build a useful benchmark you should set an irreducibly-complex problem to
solve and let people solve it freely in different languages using whatever
features and characteristics of the language or VM they choose.

For example, in the context of technical computing the JVM's lack of value
types is a crippling problem that afflicts everything from complex numbers
and low dimensional vectors to hash tables. I haven't tested it but you
should find that Mono's hash table implementation destroys the JVM's. Java's
generics are also crippled and it doesn't even support tail call
elimination...
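
For readers less familiar with .NET, the value-type point can be illustrated with a minimal C# sketch. The names `ComplexValue` and `ComplexDemo` are hypothetical and are not code from this thread. Because a struct is a value type, a `ComplexValue[]` holds its elements inline rather than as references to separately allocated objects.

```csharp
using System;

// Hypothetical value type for illustration only; not taken from the thread.
public struct ComplexValue
{
    public readonly double Re;
    public readonly double Im;

    public ComplexValue(double re, double im)
    {
        Re = re;
        Im = im;
    }

    public static ComplexValue operator +(ComplexValue a, ComplexValue b)
    {
        return new ComplexValue(a.Re + b.Re, a.Im + b.Im);
    }

    public static ComplexValue operator *(ComplexValue a, ComplexValue b)
    {
        return new ComplexValue(a.Re * b.Re - a.Im * b.Im,
                                a.Re * b.Im + a.Im * b.Re);
    }
}

public static class ComplexDemo
{
    // Sums an array of structs. The elements live inline in the array,
    // so the loop reads them directly without per-element heap objects.
    public static ComplexValue Sum(ComplexValue[] xs)
    {
        ComplexValue acc = new ComplexValue(0, 0);
        for (int i = 0; i < xs.Length; i++)
            acc = acc + xs[i];
        return acc;
    }
}
```

On the JVM of that era the equivalent would typically be an array of references to small objects, which is the overhead the paragraph above refers to.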

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Re: mono performance, 20x differential with Java (what am i doing wrong)

Miguel de Icaza
In reply to this post by Jonathan Shore
Hello Jonathan,

    We were recently investigating some work on this area and there are
a couple of things that you can do to improve the execution speed for
computationally intensive code:

        * Compile Mono using the LLVM backend, I strongly recommend
          that you use Mono from SVN where a lot of new LLVM integration
          work has taken place.

        * Our array-bounds-check elimination code is not as strong
          as it could be.    One thing that you can do, for tasks that
          will take days to run and where you know that you will not
          get an out-of-range exception, is to remove array bounds
          checking from the runtime.

    I recently did both of those, and it results in a 4x performance
improvement in SciMark and matches Java's performance.

    Mono needs more work in the area of array bounds check
elimination, as the above (removing the ABC checks from the runtime)
is not really a production solution.   But it should be useful for those
that need this today.

    Get this patch:

        http://tirania.org/tmp/bounds-check-elim.diff

    Then build Mono with LLVM:

        http://www.mono-project.com/Mono_LLVM

    It will get you a 4x perf boost (the patch is for x86-64; if you are
on another platform you will need to remove the equivalent code).
Perhaps an interim solution is for us to add a -O=noboundscheck.



Re: mono performance, 20x differential with Java (what am i doing wrong)

Jonathan Shore
In reply to this post by Jonathan Shore
Alan,  fair enough, I'll give that a shot.  It would be great to see support for SSE folded into the core VM.   Or is the idea to later (or currently) utilize GPUs as well?

On Jan 29, 2010, at 10:39 AM, Alan McGovern wrote:

Also, while I think of it: if you're doing a lot of repetitive calculations on arrays, you should look into using Mono.SIMD. The most recent example of perf benefits can be found here: http://blog.reblochon.org/2010/01/talk-teaser-image-processing-with.html .  A C# SIMD-optimised version of an image processing function turned out to be ~6.5x faster than the native C version. I'd assume that if the native C version was SIMD-optimised as well it'd be on par with or faster than the Mono implementation, but it does show the power of Mono.SIMD.

Alan.




Re: mono performance, 20x differential with Java (what am i doing wrong)

Alan McGovern
If you mean you want the SIMD extensions to be part of the .NET 5.0+ specification, then that may be an option in the future should someone decide that it's worth all the paperwork and time involved. As it stands, Mono is the only VM (that I know of) that allows you to access the various SIMD instruction sets available on modern CPUs.

While GPU-accelerated code may be possible in the future, it would have to be built on top of whatever API the GPU exposes (CUDA or whatever). That would be completely separate from Mono.SIMD, as this just exposes additional CPU instruction sets to .NET languages.

Alan.



Re: mono performance, 20x differential with Java (what am i doing wrong)

Jonathan Shore
In reply to this post by Jon Harrop

On Jan 29, 2010, at 2:32 PM, Jon Harrop wrote:

> On Friday 29 January 2010 02:00:07 Jonathan Shore wrote:
>> My main interest is in Ocaml, particularly the F# variant as the basis for
>> my numerical work.
>
> Note that F# uses ILX that Mono does not implement correctly, e.g. TCO. So F#
> code is not yet reliable on Mono.
>

Jon, I saw your post about that on your blog some time ago.   Someone familiar with Mono claimed otherwise, so I was uncertain as to whether it had been addressed or not.    I can live with some inefficiency in tail calls provided one does not get a stack overflow or some other fatal issue.


>
>> I have heard only good things about LLVM performance, so hoping that this
>> will help address this gap.
>
> To really benefit from LLVM you need to design the VM properly from the ground
> up. My HLVM project aims to do this:
>
>  http://www.ffconsultancy.com/ocaml/hlvm/
>

I've seen your posts on this and it is very impressive.    To be honest I would get more value out of an OCaml variant wedded to the .NET platform.   There is just so much momentum and so many available libraries on the two major VMs (CLR and JVM) that moving away would be a huge risk for me at the moment.    I also have a significant body of imperative VM-bound code that I need to get access to.    If HLVM could interact with Java bytecode or .NET bytecode, it would work for me.

> I haven't benchmarked it against Mono but it is already thrashing Java on
> numerical benchmarks:
>
> http://flyingfrogblog.blogspot.com/2010/01/hlvm-on-ray-tracer-language-comparison.html
>

Again, very impressive stuff.   Do you see bridging between the .NET world and your VM in the future?    For instance, the IKVM project, which maps Java bytecode to .NET, built up a joint project with the Mono team to provide the ability to run Java bytecode in Mono.    A similar concept could be applied in this setting.


> The above line is dead code. The JVM is probably eliminating it and .NET does
> not. Removing this dead code by hand, I obtain the same result from .NET in
> the same time that the JVM takes.

Yes, thanks, I realize this.  The benchmark is flawed.    I had meant to place i - j in the inner loop.       I'm going to test this again with the correction and will post the results.
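
One possible reading of that correction, as a hedged C# sketch: the inner loop reads `vec[i - j]` instead of `vec[j - 1]`, so each inner read depends on `i` and sees the store to `vec[i]` made on the previous outer iteration. The class name `CorrectedArrayTest` is hypothetical, and whether this is exactly the kernel that was intended is an assumption.

```csharp
using System;

public static class CorrectedArrayTest
{
    // Same loop structure as the original test1. The only change is the
    // inner index, which is assumed here to be i - j rather than j - 1.
    public static double Test1(double[] vec)
    {
        double sum = 0;
        for (int i = 8; i < vec.Length; i++)
        {
            vec[i] = 2 * vec[i] - vec[i - 1];

            // j runs from 1 to 7, so i - j stays within [1, i - 1]
            // because i starts at 8.
            for (int j = 1; j < 8; j++)
                sum += 1.3 * vec[i - j];
        }
        return sum;
    }
}
```

Note that the function rewrites `vec` in place, so repeated calls on the same array do not see the same input each time.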

>
>> for (int j = 1 ; j < 8 ; j++)
>> sum += 1.3 * vec[j-1];
>> }
>>
>> return sum;
>> }
>
> To build a useful benchmark you should set an irreducibly-complex problem to
> solve and let people solve it freely in different languages using whatever
> features and characteristics of the language or VM they choose.
>

I've seen the language shootouts, but they appear flawed to me.   The length of the tests is too short to allow the Java or Mono VMs to reap benefits from optimisation.    The main problem I have found in VM environments (aside from the GC, boxing / unboxing) is array access.    One can work around issues of GC, boxing, etc, but cannot work around array issues if they exist.
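
As a rough sketch of how warm-up could be kept out of a measurement, the C# below runs a few untimed calls first and then times the remaining iterations with `System.Diagnostics.Stopwatch`. The names `TimingSketch`, `SumScaled` and `TimeRuns` are hypothetical, and `SumScaled` is only a stand-in workload, not the benchmark from this thread.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Stand-in workload: a single pass over the array.
    public static double SumScaled(double[] vec)
    {
        double sum = 0;
        for (int i = 0; i < vec.Length; i++)
            sum += 1.3 * vec[i];
        return sum;
    }

    // Runs `warmup` untimed calls so that JIT compilation falls outside
    // the measured region, then times `iterations` calls.
    public static double TimeRuns(double[] vec, int warmup, int iterations,
                                  out TimeSpan elapsed)
    {
        double sink = 0;
        for (int i = 0; i < warmup; i++)
            sink += SumScaled(vec);

        sink = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            sink += SumScaled(vec);
        sw.Stop();

        elapsed = sw.Elapsed;
        return sink; // returned so the work has an observable result
    }
}
```

When printing the result, `elapsed.TotalSeconds` or `elapsed.TotalMilliseconds` gives the whole duration, whereas `TimeSpan.Seconds` and `TimeSpan.Milliseconds` are only the individual components of it.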

Jonathan



Re: mono performance, 20x differential with Java (what am i doing wrong)

Jon Harrop
In reply to this post by Alan McGovern
On Friday 29 January 2010 21:23:42 Alan McGovern wrote:
> As it stands,
> Mono is the only VM (that I know of) that allows you access the various
> SIMD instruction sets available on modern CPUs.

LLVM?

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Re: mono performance, 20x differential with Java (what am i doing wrong)

Jonathan Shore
In reply to this post by Jonathan Shore
Diego,  I was not able to reproduce your results.   Maybe there is something about running Mono in 32-bit mode on a 64-bit OS; I don't know.    I am still struggling to get Mono built on OSX.   With that done, I should be able to see how 64-bit / LLVM works.  Thanks

On Jan 29, 2010, at 11:08 AM, Diego Frata wrote:

Hi Jonathan,

I've run the tests again just for fun, with the original code you submitted, on 32-bit Windows XP with Mono, .NET and C++ (no Java here). The processor is an AMD Athlon Dual Core 4450B 2.30GHz.  Strangely, Mono lags a bit behind (but not 20x).

-- NET 2.0
starting performance test on 1000 iterations
ending performance test on 1000 iterations, time: 1:469
result: 2729781599,99818

-- Mono 2.6.1
starting performance test on 1000 iterations
ending performance test on 1000 iterations, time: 3:907
result: 2729781599,99818

-- VC++ 10
starting performance test on 100000 iterations
ending performance test on 100000 iterations, time: 1:401 seconds
result: 272978160000.033600






Re: mono performance, 20x differential with Java (what am i doing wrong)

Jonathan Shore
In reply to this post by Miguel de Icaza

On Jan 29, 2010, at 1:43 PM, Miguel de Icaza wrote:

>
> * Our Arrays-bounds-check elimination code is not as strong
>  as it could be.    One thing that you can do, for tasks that
>  will take days to run, and where you know that you will not
>  get an out-of-range exception is to remove from the
>  runtime arrays bounds checking.
>

Thanks.   I can get by with "unsafe" for C# code.   That said, I am unsure if I will have any opportunities to do this in the context of F#.    Is it possible to turn off array bounds checking from VM args for my long running stuff?

>    I recently did both of those, and it results in a 4x performance
> improvement in SciMark and matches Java's performance.

Nice!    Sounds very promising.    Do you plan to fold this into the distributed VM?


Re: mono performance, 20x differential with Java (what am i doing wrong)

Jon Harrop
In reply to this post by Jonathan Shore
On Friday 29 January 2010 21:28:58 Jonathan Shore wrote:

> On Jan 29, 2010, at 2:32 PM, Jon Harrop wrote:
> > On Friday 29 January 2010 02:00:07 Jonathan Shore wrote:
> >> My main interest is in Ocaml, particularly the F# variant as the basis
> >> for my numerical work.
> >
> > Note that F# uses ILX that Mono does not implement correctly, e.g. TCO.
> > So F# code is not yet reliable on Mono.
>
> Jon, I saw your post about that on your blog some time ago.   Someone
> familiar with Mono claimed otherwise, was therefore uncertain as to whether
> was addressed or not.

You should be able to verify my results easily: just run the 8-line example F#
program I gave and Mono will stack overflow.

> I can live some some inefficiency in tail calls provided one does not get
> stack overflow or some other fatal issue.

TCO is broken on Mono, not merely inefficient.

> >> I have heard only good things about LLVM performance, so hoping that
> >> this will help address this gap.
> >
> > To really benefit from LLVM you need to design the VM properly from the
> > ground up. My HLVM project aims to do this:
> >
> >  http://www.ffconsultancy.com/ocaml/hlvm/
>
> I've seen your posts on this and is very impressive.

Thanks.

> To be honest I
> would get more value out of a Ocaml variant wedded to the .NET platform.

Yes. F# is awesome but only on Windows/.NET and not on Mono.

> There is just so much momentum and available libraries on the two major VMs
> (CLR and JVM), that would be a huge risk for me at the moment.

I was actually disappointed with .NET's libraries in the context of technical
computing. I felt OCaml had better libraries and it turns out that .NET was
about as popular for technical computing as OCaml was when I started. The
main exception is WPF but you don't get that with Mono.

> I also
> have a significant body of imperative VM-bound code that I need to get
> access to.    If HLVM could interact with java bytecode or .NET bytecode,
> would work for me.

You should be able to compile plain numerical code from JVM/CIL to HLVM easily
enough, particularly when HLVM is more complete.

> > I haven't benchmarked it against Mono but it is already thrashing Java on
> > numerical benchmarks:
> >
> > http://flyingfrogblog.blogspot.com/2010/01/hlvm-on-ray-tracer-language-comparison.html
>
> Again, very impressive stuff.   Do you see bridging between the .NET world
> and your VM in the future?

No.

> For instance the IKVM project that maps Java
> bytecode to .NET built up a joint project with the mono team to provide the
> ability to run Java byetcode in mono.    A similar concept could be done in
> this setting.

That doesn't really interest me. F# is so far ahead now that everything else
is a toy in comparison from my point of view. HLVM is just a hobby project
designed to bring some of the benefits of F# to the open source world for fun
but it is a massive undertaking because the open source world doesn't even
have any reliable foundations like .NET, let alone decent libraries like WPF
built upon them. So I have to build everything from scratch myself. I'm not
even sure I will be able to use hardware acceleration due to the poor state
of OpenGL drivers on Linux.

> > To build a useful benchmark you should set an irreducibly-complex problem
> > to solve and let people solve it freely in different languages using
> > whatever features and characteristics of the language or VM they choose.
>
> I've seen the language shootouts, but they appear flawed to me.

I've found k-nucleotide, spectralnorm and regex-dna interesting but you have
to really study the code and be *very* careful when drawing conclusions.

For example, the Haskell implementation of k-nucleotide is only 20% slower
than OCaml but the Haskell code is a joke: where the OCaml uses its stdlib's
hash table, the Haskell had to do memory allocation manually using "malloc"
directly in order to work around serious design flaws in their garbage
collector. Not only is this hidden from the casual shootout reader but they
even comment their code with total nonsense like "Hash tables are not
generally used in functional languages" when what they really mean is "When
hash tables are a good solution, Haskell will suck uniquely even among
functional languages".

> The
> length of the tests is too short to allow the Java or Mono VMs to reap
> benefits from optimisation.

Yes. You can crank up the resolution on my ray tracer benchmark to make it
take as long as you like but I found it made little difference above the
value I used.

> The main problem I have found in VM
> environments (aside from the GC, boxing / unboxing) is array access.    One
> can work around issues of GC, boxing, etc, but cannot work around array
> issues if they exist.

You cannot work around boxing on the JVM because it lacks value types. Indeed,
that is a major advantage of .NET over the JVM that Mono should inherit.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Re: mono performance, 20x differential with Java (what am i doing wrong)

Miguel de Icaza
In reply to this post by Jonathan Shore

> Thanks.   I can get by with "unsafe" for C# code.   That said, I am unsure if I will have any opportunities to do this in the context of F#.    Is it possible to turn off array bounds checking from VM args for my long running stuff?

The patch I included removes all bounds checking from the runtime,
regardless of the language.




Re: mono performance, 20x differential with Java (what am i doing wrong)

Jonathan Shore
In reply to this post by Jon Harrop

On Jan 29, 2010, at 7:19 PM, Jon Harrop wrote:

>> Jon, I saw your post about that on your blog some time ago.   Someone
>> familiar with Mono claimed otherwise, was therefore uncertain as to whether
>> was addressed or not.
>
> You should be able to verify my results easily: just run the 8-line example F#
> program I gave and Mono will stack overflow.
>
>> I can live some some inefficiency in tail calls provided one does not get
>> stack overflow or some other fatal issue.
>
> TCO is broken on Mono, not merely inefficient.


As I have no familiarity with the Mono VM code, I have no idea what it would take to fix this.   Do any of the Mono developers have a view on this?    I suppose we could lift Jon's code and put it into the bug tracking system ...


>> To be honest I
>> would get more value out of a Ocaml variant wedded to the .NET platform.
>
> Yes. F# is awesome but only on Windows/.NET and not on Mono.


Hmm, very problematic for me ...


>> There is just so much momentum and available libraries on the two major VMs
>> (CLR and JVM), that would be a huge risk for me at the moment.
>
> I was actually disappointed with .NET's libraries in the context of technical
> computing. I felt OCaml had better libraries and it turns out that .NET was
> about as popular for technical computing as OCaml was when I started. The
> main exception is WPF but you don't get that with Mono.


I guess it depends where you come from.  First I'll have to be honest and say that I am new to Ocaml.    My FP background is Scheme and some dabbling in Haskell.    I had heard from real-world users of Ocaml (such as the Jane Street Capital guys) that the depth of libraries for Ocaml is pretty shallow.    They've invested some years into building that up, but it is largely private work.

Now if we are talking about numerical stuff, then yes, there is not much publicly available on either the CLR or JVM.    I was referring more to the tech libraries than to the scientific ones.


>> I also
>> have a significant body of imperative VM-bound code that I need to get
>> access to.    If HLVM could interact with java bytecode or .NET bytecode,
>> would work for me.
>
> You should be able to compile plain numerical code from JVM/CIL to HLVM easily
> enough, particularly when HLVM is more complete.


I'll look forward to seeing that.   Are you implying that I would be able to take a bunch of Java classes and make them available?    I guess it depends on what you mean by "plain numerical" code.

Will or does HLVM support the F# dialect of Ocaml as well?



> That doesn't really interest me. F# is so far ahead now that everything else
> is a toy in comparison from my point of view. HLVM is just a hobby project
> designed to bring some of the benefits of F# to the open source world for fun
> but it is a massive undertaking because the open source world doesn't even
> have any reliable foundations like .NET, let alone decent libraries like WPF
> built upon them. So I have to build everything from scratch myself. I'm not
> even sure I will be able to use hardware acceleration due to the poor state
> of OpenGL drivers on Linux.

Fair enough.   I recognize that you have accomplished quite a bit with the performance of your design.   However, as you allude to, it is quite another thing to enrich it to the point of being a broad-use platform.   For that you need a group of dedicated developers and the momentum to foster that community.  

The MS CLR and Mono may never have the specializations that you have done, for instance, make boxing / unboxing a non-issue (or at least a lot cheaper).    However, they have momentum and breadth.    Getting the best of both would be super, but I understand ... 



> For example, the Haskell implementation of k-nucleotide is only 20% slower
> than OCaml but the Haskell code is a joke: where the OCaml uses its stdlib's
> hash table, the Haskell had to do memory allocation manually using "malloc"
> directly in order to work around serious design flaws in their garbage
> collector. Not only is this hidden from the casual shootout reader but they
> even comment their code with total nonsense like "Hash tables are not
> generally used in functional languages" when what they really mean is "When
> hash tables are a good solution, Haskell will suck uniquely even among
> functional languages".

I agree that benchmarks have to be studied carefully to see what is real and what is not.  In the end it is your own application that is the final measure.



> You cannot work around boxing on the JVM because it lacks value types. Indeed,
> that is a major advantage of .NET on the JVM that Mono should inherit.


I'm totally with you on .NET over the JVM. Sun sat on the JVM and Java design for many years. Catching up now is too late.





_______________________________________________
Mono-list maillist  -  [hidden email]
http://lists.ximian.com/mailman/listinfo/mono-list

Re: mono performance, 20x differential with Java (what am i doing wrong)

Jon Harrop
On Saturday 30 January 2010 00:20:11 Jonathan Shore wrote:

> On Jan 29, 2010, at 7:19 PM, Jon Harrop wrote:
> >> Jon, I saw your post about that on your blog some time ago.   Someone
> >> familiar with Mono claimed otherwise, was therefore uncertain as to
> >> whether was addressed or not.
> >
> > You should be able to verify my results easily: just run the 8-line
> > example F# program I gave and Mono will stack overflow.
> >
> >> I can live with some inefficiency in tail calls provided one does not
> >> get stack overflow or some other fatal issue.
> >
> > TCO is broken on Mono, not merely inefficient.
>
> As I have no familiarity with the Mono VM code, no idea what it would take
> to fix this.

There are many different solutions. The simplest would be to use LLVM's fast
calling convention and tail calls as HLVM does.
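
For readers unfamiliar with the term, here is a minimal sketch of what tail call elimination buys you. It is written in Java only because Java already appears in this thread; it is not the 8-line F# program mentioned above, and the names TailCall, sumRec and sumLoop are made up for illustration. HotSpot does not eliminate tail calls either, so sumRec with a very large n would be expected to exhaust the stack, while the loop shows roughly what a runtime with working TCO turns the recursive version into.

```java
public class TailCall {
    // Tail-recursive sum of 1..n: the recursive call is the last thing the
    // method does, so a runtime with TCO could reuse the current stack frame.
    static long sumRec(long n, long acc) {
        if (n == 0) return acc;
        return sumRec(n - 1, acc + n);
    }

    // Roughly what TCO turns the method above into: the "call" becomes a jump
    // back to the top with updated arguments, so stack depth stays constant.
    static long sumLoop(long n, long acc) {
        while (n != 0) {
            acc += n;
            n -= 1;
        }
        return acc;
    }

    public static void main(String[] args) {
        // Shallow recursion: fine on any reasonable stack size.
        System.out.println(sumRec(1000, 0));
        // Deep input: safe for the loop regardless of stack size.
        System.out.println(sumLoop(10_000_000L, 0));
    }
}
```

Functional code relies on the first form being as safe as the second, which is why a runtime that overflows the stack on it is a correctness problem rather than a performance one.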

> >> To be honest I
> >> would get more value out of a Ocaml variant wedded to the .NET platform.
> >
> > Yes. F# is awesome but only on Windows/.NET and not on Mono.
>
> Hmm, very problematic for me ...

That's why Microsoft did it. ;-)

> >> There is just so much momentum and available libraries on the two major
> >> VMs (CLR and JVM), that would be a huge risk for me at the moment.
> >
> > I was actually disappointed with .NET's libraries in the context of
> > technical computing. I felt OCaml had better libraries and it turns out
> > that .NET was about as popular for technical computing as OCaml was when
> > I started. The main exception is WPF but you don't get that with Mono.
>
> I guess it depends where you come from.  First I'll have to be honest and
> say that I am new to Ocaml.    My FP background is Scheme and some dabbling
> in Haskell.    I had heard from real-world users of Ocaml (such as the Jane
> Street capital guys), that the depth of libraries for Ocaml is pretty
> shallow.    They've invested some years into building that up, but is
> private work largely.
>
> Now if we are talking about numerical stuff, then yes, there is not much
> publicly available on either the CLR or JVM.    I was more referring to the
> tech libraries rather than scientific.

Yes. Technical libraries (e.g. graphing) are far more advanced on .NET. I was
referring only to numerical libraries like BLAS, LAPACK, FFTW and GSL.

> >> I also
> >> have a significant body of imperative VM-bound code that I need to get
> >> access to.    If HLVM could interact with java bytecode or .NET
> >> bytecode, would work for me.
> >
> > You should be able to compile plain numerical code from JVM/CIL to HLVM
> > easily enough, particularly when HLVM is more complete.
>
> I'll look forward to seeing that.   Are you implying that I would be able
> to take a bunch of java classes and make them available?    I guess it
> depends on what you mean by "plain numerical" code.

I mean code like your test1 function. That has an obvious direct translation
into HLVM code.

> Will or does HLVM support the F# dialect of Ocaml as well?

HLVM is designed to be a language-agnostic VM so it could support either in
theory. In practice, I will probably create a new language and any others
will be ports done by other people. Currently, both OCaml and F# box tuples
which would be a disaster on HLVM because my GC is not optimized for
short-lived values. Objectively, F# should not box tuples either. In fact, if
Mono implemented TCO and structs correctly, and had its own F#, then it could
unbox tuples and would see huge performance improvements as a consequence.

> > That doesn't really interest me. F# is so far ahead now that everything
> > else is a toy in comparison from my point of view. HLVM is just a hobby
> > project designed to bring some of the benefits of F# to the open source
> > world for fun but it is a massive undertaking because the open source
> > world doesn't even have any reliable foundations like .NET, let alone
> > decent libraries like WPF built upon them. So I have to build everything
> > from scratch myself. I'm not even sure I will be able to use hardware
> > acceleration due to the poor state of OpenGL drivers on Linux.
>
> Fair enough.   I recognize that you have accomplished quite a bit with the
> performance of your design.   However, as you allude to, it is quite
> another thing to enrich it to the point of being a broad-use platform.  
> For that you need a group of dedicated developers and the momentum to
> foster that community.

Yes. I don't think that will be a problem. So many people love OCaml but want
decent multicore support that they would leap on HLVM if only it had a decent
front end and a couple more features. Those are easy enough to implement, it
is just a question of me finding the time. :-)

> The MS CLR and Mono may never have the specializations that you have done,
> for instance, make boxing / unboxing a non-issue (or at least a lot
> cheaper).    However, they have momentum and breadth.    Getting the best
> of both would be super, but I understand ...

.NET has momentum and breadth that I can never hope to attain but Mono's level
of adoption seems entirely achievable to me.

> > You cannot work around boxing on the JVM because it lacks value types.
> > Indeed, that is a major advantage of .NET on the JVM that Mono should
> > inherit.
>
> I'm totally with you on .NET over the JVM.    Sun sat on the JVM and Java
> design for many years.    Catchup now is too late.

Yep. They've left a huge gap in the market for Mono though. :-)

Just to clarify my point, if you benchmark these Java and C# programs that put
10M floats into a hash table:

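  import java.util.HashMap;

  public class Hashtbl {
    public static void main(String[] args) {
      int n = 10000000;
      // Keys and values are stored as boxed java.lang.Double objects.
      HashMap<Double, Double> hashtable = new HashMap<Double, Double>(n);

      // Insert x -> 1/x for x = 1..n; both operands are autoboxed by put().
      for (int i = 1; i <= n; ++i) {
        double x = i;
        hashtable.put(x, 1.0 / x);
      }

      System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
    }
  }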
 
  using System.Collections.Generic;
 
  public class Hashtbl {
    public static void Main(){
      int n = 10000000;
      Dictionary<double, double> hashtable = new Dictionary<double, double>(n);
 
      for(int i=1; i<=n; ++i) {
        double x = i;
        hashtable[x] = 1.0 / x;
      }
 
      System.Console.WriteLine("hashtable(100.0) = " + hashtable[100.0]);
    }
  }

You'll find that Mono is 24x faster than Java in real time (37.4s vs 1.56s)
and 94x faster in terms of user CPU time (127.4s vs 1.36s):

  $ java -version
  java version "1.6.0_17"
  Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
  Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
  $ time java Hashtbl
  hashtable(100.0) = 0.01
 
  real    0m37.379s
  user    2m7.404s
  sys     0m2.788s
 
  $ mono --version
  Mono JIT compiler version 2.6 (tarball Fri Dec 18 02:02:28 GMT 2009)
  Copyright (C) 2002-2008 Novell, Inc and Contributors. www.mono-project.com
          TLS:           __thread
          GC:            Included Boehm (with typed GC and Parallel Mark)
          SIGSEGV:       altstack
          Notifications: epoll
          Architecture:  x86
          Disabled:      none
  $ time ./Hashtbl.exe
  hashtable(100.0) = 0.01
 
  real    0m1.555s
  user    0m1.360s
  sys     0m0.184s
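
To make explicit where the boxing happens in the Java version, here is a small sketch with the autoboxing written out by hand. The class name BoxingDemo and the helper fill are made up for illustration, and the table is kept small because this is not meant as a benchmark. The explicit Double.valueOf calls are what the compiler inserts for put(x, 1.0 / x), so every key and every value becomes a separate heap object referenced from the map.

```java
import java.util.HashMap;
import java.util.Map;

public class BoxingDemo {
    // Same insertion loop as the benchmark, with the autoboxing spelled out.
    static Map<Double, Double> fill(int n) {
        Map<Double, Double> table = new HashMap<Double, Double>(n);
        for (int i = 1; i <= n; ++i) {
            double x = i;
            // table.put(x, 1.0 / x) is compiled as if it were this line:
            // each primitive double is wrapped in a Double object first.
            table.put(Double.valueOf(x), Double.valueOf(1.0 / x));
        }
        return table;
    }

    public static void main(String[] args) {
        Map<Double, Double> table = fill(1000);
        // The lookup key 100.0 is boxed too, and the result is an object.
        Object value = table.get(100.0);
        System.out.println(value instanceof Double);
        System.out.println("table(100.0) = " + value);
    }
}
```

The C# Dictionary<double, double> has no such step, because double is a value type and the generic collection is instantiated over it directly.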

Couple that with the fact that Java's FFI is disastrously slow as well and
you've got a ticking time bomb of crippling design flaws in the JVM that you
will not be able to escape from.

The moral: don't let Guy Steele drag you halfway to Lisp if you want
performance that doesn't suck. ;-)

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Re: mono performance, 20x differential with Java (what am i doing wrong)

Rodrigo Kumpera


On Sat, Jan 30, 2010 at 12:54 AM, Jon Harrop <[hidden email]> wrote:
> On Saturday 30 January 2010 00:20:11 Jonathan Shore wrote:
> > On Jan 29, 2010, at 7:19 PM, Jon Harrop wrote:
> > >> Jon, I saw your post about that on your blog some time ago.   Someone
> > >> familiar with Mono claimed otherwise, was therefore uncertain as to
> > >> whether was addressed or not.
> > >
> > > You should be able to verify my results easily: just run the 8-line
> > > example F# program I gave and Mono will stack overflow.
> > >
> > >> I can live with some inefficiency in tail calls provided one does not
> > >> get stack overflow or some other fatal issue.
> > >
> > > TCO is broken on Mono, not merely inefficient.
> >
> > As I have no familiarity with the Mono VM code, no idea what it would take
> > to fix this.
>
> There are many different solutions. The simplest would be to use LLVM's fast
> calling convention and tail calls as HLVM does.

Implementing proper TCO in Mono requires two things. The first is to change the managed calling convention to
have the caller pop its arguments. The second is to lift a few other minor restrictions. We'll probably have
restrictions similar to MS's, such as no TCO on synchronized methods.

 
> > >> There is just so much momentum and available libraries on the two major
> > >> VMs (CLR and JVM), that would be a huge risk for me at the moment.
> > >
> > > I was actually disappointed with .NET's libraries in the context of
> > > technical computing. I felt OCaml had better libraries and it turns out
> > > that .NET was about as popular for technical computing as OCaml was when
> > > I started. The main exception is WPF but you don't get that with Mono.
> >
> > I guess it depends where you come from.  First I'll have to be honest and
> > say that I am new to Ocaml.    My FP background is Scheme and some dabbling
> > in Haskell.    I had heard from real-world users of Ocaml (such as the Jane
> > Street capital guys), that the depth of libraries for Ocaml is pretty
> > shallow.    They've invested some years into building that up, but is
> > private work largely.
> >
> > Now if we are talking about numerical stuff, then yes, there is not much
> > publicly available on either the CLR or JVM.    I was more referring to the
> > tech libraries rather than scientific.
>
> Yes. Technical libraries (e.g. graphing) are far more advanced on .NET. I was
> referring only to numerical libraries like BLAS, LAPACK, FFTW and GSL.

One rarely explored source of performance in .NET is runtime code generation.
It is trivial to produce code that does runtime algorithm specialization.
You can even do most of it at a high level in C# using expression trees.
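
As a rough illustration of what "runtime algorithm specialization" means, here is a sketch in Java, used only to stay with a language already shown in this thread. Note the substitution: Java's standard library has no counterpart to C# expression trees, so this composes closures instead of generating code, and the names Specialize and specializeDot are made up. The idea it shows is the same one, though: decisions that depend only on data known at runtime (here, which weights are zero) are made once when the evaluator is built, not on every call.

```java
import java.util.function.ToDoubleFunction;

public class Specialize {
    // Build a dot-product evaluator specialized to a fixed weight vector.
    // Zero weights are skipped once, here, so the returned function contains
    // no test for them and never touches the corresponding inputs.
    static ToDoubleFunction<double[]> specializeDot(double[] weights) {
        ToDoubleFunction<double[]> f = xs -> 0.0;
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] == 0.0) continue;
            final int idx = i;
            final double w = weights[i];
            final ToDoubleFunction<double[]> prev = f;
            // Chain one multiply-add per non-zero weight.
            f = xs -> prev.applyAsDouble(xs) + w * xs[idx];
        }
        return f;
    }

    public static void main(String[] args) {
        ToDoubleFunction<double[]> dot =
            specializeDot(new double[] {2.0, 0.0, 0.5});
        // Only indices 0 and 2 are read: 2.0 * 1.0 + 0.5 * 4.0
        System.out.println(dot.applyAsDouble(new double[] {1.0, 100.0, 4.0}));
    }
}
```

With expression trees in C#, the same construction would build an expression for the sum and compile it to a delegate, so the specialized routine ends up as straight-line JIT-compiled code with no chain of indirect calls.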



> > Will or does HLVM support the F# dialect of Ocaml as well?
>
> HLVM is designed to be a language agnostic VM so it could support either in
> theory. In practice, I will probably create a new language and any others
> will be ports done by other people. Currently, both OCaml and F# box tuples
> which would be a disaster on HLVM because my GC is not optimized for
> short-lived values. Objectively, F# should not box tuples either. In fact, if
> Mono implemented TCO and structs correctly and its own F# then it could unbox
> tuples and would see huge performance improvements as a consequence.

What's wrong with the way Mono implements structs?



By the way, guys, the Mono community is very welcoming of both external contributions
and feedback. So if you guys make a compelling case for TCO, maybe Miguel can get
someone to fix it.

Cheers,
Rodrigo

