Hi I would like to know if you could point me to a document specifying the behavior of haskel with integer overflow. I mean when you have two integers, what happens when you multiply them? Is overflow checked?
jacob navia <ja...@nospam.org> writes: > I would like to know if you could point me to a document specifying > the behavior of haskel with integer overflow. I mean when you have two > integers, what happens when you multiply them? Is overflow checked?
Haskell has an Integer type, which is arbitrary precision, and an Int type, which is a machine int that can overflow with silent wraparound.
Paul Rubin <http://phr...@nospam.invalid> wrote: > jacob navia <ja...@nospam.org> writes: >> I would like to know if you could point me to a document specifying >> the behavior of haskel with integer overflow.
The usual document to look for in such a case is the Haskell98 Report.
>> I mean when you have two >> integers, what happens when you multiply them?
You can't multiply them directly, you first have to convert one of them to other type. Usually, you'd convert Int to Integer, and then multiply two Integers, which cannot overflow (though you can run out of memory).
>> Is overflow checked? > Haskell has an Integer type, which is arbitrary precision, and an Int > type, which is a machine int that can overflow with silent wraparound.
Also note that the precision of an Int isn't necessarily a multiple of eight bits. The Haskell98 Report gives a lower bound:
The finite-precision integer type Int covers at least the range [-2^29 , 2^29 - 1].
Dirk Thierbach <dthierb...@usenet.arcornews.de> writes: > You can't multiply them directly, you first have to convert one of them > to other type. Usually, you'd convert Int to Integer, and then multiply > two Integers, which cannot overflow (though you can run out of memory).
You can multiply Int by Int:
Prelude> let a = 7777777777 :: Int Prelude> a 7777777777 Prelude> a*a 5153594927266406881 Prelude> a*a*a -6296163071503143855
Compare results with:
Prelude> let b = 7777777777 :: Integer Prelude> b*b 60493827148395061729 Prelude> b*b*b 470507544440466392332359396433
Int is evil. I didn't even notice at first that the a*a result is wrong.
Mark T. B. Carroll <Mark.Carr...@aetion.com> wrote:
> Paul Rubin <http://phr...@NOSPAM.invalid> writes: >> Int is evil. I didn't even notice at first that the a*a result is >> wrong. > abs minBound :: Int was a bit of a surprise for me. (-:
Well, if one knows that two's complement is assymetric by design ... :-)
And it's a good idea to be *very* careful when handling extreme values in a given type, anyway. Which for me normally means converting them to a larger type at first opportunity. Unless I'm really sure all comparisons etc. work out, and the compiler is not going to introduce errors by "optimizing" the code. (No matter if C or Haskell).
> jacob navia <ja...@nospam.org> writes: >> I would like to know if you could point me to a document specifying >> the behavior of haskel with integer overflow. I mean when you have two >> integers, what happens when you multiply them? Is overflow checked?
> Haskell has an Integer type, which is arbitrary precision, and an Int > type, which is a machine int that can overflow with silent wraparound.
I asked because I have introduced an overflow checking option in my C compiler system lcc-win.
What is interesting is that the responsability of handling overflows apparently is the business of the programmer in Haskell, the same as in C.
The problem with that approach (hence my modification to my compiler) is that in most cases you do not EXPECT the overflow so you will NOT take the measures that *could* be taken because when writing code you do not think about all possible stuff.
In that sense, haskell is no different as C is now. I wanted to know what is the situation in "higher level" languages like Haskell to gather ideas. The answer is surprising.
jacob navia <ja...@nospam.org> wrote: > >> I would like to know if you could point me to a document specifying > >> the behavior of haskel with integer overflow. I mean when you have > >> two integers, what happens when you multiply them? Is overflow > >> checked?
> > Haskell has an Integer type, which is arbitrary precision, and an > > Int type, which is a machine int that can overflow with silent > > wraparound.
> I asked because I have introduced an overflow checking option in my C > compiler system lcc-win.
> What is interesting is that the responsability of handling overflows > apparently is the business of the programmer in Haskell, the same as > in C.
> The problem with that approach (hence my modification to my compiler) > is that in most cases you do not EXPECT the overflow so you will NOT > take the measures that *could* be taken because when writing code you > do not think about all possible stuff.
> In that sense, haskell is no different as C is now. I wanted to know > what is the situation in "higher level" languages like Haskell to > gather ideas. The answer is surprising.
If overflow should be an error condition, then it's simply wrong to use the Int type. Use a type, which offers the precision you need. If you are not sure, use Integer. If you _want_ to limit the precision of your caluclations for some reason, then your code needs to be aware of it anyway, so throwing exceptions or aborting the program is probably not the right way to go. Instead make a type, which implements the logic you need and turn it into a Num instance.
Ertugrul Söylemez <e...@ertes.de> writes: > If overflow should be an error condition, then it's simply wrong to use > the Int type. Use a type, which offers the precision you need. If you > are not sure, use Integer.
I find it amusing when novice programmers believe their main job is preventing programs from crashing... More experienced programmers realize that correct code is great, code that crashes could use improvement, but incorrect code that doesn't crash is a horrible nightmare.
There is a separate set of types (Data.Word) for machine words that have computer-like wraparound behavior. Int is supposed to represent integers but in a limited and optimized form. Integer has rather high overhead for the sake of supporting numbers so large that they don't occur in most programs. Int is what we'd normally use when the numbers are expected to be in range. In that situation, if the number is unexpectedly out of range, that is a genuine exception condition and throwing an exception is exactly the right thing to do. The alternative is nasty failures like the ones afflicting old C programs written with the entirely reasonable (at the time) expectation that file sizes would fit in 32-bit ints. It wasn't that long ago that we started actually seeing > 4GB files. It's better for those programs to throw exceptions than fail in weird and possibly exploitable ways.
Floating point arithmetic is supposed to be a limited, optimized representation of real numbers. When a float is out of range, there is an arithmetic exception. Ints should be treated similarly. The current mechanism is just evil.
> If you _want_ to limit the precision of your caluclations for some > reason, then your code needs to be aware of it anyway, so throwing > exceptions or aborting the program is probably not the right way to > go. Instead make a type, which implements the logic you need and > turn it into a Num instance.
And the trouble with this is that a lot of built-in functions like "take", "drop", and "length" expect and return Int values. So you need ugly conversions all over the place, where Int is a quite natural and reasonable choice, even though it has some chance of unexpected failure, say for taking the length of a lazy list that is larger than 2**32.
jacob navia <ja...@nospam.org> wrote: > What is interesting is that the responsability of handling overflows > apparently is the business of the programmer in Haskell, the same as > in C.
Not really. The practice in Haskell is that normally, you just use Integers, so you don't have to check for overflows because there are none. You can't do that in C (unless you use special libraries).
If you really care about performance, you can replace Integer's by Int's, but the trade-off is that in that case it's indeed the programmer's responsibility to make sure there are no overflows.
For example, by inspecting the code manually and calculating the minimum/maximum possible value, and restricting only the input values (say). Absence of side-effects makes that easy.
But automatic overflow checking would be contraproductive in that situation, since the only reason for using Int's is performance, and overflow checks everywhere would kill the performance gain.
Nevertheless, if for some perverse reason you'd still want this, you could just define your own CheckedInt type by introducing another Num instance. Another thing you can't do in C. :-)
> But automatic overflow checking would be contraproductive in that > situation, since the only reason for using Int's is performance, > and overflow checks everywhere would kill the performance gain.
> Nevertheless, if for some perverse reason you'd still want this, > you could just define your own CheckedInt type by introducing another > Num instance. Another thing you can't do in C. :-)
Maybe have the checking happen with Control.Exception.assert, so it gets fast automatically when you turn optimization on. At least it might be useful if your unit tests run on the unoptimized code are good enough.
We have some generic- functions now, but the number of things implemented in the standard libraries with Int -- IntSets, for example -- doesn't make it very convenient to just use CheckedInt everywhere instead.
> If overflow should be an error condition, then it's simply wrong to use > the Int type.
The problem, precisely, is that it is VERY difficult to know if overflow can be an error condition.
That is precisely the point here.
Requirements change. Quantities that would have fit in 31 bits do not fit any more. The file size goes beyond 1GB for instance, a table that handles an image grows too much to cope with a resolution of 6560 x 1900 (two monitors), etc etc.
Testing for overflow costs essentially zero. I do not understand why this is not done, since an absence of tests means that
a+b
can be WRONG. If the system traps on overflow at least we KNOW there is a problem instead of going on with wrong results!
> But automatic overflow checking would be contraproductive in that > situation, since the only reason for using Int's is performance, > and overflow checks everywhere would kill the performance gain.
No. In my implementation overflow checking is essentially free.
The generated code looks like this (for an x86 implementation)
(1) do the operation (add,subtract, etc) (2) jo errlab1 (jump on overflow to errlab1) (3) retlab1: (4) The rest of the program
(5) errlab1: call the error handler (6) if the error label returns goto retlab1 (i.e. ignore overflow)
Note that the check for overflow is a SINGLE INSTRUCTION (jo). This jump on overflow will be correctly predicted as not taken since it is a forward branch. This means that the pipeline will not be disturbed in any way.
Replacing all arithmetic operations with checked operations produces NO NOTICEABLE performance loss, i.e. the overhead is NOT MEASURABLE!
The myth of performance loss by checking for overflow is just that: A MYTH.
Mark T. B. Carroll <Mark.Carr...@aetion.com> wrote:
> Dirk Thierbach <dthierb...@usenet.arcornews.de> writes: >> Nevertheless, if for some perverse reason you'd still want this, >> you could just define your own CheckedInt type by introducing another >> Num instance. Another thing you can't do in C. :-) > We have some generic- functions now, but the number of things > implemented in the standard libraries with Int -- IntSets, for example > -- doesn't make it very convenient to just use CheckedInt everywhere > instead.
That's true. OTOH, these operations aren't likely to overflow (IntSets don't do arithmetic, for instance). And you could still convert between CheckedInt's and Int's for those operations, checking during the conversion. If it get's inconvienient, write a quick wrapper.
But anyway, I'm not advocating it to do it this way. The overhead alone is a good reason not to.
Paul Rubin <http://phr...@nospam.invalid> wrote: > Int is what we'd normally use when the numbers are expected to be in > range.
I don't, because I have been bitten by it in the past. It's a classical case of premature optimization.
So now I use Integer's by default everywhere, I convert the few Int's that still creep in as soon as possible. And I only switch to Int's if I want to tweak performance. And then I watch for possible overflows very carefully first.
> Floating point arithmetic is supposed to be a limited, optimized > representation of real numbers.
No, it's not optimized. It's a compromise to represent an uncountable datatype in a finite size, at all. You can't compare this situation to Int and Integer, because there is no equivalent to Integer for real numbers.
> When a float is out of range, there is an arithmetic exception.
Actually, the IEEE standard works hard to make it *not* an exception, but to go on with various pseudo-values as result. And that still doesn't take care of the numerical problems you have because of the finite precision. If you have a system of linear equations that's badly conditioned, the limited precision can make your result completely wrong. All without causing any exception.
> Ints should be treated similarly. The current mechanism is just evil.
Only if you use Int's when you're not supposed to :-)
jacob navia <ja...@nospam.org> wrote: > Dirk Thierbach a écrit : >> But automatic overflow checking would be contraproductive in that >> situation, since the only reason for using Int's is performance, >> and overflow checks everywhere would kill the performance gain. > No. In my implementation overflow checking is essentially free.
Even if the cost is low, it's the wrong approach, because you can do much better with the extreme cases: Either you use Integer's, and don't have to worry about overflows at all (but it's slower), or you use Int's, but then you'd better make damn sure that they are no overflows. In the last case, you want it as fast as possible, so you want manual control over the checks (with as few checks as necessary).
The in-the-middle-approach is IMHO pretty useless, because you combine the disadvantages of both cases: The program can fail with overflows, and it doesn't run as fast as it could.
> The generated code looks like this (for an x86 implementation) > (1) do the operation (add,subtract, etc) > (2) jo errlab1 (jump on overflow to errlab1)
That doesn't work if the representation isn't just a vanilla number. There's a reason the standard guarantees only 29 bits precision.
> The in-the-middle-approach is IMHO pretty useless, because you > combine the disadvantages of both cases: The program can fail > with overflows, and it doesn't run as fast as it could.
>> The generated code looks like this (for an x86 implementation)
>> (1) do the operation (add,subtract, etc) >> (2) jo errlab1 (jump on overflow to errlab1)
> That doesn't work if the representation isn't just a vanilla number. > There's a reason the standard guarantees only 29 bits precision.
Then you just test the 2 higher bits (or similar) test $0xC0000000,result jnz errlab1
The overhead would be 3 instructions (jo + 2 tests), without any pipeline turbulence, i.e. less than 0.1%...
Dirk Thierbach <dthierb...@usenet.arcornews.de> writes: > Even if the cost is low, it's the wrong approach, because you > can do much better with the extreme cases: Either you use Integer's, > and don't have to worry about overflows at all (but it's slower), > or you use Int's, but then you'd better make damn sure that they > are no overflows.
I just don't agree with this. You might be damn sure the ints won't overflow, but the runtime should check anyway, just like you shouldn't use array subscripts unless you're damn sure they're in bounds, but the runtime should check those too. Array subscript overruns are a notorious source of bugs in all kinds of code, and in each of them the implementer was damn sure that the subscripts wouldn't overrun. It's the same with int overflow.
In a supposedly safe language, static guarantees are great if you can get them. In their absence, dynamic checks can keep your program from running amuck. But mere strength of conviction on the programmer's part is never enough. If you want wraparound operations, use the Word types. The default types for very standard operations like "length" should not have unsafe behavior.
Note that if the compiler is sufficiently Int aware, it can lift quite a lot of overflow checks out of loops, similar to what's done with array bounds checks. That's probably harder to do with user-defined types that attempt to implement safe Ints. Safe should be the default and if necessary, unsafe can be available on explicit request.
jacob navia <ja...@nospam.org> wrote: > Dirk Thierbach a écrit : >> That doesn't work if the representation isn't just a vanilla number. >> There's a reason the standard guarantees only 29 bits precision.
I said EOT because you're obviously in love with the concept, so discussion is pretty futile. But this
> Then you just test the 2 higher bits (or similar) > test $0xC0000000,result > jnz errlab1
just doesn't work: As a counterexample to your code, try e.g. adding (-1) and (-1).
The overflow flag is the XOR of the carry coming into the sign bit with the carry going out of the sign bit. That's easy to implement on the gate level, so special instructions like x86 "jo" are cheap.
But these special instructions don't work when the sign bit is not the MSB of the register, or when the compiler uses C code as target language instead of emitting assembly (as some Haskell compilers do), or when you implement the checking directly in Haskell. In those cases, the costs are much higher, and you usually use a different method (like comparing the signs of the operands and the result).
Paul Rubin <http://phr...@nospam.invalid> wrote: > I just don't agree with this. You might be damn sure the ints won't > overflow, but the runtime should check anyway, just like you shouldn't > use array subscripts unless you're damn sure they're in bounds, but > the runtime should check those too.
But unlike Integer vs. Int, there's no way to avoid bad array subscripts, so it's comparing apples with oranges.
> In a supposedly safe language, static guarantees are great if you can > get them. In their absence, dynamic checks can keep your program from > running amuck. But mere strength of conviction on the programmer's > part is never enough.
It's enough if there is a safe alternative, and if you use Int's only if you really do want the performance gain (and as a payment, if you have checked the code). That's not that hard to do formally, you assign ranges to all Int arguments/results and then work forward or backward to ensure the ranges are conservative enough. And then you can insert dynamic checks at points where there matter.
Of course if you use Int's when you just have a hunch the values might be small enough, then yes, this isn't enough, and if you prefer testing to a formal proof, then yes, overflow checks everywhere are useful.
Dirk Thierbach <dthierb...@usenet.arcornews.de> writes: > But unlike Integer vs. Int, there's no way to avoid bad array subscripts, > so it's comparing apples with oranges.
Of course there is. You could use IntMaps instead of arrays, for example.
> Of course if you use Int's when you just have a hunch the values might > be small enough, then yes, this isn't enough, and if you prefer testing > to a formal proof, then yes, overflow checks everywhere are useful.
> But that's a question of the approach you take.
Reality is that while Haskell has more formal machinery than something like C, it is not Agda and it is full of partial functions. For example it's possible to attempt pattern matching or using "head" on an empty list, and so forth. The runtime does the only sane thing and throws exceptions when those things happen.
Dirk Thierbach <dthierb...@usenet.arcornews.de> writes: > The overflow flag is the XOR of the carry coming into the sign bit > with the carry going out of the sign bit. That's easy to implement on > the gate level, so special instructions like x86 "jo" are cheap.
> But these special instructions don't work when the sign bit is not > the MSB of the register,
I think the more normal implementation of tag bits is in the low order bits of the word, so you would just add normally and use "jo" to check for overflow.
> or when the compiler uses C code as target language instead of > emitting assembly (as some Haskell compilers do),
You'd tweak the generated code to generate the best instruction sequence via your favorite C compiler, and hope for the best (performance-wise, not correctness-wise) with other compilers.
> or when you implement the checking directly in Haskell. In those > cases, the costs are much higher, and you usually use a different > method (like comparing the signs of the operands and the result).
That's why the knowledge about low level overflow checking should be built into the compiler rather than leaving the poor user to his own devices.
Dirk Thierbach <dthierb...@usenet.arcornews.de> writes: > Not really. The practice in Haskell is that normally, you just > use Integers,
If that were really the practice, "length" would return an Integer and not an Int. The actual practice, as observed in the actual world, is that Ints are used all over the place where Integer would be safer but would cause an unacceptable performance hit.
Dirk Thierbach <dthierb...@usenet.arcornews.de> writes: > > Int is what we'd normally use when the numbers are expected to be in > > range.
> I don't, because I have been bitten by it in the past. It's a classical > case of premature optimization.
I don't know that I'd call something an optimization if the compiler and language are gravitate so much towards doing it by default. An optimization is where you go out of your way hairing up the code in the hope of making it faster. If you just do the simplest and most obvious thing, that's not an optimization. The problem is that in Haskell, doing the simplest and most obvious thing results in wrong code.
Dirk Thierbach <dthierb...@usenet.arcornews.de> wrote: > But automatic overflow checking would be contraproductive in that > situation, since the only reason for using Int's is performance, > and overflow checks everywhere would kill the performance gain.
You're missing the point, I think: x86 CPUs can do automatic overflow on int checking, it's just that the C standrd doesn't allow for it, so it's almost never used. There's no reason why Haskell couldn't use it (it probably exists on all platforms that Haskell exists on). The reason it doesn't is almost certainly because the early implementations where either interpreters written in C derived languages or copmilers that compiled to C code as an intermediate language.
> Nevertheless, if for some perverse reason you'd still want this, > you could just define your own CheckedInt type by introducing another > Num instance. Another thing you can't do in C. :-)
That might actually be something sensible to propose to the ghc crowd :)