158 entries in 0.465s
bvt: you mean, can use just
karatsuba?
bvt: re ~2.5x speedup: for 8-indeces-threshold asm comba makes up only ~25% of runtime, rest is spent in
karatsuba squaring
bvt: i agree. i played a bit with the code, arrived at slightly cleaner (and sub-1% faster) code for multiplication, seems that the performance is close to the hw limit (at least at 32 Indices Comba-
Karatsuba threshold)
a111: Logged on 2019-06-22 07:09 bvt: i do expect that 'saltmine season' is over now -- i've been working on asming
karatsuba squaring (ffa ch.12b) over this week, post expected tomorrow.
bvt: i do expect that 'saltmine season' is over now -- i've been working on asming
karatsuba squaring (ffa ch.12b) over this week, post expected tomorrow.
☟︎ bvt: no, i mean that particular paper. i initially wanted to implement a fft-like multiplication algorithm, but got interested in this
karatsuba/toom when reading the paper.
bvt: yes, actually in this
karatsuba/toom algo one can embed comba's trick, but the code would become even gnarlier
zx2c4: > At this time we will walk through the mechanics of our
Karatsuba multiplier, so as to cement in the reader’s head the correctness of the routine, and lay groundwork for the optimization which is introduced in Ch. 12B.
mod6: yeah, no. it's fine. i was happy to build/run. to me this is all very exciting/important stuff. back when alf was doing the
karatsuba and other things in the late summer/fall, this type of thing started to be contimplated for innerloop substitute.
apeloyee:
karatsuba's saves area only, not time
mircea_popescu: there's outright cleverness, like that guy's method to compute inside pi ; and then there's just sheer good fucking fortune, like
karatsuba.
mod6: with comba, for instance, as with others (ex.
karatsuba): i tend to go and read the papers on it, then review the code and see how ffa is doing it. etc.
a111: Logged on 2017-10-14 18:39 apeloyee: besides, "bernsteinan
karatsuba" requres carry-save arithmetic, otherwise it likely wins nothing. so not separate from comba rewrite.
apeloyee: besides, "bernsteinan
karatsuba" requres carry-save arithmetic, otherwise it likely wins nothing. so not separate from comba rewrite.
☟︎ apeloyee: can has link describing " bernsteinian
karatsuba"?
a111: Logged on 2017-10-08 20:43 apeloyee: not if it needs 8x more temp space << and
karatsuba totally doesn't.
apeloyee: not if it needs 8x more temp space << and
karatsuba totally doesn't.
☟︎ a111: Logged on 2017-10-07 00:38 asciilifeform: mod6: you will notice that the barrett in 'crc handbook' is more complicated : it shrinks the x and then compensates later. this relies on normalization , and constanttimeized incarnation of it would have to work as apeloyee described ( i'ma try it much later, once i see what can be had re speed strictly from having asymmetric
karatsuba instead of the current mega-waste )
apeloyee: 2 half products out of 3 on the first level of recursion, 4 of 9 on second, and 8 of 27 on third, assuming 64-bit words and unrealistic 2-fold speedup of comba for half-multiply, and no overhead in
karatsuba,
apeloyee: I mean, W_Mul doesn't do
karatsuba a111: Logged on 2017-10-07 21:14 apeloyee:
http://btcbase.org/log/2017-10-07#1722289 << and the point of doing
karatsuba is? you do 2 recursive calls to Mul_Karatsuba_TopOnly and one to Mul_Karatsuba. should've simply calculated upper_part(XLo*YHi), upper_part(YLo*XHi) and XHi*YHi
a111: Logged on 2017-10-07 21:14 apeloyee:
http://btcbase.org/log/2017-10-07#1722289 << and the point of doing
karatsuba is? you do 2 recursive calls to Mul_Karatsuba_TopOnly and one to Mul_Karatsuba. should've simply calculated upper_part(XLo*YHi), upper_part(YLo*XHi) and XHi*YHi
a111: Logged on 2017-08-17 20:30 asciilifeform: but in very other olds, apparently in an obscure article in '09 bernstein shows how to eliminate one of the middle-term additions of
karatsuba .
ave1: asciilifeform: but it works, and for me was a good way to review
karatsuba code
ave1: asciilifeform: are you still looking for a non-recursive
Karatsuba? or should it wait until it is fully crystalized?
mod6: just was curious if your readability issue could be resolved with pulling the paramaters of Mul_Karatsuba back out in similar fashion to the original. 'tis all.
mod6: Mul_Karatsuba(X(X'First .. X'First + K - 1),
mod6: Mul_Karatsuba(X0, Y0, P);
mod6: this might be extra-strength dumb, but... in your new power of 2 version, do you need to inline the Mul & Square of
karatsuba?