mp_en_viaje: would be kinda interesting if comba ends up snipped.
bvt: in case only asm is used, plain comba may be fast enough as well
bvt: re ~2.5x speedup: for 8-indices-threshold asm comba makes up only ~25% of runtime, rest is spent in karatsuba squaring
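For scale: by Amdahl's law, if comba accounts for ~25% of runtime (bvt's figure), speeding it up further can only buy a bounded overall gain. A minimal sketch; the `amdahl` helper is a hypothetical name used only for illustration:

```python
def amdahl(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the runtime is made `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# comba at ~25% of runtime (figure quoted above)
print(round(amdahl(0.25, 2.0), 3))           # comba twice as fast: 1.143
print(round(amdahl(0.25, float("inf")), 3))  # comba free: 1.333
```

So even a zero-cost comba would cap the overall gain at about 1.33x; the remaining ~75% spent in karatsuba squaring dominates.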
bvt: i agree. i played a bit with the code, arrived at slightly cleaner (and sub-1% faster) code for multiplication, seems that the performance is close to the hw limit (at least at 32 indices Comba-Karatsuba threshold)
bvt: asciilifeform: depends on whether there will be a usecase for such item, but in any case shifts/adds should be simpler than comba
a111: Logged on 2019-04-15 21:30 bvt: the only smaller item in my pipeline is unrolling comba for ch10 multiplication; and i'd like to give a shot at a 'kill /dev/random and /dev/urandom' experiment, replacing them with buffers that can be directly fed from userspace, but this - even later.
bvt: asciilifeform: a better attempt at this algorithm can involve using comba at the lower level.
bvt: yes, actually in this karatsuba/toom algo one can embed comba's trick, but the code would become even gnarlier
mircea_popescu: And in this here FFa post we will be taking Comba Mult version x from y date and together with last week's X, Y and Z, and make this pile
mod6: with comba, for instance, as with others (ex. karatsuba): i tend to go and read the papers on it, then review the code and see how ffa is doing it. etc.
mod6: and that's working pretty great. just now digging into comba and other higher-level ones. not writing tests on those yet -- just studying, reading.
a111: Logged on 2017-10-14 18:39 apeloyee: besides, "bernsteinan karatsuba" requires carry-save arithmetic, otherwise it likely wins nothing. so not separate from comba rewrite.
a111: Logged on 2017-08-10 02:43 asciilifeform: for simplicity, tested the case that actually happens in practice: on a 64bit box, any ffa width over 512 bits gives a strictly 8-wide comba mult occurrence
apeloyee: 2 half products out of 3 on the first level of recursion, 4 of 9 on second, and 8 of 27 on third, assuming 64-bit words and unrealistic 2-fold speedup of comba for half-multiply, and no overhead in karatsuba,
apeloyee: and most products for which the comba is called, are full products, not half products
apeloyee: see, it does three recursive calls, meaning the speedup is wholly dependent on the speedup of comba for half-multiply
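apeloyee's counts (2 of 3, 4 of 9, 8 of 27 half products) follow from a simple recurrence, and plugging in his 2x half-multiply figure shows how fast the benefit shrinks with depth. A minimal sketch; the split rule below (a full product yields 3 full products, a half product yields 1 full and 2 half products) is an assumption chosen because it reproduces the quoted counts, and the function names are hypothetical:

```python
def base_case_counts(levels: int) -> tuple:
    """(full, half) base-case products after `levels` of recursion from one half-product call.

    Assumed split rule: full -> 3 full; half -> 1 full + 2 half.
    """
    full, half = 0, 1
    for _ in range(levels):
        full, half = 3 * full + half, 2 * half
    return full, half


def base_case_speedup(levels: int, half_speedup: float) -> float:
    """Speedup of total base-case work if half products run `half_speedup`x faster,
    with no karatsuba overhead (both assumptions taken from the discussion above)."""
    full, half = base_case_counts(levels)
    return (full + half) / (full + half / half_speedup)


for k in (1, 2, 3):
    full, half = base_case_counts(k)
    print(k, half, full + half, round(base_case_speedup(k, 2.0), 3))
```

Under these assumptions the 2x half-multiply gain is worth 1.5x at the first level but only 27/23 (about 1.17x) of base-case work at the third.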
ave1: asciilifeform: BTW maybe I missed it but why do comba when L <= 8? does this come from speed testing?
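The threshold question, together with the earlier remark that any FFA width over 512 bits on a 64-bit box reaches strictly 8-word comba calls, can be illustrated with a Karatsuba recursion that falls back to a base case at L <= 8 words. A minimal sketch, assuming power-of-two word counts; Python ints stand in for fixed-width word arrays, `a * b` stands in for the comba base case, and the subtractive form shown is one common Karatsuba variant, not necessarily FFA's exact code:

```python
WORD_BITS = 64
THRESHOLD = 8  # words: base case when L <= 8


def karatsuba(a: int, b: int, n: int, base_widths: list) -> int:
    """Multiply a, b < 2**(64*n), n a power of two, recording base-case widths.

    Subtractive variant: |a0 - a1| and |b0 - b1| stay within h words.
    """
    if n <= THRESHOLD:
        base_widths.append(n)
        return a * b  # stand-in for the comba base case
    h = n // 2
    shift = WORD_BITS * h
    mask = (1 << shift) - 1
    a0, a1 = a & mask, a >> shift
    b0, b1 = b & mask, b >> shift
    z0 = karatsuba(a0, b0, h, base_widths)  # low * low
    z2 = karatsuba(a1, b1, h, base_widths)  # high * high
    da, db = a0 - a1, b0 - b1
    dd = karatsuba(abs(da), abs(db), h, base_widths)
    if (da < 0) != (db < 0):
        dd = -dd  # restore the sign of (a0 - a1) * (b0 - b1)
    z1 = z0 + z2 - dd  # equals a0*b1 + a1*b0
    return z0 + (z1 << shift) + (z2 << (2 * shift))


widths = []
a = (1 << 2048) - 1  # 32 words, all ones
b = 3 ** 1200
print(karatsuba(a, b, 32, widths) == a * b, sorted(set(widths)), len(widths))  # True [8] 9
```

With power-of-two widths, halving from anything above 8 words always lands exactly on 8, so every base call is 8 words wide; whether 8 is the best cutoff is a separate, measurement question.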
a111: Logged on 2017-08-08 23:51 asciilifeform: it thereby follows that i could unroll comba into explicit cases from 1 to 8 words
a111: Logged on 2017-08-08 21:28 asciilifeform: in other noose, mod6 , phf , et al : http://btcbase.org/log/2017-07-10#1681208 nao 1.5s . ( this with karatsuba-squaring used in exp, and comba-squaring used as base case in the former. )
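Comba-squaring pays off as a base case because in a square every cross product x[i]*x[j] with i != j appears twice, so it can be computed once and doubled. A minimal sketch, assuming little-endian lists of 64-bit words; a Python int stands in for the usual three-word column accumulator, and `comba_sqr`, `to_words`, `from_words` are hypothetical names, not FFA's:

```python
WORD_BITS = 64
MASK = (1 << WORD_BITS) - 1


def comba_sqr(x: list) -> list:
    """Square a little-endian list of n words column by column; returns 2n words."""
    n = len(x)
    out = [0] * (2 * n)
    acc = 0  # column accumulator (Python int here, three machine words in practice)
    for k in range(2 * n - 1):
        i, j = max(0, k - n + 1), min(k, n - 1)  # outermost index pair with i + j == k
        cross = 0
        while i < j:  # each off-diagonal product computed once...
            cross += x[i] * x[j]
            i += 1
            j -= 1
        acc += 2 * cross  # ...and counted twice
        if k % 2 == 0:
            acc += x[k // 2] * x[k // 2]  # diagonal term
        out[k] = acc & MASK
        acc >>= WORD_BITS  # carry into the next column
    out[2 * n - 1] = acc & MASK
    return out


def to_words(v: int, n: int) -> list:
    return [(v >> (WORD_BITS * i)) & MASK for i in range(n)]


def from_words(ws: list) -> int:
    return sum(w << (WORD_BITS * i) for i, w in enumerate(ws))


v = (1 << 512) - 1  # 8 words, all ones
print(from_words(comba_sqr(to_words(v, 8))) == v * v)  # True
```

For n words this does roughly n(n+1)/2 word multiplications instead of n^2.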
ascii_modem: i also discovered how to write comba such that the only branch is the loop branch
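Comba's method (product scanning) builds the product one output column at a time: every partial product x[i]*y[j] with i + j = k is added into a three-word accumulator, the low word is emitted as column k, and the accumulator shifts down one word. A minimal sketch, assuming little-endian lists of 64-bit words; carries are taken with shifts rather than comparisons, which is one way (not necessarily ascii_modem's) to leave the loops as the only branches. `comba_mul` is a hypothetical name:

```python
WORD_BITS = 64
MASK = (1 << WORD_BITS) - 1


def comba_mul(x: list, y: list) -> list:
    """Multiply two little-endian word lists of equal length n; returns 2n words."""
    n = len(x)
    out = [0] * (2 * n)
    c0 = c1 = c2 = 0  # three-word column accumulator: low, middle, high
    for k in range(2 * n - 1):
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            p = x[i] * y[k - i]  # double-word partial product
            s = c0 + (p & MASK)
            c0 = s & MASK
            s = c1 + (p >> WORD_BITS) + (s >> WORD_BITS)  # carry taken by shift, no if
            c1 = s & MASK
            c2 += s >> WORD_BITS
        out[k] = c0  # column k is finished
        c0, c1, c2 = c1, c2, 0  # shift accumulator down one word
    out[2 * n - 1] = c0
    return out


# (2**128 - 1)**2 == 2**256 - 2**129 + 1
print(comba_mul([MASK, MASK], [MASK, MASK]) == [1, 0, MASK - 1, MASK])  # True
```

Fixing n (e.g. 1 to 8 words) lets both loops be fully unrolled, which is the unrolling discussed elsewhere in this log.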
mod6: (contains completed karatsuba, but no comba)
mod6: i need to integrate W_Mul_Comba & W_Add_D and do some of the other things suggested above.
a111: Logged on 2017-07-13 05:21 mod6: this is actually using karatsuba, haven't even integrated the new comba code yet.
mod6: im retarded, what is 'comba'?
a111: Logged on 2017-05-21 00:47 asciilifeform: in other sads, it turns out that paul g. comba ( of comba's multiplication algo ) died in april.