BingoBoingo: Meanwhile there is a story floating around that Wikipedo does not load in Vzla after some stunt changing the President according to wikipedia to an opposition leader. At the same time girl on the ground in Vzla reports that wikipedo does indeed load.
BingoBoingo: The anglophone press is saying sophisticated SSL fuckery is involved in the block rather than the usual DNS shenanigans
BingoBoingo: I suspect USG.NSA.JAVAGURLZ is auditioning for sandbag duty
BingoBoingo: And that aircraft carrier might be sinking sooner rather than later
BingoBoingo: ^ Yes I raised the alarm. Efforts to convince me to lower it are welcome. If you are outside my L1 you need to offer a stake and odds for your petition to be considered.
mircea_popescu: apologetic girl -- "today's the last of my hormone pills, might have something to do with it". deaf mp -- "whore moan pills ?!"
mircea_popescu: ahaha check him out asciilifeform ! MATHEMATICAL NOTATION!
bvt: re math, i found that htmling it would be unreadable, so decided to use images.
bvt: as far as code speed is concerned, i still don't know whether i fucked up something, or the algorithm is fundamentally slow after some bignum size
bvt: will provide more seals after i work with chapters 7-9 more.
mircea_popescu: bvt if you think about it, it has to be slower, because operations are fast and allocations slow.
bvt: yes, and it's also true for memory accesses. however the number of executed instructions also increased a lot, which makes me suspect that there is something i missed
mircea_popescu: i suppose you might say "it shouldn't be x2, my expectation would be it's x2^1/2", but w/e.
bvt: i would not be surprised if it was 20% slower, but 2x was surprising for me
bvt: also, the linked pdf contains one FFT-like algorithm i considered to implement (using walsh transform for convolution instead of fft). but it'd be even more complex
bvt: yes, actually in this karatsuba/toom algo one can embed comba's trick, but the code would become even gnarlier
bvt: re fft: i would wait until a use-case appears, to at least understand what are the requirements. do you have something in mind?
bvt: nope, but there were a few mentions of other algorithms in the logs, so i decided to have a look. don't remember how i arrived at this particular one
a111: Logged on 2019-01-20 15:49 mircea_popescu: bvt if you think about it, it has to be slower, because operations are fast and allocations slow.
bvt: no, i mean that particular paper. i initially wanted to implement a fft-like multiplication algorithm, but got interested in this karatsuba/toom when reading the paper.
mircea_popescu: there's that old joke re "i read good books twice but bad books i don't read at all" which very much applies : if only one knew before looking whether an algo is fast or slow!
bvt: mircea_popescu: thanks
bvt: asciilifeform: a better attempt at this algorithm can involve using comba at the lower level.
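[editor's sketch] To make the "comba at the lower level" idea concrete, here is a minimal Python sketch of a Karatsuba recursion that falls back to column-wise (Comba) multiplication once the operands are small. It is not the FFA code and not bvt's implementation: the names `comba_mul`, `karatsuba`, `to_words`, `from_words`, the 64-bit word size and the base-case threshold of 8 words are all assumptions made for illustration. The sign handling branches on the data, so this sketch is not constant-time.

```python
WORD_BITS = 64                      # assumed machine word size
MASK = (1 << WORD_BITS) - 1

def to_words(x, n):
    """Split a non-negative int into n little-endian words."""
    return [(x >> (WORD_BITS * i)) & MASK for i in range(n)]

def from_words(ws):
    """Reassemble little-endian words into an int."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(ws))

def comba_mul(a, b):
    """Comba product of two equal-length word lists: each output column
    is accumulated in full before a single word is written out."""
    n = len(a)
    out = [0] * (2 * n)
    acc = 0  # a Python int stands in for a multi-word accumulator
    for col in range(2 * n - 1):
        lo = max(0, col - n + 1)
        hi = min(col, n - 1)
        for i in range(lo, hi + 1):
            acc += a[i] * b[col - i]
        out[col] = acc & MASK
        acc >>= WORD_BITS
    out[2 * n - 1] = acc
    return out

def karatsuba(x, y, n, base=8):
    """Multiply two n-word non-negative ints; n must be a power of two.
    Uses the subtractive middle term so every operand stays within n/2 words."""
    if n <= base:
        return from_words(comba_mul(to_words(x, n), to_words(y, n)))
    h = n // 2
    shift = WORD_BITS * h
    half_mask = (1 << shift) - 1
    xl, xh = x & half_mask, x >> shift
    yl, yh = y & half_mask, y >> shift
    ll = karatsuba(xl, yl, h, base)
    hh = karatsuba(xh, yh, h, base)
    dx, dy = xl - xh, yl - yh
    sign = -1 if (dx < 0) != (dy < 0) else 1   # data-dependent branch
    dd = karatsuba(abs(dx), abs(dy), h, base)
    mid = ll + hh - sign * dd                  # equals xl*yh + xh*yl
    return ll + (mid << shift) + (hh << (2 * shift))
```

Three half-size multiplications replace four, and the base case decides how much of the work is done by the plain column-wise loop.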
bvt: myeah, complexity does not go too well with both constant-time and fits-in-head.
bvt: will do. so far i had a very brief look at it
bvt: my understanding is that asmism would go only for lower-level ffa code, i.e. barrett/modexp will remain as-is.
mircea_popescu: i suspect double-wide mul might be implemented by lookup tables.
bvt: i.e. W_Borrow/W_Carry still cause the overhead?
bvt: i'll be afk an hour or two
mircea_popescu: asciilifeform ie, splits the ints in bus width chunks and only does all 4 quadrants depending on results from the others.
mircea_popescu: nah, just, if you're multiplying (for simplicity) 01 x 01, it won't do 0x1 0x0 0x1 parts
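[editor's sketch] The speculation above can be illustrated as follows: a double-wide multiply built from four half-width partial products, with an optional mode that skips any quadrant where a half is zero. This is a hypothetical model for discussion, not a description of any real CPU; `mul_quadrants`, the 32-bit half width and the `work` counter are assumptions introduced here.

```python
HALF = 32                           # assumed half-word ("bus width") size
HMASK = (1 << HALF) - 1

def mul_quadrants(a, b, skip_zero=False):
    """Multiply two 64-bit values via four 32x32 partial products.
    Returns (product, number of partial products actually computed)."""
    al, ah = a & HMASK, a >> HALF
    bl, bh = b & HMASK, b >> HALF
    quadrants = (
        (al, bl, 0),
        (al, bh, HALF),
        (ah, bl, HALF),
        (ah, bh, 2 * HALF),
    )
    total, work = 0, 0
    for x, y, shift in quadrants:
        if skip_zero and (x == 0 or y == 0):
            continue                # the hypothesised shortcut
        total += (x * y) << shift
        work += 1
    return total, work
```

With `skip_zero=True`, `mul_quadrants(1, 1)` performs one partial product instead of four, so the amount of work depends on the operand values. A multiplier behaving this way would leak operand information through timing.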
a111: Logged on 2019-01-19 13:50 mircea_popescu: aaand in wtf lulz,
http://btcbase.org/log/2019-01-14#1886738 item died again, same exact details, same exact item, jan 15th (ie, it ran for ~day). ima finally get my ass into gear and debug-run it.
mircea_popescu: apparently 200 reqs ~per 100ms~ (that's how long it takes to complete one) is not enough to bring it down.
a111: Logged on 2019-01-20 16:23 asciilifeform: as i noted previously -- i do not expect to find any moar ~asymptotic~ speedups for ffa algos , such that are relevant to the sizes of numbers typically used in public key crypto
bvt: agreed. on x86_64 it is compiled to ADD; SETC sequence, but who knows what happens on other arch/gcc version.
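[editor's sketch] For readers without the surrounding context: a carry-out can be obtained either from the CPU flag (the ADD; SETC route mentioned above) or purely from the operands and the truncated sum, with no flag involved. Below is a sketch of the second route using the standard bitwise carry-out identity. Whether this matches the actual body of W_Carry is an assumption; `add_with_carry` and the 64-bit word size are illustrative choices.

```python
WORD_BITS = 64                      # assumed word size
MASK = (1 << WORD_BITS) - 1

def add_with_carry(a, b):
    """Add two words modulo 2**WORD_BITS and derive the carry-out
    from a, b and the sum alone (no flag register needed)."""
    s = (a + b) & MASK
    # Carry out of the top bit: both top bits set, or exactly one set
    # while the top sum bit is clear (meaning a carry came in from below).
    c = ((a & b) | ((a | b) & ~s & MASK)) >> (WORD_BITS - 1)
    return s, c
```

The expression uses only AND, OR, NOT and a shift, so it takes the same steps for all inputs. How a given compiler lowers it on a given architecture is a separate question.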
bvt: i wonder how this is a valid argument even there. if nothing reads the flag, there are no pipeline dependencies.
bvt: perhaps some old APL machines would qualify
bvt: actually, that fft-like multiplication algorithm was developed in APL
bvt: are such machines even built these days? i.e. for dsp, or other use-cases?
bvt: would this actually be efficient on current arches? this would stress instruction decoder a lot
bvt: i guess i could do some experiments here. the immediate question is that ffa does plenty of FZ_Adds with different FZ'Length, so full unrolling would not really work (unless i miss something).
bvt: aha, i see. this would also involve lots of function call inlining
bvt: writing such code really could use some program to generate the necessary amount of adds/subs
bvt: ave1's fix also fixed the 'undefined references' in static lib with gcc-6.3 for me
bvt: you can't verify output of 'write 4096 adds' by hand, though
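[editor's sketch] On generating the adds by program and checking the result by machine rather than by hand: the sketch below emits straight-line (fully unrolled) source for a fixed-length multiword add, compiles it, and leaves it ready to be compared against a reference. It generates Python rather than Ada or assembly purely to stay runnable here. `gen_unrolled_add`, `build` and the `fz_add_N` naming are hypothetical and only echo FZ_Add from the discussion.

```python
def gen_unrolled_add(n_words, word_bits=64):
    """Return source text of a loop-free adder for n_words-word operands
    (little-endian word lists); one add/mask/shift triple per word."""
    mask = (1 << word_bits) - 1
    lines = [
        f"def fz_add_{n_words}(a, b):",
        f"    s = [0] * {n_words}",
        "    c = 0",
    ]
    for i in range(n_words):
        lines.append(f"    t = a[{i}] + b[{i}] + c")
        lines.append(f"    s[{i}] = t & {mask}")
        lines.append(f"    c = t >> {word_bits}")
    lines.append("    return s, c")
    return "\n".join(lines) + "\n"

def build(n_words, word_bits=64):
    """Compile the generated source and return the resulting function."""
    namespace = {}
    exec(gen_unrolled_add(n_words, word_bits), namespace)
    return namespace[f"fz_add_{n_words}"]
```

Usage: `build(4)([1, 0, 0, 0], [2, 0, 0, 0])` returns the sum words and the carry-out. One unrolled routine is needed per operand length, which is the difficulty raised above about differing FZ'Length values. The same generator can be looped over the lengths in use and each output tested against ordinary big-integer addition.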
bvt: this is true, right.