tmsr - Search: from:asciilifeform


asciilifeform: this gives the 'write block on disk only 1nce' behaviour.

asciilifeform: *experimental trb, err

asciilifeform: in asciilifeform's o(1) tx indexer ( will be welded to an experimental bdb once i get the mmap thing resolved ) there's a 2-level storage -- a 'write-once' o(1) index for blox of age N ( N can be 100-500 in practice ), and a much smaller rewritable one kept strictly in ram ( for 'recent' blox, where the longest chain is potentially movable )
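The two-level scheme described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual indexer: the class name `TwoLevelIndex`, the `settle_depth` parameter (standing in for 'N'), and the in-memory dict standing in for the on-disk write-once store are all assumptions made for the example.

```python
class TwoLevelIndex:
    """Hypothetical two-tier tx index: write-once for old blocks, RAM for recent ones."""

    def __init__(self, settle_depth=100):
        self.settle_depth = settle_depth  # the 'N' from the text (assumed name)
        self.settled = {}      # stand-in for the write-once on-disk index
        self.recent = {}       # height -> (block_hash, txids); rewritable, RAM only
        self.disk_writes = {}  # height -> how many times that block hit 'disk'

    def add_block(self, height, block_hash, txids):
        # New blocks always land in the rewritable tier first.
        self.recent[height] = (block_hash, list(txids))
        self._settle(max(self.recent))

    def reorg(self, height):
        # A moved chain tip only ever touches the RAM tier.
        for h in [h for h in self.recent if h >= height]:
            del self.recent[h]

    def _settle(self, tip):
        # Blocks buried at least settle_depth deep are written exactly once.
        for h in sorted(h for h in self.recent if tip - h >= self.settle_depth):
            block_hash, txids = self.recent.pop(h)
            for txid in txids:
                if txid in self.settled:
                    raise ValueError("write-once store: entry already present")
                self.settled[txid] = (h, block_hash)
            self.disk_writes[h] = self.disk_writes.get(h, 0) + 1

    def lookup(self, txid):
        if txid in self.settled:
            return self.settled[txid]
        for h, (block_hash, txids) in self.recent.items():
            if txid in txids:
                return (h, block_hash)
        return None
```

In this sketch a reorg never reaches the settled tier, so each settled block is recorded in `disk_writes` exactly once, which is the 'write block on disk only 1nce' property.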

asciilifeform: (tho, will add, the retarded design of bdb, gives early death on ssd. in sane indexer there is never any reason to touch any single block moar than once )

asciilifeform: not unless yer using rubbish ssd, at any rate

asciilifeform: nah

asciilifeform concurs that there's 0 point in debug of bdb, it's medicine for a corpse

asciilifeform: i'd look at the realloc counter on the disk, betcha it's started to turn.

asciilifeform: ( there -- found this twice, to date. )

asciilifeform: mircea_popescu: that's the interesting imho part. ftr i have yet to witness a corrupted-but-openable bdb on machine other than where ssd was found to have rotted.

asciilifeform will bbl

asciilifeform: a fast rsatron is important mainly in light of fast rejection of crapola sent by enemy, rather than for payload per se.

asciilifeform: ( a single FG , recall, yields ~7kB/s at room temp )

asciilifeform: i'll add , for completeness of thread, that if yer ~sending~, rather than receiving, rsa packets, your bottleneck will be ~rng~ long before it could ever be the arithmetron per se

asciilifeform: the inner loop or 2, tho, definitely can.

asciilifeform: ( you won't be unrolling an entire 4096bit modexp, on any plausible irons... )

asciilifeform: there's an obvious limit to what you can unroll and still fit in any plausible cache tho.

asciilifeform: ( if you have an add-with-carry instr )

asciilifeform: and besides, the wall wouldn't even have that many bricks -- on a 64bit bus, a 4096-bit addition is 64 instructions long
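The brick count above can be checked with a small model of word-wise addition. This is a sketch for counting steps only: Python integers are not constant-time, and the helper names (`to_words`, `from_words`, `add_words`) are made up for the example. Each loop iteration corresponds to what one add-with-carry instruction would do on a 64-bit machine.

```python
WORD_BITS = 64
MASK = (1 << WORD_BITS) - 1

def to_words(n, width_bits):
    """Split n into little-endian 64-bit words."""
    return [(n >> (WORD_BITS * i)) & MASK for i in range(width_bits // WORD_BITS)]

def from_words(words):
    """Reassemble little-endian words into one integer."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))

def add_words(a, b):
    """Add equal-length word lists; return (sum_words, carry_out, word_adds)."""
    out, carry, adds = [], 0, 0
    for x, y in zip(a, b):
        s = x + y + carry        # one add-with-carry step
        out.append(s & MASK)
        carry = s >> WORD_BITS   # 0 or 1, fed into the next word
        adds += 1
    return out, carry, adds
```

For two 4096-bit operands the third return value is 4096 / 64 = 64, regardless of the operand contents.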

asciilifeform: it's no moar complicated than to count bricks in a wall.

asciilifeform: bvt: why not ? they all will look same neh

asciilifeform: not having used gcc5+ , i never saw this bug

asciilifeform: one would still want to audit the output of any such thing tho, by hand.

asciilifeform: bvt: a typical macroassembler would work for the purpose.

asciilifeform: ( ave1 discovered how to guarantee working inlining, and this gave 'free' ~2x speedup )

asciilifeform: bvt: even the current (ch11 and after) ffa relies on a gnat with working forced-inlining

asciilifeform: but this limitation would also be true of any hypothetical arithmetic iron.

asciilifeform: granted, an unrolled ffa would operate on a fixed width (e.g. 8192) of primary fz.

asciilifeform: none of the lengths depend on the actual contents of the user input.

asciilifeform: just as you can de-recursivize the karatsuba etc

asciilifeform: therefore you can unroll.

asciilifeform: bvt: all of the lengths are deterministically known from the primary fz width.
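To illustrate why the lengths are known in advance, here is a sketch that derives the whole Karatsuba recursion shape from the operand width alone. It assumes a power-of-two width in words, the textbook three-way split, and a hypothetical base-case threshold of 8 words; none of these figures is taken from the ffa source. The function never looks at operand values, which is the point.

```python
def karatsuba_schedule(width_words, threshold_words):
    """Return [(operand_words, multiplications_of_that_size)] per recursion level.

    Depends only on the widths; no operand contents are consulted.
    Assumes width_words is a power of two (assumption for this sketch).
    """
    levels = []
    size, count = width_words, 1
    while size > threshold_words:
        levels.append((size, count))
        size //= 2   # each Karatsuba step halves the operand width
        count *= 3   # and spawns three half-width multiplications
    levels.append((size, count))  # base-case multiplications (e.g. comba)
    return levels
```

For an 8192-bit width (128 words of 64 bits) and the assumed threshold, this yields 81 base-case multiplications of 8 words each. Since the schedule is a fixed list, it could in principle be emitted as straight-line code by a generator.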

asciilifeform: the skipping itself is expensive enuff on iron with cache/branchpredictor, that he loses rather than wins from it.

asciilifeform: koch's turd, despite being implemented in c, with no bounds checks, actually loses to ch14 ffa , for inputs of same ~width~ -- despite fact that he doesn't constanttime and thereby gets to skip massive work

asciilifeform: ( consider the http://www.loper-os.org/?p=2906#selection-864.0-883.127 experiment, for instance. )

asciilifeform: the penalty from having any branches, of whatever kind, anywhere at all, on pc iron -- is substantial

asciilifeform: i found last yr, for instance, that unrolled comba ( still in ada ) gives 20-25% speedup.

asciilifeform: bvt: i expect one would trivially get a 10-20x speedup over the ordinary ffa, esp. if the item still fits in l1

asciilifeform: ideally one would unroll ~all~ of the loops ( e.g. instead of looping through the words of a bignum, would e.g. add-with-carry on immediately consecutive words with stream of add instrs, etc )

asciilifeform: ( this is not a contradiction in terms, it is possible to implement whole thing, with same constant-time algos, by hand asm )

asciilifeform: before considering to bake irons, it is worth to see what a 100%-asmic ffa would give.

asciilifeform: anyway pretty sure we had this thread, is in the logs somewhere.

asciilifeform: 1G/s link in principle delivers 262144 4096bit packets /sec ( in practice, many fewer, on acct of overhead )

asciilifeform: err, 1.7

asciilifeform: nao, it isn't as if the current ffa, with 2.7sec 4096-bit modexp, is immediately usable to eat packets at line rate. but that part at least theoretically parallelizes ( i.e. a rack fulla multicore boxen running ffa, can theoretically eat packets at line rate... )
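The line-rate arithmetic can be written out as below. This is a back-of-envelope sketch: it takes 1G as 2^30 bits/sec (which is what gives the 262144 figure), ignores all framing overhead, and assumes perfect parallelism; the function names are made up for the example.

```python
import math

def packets_per_second(link_bits_per_sec, packet_bits):
    """Upper bound on packets/sec, ignoring all protocol overhead."""
    return link_bits_per_sec // packet_bits

def cores_needed(link_bits_per_sec, packet_bits, modexp_seconds):
    """Cores needed to keep up, if each packet costs one modexp and work splits perfectly."""
    return math.ceil(packets_per_second(link_bits_per_sec, packet_bits) * modexp_seconds)
```

`packets_per_second(2**30, 4096)` gives 262144; plugging a per-modexp time into `cores_needed` then shows how large the rack would have to be.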

asciilifeform: ( is why all previously published rsatrons , entirely unsuitable -- if there's any leakage at all via timing, enemy trivially derives yer key )

asciilifeform: helps to recall that the problem which originally prompted asciilifeform to write ffa, is a (currently hypothetical) application where rsa sigs are carried in ~individual packets~

asciilifeform: without any exotic systolicisms etc

asciilifeform: ftr i suspect that entirely ordinary algos, such as are seen in the current ffa, would already give ~line-rate~ (i.e. , 4096 modexp faster than 1G/s nic can give you new inputs to modexp on ) if implemented in iron properly.

asciilifeform: more or less entirely opposite approach from what's wanted for crypto.

asciilifeform: ( and defo not in constant time )

asciilifeform: bvt: there's a large market of various voodoo for dsp, but afaik it all operates 'inexactly'

asciilifeform: they potentially win, but only on custom iron really

asciilifeform: often called 'systolic' algos

asciilifeform: bvt: i found a buncha these, when digging

asciilifeform: so even vertical microcode would win

asciilifeform: in simple o(n) bignum operations like addition, the cost of instruction decoding for each consecutive 'add' , is substantial % of the cost

asciilifeform: bvt: possibly the bolix machines also ( they did it in vertical microcode, iirc, tho, and in nonconstant time unsurprisingly )

asciilifeform: bvt: notion was, you gotta stop the pipe if something ~were~ to read it

asciilifeform: could simply have optimized 'take these-here N words and those-there M words, and put bignum addition in memory starting at O, and overflow flag in P ' or similarly

asciilifeform: and no it dun have to have gigantic bus width, necessarily

asciilifeform: i find it interesting that -- afaik -- nobody's ever built iron that was specifically optimized for bignum

asciilifeform: !#s riscv

asciilifeform: .g.,

asciilifeform: e

asciilifeform: misguided folx ~continue~ to build these, with the excuse given being 'pipeline'

asciilifeform: bvt: 1 of the reasons why ada doesn't offer e.g. addition-with-overflow , is that there is an abundance of sad iron where there isn't even physically a carry flag.

asciilifeform: ( ada standard btw trivially allows for types where this holds true automatically , i.e. throws exception for overflow. but this is not only massively unconstanttime but the overhead is gigantic )

asciilifeform: gcc's knob seems to be geared for scenarios where the overflow is an error condition, rather than expected.

asciilifeform: without any jumps

asciilifeform: a correct asmism would simply read the carry flag and put the value where it belongs (e.g. in fz_add, into the next addition, in comba -- into the accumulator; etc)
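Where no carry flag can be read directly, the carry can still be derived from the operands and the truncated sum using only and/or/not/shift, with no jumps. The following is a sketch of that standard bit trick, not a claim about the exact ffa code; Python is used only to show the logic and gives no timing guarantees, and the function names are made up for the example.

```python
WORD_BITS = 64
MASK = (1 << WORD_BITS) - 1

def carry_out(a, b, s):
    """Carry out of a + b + carry_in, given the truncated sum s.

    Top bit reasoning: carry if both top bits are set, or if at least
    one is set and the sum's top bit came out clear.
    """
    return (((a & b) | ((a | b) & (~s & MASK))) >> (WORD_BITS - 1)) & 1

def add_with_carry(a, b, c_in):
    """One word-sized add step: returns (sum_word, carry) with no branches."""
    s = (a + b + c_in) & MASK
    return s, carry_out(a, b, s)
```

The returned carry is just a value, so it can be fed into the next word's addition or into an accumulator without any conditional.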

asciilifeform: therefore that approach is completely verboten.

asciilifeform: btw, must also add to bvt's http://btcbase.org/log/2019-01-20#1888517 >> https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html admits that : 'The compiler will attempt to use hardware instructions to implement these built-in functions where possible, like conditional jump on overflow after addition, conditional jump on carry etc.' , i.e. there is not a guarantee that the thing dun introduce cond jumps.

asciilifeform: ( '1st commandment' of ffa : thou shalt not branch on seekrit bits. '2nd commandment' -- thou shalt not index memory by seekrit bits ... )

asciilifeform: chances are that it wouldn't, tho, given how the table still has to be indexed via fz_mux in order to prevent variant (i.e. nonconstanttime) memory indexing
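A rough sketch of why a mux-indexed table is costly: to avoid indexing memory by a secret value, every entry is read in a fixed order and the wanted one is selected arithmetically. The names `mux`, `is_equal_bit`, and `ct_select` are hypothetical, entries here are single 64-bit words rather than full bignums, and Python itself offers no constant-time guarantee; only the access pattern is being illustrated.

```python
WORD_BITS = 64
MASK = (1 << WORD_BITS) - 1

def mux(x, y, sel):
    """Return x if sel == 0, y if sel == 1, without branching on sel."""
    m = (-sel) & MASK          # 0 -> all zeros, 1 -> all ones
    return x ^ (m & (x ^ y))

def is_equal_bit(i, j):
    """1 if i == j else 0, computed arithmetically (i, j below 2**64)."""
    d = i ^ j
    nonzero = ((d | ((-d) & MASK)) >> (WORD_BITS - 1)) & 1
    return nonzero ^ 1

def ct_select(table, secret_index):
    """Fetch table[secret_index] while touching every entry in the same order."""
    result = 0
    for i, entry in enumerate(table):
        result = mux(result, entry, is_equal_bit(i, secret_index))
    return result
```

Each lookup costs one pass over the whole table, so a w-bit window pays 2^w mux operations (on full-width bignums in the real case) per window.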

asciilifeform: ( in all fairness, a large -- e.g. 8bit -- window, ~could~ win, but massively multiplies the memory requirement for the thing )

asciilifeform: possibly i'ma do a writeup on the subj, once errything else is fielded.

asciilifeform: the overhead eats the winnings.

asciilifeform: i prolly oughta add to the http://btcbase.org/log/2019-01-20#1888508 thing : 1 of the items which seemed like a speedup, but in actual practice sucked, was the use of (constant-time) 2 (ditto 4) -bit windows for modexp ( iirc apeloyee suggested )

asciilifeform: ... disk rot ?

asciilifeform: mircea_popescu: lobe << that's still interesting - invites q of ~why~ had bad lobe

asciilifeform brb,teatime

asciilifeform: 1 annoying aspect of 'iron ffa'-gedankenexperiment, is that none of the available fpga ( either 'ice40' series, or the evil ones ) are anywhere near big enuff to prototype with. it'd have to be simulated a la http://www.loper-os.org/?p=2593 , slowly, and then straight to silicon.

asciilifeform has quite thick binder of curated material on subj, for the hypothetical day that we start baking irons

asciilifeform: ( the gate count, that is )

asciilifeform: note that the 'cube' observation only applies if you're going for a single-clock-cycle iron multer. otherwise it grows as square of bitness.

asciilifeform: mircea_popescu: possibly, it'll have to be tested when asciilifeform or somebody else can be arsed

asciilifeform: the # of gates req'd, is a cube.

asciilifeform: ( the 64x64 iron multer in amd/intel, possibly surprisingly, is in fact constant time, in all boxes i've tested to date )

asciilifeform: mircea_popescu: it is conceivable that the ones currently sold are constant time , i simply haven't tried'em.

asciilifeform: ( not to mention, a, e.g., 64 bit multiplication table, with extant logic density, would be approx the size of solar system )

asciilifeform: mircea_popescu: that wouldn't be constant-time...

asciilifeform: bvt: correct

asciilifeform: mircea_popescu: expand ?

asciilifeform: ( otherwise you end up eating the overhead from the existing portable carry calculation mechanism for no reason )

asciilifeform: same goes for add/sub.

asciilifeform: in a hypothetical asmistic branch of ffa, you'd want to implement whole comba in asm, rather than merely word mul

asciilifeform: recent pc irons offer 128x128bit iron mul. but i have not investigated it, and it specifically gotta be tested for constancy of time

asciilifeform: bvt: i'd prefer asmisms to c imports ( which not only ugly and compiler-dependent, but i suspect destroy performance with overhead )

asciilifeform: but i dun currently have such a thing.