: then you can say "hey folks, we're done with the ffa that i know how to do, item is shelved for now, will pick it up if i wise up"
: now you're in the position where you complain "no cash for pizarro" and "no time for ffa
: do we have to do the 'do you want CORRECT ffa or fast ffa' thrd again.
: for instance ? 'on what burned cycles?' 3h on xray. 22 on pizarro (iron what-do, and learning how to mircea_popescuate forums, which not ready for field of yet). 30+ on ffa
: asciilifeform so what am i putting in this month's report, " alf was gonna do ffa but went away to play with xrays instead" ?
: mircea_popescu: i've had occasion to move the stack limit ( when tested ffa with massive fz widths, recall , all allocations are on stack ) but not otherwise
: ( as asciilifeform recently discovered re ffa. it dun thread, and gcc correctly snips out all pertinent coad )
: asciilifeform's 1st step in writing ffa, recall, was to conceive of (and prove) an arithmetic workaround for above. that, right off the bat, cost ~10fold cpu.
: ( i dun currently use, but it's definitely going into the final revs of ffa
: ( e.g., ffa, built yes on arm64 using ave1's system, but it's spartan to the point of starvation featurewise)
: one could actually ffa on that ( if somehow find one ! )
: ( btw the obj reason why none of this is visible in ffa -- there aint any tasks! so even in ljmp mode it shits out same coad )
: ( to 0 measurable diff in ffa, oddly enuff, but on e.g. tiny micros might make a diff.. )
nao wonders if there's a seekrit chest fulla ffa speedup in this dig
: mircea_popescu: i've had to up stack depth on 1 occasion before -- when testing ffa with ridiculously wide bitnesses ( recall, item runs 100% stackistically )
: mircea_popescu: ffa built with inlining switched off is actually ok test for ~that~ imho
will re-play ffa benchmarks on the longjmp gnat, once the latter's built, but doesn't expect to find any measurable diff
: a linear process can still be heavy enuff (e.g. a 8192-bit primality test on ffa) that you wouldn't want to wait for it to finish when killing a thread
: if you have cppola ( or even, e.g., an asm-enhanced ffa) in yer loop, it won't be poll-killable.
: Logged on 2019-02-12 14:13 asciilifeform: http://btcbase.org/log/2019-02-12#1895234 ffa uses exceptions strictly as 'fucking stop whole program (and if it's running on a micro, whole machine, and flash 'dead!' lamp) right nao!' , so won't impact. my understanding is that it'd impact only speed of the ~exceptions~, longjump is slower cuz it crosses pages -- cachaistically.
: ( ffacalc ~does~ have a handler, strictly for trapping invalid cmdline args, ln. 75. )
: ( and since nobody asked 'where exactly does ffa use exceptions? i dun see any throws' -- answr is, ~all~ ada coad where bounds checks are enabled, theoretically 'uses' exception, if you break a bounds check what do you suppose happens.)
: ( in ffa, exceptions are a 'catch fire' condition, and drop into the last-chance handler, but in moar complicated proggy, with, say, devices, you may want to actually handle and keep working )
: btw this is why i sewed ffa into a linkable lib. ~it~ can still be built with restrictions even if running inside a proggy with tasks etc
: Logged on 2019-02-12 13:18 bvt: i expect it will be slower, but it won't hurt to do the check. the impact will depend on how exceptions are used (i don't think it can have any impact on ffa, for example). but i don't have enough experience with it to provide any numbers
ffa uses exceptions strictly as 'fucking stop whole program (and if it's running on a micro, whole machine, and flash 'dead!' lamp) right nao!' , so won't impact. my understanding is that it'd impact only speed of the ~exceptions~, longjump is slower cuz it crosses pages -- cachaistically.
: i expect it will be slower, but it won't hurt to do the check. the impact will depend on how exceptions are used (i don't think it can have any impact on ffa, for example). but i don't have enough experience with it to provide any numbers
intends eventually to actually try ffa on a micro with deliberately-glitchy environment, e.g. inside that xray oven, and see how this goes in real life
: Logged on 2019-02-05 14:49 asciilifeform: on that subj, attentive ffa reader will notice in certain places asciilifeform marked in comment 'cosmic ray resistance' . this indicates mechanisms where there are two or more separate pieces that ensure a correct computation (or death with alarm bells) if somehow bit flips , when this is inexpensive.
: mircea_popescu: i cant speak for eucrypt, but ffa is at least as much experiment in 'can program be written to make sense?' as it is arithm lib
: yes "object oriented" verbosity is ridoinculous. nevertheless there is A LOT of meta involved that no ida ever sucks out, cuz it's not machine-accessible. which is why both eucrypt and ffa published chapters look like they do.
: education is this process whereby people are sharpened, not changed. if girl has it in her to outwrite your ffa, she conceivably will, and if not, she will not. why's this something i'm to fret about ?
: and then b) why and wherefore is "work" defined in terms of "ffa improvements" ? thing's not even supposed to be ~improvable~.
: and yes if/when nicoleci or new one etc sends in an optimization for ffa arithm, i'ma take it back.
: we already iirc did the a:'so how many solved the ffa puzzlers' m : 'i wouldn't waste a precious trained gurl on such dirty works' thrd, dun have to replay.
: on hardwarized ffa variants, this output is to be connected to either an actual bell, or at least red 'sad' lamp.
: on that subj, attentive ffa reader will notice in certain places asciilifeform marked in comment 'cosmic ray resistance' . this indicates mechanisms where there are two or more separate pieces that ensure a correct computation (or death with alarm bells) if somehow bit flips , when this is inexpensive.
: truly vintage (ada-83) won't eat ffa, there is use of 'in out' parameters.
: re ada2012, that was actually a good q, and i'll answer it for the log : ffa in fact uses preconditions, a 2012 knob.
: diana_coman: possibly i oughtn't to graduate'em till they solve all the ffa
: verisimilitude: as a concrete example : you will find that ffa uses an unmoving hinge for karatsuba multiplication. consequently all numbers are required to occupy a space that is a power-of-two bits wide. but from this you get a 3-4x simpler mechanism.
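A minimal Python sketch of the point made in the line above: when every operand is a power-of-two number of bits wide, the Karatsuba split always lands exactly in the middle, so the recursion needs no special cases for odd or mismatched widths. The function name, the 64-bit base case and the subtractive form of the middle term are assumptions made for illustration. This is not FFA's Ada code, and unlike FFA this sketch branches on operand signs and so is not constant-time.

```python
BASE_BITS = 64  # assumed base-case width (one machine word in this sketch)


def karatsuba(x: int, y: int, bits: int) -> int:
    """Multiply two non-negative integers that each fit in `bits` bits.

    `bits` must be a power of two and at least BASE_BITS, so the split
    point (the "hinge") is always exactly bits // 2.
    """
    assert bits >= BASE_BITS and bits & (bits - 1) == 0, "width must be a power of two"
    assert 0 <= x < (1 << bits) and 0 <= y < (1 << bits)

    if bits == BASE_BITS:
        return x * y  # stands in for a single word-by-word multiply

    half = bits // 2
    mask = (1 << half) - 1
    x_lo, x_hi = x & mask, x >> half
    y_lo, y_hi = y & mask, y >> half

    lo = karatsuba(x_lo, y_lo, half)
    hi = karatsuba(x_hi, y_hi, half)

    # Subtractive middle term: the differences always fit in `half` bits,
    # so the third recursive call has the same width as the other two.
    dx = x_hi - x_lo
    dy = y_hi - y_lo
    dd = karatsuba(abs(dx), abs(dy), half)
    if (dx < 0) != (dy < 0):  # NOTE: branches on data; illustration only
        dd = -dd

    cross = lo + hi - dd  # equals x_hi*y_lo + x_lo*y_hi
    return (hi << bits) + (cross << half) + lo
```

With a fixed midpoint, a 256-bit multiply recurses through 128-bit and then 64-bit operands and nothing else, which is where the simplification comes from.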
ffa in particular is intended as , among other things, a didactic demonstration of what means 'fits-in-head'.
: i intend to port ffa to msdos, for instance, and don't expect that gnat will be building it ~on~ dos box.
: Alright; I'll keep that in mind when I am finally able to study your FFA
: verisimilitude: in 2016 i found that i gotta write a safety-critical arithmetic system (nao known as ffa) and found that i cannot in good conscience do it in anyffin but ada.
: also unlike e.g. rms, asciilifeform even washes, even tho washing dun even directly bake any ffa
: "i can't do any ffa work because i'm working on a manner in which to do it that wouldn't produce the idle inquiry of loc 6 months in"
: for ffa, the only os knobs you need are 1) a means for getting commandline string 2) equiv. of putchar 3) equiv. of getchar 4) a means to read bytes from a FG .
: re ffa in particular, if yer testing on something truly exotic (in re not having a posix layer) you will have to change a coupla lines in os.adb where i/o happens
: incidentally i expect that results from 'does ffa build' carry over to errybody who is currently using a ffa-derived gpr config ( this includes diana_coman , iirc , and possibly phf )
: incidentally, even old-fashioned barbaric adacorpse gnat was well-behaved re '-static' flag -- there's an os that dun even support static link (crapple) and it will properly barf there, rather than silently build dynamic ( incidentally can still build a runnable ffa there, fwiw, in debug mode -- see readme.txt )
: diana_coman: he said 'threading model' so i assumed he found some ffa-style fascistic constraint to put on it, lol
: Not bad, just noticed last nite ave1's site back up so in process of building musl gnat today then on to revisiting ffa
: I'm even spending a few spare cycles on FFA
: yay, m-r on ffa; now I'll really have to schedule ffa feast during breaks from work, lol
: i don't understand how anyone can read the logs with one eye, this shit's exponential: i'm behind on ffa, so the recent work on e.g. gcd or miller-rabin is particularly slow going.
: in so far as anyone can tell, it's troo, and then 64bit witnesses suffice for 4096bit candidates. but not only relies on riemann, but dun win anyffin in ffa, where very small and very large number eat same cpu.
: btw, imho it's a good example of 'ffa-style' narrowing of problem domain ( there is no good reason for ancient tx to be rewritable )
: may recall, 1st draft of ffa used genericism, i removed it
: note that mmap is not a front burner item currently for asciilifeform - i dun need it in ffa, will come back to it after.
: bvt: even the current (ch11 and after) ffa relies on a gnat with working forced-inlining
: granted, an unrolled ffa would operate on a fixed width (e.g. 8192) of primary fz.
: i guess i could do some experiments here. the immediate question is that ffa does plenty of FZ_Adds with different FZ'Length, so full unrolling would not really work (unless i miss something).
: koch's turd, despite being implemented in c, with no bounds checks, actually loses to ch14 ffa, for inputs of same ~width~ -- despite fact that he doesn't constanttime and thereby gets to skip massive work
: bvt: i expect one would trivially get a 10-20x speedup over the ordinary ffa, esp. if the item still fits in l1
: before considering to bake irons, it is worth to see what a 100%-asmic ffa
: nao, it isn't as if the current ffa, with 2.7sec 4096-bit modexp, is immediately usable to eat packets at line rate. but that part at least theoretically parallelizes ( i.e. a rack fulla multicore boxen running ffa, can theoretically eat packets at line rate... )
: helps to recall that the problem which originally prompted asciilifeform to write ffa, is a (currently hypothetical) application where rsa sigs are carried in ~individual packets~
: ftr i suspect that entirely ordinary algos, such as are seen in the current ffa, would already give ~line-rate~ (i.e. , 4096 modexp faster than 1G/s nic can give you new inputs to modexp on ) if implemented in iron properly.
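A rough back-of-envelope sketch of what "line rate" means for the figures quoted in these entries. It assumes that "1G/s" means 1 Gbit/s and that each modexp input is a bare 4096-bit value with no packet framing at all (both are assumptions, and the second is deliberately optimistic for the NIC, so the result is an upper bound). The function names are hypothetical.

```python
import math


def max_inputs_per_second(link_bits_per_second: float, bits_per_input: int) -> float:
    """Upper bound on how many fresh inputs a link can deliver per second."""
    return link_bits_per_second / bits_per_input


def parallel_units_needed(seconds_per_modexp: float, inputs_per_second: float) -> int:
    """How many independent modexp units are needed to keep up with the link."""
    return math.ceil(seconds_per_modexp * inputs_per_second)


# Figures from the log: 4096-bit inputs, a 1G/s NIC (taken here as 1e9 bit/s),
# and 2.7 s per 4096-bit modexp for the software build.
rate = max_inputs_per_second(1e9, 4096)    # about 244,000 inputs per second
budget = 1.0 / rate                        # about 4.1 microseconds per modexp
units = parallel_units_needed(2.7, rate)   # on the order of 660,000 units
```

Under these assumptions a single unit would have to finish a modexp in roughly 4 microseconds to keep pace on its own, which is the gap the "iron" remark is pointing at.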
: ( '1st commandment' of ffa: thou shalt not branch on seekrit bits. '2nd commandment' -- thou shalt not index memory by seekrit bits ... )
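A small Python sketch of the two rules in the line above, using the usual mask-based techniques: a selector built from arithmetic instead of an `if`, and a table read that touches every entry instead of indexing by the secret. The names `w_mux` and `ct_lookup` and the 64-bit word size are assumptions for illustration, not FFA's actual routines, and Python integers themselves give no timing guarantees; only the shape of the pattern is shown.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def w_mux(a: int, b: int, sel: int) -> int:
    """Return a when sel == 0 and b when sel == 1, without branching on sel."""
    mask = (-sel) & WORD_MASK          # all zeros or all ones
    return a ^ ((a ^ b) & mask)


def is_nonzero(w: int) -> int:
    """Return 1 if the word w (0 <= w < 2**64) is nonzero, else 0, with no branch."""
    return ((w | -w) >> WORD_BITS) & 1


def ct_lookup(table: list, secret_index: int) -> int:
    """Read table[secret_index] while touching every entry in the same order."""
    result = 0
    for i, entry in enumerate(table):
        hit = is_nonzero(i ^ secret_index) ^ 1   # 1 only when i == secret_index
        result = w_mux(result, entry, hit)
    return result
```

The loop in `ct_lookup` runs over the whole table regardless of the secret, so neither the branch pattern nor the sequence of addresses touched depends on `secret_index`.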
: Logged on 2019-01-20 16:23 asciilifeform: as i noted previously -- i do not expect to find any moar ~asymptotic~ speedups for ffa algos , such that are relevant to the sizes of numbers typically used in public key crypto
: 1 annoying aspect of 'iron ffa'-gedankenexperiment, is that none of the available fpga ( either 'ice40' series, or the evil ones ) are anywhere near big enuff to prototype with. it'd have to be simulated a la http://www.loper-os.org/?p=2593 , slowly, and then straight to silicon.
: in a hypothetical asmistic branch of ffa, you'd want to implement whole comba in asm, rather than merely word mul
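For readers unfamiliar with the term, a minimal Python sketch of Comba (column-wise) multiplication follows. Each output word is produced by summing every word product that lands in its column and then carrying once, which is why the whole loop nest, and not just the word multiply, is the natural unit to hand-code. The helper names and the 64-bit word size are assumptions, and a real implementation would hold the column accumulator in three machine words where this sketch uses one Python integer.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def to_words(x: int, n: int) -> list:
    """Split x into n little-endian words."""
    return [(x >> (WORD_BITS * i)) & WORD_MASK for i in range(n)]


def from_words(words: list) -> int:
    """Reassemble a little-endian word list into an integer."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))


def comba_mul(x_words: list, y_words: list) -> list:
    """Multiply two n-word numbers column by column, returning 2n words."""
    n = len(x_words)
    assert len(y_words) == n
    out = [0] * (2 * n)
    acc = 0  # column accumulator (carries over between columns)
    for col in range(2 * n - 1):
        # Word pairs (i, col - i) contributing to this column; the bounds
        # depend only on col and n, never on the operand values.
        first = max(0, col - n + 1)
        last = min(col, n - 1)
        for i in range(first, last + 1):
            acc += x_words[i] * y_words[col - i]
        out[col] = acc & WORD_MASK
        acc >>= WORD_BITS
    out[2 * n - 1] = acc  # whatever remains is the top word
    return out
```

A quick check is `from_words(comba_mul(to_words(x, 4), to_words(y, 4))) == x * y` for any 256-bit `x` and `y`.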
: my understanding is that asmism would go only for lower-level ffa code, i.e. barret/modexp will remain as-is.