asciilifeform would really instead like a cpu where erry instruction takes 1 tick, fuck errything. but presently dunhave.
mircea_popescu: in no case do i know of anyone who has actual data re such things as "ok, so manual claims, but NUMBERS for this penalty"
mircea_popescu: nevertheless, the point has legs -- what we've done here very much can be done and re-done, and with cuntoo/ada-gnat/etc stack spitting out statics, it might even come close.
mircea_popescu: not likely a scalar table.
asciilifeform: ( the rub is that ticks depend heavily on context, on a reorderistic cpu )
asciilifeform would kill for an accurate tick table for opteron. but none exists ( possibly when i have msdos gnat, can bake one..)
mircea_popescu: kinda what i understood, that cmp USED TO BE expensive, but is no longer.
asciilifeform: i was answering q of 'why does gcc put out this odd form for a computed goto'.
asciilifeform: mircea_popescu: the 'test' vs 'cmp' thing is microscopic , so arguably red herring here
asciilifeform: seems like the diff only comes into play when there is 1 or moar ?
mircea_popescu: leaving compactness aside for the moment
mircea_popescu: if the zcx's cmp WERE slower than sjlj's test, then we should see the latter be faster on 0 handlers than the former!
mircea_popescu: gimme a break, that first instance ?
a111: Logged on 2019-02-14 21:53 diana_coman: mircea_popescu and anyone else following along, here's the data from a set of runs with handlers from 0 to 3: http://p.bvulpes.com/pastes/9Cstd/?raw=true
asciilifeform: mircea_popescu: do i misread the http://btcbase.org/log/2019-02-14#1896636 run ?
mircea_popescu: she corrected the numbers, it's 90 for zcx 93 for sjlj at the best.
asciilifeform: ( btw the obj reason why none of this is visible in ffa -- there aint any tasks! so even in ljmp mode it shits out same coad )
asciilifeform: diana_coman does seem to have crafted a testism where it does
mircea_popescu: well, i won't trust my own understanding of asm and contemporary cpus as far as i can throw it ; but if indeed the operands in zcx impl were slower, you'd see it take less time!!!1
asciilifeform: whereas if you didn't touch it, it then is
asciilifeform: mno, if you lock the pipe, you lock the pipe, that reg aint available for reorderism
mircea_popescu: yeah but the pipe is built such that this is also ~0 cost electrically
asciilifeform: (so it actually locks the pipe and waits for both operands to become available)
mircea_popescu: i agree with THAT part.
asciilifeform: test simply ORs the bits in register together (and this happens by default when you load it, cuz it's costless electrically)
asciilifeform: mircea_popescu: opposite. cmp is ~3x slower than 'test'
mircea_popescu: one part of the problem might be that sjlj comes from a time before, when insanities like that snippet above were standard. but no time since the millennium do you see it instead of the cmp etc.
asciilifeform: on sane people planet i could determine this by reading the motherfucking docs
asciilifeform: i sawed it open to try an' see how thefuq the longjmp thing actually worx
mircea_popescu: well so then what are we disagreeing here about.
asciilifeform: at least where routine appears in a task
mircea_popescu: which is more compact ; and perhaps quicker too ?
mircea_popescu: (did i identify the same segment correctly ?)
asciilifeform: it's 'optimal' in the sense that it fucks the branch predictor less than an always-computed-jump
asciilifeform: mircea_popescu: it's what gcc does for small (i dun recall threshold) computed switch
mircea_popescu: are you basically saying this sub eax, 1 ; test eax, eax ; jz loc_601 is optimal approach ?
asciilifeform prolly oughta have read the orig adb prior to doing this
asciilifeform: aah lol i'm a tard :
asciilifeform: the rest, i will omit, but also end up in the various unwindisms, aborts, stubs.
asciilifeform: var_B8 contains some flag. valid values are 0, 1, 2, 3. if 0, we go to loc_490. if 1, loc_601. if 2, loc_62d. if 3, loc_659. if above 3, program dies, cpu executes ud2 (guaranteed bomb) .
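For readers following the disassembly, the dispatch on var_B8 described here can be sketched in C roughly as below. This is an illustration under assumptions, not the code GNAT emitted: the name `dispatch_target` is made up for the example, the function returns the label offset where the real code would jump, and `abort()` stands in for the ud2 trap.

```c
#include <stdlib.h>

/* Hypothetical model of the dispatch on var_B8 described in the log.
   Returns the offset of the label control would reach; the real
   code jumps there instead of returning a number. */
unsigned dispatch_target(unsigned flag)
{
    switch (flag) {
    case 0:  return 0x490;  /* loc_490 */
    case 1:  return 0x601;  /* loc_601 */
    case 2:  return 0x62d;  /* loc_62d */
    case 3:  return 0x659;  /* loc_659 */
    default:
        /* Stand-in for the ud2 bomb: any other value kills the program.
           With gcc on x86, __builtin_trap() is one way to get a real ud2. */
        abort();
    }
}
```

Per the discussion in the log, for a switch this small gcc emits a chain of compares and conditional jumps rather than a jump through a table, which is where the sub / test / jz sequence comes from.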
mircea_popescu: one that's due to the method, and the other that's due to the fact zcx was a lot narrowly-er massaged
mircea_popescu: rather, i can't shake this impression that sjlj saddles us with two segments of overhead
mircea_popescu: what's being tested ?
asciilifeform: i.e. loox like is a flag that triggers an unwind
diana_coman: I'm atm doing the inventory of ave1's versions of gnat scripts and apparently even 2018-05-29 relies on downloading stuff that meanwhile moved/vanished as they always do; moreover, I have the darned stuff , now need to figure out how to cut out the download and just point the script at local source, ugh
asciilifeform: cuz it's the 2nd half of a conditional ?
mircea_popescu: im sorry. why jz rather than jmp or w/e
mircea_popescu: why jz rather than sub ?
mircea_popescu: loc_44E: << this entire thing
asciilifeform is prolly doomed to actually vivisect the gnat backend at some pt, prolly sooner rather than later
asciilifeform currently wondering wtf https://www.felixcloutier.com/x86/ud is doing in there
asciilifeform: ( will guess, tho i do not presently know, that these trigger unwind of stack )
asciilifeform: also loox like the ljmp variant puts abortism stub in erry proc (that appears in a task, that is)
mircea_popescu: why ty!
mircea_popescu: (the 52 is cuz i took the 13 items and multiplied by 4, forgetting that these are actually byte aligned not 64-bit aligned)
mircea_popescu: what does it do with the rest of the frame, from the bytes we see to the 184 ?
asciilifeform: cuz it keeps the where to longjmp in'ere.
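As a rough picture of why a setjmp/longjmp scheme needs room in the frame for "where to longjmp", here is a minimal sketch in plain C using the standard `setjmp`/`longjmp`. It is not GNAT's runtime and makes no claim about its actual layout; `current_handler`, `raise_error`, `risky_double` and `call_with_handler` are hypothetical names invented for the example.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical head of the handler chain: the innermost active handler. */
static jmp_buf *current_handler = NULL;

/* "Raise": jump to the innermost handler, or die if there is none. */
static void raise_error(void)
{
    if (current_handler != NULL)
        longjmp(*current_handler, 1);
    abort();
}

/* Some work that may raise. */
static int risky_double(int x)
{
    if (x < 0)
        raise_error();
    return 2 * x;
}

/* A procedure with one handler. */
int call_with_handler(int x)
{
    jmp_buf here;                      /* saved context lives in this frame */
    jmp_buf *outer = current_handler;  /* remember the enclosing handler    */
    int result;

    current_handler = &here;           /* paid on every call, raise or not  */
    if (setjmp(here) == 0)
        result = risky_double(x);      /* normal path                       */
    else
        result = -1;                   /* handler path, reached via longjmp */

    current_handler = outer;           /* unregister on the way out         */
    return result;
}
```

The point of the sketch is only the shape: the saved context and the register/unregister steps cost stack and time on every entry, whether or not anything is ever raised, whereas a table-driven scheme pays nothing on the normal path.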
mircea_popescu: no i know where you got the sub param from, what im asking is,
asciilifeform: mircea_popescu: the stack frame.
asciilifeform: i was discussing diana_coman's much earlier empirical find, that on ljmp stack fills faster per same # of calls. this here is why.
mircea_popescu: the observation that perhaps sjlj is not actually as tightly optimized as zcx is trying to percolate through my brain
mircea_popescu: look here : lines 1 through 9 in zcx add up to 13 bytes, yes ?
asciilifeform: ( to 0 measurable diff in ffa, oddly enuff, but on e.g. tiny micros might make a diff.. )
asciilifeform: but indeed the ljmp variant craps out slightly bulkier coad across the board
asciilifeform: the stack frame, that is
asciilifeform: ( before anyone asks, the 'unwind resume' variants are extern stdlib symbols, and i haven't looked to see how they differ yet )
asciilifeform: for thread-completeness : http://www.loper-os.org/pub/misc/zcx_proc_flow.png http://www.loper-os.org/pub/misc/ljmp_proc_flow.png
mircea_popescu: ftr that's 52 bytes (ha-HA!) vs 60 bytes.
diana_coman: hm, doesn't look that bad
mircea_popescu: so really just pushes two more regs is all.
asciilifeform: at the risk of log clutter, will put ftr :
asciilifeform: though, interestingly, only in the unit 'procs' , which actually contains exceptionisms
asciilifeform: diana_coman: btw your 'lick the 9v' intuitive observation earlier was correct, on ljmp variant the default stack frame indeed longer, 184byte vs 40
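To put the 184 vs 40 byte figures in terms of call depth, a back-of-envelope sketch follows. The frame sizes are the ones quoted in the log; the helper name `max_nested_calls` and the 8 MiB stack in the usage note are assumptions for illustration, and the model deliberately ignores return addresses, alignment padding and anything already on the stack.

```c
#include <stddef.h>

/* Frame sizes quoted in the log for the test procedure, in bytes. */
enum { FRAME_SJLJ = 184, FRAME_ZCX = 40 };

/* How many nested calls fit in stack_bytes if each call costs exactly
   frame_bytes. Simplification: ignores return addresses, alignment
   padding, guard pages, and whatever the caller already used. */
size_t max_nested_calls(size_t stack_bytes, size_t frame_bytes)
{
    return frame_bytes ? stack_bytes / frame_bytes : 0;
}
```

With a hypothetical 8 MiB stack (8388608 bytes) this gives 45590 nested calls at 184 bytes per frame against 209715 at 40 bytes, i.e. the stack fills about 4.6 times faster on the ljmp variant for the same number of calls.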
diana_coman: asciilifeform, the name of the tarball is correct; you'll have to change the name of the dir /put them separate
asciilifeform: diana_coman: the 'zcx' tarball contains a 'ljmp_calls' dir, same as other 1. which is correct ?
diana_coman: asciilifeform, here are the latest aka proc calls with 3 handlers per proc: ossasepia.com/available_resources/bins_calls_sjlj_adacoregnat.tar.gz and ossasepia.com/available_resources/bins_calls_zcx_adacoregnat.tar.gz ; let me know if you want anything else
diana_coman: asciilifeform, any preference re "pair of bins" i.e. the procedure calls or the loops of yest?
a111: Logged on 2019-02-11 01:33 hanbot: diana_coman fwiw i ran into a few broken internal links on ossasepia today on account of their still pointing to dianacoman.com, see http://ossasepia.com/2018/03/08/eucrypt-compilation-sheet/ fo' instance.
diana_coman: in other things, re http://btcbase.org/log/2019-02-11#1894896 -> this should now be fully sorted i.e. IP change for dianacoman.com propagated as far as I can see + redirection working fine for any link so please let me know if you still encounter trouble with any dead links; if you use only hosts (no DNS) then simply adding dianacoman.com on same IP as ossasepia should work seamlessly
diana_coman: mircea_popescu and anyone else following along, here's the data from a set of runs with handlers from 0 to 3: http://p.bvulpes.com/pastes/9Cstd/?raw=true
diana_coman: ah, that'd explain it, wouldn't it: by the time "programming" is direct translation of fuzzing into code, it'd possibly speed up, yes
mircea_popescu: prolly a bunch of try()catch semantics in "all programs"
diana_coman: I still don't see the boo-boo of docs i.e ~"all programs should see a great improvement running zcx" or how was it
diana_coman: sorry; I should know by now to not hurry up with data report even if it's just 2 runs
mircea_popescu: aok. i can take my tin foil off now.
diana_coman: all those tests are on Adacore's 2016 gnat, yes
diana_coman: no, fat-fingered it, 0 instead of 9 i.e it was 0.93 not 0.03; sorry about that; still running atm the 1 handler with sjlj and then will move on to 2 and 3 handlers
asciilifeform: this does not contradict the hypothesis re handlers tho ( i have only the 'last chance' handler ). it does suggest that sjlj does not speed up ordinary calls substantially tho.
asciilifeform finds that (using pre-ave1 gnat, where i currently can --RTS=sjlj ) no detectable diff in mod ex
asciilifeform goes to test..
mircea_popescu: ^even handled sjlj is not really that bad, 7us per call far far from end of world.
mircea_popescu: (in fairness though, no program ever does the sort of calling insanity we do here, so irl this may be very mild indeed)
mircea_popescu: it really blew my fucking mind! ZERO COST, they said!!!
asciilifeform nao wonders if there's a seekrit chest fulla ffa speedup in this dig
diana_coman goes to run and will be back with proper data
mircea_popescu: but that 0.9 vs 0.03 is popping the fuck out.
diana_coman: and yes, then I'll do with 2 and 3 handlers too
diana_coman: ugh, either I fat-fingered there or what; let me run that again ; (and possibly /me should really stop getting data *other* than in a nice plain table)
mircea_popescu: ie, what the docs don't say is the juiciest bit at all : if you do not have extra handlers, zcx is MASSACRING you on calls.
mircea_popescu: specifically stated, this program takes to run : 1 with sjlj, no handlers ; 30 (up 3000%) with zcx, irrespective of handler count ; 5295 (a further 200% up) with sjlj and one extra handler.