bvt: ('denver' is arm at frontend and vliw inside, dynamic jit tries to continuously improve translation: if you have a loop, the 1st, 100th and 1000th loop iterations can execute totally different vliw code)
bvt: http://btcbase.org/log/2019-03-09#1901064 << well, IMO nvidia's "denver" managed to beat even them. ☝︎
bvt: http://btcbase.org/log/2019-03-09#1901061 << iirc there were restrictions on which regs can be used as base and index; another example of isa ugliness is MOV http://archive.is/w0IAC#selection-607.0-945.2 ☝︎☟︎
bvt: i.e. like this http://p.bvulpes.com/pastes/k88Le/?raw=true
bvt: ugh, *but it is missing under...
bvt: but it does but is missing under /usr/portage/...
bvt: i.e. there really is a directory structure /cuntoo/portage/profiles/root/cuntoo/build/...
bvt: i confirm that it is both in genesis and is a valid path. i'm testing live cuntoo though, have no access to bootstrap env currently
bvt: something like /cuntoo/portage/profiles/root/cuntoo/build/usr/portage/profiles/features/musl/use.mask
bvt: trinque: i also confirm that under /cuntoo/portage/profiles there is a directory structure that corresponds to my bootstrap environment
bvt: mircea_popescu: i guess the message did not parse correctly. i'm not proposing c the programming language; i meant something similar to "option c" from the list of possibilities you outlined in thread http://btcbase.org/log/2019-02-17#1897638 ☝︎
bvt: http://btcbase.org/log/2019-03-01#1899820 << will have a look at mes; so far potential deadline -- next weekend. ☝︎☟︎
bvt: i do not expect to see a lot of exceptions from sjlj
bvt: re sjlj/zcx: i guess i'd stick with a variant of c (http://btcbase.org/log/2019-02-17#1897694): sjlj by default, use zcx only when facing high overhead from sjlj AND where this is possible due to threading model. ☝︎
bvt: http://btcbase.org/log/2019-02-17#1897636 << as long as i need userspace C code to run, I'll be happy to use musl. it's not bug-free, but its bugs have a very different nature than glibc's. i see no reason for keeping glibc on non-toiletboxes. ☝︎
bvt: http://btcbase.org/log/2019-02-23#1898712 << i would also like to have a deeper look at this 'mes' item. is this ok? ☝︎
bvt: i can try killing that code to see what happens -- i don't even understand why zcx won't just work, and there is no information on this problem in the whole internet ☟︎
bvt: ftr, disable-aborts-on-zcx commit from 2003: http://archive.is/qXpQs
bvt: asciilifeform: no, did not. my attempt was to use polling pragma, but mircea_popescu made it clear that it's not an option at all.
bvt: diana_coman: it seems so, the code for ignoring aborts on zcx was added in 2003 and not touched since that time, so i agree with "broken by design"
bvt: also, at least for vxworks, adacore for certified profiles supports only sjlj, while using zcx for non-certified use-cases ☟︎
bvt: zcx is a recent item, https://www.usenix.org/legacy/events/osdi2000/wiess2000/full_papers/dinechin/dinechin_html/index.html
bvt: hello. i also tried to find information on why zcx is broken, but not sjlj -- did not find anything specific
bvt: i have also seen one guy at suckless complaining about this back in the day: https://lists.suckless.org/dev/1605/28871.html ☟︎
bvt: but version 4.9 looked healthy (i.e. plain c, did not see any cpp code there). i had a look at a single file, though, so this is no guarantee.
bvt: i have seen reports of this, but never verified myself. my understanding is that there is a slow c++zation of gcc: i backported one gcc patch for my home system from 6 to 4.9, this involved removing c++ chunks.
bvt: it seems to be quite a new item, iirc less than a year old. i'd expect it does not support arm. can't say anything about bugs without trying out.
bvt: not yet, but can give it a shot sometime next week
bvt: hi, asciilifeform: did you see this https://www.gnu.org/software/mes/ ?
bvt: trinque: booted just fine after messing with kernel config a bit, which is expected from a previously not used machine
bvt: nope, zcx. sjlj can be the next step
bvt: ran the tests for exceptions race, libgcc is fine in gcc4.9, locks are in place, so it seems that it is indeed another gcc5ism
bvt: hello. quick report: i bootstrapped ave1gnat using asciilifeform's tar (thanks!), had to change the paths in linker scripts and la-files, will make a post on that.
bvt: trinque: genesis (does not verify) from successful cuntoo deployment: http://bvt-trace.net/src/genesis-14.02.2019.vpatch ☟︎
bvt: thanks asciilifeform, downloading
bvt: static version should be fine. then would also try it on cuntoo (have it running, the genesis signature still does not match).
bvt: thanks!
bvt: improved beyond repair?
bvt: maybe there are two similar issues that are both 'cured' by switching to dynamic linking, but currently i don't think so. i'm using zcx runtime for these tests.
bvt: it goes from crashing once in 3-5 runs to crashing once in approx 1000; however i've also seen deadlocks, which may be worse stuff to deal with than an honest crash.
bvt: diana_coman: it did not, as it was a clear hack to just make things work http://bvt-trace.net/src/gthr-disable-weak.diff
bvt: i don't think it is gcc5-specific, the patch against this problem that i've seen was written for gcc 4.8
bvt: however i don't understand if this is possible to achieve only for gnat components, and when i do it globally, i get 'undefined symbol' when linking with libgcc
bvt: they disabled weak symbols for c++/fortran components, http://port70.net/~nsz/musl/gcc-5.3.0/0004-gthr.patch
bvt: i still have no solution for this, afaik musl authors solved the problem for fortran and c++, but gnat seems to lack an equivalent knob
bvt: i.e. with static linking, all locks are compiled into noops.
bvt: (still tested only with gnat2017, but this is a different story; i see no reason to believe that ave1gnat does not have the same issue)
bvt: http://bvt-trace.net/src/test_task_exceptions.tgz - test source for those who want to test. to be run as 'while true; do echo -n .; ./adatests >/dev/null || break; done' -- should not take too long to have it segfault.
bvt: which is broken with static linking due to usual 'creativeness' of gnu folx https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgcc/gthr-posix.h;h=88cbc23937ec20b15b35c5adb7f9983282c6f084;hb=HEAD#l247
bvt: unwinding stack for exceptions requires taking locks https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgcc/unwind-dw2-fde.c;h=24b4ecee68c17e1701c4482580e449b03a4e6fe9;hb=HEAD#l1046
bvt: http://btcbase.org/log/2019-02-12#1895598 << i'd like to briefly report on this: at this point i'm sure that there is a race condition. i'll do a post tomorrow morning on it, but short summary is ☝︎
bvt: hello. ave1: doesn't makefile.in in gcc-interface tweak it with sed to the necessary value? ☟︎
bvt: it won't be poll-killable if the whole loop is in cpp/asm. if it is just linear code - should not be too bad; sjlj should still work better.
bvt: asciilifeform: sure, cpp code won't be instrumented. the 'polled' mode was iirc in the guts of the ancient linux threads implementation. ☟︎
bvt: may be. tbh i don't think i can reason on this q based on only the test code.
bvt: you can enable polling per scope/loop, don't have to do it globally. re cpp things - true.
bvt: musl-cross-make
bvt: no, they explicitly state in the docs that you have to rebuild the runtime to use it on other systems
bvt: yes, the only reason i did not switch yet is the convenience of m-c-m for development (i.e. getting i386 and aarch64 builds ready using it took me a few minutes of config time)
bvt: built myself, based on ave1s patches and musl-cross-make (https://github.com/richfelker/musl-cross-make/ , i had some experience with it so did the work rather quickly).
bvt: myeah, i will drop gnat2017. did not manage to get ave1's running quickly today, will switch on ~thursday
bvt: ^ i got some really weird results here, would appreciate if someone tested the last snippet (constraint_error) on ave1's gnat.
bvt: i expect it will be slower, but it won't hurt to do the check. the impact will depend on how exceptions are used (i don't think it can have any impact on ffa, for example). but i don't have enough experience with it to provide any numbers ☟︎
bvt: during the gnat build, the sjlj runtime is built, so it should be possible to switch to it and test. ☟︎
bvt: mircea_popescu: it will become a blog post, but when i get to the home machine.
bvt: diana_coman: i agree; in the abort signal handler, there is a snippet of code that ignores the signal (given ZCX exception handling model).
bvt: not really, just writing from the salt mine.
bvt: hello, thanks for voice
bvt: this is true, right.
bvt: you can't verify output of 'write 4096 adds' by hand, though ☟︎
bvt: ave1's fix also fixed the 'undefined references' in static lib with gcc-6.3 for me
bvt: writing such code really could use some program to generate the necessary amount of adds/subs
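A minimal sketch of such a generator, assuming x86_64 AT&T syntax and a hypothetical register assignment (operands at `(%rsi)` and `(%rdx)`, result at `(%rdi)`); none of these choices come from the log. It emits one `addq` followed by a chain of `adcq`; the `movq` instructions in between do not touch the flags, so the carry survives from word to word.

```python
def gen_unrolled_add(n_words: int) -> str:
    """Emit text for a fully unrolled n-word addition with carry.

    Hypothetical calling convention (an assumption, not from the log):
    %rsi and %rdx point at the operands, %rdi at the result.
    """
    lines = []
    for i in range(n_words):
        off = 8 * i  # byte offset of the i-th 64-bit word
        # The first word uses a plain add, the rest consume the carry flag.
        op = "addq" if i == 0 else "adcq"
        lines.append(f"    movq {off}(%rsi), %rax")
        lines.append(f"    {op} {off}(%rdx), %rax")
        lines.append(f"    movq %rax, {off}(%rdi)")
    return "\n".join(lines)


if __name__ == "__main__":
    print(gen_unrolled_add(4))
```

Since the length is a parameter, one routine per distinct operand width would have to be generated, which is the concern raised about differing FZ'Length.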
bvt: aha, i see. this would also involve lots of function call inlining as well
bvt: i guess i could do some experiments here. the immediate question is that ffa does plenty of FZ_Adds with different FZ'Length, so full unrolling would not really work (unless i miss something).
bvt: would this actually be efficient on current arches? this would stress instruction decoder a lot
bvt: are such machines even built these days? i.e. for dsp, or other use-cases? ☟︎
bvt: https://www.dyalog.com/uploads/conference/dyalog12/presentations/U23_MultidigitAlgorithms/Optimization%20of%20parallel%20multi-digit%20algorithms.pdf
bvt: actually, that fft-like multiplication algorithm was developed in APL
bvt: perhaps some old APL machines would qualify
bvt: i wonder how this is a valid argument even there. if nothing reads the flag, there are no pipeline dependencies.
bvt: agreed. on x86_64 it is compiled to an ADD; SETC sequence, but who knows what happens on other arch/gcc version.
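For reference, the carry that ADD; SETC produces can also be derived from the operands and the sum alone, without reading a flag. The sketch below assumes 64-bit words and uses a branch-free majority-style expression; the function name and the exact formula are illustrative and are not quoted from the ffa source.

```python
BITNESS = 64
MASK = (1 << BITNESS) - 1


def add_with_carry(a: int, b: int, c_in: int = 0):
    """Return (sum, carry_out) for BITNESS-bit words a, b and carry-in c_in."""
    s = (a + b + c_in) & MASK
    # Top bit of (a AND b) OR ((a OR b) AND NOT s) is the carry out:
    # both top bits set always carries; exactly one set carries only if
    # the top bit of the sum came out as zero.
    carry = (((a & b) | ((a | b) & ~s)) & MASK) >> (BITNESS - 1)
    return s, carry


if __name__ == "__main__":
    print(add_with_carry(MASK, 1))
```

Whether a compiler folds this expression into ADD; SETC depends on the target and compiler version, which is the uncertainty noted above.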
bvt: i'll be afk an hour or two
bvt: i.e. W_Borrow/W_Carry still cause the overhead?
bvt: http://p.bvulpes.com/pastes/q0ffh/?raw=true - some things can be done with gcc specific features, but i guess asming is cleaner ☟︎
bvt: my understanding is that asmism would go only for lower-level ffa code, i.e. barrett/modexp will remain as-is.
bvt: will do. so far i had a very brief look at it
bvt: myeah, complexity does not go too well with both constant-time and fits-in-head.
bvt: asciilifeform: a better attempt at this algorithm can involve using comba at the lower level.
bvt: mircea_popescu: thanks
bvt: :-)
bvt: no, i mean that particular paper. i initially wanted to implement a fft-like multiplication algorithm, but got interested in this karatsuba/toom when reading the paper.
bvt: nope, but there were a few mentions of other algorithms in the logs, so i decided to have a look. don't remember how i arrived at this particular one
bvt: re fft: i would wait until a use-case appears, to at least understand what the requirements are. do you have something in mind?
bvt: yes, actually in this karatsuba/toom algo one can embed comba's trick, but the code would become even gnarlier
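A minimal sketch of comba's trick on its own, assuming little-endian lists of 64-bit words of equal length. It shows only the column-wise order of accumulation: Python integers are unbounded and not constant-time, whereas a real implementation would keep a fixed multi-word accumulator. All names here are illustrative.

```python
def comba_mul(x, y, bits=64):
    """Multiply two equal-length little-endian word lists column by column."""
    n = len(x)
    if n == 0:
        return []
    mask = (1 << bits) - 1
    out = [0] * (2 * n)
    acc = 0  # running column accumulator, carries over into the next column
    for col in range(2 * n - 1):
        lo = max(0, col - n + 1)
        hi = min(col, n - 1)
        # Sum every partial product x[i] * y[j] with i + j == col.
        for i in range(lo, hi + 1):
            acc += x[i] * y[col - i]
        out[col] = acc & mask  # each output word is written exactly once
        acc >>= bits
    out[2 * n - 1] = acc
    return out


def to_int(words, bits=64):
    """Helper for checking: little-endian word list to a Python integer."""
    return sum(w << (bits * i) for i, w in enumerate(words))


if __name__ == "__main__":
    a = [0xFFFFFFFFFFFFFFFF, 0x1]
    b = [0x2, 0xFFFFFFFFFFFFFFFF]
    print(to_int(comba_mul(a, b)) == to_int(a) * to_int(b))
```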
bvt: also, the linked pdf contains one FFT-like algorithm i considered implementing (using walsh transform for convolution instead of fft). but it'd be even more complex
bvt: asciilifeform: yw
bvt: i would not be surprised if it was 20% slower, but 2x was surprising for me
bvt: yes, and it's also true for memory accesses. however the number of executed instructions also increased a lot, which makes me suspect that there is something i missed
bvt: will provide more seals after i work with chapters 7-9 more.
bvt: http://bvt-trace.net/vpatches/ffa_ch4_ffacalc.kv.vpatch.bvt.sig http://bvt-trace.net/vpatches/ffa_ch5_egypt.kv.vpatch.bvt.sig http://bvt-trace.net/vpatches/ffa_ch6_simplest_rsa.kv.vpatch.bvt.sig