asciilifeform: mircea_popescu: review the mr algo , it is actually surprisingly easy to ffaize, just replace all 'return true' with flag := flag OR true, etc
asciilifeform: then all ACCEPTED primes took exactly same number of cpu clocks, to produce.
asciilifeform: apeloyee: each individual test has to be fixedtime though. then -- yes.
asciilifeform: apeloyee: i see your point. either we dispense with the sieve, or decide to count from the moment after sieve.
asciilifeform: *variability allowed not in the test, but in output
asciilifeform: there must be no variability in the time the ~test~ takes.
asciilifeform: apeloyee: no contradiction. the variability of time is in the ~test~, not the output result , which naturally will vary depending on what rng gave you
asciilifeform: because on pc most of the wait time is for memory access.
asciilifeform: ( as in above case with knuth divider )
asciilifeform: so far almost all of my theoretical predictions re which optimizations will be worth the effort, were wrong
asciilifeform: so how do you propose to multiply anything modulo 2^(k+64) ?
asciilifeform: ( karatsuba, i will note for n00bz, parallelizes , but i deliberately omitted parallelization logic because i want ffa buildable on msdos and for machines with 1 cpu )
asciilifeform: a 2sec modexp is already a wholly fine replacement for koch's gpg, say.
asciilifeform: if ffa can be made to do 4096b modexp in 0.5s on typical comp, that gives ~1byte/msec purersa payload. which is enough for many purposes, e.g. voice.
asciilifeform: apeloyee: theoretically. but cache locality win from smaller memory segment sometimes gives surprising winning. the example above, for instance, gives 2x speedup rather than my predicted 25%.
asciilifeform: in the end might even release different variants that have different complexity tradeoffs.
asciilifeform: and then bernsteinian karatsuba, possibly, and whatever else i can think of.
asciilifeform: which i will also make, and decide if it was worth the cost
asciilifeform: it is! but much smaller than, for instance, the secretshift-barrett.
asciilifeform: ( unrolled comba would have explicit unrolled cases for 1,2,...,8-word operands )
asciilifeform: for instance unrolled comba wins 20-25% speed, but i did not use it in place of the generic because it is longer and harder to read.
asciilifeform: apeloyee: my strategy so far was to introduce moving parts very, very reluctantly ( started with egyptian multiplier, for example ) when there is absolutely no choice.
asciilifeform: in tito's case , and for that matter kim ir sen's -- 'throne is mine, i won it as partizan commander in the war, took no payola from foreign devils' was tru. but how did the shoemaker get ~his~ throne