log☇︎
96 entries in 0.167s
asciilifeform: in the flagship branch, it comes into play as base case when 8 or fewer words in multiplicand
asciilifeform: bvt: what i meant is, the optimal asmistic base case is not necessarily comba. could be 'micro' karatsuba, or even straight schoolbook mul. for so long as asm.
asciilifeform: !#s from:asciilifeform karatsuba
asciilifeform: verisimilitude: as a concrete example : you will find that ffa uses an unmoving hinge for karatsuba multiplication. consequently all numbers are required to occupy a space that is a power-of-two bits wide. but from this you get a 3-4x simpler mechanism.
asciilifeform: just as you can de-recursivize the karatsuba etc
asciilifeform: there was another fella (since wandered off) who posted last yr an attempt at de-recursivizing karatsuba ( it's definitely in the logs, can't find it just yet tho )
asciilifeform: same effect applies to karatsuba, and errywhere else.
asciilifeform: bvt, mircea_popescu : http://bvt-trace.net/2019/01/experiment-n-way-split-karatsuba/comment-page-1/#comment-6
asciilifeform: diana_coman: where this appears, it is a concession to 1) speed 2) brevity of code . where the indexing of the array is known, because it is a local buffer, it is referred to explicitly; invocations X'First, X'Last, etc cost measurable cpu, it turned out. but where indexing is not known ( e.g. if operand may be part of a karatsuba ) , there it is unavoidably X'First etc.
asciilifeform: diana_coman: where at all possible, i tried to use ~simpler~ algo than what was given by knuth. e.g. karatsuba without moving pivots.
asciilifeform: the other thing, if one is writing a proggy where only 1 ffawidth is used, it is possible to unroll all of the loops, derecursivize karatsuba, etc. and get straight line of instructions
asciilifeform: btw it is possible to dispense with FZ_LoMul , and simply discard upper chunk of ordinary karatsuba product. this costs ~10% cpuwise, and may be justified for some proggies.
asciilifeform: meanwhile, in spam lulz, http://www.eghtesadban.com/events/5560171/finite-field-arithmetic-chapter-12b-karatsuba-redux-part-2-of-2 ☟︎
asciilifeform: e.g. the karatsuba 'sandwiches' took coupla ~days~ of this.
asciilifeform: ( http://btcbase.org/patches/ffa_ch10_karatsuba/tree/ffa/ffacalc/cmdline.ads + http://btcbase.org/patches/ffa_ch10_karatsuba/tree/ffa/ffacalc/cmdline.adb for the curious )
asciilifeform: ( there was not an existing 'profile' corresponding to the degree of 'fascism' asciilifeform wanted , hence the bulk of http://btcbase.org/patches/ffa_ch10_karatsuba/tree/ffa/libffa/restrict.adc item )
asciilifeform: >> http://btcbase.org/patches/ffa_ch10_karatsuba << ty phf !
asciilifeform: and, to nobody's great surprise, i suspect, the ideal karatsuba threshhold turns out to differ for 'soft' vs 'iron' word-mul variant.
asciilifeform: this is the karatsuba ch. quite possibly the most complicated in the whole series. and no it can't be split.
asciilifeform: or for another example, take the ugliness and 'pointericity' of the traditional 'pivoting' form of karatsuba. which i killed by forcing all FZ bitnesses to be powers of 2.
asciilifeform: makes karatsuba cut cleanly.
asciilifeform: plenty of what i suspect 'kindergarten-level' (in hindsight, like karatsuba) -- unsolved.
asciilifeform: e.g. karatsuba
asciilifeform: ( multiplication parallelizes 3 ways , with karatsuba, and N ways with toom-cook )
asciilifeform: the 2 things i did not forbid (though wanted to) were recursion (can't... karatsuba) and implicit-loops (can't... array assignments)
asciilifeform: to simplify karatsuba and other algos.
asciilifeform: i still think that it makes sense to do this only after every other bolt is as tight as physically possible -- bernsteinian karatsuba, unrolled comba, etc
asciilifeform: ( this is quite attainable using the ffa we have ~today~, supposing one allows karatsuba to split 3way on 3 cores of whatever chip you have )
asciilifeform: http://btcbase.org/log/2017-10-08#1723064 << gcd does not need a karatsuba, the karatsubatron can be doing something else while gcd happens ☝︎
asciilifeform: ( karatsuba assumes length always divisible by 2 )
asciilifeform: ( karatsuba, i will note for n00bz, parallelizes , but i deliberately omitted parallelization logic because i want ffa buildable on msdos and for machines with 1 cpu )
asciilifeform: and then bernsteinian karatsuba, possibly, and whatever else i can think of.
asciilifeform: ( mainly, i suspect, by recognizing masses of 0 in karatsuba and returning 0 when they get mul'd )
asciilifeform: ( if it isn't obvious from where the error comes : observe the 3 Karatsuba_Term additions. in ordinary K., they walk over the upper half of XYLo ( lower half of result.) but in TopOnly K. we lose XYLo, so that carryolade is lost. )
asciilifeform: now! this procrusted-karatsuba is only used for the barrettron, so theoretically could compensate for that 3 with 3 additional subtractor-muxes. and still win ~4x speedup vs last night's . but this is mega-ugly.
asciilifeform: ( Karatsuba_Term is same for both )
asciilifeform: http://wotpaste.cascadianhacker.com/pastes/TgRkm/?raw=true << ordinary karatsuba, for convenient comparison.
asciilifeform: ^ this 'upper half only' karatsuba works, but the answer is always off by 0 to 3, because the carries from the bottom halves are ( recursively ) lost. somehow gotta be finessed.
asciilifeform: mod6: you will notice that the barrett in 'crc handbook' is more complicated : it shrinks the x and then compensates later. this relies on normalization , and constanttimeized incarnation of it would have to work as apeloyee described ( i'ma try it much later, once i see what can be had re speed strictly from having asymmetric karatsuba instead of the current mega-waste ) ☟︎
asciilifeform: incidentally ~95% of the work ffa does in modexp, now, is multiplication. which means that there is further 20-25% speedup waiting to be had when i get bernsteinian optimization for karatsuba ( haven't yet figured it out, he buried it deep in a paper , as if he were an alchemist, quite cryptically ) and another 10-20% optimization if we move to unrolled comba ( see august thread. )
asciilifeform: ( barrett with 8192b barretoids, i.e. 16384bit mult via ordinary symmetric karatsuba with simple brutal slice , rather than apeloyee's shift )
asciilifeform: it was about 100x slower than comba ( current basecase ) and ~150x slower than karatsuba+comba ( currently what we use )
asciilifeform: one by jebelean, 'Exact Division with Karatsuba Complexity' , possibly ffaizable.
asciilifeform: this is why it was so tricky to implement karatsuba.
asciilifeform: karatsuba terms a+b+c ( k. squaring, for simplicity. mult. has four of'em ). we want a+b+c mod m.
asciilifeform: ( this is the 'new' karatsuba, simplified, and demands powers-of-two-bitnessed operands. should be easier to read than old.)
asciilifeform: what we do not have -- and don't have because it doesn't actually work -- is the karatsuba cut for modular mult.
asciilifeform: and yes it'd be 133337 if it worked, we would have a direct modular equivalent of karatsuba
asciilifeform: 1) mircea_popescu describes algo for mod. 2) turns out exactly knuths's, that is in existing ffa 3) describes 'do it to each term of a+b+c in karatsuba' 4) this dun work, if it worked we would be bragging about the new 133337 recursive modular mult algo we've got
asciilifeform: ( one answer is that at the very bottom of the barrel, where karatsuba hits the basecase comba mult, we permit division )
asciilifeform: ( nao if only modulus were a distributive operation ! then could take mod for each of the 3 addition arguments inside karatsuba, and we'd be golden )
asciilifeform: http://wotpaste.cascadianhacker.com/pastes/uLQBe/?raw=true << karatsuba, for reference ☟︎
asciilifeform: ( because if you are doing anything other than karatsuba for mult of whatever variety, you're sunk )
asciilifeform: the approach i've been (futile, so far) taking is, to find a way to interleave modularization into karatsuba
asciilifeform: but in very other olds, apparently in an obscure article in '09 bernstein shows how to eliminate one of the middle-term additions of karatsuba . ☟︎
asciilifeform: but instead we can simply let Karatsuba_Term return the carry, and say c1, c2, c3 and in the end we ripple out c1+c2+(-1^subtractenable) .
asciilifeform: in other noose, how come nobody noticed that we can accumulate the carry in Karatsuba_Term, and ripple it out once instead of 3 times per firing ...
asciilifeform: ( specifically karatsuba's 3 prongs, are independent )
asciilifeform: ^ for readers who wondered why karatsuba is the 1 routine in ffa ~not~ inlined... think.
asciilifeform: despite 50% reduction in temp space used by karatsuba mult and square
asciilifeform: as for the other thing, right now we have a 'classical' karatsuba that permits odd splits
asciilifeform: recall also that, since we have karatsuba, cost goes up with W logarithmically, rather than quadratically
asciilifeform: if instead of 'mult of 64' we had 'powers of 2', we could dispense with the odd split in karatsuba ☟︎
asciilifeform: it would simplify many routines, in particular karatsuba
asciilifeform: recall, constanttime karatsuba did not (afaik) publicly exist before i posted it...
asciilifeform: there is not an equiv of karatsuba for it
asciilifeform: http://btcbase.org/log/2017-08-08#1695463 << i realized that this might not be true : the (empirically found, but seems to hold on all of my iron) threshhold for karatsuba +ev is > 8 words : ☝︎
asciilifeform: *karatsuba
asciilifeform: btw if you're actually doing something that doesn't need constanttime, you can simply put the obvious check-for-zero in the karatsuba and get 2-9000x boost for mul. ☟︎
asciilifeform: i am considering including a karatsuba squaring case - for 2x speed boost; but that's definitely it.
asciilifeform: not built for speed. but for simplicity. the only concessions to optimization (karatsuba, etc) are strictly what was necessary to get to where rsa happens in merely annoying, rather than geological, time .
asciilifeform: http://btcbase.org/log/2017-07-13#1682308 << comba is an O(N^2) multer. it beats the classical one simply because it doesn't need a scratch buffer of length N. it goes as the base case in karatsuba, not instead of it ☝︎
asciilifeform: generalization of karatsuba, but pretty useless for fitsinhead rsatron imho
asciilifeform: same figure sans karatsuba : 3.7s.
asciilifeform: in other noose ! nao we have comba's algo multiplier as basecase in karatsuba (currently threshold 8 words) , and http://btcbase.org/log/2017-06-21#1673165 becomes now 7.5sec ☝︎☟︎
asciilifeform: ( 8192bit exponentiation: ~10min with egyptological mul; 20.5s was with first-grade mul; 17.7 with karatsuba posted today ) ☟︎
asciilifeform: ( you physically cannot have karatsuba or anything of the kind if you cannot make FZs of several sizes. however the F remains, it is still impermissible to involve FZs of variant bitnesses in any arithmetical operation when invoking . )
asciilifeform: and, unrelatedly, karatsuba parallelizes (into 3 forks) without any substantial effort, so can also speed up whatever the bare bone speed ends up being, 3x
asciilifeform: and then i look in shitpile of paperz and find that some d00d actually did this, called it 'subtractive karatsuba', used in some microcontroller,
asciilifeform: and realized, while doing this, that in fact you don't need 2k+2 bits for the karatsuba intermediates, you can do instead of (x0+x1)*(y0+y1) , (x0-x1)*(y0-y1), and then you don't need to propagate carries, but only take absolutevalue and xor the borrows to see if gotta invert the resulting term ☟︎☟︎
asciilifeform: btw iirc this is an ~actual~ exercise in knuth -- to show that you still need karatsuba on a box where add(word,word) takes same time as mult(word,word)
asciilifeform: karatsuba et al weren't even working with computers. but wrestling with question of how much, abstractly, work, does multiplication per se entail.
asciilifeform: i'll review karatsuba here, ftr. suppose you have 2 L-bit numbers, X and Y, to multiply. you can multiply X*Y the usual way, is O(N^2). but instead anatoly alexeevich k. shows us that you can cut X into X0,X1, ceil(x/2) and floor(x/2) bits, respectively, and same to Y, -> Y0,Y1, and then you only gotta do THREE multiplications, X0*Y0, X1*Y1, (X0+X1)*(Y0+Y1)
asciilifeform: (fixed-space karatsuba)
asciilifeform: *karatsuba
asciilifeform sat with notebook on the grass near usg capitol, good weather, and was aaalmost getting somewhere in re fixed-time karatsuba, when suddenly massive RUSSIA OUT OF UKRAINE!1111 troop of 'protesters', drums, правый сектор ( ukronazi!111 ) flagz, etc, etc , marched past.
asciilifeform is haunted by the notion that constant-time karatsuba gotta exist SOMEWHERE
asciilifeform: nao we need a non-recursive (! , SPARK won't allow recursion nor is it constanttimesafe to have any), constant-time karatsuba...
asciilifeform: probably doomed to karatsuba
asciilifeform: http://www.mi.ras.ru/~karatsuba << subj.
asciilifeform: ( karatsuba, that is. knuth afaik zombies along still. )
asciilifeform: phf: karatsuba's algo is in knuth
asciilifeform: !s karatsuba anthem
asciilifeform: one of my first job interviews out of uni. telephone. a fellow from one of the giant gov. contractors was really intrigued that i know x86 asm., have reversed crud for money. i ask him 'what's the job'. he: automated reversing. me: of what. he: ever hear of karatsuba's algo? me: sure. bignum mult. him: well, we wanna find encryption softs on terrorist drives! ☟︎☟︎
asciilifeform: benkay: what, Modified Karatsuba Anthem?
asciilifeform: '... the judge then ruled that the death of a keyholder at the hands of his fork is to be considered 'assisted suicide.' when he finished belting out the last verse of Modified-Karatsuba Anthem, his secretary drew her Nagant... '