log☇︎
210200+ entries in 0.137s
Framedragger: ahh, yeah okay, back-to-back you mean exactly that, not having to allocate 1MB per block.
Framedragger: yeah i forget sometimes. fixed block length is nice for this...
asciilifeform: because you can point inside'em in the index.
asciilifeform: now you can store them back-to-back
asciilifeform: this also abolishes the cost of 'budget 1MB per block'
asciilifeform: see, we store the blocks linearly
asciilifeform: Framedragger: the 1024bt thing works in all cases, it simply means that you don't have to point to beginnings of blocks, but to a place very near the tx
Framedragger: yeah, given actual tx amounts.. 250mn vs. 2^32
asciilifeform: considering that we have, what, 200 mil tx; and l1 table gives you 4billion slots.
asciilifeform: this form will almost certainly suffice for another decade of trb
Framedragger: right! ahh that's nice. (so just to clarify, the 1024 byte block trick wouldn't work if there's a collision (unless additional budget / w/e))
asciilifeform: the most basic setup has an l1 which immediately proceeds to the 'slow' table (where you have to look at multiple offsets in case of collision, to find your tx)
asciilifeform: 'look at next' can also mean 'go to l3', if you're willing to budget another 16G.
asciilifeform: and if set, you gotta look at the next machine word (each entry has a certain amount of space for collisions)
asciilifeform: where if top bit of the entry is not set, you have your where-to-go-to-find-tx-on-disk result
asciilifeform: if you're 'rich', it can be a table entirely like l1
asciilifeform: which can take 2 basic forms, depending on your storage budget
asciilifeform: it goes to l2 table
Framedragger: in this case.
Framedragger: bear with my slowness, can you clarify how it looks like if there's a collision in the initial lookup?
asciilifeform: point being, to point at a tx sitting on a multi-TB disk, you don't need to be able to address it ~bytewise~
asciilifeform: which costs us 0 because no disk made in last 20 yrs reads any less than 4kB on 1 seek.
asciilifeform: so long as we are ok with having to chug through 1023 bytes of 'not here yet!'
asciilifeform: Framedragger: alignment in this case means that we can point ~inside~ blocks, supposing they are stored linearly on disk
deedbot: http://phuctor.nosuchlabs.com/gpgkey/1B44E66975AB5EE7B22A779D4B31F07CE8CE336A45DE02B74BAD206055C62130 << Recent Phuctorings. - Phuctored: 1711...0703 divides RSA Moduli belonging to '93.90.180.151 (ssh-rsa key from 93.90.180.151 (13-14 June 2016 extraction) for Phuctor import. Ask asciilifeform or framedragger on Freenode, or email fd at mkj dot lt) <ssh...lt>; ' (Unknown DE)
deedbot: http://phuctor.nosuchlabs.com/gpgkey/1B44E66975AB5EE7B22A779D4B31F07CE8CE336A45DE02B74BAD206055C62130 << Recent Phuctorings. - Phuctored: 1594...7037 divides RSA Moduli belonging to '93.90.180.151 (ssh-rsa key from 93.90.180.151 (13-14 June 2016 extraction) for Phuctor import. Ask asciilifeform or framedragger on Freenode, or email fd at mkj dot lt) <ssh...lt>; ' (Unknown DE)
asciilifeform: so we dispense with the O(N) dumb search inside-block
asciilifeform: or, rather, to where-to-start-looking-for-tx
asciilifeform: then a 31bit (remember, highest bit means 'go to l2 table') offset gives you a 2^^41 bit space in which your table entry can point to the beginning of a block.
asciilifeform: btw Framedragger , we can do 1 better than 'block index'
asciilifeform: (and thereby a 16GB l1 table would give you wholly in-memory index for ~all tx lookups)
asciilifeform: it is statistically possible that we don't even have ONE collision yet ☟︎
asciilifeform: ok, now question for ben_vulpes , trinque , mircea_popescu , et al : anybody got a quick c proggy that will eat blkcut's blocks and produce a linear list of tx ? i'd like to actually calculate the current 32bit tabular collision rate.
asciilifeform: ideally you would even have separate disks for blocks and tx indices.
asciilifeform: from the one you are running from
asciilifeform: also this is probably obvious, but you want separate disk for this
asciilifeform: ...rather than to have any serious number crunching in kernelspace
asciilifeform: the Right Thing would probably be to have a very simple kernel driver that takes a specially-marked disk partition and gives userland trb linear use of it, as plain array ☟︎☟︎
Framedragger: have separate service taking care of that? i mean, kernel driver is this kind of 'externality', too (and also ring0)
asciilifeform: (you may as well stuff it in the kernel, then)
asciilifeform: who the fuck wants trb running as root
asciilifeform: Framedragger: if you want your own fs, and to write to contiguous disk, you do
asciilifeform: i dare say the tabular algo is simple enough to actually be implementable.
Framedragger: at least i have the excuse of not having looked at the bdb problem / staying away from trb for the time being :p
asciilifeform: if you want to actually have the contiguous 16GB
asciilifeform: it has obvious minus, in that you need a kernel driver
asciilifeform: yes, first time, i kept waiting for someone to open a schoolbook and describe this very elementary algo
Framedragger: is this the first time you articulated this approach here? i think that's the best on can have for fs-tx-db
asciilifeform: you'll have somewhere between 0 and 2 , statistically, tx per 32bit table entry.
asciilifeform: Framedragger: it's good for many, many times the current cumulative-tx
Framedragger: this is quite nice, and as you say, seek operation already gives a small chunk which should cover most/all tx for current state of affairs (total number of transactions)...
asciilifeform: if you're willing to blow another 16GB, you can have an l2 table ☟︎
Framedragger: oh i finally understood, literally all there is when one seeks to location 3ec455a2 is a list of block numbers. (or single block number.)
deedbot: http://phuctor.nosuchlabs.com/gpgkey/7FF10FFD5E7F7EE6C2497D20452061FB8713C20F904A7A86B21EE093A5E7D254 << Recent Phuctorings. - Phuctored: 1797...3053 divides RSA Moduli belonging to '79.98.16.61 (ssh-rsa key from 79.98.16.61 (13-14 June 2016 extraction) for Phuctor import. Ask asciilifeform or framedragger on Freenode, or email fd at mkj dot lt) <ssh...lt>; ' (a61.ip.network-consulting.fr. FR)
deedbot: http://phuctor.nosuchlabs.com/gpgkey/7FF10FFD5E7F7EE6C2497D20452061FB8713C20F904A7A86B21EE093A5E7D254 << Recent Phuctorings. - Phuctored: 1391...6403 divides RSA Moduli belonging to '79.98.16.61 (ssh-rsa key from 79.98.16.61 (13-14 June 2016 extraction) for Phuctor import. Ask asciilifeform or framedragger on Freenode, or email fd at mkj dot lt) <ssh...lt>; ' (a61.ip.network-consulting.fr. FR)
Framedragger: why the need for "the machine might have to try 2 or 3 blocks before it finds tx" then? and if so, then no guarantee of only 1 seek?
asciilifeform: block where tx lives.
Framedragger: asciilifeform: wait, what is "block index"? just the integer denoting block number?
asciilifeform: ( or, in the case of ext4 or other konsoomer fs, fuck-knows-how-many ! )
asciilifeform: so then you have 1 disk seek per tx, rather than 2.
asciilifeform: also on a reasonable box you can keep the l1 table in ram
asciilifeform: Framedragger: it is literally the smallest number of moving parts i can think of , that'd do the job.
Framedragger: asciilifeform: hmm, very nice. i suppose it's as close to fixed-length as is possible given current bitcoin
Framedragger: trinque: it's just a kindergarten way of wrapping up some syscalls. will obviously benchmark outside it later. i wasn't completely certain that my tool wouldn't trash the host fs. :)
asciilifeform: this whole thing would probably fit in ~1,000 lines of c.
asciilifeform: however it does have the advantage of not using 10,000,001 lines of ext4 open sores crapolade for anything.
asciilifeform: but, thing is, it won't work any better on mechanical disk than the old bdb. and imho it is quite impossible to make variable-length tx bitcoin work well on mechanical disk, with modern spamola level.
asciilifeform: anyway this is the simplest physically-possible scheme.
asciilifeform: how to ~write~ to this data structure is, i think, obvious to the reader, i do not need to describe.
asciilifeform: and there we find ~lists of block indices~, so the machine might have to try 2 or 3 blocks before it finds tx.
asciilifeform: BUT if there are collisions, the upper bit of the 32bit entry, is set. and then we know that the lower 31, are an index into another contiguous array
asciilifeform: after which finding the actual tx is an O(N) operation in the block, with small N (the tx in the block)
asciilifeform: if there are no collisions (i.e. there is no other tx on the disk whose id begins with 3ec455a2) then that entry in the array IS the block index where the tx lives.
asciilifeform: now you store the table as follows: the top 32 bits (e.g., 3ec455a2 from above) are an array index into this table ☟︎
Framedragger: (yeah btw, just ftr, symlink *creation* under populated dir structure (`ln -s files_f1/block35461.txt dc/dc89c1f2b58909d3814b250a731a9b9b791b092759553e3ba6579ffaad3a7565`) is slow. however, the creation was done using shellscript, need to move to c to be able to actually profile with precision.) ☟︎
asciilifeform: picture a 16GB contiguous block of ssd. you cut it up so that it corresponds to 2^32 x 32bits.
asciilifeform: btw i will also put down in the log, one very simple possible algorithm for a 'txidx-fs' : ☟︎☟︎☟︎
Framedragger: so the 'matching' (index lookup) is the 99% here, right?
asciilifeform: and likewise when a new block comes, all of the tx in it, get indexed this way.
asciilifeform: ... so that a 256-bit turd, e.g., 3ec455a2a84e978687a3990cec73f36b324fbd28e03603c6d9fc52018b001558, can be taken and matched to a block # where said tx lives. ☟︎
asciilifeform: it exists (wallet idiocy aside) in trb for 1 purpose and 1 purpose only
asciilifeform: let's put down in the log, what exactly ye olde bdb does, that eats 99+% of trb's wall clock:
Framedragger: for now, just generated 1mn symlinks with names corresponding to transaction hash hex.
Framedragger: what i want to do later when i find time is, actually read file, too, of course.
asciilifeform: to rephrase, this would tell something useful if we had with what to directly compare it with -- what operation in ye olde bdb would this measure correspond to ? ☟︎☟︎
Framedragger: will get a way to test real disk soon, didn't want to run on personal trashy PC, hence shitty server
asciilifeform: and on top of what (ext4 ? )
Framedragger: asciilifeform: call to `readlink()`.
asciilifeform: does this include the actual read ?
Framedragger: orly? this is *ns* (10^-9), mind you. hm. and this is just resolution of path with single symlink in it
Framedragger spent longer than wants to admit sorting out his heap and valgrind'ing. too much python is bad for a person
asciilifeform: that's slightly slower, looks like, than the old db.. ☟︎
Framedragger: getting ~4000-7000ns for symlink resolution to real path for a 1mn symlink dir structure, e.g.
Framedragger: http://btcbase.org/log/2017-03-10#1624048 << feltbad, so wrote that stupid symlink and fs profiling tool that no-one wanted to do. results later. while at it: anyone knows if CLOCK_MONOTONIC has sufficient resolution for profiling? asciilifeform? allegedly - yes. ☝︎☟︎
deedbot: http://phuctor.nosuchlabs.com/gpgkey/0376A22D7C6DDA3C6BCF95FADF960352C32B7EC823CB87C4197DEE78BA0E4D9F << Recent Phuctorings. - Phuctored: 1580...0613 divides RSA Moduli belonging to '213.182.76.82 (ssh-rsa key from 213.182.76.82 (13-14 June 2016 extraction) for Phuctor import. Ask asciilifeform or framedragger on Freenode, or email fd at mkj dot lt) <ssh...lt>; ' (213-182-76-82.ip.welcomeitalia.it. IT)
deedbot: http://phuctor.nosuchlabs.com/gpgkey/0376A22D7C6DDA3C6BCF95FADF960352C32B7EC823CB87C4197DEE78BA0E4D9F << Recent Phuctorings. - Phuctored: 1489...1137 divides RSA Moduli belonging to '213.182.76.82 (ssh-rsa key from 213.182.76.82 (13-14 June 2016 extraction) for Phuctor import. Ask asciilifeform or framedragger on Freenode, or email fd at mkj dot lt) <ssh...lt>; ' (213-182-76-82.ip.welcomeitalia.it. IT)
trinque: connecting trb to it now
asciilifeform: trinque: you gotta log in as shown in the recipe
trinque: asciilifeform: getting rejected by dulap with that ssh key I gave ya.
mircea_popescu: i suppose that's why they have that "excluding present company" device.