Framedragger: there's an assumption re max num of collisions here, of course, but obvs in practical terms it's a very safe assumption...
Framedragger: yeah, makes sense to me (on average, current likelihood of particular 32 bit entry being populated is ~ < 6%)
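(A back-of-the-envelope check of that ~6% figure, taking the numbers from the log: ~250 million transactions hashed into 2^32 slots. The Poisson step is my own addition, not from the log:)

```python
import math

# ~250M transactions spread uniformly over a 32-bit index space
n_tx = 250_000_000
n_slots = 2 ** 32

load = n_tx / n_slots
print(f"load factor: {load:.4f}")           # ~0.0582, i.e. < 6%

# probability a given slot holds at least one entry (Poisson approximation)
p_occupied = 1 - math.exp(-load)
print(f"P(slot occupied): {p_occupied:.4f}")
```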
Framedragger: hmh, at least functions make up not the *worst* interface seen, but still lotsa work and weird mutable shit sprayed all around, i imagine
Framedragger: yeah that's one reason i'm not too attracted to trb, tbh, the amount of sewage gruntwork required to decouple shit from the monolith.
Framedragger: i guess one can imagine a single sequence of tx then, simply.
Framedragger: it's a really Good Thing that the hashing function which spits out transaction hashes gives *uniform distribution*. no congestion / too many collisions expected, and this scheme leverages that.
Framedragger: (i see how good it is to be aware of how actual disks read data here. some theoretician would propose a pointer-exact-location scheme instead...)
Framedragger: ahh, yeah okay, back-to-back you mean exactly that, not having to allocate 1MB per block.
Framedragger: yeah i forget sometimes. fixed block length is nice for this...
Framedragger: well *that sounds like a very decent idea*. :)
Framedragger: yeah, given actual tx amounts.. 250mn vs. 2^32
Framedragger: right! ahh that's nice. (so just to clarify, the 1024 byte block trick wouldn't work if there's a collision (unless additional budget / w/e))
Framedragger: bear with my slowness, can you clarify what it looks like if there's a collision in the initial lookup?
Framedragger: have separate service taking care of that? i mean, kernel driver is this kind of 'externality', too (and also ring0)
Framedragger: here i have an ssd seek profiler which just needs root
Framedragger: at least i have the excuse of not having looked at the bdb problem / staying away from trb for the time being :p
Framedragger: is this the first time you've articulated this approach here? i think that's the best one can have for fs-tx-db
Framedragger: this is quite nice, and as you say, seek operation already gives a small chunk which should cover most/all tx for current state of affairs (total number of transactions)...
Framedragger: oh i finally understood, literally all there is when one seeks to location 3ec455a2 is a list of block numbers. (or single block number.)
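(A minimal sketch of the scheme as understood in this thread, all names hypothetical: the 32-bit hash prefix addresses a slot, the slot holds the block number(s) of every block containing a tx with that prefix, and a prefix collision is what forces trying 2 or 3 candidate blocks. A dict stands in for the on-disk index:)

```python
# toy in-memory stand-in for the on-disk index: prefix -> list of block numbers
index: dict[int, list[int]] = {}

def tx_prefix(tx_hash_hex: str) -> int:
    """First 32 bits of the tx hash, used as the slot address."""
    return int(tx_hash_hex[:8], 16)

def index_tx(tx_hash_hex: str, block_number: int) -> None:
    index.setdefault(tx_prefix(tx_hash_hex), []).append(block_number)

def candidate_blocks(tx_hash_hex: str) -> list[int]:
    # one "seek": a single slot read; a collision just means >1 candidate
    return index.get(tx_prefix(tx_hash_hex), [])

index_tx("3ec455a2" + "00" * 28, 35461)
index_tx("3ec455a2" + "ff" * 28, 40000)   # same 32-bit prefix: a collision
print(candidate_blocks("3ec455a2" + "00" * 28))  # [35461, 40000]
```

So the single seek is guaranteed for the index lookup itself; the "try 2 or 3 blocks" cost only appears when the prefix collides.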
Framedragger: why the need for "the machine might have to try 2 or 3 blocks before it finds tx" then? and if so, then no guarantee of only 1 seek?
Framedragger: asciilifeform: wait, what is "block index"? just the integer denoting block number?
Framedragger: asciilifeform: hmm, very nice. i suppose it's as close to fixed-length as is possible given current bitcoin
Framedragger: trinque: it's just a kindergarten way of wrapping up some syscalls. will obviously benchmark outside it later. i wasn't completely certain that my tool wouldn't trash the host fs. :)
Framedragger: aha, right! so it's basically a (small) hashtable.
Framedragger: (yeah btw, just ftr, symlink *creation* under populated dir structure (`ln -s files_f1/block35461.txt dc/dc89c1f2b58909d3814b250a731a9b9b791b092759553e3ba6579ffaad3a7565`) is slow. however, the creation was done using shellscript, need to move to c to be able to actually profile with precision.)
Framedragger: so the 'matching' (index lookup) is the 99% here, right?
Framedragger: for now, just generated 1mn symlinks with names corresponding to transaction hash hex.
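(The generation step, as a sketch: leaf is a symlink named after the full tx hash, parent dirs keyed on hash prefixes. Depth, chars-per-level, and all paths here are illustrative, not what was actually run:)

```python
import os
import tempfile

def link_path(root: str, tx_hash_hex: str, depth: int = 2) -> str:
    # 2 hex chars (8 bits) per directory level, leaf named by full hash
    parts = [tx_hash_hex[2 * i : 2 * i + 2] for i in range(depth)]
    return os.path.join(root, *parts, tx_hash_hex)

def store(root: str, tx_hash_hex: str, target: str) -> None:
    path = link_path(root, tx_hash_hex)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    os.symlink(target, path)   # leaf symlink points at the block file

root = tempfile.mkdtemp()
h = "dc89c1f2b58909d3814b250a731a9b9b791b092759553e3ba6579ffaad3a7565"
store(root, h, "../../files_f1/block35461.txt")
print(os.readlink(link_path(root, h)))
```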
Framedragger: what i want to do later when i find time is, actually read file, too, of course.
Framedragger: will get a way to test real disk soon, didn't want to run on personal trashy PC, hence shitty server
Framedragger: note, it's just some additional syscalls, re docker
Framedragger: orly? this is *ns* (10^-9), mind you. hm. and this is just resolution of path with single symlink in it
Framedragger spent longer than wants to admit sorting out his heap and valgrind'ing. too much python is bad for a person
Framedragger: getting ~4000-7000ns for symlink resolution to real path for a 1mn symlink dir structure, e.g.
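(A toy version of that measurement, scaled down from 1mn symlinks; numbers will differ per fs, cache state, and machine, and `os.path.realpath` adds python overhead on top of the raw syscalls:)

```python
import os
import tempfile
import time

root = tempfile.mkdtemp()
target = os.path.join(root, "real_file")
open(target, "w").close()

links = []
for i in range(1000):                    # scale toward 1mn on a real run
    p = os.path.join(root, f"link{i}")
    os.symlink(target, p)
    links.append(p)

t0 = time.perf_counter_ns()
for p in links:
    os.path.realpath(p)                  # resolves the symlink (lstat/readlink)
dt = (time.perf_counter_ns() - t0) / len(links)
print(f"~{dt:.0f} ns per resolution (warm cache, single level)")
```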
Framedragger: i like my rc airplanes. "the will of history necessitates you to X" has a marx'ified hegelian vibe :p
Framedragger: mircea_popescu: basically, and that's strictly it - because i couldn't intuitively wrap my head around the fact that average number of nodes per specific folder would be _really_ low if depth is say more than 3. still weird in my head, but yeah.
Framedragger: << (obviously these'd be more useful with actual empirical numbers of average/median seek times, writes, seek/write as things get congested, etc.)
Framedragger: (really kindergarten level simple but wanted to see this myself, could be useful for reference - unless it's incorrect..)
Framedragger: re. fs nodes, couldn't sleep + not sure if this makes sense, so just throwing these out - barebones super simplistic (function is `n_objects_to_store ^ (1/folder_depth)`) plots showing expected average number of nodes per folder (assumptions are no bias in hashspace and also equal share of hash bits per folder level) - it may not be intuitive how low the averages are until you look:
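(The formula checks out directly: with depth d and N leaves spread uniformly, the average branching factor f satisfies f^d = N, so f = N^(1/d). Plugging in the log's own numbers:)

```python
def avg_nodes_per_folder(n_objects: int, depth: int) -> float:
    # uniform hashspace, equal share of hash bits per folder level
    return n_objects ** (1 / depth)

# ~250M bitcoin transactions at depth 8:
print(avg_nodes_per_folder(250_000_000, 8))   # ~11.2 entries per folder

# inverting: objects needed before folders average 1000 entries at depth 8
print(1000 ** 8)                               # 10^24
```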
Framedragger: assuming equally distributed transaction hashspace, if you want your tree to fill up with 1000 nodes on average per given depth, you'd be storing 10^24 transactions. but this assumes that every folder depth gets assigned equal number of bits to represent, of course.
Framedragger: for symlink fs testers (or maybe selfnote for later): note that if you allow for sufficient folder tree depth, the "1000s of symlinks per dir" won't realistically happen when storing, say, bitcoin transaction hashes. the latter have 256 bits => 64 hex chars. if you allow for depth of 8 where last level (8) is symlink itself, you get 32 bits per folder level.
Framedragger: also, """But I appear to have a lingering effect that seems to have started from the time my /tmp directory had the millions of files in it.
Framedragger: yeah, okay; as long as it's not fixed-width trb-i, no way around this.
Framedragger: would this be performant enough even theoretically, given no way to use offsets?
Framedragger: (but maybe you covered that, too, and i forgot in logs.)
Framedragger: but with transactions, you can be sure that once it returns, it will have written to disk. fsync can still be on. (iirc).
Framedragger: (and if you now say 'db is lost cause anyway' while not linking to code/config *again*, i'll grit teeth angrily)
Framedragger: but i do hope you're doing the former, i mean i assumed so. that's the lowest-hanging fruit re. 'how do i do batch writes to db'
Framedragger: asciilifeform: on top of 'transactions', postgres has 'checkpoint' parameter. but you probably won't like it because of the whole 'not turning off fsync' thing
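(For reference, the knobs being alluded to, as a postgresql.conf sketch - parameter names are real postgres settings, values are illustrative and not a recommendation:)

```ini
# postgresql.conf fragment (sketch; defaults vary by version)
fsync = on                          # the 'not turning off fsync' point: WAL stays durable
checkpoint_timeout = 15min          # dirty pages flushed in bulk at most this often
checkpoint_completion_target = 0.9  # spread checkpoint I/O across the interval
```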
Framedragger: instead, it's "just" a matter of having a however-deep directory tree with symlinks as the leaves.
Framedragger: for a minute i thought (don't know why) that what is *additionally* needed is the capability to have paths of /symlinks/to/symlinks/.
Framedragger: oh wait, i phrased this incorrectly while at the same time horribly mis-reading: sorry, this is about max depth of path composed of symlinks.