Framedragger: i have no doubt that a silicon valley version would be a datacentre of mongodb nodes which constantly fail and corrupt data, and cost millions. :)
Framedragger: extracting rows identifiable by 'id > $num' also becomes super slow?
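[editor's sketch] Whether an 'id > $num' extraction is slow usually depends on whether `id` is indexed. With an indexed id, the database can seek straight to the first matching row instead of scanning. The following is a minimal sketch of that pattern (keyset pagination), assuming a hypothetical `entries(id, body)` table and using SQLite only as a stand-in; none of these names come from the log.

```python
import sqlite3

def make_demo_db(n=250):
    """Build a throwaway in-memory table (hypothetical schema, demo rows only)."""
    conn = sqlite3.connect(":memory:")
    # INTEGER PRIMARY KEY is indexed, so "id > ?" is a seek, not a full scan.
    conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO entries (id, body) VALUES (?, ?)",
        [(i, f"row {i}") for i in range(1, n + 1)],
    )
    conn.commit()
    return conn

def fetch_after(conn, last_id, limit=100):
    """Return up to `limit` rows with id > last_id, in id order."""
    cur = conn.execute(
        "SELECT id, body FROM entries WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, limit),
    )
    return cur.fetchall()

def iter_all(conn, batch=100):
    """Walk the whole table in batches, resuming from the last id seen."""
    last_id = 0
    while True:
        rows = fetch_after(conn, last_id, batch)
        if not rows:
            return
        yield from rows
        last_id = rows[-1][0]
```

A consumer that remembers the last id it saw can call `fetch_after(conn, last_id)` again later and pick up only the newer rows.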
Framedragger: there's a slave/clone db, it gets updates efficiently from master.
Framedragger: quite sure the diff'ing / updates were thought out thoroughly, i.e. time complexity is constant.
Framedragger: asciilifeform: have you profiled an http request to a permalinkable phuctor page? where's the bottleneck? curious if you could insert a thing into flask which crawls through everything and stores locally.
Framedragger: (again, many phuctor pages will simply time out, iirc. but maybe can adjust; and still worth doing.)
Framedragger: by 'it', i mean scriba submitting to archive.is
Framedragger: mircea_popescu: right, will get it done. i had started on it, got sidetracked by the python encoding problem, and then got sidetracked by other stuff. need to re-trace, and will first do the archival bit.
Framedragger: (i thought someone was supposed to retroactively archive all links in all logs of all times?)
Framedragger: asciilifeform: and the permalink pages identifiable via fingerprint, are they generated by the flask backend, too?
Framedragger: asciilifeform: why can't a separate box be set up to just crawl through all of phuctor pages, and then determine which of them are 'static' / won't ever change, for starters. and re-query the dynamic ones (at least the /phuctored) every $x amount of time
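[editor's sketch] The idea in the line above (keep 'static' pages forever, re-query dynamic ones every $x) can be sketched as a small cache with a per-page freshness rule. This is only an illustration under stated assumptions: `fetch` and `is_static` are hypothetical callables supplied by the caller, and the example path `/key/ABC` is made up; only `/phuctored` is named in the log.

```python
import time

class PageCache:
    """Cache rendered pages; static ones never expire, dynamic ones expire after a TTL."""

    def __init__(self, fetch, is_static, ttl_seconds=3600, clock=time.time):
        self.fetch = fetch          # hypothetical: path -> html string
        self.is_static = is_static  # hypothetical: path -> bool
        self.ttl = ttl_seconds      # the "$x amount of time" for dynamic pages
        self.clock = clock          # injectable so the behaviour can be tested
        self.store = {}             # path -> (fetched_at, html)

    def get(self, path):
        now = self.clock()
        hit = self.store.get(path)
        if hit is not None:
            fetched_at, html = hit
            # Static pages are served from cache forever; dynamic ones until stale.
            if self.is_static(path) or now - fetched_at < self.ttl:
                return html
        # Miss or stale: hit the backend once and remember the result.
        html = self.fetch(path)
        self.store[path] = (now, html)
        return html
```

With this shape the backend sees one request per static page in total, and at most one request per TTL window for each dynamic page, regardless of how many visitors the display box serves.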
Framedragger: that would indeed be nontrivial and would take quite a bit of your time
Framedragger: that's the question - will you have to actually do that.
Framedragger: mircea_popescu: just fyi archive.is fails and/or times out on some pages
Framedragger: asciilifeform: (instead of doing a more obscure "query-able phuctor-and-stuff db" thing i could help with some kind of phuctor-public-display-vps infrastructure setup. just sayin'. thing's not clear in my head.)
Framedragger: no doubt that's priority asciilifeform, and will patiently wait
Framedragger: i've read, i am doubly interested due to vagueness of said broadcasts :p
Framedragger: a couple of vps providers i used have nice APIs
Framedragger: with regards to vps i could help, if help/hands are needed. i know you have other priority stuff asciilifeform. also, don't know what the meta-priority level here is. (i.e., compared to other projects etc)
Framedragger: and then imagine, deploying 'display' vps would become simpler still.
Framedragger: or, caching server makes hits itself, and generates html.
Framedragger: those sibling pages, why can't they hit once, and html be generated, too.
Framedragger: have a way (rss or better version of rss, or whatever) to sync it every $n hours.
Framedragger: even if more than that - all of that shit can be cached, static html pages. maybe i'm oversimplifying.
Framedragger: i don't know why you need true real time, tbh.
Framedragger: i'm not saying that going for cheapest ad-hoc option is accetabru. just, a display box showing static content needs much less.
Framedragger: asciilifeform: wouldn't be sure re. costs. vps can be $5/$10 a month, and stuff i used for ssh key crawling (scaleway) can bill hourly
Framedragger: to the point of having a ready-made system image (no, does not imply need to use docker), deployable at vps center in a matter of minutes.
Framedragger: (by other people's sites i mean sites that i'm responsible for.)
Framedragger: because 1) other sites' experience may be impacted, and 2) phuctor db would place some load on things. why = because i'd create a few indices, those would hog some memory, and assuming users want to do quite a bit of sorting etc, would take some cpu time as well. just sayin'. nothing scientific.
Framedragger: http://btcbase.org/log/2016-11-19#1570793 << current loggotron also runs on vps, and in itself it requires very few resources. no db use, even. at this point there's a bunch of stuff and other people's sites running on that vps, i don't feel comfortable adding additional load.
Framedragger: mircea_popescu: (syn flood suspicion because sockets don't respond with anything, even when possible to establish tcp connection. and yes ping does seem to work.)
Framedragger: ben_vulpes: i'll think about this next week. it may be that it's not really needed, yeah - i agree. and integration of things such as ssh server banners with phuctor keys is something that asciilifeform said he'd be up to do on phuctor's db itself..
Framedragger: i mentioned js in relation to WoT as it's more applicable there (lots of ready-made libraries for discrete graph visualizations and so on)
Framedragger: asciilifeform: well, less so visualizations than easy ways to query and analyse data. but, i agree with mircea_popescu and yourself that concrete merits are to be discussed. maybe it's not needed.
Framedragger: and yeah, i guess one should go with svgs, mike_c used <path> and it was fine
Framedragger: regarding visualization, a more condensed question: if a javascript-using thing were delivered, would this be hated upon (and berated by asciilifeform) and accepted if otherwise good and properly maintained, or hated upon and dismissed (and berated by asciilifeform)? :)
Framedragger: ben_vulpes: oh hmm, i've never used it, very interesting and thanks for the pointer
Framedragger: re. visualization, i like stuff like this (mouse over on labels around the circle), but it's a hella lot of JS, and i share the hate towards the latter: http://bl.ocks.org/mbostock/7607999 - what's nice about btcalpha visualization is that it uses by-now standard html5 canvas directives (<path>) with no need for JS.
Framedragger: asciilifeform: at least with public-static and phuctor boxes being separate, you'd have access to the latter if it were private. (but i guess you could object with "private/undisclosed box on the internet, what is this oxymoron!")
Framedragger: i suppose the idea could be to re-implement that, but using deedbot's view of WoT, and add additional things as desired.
Framedragger: fair 'nuff. guess it depends on agreed-upon processes and overall mindsets of team, and so on...
Framedragger: mod6: sure! maybe "assignee" would have the desired (lesser) connotation, i don't know. coming from some trac feature/bug tracking in distributed teams experience, 'owner' is there interpreted as simply 'person who is ultimately responsible for implementing/fixing this', with other collaborators invited and acknowledged
Framedragger: i hear you, examining them ourselves (in some automated fashion or w/e) would have been prudent. "trust the public to do it", uh :/
Framedragger: asciilifeform: by links break you mean that they are unreachable for extended periods of time? but boxes with cached static content can be armoured against ddos more, no? i guess unless you maintain that this is indeed syn flood or equivalent, which is agnostic to whether content up the stack is dynamic or static...
Framedragger: still, the permalink-able key pages can be cached and/or served from somewhere else, no?
Framedragger: 'providing public services' does not necessarily imply 'provide them on the very box which does the important stuff'.. (though i hear you re. your 'ability to provide caching to live feed' concerns..)
Framedragger: asciilifeform: seems that i can make it open 80/tcp, but it won't give me any data
Framedragger: ($afk item #1 is "coffee grinder is a bit shit and requires force, i don't want to use force in my mornings.") :)
Framedragger: okay, i'll sit on it, meanwhile have $afk stuff to do, and i'll update next week re. what i'd like to work on, if anything.
Framedragger: that's true! this wouldn't be anything amazeballs.
Framedragger: i guess another thing is, it would be nice for #trilema folx to be able to test their own phuctor-related hypotheses without having to download all the data. but, it's not as if it's exabytes of data or anything...
Framedragger: i mean, tbh i should give asciilifeform those banners. VPS DB idea can wait.
Framedragger: asciilifeform: i know - i hope to have time next Wed to look into this
Framedragger: i suppose a more simplistic thing to do would be for me to fish out those banners, convert into decent format if needed, and to give to whoever wants :)
Framedragger: again i screwed up re. phrasing. i'd import the banners from the data i have.
Framedragger: well, you wanted to run, say, DISTINCT on banners. it sure would be great to do it in a non-hacky way, and to allow others to do the same, no?
Framedragger: so there would be no decision making needed as regards user friendliness / abstract buttons
Framedragger: mircea_popescu: the first 'portal' would allow users to run read-only sql queries.
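[editor's sketch] A read-only query portal of the kind described could look roughly like the following. The log proposes postgresql with a read-only role; this sketch substitutes SQLite and its `PRAGMA query_only` switch purely so the example is self-contained. The `banners(host, banner)` table and its rows are invented demo data, not phuctor's schema.

```python
import sqlite3

def make_demo_db():
    """In-memory stand-in for the imported data (hypothetical table, made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE banners (host TEXT, banner TEXT)")
    conn.executemany(
        "INSERT INTO banners VALUES (?, ?)",
        [
            ("host-1", "SSH-2.0-example-A"),
            ("host-2", "SSH-2.0-example-A"),
            ("host-3", "SSH-2.0-example-B"),
        ],
    )
    conn.commit()
    return conn

def run_readonly(conn, sql):
    """Run one user-supplied statement with writes disabled for its duration."""
    conn.execute("PRAGMA query_only = ON")  # SQLite rejects data changes while set
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.execute("PRAGMA query_only = OFF")
```

For instance, `run_readonly(conn, "SELECT DISTINCT banner FROM banners ORDER BY banner")` returns the two distinct demo banners, while an `INSERT` passed through the same function raises an error instead of modifying the table.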
Framedragger: mircea_popescu: yes, if by pipe you mean, bulk-import into VPS, not torture phuctor's own db 24/7.
Framedragger: i'm unclear what to do with 'live data feed', but i don't think it's a problem. it could just sync with phuctor at timed intervals.
Framedragger: ah - i wasn't clear. db would live on this VPS.
Framedragger: mircea_popescu: by db i mean a postgresql populated with phuctor data.
Framedragger: second 'portal' would be more simplistic and would, first off, be a simple way for me to present some non-gimmicky visualizations, e.g. what asciilifeform suggested some months ago.
Framedragger: hosts the phuctor data - p, q, e - and metadata - including country codes from geoip where applicable.
Framedragger: there would be, as per my current plan, two 'portals'. one would expose the db (postgres). current plan: this would use phppgadmin. it's maintained and stable. user would make use of a read-only db role. so you could run sql queries on the whole thing. "whole thing" = sql schema which
Framedragger: okay - i wanted not to over-commit, but this is indeed pretty damn vague. will think and amend.