mircea_popescu: wait, putin bombed his own loyal subjects ? what is he supposed to be, bahamas ?
a111: Logged on 2017-04-05 18:06 Framedragger: quite sure that it's possible to dump dom in selenium but in any case, yes dumping it seems like a prerequisite, like what archive.is does and what phf said above
phf: http://btcbase.org/log/2017-04-05#1638105 << dumping dom is not hard (it's innerhtml attribute on document.body/head), extracting images/objects is harder. used to be able to do it using canvas, but now there's security provisions. you literally have to go into the browser's cache (via internal apis) to get the graphic, etc. back. or else redownload, which is rife with potential issues
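The "or else redownload" route phf mentions starts with knowing which embedded resources a dumped DOM refers to. A minimal sketch of that listing step, assuming the DOM has already been serialized to a string; the `ResourceCollector` class, the `collect_resources` helper and the tag/attribute subset are hypothetical names chosen for illustration, and the actual refetch (with the issues phf warns about, such as the server returning something different the second time) is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collects absolute URLs of embedded resources from a DOM dump."""

    # (tag, attribute) pairs treated as embedded resources.
    # An illustrative subset, not a complete list.
    SOURCES = {("img", "src"), ("object", "data"),
               ("embed", "src"), ("link", "href")}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <img ... />.
        for name, value in attrs:
            if value and (tag, name) in self.SOURCES:
                absolute = urljoin(self.base_url, value)
                if absolute not in self.urls:  # keep first-seen order, no repeats
                    self.urls.append(absolute)


def collect_resources(dom_html, base_url):
    """Return the resource URLs referenced by dom_html, resolved against base_url."""
    parser = ResourceCollector(base_url)
    parser.feed(dom_html)
    parser.close()
    return parser.urls
```

Resolving against the page URL matters because a dumped DOM keeps relative paths; without the base, the list could not be refetched later.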
CompanionCube: why would putin bomb their own place like that
asciilifeform: before the corpses cold, even.
asciilifeform: sop narrative, the pantsuit folx are automagically 'putler did it'
Framedragger: mircea_popescu: lenin returned to russia 100 years ago in april. COINCIDENCE?
asciilifeform: major wank there, also
Framedragger: i wonder if "can run archive requests on 'uncleaned' (i.e.: already possibly infected) VM" could be allowed for. it's not exactly a gpg-signed-msg timestamping service.
mircea_popescu: more like being the whore.
asciilifeform: in the end you're just as dead.
asciilifeform: yes, will take you 9001 shots for the roulette revolver to fire.
Framedragger: asciilifeform: do you mean that "single archive request handled by dedicated process which terminates at end of request" wouldn't be enough in terms of cleanup? (due to js as vector of attack to machine?)
asciilifeform: the loading per se, with civilian browser + e.g., selenium, is trivial. the ~cleanup~ (run in vm? restore state how ?) is the tricky bit.
mircea_popescu: danielpbarron for the record it's snippet not snippit
mircea_popescu: danielpbarron what format is the mysql database stored as, sql ? its native binary whatever it is ?
Framedragger: quite sure that it's possible to dump dom in selenium but in any case, yes dumping it seems like a prerequisite, like what archive.is does and what phf said above
mircea_popescu: part of the problem being that common browser instructed to "Save page" will proceed to save the html rather than the dom, or however you call the ast in www.
mircea_popescu: i suppose in principle could exist as a "proper archival browser extension". i suppose very close to what ppl like browsershots etc do
Framedragger: (and that use case existing is good signal in this case)
asciilifeform: it's only a 'remote control' for traditional www browser.
Framedragger: CompanionCube: that's just one use case, to be clear
asciilifeform: i have a cl interface to selenium written from some years ago, i used it to cheat spammers
Framedragger: asciilifeform: tru.
Framedragger: worx. (heavier than scrapy in that can handle 'need js to click on 'next' button' logic)
CompanionCube: doesn't seem that good of a fit
CompanionCube: isn't that for automatic website testing
mircea_popescu: Framedragger would be nice, but conceivably you'd be stuck bolting js to links or such
mircea_popescu: else any random rustard can throw a wrench in your whole thing. "gotta support fdlkgjkldfjl!!~11"
mircea_popescu: yeah. it sounds like the right thing but in the process of so sounding hides under a welded shut hood all the design trade-offs which'll need to be made.
asciilifeform: 'deedbot for arbitrary www links' sounds like the right thing.
mircea_popescu: i suppose this is part of the problem of "wtf were the design goals again ?" -- swiss knife syndrome, many things for many people at many different times.
mircea_popescu: but the majority of those webpages were republican rather than imperial.
mircea_popescu: at least to my eyes, archiving to date was there to provide a sort of cheaper and lighter deedbotting for webpages.
asciilifeform: that's largely what archivetron was ~for~, to date, neh
mircea_popescu: honestly i'm not even sure i actually want to archive, eg, fake news sites.
doppler: ^-- that's what I was referring to.
asciilifeform: the use of 'flash' to output ~text~ articles -- has, afaik, mercifully, died.
mircea_popescu: well i didn't want to archive "artists" flash bs in 2007 either
asciilifeform: mircea_popescu wants to archive browser-tetris ?!
mircea_popescu: doppler flash is still live and well in the "browser games" niche. heck, adobe recently made new linux flash (after years of stfu)
CompanionCube: asciilifeform: problem is that ipads will display different to desktop
asciilifeform: precisely, it's why asciilifeform has not touched the problem.
mircea_popescu: im not even sure wtf the design goals would be, which is why we're wallowing
doppler: I'm so glad flash player sites are mostly gone now, but JS has just taken its place really
asciilifeform: it's the only way to 'perma-archive EXACTLY what plebes see'. supposing that is a design goal.
CompanionCube sometimes wishes we could banish entirely-JS 'websites' from the face of the earth
asciilifeform: 'and, bonus! evil TERRORISTS! can't archive it!'
mircea_popescu: half literate fucktards, all typefaces are new to them all the time.
mircea_popescu: "why THE FUCK do you want to mess with the fonts" "because we truly have nothing to do with our time" "Do you understand reading speed decreases 3x if the reader has to deal with unfamiliar typefaces ?" "uuga booga"
mircea_popescu: (this happens a lot more than anyone sane would on his own power imagine)
phf: right, not to mention competing rendering engines ("system wide library doesn't kern glyphs the way our ui designer thinks is appropriate so we do it ourselves!!1")
mircea_popescu: it's the way of the future ; everything is connected etc.
asciilifeform: 'wanna know the output? run this massive turing-complete barrel of shit'
phf: a sentence like "hello world this is test" might get an invocation like render("world this") followed by render("hello") followed by render("is test"), simply because higher level widget engine decided that's the order of exposure, or hierarchy, or whatever
mircea_popescu: phf provided eg "java runtime fonts" aren't there on top of x11-fonts etc
phf: asciilifeform: there's ~one~ true type library, but it's called at random times to render a small part of the page, so by simply following the invocations you won't be able to reconstruct how the individual results fit into the on-screen
mircea_popescu: sorry i can't hear you over the sound of hillary clinton's pantsuit.
doppler: what about the website-viewer's right to make their own decisions?
mircea_popescu: asciilifeform not the case anymore afaik.
mircea_popescu: doppler but you know, when people get self-determination and the right to make their own decisions, everything improves.
asciilifeform: mircea_popescu: actually i have a fairly accurate idea. but last time i opened the reactor cover, at least there was ~1~ copy of truetype shitfest running per box...
doppler: too bad web developers ever received the power to control the rendering of their sites so closely
Framedragger: mircea_popescu: not familiar / wouldn't know. my exposure to the whole thing was literally just "found relevant library; hop on irc to ask the author; chat for a while; realise he's blind; ask about his experience"
phf: but it doesn't invoke truetype ~as one last pass~. instead you have fifty different truetype invocations to lay out a small surface, that's placed into a hierarchy of such surfaces
mircea_popescu: phf> state of the art for blind folx is miserable. << about same as in 1997.
asciilifeform: so you catch it there.
asciilifeform: phf: at some point it invokes the truetype engine, neh
phf: asciilifeform: that is not though how rendering works on "modern os". unlike x there's no central authority on glyph rendering. instead you have layers of surfaces that each app manages on its own.
asciilifeform: and fuck the display.
asciilifeform: anything that asks a glyph to be drawn on the raster display -- instead ends up creating a 'haha, glyph G was asked for' record.
phf: basically agent on top of internet explorer/others that, with a specially annotated page (ARIA standard) can make the experience usable
Framedragger: i can ask one such folk ('camlorn on #libaudioverse - has his own 3d audio library, competent at what he does). i recall him explaining the shitshow that was getting cs degree by translating cs paper pdf (horror)
asciilifeform: what if you were to use a patched glyph renderer ?
phf: state of the art for blind folx is miserable.
asciilifeform: could try picking up ~that~, and reusing.
asciilifeform: i wonder what the state of the art for blind folx is.
mircea_popescu: asciilifeform the ocr idea is miserable, sadly, because so many glyphs and nonsense.
mircea_popescu: asciilifeform> ideally you want it running in a qemu-like thing with randomly-generated instruction set. << this may be overkill. could as well run it in a rom-os machine or something.
asciilifeform: but consequently suffers other problems (it is trivially blocked, and is in fact trivially distinguishable in real time from 'civilian' browser)
phf: doesn't have to contain rather. i'm sure they don't scrape it diligently enough.
asciilifeform: yeah, i suspect that it is built along the lines described earlier
Framedragger: asciilifeform: as phf said, archive.is output does not contain js. just to clarify.
asciilifeform: and preferably in a designated public toilet
asciilifeform: the js, if it executes (and to archive heathenry, it generally must) must execute ONCE.
mircea_popescu: ah i see alf addressed. it's an exercise in typical expertsexchange wankery, total misunderstanding of engineering etc.
Framedragger: phf: ah, but i meant the initial rendering phase - the 'archive this plz' process itself. but thanks for clarifying yeah
asciilifeform: (today it calls load_party_line_from_washinton() and it gets one thing; tomorrow -- another)
asciilifeform: i.e. a js-containing turdball is not self-contained
asciilifeform: and not merely from the 'js 0day' pov !!
phf: Framedragger: yes, archive.is is a headless webkit. it loads the page with all the resources, it lets the javascript run until the DOM is in some "final" state, it snapshots the DOM. at this point you no longer need javascript to further render page
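Once the DOM has been snapshotted, the javascript can be dropped from the stored copy, which is how the archived output ends up containing no js. A minimal sketch of that last step only, assuming the snapshot is already available as an HTML string; `ScriptStripper` and `strip_scripts` are hypothetical names, this is not archive.is's actual code, and the sketch ignores `javascript:` URLs and silently drops comments and the doctype.

```python
from html import escape
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-serializes a DOM dump, dropping <script> elements and on* handlers."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.in_script = False
        self.in_style = False

    def _attrs(self, attrs):
        parts = []
        for name, value in attrs:
            if name.startswith("on"):  # inline handlers such as onclick, onload
                continue
            parts.append(f" {name}" if value is None
                         else f' {name}="{escape(value, quote=True)}"')
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            return
        if tag == "style":
            self.in_style = True
        self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        if tag != "script":
            self.out.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
            return
        if tag == "style":
            self.in_style = False
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_script:
            return
        # CSS inside <style> is raw text and must not be entity-escaped.
        self.out.append(data if self.in_style else escape(data, quote=False))


def strip_scripts(dom_html):
    """Return dom_html with script elements and inline event handlers removed."""
    parser = ScriptStripper()
    parser.feed(dom_html)
    parser.close()
    return "".join(parser.out)
```

The stripping happens after rendering, so it does nothing for the machine that ran the js in the first place; that cleanup problem is separate.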
mircea_popescu: ben_vulpes it's not measurably bad. theoretically it weakens the problem of "spend someone else's inputs".
asciilifeform: i don't actually know of a sane way to do this job. which is why i have not ever attempted to do it.
asciilifeform: this would solve the 'blocked ip' problem. but at the same time introduce new ones.
asciilifeform: which l1 folx can then sign and upload.
phf: Framedragger: you don't have to expose the user to js with archive.is approach
asciilifeform: Framedragger: imho a more reasonable thing would be a command line thing, a kind of augmented curl, that yields a tarball
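The "augmented curl that yields a tarball" idea can be sketched for its packing half. This assumes the page and its resources have already been fetched into memory as bytes; `build_archive` and the `MANIFEST.json` entry are hypothetical choices for illustration, not anything specified in the discussion. The sha256 manifest is there so that a signer has one small, stable thing to check.

```python
import hashlib
import io
import json
import tarfile


def build_archive(resources):
    """Pack {name: bytes} into an in-memory tar plus a sha256 manifest.

    A caller-supplied entry named MANIFEST.json would be overwritten.
    """
    manifest = {name: hashlib.sha256(body).hexdigest()
                for name, body in resources.items()}
    entries = dict(resources)
    entries["MANIFEST.json"] = json.dumps(
        manifest, indent=2, sort_keys=True).encode("utf-8")

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(entries):  # fixed order, independent of dict order
            data = entries[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0  # fixed timestamp so identical input gives identical bytes
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Fixing the entry order and timestamps means two people archiving the same bytes get the same tarball, so a signature over it is comparable across runs.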
Framedragger: so, if tmsr were to have its own archiver, i don't think archive.is' approach is the way to go, even though it is (arguably, maybe) the most 'reliable' / 'true' (actual js rendering in browser). exposing user to JS defeats half its purpose. imho.
phf: also, these days you have js packing frameworks (like browserify) doing frontend on-demand js loads, which means that ~browser~ doesn't know when page load is complete. you get "lightweight" pages, that don't actually have anything until 8mb of javascript gets its shit together.
Framedragger: but this web