asciilifeform: the reason is that every time the surgeon begins to cut, he sees 'inoperable tumour'
asciilifeform: as in, cutting all the shitgnomatic crud until we have something like the lions book again ?
asciilifeform: why do you think no one has to date succeeded in 'bonfire of the unix' ?
jurov: can you point me to a line number? i fail to see it there
asciilifeform: i suppose this is not a bad time for one of my usual batshit monologues:
asciilifeform: jurov: a good chunk of that complexity is on account of O(N) operations on strings made necessary by variads
punkman: why does diff need to know about words?
asciilifeform: punkman: it could work if no one were to ever use anything else. but the distinction between 'this is a binary blob and is to be cut into octets' vs 'this is a text blob and is to be cut into words' would remain
jurov: asciilifeform: i see a third of the code dealing with various "ignore whitespace" kludges
asciilifeform is l0lzing over how we are replacing the asciilifeform vs mircea_popescu debate on udp, with jurov playing the part of asciilifeform and asciilifeform playing the part of mircea_popescu
asciilifeform: all of them traceable to variadic chars
asciilifeform: where we learned that yes, it is retarded in 1,001 ways
asciilifeform: and yes, jurov found the diff thread
asciilifeform: http://git.savannah.gnu.org/cgit/diffutils.git/tree/src/util.c << a good chunk of the complexity IS THERE BECAUSE characters are EVER variadic
jurov: http://log.bitcoin-assets.com//?date=20-08-2015#1245519 there is i18n which behaves as described here
asciilifeform: punkman: this is disingenuously true
phf: jurov: so did i, and i appreciate utf-8 in my music selection or random durp i download from the internet. i wouldn't care if there's some frequent surrogate issue with track file names, but i do care that i can't have a professional tool where that sort of questions have predictable, if not necessarily easy and fun answers
jurov: also as sysadmin had to configure samba to deal with them..nopenopenope. utf8 was definite improvement.
jurov: around 2000 i had on my system files and filenames in cp852, koi8, iso8859-2 and cp1250 encodings... no thanks
phf: i personally prefer the lc_ctype=koi8-r solution, but that's because i'm a retrograde :p
phf: jurov: that's a question to ascii, as far as "better". my point is that "us-ascii" vs "utf-8" in v/vdiff has greater implications
jurov: i mean, remove extended chars from everything that goes in
jurov: otherwise you have to keep stuff 8-bit clean throughout, is that better?
phf: some hypothetical b-a os and internal consistency of the whole model. if you say "we support utf-8" here, you're also saying "we support unicode x.y" everywhere
phf: but i'm not talking about diff specifically, i'm saying that whatever decisions are being made about diff now are going to influence the kind of work we'll have to do in the future
jurov: instead of adding kludge to diff, be it extra step
jurov: you're expected to make it consistent before passing in to diff
jurov: and such things existed before, consider a="\t" vs b=" "
jurov: that's the braindamaged part, which should be dealt separately
jurov: *every* time i did anything nontrivial with #include <string.h> it ended up by drawing stuff on paper
jurov: btw, that "string" = "byte array" thing, specifically people who like to do pointer arithmetic around it caused enough grief i won't miss it at all
jurov: punkman: yes because there's whole unicode machinery implemented which ppl rightfully abhor
jurov: and "concept of string".. something has to give, in this case "string"===="byte array"
phf: i think the argument against that is that ascii is trying to build a b-a system starting with small tools out. if you're claiming to support unicode in your diff, that necessarily means that "done by other tool" is automatically included in the scope of future b-a work
jurov: the normalization, sorting, etc. would be done by other tool
jurov: diff would compare just the entities and would expect normalized both inputs the same way
phf: a typical solution is then, like you said, ignore 90% of unicode standard, which means that, instead of having a clean model to work with, you have a c++ solution of "we support this particular subset of c++, because the other parts we think are braindead"
jurov: phf i proposed just ignore them
phf: jurov: variadic anywhere in your string handling machinery means that you no longer have a meaningful concept of string, i.e. nth printable character is an nth object in array, instead everything that you manipulate is a stream with a state machine attached. be that utf-8 at the encoding level, or ucs-4 at the storage level which brings in surrogates and supplementary characters
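A minimal sketch of the point phf makes above: once the encoding is variable-width, "nth character" stops being "nth element of the array" and has to be found by scanning. The sample string and the helper name `nth_char_offset` are assumptions chosen for illustration; nothing here comes from diffutils or vdiff.

```python
# Illustration only: sample data and helper name are hypothetical.
text = "naïve"                 # 5 code points
raw = text.encode("utf-8")     # 'ï' needs two bytes, so 6 bytes in total


def nth_char_offset(data: bytes, n: int) -> int:
    """Return the byte offset where the n-th code point (0-based) starts.

    UTF-8 continuation bytes match 0b10xxxxxx; any other byte begins a
    new code point. The only way to find character n is to walk the
    bytes from the start, which is an O(N) scan rather than an index.
    """
    seen = -1
    for offset, byte in enumerate(data):
        if byte & 0xC0 != 0x80:    # not a continuation byte
            seen += 1
            if seen == n:
                return offset
    raise IndexError(n)


# 'v' is character 3 but sits at byte offset 4, because 'ï' took two bytes.
v_offset = nth_char_offset(raw, 3)
```

With a fixed-width encoding the same lookup would be `data[n]`; here every positional operation pays for the scan.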
jurov: like the msdos cases, 8bit streams were preserved, but the rest was utter mess
jurov: that shitgnomes use variadic encoding as a means to append bloat is not failure of said encoding. they'd find another outlet easily
asciilifeform: and 10-20x as many gnarly crevices in which to hide off-by-ones, etc
asciilifeform: but aside from this, it makes for 10-20x the lines of code for me to read.
jurov: you're basically insisting that anyone who wants to do 8bit stuff in unix should sit themselves on a stake instead, or resort to ms-dos style wankery
asciilifeform: ever wonder what is all of this RUBBISH that is in a typical unix that ISN'T found in the lions book ?
assbot: Logged on 22-10-2014 02:38:31; asciilifeform: othernubs`: see ubiquitous 'lions book' that we're imitating: http://v6.cuzuco.com/v6.pdf
asciilifeform: jurov: do you own a copy of the lions book ?
asciilifeform: while a unix with weird shit hanging off the side is not a woman, but a bloke with cock lopped off
asciilifeform: it is a severe impedance mismatch with the unix philosophy.
jurov: oh it won't behave sanely to your standards, anyway
jurov: they can be ignored.. and you supposedly ignore this argument :(
asciilifeform: jurov: so whatchawaiting for, write us a sane diff that behaves sanely on a typical unix box ?
jurov: because they opted to do other encodings, nat'l sorting, normalized/unnormalized chars, etc.
asciilifeform: jurov: go and find the src for gnu diff (it is in the logs) and see how much of the bulk and library load garbage is on account of non-latin.
jurov: and in any other cases where you don't have to choose exact glyph printed, you can ignore 99% of unicode spec
asciilifeform: jurov: the hard part is not diffing (diff readily diffs binaries)
jurov: not just printed. i want to see diff working over it
phf: jurov: i'm actually just trying to bridge your position and ascii's since i'm pretty indifferent personally to outcome of the decision. i used to latex by typing funny us-ascii combinations to produce cyrillic metafont renderings, and i also am intimately familiar with what's involved in supporting "other encoding, nat'l sorting, normalized/unnormalized chars", and it's a pita.
jurov: well, you're bound to see your software broken by people who don't agree with you then
asciilifeform: i favour the old msdos solution instead
asciilifeform: as a native of what latinate folks regard as likewise a 'hieroglyphs' language, i STILL favour the abolition of 'hieroglyphs display on EVERY box'
jurov: no need to pull huge libs
jurov: sigh. code to convert utf8 stream to wide char and back is 50 lines. with diff, one can safely ignore everything else, other encoding, nat'l sorting, normalized/unnormalized chars, etc.
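A rough sketch of the kind of converter jurov describes, written in Python rather than C so it can be checked quickly. The function names `utf8_decode` and `utf8_encode` are hypothetical. It only maps bytes to code points and back; as an assumption of this sketch it does not reject overlong forms or surrogate code points, and it does none of the normalization or sorting the thread argues about.

```python
def utf8_decode(data: bytes) -> list:
    """Decode UTF-8 bytes into a list of integer code points."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                      # 0xxxxxxx: one byte
            cp, extra = b, 0
        elif b & 0xE0 == 0xC0:            # 110xxxxx: two bytes
            cp, extra = b & 0x1F, 1
        elif b & 0xF0 == 0xE0:            # 1110xxxx: three bytes
            cp, extra = b & 0x0F, 2
        elif b & 0xF8 == 0xF0:            # 11110xxx: four bytes
            cp, extra = b & 0x07, 3
        else:
            raise ValueError(f"bad lead byte at offset {i}")
        if i + extra >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for j in range(1, extra + 1):
            c = data[i + j]
            if c & 0xC0 != 0x80:          # must be 10xxxxxx
                raise ValueError(f"bad continuation byte at offset {i + j}")
            cp = (cp << 6) | (c & 0x3F)
        out.append(cp)
        i += extra + 1
    return out


def utf8_encode(code_points) -> bytes:
    """Encode an iterable of integer code points as UTF-8 bytes."""
    out = bytearray()
    for cp in code_points:
        if cp < 0:
            raise ValueError(f"code point out of range: {cp}")
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        elif cp < 0x110000:
            out += bytes([0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:
            raise ValueError(f"code point out of range: {cp}")
    return bytes(out)
```

The conversion itself is indeed small. What it leaves open is the other side of the argument: two inputs can decode cleanly and still differ as code-point sequences while rendering identically.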
phf: utf-8 is a solution if you're ok with variadic encoding (though i've read many convincing arguments that utf-8 is new jersey brain dead), but the main issue is that full support of unicode is impossible without massive amounts of machinery. and you're going to butt head against the internal storage format and all the varieties of ucs-*, and ~then~ you're going to run into issues of normalization and multiple mappings and such
asciilifeform: but we haven't such a thing.
asciilifeform: now, a system with built-in notion of bignums (variable-length integers), e.g., lispm, does not suffer from any of this gnarl.
asciilifeform: and the variadic encoding is intrinsically gnarly
asciilifeform: phf: even utf8 is a non-solution because there remain multiple glyph mappings
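To make "multiple glyph mappings" concrete, here is a small sketch using Python's standard `unicodedata` module. The choice of "é" as the sample and the helper name `same_after_normalizing` are assumptions for illustration. The same visible glyph has two valid code-point spellings, so a byte-wise or code-point-wise compare reports a difference unless both inputs were normalized the same way first.

```python
import unicodedata

composed = "\u00e9"      # 'é' as a single precomposed code point
decomposed = "e\u0301"   # 'e' followed by a combining acute accent

# Both are valid UTF-8, yet the byte sequences differ.
composed_bytes = composed.encode("utf-8")      # b'\xc3\xa9'
decomposed_bytes = decomposed.encode("utf-8")  # b'e\xcc\x81'


def same_after_normalizing(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after applying the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


naive_equal = composed == decomposed                           # False
normalized_equal = same_after_normalizing(composed, decomposed)  # True
```

Normalizing requires the Unicode character database, which is the "machinery" both sides of the thread are arguing over where to put.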
asciilifeform: it is quite basic to the concept, to the extent there ever was a concept behind it.
phf: jurov: problem is that there's no such thing as "any text file". a text file necessarily has an encoding, and your options are either standardize on an encoding (the way plan9 did it, with going all utf-8) or else carry massive machinery for supporting a range of encodings from utf-8 to shiftjis
asciilifeform: it goes to the heart of how unix works
jurov: i understand but it does not mean it must be done that way
asciilifeform: jurov: witness what unix proggies that try to be 'special encoding'-aware look like.
jurov: you know text was retarded from the very beginning
jurov: so? it is now bound to longint with special encoding
asciilifeform: but it appears that all of the magic in the world did not suffice to retrofit a non-retarded notion of text onto unix
asciilifeform: see, if symbolics corp. had won the war and the notion of 'character' was not braindamagedly bound to 'octet' - we would not be having this conversation.
asciilifeform: layering workarounds into the canon is precisely how we ended up with ball of shit.
jurov: i want diff to be usable for any text files, not just code
asciilifeform: it is what civilized people use when they program a computer.
phf: plan9's diff handles utf-8, the implicit decision here is that we're sticking to us-ascii
asciilifeform: jurov: there are specially-made tools for this (see the screenshots on mircea_popescu's www in the phuctor 3-parter article re: what i used)
jurov: why shouldn't i be able to diff files with extended characters?
asciilifeform: funny how we keep having this thread.
asciilifeform: no such thing in there
punkman: asciilifeform: is that because of the LC_ALL=C in vdiff?
asciilifeform: where do i meet all of these mythical people who own a computer, use gpg, but do not know enough english to operate it without this garbage ?
asciilifeform: (all over the place)
asciilifeform: and holy fuck, enough utf8 hieroglyphisms to kill five elephants
asciilifeform: ;;later tell mircea_popescu didja know that it is impossible to produce a v-genesis for gnupg-1.4.10 ? it contains unprintable turds, yes ! e.g., the idiot 'localizators', po/*.gmo
assbot: Google threatens action against Symantec-issued SSL certificates following botched investigation | PCWorld ... ( http://bit.ly/1Mz394P )
asciilifeform: 'Wikileaks is not offering a search of Cryptome - the files are hosted on their server as a honeypot for snatching user data for who knows what. There are subtle sneaky differences that give it away...'
assbot: After guilty plea, judge confused as to why prosecutors still want iPhone unlocked | Ars Technica ... ( http://bit.ly/1RF2eO5 )
shinohai: Felicidades v_diddy btw, had my head in the clouds the past few hours.
jurov: mircea_popescu: in your quest for other non-derpy irc chans, have you tried #lisp?
mircea_popescu: england wasn't part of the party, not at that time.
mircea_popescu: how do you do that from oxford, you swim the channel or something ?
mircea_popescu: THIS is why peasant chicks wistfully wisted for l'etudiant. because it is fucking fascinating, you are seriously telling me you can ~just go~ ?!?!?!
mircea_popescu: whereas the vagabond could carry all his possessions in his head!
mircea_popescu: this was immense - at the time a peasant who wanted to move, provided he somehow managed, had to fucking drag his entire possessions, on a rug. no wheels even, because too fucking techy for a peasant.