jurov: can you point me to a line number? i fail to see it there
punkman: why does diff need to know about words?
jurov: asciilifeform: i see thirds of the code dealing with various "ignore whitespace" kludges
phf: jurov: so did i, and i appreciate utf-8 in my music selection or random durp i download from the internet. i wouldn't care if there's some frequent surrogate issue with track file names, but i do care that i can't have a professional tool where that sort of questions have predictable, if not necessarily easy and fun answers
jurov: also as sysadmin had to configure samba to deal with them..nopenopenope. utf8 was definite improvement.
jurov: around 2000 i had on my system files and filenames in cp852, koi8, iso8859-2 and cp1250 encodings... no thanks
phf: i personally prefer the lc_ctype=koi8-r solution, but that's because i'm a retrograde :p
phf: jurov: that's a question to ascii, as far as "better". my point is that "us-ascii" vs "utf-8" in v/vdiff has greater implications
jurov: i mean, remove extended chars from everything that goes in
jurov: otherwise you have to keep stuff 8-bit clean throughout, is that better?
phf: some hypothetical b-a os and internal consistency of the whole model. if you say "we support utf-8" here, you're also saying "we support unicode x.y" everywhere
phf: but i'm not talking about diff specifically, i'm saying that whatever decisions are being made about diff now are going to influence the kind of work we'll have to do in the future
jurov: instead of adding kludge to diff, be it extra step
jurov: you're expected to make it consistent before passing in to diff
jurov: and such things existed before, consider a="\t" vs b=" "
jurov: that's the braindamaged part, which should be dealt separately
jurov: *every* time i did anything nontrivial with #include <string.h> it ended up by drawing stuff on paper
jurov: btw, that "string" = "byte array" thing, specifically people who like to do pointer arithmetic around it caused enough grief i won't miss it at all
jurov: punkman: yes because there's whole unicode machinery implemented which ppl rightfully abhor
jurov: and "concept of string".. something has to give, in this case "string"===="byte array"
phf: i think the argument against that is that ascii is trying to build a b-a system starting with small tools out. if you're claiming to support unicode in your diff, that necessarily means that "done by other tool" is automatically included in the scope of future b-a work
jurov: the normalization, sorting, etc. would be done by other tool
jurov: diff would compare just the entities and would expect normalized both inputs the same way
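
A rough sketch of the split jurov describes here: a separate step normalizes both inputs the same way, and the diff itself only compares the resulting entities. It assumes Python's standard `unicodedata` and `difflib` as stand-ins for the "other tool" and for diff; `normalized_diff` is a hypothetical helper, not anything from v/vdiff.

```python
import difflib
import unicodedata


def normalized_diff(a_lines, b_lines, form="NFC"):
    """Normalize both inputs the same way, then diff them line by line.

    Hypothetical helper: the normalization form is the caller's choice,
    the diff step itself knows nothing about Unicode.
    """
    a_norm = [unicodedata.normalize(form, line) for line in a_lines]
    b_norm = [unicodedata.normalize(form, line) for line in b_lines]
    return list(difflib.unified_diff(a_norm, b_norm, "a", "b"))


# The same visible word, spelled two ways:
# precomposed U+00E9 versus "e" followed by combining acute U+0301.
a = ["caf\u00e9\n"]
b = ["cafe\u0301\n"]

raw_diff = list(difflib.unified_diff(a, b, "a", "b"))  # reports a change
clean_diff = normalized_diff(a, b)                     # reports nothing
```

Without the normalization step the two inputs differ code point by code point, so a plain comparison flags the line; with it, both sides collapse to the same sequence. Which form to normalize to is exactly the kind of decision phf argues gets pulled into scope.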
phf: a typical solution is then, like you said, ignore 90% of unicode standard, which means that, instead of having a clean model to work with, you have a c++ solution of "we support this particular subset of c++, because the other parts we think are braindead"
jurov: phf i proposed just ignore them
phf: jurov: variadic anywhere in your string handling machinery means that you no longer have a meaningful concept of string, i.e. nth printable character is an nth object in array, instead everything that you manipulate is a stream with a state machine attached. be that utf-8 at the encoding level, or ucs-4 at the storage level which brings in surrogates and supplementary characters
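
To make phf's point concrete, here is a small sketch, assuming well-formed UTF-8 input: finding the nth code point means scanning the byte stream from the start, and even a code point index does not give the nth printable character once combining marks are involved. `nth_codepoint_offset` is a hypothetical helper written for this illustration.

```python
def nth_codepoint_offset(raw, n):
    """Byte offset of the n-th (0-based) code point in well-formed UTF-8.

    There is no direct indexing: every byte before the target has to be
    inspected to tell lead bytes from continuation bytes (10xxxxxx).
    """
    count = -1
    for offset, byte in enumerate(raw):
        if byte & 0xC0 != 0x80:  # not a continuation byte: a code point starts here
            count += 1
            if count == n:
                return offset
    raise IndexError("fewer than n+1 code points in input")


text = "na\u00efve"            # "naive" with i-diaeresis, 5 code points
raw = text.encode("utf-8")     # 6 bytes: the diaeresis letter takes two

offset_of_v = nth_codepoint_offset(raw, 3)  # byte 4, not byte 3

# Even code points are not "printable characters":
# "e" + combining acute is one glyph on screen but two code points.
combined = "e\u0301"
```

Here `raw[3]` is the second byte of a two-byte sequence, so treating the byte array as an array of characters lands in the middle of one.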
jurov: like the msdos cases, 8bit streams were preserved, but the rest was utter mess
jurov: that shitgnomes use variadic encoding as a means to append bloat is not failure of said encoding. they'd find another outlet easily
jurov: you're basically insisting that anyone who wants to do 8bit stuff in unix should sit themselves on a stake instead, or resort to ms-dos style wankery
jurov: oh it won't behave sanely to your standards, anyway
jurov: they can be ignored.. and you supposedly ignore this argument :(
jurov: because they opted to do other encodings, nat'l sorting, normalized/unnormalized chars, etc.
jurov: and in any other cases where you don't have to choose exact glyph printed, you can ignore 99% of unicode spec
jurov: not just printed. i want to see diff working over it
phf: jurov: i'm actually just trying to bridge your position and ascii's since i'm pretty indifferent personally to outcome of the decision. i used to latex by typing funny us-ascii combinations to produce cyrillic metafont renderings, and i also am intimately familiar with what's involved in support "other encoding, nat'l sorting, normalized/unnormalized chars", and it's a pita.
jurov: well, you're bound to see your software broken by people who don't agree with you then
jurov: no need to pull huge libs
jurov: sigh. code to convert utf8 stream to wide char and back is 50 lines. with diff, one can safely ignore everything else, other encoding, nat'l sorting, normalized/unnormalized chars, etc.
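
For a sense of scale, a hand-rolled conversion between a UTF-8 byte stream and a list of code points does fit in roughly the size jurov mentions. The sketch below is written in Python rather than C, assumes strict validation per RFC 3629 (overlong forms and surrogates rejected), and uses hypothetical function names; it covers only the encoding layer, none of the normalization or sorting issues raised elsewhere in the discussion.

```python
def utf8_decode(data):
    """Decode UTF-8 bytes into a list of integer code points."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        # Lead byte decides payload bits, continuation count, smallest legal value.
        if b0 < 0x80:
            cp, extra, minimum = b0, 0, 0
        elif 0xC0 <= b0 < 0xE0:
            cp, extra, minimum = b0 & 0x1F, 1, 0x80
        elif 0xE0 <= b0 < 0xF0:
            cp, extra, minimum = b0 & 0x0F, 2, 0x800
        elif 0xF0 <= b0 < 0xF8:
            cp, extra, minimum = b0 & 0x07, 3, 0x10000
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        if i + extra >= n:
            raise ValueError("truncated sequence at offset %d" % i)
        for k in range(1, extra + 1):
            b = data[i + k]
            if b & 0xC0 != 0x80:
                raise ValueError("bad continuation byte at offset %d" % (i + k))
            cp = (cp << 6) | (b & 0x3F)
        # Reject overlong encodings, surrogates, and out-of-range values.
        if cp < minimum or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("illegal code point at offset %d" % i)
        out.append(cp)
        i += extra + 1
    return out


def utf8_encode(code_points):
    """Encode a sequence of integer code points as UTF-8 bytes."""
    out = bytearray()
    for cp in code_points:
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("illegal code point %r" % (cp,))
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes(out)
```

A diff built on top of this would compare lists of integers instead of bytes; whether that is enough is the point phf and jurov disagree on.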
phf: utf-8 is a solution if you're ok with variadic encoding (though i've read many convincing arguments that utf-8 is new jersey brain dead), but the main issue is that full support of unicode is impossible without massive amounts of machinery. and you're going to butt head against the internal storage format and all the varieties of ucs-*, and ~then~ you're going to run into issues of normalization and multiple mappings and such
phf: jurov: problem is that there's no such thing as "any text file". a text file necessarily has an encoding, and your options are either standardize on an encoding (the way plan9 did it, with going all utf-8) or else carry massive machinery for supporting a range of encodings from utf-8 to shiftjis
jurov: i understand but it does not mean it must be done that way
jurov: you know text was retarded from the very beginning
jurov: so? it is now bound to longint with special encoding
jurov: i want diff to be usable for any text files, not just code
phf: plan9's diff handles utf-8, the implicit decision here is that we're sticking to us-ascii
jurov: why shouldn't i be able to diff files with extended characters?
punkman: asciilifeform: is that because of the LC_ALL=C in vdiff?
assbot: Google threatens action against Symantec-issued SSL certificates following botched investigation | PCWorld ... ( http://bit.ly/1Mz394P )
assbot: After guilty plea, judge confused as to why prosecutors still want iPhone unlocked | Ars Technica ... ( http://bit.ly/1RF2eO5 )
shinohai: Felicidades v_diddy btw, had my head in the clouds the past few hours.
jurov: mircea_popescu: in your quest for other non-derpy irc chans, have you tried #lisp?
mircea_popescu: how do you do that from oxford, you swim the channel or something ?
mircea_popescu: THIS is why peasant chicks wistfully wisted for l'etudiant. because it is fucking fascinating, you are seriously telling me you can ~just go~ ?!?!?!
mircea_popescu: whereas the vagabond could carry all his possessions in his head!
mircea_popescu: this was immense - at the time a peasant who wanted to move, provided he somehow managed, had to fucking drag his entire possessions, on a rug. no wheels even, because too fucking techy for a peasant.