mircea_popescu: "we dunno wtf happened, i guess to a certain mind events centered on jan 13th suspicious as fuck, we have a new kernel and we redid the bios just in case, dunno what more can be possibly done" is, or atleast i guess will have to be, acceptable.
mircea_popescu: do you suppose bios could benefit from a reflash ? if nothing else, to have thermal / crash logging as expected ?
a111: Logged on 2019-01-14 16:48 asciilifeform: after we get to the bottom of UY1 issue, i'ma make sure that all iron owned by pizarro has asciilifeform-baked kernel in place.
mircea_popescu: i recall folks asking, and you saying ok, rather than "you know what... i don't even recall who made this kernel, maybe i remake it when i have time before oking this"
mircea_popescu: quite well snapped to the peculiar idiocy of a certain band of peculiar idiots.
mircea_popescu: but yes, i run systems which had 0 unexpected reboots, and i've thrown out components / replaced / redesigned systems over unexpected reboots.
mircea_popescu: not proposing this is foolproof or anything ; not strictly speaking ~impossible~ thermal trip went unnoticed.
mircea_popescu: you'd just notice box went from spewing 50s to spewing 55s suddenly. or w/e.
mircea_popescu: there's a discrete flow of air from each rack that they measure.
BingoBoingo: asciilifeform: I asked. If you have some targeted questions I will be happy to ask them.
mircea_popescu: same fucking thing is the case in ~every dc i ever saw, there's a line of sensors above the racks, and can tell whether box is working 30s, 40s or 70s
mircea_popescu: you ever been in one of those parking lots where they have devices telling you how many free spots per isle/level, and red/green light above the individual spots ? without, magically, having a rod up your driver's ass.
BingoBoingo: <mircea_popescu> i ~also~ find it peculiar your dc wouldn't have alerted you in case of thermal trip. because in general they have sensors. << There was a ground fault alarm tripped in the datacenter's fire suppression system over the weekend, but the time doesn't line up with the beginning of this reset crisis.
mircea_popescu: i suppose this is an academic discussion. in any case, i've had warnings re hot boxes from dcs before, it's not a wholly unheard of item. laser sensor costs ~nothing, and hvac management is 60% of what they do for a living.
BingoBoingo: I took off a light layer of particulate. When I opened the chassis I found it matched the photos from when the FG were installed.
mircea_popescu: the isle cooler tends to notice if rack x is spewing out 200C
mircea_popescu: i ~also~ find it peculiar your dc wouldn't have alerted you in case of thermal trip. because in general they have sensors.
mircea_popescu: photo or no photo. did you take your own weight in gunk out of the fans over there ?!
mircea_popescu: and then, when i went to clean it, i found it dirty ; as opposed to clean.
mircea_popescu: asciilifeform i did. let me tell you how it behaved : box went down. upon reboot it went down again, in the following manner : every time it was rebooted, within a finite time interval (bout an hour). no exceptions.
mircea_popescu: asciilifeform this observation merely begs the question of "why".
mircea_popescu: indulge me. so the theory goes that an event with a probability inferior to 1e-4 / day occurred three times in two days ?
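For scale, the odds in that question can be put into numbers. What follows is only a rough sketch under assumptions that are not in the log: the failures are treated as independent Poisson events, the quoted 1e-4 / day upper bound is used as the rate, and the helper name `poisson_tail` is made up for illustration.

```python
import math

def poisson_tail(rate_per_day, days, k_min, terms=30):
    """P(at least k_min events in `days` days), assuming a Poisson process.

    The tail is summed term by term rather than computed as 1 - CDF,
    because the result is tiny and the subtraction would lose precision.
    """
    lam = rate_per_day * days  # expected number of events in the window
    return sum(
        math.exp(-lam) * lam**k / math.factorial(k)
        for k in range(k_min, k_min + terms)
    )

# Assumed inputs: rate 1e-4 per day (the bound quoted above), a 2 day window,
# three or more events.
p = poisson_tail(1e-4, 2, 3)
print(f"P(>=3 events in 2 days) ~ {p:.2e}")
```

Under those assumptions the result is on the order of 1e-12, which is the point of the question: three independent rare failures in two days is not a plausible reading, so a common cause is the more likely explanation.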
mircea_popescu: ie, a fan stopped by itself, and then started working again, by itself ?
mircea_popescu: (i think the third reboot was actually you guise, or not ?)
mircea_popescu: so to get this straight, your "most likely explanation" points to... ram failure resulting in kernel panic... twice ?
mircea_popescu: i never saw a mobo bust a cap and then boot by itself again tho. besides, he'd see a busted cap i imagine.
mircea_popescu: weren't you shipping a bunch of new rams to make it ?!
BingoBoingo: From what I understand the ram came with the chassis
mircea_popescu: i'm confused, you bought used ram ? i seem to recall a discussion...
mircea_popescu: outside of hard drives, and capacitors on OVER FIFTEEN YEAR OLD motherboards, i have not witnessed this wonder myself, of failing hardware.
mircea_popescu: i confess i have nfi what makes you think commodity hardware failed in this case.
mircea_popescu: so then as a factual matter, if asciilifeform threw out erry box that rebooted "by itself" for no apparent reason, pizarro would be missing uy1
mircea_popescu: this stance is consistent with, inter alia, republican practice -- we moved variously boxes off providers who kept rebooting them "mysteriously"
trinque: BingoBoingo: signaling my willingness to pay, not my condoning of the outage.
BingoBoingo: <mircea_popescu> the one concerning bit is whether indeed pizarro still owns that box or not. << This very much concerns me
BingoBoingo: trinque: My thought on the month is that the money being paid for shared hosting is very real to our customers, and we lack a firm hour count on how many customer uptime hours have been lost.
mircea_popescu: the one concerning bit is whether indeed pizarro still owns that box or not.
trinque: ftr I don't need a month's comp for a few hours of outage, though a few hours of outage does suck.
BingoBoingo: During palm touch tests before cleaning fans the warmest part of the chassis was near the RAID card, by a margin that though small registered on my skin. Do we have a way to instrument the RAID card?
trinque: conspicuous bit is various folks having moved their comms aboard uy1