all signers: asciilifeform diana_coman
antecedents:
press order:
logotron_genesis.kv | asciilifeform diana_coman |
patch:
(0 . 0)(1 . 1)
5 589248 logotron_genesis "Genesis."
-(0 . 0)(1 . 73)
This is the Aug 2019 draft of the S.NSA WWW logotron and IRC bot kit,
as can be seen presently at http://logs.nosuchlabs.com/log .

To make your own installation, you will need:

(1) Traditional 'python' 2.7.
(2) 'flask' lib for (1).
(3) 'psycopg2' lib for (1).
(4) 'postgres' (9 or 10).
(5) A WWW server that knows how to proxy.
(6) 'requests' lib for (1) (imported by 'bot.py').
(7) 'Flask-Cache' lib for (2) (imported by 'reader.py' as 'flask.ext.cache').

To use the kit, you will first need to create a user and DB, e.g.:

su - postgres
psql

create user nsabot createdb;
alter role nsabot superuser;
create database nsalog;
grant all privileges on database nsalog to nsabot ;

... you can take 'super' away from this user after the 1st run;
it is needed only in order to let him load the pg_trgm indexer
plugin.

Next, run 'init_db.sh' (first alter its constants to match the
names of your postgres user and the DB); this creates the schema.

Then see 'eat.sh' and the 'eat_dump.py' it uses, re how to
fill your log archive DB. 'eat_dump.py' eats in Phf's classical
format, e.g.:

1926177;1564727032;mp_en_viaje;in the meantime, everyone's invited on trilema & other blogs.

where 1926177 is the absolute line index (in the given chan), 1564727032
is the unix epochal timestamp, mp_en_viaje is the speaker, and the remainder
of the line is the payload. If he is 'actioning', 'eat_dump.py' expects a
lone * in the speaker field, with his name as the first word of the payload.
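
As a rough illustration of the dump format just described, the sketch below splits one such line into its fields. It is not taken from 'eat_dump.py': the name parse_phf_line and the returned dict layout are made up for this example, and the treatment of a lone '*' speaker field is an assumption based on how 'eat_dump.py' in this kit appears to handle 'action' lines.

```python
import re

# Hypothetical helper, not part of the kit.
# Expected layout: idx;epoch;speaker;payload
LINE_RE = re.compile(r"^(\d+);(\d+);([^;]+);(.*)$")

def parse_phf_line(line):
    """Split one Phf-style dump line into its fields; None if malformed."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    idx, epoch, speaker, payload = match.groups()
    action = False
    # Assumption: an 'action' line carries '*' as its speaker field,
    # and the real speaker is the first word of the payload.
    if speaker == "*":
        speaker, _, payload = payload.partition(" ")
        action = True
    return {
        "idx": int(idx),        # absolute line index in the chan
        "epoch": int(epoch),    # unix epochal timestamp
        "speaker": speaker,
        "action": action,
        "payload": payload,     # may itself contain ';'
    }
```

Only the first three ';' separators are significant here, so a payload that itself contains ';' survives intact.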

You WILL need to adjust the constants in 'eat_dump.py'; it is not
currently capable of eating a config file. Set these to your DB
and PG user.

Now, adjust the constants in 'nsabot.conf' (rename per taste)
to specify your IRC params, name of bot, host at which the www
logger will reside, and other knob values.
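
A minimal sketch of how such knobs can be read back, for those adjusting the config: the section and key names are the ones found in 'nsabot.conf', while load_knobs and the trimmed SAMPLE text are hypothetical and exist only for this example. It is written against Python 3's 'configparser' (the kit itself runs on 2.7, where the module is named 'ConfigParser'). The point shown is that get() hands back strings, so numeric toggles such as 'irc_dbg' and 'db_debug' need converting before being tested.

```python
import configparser

# Trimmed stand-in for 'nsabot.conf' (illustrative values only)
SAMPLE = """
[irc]
servers = irc.freenode.net
port = 6667
nick = snsabot
chans = asciilifeform-test, asciilifeform-test-2
irc_dbg = 0

[db]
db_name = nsalog
db_user = nsabot
db_debug = 0
"""

def load_knobs(text):
    """Hypothetical helper: parse config text into a dict of typed knobs."""
    # interpolation=None so that a '%' in e.g. a password is taken literally
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(text)
    return {
        "servers": [s.strip() for s in cfg.get("irc", "servers").split(",")],
        "port": cfg.getint("irc", "port"),
        "nick": cfg.get("irc", "nick"),
        "chans": [c.strip() for c in cfg.get("irc", "chans").split(",")],
        # "0" as a raw string would be truthy; convert before testing
        "irc_dbg": cfg.getint("irc", "irc_dbg") == 1,
        "db_name": cfg.get("db", "db_name"),
        "db_user": cfg.get("db", "db_user"),
        "db_debug": cfg.getint("db", "db_debug") == 1,
    }
```

Calling load_knobs(SAMPLE) yields both debug toggles as real booleans rather than as the strings "0" or "1".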

Adjust the three 'flask' templates in the 'templates' subdir to
give the desired look and feel for the www end. Currently we are using
Phf's classic style sheet, with minor modifications.

'reader.py' takes one mandatory command-line argument: the full path
to the config above. Same for 'bot.py', which is the IRC bot.

Run these via e.g. nohup ./bot.py nsabot.conf & nohup ./reader.py nsabot.conf &
and let your proxying WWW server know how to reach the latter's port.

For bot.py you will need a registered nick on fleanode (or wherever
it is used.) There are no fleanode-specific hacks in the bot, ergo
it can be stood up behind ZNC (although this has not been tested.)

Certain important features are presently unimplemented, in no order:
(1) Backlinkage.
(2) Search result pagination.
(3) Double-quoted search terms.
(4) Paste archiving.
(5) Multi-headed IRC bot for weather resistance.
(6) 'Ecologically clean' export of raw log material.
(7) Informative eggogology for bot commands.
(8) Automatic synchronization with mirrors (see 6).

A ZNC log eater is also required, to properly fill in the archives.
This is not yet available at the time of this writing.
-(0 . 0)(1 . 446)
#!/usr/bin/python

import ConfigParser, sys, logging, socket, time, re, requests, urllib
from urllib import quote

# DBism
import psycopg2, psycopg2.extras
import psycopg2.extensions
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
import time, datetime
from datetime import datetime

##############################################################################

cfg = ConfigParser.ConfigParser()

##############################################################################

# Single mandatory arg: config file path
if len(sys.argv[1:]) != 1:
    # If no args, print usage and exit:
    print sys.argv[0] + " CONFIG"
    exit(0)

# Read Config
cfg.readfp(open(sys.argv[1]))

# Get log path
logpath = cfg.get("bofh", "log")

# Get IRCism debug toggle (cfg.get returns a string; convert before comparing)
irc_dbg = int(cfg.get("irc", "irc_dbg"))
if irc_dbg == 1:
    log_lvl = logging.DEBUG
else:
    log_lvl = logging.INFO

# Init logo
logging.basicConfig(filename=logpath, filemode='a', level=log_lvl,
                    format='%(asctime)s %(levelname)s %(message)s',
                    datefmt='%d-%b-%y %H:%M:%S')

# Date format used in log lines
Date_Short_Format = "%Y-%m-%d"

# Date format used in echoes
Date_Long_Format = "%Y-%m-%d %H:%M:%S"

##############################################################################
# Get the remaining knob values:

try:
    # IRCism:
    Buf_Size = int(cfg.get("tcp", "bufsize"))
    Timeout = int(cfg.get("tcp", "timeout"))
    TX_Delay = float(cfg.get("tcp", "t_delay"))
    Servers = [x.strip() for x in cfg.get("irc", "servers").split(',')]
    Port = int(cfg.get("irc", "port"))
    Nick = cfg.get("irc", "nick")
    Pass = cfg.get("irc", "pass")
    Channels = [x.strip() for x in cfg.get("irc", "chans").split(',')]
    Join_Delay = int(cfg.get("irc", "join_t"))
    Prefix = cfg.get("control", "prefix")
    # DBism:
    DB_Name = cfg.get("db", "db_name")
    DB_User = cfg.get("db", "db_user")
    # Parsed as int: the raw config string "0" would otherwise count as true
    DB_DEBUG = int(cfg.get("db", "db_debug"))
    # Logism:
    Base_URL = cfg.get("logotron", "base_url")
    Era = int(cfg.get("logotron", "era"))
    NewChan_Idx = int(cfg.get("logotron", "newchan_idx"))
    Src_URL = cfg.get("logotron", "src_url")

except Exception as e:
    print "Invalid config: ", e
    exit(1)

##############################################################################

# Connect to the given DB
try:
    db = psycopg2.connect("dbname=%s user=%s" % (DB_Name, DB_User))
except Exception:
    print "Could not connect to DB!"
    logging.error("Could not connect to DB!")
    exit(1)
else:
    logging.info("Connected to DB!")

##############################################################################

def close_db():
    db.close()

def exec_db(query, args=()):
    cur = db.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    if (DB_DEBUG): logging.debug("query: '{0}'".format(query))
    if (DB_DEBUG): logging.debug("args: '{0}'".format(args))
    cur.execute(query, args)

def query_db(query, args=(), one=False):
    cur = db.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    if (DB_DEBUG): logging.debug("query: '{0}'".format(query))
    cur.execute(query, args)
    rv = cur.fetchone() if one else cur.fetchall()
    if (DB_DEBUG): logging.debug("query res: '{0}'".format(rv))
    return rv

def rollback_db():
    cur = db.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    cur.execute("ROLLBACK")
    db.commit()

def commit_db():
    cur = db.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    db.commit()


##############################################################################
# IRCism
##############################################################################

# Used to compute 'uptime'
time_last_conn = datetime.now()

# Init socket:
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Set keepalive:
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Initially we are not connected to anything
connected = False

# Connect to given host:port; return whether connected
def connect(host, port):
    logging.info("Connecting to %s:%s" % (host, port))
    sock.settimeout(Timeout)
    try:
        sock.connect((host, port))
    except (socket.timeout, socket.error) as e:
        logging.warning(e)
        return False
    except Exception as e:
        logging.exception(e)
        return False
    else:
        logging.info("Connected.")
        return True


# Attempt connect to each of hosts, in order, on port; return whether connected
def connect_any(hosts, port):
    for host in hosts:
        if connect(host, port):
            return True
    return False


# Transmit IRC message
def send(message):
    global connected
    if not connected:
        logging.warning("Tried to send while disconnected?")
        return False
    time.sleep(TX_Delay)
    logging.debug("> '%s'" % message)
    message = "%s\r\n" % message
    try:
        sock.send(message.encode("utf-8"))
    except (socket.timeout, socket.error) as e:
        logging.warning("Socket could not send! Disconnecting.")
        connected = False
        return False
    except Exception as e:
        logging.exception(e)
        return False


# Speak given message on a selected channel
def speak(channel, message):
    send("PRIVMSG #%s :%s" % (channel, message))
    # Now save what the bot spoke:
    save_line(datetime.now(), channel, Nick, False, message)


# Standard incoming IRC line (excludes fleanode liquishit, etc)
irc_line_re = re.compile("""^:([^!]+)\!\S+\s+PRIVMSG\s+\#(\S+)\s+\:(.*)""")

# The '#' prevents interaction via PM, this is not a PM-able bot.

# 'Actions'
irc_act_re = re.compile(""".*ACTION\s+(.*)""")


# A line was received from IRC
def received_line(line):
    # Process the traditional pingpong
    if line.startswith("PING"):
        send("PONG " + line.split()[1])
    else:
        logging.debug("< '%s'" % line)
        standard_line = re.search(irc_line_re, line)
        if standard_line:
            # Break this line into the standard segments
            (user, chan, text) = [s.strip() for s in standard_line.groups()]
            # Determine whether this line is an 'action' :
            action = False
            act = re.search(irc_act_re, line)
            if act:
                action = True
                text = act.group(1)
            # This line is edible, process it.
            eat_logline(user, chan, text, action)


# IRCate until we get disconnected
def irc():
    # time_last_conn must be global too, or 'uptime' never sees the update
    global connected, time_last_conn

    # Connect to one among the specified servers, in given priority :
    while not connected:
        connected = connect_any(Servers, Port)

    # Save time of last successful connect
    time_last_conn = datetime.now()

    # Auth to server (send() appends the CRLF terminator itself)
    send("NICK %s" % Nick)
    send("USER %s %s %s :%s" % (Nick, Nick, Nick, Nick))
    send("NICKSERV IDENTIFY %s %s" % (Nick, Pass))

    time.sleep(Join_Delay) # wait to join until fleanode eats auth

    # Join selected channels
    for chan in Channels:
        logging.info("Joining channel '%s'..." % chan)
        send("JOIN #%s" % chan)

    while connected:
        try:
            data = sock.recv(Buf_Size)
        except socket.timeout as e:
            logging.debug("Listen timed out")
            continue
        except socket.error as e:
            logging.warning("Listen socket error, disconnecting.")
            connected = False
            continue
        except Exception as e:
            logging.exception(e)
            connected = False
            continue
        else:
            if len(data) == 0:
                logging.warning("Listen socket closed, disconnecting.")
                connected = False
                continue
            try:
                data = data.strip(b'\r\n').decode("utf-8")
                for l in data.splitlines():
                    received_line(l)
                continue
            except Exception as e:
                logging.exception(e)
                continue

##############################################################################

html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;",
    ">": "&gt;",
    "<": "&lt;",
}

def html_escape(text):
    res = ("".join(html_escape_table.get(c,c) for c in text))
    return urllib.quote(res.encode('utf-8'))


searcher_re = re.compile("""(\d+) Results""")

# Retrieve a search result count using the WWWistic frontend.
# This way it is not necessary to have query parser in two places.
# However it is slightly wasteful of CPU (requires actually loading results.)
def get_search_res(chan, query):
    try:
        esc_q = html_escape(query)
        url = Base_URL + "log-search?q=" + esc_q + "&chan=" + chan
        res = requests.get(url).text
        t = res[res.find('<title>') + 7 : res.find('</title>')].strip()
        found = searcher_re.match(t)
        if found:
            output = "[" + url + "]" + "[" + found.group(1)
            output += """ results for "%s" in #%s]""" % (query, chan)
            return output
        else:
            return """No results found for "%s" in #%s""" % (query, chan)
    except Exception as e:
        logging.exception(e)
        return "No results returned (is logotron WWW up ?)"

##############################################################################

# Commands:

def cmd_help(arg, user, chan):
    # Speak the 'help' text
    speak(chan, "%s: my valid commands are: %s" %
          (user, ', '.join(Commands.keys())));

def cmd_search(arg, user, chan):
    logging.debug("search: '%s'" % arg)
    speak(chan, get_search_res(chan, arg))

def cmd_seen(arg, user, chan):
    speak(chan, "%s: this command is not yet implemented." % user);

def cmd_src(arg, user, chan):
    speak(chan, "%s: my source code can be seen at: %s" % (user, Src_URL));

def cmd_uptime(arg, user, chan):
    uptime_txt = ""
    uptime = (datetime.now() - time_last_conn)
    days = uptime.days
    hours = uptime.seconds/3600
    minutes = (uptime.seconds%3600)/60
    uptime_txt += '%dd ' % days
    uptime_txt += '%dh ' % hours
    uptime_txt += '%dm' % minutes
    # Speak the uptime
    speak(chan, "%s: time since my last reconnect : %s" %
          (user, uptime_txt));

Commands = {
    "help" : cmd_help,
    "s" : cmd_search,
    "seen" : cmd_seen,
    "uptime" : cmd_uptime,
    "src" : cmd_src
}

##############################################################################

# Save given line to perma-log
def save_line(time, chan, speaker, action, payload):
    ## Put in DB:
    try:
        # Get index of THIS new line to be saved
        last_idx = query_db(
            '''select idx from loglines where chan=%s
            and idx = (select max(idx) from loglines where chan=%s) ;''',
            [chan, chan], one=True)

        # Was this chan unseen previously?
        if last_idx == None:
            cur_idx = NewChan_Idx # Then use the config'd start index
        else:
            cur_idx = last_idx['idx'] + 1 # Otherwise, get the next idx

        logging.debug("Adding log line with index: %s" % cur_idx)

        # Set up the insert
        exec_db('''insert into loglines (idx, t, chan, era,
                speaker, self, payload) values (%s, %s, %s, %s, %s, %s, %s) ; ''',
                [cur_idx, time, chan, Era, speaker, action, payload])

        # Fire
        commit_db()
    except Exception as e:
        rollback_db()
        logging.warning("DB add failed, rolled back.")
        logging.exception(e)


# RE for finding log refs
logref_re = re.compile(Base_URL + """log\/([^/]+)/([^/]+)#(\d+)""")


# All valid received lines end up here
def eat_logline(user, chan, text, action):
    # If somehow received line from channel which isn't in the set:
    if chan not in Channels:
        logging.warning(
            "Received martian : '%s' : '%s'" % (chan, text))
        return

    # First, add the line to the log:
    save_line(datetime.now(), chan, user, action, text)

    # Then, see if the line was a command for this bot:
    if text.startswith(Prefix):
        cmd = text.partition(Prefix)[2].strip()
        cmd = [x.strip() for x in cmd.split(' ', 1)]
        if len(cmd) == 1:
            arg = ""
        else:
            arg = cmd[1]
        # Dispatch this command...
        command = cmd[0]
        logging.debug("Dispatching command '%s' with arg '%s'.." %
                      (command, arg))
        func = Commands.get(command)
        # If this command is undefined:
        if func == None:
            logging.debug("Invalid command: %s" % command)
            # Utter the 'help' text as response to the sad command
            cmd_help("", user, chan)
        else:
            # Is defined command, dispatch it:
            func(arg, user, chan)
    else:
        # Finally, see if contains log refs:
        for ref in re.findall(logref_re, text):
            ref_chan, ref_date, ref_idx = ref
            # Find this line in DB:
            ref_line = query_db(
                '''select t, speaker, payload from loglines
                where chan=%s and idx=%s;''',
                [ref_chan, ref_idx], one=True)
            # If retrieved line is valid, echo it:
            if ref_line != None:
                time_txt = ref_line['t'].strftime(Date_Long_Format)
                my_line = "Logged on %s %s: %s" % (time_txt,
                                                   ref_line['speaker'],
                                                   ref_line['payload'])
                # Speak the line echo into the chan where ref was seen
                speak(chan, my_line)

##############################################################################

# IRCate; if disconnected, reconnect
def run():
    while 1:
        irc()
        logging.warning("Disconnected, will reconnect...")

##############################################################################

# Run continuously.
run()

##############################################################################
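
To see what the two IRC regexes in the bot accept, the sketch below exercises them on sample lines outside of any socket or DB. The patterns are copied from the bot; parse_privmsg, the sample nicks and the hostmasks are made up for this illustration. CTCP ACTION payloads are wrapped in \x01 bytes on the wire; trimming the trailing one is an addition of this sketch, not something the bot code above is shown to do.

```python
import re

# Patterns as in the bot (written here as raw strings)
irc_line_re = re.compile(r"^:([^!]+)!\S+\s+PRIVMSG\s+#(\S+)\s+:(.*)")
irc_act_re = re.compile(r".*ACTION\s+(.*)")

def parse_privmsg(line):
    """Hypothetical helper: return (user, chan, text, action) or None."""
    m = irc_line_re.search(line)
    if m is None:
        # Not a channel PRIVMSG (server noise, or a PM lacking the '#')
        return None
    user, chan, text = [s.strip() for s in m.groups()]
    action = False
    act = irc_act_re.search(line)
    if act:
        action = True
        # Assumption of this sketch: drop the closing CTCP delimiter
        text = act.group(1).rstrip("\x01")
    return user, chan, text, action
```

A line such as ":alice!~a@host.example PRIVMSG #asciilifeform-test :!q uptime" comes back as a four-field tuple, while a PM addressed to the bot's nick (no '#') is rejected, which is the behaviour the comment on the '#' in the bot describes.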
-(0 . 0)(1 . 1)
for f in phf/*.txt; do ./eat_dump.py "$f" trilema 2 ; done
-(0 . 0)(1 . 104)
#!/usr/bin/python

##############################################################################
import psycopg2, psycopg2.extras
import psycopg2.extensions
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
import re
import time
import datetime
from datetime import datetime
import sys
import os

# Debug Knob
DB_DEBUG = False
##############################################################################

##############################################################################
db = psycopg2.connect("dbname=nsalog user=nsabot") ## CHANGE THESE

def close_db():
    db.close()

def exec_db(query, args=()):
    cur = db.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    if (DB_DEBUG): print "query: '{0}'".format(query)
    if (DB_DEBUG): print "args: '{0}'".format(args)
    if (DB_DEBUG): print "EXEC:"
    cur.execute(query, args)

def rollback_db():
    cur = db.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    cur.execute("ROLLBACK")
    db.commit()

def commit_db():
    cur = db.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    db.commit()

##############################################################################

# Eat individual line of a Phf-style log dump
def eat_logline(line, chan, era):
    match = re.search("(\d+)\;(\d+)\;([^;]+)\;(.*$)", line)
    if match:
        g = match.groups()
        self_speak = False

        try:
            idx = int(g[0]) # Serial Number of Log Line
            time = int(g[1]) # Unix Epochal Time of Log Line
        except Exception, e:
            # str(e): concatenating the exception object itself raises TypeError
            print("Malformed Line! '" + line +"' ! : " + str(e))
            close_db()
            exit(1)

        speaker = g[2] # Name of Speaker
        payload = g[3] # Payload (remainder of line)

        ## If spoken line is of form "* user ..." :
        if speaker == "*":
            spl = payload.split(' ', 1)
            speaker = spl[0]
            payload = spl[1]
            self_speak = True

        ## Put in DB:
        try:
            exec_db('''insert into loglines (idx, t, chan, era, speaker, self, payload)
                    values (%s, %s, %s, %s, %s, %s, %s) ; ''',
                    [int(idx), datetime.fromtimestamp(time), str(chan), int(era), str(speaker),
                     bool(self_speak), str(payload)])
            commit_db()
        except psycopg2.IntegrityError as e:
            rollback_db()
            print "Dupe Ignored, Idx=", idx
    else:
        print("Malformed Line! '" + line +"' !")
        close_db()
        exit(1)


# Eat Phf-style log dump at given path
def eat_dump(path, chan, era):
    with open(path) as fp:
        for line in fp:
            eat_logline(line, chan, era)


##############################################################################

if (len(sys.argv) == 4):
    logdump = sys.argv[1] # Path to Phf-style log dump
    chan = sys.argv[2] # Chan Name
    era = sys.argv[3] # Era (integer)
    # Eat:
    eat_dump(logdump, chan, era)
    close_db()
else:
    print "Usage: ./eat_dump LOGFILE CHAN ERA"
    exit(0)

##############################################################################
-(0 . 0)(1 . 3)
#!/bin/bash

psql -U nsabot -d nsalog -a -f nsalog_schem.sql
-(0 . 0)(1 . 67)
[bofh]

# Path to IRC bot debuggism log
log = nsabot.log

[irc]
servers = irc.freenode.net
port = 6667

# Bot's nick (change to yours, as with all knobs)
nick = snsabot

# All chans for both www end and bot, go here:
chans = asciilifeform-test, asciilifeform-test-2

# IRC nick PW
pass = YOURFLEANODEPW

# How long to wait for fleanode to ack auth of nick before joining chans
join_t = 20

# Verbose barf of irc tx/rx
irc_dbg = 0

[tcp]
bufsize = 4096

# Recv timeout
timeout = 30

# Delay between IRC transmits - possibly ought to be longer
t_delay = 0.1

[control]
# Command Trigger for IRC bot
prefix = !q

[logotron]
# The current era.
era = 3
# Convention for these :
# 1 : Age of #b-a (and earlier dark age material)
# 2 : Phf's (and several variously-reliable) loggers
# 3 : Present day.

# Where the source lives (change to yours)
src_url = http://not.yet

# From where index starts for new chan, so to leave room for archive insert
newchan_idx = 1000000

# Base URL of logotron site (change to yours!)
base_url = http://logs.nosuchlabs.com/

# Other people's bots (for colouration strictly)
bots = a111, deedbot, feedbot, auctionbot, lobbesbot

# On what port will sit the www logotron
www_port = 5002

[db]
# Change to your DB (set it up so only answers locally)
db_name = nsalog
db_user = nsabot

# Verbose barf of DB transactions
db_debug = 0
-(0 . 0)(1 . 29)
drop table if exists loglines;
create table loglines (
    ser serial,
    idx integer not null,
    t timestamp,
    chan text not null,
    era integer not null,
    speaker text not null,
    self boolean,
    payload text not null,
    backlinks integer[],
    PRIMARY KEY(idx, chan),
    UNIQUE(idx, chan)
);


create index logline_idx_i on loglines(idx);
create index logline_t_i on loglines(t);
create index logline_chan_i on loglines(chan);
create index logline_era_i on loglines(era);
create index logline_speaker_i on loglines(speaker);
create index logline_payload_i on loglines(payload);

CREATE EXTENSION pg_trgm;

-- drop index payload_search_idx;

create index concurrently payload_search_idx
    ON loglines USING gin (payload gin_trgm_ops);
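
The gin_trgm_ops index above works on trigrams, i.e. three-character slices of the payload text. As a rough picture of what such a set looks like, the sketch below approximates the extraction for a single plain word, assuming the padding rule the pg_trgm documentation describes (two spaces in front, one behind, lowercased). The function name trigrams is hypothetical, and the real extension does more (multiple words, non-alphanumeric characters), so treat this as an illustration only.

```python
def trigrams(word):
    """Approximate trigram set of one plain word (illustration only)."""
    # Assumed padding rule: two spaces before the word, one after
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}
```

For the word "cat" this gives the four slices "  c", " ca", "cat" and "at ". Two payloads sharing many such slices are candidates for a match, which is what lets the index narrow a substring search before the rows themselves are examined.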
-(0 . 0)(1 . 441)
#!/usr/bin/python

##############################################################################
import ConfigParser, sys
import psycopg2, psycopg2.extras
import psycopg2.extensions
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
import time
import datetime
from datetime import timedelta
import sys
reload(sys)
sys.setdefaultencoding('utf8')
import os
import threading
import re
from datetime import datetime
from urlparse import urljoin
from flask import Flask, request, session, url_for, redirect, \
    render_template, abort, g, flash, _app_ctx_stack, make_response, \
    jsonify
from flask import Flask
from flask.ext.cache import Cache
##############################################################################

##############################################################################
# Single mandatory arg: config file path
if len(sys.argv[1:]) != 1:
    # If no args, print usage and exit:
    print sys.argv[0] + " CONFIG"
    exit(0)

# Read Config from given conf file
config_path = os.path.abspath(sys.argv[1])
cfg = ConfigParser.ConfigParser()
cfg.readfp(open(config_path))

try:
    # IRCism:
    Nick = cfg.get("irc", "nick")
    Channels = [x.strip() for x in cfg.get("irc", "chans").split(',')]
    Bots = [x.strip() for x in cfg.get("logotron", "bots").split(',')]
    Bots.append(Nick) # Add our own bot to the bot list
    # DBism:
    DB_Name = cfg.get("db", "db_name")
    DB_User = cfg.get("db", "db_user")
    # Parsed as int: the raw config string "0" would otherwise count as true
    DB_DEBUG = int(cfg.get("db", "db_debug"))
    # Logism:
    Base_URL = cfg.get("logotron", "base_url")
    Era = int(cfg.get("logotron", "era"))
    # WWW:
    WWW_Port = int(cfg.get("logotron", "www_port"))

except Exception as e:
    print "Invalid config: ", e
    exit(1)

##############################################################################

##############################################################################
### Knobs not made into config yet ###
Default_Chan = Channels[0]
Min_Query_Length = 3
Max_Search_Results = 1000

## Format for Date in Log Lines
Date_Short_Format = "%Y-%m-%d"

## WWW Debug Knob
DEBUG = False
##############################################################################

app = Flask(__name__)
cache = Cache(app,config={'CACHE_TYPE': 'simple'})
app.config.from_object(__name__)

def get_db():
    db = getattr(g, 'db', None)
    if db is None:
        db = g.db = psycopg2.connect("dbname=%s user=%s" % (DB_Name, DB_User))
    return db

def close_db():
    if hasattr(g, 'db'):
        g.db.close()

@app.before_request
def before_request():
    g.db = get_db()

@app.teardown_request
def teardown_request(exception):
    close_db()

def query_db(query, args=(), one=False):
    cur = get_db().cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    if (DB_DEBUG): print "query: '{0}'".format(query)
    cur.execute(query, args)
    rv = cur.fetchone() if one else cur.fetchall()
    if (DB_DEBUG): print "query res: '{0}'".format(rv)
    return rv

def exec_db(query, args=()):
    cur = get_db().cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    if (DB_DEBUG): print "query: '{0}'".format(query)
    if (DB_DEBUG): print "args: '{0}'".format(args)
    if (DB_DEBUG): print "EXEC:"
    cur.execute(query, args)

def getlast_db():
    cur = get_db().cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    cur.execute('select lastval()')
    return cur.fetchone()['lastval']

def commit_db():
    cur = get_db().cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    g.db.commit()

##############################################################################

## All eggogs redirect to main page
@app.errorhandler(404)
def page_not_found(error):
    return redirect(url_for('log'))

##############################################################################

html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;",
    ">": "&gt;",
    "<": "&lt;",
}

def html_escape(text):
    return "".join(html_escape_table.get(c,c) for c in text)

##############################################################################

## Get base URL
def get_base():
    if DEBUG:
        return request.host_url
    return Base_URL


# Get perma-URL corresponding to given log line
def line_url(l):
    return "{0}log/{1}/{2}#{3}".format(get_base(),
                                       l['chan'],
                                       l['t'].strftime(Date_Short_Format),
                                       l['idx'])

def gen_chanlist(selected_chan):
    # Get current time
    now = datetime.now()

    s = """<table align="center" class="chantable"><tr>"""
    for chan in Channels:
        chan_formed = chan
        if chan == selected_chan:
            chan_formed = "<span class='highlight'>" + chan + "</span>"
        s += """<th><a href="{0}log/{1}">{2}</a></th>""".format(
            get_base(), chan, chan_formed)
    s += "</tr><tr>"

    for chan in Channels:

        last_time = query_db(
            '''select t, idx from loglines where chan=%s
            and idx = (select max(idx) from loglines where chan=%s) ;''',
            [chan, chan], one=True)

        last_time_txt = ""
        if last_time != None:
            span = (now - last_time['t'])
            days = span.days
            hours = span.seconds/3600
            minutes = (span.seconds%3600)/60

            if days != 0:
                last_time_txt += '%dd ' % days
            if hours != 0:
                last_time_txt += '%dh ' % hours
            if minutes != 0:
                last_time_txt += '%dm' % minutes

            s += """<td><i><a href="{0}log/{1}/{2}#{3}">{4}</a></i></td>""".format(
                get_base(),
                chan,
                last_time['t'].strftime(Date_Short_Format),
                last_time['idx'],
                last_time_txt)

        else:
            last_time_txt = ""
            s += "<td></td>"

    s += "</tr></table>"
    return s


# Make above callable from inside htm templater:
app.jinja_env.globals.update(gen_chanlist=gen_chanlist)


# HTML Tag Regex
tag_regex = re.compile("(<[^>]+>)")


# Find the segments of a block of text which constitute HTML tags
def get_link_intervals(str):
    links = []
    span = []
    for match in tag_regex.finditer(str):
        span = match.span()
        links += [span]
    return links


# Highlight all matched tokens in given text
def highlight_matches(strings, text):
    e = '(' + ('|'.join(strings)) + ')'
    return re.sub(e,
                  r"""<span class='highlight'>\1</span>""",
                  text,
                  flags=re.I)


# Highlight matched tokens in the display of a search result logline,
# but leave HTML tags alone
def highlight_text(strings, text):
    result = ""
    last = 0
    for i in get_link_intervals(text):
        i_start, i_end = i
        result += highlight_matches(strings, text[last:i_start])
        result += text[i_start:i_end] # the HTML tag, leave it alone
        last = i_end
    result += highlight_matches(strings, text[last:]) # last block
    return result


# Regexps used in format_logline:
boxlinks_re = re.compile('\[\s*<a href="(http[^ \[\]]+)">[^ <]+</a>\s*\]\[([^\[\]]+)\]')
stdlinks_re = re.compile('(http[^ \[\]]+)')


## Format given log line for display
def format_logline(l, highlights = []):
    payload = html_escape(l['payload'])

    # Format ordinary links:
    payload = re.sub(stdlinks_re, r'<a href="\1">\1</a>', payload)

    # Now also format [link][text] links :
    payload = re.sub(boxlinks_re, r'<a href="\1">\2</a>', payload)

    # If this is a search result, illuminate the matched strings:
    if highlights != []:
        payload = highlight_text(highlights, payload)

    bot = ""
    if l['speaker'] in Bots:
        bot = " bot"

    # HTMLize the given line :
    s = ("<div id='{0}' class='{1}{5}'>"
         "<a class='nick' title='{2}'"
         " href=\"{3}\">{1}</a>: {4}</div>").format(l['idx'],
                                                    l['speaker'],
                                                    l['t'],
                                                    line_url(l),
                                                    payload,
                                                    bot)

    return s

# Make above callable from inside htm templater:
app.jinja_env.globals.update(format_logline=format_logline)


# Generate navbar for the given date:
def generate_navbar(date, tail, chan):
    cur_day = datetime.strptime(date, Date_Short_Format)
    prev_day = cur_day - timedelta(days=1)
    prev_day_txt = prev_day.strftime(Date_Short_Format)

    s = "<a href='{0}log/{1}/{2}'>&larr; {2}</a>".format(
        get_base(),
        chan,
        prev_day_txt)

    if not tail:
        next_day = cur_day + timedelta(days=1)
        next_day_txt = next_day.strftime(Date_Short_Format)
        s = s + " | <a href='{0}log/{1}/{2}'>{2} &rarr;</a>".format(
            get_base(),
            chan,
            next_day_txt)

    return s

# Make above callable from inside htm templater:
app.jinja_env.globals.update(generate_navbar=generate_navbar)


@app.route('/log/<chan>/<date>')
@app.route('/log/<chan>', defaults={'date': None})
@app.route('/log/', defaults={'chan': Default_Chan, 'date': None})
@app.route('/log', defaults={'chan': Default_Chan, 'date': None})
def log(chan, date):
    # Handle rubbish chan:
    if chan not in Channels:
        return redirect(url_for('log'))

    # Get current time
    now = datetime.now()

    # Whether we are viewing 'current' tail
    tail = False

    # If viewing 'current' log:
    if date == None:
        date = now.strftime(Date_Short_Format)
        tail = True

    # Parse given date, and redirect to default log if rubbish:
    try:
        day_start = datetime.strptime(date, Date_Short_Format)
    except Exception, e:
        return redirect(url_for('log'))

    # Determine the end of the interval being shown
    day_end = day_start + timedelta(days=1)

    # Get the loglines from DB
    lines = query_db(
        '''select * from loglines where chan=%s
        and t between %s and %s order by idx asc;''',
        [chan, day_start, day_end], one=False)

    # Return the HTMLized text
    return render_template('log.html',
                           chan = chan,
                           loglines = lines,
                           date = date,
                           tail = tail)



Name_Chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-"

def sanitize_speaker(s):
    return "".join([ch for ch in s if ch in Name_Chars])


def re_escape(s):
    return re.sub(r"[(){}\[\].*?|^$\\+-]", r"\\\g<0>", s)

# Search knob. Supports 'chan' parameter.
@app.route('/log-search')
def logsearch():
    # The query params:
    chan = request.args.get('chan', default = Default_Chan, type = str)
    query = request.args.get('q', default = '', type = str)
    # page_num = request.args.get('page', default = 0, type = int)

    # Handle rubbish chan:
    if chan not in Channels:
        return redirect(url_for('log'))

    nres = 0
    searchres = []
    tokens_orig = []
    search_head = "Query is too short!"
    # Forbid query that is too short:
    if len(query) >= Min_Query_Length:
        # Get the search tokens to use:
        tokens = query.split()
        tokens_standard = []
        from_users = []

        # separate out "from:foo" tokens and ordinary:
        for t in tokens:
            if t.startswith("from:") or t.startswith("f:"):
                from_users.append(t.split(':')[1]) # Record user for 'from' query
            else:
                tokens_standard.append(t)

        from_users = ['%' + sanitize_speaker(t) + '%' for t in from_users]
        tokens_orig = [re_escape(t) for t in tokens_standard]
        tokens_formed = ['%' + t + '%' for t in tokens_orig]

        # Query is usable; perform the search on DB and get the finds
        if from_users == []:
            searchres = query_db(
                '''select * from loglines where chan=%s
                   and payload ilike all(%s) order by idx desc limit %s;''',
                [chan,
                 tokens_formed,
                 Max_Search_Results], one=False)
        else:
            print "from=", from_users

            searchres = query_db(
                '''select * from loglines where chan=%s
                   and speaker ilike any(%s)
                   and payload ilike all(%s) order by idx desc limit %s;''',
                [chan,
                 from_users,
                 tokens_formed,
                 Max_Search_Results], one=False)


        # Number of entries found
        nres = len(searchres)
        search_head = "<b>{0}</b> entries found in {1} for <b>'{2}'</b> :".format(
            nres, chan, html_escape(query))

    # No paging support just yet:
    return render_template('searchres.html',
                           query = query,
                           nres = nres,
                           chan = chan,
                           search_head = search_head,
                           tokens = tokens_orig,
                           loglines = searchres)


# Comment this out if you don't have one
@app.route('/favicon.ico')
def favicon():
    return redirect(url_for('static', filename='favicon.ico'))


## App Mode
if __name__ == '__main__':
    app.run(threaded=True, port=WWW_Port)
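
The token handling in logsearch() above can be tried out without a DB. What follows is a minimal sketch and not part of the kit: 'split_query' is a hypothetical helper name, while 'sanitize_speaker' and 're_escape' are copied from the listing above. It only mirrors the steps already shown: peel off the 'from:' / 'f:' tokens, sanitize the speaker names, regex-escape the remaining tokens, and wrap each in '%' for use with ILIKE.

```python
import re

Name_Chars = ("0123456789abcdefghijklmnopqrstuvwxyz"
              "ABCDEFGHIJKLMNOPQRSTUVWXYZ_-")

def sanitize_speaker(s):
    # Keep only characters permitted in a nick
    return "".join([ch for ch in s if ch in Name_Chars])

def re_escape(s):
    # Backslash-escape regex metacharacters
    return re.sub(r"[(){}\[\].*?|^$\\+-]", r"\\\g<0>", s)

def split_query(query):
    # Hypothetical helper: returns (speaker patterns,
    # escaped tokens, payload patterns) for a raw query string.
    from_users = []
    tokens_standard = []
    for t in query.split():
        if t.startswith("from:") or t.startswith("f:"):
            from_users.append(t.split(':')[1])
        else:
            tokens_standard.append(t)
    speakers = ['%' + sanitize_speaker(u) + '%' for u in from_users]
    tokens_orig = [re_escape(t) for t in tokens_standard]
    payloads = ['%' + t + '%' for t in tokens_orig]
    return speakers, tokens_orig, payloads
```

Note that only regex metacharacters are escaped here; a literal '%' or '_' typed into an ordinary token still reaches ILIKE as a wildcard, exactly as in the listing.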
-(0 . 0)(1 . 1)
favicon.ico goes in this dir.
-(0 . 0)(1 . 127)
<html>

<head>
  <title>
    {% block title %}
    {% endblock %}
  </title>
  <meta http-equiv='Content-Type' content='text/html; charset=UTF-8' />
  <meta name='viewport' content='width=device-width, initial-scale=1' />
  <style type='text/css'>
    table.chantable {
      margin-left:auto;
      margin-right:auto;
      padding : 5px;
      border : 1px solid black;
      border-spacing : 25px 0;
    }

    .nick {
      font-weight: bold;
      text-decoration: none;
      color: black;
    }

    .nick:visited {
      color: black;
    }

    .re {
      display: none;
    }

    div:hover .re {
      display: inline;
    }

    div {
      white-space: pre-wrap;
    }

    .bot,
    .bot .nick,
    .bot a {
      color: #777;
      font-weight: normal;
    }

    .mention {
      background: lightyellow;
    }

    :target,
    .active {
      background: lightyellow;
    }

    img.inline {
      margin: 0.5em auto 1em auto;
      display: block;
      border: 1px solid black;
      width: 34em;
    }

    img.hist {
      margin: 0.5em auto 1em auto;
    }

    .annotations a {
      text-decoration: none;
    }

    #navbar {
      margin-bottom: 1em;
    }

    .highlight {
      background: yellow;
      padding: 1px;
    }

  </style>
</head>

<body>

<p>
<table align="center">
  <tr>
    <td>
      <a href="http://nosuchlabs.com">
        <img src="http://logs.nosuchlabs.com/static/snsa_small.jpg"
             align="left"
             alt="No Such lAbs"/>
      </a>
    </td>

    <td>
      {{ gen_chanlist( chan ) | safe }}
    </td>

    <td>
      <a href="http://pizarroisp.net">
        <img src="http://logs.nosuchlabs.com/static/piz_small.jpg"
             align="right"
             alt="Pizarro"/>
      </a>
    </td>
  </tr>
</table>

</p>

<hr>

<form align="center" id="search" method='get' action='/log-search'><span><a href='/log/{{chan}}'>log</a></span>
  <input type='text' name='q' value='{{ query }}' maxlength='2048' spellcheck='false' size='55' />
  <input type='hidden' name='chan' value='{{chan}}'>
  <input type='submit' value='search {{chan}}' />
</form>

<hr>

{% block body %}{% endblock %}

</body>

</html>
-(0 . 0)(1 . 17)
{% extends "layout.html" %}

{% block title %}
#{{ chan }} | {{ date }}
{% endblock %}

{% block body %}

<div id='navbar'>{{ generate_navbar(date, tail, chan) | safe }}</div>

{% for l in loglines %}
{{ format_logline(l) | safe }}
{% endfor %}

<div id='navbar'>{{ generate_navbar(date, tail, chan) | safe }}</div>

{% endblock %}
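
The date arithmetic behind the navbar and the per-day log view can be checked in isolation. The sketch below is illustrative only: 'DATE_FMT' stands in for the kit's 'Date_Short_Format', whose actual value is defined outside this section, and '%Y-%m-%d' is an assumption. 'neighbours' and 'day_bounds' are hypothetical names.

```python
from datetime import datetime, timedelta

# Assumption: the real Date_Short_Format lives elsewhere in the kit
# and may differ from this.
DATE_FMT = "%Y-%m-%d"

def neighbours(date_txt):
    # Previous and next day, as used for the navbar links
    cur = datetime.strptime(date_txt, DATE_FMT)
    prev_txt = (cur - timedelta(days=1)).strftime(DATE_FMT)
    next_txt = (cur + timedelta(days=1)).strftime(DATE_FMT)
    return prev_txt, next_txt

def day_bounds(date_txt):
    # Start of the given day and start of the following day;
    # raises ValueError on a malformed date string
    day_start = datetime.strptime(date_txt, DATE_FMT)
    return day_start, day_start + timedelta(days=1)
```

Going through timedelta rather than editing the day field by hand means month and year rollovers come out right with no special cases.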
-(0 . 0)(1 . 17)
{% extends "layout.html" %}

{% block title %}
{{ nres }} Results for {{ query }} in #{{ chan }}
{% endblock %}

{% block body %}

<div align="center"><span>{{ search_head | safe }}</span></div>

<hr>

{% for l in loglines %}
{{ format_logline(l, tokens) | safe }}
{% endfor %}

{% endblock %}
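
The template above hands the regex-escaped search tokens to format_logline, which passes them on to highlight_text. That function is defined outside this section, so the following is only a guess at the general idea, not the kit's implementation: 'highlight_sketch' is a hypothetical name, and the 'highlight' class is the one styled in layout.html.

```python
import re

def highlight_sketch(tokens, text):
    # Assumes tokens are already regex-escaped, as tokens_orig is.
    if not tokens:
        return text
    pattern = re.compile("|".join(tokens), re.IGNORECASE)
    # A function replacement avoids backslash processing of the match
    return pattern.sub(
        lambda m: "<span class='highlight'>" + m.group(0) + "</span>",
        text)
```

This naive version will also match inside tag attributes (for instance a token that occurs in a link's href), so a real highlighter has to step around markup already present in the payload.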