Commit graph

109 commits

Author SHA1 Message Date
pezcurrel
5c605cbe5b Some little cosmetic (readability of log files) changes 2022-12-17 18:40:22 +01:00
pezcurrel
2571396253 Tuned to recent changes in crawler.php (and getinstinfo.php) 2022-12-17 17:36:46 +01:00
pezcurrel
ad8fa26306 Made mysql connection and charset setting errors more graceful; added “users” page to updstats; other minor changes 2022-12-17 17:35:35 +01:00
pezcurrel
2d1d28b002 Fixed regexp checking if max_charcters is an integer; made mexit use eecho again, moving the closing of logf after eecho(s); made logf be opened only if logminmsglev < 4 2022-12-17 17:33:46 +01:00
pezcurrel
d1f088a026 Command for subprocesses gets now built on the fly using cmd function; logfile doesn’t get opened if logminmsglev < 4; other minor changes 2022-12-17 17:31:24 +01:00
pezcurrel
6d897cfdff Removed “crawlernew” directory 2022-12-17 15:03:11 +01:00
pezcurrel
c7d5b50377 Adapted to new crawler version 2022-12-17 15:02:52 +01:00
pezcurrel
0b9e892aef Splitted old crawler.php in 2; this is the part that coordinates 2022-12-17 15:02:20 +01:00
pezcurrel
7629a1caae Moved from subdir “crawlernew” 2022-12-17 15:00:36 +01:00
pezcurrel
b46469bfbb Cope with mysql errors even with php ver. < 8; check if $link is false before trying to close mysql connection in function mexit 2022-12-16 22:39:51 +01:00
pezcurrel
e46a82d923 Added suffix “s” to option “-t” in $cmd definition; cope with mysql errors even with php ver. < 8; other small changes 2022-12-16 22:38:16 +01:00
pezcurrel
3804171253 Renamed getfc to gurl 2022-12-16 21:59:26 +01:00
pezcurrel
d75a6445ae Fixed sleep and relative message to actually use $options['udirfailst']; changed some options’ default; changed a bit the help text; changed a bit some messages in the “fetchusers” section 2022-12-16 21:23:14 +01:00
pezcurrel
b666fcfdda Use parsetime for “--timeout” too, changed a bit the help text accordingly 2022-12-16 21:19:34 +01:00
pezcurrel
2d31b7ca79 Changed a bit how option “jsonwrite” works; other minor changes 2022-12-16 19:25:45 +01:00
pezcurrel
f193bd294c A very tiny db test, first commit 2022-12-16 19:12:17 +01:00
pezcurrel
9529a938a7 Lots of changes, not very important :-)) 2022-12-16 19:06:47 +01:00
pezcurrel
bab5bd5dd2 Added mysqli_query error management for older php versions to function myq; minor changes 2022-12-16 19:05:46 +01:00
pezcurrel
4522bc3ea8 Added mysqli_query error management for older php versions to function myq 2022-12-16 19:02:41 +01:00
pezcurrel
513981b7e2 Fetches info from an instance, first commit 2022-12-16 00:00:37 +01:00
pezcurrel
1cafbe05ea Crawler new version, “multithreaded”, coordinator script, first commit 2022-12-16 00:00:06 +01:00
pezcurrel
3a720a90ac Added a warning when nodeinfo specs couldn’t be fetched; made it set New=1 even when host doesn’t respond and is not in the db 2022-12-15 12:45:20 +01:00
pezcurrel
9360bdc481 Made false positives for “IsMastodon” less likely (impossible?) 2022-12-12 22:40:17 +01:00
pezcurrel
483fbcd103 Added the possibility, for each set of records with the same URI, to choose one record to keep and delete the others, or to automatically keep the record with the lowest ID and delete the others 2022-12-12 21:32:35 +01:00
pezcurrel
90e85f8182 Took away “-t 10” option from crawler.php calls since 10 is now its default timeout 2022-12-12 17:06:51 +01:00
pezcurrel
e07fba673d Fixed call to non-existent function ”mysq” to “myq” 2022-12-12 08:36:18 +01:00
pezcurrel
ec6324fb4f Made langs() shorten to a maximum of 5 elements the $languages array 2022-12-12 08:29:18 +01:00
pezcurrel
d80ba5ddc4 Fixed double “,” in a query inside langs() 2022-12-12 08:24:26 +01:00
pezcurrel
6f9260e08e myq() did not return results, now it does 2022-12-12 08:17:01 +01:00
pezcurrel
f6752a34bc Added function “myq” as a wrapper for mysqli_query managing exceptions; used it throughout the whole script 2022-12-12 08:12:29 +01:00
pezcurrel
2649e7d137 Info from nodeinfo didn’t end up into $info, now they do 2022-12-12 00:47:06 +01:00
pezcurrel
783e54a9f9 Little script to convert a unix timestamp in input to a date 2022-12-11 23:33:09 +01:00
pezcurrel
1701a2cfe6 Little script to search for records with the same “URI” in Instances table 2022-12-11 23:32:42 +01:00
pezcurrel
853be9f0e0 Little script to fix uris beginning with “https://” 2022-12-11 23:31:47 +01:00
pezcurrel
b16515f4e8 Lots of changes :-)) 2022-12-11 23:29:51 +01:00
pezcurrel
61ad655a62 Disabled fetching profile’s page when “noindex” is not set in account because it takes too long; disabled featured tags fetching fro the same reason; other minor changes 2022-12-10 23:32:58 +01:00
pezcurrel
f343cb702e Changed some eecho messages importance 2022-12-10 13:57:30 +01:00
pezcurrel
4b7f6a199c Added truncs where needed; added code to check for “noindex” on user’s profile page when “noindex” is not set in accounts info 2022-12-10 12:35:22 +01:00
pezcurrel
18ce06871b Added ckratelimit() where useful; made it more flexible with lowercasing every header key; more work on fetching users from users directories 2022-12-09 22:53:18 +01:00
pezcurrel
8341f0e209 Fixed a cosmetic bug; some more work into users directories fetching 2022-12-09 19:25:44 +01:00
pezcurrel
ffd20debe6 Removed executable attribute 2022-12-08 14:35:55 +01:00
pezcurrel
667a0824ff Script to “normalize” language codes in the database 2022-12-08 13:55:47 +01:00
pezcurrel
5d073632f5 Added “pt_BR” language 2022-12-08 13:54:59 +01:00
pezcurrel
5aa4cde8f7 Added “pt_BR” language 2022-12-08 13:54:20 +01:00
pezcurrel
09765e566e Added “normalization” of language codes (dash to underscore) and “pt_BR” language 2022-12-08 13:53:43 +01:00
pezcurrel
6c2672ea00 Fixed and added some “gecho” messages 2022-12-08 00:07:31 +01:00
pezcurrel
8ea4acc17a Fixed wrong path in MAILCFG; stderr redirect to logfile; other minor changes 2022-12-08 00:06:16 +01:00
pezcurrel
c3a90ba6b8 Adding “fetchuser” option; changed “timeout” default from 5 to 10 seconds 2022-12-08 00:03:10 +01:00
pezcurrel
a07e59d52c Added option “--moreclauses” 2022-12-05 21:18:58 +01:00
pezcurrel
b0d2438d8e Updated with regexps matching new spammers 2022-12-04 08:51:03 +01:00