mirror of https://gitlab.com/news-flash/article_scraper.git synced 2025-07-07 16:15:32 +02:00
Commit graph

122 commits

Author SHA1 Message Date
Jan Lukas Gernert
a1c07d436f fix alternative top candidate calcs 2023-02-28 18:28:01 +01:00
Jan Lukas Gernert
f4ccd22837 fix node ancestor depth 2023-02-28 18:27:46 +01:00
Jan Lukas Gernert
58721efa35 fix positive/negative class weight regex 2023-02-28 18:27:36 +01:00
Jan Lukas Gernert
aea57d0cf3 fix has_single_tag_inside_element & update tests 2023-02-28 03:59:48 +01:00
Jan Lukas Gernert
31a8033844 fixes, more sanitation & 1 more failing test 2023-02-28 01:50:13 +01:00
Jan Lukas Gernert
56c08c501a fmt 2023-02-27 01:01:16 +01:00
Jan Lukas Gernert
df999cd9fc more cleanups & more tests 2023-02-27 01:00:56 +01:00
Jan Lukas Gernert
0834c4d72a fixes 2023-02-26 02:22:53 +01:00
Jan Lukas Gernert
d8e3a75b01 update configs 2023-02-25 01:40:07 +01:00
Jan Lukas Gernert
2460745547 cleanup 2023-02-25 00:44:18 +01:00
Jan Lukas Gernert
63035ca028 fmt 2023-02-25 00:43:42 +01:00
Jan Lukas Gernert
e3246af28b refactor & more testing 2023-02-25 00:42:26 +01:00
Jan Lukas Gernert
7ae98904d4 unwrap noscript images 2023-02-23 01:53:42 +01:00
Jan Lukas Gernert
98c06e11f4 improve title extraction 2023-02-20 02:32:58 +01:00
Jan Lukas Gernert
cce912c354 first content extraction kinda working 2023-02-20 00:29:44 +01:00
Jan Lukas Gernert
2c76a869e7 fmt 2023-02-17 14:35:35 +01:00
Jan Lukas Gernert
71a8816747 somewhat complete readability algorithm 2023-02-17 14:16:01 +01:00
Jan Lukas Gernert
979358fd35 more 2023-01-01 21:35:46 +01:00
Jan Lukas Gernert
2750ad648d start implementing readability 2023-01-01 14:51:34 +01:00
Jan Lukas Gernert
c08f5afa5d move stuff around 2022-12-13 08:54:57 +01:00
Jan Lukas Gernert
90383545e0 extract & parse charsets other than utf8 2022-12-11 17:38:42 +01:00
Jan Lukas Gernert
97b194c9e8 clippy regex escape 2022-12-11 16:31:01 +01:00
Jan Lukas Gernert
88bb88a38f clippy 2022-12-11 16:23:02 +01:00
Jan Lukas Gernert
dc1bf2ef0c fmt 2022-12-11 16:19:49 +01:00
Jan Lukas Gernert
22e98fdab7 extract thumbnail url 2022-12-11 16:18:03 +01:00
Jan Lukas Gernert
0c8aba4f4a refactor: a bit less nested code 2022-12-01 10:14:47 +01:00
Jan Lukas Gernert
27be5a3204 port failure -> thiserror 2022-12-01 09:22:08 +01:00
Jan Lukas Gernert
d906f6b7fe readability stub 2022-10-08 23:10:26 +02:00
Jan Lukas Gernert
273ddd832c start refactor & fingerprints 2022-10-08 23:09:00 +02:00
Jan Lukas Gernert
29df3aa698 simplify ci pipe 2022-10-07 09:41:16 +02:00
Jan Lukas Gernert
7b205e8e27 fmt 2022-10-07 09:32:39 +02:00
Jan Lukas Gernert
69659da983 clippy fixes 2022-10-07 09:20:10 +02:00
Jan Lukas Gernert
8c2af14871 special handling trying to find single page links: fixes youtube 2022-10-07 08:48:09 +02:00
Jan Lukas Gernert
7b1b027c6d add support for header values: fixes golem test 2022-10-07 07:17:33 +02:00
Jan Lukas Gernert
0e3553b647 remove dbg code 2022-10-07 07:17:33 +02:00
Jan Lukas Gernert
c1ae011fcd use global rules 2022-10-07 07:17:31 +02:00
Jan Lukas Gernert
3a6a70ee64 embedded config files 2022-10-07 07:16:54 +02:00
Jan Lukas Gernert
aa09666f4c async config loading 2022-10-07 07:16:06 +02:00
Jan Lukas Gernert
9fb772bfa8 update deps 2022-10-07 07:16:06 +02:00
Jan Lukas Gernert
5c66930f21 Merge branch 'volker1' into 'master': Fixed spelling (See merge request news-flash/article_scraper!7) 2022-06-16 04:11:03 +00:00
Volker Weißmann
593901c849 Fixed spelling 2022-06-15 19:15:51 +02:00
Jan Lukas Gernert
a4992f55bf (cargo-release) start next development iteration 1.1.8-alpha.0 2021-01-21 08:54:46 +01:00
Jan Lukas Gernert
07d8c1fa0f (cargo-release) version 1.1.7 2021-01-21 08:53:56 +01:00
Jan Lukas Gernert
76940232a5 take url reference 2021-01-21 08:53:51 +01:00
Jan Lukas Gernert
cf4c6c42c5 (cargo-release) start next development iteration 1.1.7-alpha.0 2021-01-06 10:48:28 +01:00
Jan Lukas Gernert
a51356d49f (cargo-release) version 1.1.6 2021-01-06 10:47:34 +01:00
Jan Lukas Gernert
b73448b189 fix clippy lints 2021-01-06 10:32:43 +01:00
Jan Lukas Gernert
3138d664d6 (cargo-release) start next development iteration 1.1.6-alpha.0 2021-01-06 09:54:53 +01:00
Jan Lukas Gernert
9072c3cf65 (cargo-release) version 1.1.5 2021-01-06 09:54:01 +01:00
Jan Lukas Gernert
7e05a98f30 update to tokio 1.0 2021-01-06 09:53:47 +01:00