mirror of https://gitlab.com/news-flash/article_scraper.git synced 2025-07-08 08:30:00 +02:00
Commit graph

26 commits

Author SHA1 Message Date
Jan Lukas Gernert
8d529a6d74 fmt 2023-03-12 13:39:29 +01:00
Jan Lukas Gernert
c8bc583864 add exception to conditional cleaning for list with images 2023-03-12 13:39:10 +01:00
Jan Lukas Gernert
c5c6b788c8 add citilab test & fix noscript unwrapping 2023-03-09 20:10:03 +01:00
Jan Lukas Gernert
881c2b90ac fix alternate candidates 2023-03-06 01:36:21 +01:00
Jan Lukas Gernert
7060e30911 fix conditional clean of nested tags 2023-03-06 00:03:59 +01:00
Jan Lukas Gernert
f5b7ff198a fix post processing 2023-03-04 23:40:01 +01:00
Jan Lukas Gernert
daa5543c4e fix turning div's into p's 2023-03-04 17:41:14 +01:00
Jan Lukas Gernert
df41e690ae fix conditional cleaning class weight 2023-03-02 01:08:52 +01:00
Jan Lukas Gernert
aaff97c184 cleanup 2023-03-01 01:55:26 +01:00
Jan Lukas Gernert
7c9e527827 strip iframes but keep videos 2023-03-01 01:37:37 +01:00
Jan Lukas Gernert
f4ccd22837 fix node ancestor depth 2023-02-28 18:27:46 +01:00
Jan Lukas Gernert
aea57d0cf3 fix has_single_tag_inside_element & update tests 2023-02-28 03:59:48 +01:00
Jan Lukas Gernert
31a8033844 fixes, more sanitation & 1 more failing test 2023-02-28 01:50:13 +01:00
Jan Lukas Gernert
0834c4d72a fixes 2023-02-26 02:22:53 +01:00
Jan Lukas Gernert
e3246af28b refactor & more testing 2023-02-25 00:42:26 +01:00
Jan Lukas Gernert
7ae98904d4 unwrap noscript images 2023-02-23 01:53:42 +01:00
Jan Lukas Gernert
2750ad648d start implementing readability 2023-01-01 14:51:34 +01:00
Jan Lukas Gernert
27be5a3204 port failure -> thiserror 2022-12-01 09:22:08 +01:00
Jan Lukas Gernert
d906f6b7fe readability stub 2022-10-08 23:10:26 +02:00
Jan Lukas Gernert
273ddd832c start refactor & fingerprints 2022-10-08 23:09:00 +02:00
Jan Lukas Gernert
69659da983 clippy fixes 2022-10-07 09:20:10 +02:00
Jan Lukas Gernert
8c2af14871 special handling trying to find single page links: fixes youtube 2022-10-07 08:48:09 +02:00
Jan Lukas Gernert
7b1b027c6d add support for header values: fixes golem test 2022-10-07 07:17:33 +02:00
Jan Lukas Gernert
c1ae011fcd use global rules 2022-10-07 07:17:31 +02:00
Jan Lukas Gernert
3a6a70ee64 embedded config files 2022-10-07 07:16:54 +02:00
Jan Lukas Gernert
aa09666f4c async config loading 2022-10-07 07:16:06 +02:00