Jan Lukas Gernert | 9832fa2c77 | clippy fixes | 2023-04-02 13:23:07 +02:00
Jan Lukas Gernert | 3fa8c9674d | fix relative srcset urls & more tests | 2023-04-02 09:03:37 +02:00
Jan Lukas Gernert | 0d6db710e8 | 4 more tests & remove share elements | 2023-04-01 17:19:37 +02:00
Jan Lukas Gernert | be6e08bd6d | fix replacing font tags | 2023-04-01 12:31:56 +02:00
Jan Lukas Gernert | c46d93058f | fix nytimes-3 | 2023-03-31 10:38:04 +02:00
Jan Lukas Gernert | c42ffa57a2 | start adding nytimes tests | 2023-03-31 09:37:23 +02:00
Jan Lukas Gernert | 027fab7602 | fix url completion for hash urls | 2023-03-30 21:27:35 +02:00
Jan Lukas Gernert | 9e73b94f11 | fix medialnewstoday test | 2023-03-30 07:58:11 +02:00
Jan Lukas Gernert | a649b93c03 | fmt | 2023-03-28 07:25:22 +02:00
Jan Lukas Gernert | d8a9d0a757 | update lazy image fixing code | 2023-03-27 21:10:48 +02:00
Jan Lukas Gernert | 2189f527d7 | fix strip unlikely table-child & add 2 new tests | 2023-03-26 11:54:13 +02:00
Jan Lukas Gernert | 873e081c33 | clean js-links & add new test | 2023-03-26 11:31:59 +02:00
Jan Lukas Gernert | b541cd73f8 | whitespace fixes | 2023-03-24 08:02:08 +01:00
Jan Lukas Gernert | f7fa696921 | fmt & clippy | 2023-03-19 23:37:42 +01:00
Jan Lukas Gernert | 280c516cbe | make cleaning more obvious | 2023-03-19 23:09:06 +01:00
Jan Lukas Gernert | 11e08ae505 | move conditional cleaning right after parsing & port attribute cleaning from readability | 2023-03-19 22:43:26 +01:00
Jan Lukas Gernert | 7737311a92 | small fix | 2023-03-19 13:31:10 +01:00
Jan Lukas Gernert | 848291e4f3 | small fixes | 2023-03-12 23:13:28 +01:00
Jan Lukas Gernert | 4ca4b73823 | fmt | 2023-03-12 19:36:34 +01:00
Jan Lukas Gernert | 603b373e0d | lots of fixes | 2023-03-12 19:36:10 +01:00
Jan Lukas Gernert | 779afd6245 | fix cleaning of empty p/div-tags | 2023-03-12 12:20:50 +01:00
Jan Lukas Gernert | 1e71aa2bfb | remove duplicate code | 2023-03-10 22:17:53 +01:00
Jan Lukas Gernert | 3ece2522bb | add clean links test | 2023-03-09 21:24:29 +01:00
Jan Lukas Gernert | c5c6b788c8 | add citilab test & fix noscript unwrapping | 2023-03-09 20:10:03 +01:00
Jan Lukas Gernert | f5b7ff198a | fix post processing | 2023-03-04 23:40:01 +01:00
Jan Lukas Gernert | 7c9e527827 | strip iframes but keep videos | 2023-03-01 01:37:37 +01:00
Jan Lukas Gernert | 3a92585f4d | use url.join() instead of custom code | 2023-03-01 00:42:03 +01:00
Jan Lukas Gernert | aea57d0cf3 | fix has_single_tag_inside_element & update tests | 2023-02-28 03:59:48 +01:00
Jan Lukas Gernert | 31a8033844 | fixes, more sanitation & 1 more failing test | 2023-02-28 01:50:13 +01:00
Jan Lukas Gernert | 56c08c501a | fmt | 2023-02-27 01:01:16 +01:00
Jan Lukas Gernert | df999cd9fc | more cleanups & more tests | 2023-02-27 01:00:56 +01:00
Jan Lukas Gernert | 0834c4d72a | fixes | 2023-02-26 02:22:53 +01:00
Jan Lukas Gernert | 63035ca028 | fmt | 2023-02-25 00:43:42 +01:00
Jan Lukas Gernert | e3246af28b | refactor & more testing | 2023-02-25 00:42:26 +01:00
Jan Lukas Gernert | 7ae98904d4 | unwrap noscript images | 2023-02-23 01:53:42 +01:00
Jan Lukas Gernert | 98c06e11f4 | improve title extraction | 2023-02-20 02:32:58 +01:00
Jan Lukas Gernert | cce912c354 | first content extraction kinda working | 2023-02-20 00:29:44 +01:00
Jan Lukas Gernert | 71a8816747 | somewhat complete readability algorithm | 2023-02-17 14:16:01 +01:00
Jan Lukas Gernert | 979358fd35 | more | 2023-01-01 21:35:46 +01:00
Jan Lukas Gernert | 2750ad648d | start implementing readability | 2023-01-01 14:51:34 +01:00
Jan Lukas Gernert | c08f5afa5d | move stuff around | 2022-12-13 08:54:57 +01:00
Jan Lukas Gernert | 90383545e0 | extract & parse charsets other than utf8 | 2022-12-11 17:38:42 +01:00
Jan Lukas Gernert | 88bb88a38f | clippy | 2022-12-11 16:23:02 +01:00
Jan Lukas Gernert | dc1bf2ef0c | fmt | 2022-12-11 16:19:49 +01:00
Jan Lukas Gernert | 22e98fdab7 | extract thumbnail url | 2022-12-11 16:18:03 +01:00
Jan Lukas Gernert | 0c8aba4f4a | refactor: a bit less nested code | 2022-12-01 10:14:47 +01:00
Jan Lukas Gernert | 27be5a3204 | port failure -> thiserror | 2022-12-01 09:22:08 +01:00
Jan Lukas Gernert | d906f6b7fe | readability stub | 2022-10-08 23:10:26 +02:00
Jan Lukas Gernert | 273ddd832c | start refactor & fingerprints | 2022-10-08 23:09:00 +02:00