Jan Lukas Gernert | 4031750956 | tag cleaning test | 2023-03-01 01:37:44 +01:00 |
Jan Lukas Gernert | 7c9e527827 | strip iframes but keep vidoes | 2023-03-01 01:37:37 +01:00 |
Jan Lukas Gernert | cea23f1638 | always use fakehost url for tests | 2023-03-01 00:46:35 +01:00 |
Jan Lukas Gernert | 80de6d177c | url completion test | 2023-03-01 00:42:44 +01:00 |
Jan Lukas Gernert | 3a92585f4d | use url.join() instead of custom code | 2023-03-01 00:42:03 +01:00 |
Jan Lukas Gernert | 13d147d270 | fmt | 2023-02-28 18:30:23 +01:00 |
Jan Lukas Gernert | 451dd61547 | add two new tests | 2023-02-28 18:28:55 +01:00 |
Jan Lukas Gernert | a1c07d436f | fix alternative top candidate calcs | 2023-02-28 18:28:01 +01:00 |
Jan Lukas Gernert | f4ccd22837 | fix node ancestor depth | 2023-02-28 18:27:46 +01:00 |
Jan Lukas Gernert | 58721efa35 | fix positive/negative class weight regex | 2023-02-28 18:27:36 +01:00 |
Jan Lukas Gernert | aea57d0cf3 | fix has_single_tag_inside_element & update tests | 2023-02-28 03:59:48 +01:00 |
Jan Lukas Gernert | 31a8033844 | fixes, more sanitation & 1 more failing test | 2023-02-28 01:50:13 +01:00 |
Jan Lukas Gernert | 56c08c501a | fmt | 2023-02-27 01:01:16 +01:00 |
Jan Lukas Gernert | df999cd9fc | more cleanups & more tests | 2023-02-27 01:00:56 +01:00 |
Jan Lukas Gernert | 0834c4d72a | fixes | 2023-02-26 02:22:53 +01:00 |
Jan Lukas Gernert | d8e3a75b01 | update configs | 2023-02-25 01:40:07 +01:00 |
Jan Lukas Gernert | 2460745547 | cleanup | 2023-02-25 00:44:18 +01:00 |
Jan Lukas Gernert | 63035ca028 | fmt | 2023-02-25 00:43:42 +01:00 |
Jan Lukas Gernert | e3246af28b | refactor & more testing | 2023-02-25 00:42:26 +01:00 |
Jan Lukas Gernert | 7ae98904d4 | unwrap noscript images | 2023-02-23 01:53:42 +01:00 |
Jan Lukas Gernert | 98c06e11f4 | improve title extraction | 2023-02-20 02:32:58 +01:00 |
Jan Lukas Gernert | cce912c354 | first content extraction kinda working | 2023-02-20 00:29:44 +01:00 |
Jan Lukas Gernert | 2c76a869e7 | fmt | 2023-02-17 14:35:35 +01:00 |
Jan Lukas Gernert | 71a8816747 | somewhat complete readability algorithm | 2023-02-17 14:16:01 +01:00 |
Jan Lukas Gernert | 979358fd35 | more | 2023-01-01 21:35:46 +01:00 |
Jan Lukas Gernert | 2750ad648d | start implementing readability | 2023-01-01 14:51:34 +01:00 |
Jan Lukas Gernert | c08f5afa5d | move stuff around | 2022-12-13 08:54:57 +01:00 |
Jan Lukas Gernert | 90383545e0 | extract & parse charsets other than utf8 | 2022-12-11 17:38:42 +01:00 |
Jan Lukas Gernert | 97b194c9e8 | clippy regex escape | 2022-12-11 16:31:01 +01:00 |
Jan Lukas Gernert | 88bb88a38f | clippy | 2022-12-11 16:23:02 +01:00 |
Jan Lukas Gernert | dc1bf2ef0c | fmt | 2022-12-11 16:19:49 +01:00 |
Jan Lukas Gernert | 22e98fdab7 | extract thumbnail url | 2022-12-11 16:18:03 +01:00 |
Jan Lukas Gernert | 0c8aba4f4a | refactor: a bit less nested code | 2022-12-01 10:14:47 +01:00 |
Jan Lukas Gernert | 27be5a3204 | port failure -> thiserror | 2022-12-01 09:22:08 +01:00 |
Jan Lukas Gernert | d906f6b7fe | readability stub | 2022-10-08 23:10:26 +02:00 |
Jan Lukas Gernert | 273ddd832c | start refactor & fingerprints | 2022-10-08 23:09:00 +02:00 |
Jan Lukas Gernert | 29df3aa698 | simplify ci pipe | 2022-10-07 09:41:16 +02:00 |
Jan Lukas Gernert | 7b205e8e27 | fmt | 2022-10-07 09:32:39 +02:00 |
Jan Lukas Gernert | 69659da983 | clippy fixes | 2022-10-07 09:20:10 +02:00 |
Jan Lukas Gernert | 8c2af14871 | special handling trying to find single page links: fixes youtube | 2022-10-07 08:48:09 +02:00 |
Jan Lukas Gernert | 7b1b027c6d | add support for header values: fixes golem test | 2022-10-07 07:17:33 +02:00 |
Jan Lukas Gernert | 0e3553b647 | remove dbg code | 2022-10-07 07:17:33 +02:00 |
Jan Lukas Gernert | c1ae011fcd | use global rules | 2022-10-07 07:17:31 +02:00 |
Jan Lukas Gernert | 3a6a70ee64 | embedded config files | 2022-10-07 07:16:54 +02:00 |
Jan Lukas Gernert | aa09666f4c | async config loading | 2022-10-07 07:16:06 +02:00 |
Jan Lukas Gernert | 9fb772bfa8 | update deps | 2022-10-07 07:16:06 +02:00 |
Jan Lukas Gernert | 5c66930f21 | Merge branch 'volker1' into 'master'; Fixed spelling; See merge request news-flash/article_scraper!7 | 2022-06-16 04:11:03 +00:00 |
Volker Weißmann | 593901c849 | Fixed spelling | 2022-06-15 19:15:51 +02:00 |
Jan Lukas Gernert | a4992f55bf | (cargo-release) start next development iteration 1.1.8-alpha.0 | 2021-01-21 08:54:46 +01:00 |
Jan Lukas Gernert | 07d8c1fa0f | (cargo-release) version 1.1.7 | 2021-01-21 08:53:56 +01:00 |