Jan Lukas Gernert
|
47eed3a94f
|
add hidden notes test
|
2023-03-19 19:54:41 +01:00 |
|
Jan Lukas Gernert
|
78b0ab693e
|
add herald sun test
|
2023-03-19 19:51:31 +01:00 |
|
Jan Lukas Gernert
|
c90d05cf84
|
add heise test
|
2023-03-19 19:25:56 +01:00 |
|
Jan Lukas Gernert
|
41ee8eec2c
|
add guardian test
|
2023-03-19 19:23:39 +01:00 |
|
Jan Lukas Gernert
|
914b66a0a2
|
add 3 more tests
|
2023-03-19 19:16:34 +01:00 |
|
Jan Lukas Gernert
|
001fd8f167
|
add engadget & firefox blog tests
|
2023-03-19 18:40:42 +01:00 |
|
Jan Lukas Gernert
|
32dd074b6d
|
add embedded videos test
|
2023-03-19 15:39:08 +01:00 |
|
Jan Lukas Gernert
|
8309e227eb
|
add 2nd ehow test
|
2023-03-19 15:35:09 +01:00 |
|
Jan Lukas Gernert
|
d693e37956
|
fmt
|
2023-03-19 13:31:44 +01:00 |
|
Jan Lukas Gernert
|
cb00f7add2
|
add ehow test
|
2023-03-19 13:31:35 +01:00 |
|
Jan Lukas Gernert
|
3a56439ae8
|
fix scoring p tags twice
|
2023-03-19 13:31:27 +01:00 |
|
Jan Lukas Gernert
|
7737311a92
|
small fix
|
2023-03-19 13:31:10 +01:00 |
|
Jan Lukas Gernert
|
29daf06a1b
|
clippy fix
|
2023-03-13 19:11:42 +01:00 |
|
Jan Lukas Gernert
|
b5d8f43ef8
|
stabilize buzzfeed test
|
2023-03-12 23:13:52 +01:00 |
|
Jan Lukas Gernert
|
848291e4f3
|
small fixes
|
2023-03-12 23:13:28 +01:00 |
|
Jan Lukas Gernert
|
4ca4b73823
|
fmt
|
2023-03-12 19:36:34 +01:00 |
|
Jan Lukas Gernert
|
603b373e0d
|
lots of fixes
|
2023-03-12 19:36:10 +01:00 |
|
Jan Lukas Gernert
|
11d9657bdd
|
fix using parent if top candidate is only child
|
2023-03-12 14:20:19 +01:00 |
|
Jan Lukas Gernert
|
14ba2ccb70
|
add dropbox test
|
2023-03-12 13:50:06 +01:00 |
|
Jan Lukas Gernert
|
8d529a6d74
|
fmt
|
2023-03-12 13:39:29 +01:00 |
|
Jan Lukas Gernert
|
23c156ab2c
|
add new test
|
2023-03-12 13:39:17 +01:00 |
|
Jan Lukas Gernert
|
c8bc583864
|
add exception to conditional cleaning for list with images
|
2023-03-12 13:39:10 +01:00 |
|
Jan Lukas Gernert
|
c19525f8cd
|
add new test
|
2023-03-12 12:21:00 +01:00 |
|
Jan Lukas Gernert
|
779afd6245
|
fix cleaning of empty p/div-tags
|
2023-03-12 12:20:50 +01:00 |
|
Jan Lukas Gernert
|
d9c92ea42c
|
add new test
|
2023-03-12 11:56:41 +01:00 |
|
Jan Lukas Gernert
|
fa63d297f8
|
add new test
|
2023-03-12 11:53:42 +01:00 |
|
Jan Lukas Gernert
|
c654f63319
|
add cnn test
|
2023-03-12 11:42:44 +01:00 |
|
Jan Lukas Gernert
|
58a799b096
|
fix negative regex & fmt
|
2023-03-12 11:42:37 +01:00 |
|
Jan Lukas Gernert
|
1e71aa2bfb
|
remove duplicate code
|
2023-03-10 22:17:53 +01:00 |
|
Jan Lukas Gernert
|
a356ced646
|
fix potential infinite loop
|
2023-03-10 22:17:31 +01:00 |
|
Jan Lukas Gernert
|
6a58e45c7a
|
add cnet test
|
2023-03-10 07:05:10 +01:00 |
|
Jan Lukas Gernert
|
a915d8fe67
|
update some older tests
|
2023-03-10 06:36:21 +01:00 |
|
Jan Lukas Gernert
|
7b6d22ebc8
|
add cnet-svg-classes test
|
2023-03-10 06:33:24 +01:00 |
|
Jan Lukas Gernert
|
3ece2522bb
|
add clean links test
|
2023-03-09 21:24:29 +01:00 |
|
Jan Lukas Gernert
|
c5c6b788c8
|
add citilab test & fix noscript unwrapping
|
2023-03-09 20:10:03 +01:00 |
|
Jan Lukas Gernert
|
69b7b1fdc2
|
fix clippy
|
2023-03-06 01:51:26 +01:00 |
|
Jan Lukas Gernert
|
612f022879
|
add buzzfeed test
|
2023-03-06 01:36:37 +01:00 |
|
Jan Lukas Gernert
|
881c2b90ac
|
fix alternate candidates
|
2023-03-06 01:36:21 +01:00 |
|
Jan Lukas Gernert
|
45b4141049
|
add new test
|
2023-03-06 00:04:23 +01:00 |
|
Jan Lukas Gernert
|
7060e30911
|
fix conditional clean of nested tags
|
2023-03-06 00:03:59 +01:00 |
|
Jan Lukas Gernert
|
9c5ffda5de
|
add breitbart test
|
2023-03-04 23:40:23 +01:00 |
|
Jan Lukas Gernert
|
f5b7ff198a
|
fix post processing
|
2023-03-04 23:40:01 +01:00 |
|
Jan Lukas Gernert
|
2528aa3e18
|
fmt
|
2023-03-04 17:55:17 +01:00 |
|
Jan Lukas Gernert
|
e2b804d00a
|
add blogger test
|
2023-03-04 17:41:22 +01:00 |
|
Jan Lukas Gernert
|
daa5543c4e
|
fix turning div's into p's
|
2023-03-04 17:41:14 +01:00 |
|
Jan Lukas Gernert
|
d93f5c9677
|
fmt
|
2023-03-02 01:09:48 +01:00 |
|
Jan Lukas Gernert
|
6964724102
|
add bbc test
|
2023-03-02 01:09:44 +01:00 |
|
Jan Lukas Gernert
|
df41e690ae
|
fix conditional cleaning class weight
|
2023-03-02 01:08:52 +01:00 |
|
Jan Lukas Gernert
|
02e043f6de
|
fix negative regex
|
2023-03-02 01:08:28 +01:00 |
|
Jan Lukas Gernert
|
aaff97c184
|
cleanup
|
2023-03-01 01:55:26 +01:00 |
|