Jan Lukas Gernert | 873e081c33 | clean js-links & add new test | 2023-03-26 11:31:59 +02:00
Jan Lukas Gernert | b541cd73f8 | whitespace fixes | 2023-03-24 08:02:08 +01:00
Jan Lukas Gernert | 280c516cbe | make cleaning more obvious | 2023-03-19 23:09:06 +01:00
Jan Lukas Gernert | 11e08ae505 | move conditional cleaning right after parsing & port attribute cleaning from readability | 2023-03-19 22:43:26 +01:00
Jan Lukas Gernert | 3a56439ae8 | fix scoring p tags twice | 2023-03-19 13:31:27 +01:00
Jan Lukas Gernert | b5d8f43ef8 | stabilize buzzfeed test | 2023-03-12 23:13:52 +01:00
Jan Lukas Gernert | 603b373e0d | lots of fixes | 2023-03-12 19:36:10 +01:00
Jan Lukas Gernert | 11d9657bdd | fix using parent if top candidate is only child | 2023-03-12 14:20:19 +01:00
Jan Lukas Gernert | 58a799b096 | fix negative regex & fmt | 2023-03-12 11:42:37 +01:00
Jan Lukas Gernert | a356ced646 | fix potential infinite loop | 2023-03-10 22:17:31 +01:00
Jan Lukas Gernert | 69b7b1fdc2 | fix clippy | 2023-03-06 01:51:26 +01:00
Jan Lukas Gernert | 881c2b90ac | fix alternate candidates | 2023-03-06 01:36:21 +01:00
Jan Lukas Gernert | 2528aa3e18 | fmt | 2023-03-04 17:55:17 +01:00
Jan Lukas Gernert | daa5543c4e | fix turning div's into p's | 2023-03-04 17:41:14 +01:00
Jan Lukas Gernert | 13d147d270 | fmt | 2023-02-28 18:30:23 +01:00
Jan Lukas Gernert | a1c07d436f | fix alternative top candidate calcs | 2023-02-28 18:28:01 +01:00
Jan Lukas Gernert | aea57d0cf3 | fix has_single_tag_inside_element & update tests | 2023-02-28 03:59:48 +01:00
Jan Lukas Gernert | 31a8033844 | fixes, more sanitation & 1 more failing test | 2023-02-28 01:50:13 +01:00
Jan Lukas Gernert | 0834c4d72a | fixes | 2023-02-26 02:22:53 +01:00
Jan Lukas Gernert | 63035ca028 | fmt | 2023-02-25 00:43:42 +01:00
Jan Lukas Gernert | e3246af28b | refactor & more testing | 2023-02-25 00:42:26 +01:00
Jan Lukas Gernert | 7ae98904d4 | unwrap noscript images | 2023-02-23 01:53:42 +01:00
Jan Lukas Gernert | 98c06e11f4 | improve title extraction | 2023-02-20 02:32:58 +01:00
Jan Lukas Gernert | cce912c354 | first content extraction kinda working | 2023-02-20 00:29:44 +01:00
Jan Lukas Gernert | 2c76a869e7 | fmt | 2023-02-17 14:35:35 +01:00
Jan Lukas Gernert | 71a8816747 | somewhat complete readability algorithm | 2023-02-17 14:16:01 +01:00
Jan Lukas Gernert | 979358fd35 | more | 2023-01-01 21:35:46 +01:00
Jan Lukas Gernert | 2750ad648d | start implementing readability | 2023-01-01 14:51:34 +01:00