mirror of https://gitlab.com/news-flash/article_scraper.git synced 2025-07-07 16:15:32 +02:00
Commit graph

320 commits

Author  SHA1  Message  Date
Jan Lukas Gernert  35a14b0a5f  start improving image download  2023-04-12 08:27:22 +02:00
Jan Lukas Gernert  c198225012  eliminate additional head request  2023-04-11 07:49:01 +02:00
Jan Lukas Gernert  fa41633e11  cli to parse single page with ftr  2023-04-10 13:47:45 +02:00
Jan Lukas Gernert  d978059709  command to use readability extractor  2023-04-07 11:51:14 +02:00
Jan Lukas Gernert  063996d62f  readability cli  2023-04-06 08:53:19 +02:00
Jan Lukas Gernert  a2719c8c7e  first few cli args  2023-04-05 08:43:00 +02:00
Jan Lukas Gernert  4a7349a5fa  add cli crate  2023-04-04 08:42:04 +02:00
Jan Lukas Gernert  9832fa2c77  clippy fixes  2023-04-02 13:23:07 +02:00
Jan Lukas Gernert  acc2fe781a  port final tests from readability for now  2023-04-02 13:22:16 +02:00
Jan Lukas Gernert  fcc5cb0e88  fix hidden fallback images for wikipedia & add more tests  2023-04-02 09:55:25 +02:00
Jan Lukas Gernert  3fa8c9674d  fix relative srcset urls & more tests  2023-04-02 09:03:37 +02:00
Jan Lukas Gernert  15eec43ad9  6 more tests  2023-04-01 18:22:42 +02:00
Jan Lukas Gernert  cc6ff6d7e2  6 more tags & make seattletimes test consistent  2023-04-01 18:14:05 +02:00
Jan Lukas Gernert  0d6db710e8  4 more test & remove share elements  2023-04-01 17:19:37 +02:00
Jan Lukas Gernert  be6e08bd6d  fix replacing font tags  2023-04-01 12:31:56 +02:00
Jan Lukas Gernert  253afc48f0  fmt  2023-03-31 21:21:44 +02:00
Jan Lukas Gernert  e09292d66d  qq -.-  2023-03-31 21:21:14 +02:00
Jan Lukas Gernert  a11bb1293b  adding more tests  2023-03-31 11:23:44 +02:00
Jan Lukas Gernert  c46d93058f  fix nytimes-3  2023-03-31 10:38:04 +02:00
Jan Lukas Gernert  c42ffa57a2  start adding nytimes tests  2023-03-31 09:37:23 +02:00
Jan Lukas Gernert  70e2ed8c82  3 more tests  2023-03-31 07:09:13 +02:00
Jan Lukas Gernert  14af01c214  mozilla test consitency  2023-03-30 21:35:31 +02:00
Jan Lukas Gernert  063ac07410  activate mozilla-1 test  2023-03-30 21:28:51 +02:00
Jan Lukas Gernert  027fab7602  fix url completion for hash urls  2023-03-30 21:27:35 +02:00
Jan Lukas Gernert  b52212bf34  fmt  2023-03-30 08:12:43 +02:00
Jan Lukas Gernert  9e73b94f11  fix medialnewstoday test  2023-03-30 07:58:11 +02:00
Jan Lukas Gernert  029aaffcea  2 passing test & 2 failing tests  2023-03-29 18:08:00 +02:00
Jan Lukas Gernert  92b4427a9f  fmt  2023-03-29 08:35:54 +02:00
Jan Lukas Gernert  ded7cf5adb  more tests & title fixes  2023-03-29 08:35:36 +02:00
Jan Lukas Gernert  a649b93c03  fmt  2023-03-28 07:25:22 +02:00
Jan Lukas Gernert  e6c11ec684  4 more tests  2023-03-28 07:25:05 +02:00
Jan Lukas Gernert  d8a9d0a757  update lazy image fixing code  2023-03-27 21:10:48 +02:00
Jan Lukas Gernert  2189f527d7  fix strip unlikely table-child & add 2 new tests  2023-03-26 11:54:13 +02:00
Jan Lukas Gernert  873e081c33  clean js-links & add new test  2023-03-26 11:31:59 +02:00
Jan Lukas Gernert  da12fcdab6  clippy  2023-03-24 08:09:26 +01:00
Jan Lukas Gernert  b541cd73f8  whitespace fixes  2023-03-24 08:02:08 +01:00
Jan Lukas Gernert  2217c3c71a  add new test  2023-03-20 20:55:41 +01:00
Jan Lukas Gernert  0901a37475  add new test  2023-03-20 00:09:10 +01:00
Jan Lukas Gernert  f7fa696921  fmt & clippy  2023-03-19 23:37:42 +01:00
Jan Lukas Gernert  280c516cbe  make cleaning more obvious  2023-03-19 23:09:06 +01:00
Jan Lukas Gernert  11e08ae505  move conditional cleaning right after parsing & port attribute cleaning form readability  2023-03-19 22:43:26 +01:00
Jan Lukas Gernert  47eed3a94f  add hidden notes test  2023-03-19 19:54:41 +01:00
Jan Lukas Gernert  78b0ab693e  add herald sun test  2023-03-19 19:51:31 +01:00
Jan Lukas Gernert  c90d05cf84  add heise test  2023-03-19 19:25:56 +01:00
Jan Lukas Gernert  41ee8eec2c  add guardian test  2023-03-19 19:23:39 +01:00
Jan Lukas Gernert  914b66a0a2  add 3 more tests  2023-03-19 19:16:34 +01:00
Jan Lukas Gernert  001fd8f167  add engadget & firefox blog tests  2023-03-19 18:40:42 +01:00
Jan Lukas Gernert  32dd074b6d  add embedded videos test  2023-03-19 15:39:08 +01:00
Jan Lukas Gernert  8309e227eb  add 2nd ehow test  2023-03-19 15:35:09 +01:00
Jan Lukas Gernert  d693e37956  fmt  2023-03-19 13:31:44 +01:00