mirror of https://gitlab.com/news-flash/article_scraper.git synced 2025-07-08 00:19:59 +02:00
Commit graph

261 commits

Author SHA1 Message Date
Jan Lukas Gernert
210601eaff (cargo-release) version 1.1.2 2020-06-06 05:18:38 +02:00
Jan Lukas Gernert
a42ececb2a check if final url differs from original even without redirect status 2020-06-06 05:18:25 +02:00
Jan Lukas Gernert
3bb8485f40 Merge branch 'fmt+lint' into 'master'
fix fmt+lint

See merge request news-flash/article_scraper!5
2020-05-31 03:04:48 +00:00
Felix Buehler
fa54b82e52 [ci] add fmt + lint checking 2020-05-30 13:07:10 +02:00
Felix Buehler
0c3946dd5b fix fmt+lint 2020-05-29 18:55:00 +02:00
Jan Lukas Gernert
7c9a512a34 (cargo-release) start next development iteration 1.1.2-alpha.0 2020-05-23 14:15:43 +02:00
Jan Lukas Gernert
1552967462 (cargo-release) version 1.1.1 2020-05-23 14:14:50 +02:00
Jan Lukas Gernert
f78cccf2a2 remove unused htmlescaper 2020-05-23 14:14:36 +02:00
Jan Lukas Gernert
9976eb9123 (cargo-release) start next development iteration 1.1.1-alpha.0 2020-05-20 16:34:38 +02:00
Jan Lukas Gernert
f51605a92c naivedatetime -> datetime utc 2020-05-20 16:33:40 +02:00
Jan Lukas Gernert
8f48b69161 remove unneeded files 2020-04-28 03:07:21 +02:00
Jan Lukas Gernert
1fd7173eac update for newer deps 2020-04-28 02:51:30 +02:00
Jan Lukas Gernert
1fbce6413d Merge branch 'master' of gitlab.com:news-flash/article_scraper 2020-04-28 02:34:24 +02:00
Jan Lukas Gernert
f6d021b67b first release 2020-04-28 02:33:25 +02:00
Jan Lukas Gernert
d2960d8539 require client for parsing 2020-02-10 18:01:35 +01:00
Jan Lukas Gernert
a7c247549a remve unused crate 2020-02-06 21:08:58 +01:00
Jan Lukas Gernert
1ecc0fc4b4 option to set custom reqwest client 2020-02-03 17:46:54 +01:00
Jan Lukas Gernert
71055eed1c fix corrupt filename 2020-01-27 17:32:17 +01:00
Jan Lukas Gernert
98348b7e59 tmp: dont strip scripts 2020-01-27 16:47:13 +01:00
Jan Lukas Gernert
23514aff9e less dramatic logging 2020-01-27 02:03:06 +01:00
Jan Lukas Gernert
afe661fe6c only go for single page link if xpath res isn't empty 2020-01-27 01:54:37 +01:00
Jan Lukas Gernert
e58acf828c improve logging clearity 2020-01-27 01:48:54 +01:00
Jan Lukas Gernert
c720dbc299 fixup 2020-01-27 01:35:15 +01:00
Jan Lukas Gernert
b272c99911 fix missing '/' in url completion 2020-01-27 01:21:21 +01:00
Jan Lukas Gernert
f570873aba load config files in background thread 2020-01-26 21:44:26 +01:00
Jan Lukas Gernert
2cac8a2678 got back to stable libxml 2020-01-26 17:34:47 +01:00
Jan Lukas Gernert
8025e8f004 Merge branch 'async' into 'master'
Async

See merge request news-flash/article_scraper!3
2020-01-19 21:15:13 +00:00
Jan Lukas Gernert
d9c7ef1471 Merge branch 'master' into 'async'
# Conflicts:
#   Cargo.toml
#   src/images/mod.rs
#   src/lib.rs
2020-01-19 21:15:08 +00:00
Jan Lukas Gernert
d843809437 update reqwest to stable 2020-01-18 19:06:53 +01:00
Jan Lukas Gernert
9e995122c4 only strip topmost nodes in tree branches 2019-12-19 17:36:48 +01:00
Jan Lukas Gernert
b032ec99bc Merge branch 'async' into 'master'
Async

See merge request news-flash/article_scraper!2
2019-12-16 11:37:00 +00:00
Jan Lukas Gernert
9c35fb9fa8 Async 2019-12-16 11:36:59 +00:00
Jan Lukas Gernert
26346839f2 remove prints 2019-11-19 19:28:49 +01:00
Jan Lukas Gernert
edfbca3cf3 fix document going out of scope 2019-11-19 14:41:08 +01:00
Jan Lukas Gernert
2c6bfed550 frickel 2019-11-18 05:53:34 +01:00
Jan Lukas Gernert
4b8af0d709 wip: async 2019-11-10 14:43:59 +01:00
Jan Lukas Gernert
5f82872d1f don't attempt to redownload embeded images 2019-09-26 21:48:24 +02:00
Jan Lukas Gernert
4f5aef8e17 merge 2019-09-26 21:29:11 +02:00
Jan Lukas Gernert
2137e84743 update to new serialization api of libxml 2019-09-26 21:28:05 +02:00
Jan Lukas Gernert
a99b8dec47 wip: test libxml XML_SAVE_NO_EMPTY option 2019-09-24 18:45:06 +02:00
Jan Lukas Gernert
a44ac3663c don't resize animated images 2019-09-24 14:25:57 +02:00
Jan Lukas Gernert
b489af74bd create data dir if it doesn't exist 2019-09-24 03:16:37 +02:00
Jan Lukas Gernert
481a2f41ac don't abort image download on failed image 2019-09-24 02:56:45 +02:00
Jan Lukas Gernert
f9905c8a9d download images parameter to parse method 2019-09-24 02:43:36 +02:00
Jan Lukas Gernert
f1be8a2608 make image downloader public 2019-09-24 02:40:43 +02:00
Jan Lukas Gernert
efd36dff66 update deps 2019-09-24 02:11:16 +02:00
Jan Lukas Gernert
4d2f5a0a50 update crates 2019-08-21 23:18:28 +02:00
Jan Lukas Gernert
fb6158f7df Merge branch 'embeded_images' into 'master'
Embed images as base64 inside article html

See merge request news-flash/article_scraper!1
2019-03-06 17:56:31 +00:00
Jan Lukas Gernert
3ca59d7f02 embed images as base64 inside article html 2019-03-06 18:37:24 +01:00
Jan Lukas Gernert
e1905d3c2c remove life time annotations added by rust 2018 2018-12-08 23:25:07 +01:00