Jan Lukas Gernert
|
afe661fe6c
|
only go for single page link if xpath res isn't empty
|
2020-01-27 01:54:37 +01:00 |
|
Jan Lukas Gernert
|
e58acf828c
|
improve logging clarity
|
2020-01-27 01:48:54 +01:00 |
|
Jan Lukas Gernert
|
c720dbc299
|
fixup
|
2020-01-27 01:35:15 +01:00 |
|
Jan Lukas Gernert
|
b272c99911
|
fix missing '/' in url completion
|
2020-01-27 01:21:21 +01:00 |
|
Jan Lukas Gernert
|
f570873aba
|
load config files in background thread
|
2020-01-26 21:44:26 +01:00 |
|
Jan Lukas Gernert
|
2cac8a2678
|
go back to stable libxml
|
2020-01-26 17:34:47 +01:00 |
|
Jan Lukas Gernert
|
8025e8f004
|
Merge branch 'async' into 'master'
Async
See merge request news-flash/article_scraper!3
|
2020-01-19 21:15:13 +00:00 |
|
Jan Lukas Gernert
|
d9c7ef1471
|
Merge branch 'master' into 'async'
# Conflicts:
# Cargo.toml
# src/images/mod.rs
# src/lib.rs
|
2020-01-19 21:15:08 +00:00 |
|
Jan Lukas Gernert
|
d843809437
|
update reqwest to stable
|
2020-01-18 19:06:53 +01:00 |
|
Jan Lukas Gernert
|
9e995122c4
|
only strip topmost nodes in tree branches
|
2019-12-19 17:36:48 +01:00 |
|
Jan Lukas Gernert
|
b032ec99bc
|
Merge branch 'async' into 'master'
Async
See merge request news-flash/article_scraper!2
|
2019-12-16 11:37:00 +00:00 |
|
Jan Lukas Gernert
|
9c35fb9fa8
|
Async
|
2019-12-16 11:36:59 +00:00 |
|
Jan Lukas Gernert
|
26346839f2
|
remove prints
|
2019-11-19 19:28:49 +01:00 |
|
Jan Lukas Gernert
|
edfbca3cf3
|
fix document going out of scope
|
2019-11-19 14:41:08 +01:00 |
|
Jan Lukas Gernert
|
2c6bfed550
|
tinkering
|
2019-11-18 05:53:34 +01:00 |
|
Jan Lukas Gernert
|
4b8af0d709
|
wip: async
|
2019-11-10 14:43:59 +01:00 |
|
Jan Lukas Gernert
|
5f82872d1f
|
don't attempt to redownload embedded images
|
2019-09-26 21:48:24 +02:00 |
|
Jan Lukas Gernert
|
4f5aef8e17
|
merge
|
2019-09-26 21:29:11 +02:00 |
|
Jan Lukas Gernert
|
2137e84743
|
update to new serialization api of libxml
|
2019-09-26 21:28:05 +02:00 |
|
Jan Lukas Gernert
|
a99b8dec47
|
wip: test libxml XML_SAVE_NO_EMPTY option
|
2019-09-24 18:45:06 +02:00 |
|
Jan Lukas Gernert
|
a44ac3663c
|
don't resize animated images
|
2019-09-24 14:25:57 +02:00 |
|
Jan Lukas Gernert
|
b489af74bd
|
create data dir if it doesn't exist
|
2019-09-24 03:16:37 +02:00 |
|
Jan Lukas Gernert
|
481a2f41ac
|
don't abort image download on failed image
|
2019-09-24 02:56:45 +02:00 |
|
Jan Lukas Gernert
|
f9905c8a9d
|
download images parameter to parse method
|
2019-09-24 02:43:36 +02:00 |
|
Jan Lukas Gernert
|
f1be8a2608
|
make image downloader public
|
2019-09-24 02:40:43 +02:00 |
|
Jan Lukas Gernert
|
efd36dff66
|
update deps
|
2019-09-24 02:11:16 +02:00 |
|
Jan Lukas Gernert
|
4d2f5a0a50
|
update crates
|
2019-08-21 23:18:28 +02:00 |
|
Jan Lukas Gernert
|
fb6158f7df
|
Merge branch 'embeded_images' into 'master'
Embed images as base64 inside article html
See merge request news-flash/article_scraper!1
|
2019-03-06 17:56:31 +00:00 |
|
Jan Lukas Gernert
|
3ca59d7f02
|
embed images as base64 inside article html
|
2019-03-06 18:37:24 +01:00 |
|
Jan Lukas Gernert
|
e1905d3c2c
|
remove lifetime annotations added by rust 2018
|
2018-12-08 23:25:07 +01:00 |
|
Jan Lukas Gernert
|
02356a51aa
|
fix save_html returning error even though it succeeded
|
2018-12-07 15:31:09 +01:00 |
|
Jan Lukas Gernert
|
aa26e099df
|
get rid of 'extern crate'
|
2018-12-07 02:26:46 +01:00 |
|
Jan Lukas Gernert
|
b679f2e1fa
|
update to rust 2018
|
2018-12-07 02:19:40 +01:00 |
|
Jan Lukas Gernert
|
6f38c2bc4c
|
merge
|
2018-12-07 02:17:15 +01:00 |
|
Jan Lukas Gernert
|
5555118914
|
update to reqwest 0.9
|
2018-12-07 02:15:06 +01:00 |
|
Jan Lukas Gernert
|
fcea6cf5d1
|
update to reqwest 0.9
|
2018-12-07 02:14:50 +01:00 |
|
Jan Lukas Gernert
|
fab4306ed9
|
TIL: map_err
|
2018-08-31 16:49:58 +02:00 |
|
Jan Lukas Gernert
|
5beb25a575
|
remove old dependency
|
2018-08-29 18:25:30 +02:00 |
|
Jan Lukas Gernert
|
b76bb7eea7
|
don't use .X versions of deps
|
2018-08-02 22:30:09 +02:00 |
|
Jan Lukas Gernert
|
ed66a705c1
|
update deps and remove html2text
|
2018-08-02 17:05:26 +02:00 |
|
Jan Lukas Gernert
|
4b2e6a24eb
|
initial commit
|
2018-07-31 16:10:09 +02:00 |
|