mirror of https://gitlab.com/news-flash/article_scraper.git (synced 2025-07-07 16:15:32 +02:00)
move conditional cleaning right after parsing & port attribute cleaning from readability
parent 47eed3a94f
commit 11e08ae505
10 changed files with 943 additions and 104 deletions
325
expected.html
Normal file
|
@ -0,0 +1,325 @@
|
||||||
|
<article><DIV id="readability-page-1"><article role="article"><p>For more than a decade the Web has used XMLHttpRequest (XHR) to achieve
|
||||||
|
asynchronous requests in JavaScript. While very useful, XHR is not a very
|
||||||
|
nice API. It suffers from lack of separation of concerns. The input, output
|
||||||
|
and state are all managed by interacting with one object, and state is
|
||||||
|
tracked using events. Also, the event-based model doesn’t play well with
|
||||||
|
JavaScript’s recent focus on Promise- and generator-based asynchronous
|
||||||
|
programming.</p>
|
||||||
|
<p>The <a href="https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API" target="_blank">Fetch API</a> intends
|
||||||
|
to fix most of these problems. It does this by introducing the same primitives
|
||||||
|
to JS that are used in the HTTP protocol. In addition, it introduces a
|
||||||
|
utility function <code>fetch()</code> that succinctly captures the intention
|
||||||
|
of retrieving a resource from the network.</p>
|
||||||
|
<p>The <a href="https://fetch.spec.whatwg.org/" target="_blank">Fetch specification</a>, which
|
||||||
|
defines the API, nails down the semantics of a user agent fetching a resource.
|
||||||
|
This, combined with ServiceWorkers, is an attempt to:</p>
|
||||||
|
<ol>
|
||||||
|
<li>Improve the offline experience.</li>
|
||||||
|
<li>Expose the building blocks of the Web to the platform as part of the
|
||||||
|
<a href="https://extensiblewebmanifesto.org/" target="_blank">extensible web movement</a>.</li>
|
||||||
|
</ol>
|
||||||
|
<p>As of this writing, the Fetch API is available in Firefox 39 (currently
|
||||||
|
Nightly) and Chrome 42 (currently dev). Github has a <a href="https://github.com/github/fetch" target="_blank">Fetch polyfill</a>.</p>
|
||||||
|
<h2>Feature detection</h2>
|
||||||
|
<p>Fetch API support can be detected by checking for <code>Headers</code>, <code>Request</code>, <code>Response</code> or <code>fetch</code> on
|
||||||
|
the <code>window</code> or <code>worker</code> scope.</p>
|
||||||
|
<h2>Simple fetching</h2>
|
||||||
|
<p>The most useful, high-level part of the Fetch API is the <code>fetch()</code> function.
|
||||||
|
In its simplest form it takes a URL and returns a promise that resolves
|
||||||
|
to the response. The response is captured as a <code>Response</code> object.</p>
|
||||||
|
<DIV><pre>fetch<span>(</span><span>"/data.json"</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>res<span>)</span><span>{</span><span>// res instanceof Response == true.</span><span>if</span><span>(</span>res.<span>ok</span><span>)</span><span>{</span>
|
||||||
|
res.<span>json</span><span>(</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>data<span>)</span><span>{</span>
|
||||||
|
console.<span>log</span><span>(</span>data.<span>entries</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span><span>}</span><span>else</span><span>{</span>
|
||||||
|
console.<span>log</span><span>(</span><span>"Looks like the response wasn't perfect, got status"</span><span>,</span> res.<span>status</span><span>)</span><span>;</span><span>}</span><span>}</span><span>,</span><span>function</span><span>(</span>e<span>)</span><span>{</span>
|
||||||
|
console.<span>log</span><span>(</span><span>"Fetch failed!"</span><span>,</span> e<span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV>
|
||||||
|
<p>Submitting some parameters, it would look like this:</p>
|
||||||
|
<DIV><pre>fetch<span>(</span><span>"http://www.example.org/submit.php"</span><span>,</span><span>{</span>
|
||||||
|
method<span>:</span><span>"POST"</span><span>,</span>
|
||||||
|
headers<span>:</span><span>{</span><span>"Content-Type"</span><span>:</span><span>"application/x-www-form-urlencoded"</span><span>}</span><span>,</span>
|
||||||
|
body<span>:</span><span>"firstName=Nikhil&favColor=blue&password=easytoguess"</span><span>}</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>res<span>)</span><span>{</span><span>if</span><span>(</span>res.<span>ok</span><span>)</span><span>{</span>
|
||||||
|
alert<span>(</span><span>"Perfect! Your settings are saved."</span><span>)</span><span>;</span><span>}</span><span>else</span><span>if</span><span>(</span>res.<span>status</span><span>==</span><span>401</span><span>)</span><span>{</span>
|
||||||
|
alert<span>(</span><span>"Oops! You are not authorized."</span><span>)</span><span>;</span><span>}</span><span>}</span><span>,</span><span>function</span><span>(</span>e<span>)</span><span>{</span>
|
||||||
|
alert<span>(</span><span>"Error submitting form!"</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV>
|
||||||
|
<p>The <code>fetch()</code> function’s arguments are the same as those passed
|
||||||
|
to the
|
||||||
|
<br><code>Request()</code> constructor, so you may directly pass arbitrarily
|
||||||
|
complex requests to <code>fetch()</code> as discussed below.</p>
|
||||||
|
<h2>Headers</h2>
|
||||||
|
<p>Fetch introduces 3 interfaces. These are <code>Headers</code>, <code>Request</code> and
|
||||||
|
<br><code>Response</code>. They map directly to the underlying HTTP concepts,
|
||||||
|
but have
|
||||||
|
<br>certain visibility filters in place for privacy and security reasons,
|
||||||
|
such as
|
||||||
|
<br>supporting CORS rules and ensuring cookies aren’t readable by third parties.</p>
|
||||||
|
<p>The <a href="https://fetch.spec.whatwg.org/#headers-class" target="_blank">Headers interface</a> is
|
||||||
|
a simple multi-map of names to values:</p>
|
||||||
|
<DIV><pre><span>var</span> content <span>=</span><span>"Hello World"</span><span>;</span><span>var</span> reqHeaders <span>=</span><span>new</span> Headers<span>(</span><span>)</span><span>;</span>
|
||||||
|
reqHeaders.<span>append</span><span>(</span><span>"Content-Type"</span><span>,</span><span>"text/plain"</span><span>)</span><span>;</span>
|
||||||
|
reqHeaders.<span>append</span><span>(</span><span>"Content-Length"</span><span>,</span> content.<span>length</span>.<span>toString</span><span>(</span><span>)</span><span>)</span><span>;</span>
|
||||||
|
reqHeaders.<span>append</span><span>(</span><span>"X-Custom-Header"</span><span>,</span><span>"ProcessThisImmediately"</span><span>)</span><span>;</span></pre></DIV>
|
||||||
|
<p>The same can be achieved by passing an array of arrays or a JS object
|
||||||
|
literal
|
||||||
|
<br>to the constructor:</p>
|
||||||
|
<DIV><pre>reqHeaders <span>=</span><span>new</span> Headers<span>(</span><span>{</span><span>"Content-Type"</span><span>:</span><span>"text/plain"</span><span>,</span><span>"Content-Length"</span><span>:</span> content.<span>length</span>.<span>toString</span><span>(</span><span>)</span><span>,</span><span>"X-Custom-Header"</span><span>:</span><span>"ProcessThisImmediately"</span><span>,</span><span>}</span><span>)</span><span>;</span></pre></DIV>
|
||||||
|
<p>The contents can be queried and retrieved:</p>
|
||||||
|
<DIV><pre>console.<span>log</span><span>(</span>reqHeaders.<span>has</span><span>(</span><span>"Content-Type"</span><span>)</span><span>)</span><span>;</span><span>// true</span>
|
||||||
|
console.<span>log</span><span>(</span>reqHeaders.<span>has</span><span>(</span><span>"Set-Cookie"</span><span>)</span><span>)</span><span>;</span><span>// false</span>
|
||||||
|
reqHeaders.<span>set</span><span>(</span><span>"Content-Type"</span><span>,</span><span>"text/html"</span><span>)</span><span>;</span>
|
||||||
|
reqHeaders.<span>append</span><span>(</span><span>"X-Custom-Header"</span><span>,</span><span>"AnotherValue"</span><span>)</span><span>;</span>
|
||||||
|
|
||||||
|
console.<span>log</span><span>(</span>reqHeaders.<span>get</span><span>(</span><span>"Content-Length"</span><span>)</span><span>)</span><span>;</span><span>// 11</span>
|
||||||
|
console.<span>log</span><span>(</span>reqHeaders.<span>getAll</span><span>(</span><span>"X-Custom-Header"</span><span>)</span><span>)</span><span>;</span><span>// ["ProcessThisImmediately", "AnotherValue"]</span>
|
||||||
|
|
||||||
|
reqHeaders.<span>delete</span><span>(</span><span>"X-Custom-Header"</span><span>)</span><span>;</span>
|
||||||
|
console.<span>log</span><span>(</span>reqHeaders.<span>getAll</span><span>(</span><span>"X-Custom-Header"</span><span>)</span><span>)</span><span>;</span><span>// []</span></pre></DIV>
|
||||||
|
<p>Some of these operations are only useful in ServiceWorkers, but they provide
|
||||||
|
<br>a much nicer API to Headers.</p>
|
||||||
|
<p>Since Headers can be sent in requests, or received in responses, and have
|
||||||
|
various limitations about what information can and should be mutable, <code>Headers</code> objects
|
||||||
|
have a <strong>guard</strong> property. This is not exposed to the Web, but
|
||||||
|
it affects which mutation operations are allowed on the Headers object.
|
||||||
|
<br>Possible values are:</p>
|
||||||
|
<ul>
|
||||||
|
<li>“none”: default.</li>
|
||||||
|
<li>“request”: guard for a Headers object obtained from a Request (<code>Request.headers</code>).</li>
|
||||||
|
<li>“request-no-cors”: guard for a Headers object obtained from a Request
|
||||||
|
created
|
||||||
|
<br>with mode “no-cors”.</li>
|
||||||
|
<li>“response”: naturally, for Headers obtained from Response (<code>Response.headers</code>).</li>
|
||||||
|
<li>“immutable”: Mostly used for ServiceWorkers, renders a Headers object
|
||||||
|
<br>read-only.</li>
|
||||||
|
</ul>
|
||||||
|
<p>The details of how each guard affects the behaviors of the Headers object
|
||||||
|
are
|
||||||
|
<br>in the <a href="https://fetch.spec.whatwg.org/" target="_blank">specification</a>. For example,
|
||||||
|
you may not append or set a “request” guarded Headers’ “Content-Length”
|
||||||
|
header. Similarly, inserting “Set-Cookie” into a Response header is not
|
||||||
|
allowed so that ServiceWorkers may not set cookies via synthesized Responses.</p>
|
||||||
|
<p>All of the Headers methods throw TypeError if <code>name</code> is not a
|
||||||
|
<a href="https://fetch.spec.whatwg.org/#concept-header-name" target="_blank">valid HTTP Header name</a>. The mutation operations will throw TypeError
|
||||||
|
if there is an immutable guard. Otherwise they fail silently. For example:</p>
|
||||||
|
<DIV><pre><span>var</span> res <span>=</span> Response.<span>error</span><span>(</span><span>)</span><span>;</span><span>try</span><span>{</span>
|
||||||
|
res.<span>headers</span>.<span>set</span><span>(</span><span>"Origin"</span><span>,</span><span>"http://mybank.com"</span><span>)</span><span>;</span><span>}</span><span>catch</span><span>(</span>e<span>)</span><span>{</span>
|
||||||
|
console.<span>log</span><span>(</span><span>"Cannot pretend to be a bank!"</span><span>)</span><span>;</span><span>}</span></pre></DIV>
|
||||||
|
<h2>Request</h2>
|
||||||
|
<p>The Request interface defines a request to fetch a resource over HTTP.
|
||||||
|
URL, method and headers are expected, but the Request also allows specifying
|
||||||
|
a body, a request mode, credentials and cache hints.</p>
|
||||||
|
<p>The simplest Request is of course, just a URL, as you may do to GET a
|
||||||
|
resource.</p>
|
||||||
|
<DIV><pre><span>var</span> req <span>=</span><span>new</span> Request<span>(</span><span>"/index.html"</span><span>)</span><span>;</span>
|
||||||
|
console.<span>log</span><span>(</span>req.<span>method</span><span>)</span><span>;</span><span>// "GET"</span>
|
||||||
|
console.<span>log</span><span>(</span>req.<span>url</span><span>)</span><span>;</span><span>// "http://example.com/index.html"</span></pre></DIV>
|
||||||
|
<p>You may also pass a Request to the <code>Request()</code> constructor to
|
||||||
|
create a copy.
|
||||||
|
<br>(This is not the same as calling the <code>clone()</code> method, which
|
||||||
|
is covered in
|
||||||
|
<br>the “Reading bodies” section.)</p>
|
||||||
|
<DIV><pre><span>var</span> copy <span>=</span><span>new</span> Request<span>(</span>req<span>)</span><span>;</span>
|
||||||
|
console.<span>log</span><span>(</span>copy.<span>method</span><span>)</span><span>;</span><span>// "GET"</span>
|
||||||
|
console.<span>log</span><span>(</span>copy.<span>url</span><span>)</span><span>;</span><span>// "http://example.com/index.html"</span></pre></DIV>
|
||||||
|
<p>Again, this form is probably only useful in ServiceWorkers.</p>
|
||||||
|
<p>The non-URL attributes of the <code>Request</code> can only be set by passing
|
||||||
|
initial
|
||||||
|
<br>values as a second argument to the constructor. This argument is a dictionary.</p>
|
||||||
|
<DIV><pre><span>var</span> uploadReq <span>=</span><span>new</span> Request<span>(</span><span>"/uploadImage"</span><span>,</span><span>{</span>
|
||||||
|
method<span>:</span><span>"POST"</span><span>,</span>
|
||||||
|
headers<span>:</span><span>{</span><span>"Content-Type"</span><span>:</span><span>"image/png"</span><span>,</span><span>}</span><span>,</span>
|
||||||
|
body<span>:</span><span>"image data"</span><span>}</span><span>)</span><span>;</span></pre></DIV>
|
||||||
|
<p>The Request’s mode is used to determine if cross-origin requests lead
|
||||||
|
to valid responses, and which properties on the response are readable.
|
||||||
|
Legal mode values are <code>"same-origin"</code>, <code>"no-cors"</code> (default)
|
||||||
|
and <code>"cors"</code>.</p>
|
||||||
|
<p>The <code>"same-origin"</code> mode is simple: if a request is made to another
|
||||||
|
origin with this mode set, the result is simply an error. You could use
|
||||||
|
this to ensure that
|
||||||
|
<br>a request is always being made to your origin.</p>
|
||||||
|
<DIV><pre><span>var</span> arbitraryUrl <span>=</span> document.<span>getElementById</span><span>(</span><span>"url-input"</span><span>)</span>.<span>value</span><span>;</span>
|
||||||
|
fetch<span>(</span>arbitraryUrl<span>,</span><span>{</span> mode<span>:</span><span>"same-origin"</span><span>}</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>res<span>)</span><span>{</span>
|
||||||
|
console.<span>log</span><span>(</span><span>"Response succeeded?"</span><span>,</span> res.<span>ok</span><span>)</span><span>;</span><span>}</span><span>,</span><span>function</span><span>(</span>e<span>)</span><span>{</span>
|
||||||
|
console.<span>log</span><span>(</span><span>"Please enter a same-origin URL!"</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV>
|
||||||
|
<p>The <code>"no-cors"</code> mode captures what the web platform does by default
|
||||||
|
for scripts you import from CDNs, images hosted on other domains, and so
|
||||||
|
on. First, it prevents the method from being anything other than “HEAD”,
|
||||||
|
“GET” or “POST”. Second, if any ServiceWorkers intercept these requests,
|
||||||
|
they may not add or override any headers except for <a href="https://fetch.spec.whatwg.org/#simple-header" target="_blank">these</a>.
|
||||||
|
Third, JavaScript may not access any properties of the resulting Response.
|
||||||
|
This ensures that ServiceWorkers do not affect the semantics of the Web
|
||||||
|
and prevents security and privacy issues that could arise from leaking
|
||||||
|
data across domains.</p>
|
||||||
|
<p><code>"cors"</code> mode is what you’ll usually use to make known cross-origin
|
||||||
|
requests to access various APIs offered by other vendors. These are expected
|
||||||
|
to adhere to
|
||||||
|
<br>the <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS" target="_blank">CORS protocol</a>.
|
||||||
|
Only a <a href="https://fetch.spec.whatwg.org/#concept-filtered-response-cors" target="_blank">limited set</a> of
|
||||||
|
headers is exposed in the Response, but the body is readable. For example,
|
||||||
|
you could get a list of Flickr’s <a href="https://www.flickr.com/services/api/flickr.interestingness.getList.html" target="_blank">most interesting</a> photos
|
||||||
|
today like this:</p>
|
||||||
|
<DIV><pre><span>var</span> u <span>=</span><span>new</span> URLSearchParams<span>(</span><span>)</span><span>;</span>
|
||||||
|
u.<span>append</span><span>(</span><span>'method'</span><span>,</span><span>'flickr.interestingness.getList'</span><span>)</span><span>;</span>
|
||||||
|
u.<span>append</span><span>(</span><span>'api_key'</span><span>,</span><span>'<insert api key here>'</span><span>)</span><span>;</span>
|
||||||
|
u.<span>append</span><span>(</span><span>'format'</span><span>,</span><span>'json'</span><span>)</span><span>;</span>
|
||||||
|
u.<span>append</span><span>(</span><span>'nojsoncallback'</span><span>,</span><span>'1'</span><span>)</span><span>;</span><span>var</span> apiCall <span>=</span> fetch<span>(</span><span>'https://api.flickr.com/services/rest?'</span><span>+</span> u<span>)</span><span>;</span>
|
||||||
|
|
||||||
|
apiCall.<span>then</span><span>(</span><span>function</span><span>(</span>response<span>)</span><span>{</span><span>return</span> response.<span>json</span><span>(</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>json<span>)</span><span>{</span><span>// photo is a list of photos.</span><span>return</span> json.<span>photos</span>.<span>photo</span><span>;</span><span>}</span><span>)</span><span>;</span><span>}</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>photos<span>)</span><span>{</span>
|
||||||
|
photos.<span>forEach</span><span>(</span><span>function</span><span>(</span>photo<span>)</span><span>{</span>
|
||||||
|
console.<span>log</span><span>(</span>photo.<span>title</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV>
|
||||||
|
<p>You may not read out the “Date” header since Flickr does not allow it
|
||||||
|
via
|
||||||
|
<br><code>Access-Control-Expose-Headers</code>.</p>
|
||||||
|
<DIV><pre>response.<span>headers</span>.<span>get</span><span>(</span><span>"Date"</span><span>)</span><span>;</span><span>// null</span></pre></DIV>
|
||||||
|
<p>The <code>credentials</code> enumeration determines if cookies for the other
|
||||||
|
domain are
|
||||||
|
<br>sent to cross-origin requests. This is similar to XHR’s <code>withCredentials</code><br>flag, but tri-valued as <code>"omit"</code> (default), <code>"same-origin"</code> and <code>"include"</code>.</p>
|
||||||
|
<p>The Request object will also give the ability to offer caching hints to
|
||||||
|
the user-agent. This is currently undergoing some <a href="https://github.com/slightlyoff/ServiceWorker/issues/585" target="_blank">security review</a>.
|
||||||
|
Firefox exposes the attribute, but it has no effect.</p>
|
||||||
|
<p>Requests have two read-only attributes that are relevant to ServiceWorkers
|
||||||
|
<br>intercepting them. There is the string <code>referrer</code>, which is
|
||||||
|
set by the UA to be
|
||||||
|
<br>the referrer of the Request. This may be an empty string. The other is
|
||||||
|
<br><code>context</code> which is a rather <a href="https://fetch.spec.whatwg.org/#requestcredentials" target="_blank">large enumeration</a> defining
|
||||||
|
what sort of resource is being fetched. This could be “image” if the request
|
||||||
|
is from an
|
||||||
|
<code>&lt;img&gt;</code> tag in the controlled document, “worker” if it is an attempt to load a
|
||||||
|
worker script, and so on. When used with the <code>fetch()</code> function,
|
||||||
|
it is “fetch”.</p>
|
||||||
|
<h2>Response</h2>
|
||||||
|
<p><code>Response</code> instances are returned by calls to <code>fetch()</code>.
|
||||||
|
They can also be created by JS, but this is only useful in ServiceWorkers.</p>
|
||||||
|
<p>We have already seen some attributes of Response when we looked at <code>fetch()</code>.
|
||||||
|
The most obvious candidates are <code>status</code>, an integer (default
|
||||||
|
value 200) and <code>statusText</code> (default value “OK”), which correspond
|
||||||
|
to the HTTP status code and reason. The <code>ok</code> attribute is just
|
||||||
|
a shorthand for checking that <code>status</code> is in the range 200-299
|
||||||
|
inclusive.</p>
|
||||||
|
<p><code>headers</code> is the Response’s Headers object, with guard “response”.
|
||||||
|
The <code>url</code> attribute reflects the URL of the corresponding request.</p>
|
||||||
|
<p>Response also has a <code>type</code>, which is “basic”, “cors”, “default”,
|
||||||
|
“error” or
|
||||||
|
<br>“opaque”.</p>
|
||||||
|
<ul>
|
||||||
|
<li>
|
||||||
|
<code>"basic"</code>: normal, same origin response, with all headers exposed
|
||||||
|
except
|
||||||
|
<br>“Set-Cookie” and “Set-Cookie2”.</li>
|
||||||
|
<li>
|
||||||
|
<code>"cors"</code>: response was received from a valid cross-origin request.
|
||||||
|
<a href="https://fetch.spec.whatwg.org/#concept-filtered-response-cors" target="_blank">Certain headers and the body</a> may be accessed.</li>
|
||||||
|
<li>
|
||||||
|
<code>"error"</code>: network error. No useful information describing
|
||||||
|
the error is available. The Response’s status is 0, headers are empty and
|
||||||
|
immutable. This is the type for a Response obtained from <code>Response.error()</code>.</li>
|
||||||
|
<li>
|
||||||
|
<code>"opaque"</code>: response for “no-cors” request to cross-origin
|
||||||
|
resource. <a href="https://fetch.spec.whatwg.org/#concept-filtered-response-opaque" target="_blank">Severely<br>
|
||||||
|
restricted</a>
|
||||||
|
</li>
|
||||||
|
</ul>
|
||||||
|
<p>The “error” type results in the <code>fetch()</code> Promise rejecting with
|
||||||
|
TypeError.</p>
|
||||||
|
<p>There are certain attributes that are useful only in a ServiceWorker scope.
|
||||||
|
The
|
||||||
|
<br>idiomatic way to return a Response to an intercepted request in ServiceWorkers
|
||||||
|
is:</p>
|
||||||
|
<DIV><pre>addEventListener<span>(</span><span>'fetch'</span><span>,</span><span>function</span><span>(</span>event<span>)</span><span>{</span>
|
||||||
|
event.<span>respondWith</span><span>(</span><span>new</span> Response<span>(</span><span>"Response body"</span><span>,</span><span>{</span>
|
||||||
|
headers<span>:</span><span>{</span><span>"Content-Type"</span><span>:</span><span>"text/plain"</span><span>}</span><span>}</span><span>)</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV>
|
||||||
|
<p>As you can see, Response has a two argument constructor, where both arguments
|
||||||
|
are optional. The first argument is a body initializer, and the second
|
||||||
|
is a dictionary to set the <code>status</code>, <code>statusText</code> and <code>headers</code>.</p>
|
||||||
|
<p>The static method <code>Response.error()</code> simply returns an error
|
||||||
|
response. Similarly, <code>Response.redirect(url, status)</code> returns
|
||||||
|
a Response resulting in
|
||||||
|
<br>a redirect to <code>url</code>.</p>
|
||||||
|
<h2>Dealing with bodies</h2>
|
||||||
|
<p>Both Requests and Responses may contain body data. We’ve been glossing
|
||||||
|
over it because of the various data types body may contain, but we will
|
||||||
|
cover it in detail now.</p>
|
||||||
|
<p>A body is an instance of any of the following types.</p>
|
||||||
|
<ul>
|
||||||
|
<li><a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer" target="_blank">ArrayBuffer</a></li>
|
||||||
|
<li>
|
||||||
|
<a href="https://developer.mozilla.org/en-US/docs/Web/API/ArrayBufferView" target="_blank">ArrayBufferView</a> (Uint8Array
|
||||||
|
and friends)</li>
|
||||||
|
<li>
|
||||||
|
<a href="https://developer.mozilla.org/en-US/docs/Web/API/Blob" target="_blank">Blob</a>/
|
||||||
|
<a href="https://developer.mozilla.org/en-US/docs/Web/API/File" target="_blank">File</a>
|
||||||
|
</li>
|
||||||
|
<li>string</li>
|
||||||
|
<li><a href="https://url.spec.whatwg.org/#interface-urlsearchparams" target="_blank">URLSearchParams</a></li>
|
||||||
|
<li>
|
||||||
|
<a href="https://developer.mozilla.org/en-US/docs/Web/API/FormData" target="_blank">FormData</a> –
|
||||||
|
currently not supported by either Gecko or Blink. Firefox expects to ship
|
||||||
|
this in version 39 along with the rest of Fetch.</li>
|
||||||
|
</ul>
|
||||||
|
<p>In addition, Request and Response both offer the following methods to
|
||||||
|
extract their body. These all return a Promise that is eventually resolved
|
||||||
|
with the actual content.</p>
|
||||||
|
<ul>
|
||||||
|
<li><code>arrayBuffer()</code></li>
|
||||||
|
<li><code>blob()</code></li>
|
||||||
|
<li><code>json()</code></li>
|
||||||
|
<li><code>text()</code></li>
|
||||||
|
<li><code>formData()</code></li>
|
||||||
|
</ul>
|
||||||
|
<p>This is a significant improvement over XHR in terms of ease of use of
|
||||||
|
non-text data!</p>
|
||||||
|
<p>Request bodies can be set by passing <code>body</code> parameters:</p>
|
||||||
|
<DIV><pre><span>var</span> form <span>=</span><span>new</span> FormData<span>(</span>document.<span>getElementById</span><span>(</span><span>'login-form'</span><span>)</span><span>)</span><span>;</span>
|
||||||
|
fetch<span>(</span><span>"/login"</span><span>,</span><span>{</span>
|
||||||
|
method<span>:</span><span>"POST"</span><span>,</span>
|
||||||
|
body<span>:</span> form
|
||||||
|
<span>}</span><span>)</span></pre></DIV>
|
||||||
|
<p>Responses take the first argument as the body.</p>
|
||||||
|
<DIV><pre><span>var</span> res <span>=</span><span>new</span> Response<span>(</span><span>new</span> File<span>(</span><span>[</span><span>"chunk"</span><span>,</span><span>"chunk"</span><span>]</span><span>,</span><span>"archive.zip"</span><span>,</span><span>{</span> type<span>:</span><span>"application/zip"</span><span>}</span><span>)</span><span>)</span><span>;</span></pre></DIV>
|
||||||
|
<p>Both Request and Response (and by extension the <code>fetch()</code> function),
|
||||||
|
will try to intelligently <a href="https://fetch.spec.whatwg.org/#concept-bodyinit-extract" target="_blank">determine the content type</a>.
|
||||||
|
Request will also automatically set a “Content-Type” header if none is
|
||||||
|
set in the dictionary.</p>
|
||||||
|
<h3>Streams and cloning</h3>
|
||||||
|
<p>It is important to realise that Request and Response bodies can only be
|
||||||
|
read once! Both interfaces have a boolean attribute <code>bodyUsed</code> to
|
||||||
|
determine if it is safe to read or not.</p>
|
||||||
|
<DIV><pre><span>var</span> res <span>=</span><span>new</span> Response<span>(</span><span>"one time use"</span><span>)</span><span>;</span>
|
||||||
|
console.<span>log</span><span>(</span>res.<span>bodyUsed</span><span>)</span><span>;</span><span>// false</span>
|
||||||
|
res.<span>text</span><span>(</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>v<span>)</span><span>{</span>
|
||||||
|
console.<span>log</span><span>(</span>res.<span>bodyUsed</span><span>)</span><span>;</span><span>// true</span><span>}</span><span>)</span><span>;</span>
|
||||||
|
console.<span>log</span><span>(</span>res.<span>bodyUsed</span><span>)</span><span>;</span><span>// true</span>
|
||||||
|
|
||||||
|
res.<span>text</span><span>(</span><span>)</span>.<span>catch</span><span>(</span><span>function</span><span>(</span>e<span>)</span><span>{</span>
|
||||||
|
console.<span>log</span><span>(</span><span>"Tried to read already consumed Response"</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV>
|
||||||
|
<p>This decision allows easing the transition to an eventual <a href="https://streams.spec.whatwg.org/" target="_blank">stream-based</a> Fetch
|
||||||
|
API. The intention is to let applications consume data as it arrives, allowing
|
||||||
|
for JavaScript to deal with larger files like videos, and perform things
|
||||||
|
like compression and editing on the fly.</p>
|
||||||
|
<p>Often, you’ll want access to the body multiple times. For example, you
|
||||||
|
can use the upcoming <a href="http://slightlyoff.github.io/ServiceWorker/spec/service_worker/index.html#cache-objects" target="_blank">Cache API</a> to
|
||||||
|
store Requests and Responses for offline use, and Cache requires bodies
|
||||||
|
to be available for reading.</p>
|
||||||
|
<p>So how do you read out the body multiple times within such constraints?
|
||||||
|
The API provides a <code>clone()</code> method on the two interfaces. This
|
||||||
|
will return a clone of the object, with a ‘new’ body. <code>clone()</code> MUST
|
||||||
|
be called before the body of the corresponding object has been used. That
|
||||||
|
is, <code>clone()</code> first, read later.</p>
|
||||||
|
<DIV><pre>addEventListener<span>(</span><span>'fetch'</span><span>,</span><span>function</span><span>(</span>evt<span>)</span><span>{</span><span>var</span> sheep <span>=</span><span>new</span> Response<span>(</span><span>"Dolly"</span><span>)</span><span>;</span>
|
||||||
|
console.<span>log</span><span>(</span>sheep.<span>bodyUsed</span><span>)</span><span>;</span><span>// false</span><span>var</span> clone <span>=</span> sheep.<span>clone</span><span>(</span><span>)</span><span>;</span>
|
||||||
|
console.<span>log</span><span>(</span>clone.<span>bodyUsed</span><span>)</span><span>;</span><span>// false</span>
|
||||||
|
|
||||||
|
clone.<span>text</span><span>(</span><span>)</span><span>;</span>
|
||||||
|
console.<span>log</span><span>(</span>sheep.<span>bodyUsed</span><span>)</span><span>;</span><span>// false</span>
|
||||||
|
console.<span>log</span><span>(</span>clone.<span>bodyUsed</span><span>)</span><span>;</span><span>// true</span>
|
||||||
|
|
||||||
|
evt.<span>respondWith</span><span>(</span>cache.<span>add</span><span>(</span>sheep.<span>clone</span><span>(</span><span>)</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>e<span>)</span><span>{</span><span>return</span> sheep<span>;</span><span>}</span><span>)</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV>
|
||||||
|
<h2>Future improvements</h2>
|
||||||
|
<p>Along with the transition to streams, Fetch will eventually have the ability
|
||||||
|
to abort running <code>fetch()</code>es and some way to report the progress
|
||||||
|
of a fetch. These are provided by XHR, but are a little tricky to fit in
|
||||||
|
the Promise-based nature of the Fetch API.</p>
|
||||||
|
<p>You can contribute to the evolution of this API by participating in discussions
|
||||||
|
on the <a href="https://whatwg.org/mailing-list" target="_blank">WHATWG mailing list</a> and
|
||||||
|
in the issues in the <a href="https://www.w3.org/Bugs/Public/buglist.cgi?product=WHATWG&component=Fetch&resolution=---" target="_blank">Fetch</a> and
|
||||||
|
<a href="https://github.com/slightlyoff/ServiceWorker/issues" target="_blank">ServiceWorker</a> specifications.</p>
|
||||||
|
<p>For a better web!</p>
|
||||||
|
<p><em>The author would like to thank Andrea Marchesini, Anne van Kesteren and Ben<br>
|
||||||
|
Kelly for helping with the specification and implementation.</em></p></article></DIV></article>
|
|
@ -27,19 +27,19 @@
|
||||||
<p>The most useful, high-level part of the Fetch API is the <code>fetch()</code> function.
|
<p>The most useful, high-level part of the Fetch API is the <code>fetch()</code> function.
|
||||||
In its simplest form it takes a URL and returns a promise that resolves
|
In its simplest form it takes a URL and returns a promise that resolves
|
||||||
to the response. The response is captured as a <code>Response</code> object.</p>
|
to the response. The response is captured as a <code>Response</code> object.</p>
|
||||||
<div><DIV><pre>fetch<span>(</span><span>"/data.json"</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>res<span>)</span><span>{</span><span>// res instanceof Response == true.</span><span>if</span><span>(</span>res.<span>ok</span><span>)</span><span>{</span>
|
<DIV><pre>fetch<span>(</span><span>"/data.json"</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>res<span>)</span><span>{</span><span>// res instanceof Response == true.</span><span>if</span><span>(</span>res.<span>ok</span><span>)</span><span>{</span>
|
||||||
res.<span>json</span><span>(</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>data<span>)</span><span>{</span>
|
res.<span>json</span><span>(</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>data<span>)</span><span>{</span>
|
||||||
console.<span>log</span><span>(</span>data.<span>entries</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span><span>}</span><span>else</span><span>{</span>
|
console.<span>log</span><span>(</span>data.<span>entries</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span><span>}</span><span>else</span><span>{</span>
|
||||||
console.<span>log</span><span>(</span><span>"Looks like the response wasn't perfect, got status"</span><span>,</span> res.<span>status</span><span>)</span><span>;</span><span>}</span><span>}</span><span>,</span><span>function</span><span>(</span>e<span>)</span><span>{</span>
|
console.<span>log</span><span>(</span><span>"Looks like the response wasn't perfect, got status"</span><span>,</span> res.<span>status</span><span>)</span><span>;</span><span>}</span><span>}</span><span>,</span><span>function</span><span>(</span>e<span>)</span><span>{</span>
|
||||||
console.<span>log</span><span>(</span><span>"Fetch failed!"</span><span>,</span> e<span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV></div>
|
console.<span>log</span><span>(</span><span>"Fetch failed!"</span><span>,</span> e<span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV>
|
||||||
<p>Submitting some parameters, it would look like this:</p>
|
<p>Submitting some parameters, it would look like this:</p>
|
||||||
<div><DIV><pre>fetch<span>(</span><span>"http://www.example.org/submit.php"</span><span>,</span><span>{</span>
|
<DIV><pre>fetch<span>(</span><span>"http://www.example.org/submit.php"</span><span>,</span><span>{</span>
|
||||||
method<span>:</span><span>"POST"</span><span>,</span>
|
method<span>:</span><span>"POST"</span><span>,</span>
|
||||||
headers<span>:</span><span>{</span><span>"Content-Type"</span><span>:</span><span>"application/x-www-form-urlencoded"</span><span>}</span><span>,</span>
|
headers<span>:</span><span>{</span><span>"Content-Type"</span><span>:</span><span>"application/x-www-form-urlencoded"</span><span>}</span><span>,</span>
|
||||||
body<span>:</span><span>"firstName=Nikhil&favColor=blue&password=easytoguess"</span><span>}</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>res<span>)</span><span>{</span><span>if</span><span>(</span>res.<span>ok</span><span>)</span><span>{</span>
|
body<span>:</span><span>"firstName=Nikhil&favColor=blue&password=easytoguess"</span><span>}</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>res<span>)</span><span>{</span><span>if</span><span>(</span>res.<span>ok</span><span>)</span><span>{</span>
|
||||||
alert<span>(</span><span>"Perfect! Your settings are saved."</span><span>)</span><span>;</span><span>}</span><span>else</span><span>if</span><span>(</span>res.<span>status</span><span>==</span><span>401</span><span>)</span><span>{</span>
|
alert<span>(</span><span>"Perfect! Your settings are saved."</span><span>)</span><span>;</span><span>}</span><span>else</span><span>if</span><span>(</span>res.<span>status</span><span>==</span><span>401</span><span>)</span><span>{</span>
|
||||||
alert<span>(</span><span>"Oops! You are not authorized."</span><span>)</span><span>;</span><span>}</span><span>}</span><span>,</span><span>function</span><span>(</span>e<span>)</span><span>{</span>
|
alert<span>(</span><span>"Oops! You are not authorized."</span><span>)</span><span>;</span><span>}</span><span>}</span><span>,</span><span>function</span><span>(</span>e<span>)</span><span>{</span>
|
||||||
alert<span>(</span><span>"Error submitting form!"</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV></div>
|
alert<span>(</span><span>"Error submitting form!"</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV>
|
||||||
<p>The <code>fetch()</code> function’s arguments are the same as those passed
|
<p>The <code>fetch()</code> function’s arguments are the same as those passed
|
||||||
to the
|
to the
|
||||||
<br><code>Request()</code> constructor, so you may directly pass arbitrarily
|
<br><code>Request()</code> constructor, so you may directly pass arbitrarily
|
||||||
|
@ -53,16 +53,16 @@
|
||||||
<br>supporting CORS rules and ensuring cookies aren’t readable by third parties.</p>
|
<br>supporting CORS rules and ensuring cookies aren’t readable by third parties.</p>
|
||||||
<p>The <a href="https://fetch.spec.whatwg.org/#headers-class" target="_blank">Headers interface</a> is
|
<p>The <a href="https://fetch.spec.whatwg.org/#headers-class" target="_blank">Headers interface</a> is
|
||||||
a simple multi-map of names to values:</p>
|
a simple multi-map of names to values:</p>
|
||||||
<div><DIV><pre><span>var</span> content <span>=</span><span>"Hello World"</span><span>;</span><span>var</span> reqHeaders <span>=</span><span>new</span> Headers<span>(</span><span>)</span><span>;</span>
|
<DIV><pre><span>var</span> content <span>=</span><span>"Hello World"</span><span>;</span><span>var</span> reqHeaders <span>=</span><span>new</span> Headers<span>(</span><span>)</span><span>;</span>
|
||||||
reqHeaders.<span>append</span><span>(</span><span>"Content-Type"</span><span>,</span><span>"text/plain"</span>
|
reqHeaders.<span>append</span><span>(</span><span>"Content-Type"</span><span>,</span><span>"text/plain"</span>
|
||||||
reqHeaders.<span>append</span><span>(</span><span>"Content-Length"</span><span>,</span> content.<span>length</span>.<span>toString</span><span>(</span><span>)</span><span>)</span><span>;</span>
|
reqHeaders.<span>append</span><span>(</span><span>"Content-Length"</span><span>,</span> content.<span>length</span>.<span>toString</span><span>(</span><span>)</span><span>)</span><span>;</span>
|
||||||
reqHeaders.<span>append</span><span>(</span><span>"X-Custom-Header"</span><span>,</span><span>"ProcessThisImmediately"</span><span>)</span><span>;</span></pre></DIV></div>
|
reqHeaders.<span>append</span><span>(</span><span>"X-Custom-Header"</span><span>,</span><span>"ProcessThisImmediately"</span><span>)</span><span>;</span></pre></DIV>
|
||||||
<p>The same can be achieved by passing an array of arrays or a JS object
|
<p>The same can be achieved by passing an array of arrays or a JS object
|
||||||
literal
|
literal
|
||||||
<br>to the constructor:</p>
|
<br>to the constructor:</p>
|
||||||
<div><DIV><pre>reqHeaders <span>=</span><span>new</span> Headers<span>(</span><span>{</span><span>"Content-Type"</span><span>:</span><span>"text/plain"</span><span>,</span><span>"Content-Length"</span><span>:</span> content.<span>length</span>.<span>toString</span><span>(</span><span>)</span><span>,</span><span>"X-Custom-Header"</span><span>:</span><span>"ProcessThisImmediately"</span><span>,</span><span>}</span><span>)</span><span>;</span></pre></DIV></div>
|
<DIV><pre>reqHeaders <span>=</span><span>new</span> Headers<span>(</span><span>{</span><span>"Content-Type"</span><span>:</span><span>"text/plain"</span><span>,</span><span>"Content-Length"</span><span>:</span> content.<span>length</span>.<span>toString</span><span>(</span><span>)</span><span>,</span><span>"X-Custom-Header"</span><span>:</span><span>"ProcessThisImmediately"</span><span>,</span><span>}</span><span>)</span><span>;</span></pre></DIV>
|
||||||
<p>The contents can be queried and retrieved:</p>
|
<p>The contents can be queried and retrieved:</p>
|
||||||
<div><DIV><pre>console.<span>log</span><span>(</span>reqHeaders.<span>has</span><span>(</span><span>"Content-Type"</span><span>)</span><span>)</span><span>;</span><span>// true</span>
|
<DIV><pre>console.<span>log</span><span>(</span>reqHeaders.<span>has</span><span>(</span><span>"Content-Type"</span><span>)</span><span>)</span><span>;</span><span>// true</span>
|
||||||
console.<span>log</span><span>(</span>reqHeaders.<span>has</span><span>(</span><span>"Set-Cookie"</span><span>)</span><span>)</span><span>;</span><span>// false</span>
|
console.<span>log</span><span>(</span>reqHeaders.<span>has</span><span>(</span><span>"Set-Cookie"</span><span>)</span><span>)</span><span>;</span><span>// false</span>
|
||||||
reqHeaders.<span>set</span><span>(</span><span>"Content-Type"</span><span>,</span><span>"text/html"</span><span>)</span><span>;</span>
|
reqHeaders.<span>set</span><span>(</span><span>"Content-Type"</span><span>,</span><span>"text/html"</span><span>)</span><span>;</span>
|
||||||
reqHeaders.<span>append</span><span>(</span><span>"X-Custom-Header"</span><span>,</span><span>"AnotherValue"</span><span>)</span><span>;</span>
|
reqHeaders.<span>append</span><span>(</span><span>"X-Custom-Header"</span><span>,</span><span>"AnotherValue"</span><span>)</span><span>;</span>
|
||||||
|
@ -71,7 +71,7 @@ console.<span>log</span><span>(</span>reqHeaders.<span>get</span><span>(</span><
|
||||||
console.<span>log</span><span>(</span>reqHeaders.<span>getAll</span><span>(</span><span>"X-Custom-Header"</span><span>)</span><span>)</span><span>;</span><span>// ["ProcessThisImmediately", "AnotherValue"]</span>
|
console.<span>log</span><span>(</span>reqHeaders.<span>getAll</span><span>(</span><span>"X-Custom-Header"</span><span>)</span><span>)</span><span>;</span><span>// ["ProcessThisImmediately", "AnotherValue"]</span>
|
||||||
|
|
||||||
reqHeaders.<span>delete</span><span>(</span><span>"X-Custom-Header"</span><span>)</span><span>;</span>
|
reqHeaders.<span>delete</span><span>(</span><span>"X-Custom-Header"</span><span>)</span><span>;</span>
|
||||||
console.<span>log</span><span>(</span>reqHeaders.<span>getAll</span><span>(</span><span>"X-Custom-Header"</span><span>)</span><span>)</span><span>;</span><span>// []</span></pre></DIV></div>
|
console.<span>log</span><span>(</span>reqHeaders.<span>getAll</span><span>(</span><span>"X-Custom-Header"</span><span>)</span><span>)</span><span>;</span><span>// []</span></pre></DIV>
|
||||||
<p>Some of these operations are only useful in ServiceWorkers, but they provide
|
<p>Some of these operations are only useful in ServiceWorkers, but they provide
|
||||||
<br>a much nicer API to Headers.</p>
|
<br>a much nicer API to Headers.</p>
|
||||||
<p>Since Headers can be sent in requests, or received in responses, and have
|
<p>Since Headers can be sent in requests, or received in responses, and have
|
||||||
|
@ -98,34 +98,34 @@ console.<span>log</span><span>(</span>reqHeaders.<span>getAll</span><span>(</spa
|
||||||
<p>All of the Headers methods throw TypeError if <code>name</code> is not a
|
<p>All of the Headers methods throw TypeError if <code>name</code> is not a
|
||||||
<a href="https://fetch.spec.whatwg.org/#concept-header-name" target="_blank">valid HTTP Header name</a>. The mutation operations will throw TypeError
|
<a href="https://fetch.spec.whatwg.org/#concept-header-name" target="_blank">valid HTTP Header name</a>. The mutation operations will throw TypeError
|
||||||
if there is an immutable guard. Otherwise they fail silently. For example:</p>
|
if there is an immutable guard. Otherwise they fail silently. For example:</p>
|
||||||
<div><DIV><pre><span>var</span> res <span>=</span> Response.<span>error</span><span>(</span><span>)</span><span>;</span><span>try</span><span>{</span>
|
<DIV><pre><span>var</span> res <span>=</span> Response.<span>error</span><span>(</span><span>)</span><span>;</span><span>try</span><span>{</span>
|
||||||
res.<span>headers</span>.<span>set</span><span>(</span><span>"Origin"</span><span>,</span><span>"http://mybank.com"</span><span>)</span><span>;</span><span>}</span><span>catch</span><span>(</span>e<span>)</span><span>{</span>
|
res.<span>headers</span>.<span>set</span><span>(</span><span>"Origin"</span><span>,</span><span>"http://mybank.com"</span><span>)</span><span>;</span><span>}</span><span>catch</span><span>(</span>e<span>)</span><span>{</span>
|
||||||
console.<span>log</span><span>(</span><span>"Cannot pretend to be a bank!"</span><span>)</span><span>;</span><span>}</span></pre></DIV></div>
|
console.<span>log</span><span>(</span><span>"Cannot pretend to be a bank!"</span><span>)</span><span>;</span><span>}</span></pre></DIV>
|
||||||
<h2>Request</h2>
|
<h2>Request</h2>
|
||||||
<p>The Request interface defines a request to fetch a resource over HTTP.
|
<p>The Request interface defines a request to fetch a resource over HTTP.
|
||||||
URL, method and headers are expected, but the Request also allows specifying
|
URL, method and headers are expected, but the Request also allows specifying
|
||||||
a body, a request mode, credentials and cache hints.</p>
|
a body, a request mode, credentials and cache hints.</p>
|
||||||
<p>The simplest Request is of course, just a URL, as you may do to GET a
|
<p>The simplest Request is of course, just a URL, as you may do to GET a
|
||||||
resource.</p>
|
resource.</p>
|
||||||
<div><DIV><pre><span>var</span> req <span>=</span><span>new</span> Request<span>(</span><span>"/index.html"</span><span>)</span><span>;</span>
|
<DIV><pre><span>var</span> req <span>=</span><span>new</span> Request<span>(</span><span>"/index.html"</span><span>)</span><span>;</span>
|
||||||
console.<span>log</span><span>(</span>req.<span>method</span><span>)</span><span>;</span><span>// "GET"</span>
|
console.<span>log</span><span>(</span>req.<span>method</span><span>)</span><span>;</span><span>// "GET"</span>
|
||||||
console.<span>log</span><span>(</span>req.<span>url</span><span>)</span><span>;</span><span>// "http://example.com/index.html"</span></pre></DIV></div>
|
console.<span>log</span><span>(</span>req.<span>url</span><span>)</span><span>;</span><span>// "http://example.com/index.html"</span></pre></DIV>
|
||||||
<p>You may also pass a Request to the <code>Request()</code> constructor to
|
<p>You may also pass a Request to the <code>Request()</code> constructor to
|
||||||
create a copy.
|
create a copy.
|
||||||
<br>(This is not the same as calling the <code>clone()</code> method, which
|
<br>(This is not the same as calling the <code>clone()</code> method, which
|
||||||
is covered in
|
is covered in
|
||||||
<br>the “Reading bodies” section.).</p>
|
<br>the “Reading bodies” section.).</p>
|
||||||
<div><DIV><pre><span>var</span> copy <span>=</span><span>new</span> Request<span>(</span>req<span>)</span><span>;</span>
|
<DIV><pre><span>var</span> copy <span>=</span><span>new</span> Request<span>(</span>req<span>)</span><span>;</span>
|
||||||
console.<span>log</span><span>(</span>copy.<span>method</span><span>)</span><span>;</span><span>// "GET"</span>
|
console.<span>log</span><span>(</span>copy.<span>method</span><span>)</span><span>;</span><span>// "GET"</span>
|
||||||
console.<span>log</span><span>(</span>copy.<span>url</span><span>)</span><span>;</span><span>// "http://example.com/index.html"</span></pre></DIV></div>
|
console.<span>log</span><span>(</span>copy.<span>url</span><span>)</span><span>;</span><span>// "http://example.com/index.html"</span></pre></DIV>
|
||||||
<p>Again, this form is probably only useful in ServiceWorkers.</p>
|
<p>Again, this form is probably only useful in ServiceWorkers.</p>
|
||||||
<p>The non-URL attributes of the <code>Request</code> can only be set by passing
|
<p>The non-URL attributes of the <code>Request</code> can only be set by passing
|
||||||
initial
|
initial
|
||||||
<br>values as a second argument to the constructor. This argument is a dictionary.</p>
|
<br>values as a second argument to the constructor. This argument is a dictionary.</p>
|
||||||
<div><DIV><pre><span>var</span> uploadReq <span>=</span><span>new</span> Request<span>(</span><span>"/uploadImage"</span><span>,</span><span>{</span>
|
<DIV><pre><span>var</span> uploadReq <span>=</span><span>new</span> Request<span>(</span><span>"/uploadImage"</span><span>,</span><span>{</span>
|
||||||
method<span>:</span><span>"POST"</span><span>,</span>
|
method<span>:</span><span>"POST"</span><span>,</span>
|
||||||
headers<span>:</span><span>{</span><span>"Content-Type"</span><span>:</span><span>"image/png"</span><span>,</span><span>}</span><span>,</span>
|
headers<span>:</span><span>{</span><span>"Content-Type"</span><span>:</span><span>"image/png"</span><span>,</span><span>}</span><span>,</span>
|
||||||
body<span>:</span><span>"image data"</span><span>}</span><span>)</span><span>;</span></pre></DIV></div>
|
body<span>:</span><span>"image data"</span><span>}</span><span>)</span><span>;</span></pre></DIV>
|
||||||
<p>The Request’s mode is used to determine if cross-origin requests lead
|
<p>The Request’s mode is used to determine if cross-origin requests lead
|
||||||
to valid responses, and which properties on the response are readable.
|
to valid responses, and which properties on the response are readable.
|
||||||
Legal mode values are <code>"same-origin"</code>, <code>"no-cors"</code> (default)
|
Legal mode values are <code>"same-origin"</code>, <code>"no-cors"</code> (default)
|
||||||
|
@ -134,10 +134,10 @@ console.<span>log</span><span>(</span>copy.<span>url</span><span>)</span><span>;
|
||||||
origin with this mode set, the result is simply an error. You could use
|
origin with this mode set, the result is simply an error. You could use
|
||||||
this to ensure that
|
this to ensure that
|
||||||
<br>a request is always being made to your origin.</p>
|
<br>a request is always being made to your origin.</p>
|
||||||
<div><DIV><pre><span>var</span> arbitraryUrl <span>=</span> document.<span>getElementById</span><span>(</span><span>"url-input"</span><span>)</span>.<span>value</span><span>;</span>
|
<DIV><pre><span>var</span> arbitraryUrl <span>=</span> document.<span>getElementById</span><span>(</span><span>"url-input"</span><span>)</span>.<span>value</span><span>;</span>
|
||||||
fetch<span>(</span>arbitraryUrl<span>,</span><span>{</span> mode<span>:</span><span>"same-origin"</span><span>}</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>res<span>)</span><span>{</span>
|
fetch<span>(</span>arbitraryUrl<span>,</span><span>{</span> mode<span>:</span><span>"same-origin"</span><span>}</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>res<span>)</span><span>{</span>
|
||||||
console.<span>log</span><span>(</span><span>"Response succeeded?"</span><span>,</span> res.<span>ok</span><span>)</span><span>;</span><span>}</span><span>,</span><span>function</span><span>(</span>e<span>)</span><span>{</span>
|
console.<span>log</span><span>(</span><span>"Response succeeded?"</span><span>,</span> res.<span>ok</span><span>)</span><span>;</span><span>}</span><span>,</span><span>function</span><span>(</span>e<span>)</span><span>{</span>
|
||||||
console.<span>log</span><span>(</span><span>"Please enter a same-origin URL!"</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV></div>
|
console.<span>log</span><span>(</span><span>"Please enter a same-origin URL!"</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV>
|
||||||
<p>The <code>"no-cors"</code> mode captures what the web platform does by default
|
<p>The <code>"no-cors"</code> mode captures what the web platform does by default
|
||||||
for scripts you import from CDNs, images hosted on other domains, and so
|
for scripts you import from CDNs, images hosted on other domains, and so
|
||||||
on. First, it prevents the method from being anything other than “HEAD”,
|
on. First, it prevents the method from being anything other than “HEAD”,
|
||||||
|
@ -155,7 +155,7 @@ fetch<span>(</span>arbitraryUrl<span>,</span><span>{</span> mode<span>:</span><s
|
||||||
headers is exposed in the Response, but the body is readable. For example,
|
headers is exposed in the Response, but the body is readable. For example,
|
||||||
you could get a list of Flickr’s <a href="https://www.flickr.com/services/api/flickr.interestingness.getList.html" target="_blank">most interesting</a> photos
|
you could get a list of Flickr’s <a href="https://www.flickr.com/services/api/flickr.interestingness.getList.html" target="_blank">most interesting</a> photos
|
||||||
today like this:</p>
|
today like this:</p>
|
||||||
<div><DIV><pre><span>var</span> u <span>=</span><span>new</span> URLSearchParams<span>(</span><span>)</span><span>;</span>
|
<DIV><pre><span>var</span> u <span>=</span><span>new</span> URLSearchParams<span>(</span><span>)</span><span>;</span>
|
||||||
u.<span>append</span><span>(</span><span>'method'</span><span>,</span><span>'flickr.interestingness.getList'</span><span>)</span><span>;</span>
|
u.<span>append</span><span>(</span><span>'method'</span><span>,</span><span>'flickr.interestingness.getList'</span><span>)</span><span>;</span>
|
||||||
u.<span>append</span><span>(</span><span>'api_key'</span><span>,</span><span>'<insert api key here>'</span><span>)</span><span>;</span>
|
u.<span>append</span><span>(</span><span>'api_key'</span><span>,</span><span>'<insert api key here>'</span><span>)</span><span>;</span>
|
||||||
u.<span>append</span><span>(</span><span>'format'</span><span>,</span><span>'json'</span><span>)</span><span>;</span>
|
u.<span>append</span><span>(</span><span>'format'</span><span>,</span><span>'json'</span><span>)</span><span>;</span>
|
||||||
|
@ -163,11 +163,11 @@ u.<span>append</span><span>(</span><span>'nojsoncallback'</span><span>,</span><s
|
||||||
|
|
||||||
apiCall.<span>then</span><span>(</span><span>function</span><span>(</span>response<span>)</span><span>{</span><span>return</span> response.<span>json</span><span>(</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>json<span>)</span><span>{</span><span>// photo is a list of photos.</span><span>return</span> json.<span>photos</span>.<span>photo</span><span>;</span><span>}</span><span>)</span><span>;</span><span>}</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>photos<span>)</span><span>{</span>
|
apiCall.<span>then</span><span>(</span><span>function</span><span>(</span>response<span>)</span><span>{</span><span>return</span> response.<span>json</span><span>(</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>json<span>)</span><span>{</span><span>// photo is a list of photos.</span><span>return</span> json.<span>photos</span>.<span>photo</span><span>;</span><span>}</span><span>)</span><span>;</span><span>}</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>photos<span>)</span><span>{</span>
|
||||||
photos.<span>forEach</span><span>(</span><span>function</span><span>(</span>photo<span>)</span><span>{</span>
|
photos.<span>forEach</span><span>(</span><span>function</span><span>(</span>photo<span>)</span><span>{</span>
|
||||||
console.<span>log</span><span>(</span>photo.<span>title</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV></div>
|
console.<span>log</span><span>(</span>photo.<span>title</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV>
|
||||||
<p>You may not read out the “Date” header since Flickr does not allow it
|
<p>You may not read out the “Date” header since Flickr does not allow it
|
||||||
via
|
via
|
||||||
<br><code>Access-Control-Expose-Headers</code>.</p>
|
<br><code>Access-Control-Expose-Headers</code>.</p>
|
||||||
<div><DIV><pre>response.<span>headers</span>.<span>get</span><span>(</span><span>"Date"</span><span>)</span><span>;</span><span>// null</span></pre></DIV></div>
|
<DIV><pre>response.<span>headers</span>.<span>get</span><span>(</span><span>"Date"</span><span>)</span><span>;</span><span>// null</span></pre></DIV>
|
||||||
<p>The <code>credentials</code> enumeration determines if cookies for the other
|
<p>The <code>credentials</code> enumeration determines if cookies for the other
|
||||||
domain are
|
domain are
|
||||||
<br>sent to cross-origin requests. This is similar to XHR’s <code>withCredentials</code><br>flag, but tri-valued as <code>"omit"</code> (default), <code>"same-origin"</code> and <code>"include"</code>.</p>
|
<br>sent to cross-origin requests. This is similar to XHR’s <code>withCredentials</code><br>flag, but tri-valued as <code>"omit"</code> (default), <code>"same-origin"</code> and <code>"include"</code>.</p>
|
||||||
|
@ -222,9 +222,9 @@ apiCall.<span>then</span><span>(</span><span>function</span><span>(</span>respon
|
||||||
The
|
The
|
||||||
<br>idiomatic way to return a Response to an intercepted request in ServiceWorkers
|
<br>idiomatic way to return a Response to an intercepted request in ServiceWorkers
|
||||||
is:</p>
|
is:</p>
|
||||||
<div><DIV><pre>addEventListener<span>(</span><span>'fetch'</span><span>,</span><span>function</span><span>(</span>event<span>)</span><span>{</span>
|
<DIV><pre>addEventListener<span>(</span><span>'fetch'</span><span>,</span><span>function</span><span>(</span>event<span>)</span><span>{</span>
|
||||||
event.<span>respondWith</span><span>(</span><span>new</span> Response<span>(</span><span>"Response body"</span><span>,</span><span>{</span>
|
event.<span>respondWith</span><span>(</span><span>new</span> Response<span>(</span><span>"Response body"</span><span>,</span><span>{</span>
|
||||||
headers<span>:</span><span>{</span><span>"Content-Type"</span><span>:</span><span>"text/plain"</span><span>}</span><span>}</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV></div>
|
headers<span>:</span><span>{</span><span>"Content-Type"</span><span>:</span><span>"text/plain"</span><span>}</span><span>}</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV>
|
||||||
<p>As you can see, Response has a two argument constructor, where both arguments
|
<p>As you can see, Response has a two argument constructor, where both arguments
|
||||||
are optional. The first argument is a body initializer, and the second
|
are optional. The first argument is a body initializer, and the second
|
||||||
is a dictionary to set the <code>status</code>, <code>statusText</code> and <code>headers</code>.</p>
|
is a dictionary to set the <code>status</code>, <code>statusText</code> and <code>headers</code>.</p>
|
||||||
|
@ -266,13 +266,13 @@ apiCall.<span>then</span><span>(</span><span>function</span><span>(</span>respon
|
||||||
<p>This is a significant improvement over XHR in terms of ease of use of
|
<p>This is a significant improvement over XHR in terms of ease of use of
|
||||||
non-text data!</p>
|
non-text data!</p>
|
||||||
<p>Request bodies can be set by passing <code>body</code> parameters:</p>
|
<p>Request bodies can be set by passing <code>body</code> parameters:</p>
|
||||||
<div><DIV><pre><span>var</span> form <span>=</span><span>new</span> FormData<span>(</span>document.<span>getElementById</span><span>(</span><span>'login-form'</span><span>)</span><span>)</span><span>;</span>
|
<DIV><pre><span>var</span> form <span>=</span><span>new</span> FormData<span>(</span>document.<span>getElementById</span><span>(</span><span>'login-form'</span><span>)</span><span>)</span><span>;</span>
|
||||||
fetch<span>(</span><span>"/login"</span><span>,</span><span>{</span>
|
fetch<span>(</span><span>"/login"</span><span>,</span><span>{</span>
|
||||||
method<span>:</span><span>"POST"</span><span>,</span>
|
method<span>:</span><span>"POST"</span><span>,</span>
|
||||||
body<span>:</span> form
|
body<span>:</span> form
|
||||||
<span>}</span><span>)</span></pre></DIV></div>
|
<span>}</span><span>)</span></pre></DIV>
|
||||||
<p>Responses take the first argument as the body.</p>
|
<p>Responses take the first argument as the body.</p>
|
||||||
<div><DIV><pre><span>var</span> res <span>=</span><span>new</span> Response<span>(</span><span>new</span> File<span>(</span><span>[</span><span>"chunk"</span><span>,</span><span>"chunk"</span><span>]</span><span>,</span><span>"archive.zip"</span><span>,</span><span>{</span> type<span>:</span><span>"application/zip"</span><span>}</span><span>)</span><span>)</span><span>;</span></pre></DIV></div>
|
<DIV><pre><span>var</span> res <span>=</span><span>new</span> Response<span>(</span><span>new</span> File<span>(</span><span>[</span><span>"chunk"</span><span>,</span><span>"chunk"</span><span>]</span><span>,</span><span>"archive.zip"</span><span>,</span><span>{</span> type<span>:</span><span>"application/zip"</span><span>}</span><span>)</span><span>)</span><span>;</span></pre></DIV>
|
||||||
<p>Both Request and Response (and by extension the <code>fetch()</code> function),
|
<p>Both Request and Response (and by extension the <code>fetch()</code> function),
|
||||||
will try to intelligently <a href="https://fetch.spec.whatwg.org/#concept-bodyinit-extract" target="_blank">determine the content type</a>.
|
will try to intelligently <a href="https://fetch.spec.whatwg.org/#concept-bodyinit-extract" target="_blank">determine the content type</a>.
|
||||||
Request will also automatically set a “Content-Type” header if none is
|
Request will also automatically set a “Content-Type” header if none is
|
||||||
|
@ -281,14 +281,14 @@ fetch<span>(</span><span>"/login"</span><span>,</span><span>{</span>
|
||||||
<p>It is important to realise that Request and Response bodies can only be
|
<p>It is important to realise that Request and Response bodies can only be
|
||||||
read once! Both interfaces have a boolean attribute <code>bodyUsed</code> to
|
read once! Both interfaces have a boolean attribute <code>bodyUsed</code> to
|
||||||
determine if it is safe to read or not.</p>
|
determine if it is safe to read or not.</p>
|
||||||
<div><DIV><pre><span>var</span> res <span>=</span><span>new</span> Response<span>(</span><span>"one time use"</span><span>)</span><span>;</span>
|
<DIV><pre><span>var</span> res <span>=</span><span>new</span> Response<span>(</span><span>"one time use"</span><span>)</span><span>;</span>
|
||||||
console.<span>log</span><span>(</span>res.<span>bodyUsed</span><span>)</span><span>;</span><span>// false</span>
|
console.<span>log</span><span>(</span>res.<span>bodyUsed</span><span>)</span><span>;</span><span>// false</span>
|
||||||
res.<span>text</span><span>(</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>v<span>)</span><span>{</span>
|
res.<span>text</span><span>(</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>v<span>)</span><span>{</span>
|
||||||
console.<span>log</span><span>(</span>res.<span>bodyUsed</span><span>)</span><span>;</span><span>// true</span><span>}</span><span>)</span><span>;</span>
|
console.<span>log</span><span>(</span>res.<span>bodyUsed</span><span>)</span><span>;</span><span>// true</span><span>}</span><span>)</span><span>;</span>
|
||||||
console.<span>log</span><span>(</span>res.<span>bodyUsed</span><span>)</span><span>;</span><span>// true</span>
|
console.<span>log</span><span>(</span>res.<span>bodyUsed</span><span>)</span><span>;</span><span>// true</span>
|
||||||
|
|
||||||
res.<span>text</span><span>(</span><span>)</span>.<span>catch</span><span>(</span><span>function</span><span>(</span>e<span>)</span><span>{</span>
|
res.<span>text</span><span>(</span><span>)</span>.<span>catch</span><span>(</span><span>function</span><span>(</span>e<span>)</span><span>{</span>
|
||||||
console.<span>log</span><span>(</span><span>"Tried to read already consumed Response"</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV></div>
|
console.<span>log</span><span>(</span><span>"Tried to read already consumed Response"</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV>
|
||||||
<p>This decision allows easing the transition to an eventual <a href="https://streams.spec.whatwg.org/" target="_blank">stream-based</a> Fetch
|
<p>This decision allows easing the transition to an eventual <a href="https://streams.spec.whatwg.org/" target="_blank">stream-based</a> Fetch
|
||||||
API. The intention is to let applications consume data as it arrives, allowing
|
API. The intention is to let applications consume data as it arrives, allowing
|
||||||
for JavaScript to deal with larger files like videos, and perform things
|
for JavaScript to deal with larger files like videos, and perform things
|
||||||
|
@ -302,7 +302,7 @@ res.<span>text</span><span>(</span><span>)</span>.<span>catch</span><span>(</spa
|
||||||
will return a clone of the object, with a ‘new’ body. <code>clone()</code> MUST
|
will return a clone of the object, with a ‘new’ body. <code>clone()</code> MUST
|
||||||
be called before the body of the corresponding object has been used. That
|
be called before the body of the corresponding object has been used. That
|
||||||
is, <code>clone()</code> first, read later.</p>
|
is, <code>clone()</code> first, read later.</p>
|
||||||
<div><DIV><pre>addEventListener<span>(</span><span>'fetch'</span><span>,</span><span>function</span><span>(</span>evt<span>)</span><span>{</span><span>var</span> sheep <span>=</span><span>new</span> Response<span>(</span><span>"Dolly"</span><span>)</span><span>;</span>
|
<DIV><pre>addEventListener<span>(</span><span>'fetch'</span><span>,</span><span>function</span><span>(</span>evt<span>)</span><span>{</span><span>var</span> sheep <span>=</span><span>new</span> Response<span>(</span><span>"Dolly"</span><span>)</span><span>;</span>
|
||||||
console.<span>log</span><span>(</span>sheep.<span>bodyUsed</span><span>)</span><span>;</span><span>// false</span><span>var</span> clone <span>=</span> sheep.<span>clone</span><span>(</span><span>)</span><span>;</span>
|
console.<span>log</span><span>(</span>sheep.<span>bodyUsed</span><span>)</span><span>;</span><span>// false</span><span>var</span> clone <span>=</span> sheep.<span>clone</span><span>(</span><span>)</span><span>;</span>
|
||||||
console.<span>log</span><span>(</span>clone.<span>bodyUsed</span><span>)</span><span>;</span><span>// false</span>
|
console.<span>log</span><span>(</span>clone.<span>bodyUsed</span><span>)</span><span>;</span><span>// false</span>
|
||||||
|
|
||||||
|
@ -310,7 +310,7 @@ res.<span>text</span><span>(</span><span>)</span>.<span>catch</span><span>(</spa
|
||||||
console.<span>log</span><span>(</span>sheep.<span>bodyUsed</span><span>)</span><span>;</span><span>// false</span>
|
console.<span>log</span><span>(</span>sheep.<span>bodyUsed</span><span>)</span><span>;</span><span>// false</span>
|
||||||
console.<span>log</span><span>(</span>clone.<span>bodyUsed</span><span>)</span><span>;</span><span>// true</span>
|
console.<span>log</span><span>(</span>clone.<span>bodyUsed</span><span>)</span><span>;</span><span>// true</span>
|
||||||
|
|
||||||
evt.<span>respondWith</span><span>(</span>cache.<span>add</span><span>(</span>sheep.<span>clone</span><span>(</span><span>)</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>e<span>)</span><span>{</span><span>return</span> sheep<span>;</span><span>}</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV></div>
|
evt.<span>respondWith</span><span>(</span>cache.<span>add</span><span>(</span>sheep.<span>clone</span><span>(</span><span>)</span><span>)</span>.<span>then</span><span>(</span><span>function</span><span>(</span>e<span>)</span><span>{</span><span>return</span> sheep<span>;</span><span>}</span><span>)</span><span>;</span><span>}</span><span>)</span><span>;</span></pre></DIV>
|
||||||
<h2>Future improvements</h2>
|
<h2>Future improvements</h2>
|
||||||
<p>Along with the transition to streams, Fetch will eventually have the ability
|
<p>Along with the transition to streams, Fetch will eventually have the ability
|
||||||
to abort running <code>fetch()</code>es and some way to report the progress
|
to abort running <code>fetch()</code>es and some way to report the progress
|
||||||
|
|
|
@ -4,14 +4,14 @@
|
||||||
<h2>
|
<h2>
|
||||||
So what's a GreenPak?</h2>
|
So what's a GreenPak?</h2>
|
||||||
<br><p> Silego Technology is a fabless semiconductor company located in the SF Bay area, which makes (among other things) a line of programmable logic devices known as GreenPak. Their <a href="http://www.silego.com/products/greenpak5.html" target="_blank">5th generation parts</a> were just announced, but I started this project before that happened so I'm still targeting the <a href="http://www.silego.com/products/greenpak4.html" target="_blank">4th generation</a>.<br><br> GreenPak devices are kind of like itty bitty <a href="http://www.cypress.com/products/32-bit-arm-cortex-m-psoc" target="_blank">PSoCs</a> - they have a mixed signal fabric with an ADC, DACs, comparators, voltage references, plus a digital LUT/FF fabric and some typical digital MCU peripherals like counters and oscillators (but no CPU).<br><br> It's actually an interesting architecture - FPGAs (including some devices marketed as CPLDs) are a 2D array of LUTs connected via wires to adjacent cells, and true (product term) CPLDs are a star topology of AND-OR arrays connected by a crossbar. GreenPak, on the other hand, is a star topology of LUTs, flipflops, and analog/digital hard IP connected to a crossbar.<br><br> Without further ado, here's a block diagram showing all the cool stuff you get in the SLG46620V:</p>
|
<br><p> Silego Technology is a fabless semiconductor company located in the SF Bay area, which makes (among other things) a line of programmable logic devices known as GreenPak. Their <a href="http://www.silego.com/products/greenpak5.html" target="_blank">5th generation parts</a> were just announced, but I started this project before that happened so I'm still targeting the <a href="http://www.silego.com/products/greenpak4.html" target="_blank">4th generation</a>.<br><br> GreenPak devices are kind of like itty bitty <a href="http://www.cypress.com/products/32-bit-arm-cortex-m-psoc" target="_blank">PSoCs</a> - they have a mixed signal fabric with an ADC, DACs, comparators, voltage references, plus a digital LUT/FF fabric and some typical digital MCU peripherals like counters and oscillators (but no CPU).<br><br> It's actually an interesting architecture - FPGAs (including some devices marketed as CPLDs) are a 2D array of LUTs connected via wires to adjacent cells, and true (product term) CPLDs are a star topology of AND-OR arrays connected by a crossbar. GreenPak, on the other hand, is a star topology of LUTs, flipflops, and analog/digital hard IP connected to a crossbar.<br><br> Without further ado, here's a block diagram showing all the cool stuff you get in the SLG46620V:</p>
|
||||||
<table cellpadding="0" cellspacing="0"><tbody>
|
<table><tbody>
|
||||||
<tr><td><a href="https://1.bp.blogspot.com/-YIPC5jkXkDE/Vy7YPSqFKWI/AAAAAAAAAxI/a7D6Ji2GxoUvcrwUkI4RLZcr2LFQEJCTACLcB/s1600/block-diagram.png" imageanchor="1" target="_blank"><img border="0" height="512" src="https://1.bp.blogspot.com/-YIPC5jkXkDE/Vy7YPSqFKWI/AAAAAAAAAxI/a7D6Ji2GxoUvcrwUkI4RLZcr2LFQEJCTACLcB/s640/block-diagram.png" width="640"></a></td></tr>
|
<tr><td><a href="https://1.bp.blogspot.com/-YIPC5jkXkDE/Vy7YPSqFKWI/AAAAAAAAAxI/a7D6Ji2GxoUvcrwUkI4RLZcr2LFQEJCTACLcB/s1600/block-diagram.png" imageanchor="1" target="_blank"><img height="512" src="https://1.bp.blogspot.com/-YIPC5jkXkDE/Vy7YPSqFKWI/AAAAAAAAAxI/a7D6Ji2GxoUvcrwUkI4RLZcr2LFQEJCTACLcB/s640/block-diagram.png" width="640"></a></td></tr>
|
||||||
<tr><td>SLG46620V block diagram (from device datasheet)</td></tr>
|
<tr><td>SLG46620V block diagram (from device datasheet)</td></tr>
|
||||||
</tbody></table>
|
</tbody></table>
|
||||||
<p>
|
<p>
|
||||||
They're also tiny (the SLG46620V is a 20-pin 0.4mm pitch STQFN measuring 2x3 mm, and the lower gate count SLG46140V is a mere 1.6x2 mm) and probably the cheapest programmable logic device on the market - $0.50 in low volume and less than $0.40 in larger quantities.<br><br> The Vdd range of GreenPak4 is huge, more like what you'd expect from an MCU than an FPGA! It can run on anything from 1.8 to 5V, although performance is only specified at 1.8, 3.3, and 5V nominal voltages. There's also a dual-rail version that trades one of the GPIO pins for a second power supply pin, allowing you to interface to logic at two different voltage levels.<br><br> To support low-cost/space-constrained applications, they even have the configuration memory on die. It's one-time programmable and needs external Vpp to program (presumably Silego didn't want to waste die area on charge pumps that would only be used once) but has a SRAM programming mode for prototyping.<br><br> The best part is that the development software (GreenPak Designer) is free of charge and provided for all major operating systems including Linux! Unfortunately, the only supported design entry method is schematic entry and there's no way to write your design in a HDL.<br><br> While schematics may be fine for quick tinkering on really simple designs, they quickly get unwieldy. The nightmare of a circuit shown below is just a bunch of counters hooked up to LEDs that blink at various rates.</p>
|
They're also tiny (the SLG46620V is a 20-pin 0.4mm pitch STQFN measuring 2x3 mm, and the lower gate count SLG46140V is a mere 1.6x2 mm) and probably the cheapest programmable logic device on the market - $0.50 in low volume and less than $0.40 in larger quantities.<br><br> The Vdd range of GreenPak4 is huge, more like what you'd expect from an MCU than an FPGA! It can run on anything from 1.8 to 5V, although performance is only specified at 1.8, 3.3, and 5V nominal voltages. There's also a dual-rail version that trades one of the GPIO pins for a second power supply pin, allowing you to interface to logic at two different voltage levels.<br><br> To support low-cost/space-constrained applications, they even have the configuration memory on die. It's one-time programmable and needs external Vpp to program (presumably Silego didn't want to waste die area on charge pumps that would only be used once) but has a SRAM programming mode for prototyping.<br><br> The best part is that the development software (GreenPak Designer) is free of charge and provided for all major operating systems including Linux! Unfortunately, the only supported design entry method is schematic entry and there's no way to write your design in a HDL.<br><br> While schematics may be fine for quick tinkering on really simple designs, they quickly get unwieldy. The nightmare of a circuit shown below is just a bunch of counters hooked up to LEDs that blink at various rates.</p>
|
||||||
<table cellpadding="0" cellspacing="0"><tbody>
|
<table><tbody>
|
||||||
<tr><td><a href="https://1.bp.blogspot.com/-k3naUT3uXao/Vy7WFac246I/AAAAAAAAAw8/mePy_ostO8QJra5ZJrbP2WGhTlJ0B_r8gCLcB/s1600/schematic-from-hell.png" imageanchor="1" target="_blank"><img border="0" height="334" src="https://1.bp.blogspot.com/-k3naUT3uXao/Vy7WFac246I/AAAAAAAAAw8/mePy_ostO8QJra5ZJrbP2WGhTlJ0B_r8gCLcB/s640/schematic-from-hell.png" width="640"></a></td></tr>
|
<tr><td><a href="https://1.bp.blogspot.com/-k3naUT3uXao/Vy7WFac246I/AAAAAAAAAw8/mePy_ostO8QJra5ZJrbP2WGhTlJ0B_r8gCLcB/s1600/schematic-from-hell.png" imageanchor="1" target="_blank"><img height="334" src="https://1.bp.blogspot.com/-k3naUT3uXao/Vy7WFac246I/AAAAAAAAAw8/mePy_ostO8QJra5ZJrbP2WGhTlJ0B_r8gCLcB/s640/schematic-from-hell.png" width="640"></a></td></tr>
|
||||||
<tr><td>Schematic from hell!</td></tr>
|
<tr><td>Schematic from hell!</td></tr>
|
||||||
</tbody></table>
|
</tbody></table>
|
||||||
<p>
|
<p>
|
||||||
|
@ -19,8 +19,8 @@
|
||||||
<h2>
|
<h2>
|
||||||
Great! How does it work?</h2>
|
Great! How does it work?</h2>
|
||||||
<br><p> Rather than wasting time writing a synthesizer, I decided to write a GreenPak technology library for Clifford Wolf's excellent open source synthesis tool, <a href="http://www.clifford.at/yosys/" target="_blank">Yosys</a>, and then make a place-and-route tool to turn that into a final netlist. The post-PAR netlist can then be loaded into GreenPak Designer in order to program the device.<br><br> The first step of the process is to run the "synth_greenpak4" Yosys flow on the Verilog source. This runs a generic RTL synthesis pass, then some coarse-grained extraction passes to infer shift register and counter cells from behavioral logic, and finally maps the remaining logic to LUT/FF cells and outputs a JSON-formatted netlist.<br><br> Once the design has been synthesized, my tool (named, surprisingly, gp4par) is then launched on the netlist. It begins by parsing the JSON and constructing a directed graph of cell objects in memory. A second graph, containing all of the primitives in the device and the legal connections between them, is then created based on the device specified on the command line. (As of now only the SLG46620V is supported; the SLG46621V can be added fairly easily but the SLG46140V has a slightly different microarchitecture which will require a bit more work to support.)<br><br> After the graphs are generated, each node in the netlist graph is assigned a numeric label identifying the type of cell and each node in the device graph is assigned a list of legal labels: for example, an I/O buffer site is legal for an input buffer, output buffer, or bidirectional buffer.</p>
|
<br><p> Rather than wasting time writing a synthesizer, I decided to write a GreenPak technology library for Clifford Wolf's excellent open source synthesis tool, <a href="http://www.clifford.at/yosys/" target="_blank">Yosys</a>, and then make a place-and-route tool to turn that into a final netlist. The post-PAR netlist can then be loaded into GreenPak Designer in order to program the device.<br><br> The first step of the process is to run the "synth_greenpak4" Yosys flow on the Verilog source. This runs a generic RTL synthesis pass, then some coarse-grained extraction passes to infer shift register and counter cells from behavioral logic, and finally maps the remaining logic to LUT/FF cells and outputs a JSON-formatted netlist.<br><br> Once the design has been synthesized, my tool (named, surprisingly, gp4par) is then launched on the netlist. It begins by parsing the JSON and constructing a directed graph of cell objects in memory. A second graph, containing all of the primitives in the device and the legal connections between them, is then created based on the device specified on the command line. (As of now only the SLG46620V is supported; the SLG46621V can be added fairly easily but the SLG46140V has a slightly different microarchitecture which will require a bit more work to support.)<br><br> After the graphs are generated, each node in the netlist graph is assigned a numeric label identifying the type of cell and each node in the device graph is assigned a list of legal labels: for example, an I/O buffer site is legal for an input buffer, output buffer, or bidirectional buffer.</p>
|
||||||
<table cellpadding="0" cellspacing="0"><tbody>
|
<table><tbody>
|
||||||
<tr><td><a href="https://2.bp.blogspot.com/-kIekczO693g/Vy7dBqYifXI/AAAAAAAAAxc/hMNJBs5bedIQOrBzzkhq4gbmhR-n58EQwCLcB/s1600/graph-labels.png" imageanchor="1" target="_blank"><img border="0" height="141" src="https://2.bp.blogspot.com/-kIekczO693g/Vy7dBqYifXI/AAAAAAAAAxc/hMNJBs5bedIQOrBzzkhq4gbmhR-n58EQwCLcB/s400/graph-labels.png" width="400"></a></td></tr>
|
<tr><td><a href="https://2.bp.blogspot.com/-kIekczO693g/Vy7dBqYifXI/AAAAAAAAAxc/hMNJBs5bedIQOrBzzkhq4gbmhR-n58EQwCLcB/s1600/graph-labels.png" imageanchor="1" target="_blank"><img height="141" src="https://2.bp.blogspot.com/-kIekczO693g/Vy7dBqYifXI/AAAAAAAAAxc/hMNJBs5bedIQOrBzzkhq4gbmhR-n58EQwCLcB/s400/graph-labels.png" width="400"></a></td></tr>
|
||||||
<tr><td>Example labeling for a subset of the netlist and device graphs</td></tr>
|
<tr><td>Example labeling for a subset of the netlist and device graphs</td></tr>
|
||||||
</tbody></table>
|
</tbody></table>
|
||||||
<p>
|
<p>
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
<article><DIV id="readability-page-1"><DIV>
|
<article><DIV id="readability-page-1">
|
||||||
<h3>Study Webtext</h3>
|
<h3>Study Webtext</h3>
|
||||||
<h2>
|
<h2>
|
||||||
<span face="Lucida Handwriting " color="Maroon
|
<span face="Lucida Handwriting " color="Maroon
|
||||||
|
@ -1366,4 +1366,4 @@
|
||||||
errands of life, these letters speed to death.</a></p>
|
errands of life, these letters speed to death.</a></p>
|
||||||
<p><a href="javascript:void(0);" onmouseout="nd();" target="_blank">Ah
|
<p><a href="javascript:void(0);" onmouseout="nd();" target="_blank">Ah
|
||||||
Bartleby! Ah humanity</a>!</p>
|
Bartleby! Ah humanity</a>!</p>
|
||||||
</DIV></DIV></article>
|
</DIV></article>
|
||||||
|
|
107
resources/tests/readability/hukumusume/expected.html
Normal file
|
@ -0,0 +1,107 @@
|
||||||
|
<article><DIV id="readability-page-1">
|
||||||
|
<td>
|
||||||
|
<table><tbody><tr>
|
||||||
|
<td><img src="http://fakehost/366/logo_bana/corner_1.gif" width="7" height="7"></td>
|
||||||
|
<td></td>
|
||||||
|
<td><img src="http://fakehost/366/logo_bana/corner_2.gif" width="7" height="7"></td>
|
||||||
|
</tr></tbody></table>
|
||||||
|
<table><tbody>
|
||||||
|
<tr><td></td></tr>
|
||||||
|
<tr><td></td></tr>
|
||||||
|
</tbody></table>
|
||||||
|
</td>
|
||||||
|
<td>
|
||||||
|
<p><a href="http://fakehost/index.html" target="_blank">福娘童話集</a> > <a href="http://fakehost/test/index.html" target="_blank">きょうのイソップ童話</a> > <a href="http://fakehost/test/itiran/01gatu.htm" target="_blank">1月のイソップ童話</a> > 欲張りなイヌ
|
||||||
|
</p>
|
||||||
|
<p><span color="#FF0000" size="+2">元旦のイソップ童話</span><br><br><br><br><img src="http://fakehost/gazou/pc_gazou/aesop/aesop052.jpg" alt="よくばりなイヌ" width="480" height="360"><br><br><br><br>
|
||||||
|
欲張りなイヌ<br><br><br><br><a href="http://hukumusume.com/douwa/English/aesop/01/01_j.html" target="_blank">ひらがな</a> ←→ <a href="http://hukumusume.com/douwa/English/aesop/01/01_j&E.html" target="_blank">日本語・英語</a> ←→ <a href="http://hukumusume.com/douwa/English/aesop/01/01_E.html" target="_blank">English</a></p>
|
||||||
|
<DIV><table><tbody>
|
||||||
|
<tr>
|
||||||
|
<td><img src="http://fakehost/366/logo_bana/corner_1.gif" width="7" height="7"></td>
|
||||||
|
<td><span color="#FF0000"><b>おりがみをつくろう</b></span></td>
|
||||||
|
<td><span size="-1">( <a href="http://www.origami-club.com/index.html" target="_blank">おりがみくらぶ</a> より)</span></td>
|
||||||
|
<td><img src="http://fakehost/366/logo_bana/corner_2.gif" width="7" height="7"></td>
|
||||||
|
</tr>
|
||||||
|
<tr><td colspan="4"><P>
|
||||||
|
<a href="http://www.origami-club.com/easy/dogfase/index.html" target="_blank"><span size="+2"><img src="http://fakehost/gazou/origami_gazou/kantan/dogface.gif" alt="犬の顔の折り紙" width="73" height="51">いぬのかお</span></a><a href="http://www.origami-club.com/easy/dog/index.html" target="_blank"><img src="http://fakehost/gazou/origami_gazou/kantan/dog.gif" alt="犬の顔の紙" width="62" height="43"><span size="+2">いぬ</span></a>
|
||||||
|
</P></td></tr>
|
||||||
|
</tbody></table></DIV>
|
||||||
|
<table><tbody>
|
||||||
|
<tr><td>
|
||||||
|
♪音声配信(html5)
|
||||||
|
</td></tr>
|
||||||
|
<tr><td><audio src="http://ohanashi2.up.seesaa.net/mp3/ae_0101.mp3" controls=""></audio></td></tr>
|
||||||
|
<tr><td><a href="http://www.voiceblog.jp/onokuboaki/" target="_blank"><span size="-1">亜姫の朗読☆ イソップ童話より</span></a></td></tr>
|
||||||
|
</tbody></table>
|
||||||
|
<p>
|
||||||
|
肉をくわえたイヌが、橋を渡っていました。 ふと下を見ると、川の中にも肉をくわえたイヌがいます。 イヌはそれを見て、思いました。(あいつの肉の方が、大きそうだ) イヌは、くやしくてたまりません。 (そうだ、あいつをおどかして、あの肉を取ってやろう) そこでイヌは、川の中のイヌに向かって思いっきり吠えました。 「ウゥー、ワン!!」 そのとたん、くわえていた肉はポチャンと川の中に落ちてしまいました。 「ああー、ぁぁー」 川の中には、がっかりしたイヌの顔がうつっています。 さっきの川の中のイヌは、水にうつった自分の顔だったのです。 同じ物を持っていても、人が持っている物の方が良く見え、また、欲張るとけっきょく損をするというお話しです。
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
おしまい
|
||||||
|
</p>
|
||||||
|
<p><a href="javascript:history.back();" onmouseout="MM_swapImgRestore()" onmouseover="MM_swapImage('Image10','','../../../gazou/pc_gazou/all/top_bana/back_logo_b.gif',1)" target="_blank"><img src="http://fakehost/gazou/pc_gazou/all/top_bana/back_logo_r.gif" alt="前のページへ戻る" name="Image10" width="175" height="32" id="Image10"></a><br><br><br><br></p>
|
||||||
|
</td>
|
||||||
|
<td><img src="file:///C:/Documents%20and%20Settings/%E7%A6%8F%E5%A8%98note/%E3%83%87%E3%82%B9%E3%82%AF%E3%83%88%E3%83%83%E3%83%97/company_website15/image/spacer.gif" width="1" height="1"></td>
|
||||||
|
<td>
|
||||||
|
<table><tbody><tr>
|
||||||
|
<td><img src="http://fakehost/366/logo_bana/corner_1.gif" width="7" height="7"></td>
|
||||||
|
<td></td>
|
||||||
|
<td><img src="http://fakehost/366/logo_bana/corner_2.gif" width="7" height="7"></td>
|
||||||
|
</tr></tbody></table>
|
||||||
|
<table><tbody>
|
||||||
|
<tr><td>
|
||||||
|
<span size="-1"><b>1月 1日の豆知識</b></span><br><br><span size="-2"><u><br><br>
|
||||||
|
366日への旅</u></span>
|
||||||
|
</td></tr>
|
||||||
|
<tr><td>
|
||||||
|
<img src="file:///C:/Documents%20and%20Settings/%E7%A6%8F%E5%A8%98note/%E3%83%87%E3%82%B9%E3%82%AF%E3%83%88%E3%83%83%E3%83%97" width="1" height="1"><b><span size="-1">きょうの記念日</span></b><br><br><a href="http://fakehost/366/kinenbi/pc/01gatu/1_01.htm" target="_blank"><span size="-1">元旦</span></a>
|
||||||
|
</td></tr>
|
||||||
|
<tr><td>
|
||||||
|
<img src="file:///C:/Documents%20and%20Settings/%E7%A6%8F%E5%A8%98note/%E3%83%87%E3%82%B9%E3%82%AF%E3%83%88%E3%83%83%E3%83%97/company_website15/image/spacer.gif" width="1" height="1"><b><span size="-1">きょうの誕生花</span></b><br><br><a href="http://fakehost/366/hana/pc/01gatu/1_01.htm" target="_blank"><span size="-1">松(まつ)</span></a>
|
||||||
|
</td></tr>
|
||||||
|
<tr><td>
|
||||||
|
<b><span size="-1">きょうの誕生日・出来事</span></b><br><br><a href="http://fakehost/366/birthday/pc/01gatu/1_01.htm" target="_blank"><span size="-1">1949年 Mr.マリック(マジシャン)</span></a>
|
||||||
|
</td></tr>
|
||||||
|
<tr><td>
|
||||||
|
<b><span size="-1">恋の誕生日占い</span></b><br><br><a href="http://fakehost/sakura/uranai/birthday/01/01.html" target="_blank"><span size="-1">自分の考えをしっかりと持った女の子。</span></a>
|
||||||
|
</td></tr>
|
||||||
|
<tr><td>
|
||||||
|
<b><span size="-1">なぞなぞ小学校</span></b><br><br><a href="http://fakehost/nazonazo/new/2012/04/02.html" target="_blank"><span size="-1">○(丸)を取ったらお母さんになってしまう男の人は?</span></a>
|
||||||
|
</td></tr>
|
||||||
|
<tr><td>
|
||||||
|
<b><span size="-1">あこがれの職業紹介</span></b><br><br><a href="http://fakehost/sakura/navi/work/2017/041.html" target="_blank"><span size="-1">歌手</span></a>
|
||||||
|
</td></tr>
|
||||||
|
<tr><td>
|
||||||
|
<b><span size="-1">恋の魔法とおまじない</span></b> 001<br><br><a href="http://fakehost/omajinai/new/2012/00/re01.html" target="_blank"><span size="-1">両思いになれる おまじない</span></a>
|
||||||
|
</td></tr>
|
||||||
|
<tr><td><span size="-1"><b>1月 1日の童話・昔話</b><br><br><u><span size="-2"><br><br>
|
||||||
|
福娘童話集</span></u></span></td></tr>
|
||||||
|
<tr><td>
|
||||||
|
<b><span size="-1">きょうの日本昔話</span></b><br><br><a href="http://fakehost/douwa/pc/jap/01/01.htm" target="_blank"><span size="-1">ネコがネズミを追いかける訳</span></a>
|
||||||
|
</td></tr>
|
||||||
|
<tr><td>
|
||||||
|
<b><span size="-1">きょうの世界昔話<img src="file:///C:/Documents%20and%20Settings/%E7%A6%8F%E5%A8%98note/%E3%83%87%E3%82%B9%E3%82%AF%E3%83%88%E3%83%83%E3%83%97/company_website15/image/spacer.gif" width="1" height="1"></span></b><br><br><a href="http://fakehost/douwa/pc/world/01/01a.htm" target="_blank"><span size="-1">モンゴルの十二支話</span></a>
|
||||||
|
</td></tr>
|
||||||
|
<tr><td>
|
||||||
|
<img src="file:///C:/Documents%20and%20Settings/%E7%A6%8F%E5%A8%98note/%E3%83%87%E3%82%B9%E3%82%AF%E3%83%88%E3%83%83%E3%83%97/company_website15/image/spacer.gif" width="1" height="1"><b><span size="-1">きょうの日本民話</span></b><br><br><a href="http://fakehost/douwa/pc/minwa/01/01c.html" target="_blank"><span size="-1">仕事の取替えっこ</span></a>
|
||||||
|
</td></tr>
|
||||||
|
<tr><td>
|
||||||
|
<b><span size="-1">きょうのイソップ童話</span></b><br><br><a href="http://fakehost/douwa/pc/aesop/01/01.htm" target="_blank"><span size="-1">欲張りなイヌ</span></a>
|
||||||
|
</td></tr>
|
||||||
|
<tr><td>
|
||||||
|
<b><span size="-1">きょうの江戸小話</span></b><br><br><a href="http://fakehost/douwa/pc/kobanashi/01/01.htm" target="_blank"><span size="-1">ぞうきんとお年玉</span></a>
|
||||||
|
</td></tr>
|
||||||
|
<tr><td>
|
||||||
|
<b><span size="-1">きょうの百物語</span></b><br><br><a href="http://fakehost/douwa/pc/kaidan/01/01.htm" target="_blank"><span size="-1">百物語の幽霊</span></a>
|
||||||
|
</td></tr>
|
||||||
|
</tbody></table>
|
||||||
|
<table><tbody>
|
||||||
|
<tr><td><b><span size="-1">福娘のサイト</span></b></td></tr>
|
||||||
|
<tr><td><span size="-1"><b>366日への旅</b><br><br><a href="http://hukumusume.com/366/" target="_blank">毎日の記念日・誕生花 ・有名人の誕生日と性格判断</a></span></td></tr>
|
||||||
|
<tr><td><span size="-1"><b>福娘童話集</b><br><br><a href="http://hukumusume.com/douwa/" target="_blank">世界と日本の童話と昔話</a></span></td></tr>
|
||||||
|
<tr><td><span size="-1"><b>女の子応援サイト -さくら-</b><br><br><a href="http://hukumusume.com/sakura/index.html" target="_blank">誕生日占い、お仕事紹介、おまじない、など</a></span></td></tr>
|
||||||
|
<tr><td><span size="-1"><b>子どもの病気相談所</b><br><br><a href="http://hukumusume.com/my_baby/sick/" target="_blank">病気検索と対応方法、症状から検索するWEB問診</a></span></td></tr>
|
||||||
|
<tr><td><span size="-1"><b>世界60秒巡り</b><br><br><a href="http://hukumusume.com/366/world/" target="_blank">国旗国歌や世界遺産など、世界の国々の豆知識</a></span></td></tr>
|
||||||
|
</tbody></table>
|
||||||
|
</td>
|
||||||
|
</DIV></article>
|
356
resources/tests/readability/hukumusume/source.html
Normal file
|
@ -0,0 +1,356 @@
|
||||||
|
<!DOCTYPE html>
|
||||||
|
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||||
|
<head>
|
||||||
|
<title>
|
||||||
|
欲張りなイヌ <福娘童話集 きょうのイソップ童話>
|
||||||
|
</title>
|
||||||
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
|
||||||
|
</head>
|
||||||
|
<body bgcolor="#FFFFFF" text="#000000">
|
||||||
|
<table width="969" border="0" align="center" cellpadding="0" cellspacing="0">
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td colspan="5" height="12">
|
||||||
|
<div align="center">
|
||||||
|
<table width="100%" border="0">
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td width="27%" align="center">
|
||||||
|
<a href="../../../index.html">福娘童話集</a> > <a href="../index.html">きょうのイソップ童話</a>
|
||||||
|
</td>
|
||||||
|
<td width="46%" align="center">
|
||||||
|
<a href="http://hukumusume.com/douwa/pc/aesop/index.html"><img src="../../../gazou/pc_gazou/all/aesop_logo_llll.gif" alt="福娘童話集 きょうのイソップ童話" width="320" height="100" border="0" /></a>
|
||||||
|
</td>
|
||||||
|
<td width="27%" align="center" valign="bottom">
|
||||||
|
<a href="http://hukumusume.com/douwa/index.html"><img src="../../../gazou/pc_gazou/all/douwa_logo_top_.gif" alt="童話・昔話・おとぎ話の福娘童話集" width="170" height="50" border="0" /></a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
</div>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td width="166" height="830" valign="top">
|
||||||
|
<table width="166" border="0" cellpadding="0" cellspacing="0" bgcolor="#C8FFC8">
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td height="7" valign="top">
|
||||||
|
<img src="../../../../366/logo_bana/corner_1.gif" width="7" height="7" />
|
||||||
|
</td>
|
||||||
|
<td></td>
|
||||||
|
<td align="right" valign="top">
|
||||||
|
<img src="../../../../366/logo_bana/corner_2.gif" width="7" height="7" />
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
<table width="166" border="0" cellpadding="0" cellspacing="0">
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td width="310" height="610" align="center" bgcolor="#C8FFC8">
|
||||||
|
<script type="text/javascript">
|
||||||
|
//<![CDATA[
|
||||||
|
<!--
|
||||||
|
google_ad_client = "ca-pub-2746615155806331";
|
||||||
|
/* 1a月160x600 */
|
||||||
|
google_ad_slot = "0764542773";
|
||||||
|
google_ad_width = 160;
|
||||||
|
google_ad_height = 600;
|
||||||
|
//-->
|
||||||
|
//]]>
|
||||||
|
</script>
|
||||||
|
<script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"></script>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td align="center">
|
||||||
|
 
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
</td>
|
||||||
|
<td width="619" valign="top">
|
||||||
|
<p align="center">
|
||||||
|
<a href="../../../index.html">福娘童話集</a> > <a href="../index.html">きょうのイソップ童話</a> > <a href="../itiran/01gatu.htm">1月のイソップ童話</a> > 欲張りなイヌ
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<font color="#FF0000" size="+2">元旦のイソップ童話</font><br />
|
||||||
|
<br />
|
||||||
|
<br />
|
||||||
|
<br />
|
||||||
|
<img src="../../../gazou/pc_gazou/aesop/aesop052.jpg" alt="よくばりなイヌ" width="480" height="360" border="1" /><br />
|
||||||
|
<br />
|
||||||
|
<br />
|
||||||
|
<br />
|
||||||
|
欲張りなイヌ<br />
|
||||||
|
<br />
|
||||||
|
<br />
|
||||||
|
<br />
|
||||||
|
<a href="http://hukumusume.com/douwa/English/aesop/01/01_j.html">ひらがな</a> ←→ <a href="http://hukumusume.com/douwa/English/aesop/01/01_j&E.html">日本語・英語</a> ←→ <a href="http://hukumusume.com/douwa/English/aesop/01/01_E.html">English</a>
|
||||||
|
</p>
|
||||||
|
<table width="100%" border="0" cellspacing="0" cellpadding="0">
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td height="90" align="center">
|
||||||
|
<table width="80%" border="0" cellpadding="0" cellspacing="0" bgcolor="#C8FFC8">
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td width="35%" height="25" valign="top">
|
||||||
|
<img src="../../../../366/logo_bana/corner_1.gif" width="7" height="7" />
|
||||||
|
</td>
|
||||||
|
<td width="29%" align="center">
|
||||||
|
<font color="#FF0000"><b>おりがみをつくろう</b></font>
|
||||||
|
</td>
|
||||||
|
<td width="35%" valign="bottom">
|
||||||
|
<font size="-1">( <a href="http://www.origami-club.com/index.html">おりがみくらぶ</a> より)</font>
|
||||||
|
</td>
|
||||||
|
<td width="1%" align="right" valign="top">
|
||||||
|
<img src="../../../../366/logo_bana/corner_2.gif" width="7" height="7" />
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="80" colspan="4" align="center" valign="top">
|
||||||
|
<table width="98%" border="0" cellspacing="0" cellpadding="0">
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td height="75" align="center" valign="middle" bgcolor="#ECFFEC">
|
||||||
|
<a href="http://www.origami-club.com/easy/dogfase/index.html"><font size="+2"><img src="../../../gazou/origami_gazou/kantan/dogface.gif" alt="犬の顔の折り紙" width="73" height="51" border="0" />いぬのかお</font></a> <a href="http://www.origami-club.com/easy/dog/index.html"><img src="../../../gazou/origami_gazou/kantan/dog.gif" alt="犬の顔の紙" width="62" height="43" border="0" /><font size="+2">いぬ</font></a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
<table width="100%" border="0">
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td align="center">
|
||||||
|
♪音声配信(html5)
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td align="center">
|
||||||
|
<audio src="http://ohanashi2.up.seesaa.net/mp3/ae_0101.mp3" controls=""></audio>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td align="center">
|
||||||
|
<a href="http://www.voiceblog.jp/onokuboaki/"><font size="-1">亜姫の朗読☆ イソップ童話より</font></a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
<p>
|
||||||
|
肉をくわえたイヌが、橋を渡っていました。 ふと下を見ると、川の中にも肉をくわえたイヌがいます。 イヌはそれを見て、思いました。(あいつの肉の方が、大きそうだ) イヌは、くやしくてたまりません。 (そうだ、あいつをおどかして、あの肉を取ってやろう) そこでイヌは、川の中のイヌに向かって思いっきり吠えました。 「ウゥー、ワン!!」 そのとたん、くわえていた肉はポチャンと川の中に落ちてしまいました。 「ああー、ぁぁー」 川の中には、がっかりしたイヌの顔がうつっています。 さっきの川の中のイヌは、水にうつった自分の顔だったのです。 同じ物を持っていても、人が持っている物の方が良く見え、また、欲張るとけっきょく損をするというお話しです。
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
おしまい
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<a href="javascript:history.back();" onmouseout="MM_swapImgRestore()" onmouseover="MM_swapImage('Image10','','../../../gazou/pc_gazou/all/top_bana/back_logo_b.gif',1)"><img src="../../../gazou/pc_gazou/all/top_bana/back_logo_r.gif" alt="前のページへ戻る" name="Image10" width="175" height="32" border="0" id="Image10" /></a><br />
|
||||||
|
<br />
|
||||||
|
<br />
|
||||||
|
<br />
|
||||||
|
<script type="text/javascript">
|
||||||
|
//<![CDATA[
|
||||||
|
|
||||||
|
<!--
|
||||||
|
google_ad_client = "ca-pub-2746615155806331";
|
||||||
|
/* 1月336x280 */
|
||||||
|
google_ad_slot = "6046482409";
|
||||||
|
google_ad_width = 336;
|
||||||
|
google_ad_height = 280;
|
||||||
|
//-->
|
||||||
|
//]]>
|
||||||
|
</script>
|
||||||
|
<script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"></script>
|
||||||
|
</p>
|
||||||
|
</td>
|
||||||
|
<td width="10">
|
||||||
|
<img src="file:///C|/Documents%20and%20Settings/%E7%A6%8F%E5%A8%98note/%E3%83%87%E3%82%B9%E3%82%AF%E3%83%88%E3%83%83%E3%83%97/company_website15/image/spacer.gif" width="1" height="1" />
|
||||||
|
</td>
|
||||||
|
<td width="166" valign="top">
|
||||||
|
<table width="100%" border="0" cellpadding="0" cellspacing="0" bgcolor="#C8FFC8">
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td height="7" valign="top">
|
||||||
|
<img src="../../../../366/logo_bana/corner_1.gif" width="7" height="7" />
|
||||||
|
</td>
|
||||||
|
<td></td>
|
||||||
|
<td align="right" valign="top">
|
||||||
|
<img src="../../../../366/logo_bana/corner_2.gif" width="7" height="7" />
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
<table width="166" border="0" bgcolor="#C8FFC8">
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td width="156" height="50">
|
||||||
|
     <font size="-1"><b>1月 1日の豆知識</b></font><br />
|
||||||
|
<br />
|
||||||
|
<font size="-2"><u><br />
|
||||||
|
<br />
|
||||||
|
366日への旅</u></font>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="50" bgcolor="#ECFFEC">
|
||||||
|
<img src="file:///C|/Documents%20and%20Settings/%E7%A6%8F%E5%A8%98note/%E3%83%87%E3%82%B9%E3%82%AF%E3%83%88%E3%83%83%E3%83%97" width="1" height="1" /><b><font size="-1">きょうの記念日</font></b><br />
|
||||||
|
<br />
|
||||||
|
<a href="../../../../366/kinenbi/pc/01gatu/1_01.htm"><font size="-1">元旦</font></a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="50" bgcolor="#ECFFEC">
|
||||||
|
<img src="file:///C|/Documents%20and%20Settings/%E7%A6%8F%E5%A8%98note/%E3%83%87%E3%82%B9%E3%82%AF%E3%83%88%E3%83%83%E3%83%97/company_website15/image/spacer.gif" width="1" height="1" /><b><font size="-1">きょうの誕生花</font></b><br />
|
||||||
|
<br />
|
||||||
|
<a href="../../../../366/hana/pc/01gatu/1_01.htm"><font size="-1">松(まつ)</font></a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="50" bgcolor="#ECFFEC">
|
||||||
|
<b><font size="-1">きょうの誕生日・出来事</font></b><br />
|
||||||
|
<br />
|
||||||
|
<a href="../../../../366/birthday/pc/01gatu/1_01.htm"><font size="-1">1949年 Mr.マリック(マジシャン)</font></a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="50" bgcolor="#ECFFEC">
|
||||||
|
<b><font size="-1">恋の誕生日占い</font></b><br />
|
||||||
|
<br />
|
||||||
|
<a href="../../../../sakura/uranai/birthday/01/01.html"><font size="-1">自分の考えをしっかりと持った女の子。</font></a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="50" bgcolor="#ECFFEC">
|
||||||
|
<b><font size="-1">なぞなぞ小学校</font></b><br />
|
||||||
|
<br />
|
||||||
|
<a href="../../../../nazonazo/new/2012/04/02.html"><font size="-1">○(丸)を取ったらお母さんになってしまう男の人は?</font></a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="50" bgcolor="#ECFFEC">
|
||||||
|
<b><font size="-1">あこがれの職業紹介</font></b><br />
|
||||||
|
<br />
|
||||||
|
<a href="../../../../sakura/navi/work/2017/041.html"><font size="-1">歌手</font></a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="50" bgcolor="#ECFFEC">
|
||||||
|
<b><font size="-1">恋の魔法とおまじない</font></b> 001<br />
|
||||||
|
<br />
|
||||||
|
<a href="../../../../omajinai/new/2012/00/re01.html"><font size="-1">両思いになれる おまじない</font></a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="50" bgcolor="#C8FFC8">
|
||||||
|
<font size="-1"> <b>1月 1日の童話・昔話</b><br />
|
||||||
|
<br />
|
||||||
|
<u><font size="-2"><br />
|
||||||
|
<br />
|
||||||
|
福娘童話集</font></u></font>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="50" bgcolor="#ECFFEC">
|
||||||
|
<b><font size="-1">きょうの日本昔話</font></b><br />
|
||||||
|
<br />
|
||||||
|
<a href="../../../../douwa/pc/jap/01/01.htm"><font size="-1">ネコがネズミを追いかける訳</font></a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="50" bgcolor="#ECFFEC">
|
||||||
|
<b><font size="-1">きょうの世界昔話<img src="file:///C|/Documents%20and%20Settings/%E7%A6%8F%E5%A8%98note/%E3%83%87%E3%82%B9%E3%82%AF%E3%83%88%E3%83%83%E3%83%97/company_website15/image/spacer.gif" width="1" height="1" /></font></b><br />
|
||||||
|
<br />
|
||||||
|
<a href="../../../../douwa/pc/world/01/01a.htm"><font size="-1">モンゴルの十二支話</font></a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="50" bgcolor="#ECFFEC">
|
||||||
|
<img src="file:///C|/Documents%20and%20Settings/%E7%A6%8F%E5%A8%98note/%E3%83%87%E3%82%B9%E3%82%AF%E3%83%88%E3%83%83%E3%83%97/company_website15/image/spacer.gif" width="1" height="1" /><b><font size="-1">きょうの日本民話</font></b><br />
|
||||||
|
<br />
|
||||||
|
<a href="../../../../douwa/pc/minwa/01/01c.html"><font size="-1">仕事の取替えっこ</font></a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="50" bgcolor="#ECFFEC">
|
||||||
|
<b><font size="-1">きょうのイソップ童話</font></b><br />
|
||||||
|
<br />
|
||||||
|
<a href="../../../../douwa/pc/aesop/01/01.htm"><font size="-1">欲張りなイヌ</font></a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="50" bgcolor="#ECFFEC">
|
||||||
|
<b><font size="-1">きょうの江戸小話</font></b><br />
|
||||||
|
<br />
|
||||||
|
<a href="../../../../douwa/pc/kobanashi/01/01.htm"><font size="-1">ぞうきんとお年玉</font></a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="50" bgcolor="#ECFFEC">
|
||||||
|
<b><font size="-1">きょうの百物語</font></b><br />
|
||||||
|
<br />
|
||||||
|
<a href="../../../../douwa/pc/kaidan/01/01.htm"><font size="-1">百物語の幽霊</font></a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
<table width="100%" border="0" bgcolor="#C8FFC8">
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td height="30" align="center" bgcolor="#C8FFC8">
|
||||||
|
<b><font size="-1">福娘のサイト</font></b>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="60" bgcolor="#ECFFEC">
|
||||||
|
<font size="-1"><b>366日への旅</b><br />
|
||||||
|
<br />
|
||||||
|
<a href="http://hukumusume.com/366/">毎日の記念日・誕生花 ・有名人の誕生日と性格判断</a></font>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="60" bgcolor="#ECFFEC">
|
||||||
|
<font size="-1"><b>福娘童話集</b><br />
|
||||||
|
<br />
|
||||||
|
<a href="http://hukumusume.com/douwa/">世界と日本の童話と昔話</a></font>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="60" bgcolor="#ECFFEC">
|
||||||
|
<font size="-1"><b>女の子応援サイト -さくら-</b><br />
|
||||||
|
<br />
|
||||||
|
<a href="http://hukumusume.com/sakura/index.html">誕生日占い、お仕事紹介、おまじない、など</a></font>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="60" bgcolor="#ECFFEC">
|
||||||
|
<font size="-1"><b>子どもの病気相談所</b><br />
|
||||||
|
<br />
|
||||||
|
<a href="http://hukumusume.com/my_baby/sick/">病気検索と対応方法、症状から検索するWEB問診</a></font>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td height="60" bgcolor="#ECFFEC">
|
||||||
|
<font size="-1"><b>世界60秒巡り</b><br />
|
||||||
|
<br />
|
||||||
|
<a href="http://hukumusume.com/366/world/">国旗国歌や世界遺産など、世界の国々の豆知識</a></font>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
</body>
|
||||||
|
</html>
|
|
@ -73,7 +73,23 @@ pub const UNLIKELY_ROLES: &[&str] = &[
|
||||||
|
|
||||||
pub const DEFAULT_TAGS_TO_SCORE: &[&str] =
|
pub const DEFAULT_TAGS_TO_SCORE: &[&str] =
|
||||||
&["SECTION", "H2", "H3", "H4", "H5", "H6", "P", "TD", "PRE"];
|
&["SECTION", "H2", "H3", "H4", "H5", "H6", "P", "TD", "PRE"];
|
||||||
pub static DIV_TO_P_ELEMS: Lazy<HashSet<&'static str>> = Lazy::new(|| {
|
pub const DEPRECATED_SIZE_ATTRIBUTE_ELEMS: Lazy<HashSet<&str>> =
|
||||||
|
Lazy::new(|| HashSet::from(["TABLE", "TH", "TD", "HR", "PRE"]));
|
||||||
|
pub const PRESENTATIONAL_ATTRIBUTES: &[&str] = &[
|
||||||
|
"align",
|
||||||
|
"background",
|
||||||
|
"bgcolor",
|
||||||
|
"border",
|
||||||
|
"cellpadding",
|
||||||
|
"cellspacing",
|
||||||
|
"frame",
|
||||||
|
"hspace",
|
||||||
|
"rules",
|
||||||
|
"style",
|
||||||
|
"valign",
|
||||||
|
"vspace",
|
||||||
|
];
|
||||||
|
pub static DIV_TO_P_ELEMS: Lazy<HashSet<&str>> = Lazy::new(|| {
|
||||||
HashSet::from([
|
HashSet::from([
|
||||||
"BLOCKQUOTE",
|
"BLOCKQUOTE",
|
||||||
"DL",
|
"DL",
|
||||||
|
|
|
@ -19,7 +19,6 @@ use fingerprints::Fingerprints;
|
||||||
use libxml::parser::Parser;
|
use libxml::parser::Parser;
|
||||||
use libxml::tree::{Document, Node};
|
use libxml::tree::{Document, Node};
|
||||||
use libxml::xpath::Context;
|
use libxml::xpath::Context;
|
||||||
use log::{debug, error, info, warn};
|
|
||||||
use reqwest::header::HeaderMap;
|
use reqwest::header::HeaderMap;
|
||||||
use reqwest::{Client, Url};
|
use reqwest::{Client, Url};
|
||||||
use std::path::Path;
|
use std::path::Path;
|
||||||
|
@ -42,7 +41,7 @@ impl FullTextParser {
|
||||||
) -> Result<Article, FullTextParserError> {
|
) -> Result<Article, FullTextParserError> {
|
||||||
libxml::tree::node::set_node_rc_guard(10);
|
libxml::tree::node::set_node_rc_guard(10);
|
||||||
|
|
||||||
info!("Scraping article: '{}'", url.as_str());
|
log::debug!("Scraping article: '{url}'");
|
||||||
|
|
||||||
// check if we have a config for the url
|
// check if we have a config for the url
|
||||||
let config = self.get_grabber_config(url);
|
let config = self.get_grabber_config(url);
|
||||||
|
@ -58,14 +57,14 @@ impl FullTextParser {
|
||||||
.headers(headers)
|
.headers(headers)
|
||||||
.send()
|
.send()
|
||||||
.await
|
.await
|
||||||
.map_err(|err| {
|
.map_err(|error| {
|
||||||
error!("Failed head request to: '{}' - '{}'", url.as_str(), err);
|
log::error!("Failed head request to: '{url}' - '{error}'");
|
||||||
FullTextParserError::Http
|
FullTextParserError::Http
|
||||||
})?;
|
})?;
|
||||||
|
|
||||||
// check if url redirects and we need to pick up the new url
|
// check if url redirects and we need to pick up the new url
|
||||||
let url = if let Some(new_url) = Util::check_redirect(&response, url) {
|
let url = if let Some(new_url) = Util::check_redirect(&response, url) {
|
||||||
debug!("Url '{}' redirects to '{}'", url.as_str(), new_url.as_str());
|
log::debug!("Url '{url}' redirects to '{new_url}'");
|
||||||
new_url
|
new_url
|
||||||
} else {
|
} else {
|
||||||
url.clone()
|
url.clone()
|
||||||
|
@ -117,16 +116,18 @@ impl FullTextParser {
|
||||||
.await?;
|
.await?;
|
||||||
|
|
||||||
let context = Context::new(&document).map_err(|()| {
|
let context = Context::new(&document).map_err(|()| {
|
||||||
error!("Failed to create xpath context for extracted article");
|
log::error!("Failed to create xpath context for extracted article");
|
||||||
FullTextParserError::Xml
|
FullTextParserError::Xml
|
||||||
})?;
|
})?;
|
||||||
|
|
||||||
if let Err(error) = Self::prevent_self_closing_tags(&context) {
|
if let Err(error) = Self::prevent_self_closing_tags(&context) {
|
||||||
error!("Preventing self closing tags failed - '{}'", error);
|
log::error!("Preventing self closing tags failed - '{error}'");
|
||||||
return Err(error);
|
return Err(error);
|
||||||
}
|
}
|
||||||
|
|
||||||
Self::post_process_content(&document)?;
|
if let Some(mut root) = document.get_root_element() {
|
||||||
|
Self::post_process_content(&mut root, false)?;
|
||||||
|
}
|
||||||
|
|
||||||
article.document = Some(document);
|
article.document = Some(document);
|
||||||
|
|
||||||
|
@ -151,14 +152,14 @@ impl FullTextParser {
|
||||||
global_config.single_page_link.as_deref(),
|
global_config.single_page_link.as_deref(),
|
||||||
);
|
);
|
||||||
if let Some(xpath_single_page_link) = rule {
|
if let Some(xpath_single_page_link) = rule {
|
||||||
debug!(
|
log::debug!(
|
||||||
"Single page link xpath specified in config '{}'",
|
"Single page link xpath specified in config '{}'",
|
||||||
xpath_single_page_link
|
xpath_single_page_link
|
||||||
);
|
);
|
||||||
|
|
||||||
if let Some(single_page_url) = Util::find_page_url(&xpath_ctx, xpath_single_page_link) {
|
if let Some(single_page_url) = Util::find_page_url(&xpath_ctx, xpath_single_page_link) {
|
||||||
// parse again with single page url
|
// parse again with single page url
|
||||||
debug!("Single page link found '{}'", single_page_url);
|
log::debug!("Single page link found '{}'", single_page_url);
|
||||||
|
|
||||||
if let Err(error) = self
|
if let Err(error) = self
|
||||||
.parse_single_page(
|
.parse_single_page(
|
||||||
|
@ -171,8 +172,8 @@ impl FullTextParser {
|
||||||
)
|
)
|
||||||
.await
|
.await
|
||||||
{
|
{
|
||||||
log::warn!("Single Page parsing: {}", error);
|
log::warn!("Single Page parsing: {error}");
|
||||||
log::debug!("Continuing with regular parser.");
|
log::info!("Continuing with regular parser.");
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@ -181,26 +182,35 @@ impl FullTextParser {
|
||||||
if article.thumbnail_url.is_none() {
|
if article.thumbnail_url.is_none() {
|
||||||
Self::check_for_thumbnail(&xpath_ctx, article);
|
Self::check_for_thumbnail(&xpath_ctx, article);
|
||||||
}
|
}
|
||||||
Self::strip_junk(&xpath_ctx, config, global_config);
|
Self::prep_content(&xpath_ctx, config, global_config, &article.url);
|
||||||
Self::fix_urls(&xpath_ctx, &article.url);
|
|
||||||
let found_body = Self::extract_body(&xpath_ctx, root, config, global_config)?;
|
let found_body = Self::extract_body(&xpath_ctx, root, config, global_config)?;
|
||||||
|
|
||||||
if !found_body {
|
if !found_body {
|
||||||
if let Err(error) = Readability::extract_body(document, root, article.title.as_deref())
|
if let Err(error) = Readability::extract_body(document, root, article.title.as_deref())
|
||||||
{
|
{
|
||||||
log::error!("Both ftr and readability failed to find content: {}", error);
|
log::error!("Both ftr and readability failed to find content: {error}");
|
||||||
return Err(error);
|
return Err(error);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
while let Some(url) = self.check_for_next_page(&xpath_ctx, config, global_config) {
|
while let Some(url) = self.check_for_next_page(&xpath_ctx, config, global_config) {
|
||||||
|
log::debug!("");
|
||||||
|
|
||||||
let headers = Util::generate_headers(config, global_config)?;
|
let headers = Util::generate_headers(config, global_config)?;
|
||||||
let html = Self::download(&url, client, headers).await?;
|
let html = Self::download(&url, client, headers).await?;
|
||||||
document = Self::parse_html(&html, config, global_config)?;
|
document = Self::parse_html(&html, config, global_config)?;
|
||||||
xpath_ctx = Self::get_xpath_ctx(&document)?;
|
xpath_ctx = Self::get_xpath_ctx(&document)?;
|
||||||
Self::strip_junk(&xpath_ctx, config, global_config);
|
Self::prep_content(&xpath_ctx, config, global_config, &url);
|
||||||
Self::fix_urls(&xpath_ctx, &url);
|
let found_body = Self::extract_body(&xpath_ctx, root, config, global_config)?;
|
||||||
Self::extract_body(&xpath_ctx, root, config, global_config)?;
|
|
||||||
|
if !found_body {
|
||||||
|
if let Err(error) =
|
||||||
|
Readability::extract_body(document, root, article.title.as_deref())
|
||||||
|
{
|
||||||
|
log::error!("Both ftr and readability failed to find content: {error}");
|
||||||
|
return Err(error);
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
Ok(())
|
Ok(())
|
||||||
|
@ -227,14 +237,14 @@ impl FullTextParser {
|
||||||
// parse html
|
// parse html
|
||||||
let parser = Parser::default_html();
|
let parser = Parser::default_html();
|
||||||
parser.parse_string(html.as_str()).map_err(|err| {
|
parser.parse_string(html.as_str()).map_err(|err| {
|
||||||
error!("Parsing HTML failed for downloaded HTML {:?}", err);
|
log::error!("Parsing HTML failed for downloaded HTML {:?}", err);
|
||||||
FullTextParserError::Xml
|
FullTextParserError::Xml
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
|
|
||||||
fn get_xpath_ctx(doc: &Document) -> Result<Context, FullTextParserError> {
|
fn get_xpath_ctx(doc: &Document) -> Result<Context, FullTextParserError> {
|
||||||
Context::new(doc).map_err(|()| {
|
Context::new(doc).map_err(|()| {
|
||||||
error!("Creating xpath context failed for downloaded HTML");
|
log::error!("Creating xpath context failed for downloaded HTML");
|
||||||
FullTextParserError::Xml
|
FullTextParserError::Xml
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
|
@ -254,8 +264,7 @@ impl FullTextParser {
|
||||||
let xpath_ctx = Self::get_xpath_ctx(&document)?;
|
let xpath_ctx = Self::get_xpath_ctx(&document)?;
|
||||||
metadata::extract(&xpath_ctx, config, Some(global_config), article);
|
metadata::extract(&xpath_ctx, config, Some(global_config), article);
|
||||||
Self::check_for_thumbnail(&xpath_ctx, article);
|
Self::check_for_thumbnail(&xpath_ctx, article);
|
||||||
Self::strip_junk(&xpath_ctx, config, global_config);
|
Self::prep_content(&xpath_ctx, config, global_config, url);
|
||||||
Self::fix_urls(&xpath_ctx, url);
|
|
||||||
Self::extract_body(&xpath_ctx, root, config, global_config)?;
|
Self::extract_body(&xpath_ctx, root, config, global_config)?;
|
||||||
|
|
||||||
Ok(())
|
Ok(())
|
||||||
|
@ -272,7 +281,7 @@ impl FullTextParser {
|
||||||
.send()
|
.send()
|
||||||
.await
|
.await
|
||||||
.map_err(|err| {
|
.map_err(|err| {
|
||||||
error!(
|
log::error!(
|
||||||
"Downloading HTML failed: GET '{}' - '{}'",
|
"Downloading HTML failed: GET '{}' - '{}'",
|
||||||
url.as_str(),
|
url.as_str(),
|
||||||
err
|
err
|
||||||
|
@ -289,22 +298,22 @@ impl FullTextParser {
|
||||||
|
|
||||||
match from_utf8(&bytes) {
|
match from_utf8(&bytes) {
|
||||||
Ok(utf8_str) => {
|
Ok(utf8_str) => {
|
||||||
debug!("Valid utf-8 string");
|
log::debug!("Valid utf-8 string");
|
||||||
return Ok(utf8_str.into());
|
return Ok(utf8_str.into());
|
||||||
}
|
}
|
||||||
Err(error) => {
|
Err(error) => {
|
||||||
debug!("Invalid utf-8 string");
|
log::debug!("Invalid utf-8 string");
|
||||||
let lossy_string = std::string::String::from_utf8_lossy(&bytes);
|
let lossy_string = std::string::String::from_utf8_lossy(&bytes);
|
||||||
|
|
||||||
if let Some(encoding) = Self::get_encoding_from_html(&lossy_string) {
|
if let Some(encoding) = Self::get_encoding_from_html(&lossy_string) {
|
||||||
debug!("Encoding extracted from HTML: '{}'", encoding);
|
log::debug!("Encoding extracted from HTML: '{}'", encoding);
|
||||||
if let Some(decoded_html) = Self::decode_html(&bytes, encoding) {
|
if let Some(decoded_html) = Self::decode_html(&bytes, encoding) {
|
||||||
return Ok(decoded_html);
|
return Ok(decoded_html);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
if let Some(encoding) = Self::get_encoding_from_http_header(&headers) {
|
if let Some(encoding) = Self::get_encoding_from_http_header(&headers) {
|
||||||
debug!("Encoding extracted from headers: '{}'", encoding);
|
log::debug!("Encoding extracted from headers: '{}'", encoding);
|
||||||
if let Some(decoded_html) = Self::decode_html(&bytes, encoding) {
|
if let Some(decoded_html) = Self::decode_html(&bytes, encoding) {
|
||||||
return Ok(decoded_html);
|
return Ok(decoded_html);
|
||||||
}
|
}
|
||||||
|
@ -350,7 +359,7 @@ impl FullTextParser {
|
||||||
return Some(decoded_html.into_owned());
|
return Some(decoded_html.into_owned());
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
warn!("Could not decode HTML. Encoding: '{}'", encoding);
|
log::warn!("Could not decode HTML. Encoding: '{}'", encoding);
|
||||||
None
|
None
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -364,7 +373,7 @@ impl FullTextParser {
|
||||||
Ok(name.into())
|
Ok(name.into())
|
||||||
}
|
}
|
||||||
None => {
|
None => {
|
||||||
error!("Getting config failed due to bad Url");
|
log::error!("Getting config failed due to bad Url");
|
||||||
Err(FullTextParserError::Config)
|
Err(FullTextParserError::Config)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@ -420,7 +429,7 @@ impl FullTextParser {
|
||||||
.and_then(|correct_url| node.set_property("src", &correct_url).ok())
|
.and_then(|correct_url| node.set_property("src", &correct_url).ok())
|
||||||
.is_none()
|
.is_none()
|
||||||
{
|
{
|
||||||
warn!("Failed to fix lazy loading image");
|
log::warn!("Failed to fix lazy loading image");
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
Ok(())
|
Ok(())
|
||||||
|
@ -445,10 +454,10 @@ impl FullTextParser {
|
||||||
})
|
})
|
||||||
.is_err();
|
.is_err();
|
||||||
if !success {
|
if !success {
|
||||||
warn!("Failed to add iframe as child of video wrapper <div>");
|
log::warn!("Failed to add iframe as child of video wrapper <div>");
|
||||||
}
|
}
|
||||||
} else {
|
} else {
|
||||||
warn!("Failed to get parent of iframe");
|
log::warn!("Failed to get parent of iframe");
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
Ok(())
|
Ok(())
|
||||||
|
@ -529,7 +538,21 @@ impl FullTextParser {
|
||||||
_ = Self::repair_urls(context, "//iframe", "src", url);
|
_ = Self::repair_urls(context, "//iframe", "src", url);
|
||||||
}
|
}
|
||||||
|
|
||||||
fn strip_junk(context: &Context, config: Option<&ConfigEntry>, global_config: &ConfigEntry) {
|
fn prep_content(
|
||||||
|
context: &Context,
|
||||||
|
config: Option<&ConfigEntry>,
|
||||||
|
global_config: &ConfigEntry,
|
||||||
|
url: &Url,
|
||||||
|
) {
|
||||||
|
// replace H1 with H2 as H1 should be only title that is displayed separately
|
||||||
|
if let Ok(h1_nodes) = Util::evaluate_xpath(context, "//h1", false) {
|
||||||
|
for mut h1_node in h1_nodes {
|
||||||
|
_ = h1_node.set_name("h2");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
_ = Util::mark_data_tables(context);
|
||||||
|
|
||||||
// strip specified xpath
|
// strip specified xpath
|
||||||
if let Some(config) = config {
|
if let Some(config) = config {
|
||||||
for xpath_strip in &config.xpath_strip {
|
for xpath_strip in &config.xpath_strip {
|
||||||
|
@ -620,6 +643,8 @@ impl FullTextParser {
|
||||||
_ = Util::strip_node(context, "//footer");
|
_ = Util::strip_node(context, "//footer");
|
||||||
_ = Util::strip_node(context, "//link");
|
_ = Util::strip_node(context, "//link");
|
||||||
_ = Util::strip_node(context, "//aside");
|
_ = Util::strip_node(context, "//aside");
|
||||||
|
|
||||||
|
Self::fix_urls(context, url);
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
|
@ -759,11 +784,13 @@ impl FullTextParser {
|
||||||
return Err(FullTextParserError::Xml);
|
return Err(FullTextParserError::Xml);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
Self::post_process_content(&mut node, true)?;
|
||||||
|
|
||||||
node.unlink();
|
node.unlink();
|
||||||
if root.add_child(&mut node).is_ok() {
|
if root.add_child(&mut node).is_ok() {
|
||||||
found_something = true;
|
found_something = true;
|
||||||
} else {
|
} else {
|
||||||
error!("Failed to add body to prepared document");
|
log::error!("Failed to add body to prepared document");
|
||||||
return Err(FullTextParserError::Xml);
|
return Err(FullTextParserError::Xml);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@ -830,35 +857,22 @@ impl FullTextParser {
|
||||||
Ok(())
|
Ok(())
|
||||||
}
|
}
|
||||||
|
|
||||||
pub(crate) fn post_process_content(document: &Document) -> Result<(), FullTextParserError> {
|
pub(crate) fn post_process_content(
|
||||||
let context = Context::new(document).map_err(|()| {
|
node: &mut Node,
|
||||||
error!("Creating xpath context failed for article HTML");
|
clean_conditionally: bool,
|
||||||
FullTextParserError::Xml
|
) -> Result<(), FullTextParserError> {
|
||||||
})?;
|
if clean_conditionally {
|
||||||
|
Util::clean_conditionally(node, "fieldset");
|
||||||
// replace H1 with H2 as H1 should be only title that is displayed separately
|
Util::clean_conditionally(node, "table");
|
||||||
let h1_nodes = Util::evaluate_xpath(&context, "//h1", false)?;
|
Util::clean_conditionally(node, "ul");
|
||||||
for mut h1_node in h1_nodes {
|
Util::clean_conditionally(node, "div");
|
||||||
h1_node.set_name("h2").map_err(|e| {
|
|
||||||
log::error!("{e}");
|
|
||||||
FullTextParserError::Xml
|
|
||||||
})?;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
Util::mark_data_tables(&context)?;
|
Self::clean_attributes(node)?;
|
||||||
|
Self::simplify_nested_elements(node)?;
|
||||||
|
|
||||||
if let Some(mut root) = document.get_root_element() {
|
Self::remove_single_cell_tables(node);
|
||||||
Util::clean_conditionally(&mut root, "fieldset");
|
Self::remove_extra_p_and_div(node);
|
||||||
Util::clean_conditionally(&mut root, "table");
|
|
||||||
Util::clean_conditionally(&mut root, "ul");
|
|
||||||
Util::clean_conditionally(&mut root, "div");
|
|
||||||
|
|
||||||
Self::clean_attributes(&mut root)?;
|
|
||||||
Self::simplify_nested_elements(&mut root)?;
|
|
||||||
|
|
||||||
Self::remove_single_cell_tables(&mut root);
|
|
||||||
Self::remove_extra_p_and_div(&mut root);
|
|
||||||
}
|
|
||||||
|
|
||||||
Ok(())
|
Ok(())
|
||||||
}
|
}
|
||||||
|
@ -927,6 +941,17 @@ impl FullTextParser {
|
||||||
let mut node_iter = Some(root.clone());
|
let mut node_iter = Some(root.clone());
|
||||||
|
|
||||||
while let Some(mut node) = node_iter {
|
while let Some(mut node) = node_iter {
|
||||||
|
let tag_name = node.get_name().to_uppercase();
|
||||||
|
|
||||||
|
for attr in constants::PRESENTATIONAL_ATTRIBUTES {
|
||||||
|
_ = node.remove_attribute(attr);
|
||||||
|
}
|
||||||
|
|
||||||
|
if constants::DEPRECATED_SIZE_ATTRIBUTE_ELEMS.contains(tag_name.as_str()) {
|
||||||
|
_ = node.remove_attribute("width");
|
||||||
|
_ = node.remove_attribute("height");
|
||||||
|
}
|
||||||
|
|
||||||
node.remove_attribute("class").map_err(|e| {
|
node.remove_attribute("class").map_err(|e| {
|
||||||
log::error!("{e}");
|
log::error!("{e}");
|
||||||
FullTextParserError::Xml
|
FullTextParserError::Xml
|
||||||
|
|
|
@ -497,6 +497,11 @@ impl Readability {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
crate::FullTextParser::post_process_content(
|
||||||
|
&mut article_content,
|
||||||
|
state.clean_conditionally,
|
||||||
|
)?;
|
||||||
|
|
||||||
if needed_to_create_top_candidate {
|
if needed_to_create_top_candidate {
|
||||||
// We already created a fake div thing, and there wouldn't have been any siblings left
|
// We already created a fake div thing, and there wouldn't have been any siblings left
|
||||||
// for the previous loop, so there's no point trying to create a new div, and then
|
// for the previous loop, so there's no point trying to create a new div, and then
|
||||||
|
|
|
@ -18,9 +18,7 @@ async fn run_test(name: &str) {
|
||||||
let document = crate::FullTextParser::parse_html(&html, None, &empty_config).unwrap();
|
let document = crate::FullTextParser::parse_html(&html, None, &empty_config).unwrap();
|
||||||
let xpath_ctx = crate::FullTextParser::get_xpath_ctx(&document).unwrap();
|
let xpath_ctx = crate::FullTextParser::get_xpath_ctx(&document).unwrap();
|
||||||
|
|
||||||
crate::FullTextParser::strip_junk(&xpath_ctx, None, &empty_config);
|
crate::FullTextParser::prep_content(&xpath_ctx, None, &empty_config, &url);
|
||||||
|
|
||||||
crate::FullTextParser::fix_urls(&xpath_ctx, &url);
|
|
||||||
let mut article = Article {
|
let mut article = Article {
|
||||||
title: None,
|
title: None,
|
||||||
author: None,
|
author: None,
|
||||||
|
@ -36,7 +34,9 @@ async fn run_test(name: &str) {
|
||||||
|
|
||||||
metadata::extract(&xpath_ctx, None, None, &mut article);
|
metadata::extract(&xpath_ctx, None, None, &mut article);
|
||||||
super::Readability::extract_body(document, &mut root, article.title.as_deref()).unwrap();
|
super::Readability::extract_body(document, &mut root, article.title.as_deref()).unwrap();
|
||||||
crate::FullTextParser::post_process_content(&article_document).unwrap();
|
if let Some(mut root) = article_document.get_root_element() {
|
||||||
|
crate::FullTextParser::post_process_content(&mut root, false).unwrap();
|
||||||
|
}
|
||||||
|
|
||||||
article.document = Some(article_document);
|
article.document = Some(article_document);
|
||||||
let html = article.get_content().unwrap();
|
let html = article.get_content().unwrap();
|
||||||
|
@ -236,6 +236,11 @@ async fn hidden_nodes() {
|
||||||
run_test("hidden-nodes").await
|
run_test("hidden-nodes").await
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn hukumusume() {
|
||||||
|
run_test("hukumusume").await
|
||||||
|
}
|
||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn webmd_1() {
|
async fn webmd_1() {
|
||||||
run_test("webmd-1").await
|
run_test("webmd-1").await
|
||||||
|
|
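The attribute cleaning that this commit ports from Readability can be sketched without libxml. The two lists below are copied from the constants hunk above. Everything else is an assumption made for illustration: the free function `clean_attributes`, its tuple-based attribute representation, and the case-insensitive match on attribute names are hypothetical. The real method works on a libxml `Node` and also removes `class`.

```rust
// Attribute names taken from the PRESENTATIONAL_ATTRIBUTES constant in the diff.
const PRESENTATIONAL_ATTRIBUTES: &[&str] = &[
    "align", "background", "bgcolor", "border", "cellpadding", "cellspacing",
    "frame", "hspace", "rules", "style", "valign", "vspace",
];

// Tag names taken from DEPRECATED_SIZE_ATTRIBUTE_ELEMS in the diff
// (a plain slice here instead of a lazily built HashSet).
const DEPRECATED_SIZE_ATTRIBUTE_ELEMS: &[&str] = &["TABLE", "TH", "TD", "HR", "PRE"];

/// Hypothetical helper: returns the attributes that survive cleaning.
/// Presentational attributes are dropped for every element; `width` and
/// `height` are dropped only for elements in DEPRECATED_SIZE_ATTRIBUTE_ELEMS.
fn clean_attributes(tag_name: &str, attributes: &[(&str, &str)]) -> Vec<(String, String)> {
    // The diff upper-cases the tag name before the set lookup; do the same.
    let tag = tag_name.to_uppercase();
    let strip_size = DEPRECATED_SIZE_ATTRIBUTE_ELEMS.contains(&tag.as_str());

    attributes
        .iter()
        .filter(|(name, _)| {
            // Assumption: attribute names are compared case-insensitively.
            let name = name.to_lowercase();
            if PRESENTATIONAL_ATTRIBUTES.contains(&name.as_str()) {
                return false;
            }
            if strip_size && (name == "width" || name == "height") {
                return false;
            }
            true
        })
        .map(|(name, value)| (name.to_string(), value.to_string()))
        .collect()
}
```

Under these assumptions, a `<td width="35%" height="25" valign="top">` cell like the ones in the hukumusume fixture would lose all three attributes, while an `<img width="7" height="7" border="0">` would keep `width` and `height` and lose only `border`.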