diff --git a/resources/tests/readability/001/expected.html b/resources/tests/readability/001/expected.html
index 3b869e4..bdd9564 100644
--- a/resources/tests/readability/001/expected.html
+++ b/resources/tests/readability/001/expected.html
@@ -1,40 +1,35 @@
-
-

So finally you're testing your frontend JavaScript code? Great! The more you +

So finally you're testing your frontend JavaScript code? Great! The more you write tests, the more confident you are with your code… but how much precisely? That's where code coverage might -help. -

-

The idea behind code coverage is to record which parts of your code (functions, +help.

+

The idea behind code coverage is to record which parts of your code (functions, statements, conditionals and so on) have been executed by your test suite, to compute metrics out of these data and usually to provide tools for navigating and inspecting them.
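
As a rough illustration of the idea described above, and not a description of how Blanket.js works internally, the following sketch computes a statement-coverage percentage from a table of hit counts. The `hits` shape and the `coveragePercent` name are hypothetical, chosen only for this example.

```javascript
// Hypothetical data shape: one counter per instrumented statement,
// incremented each time the test suite executes that statement.
function coveragePercent(hits) {
  var counts = Object.values(hits);
  if (counts.length === 0) {
    return 100; // nothing to cover
  }
  var covered = counts.filter(function (n) {
    return n > 0;
  }).length;
  return (covered / counts.length) * 100;
}

// Three of the four statements were executed at least once.
var hits = { "cow.js:3": 4, "cow.js:7": 2, "cow.js:8": 0, "cow.js:9": 1 };
console.log(coveragePercent(hits) + "% of statements executed");
```

A real coverage tool also tracks branches and functions; the single ratio here only shows how recorded executions turn into a metric.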

-

Not a lot of frontend developers I know actually test their frontend code, +

Not a lot of frontend developers I know actually test their frontend code, and I can barely imagine how many of them have ever setup code coverage… Mostly because there are not many frontend-oriented tools in this area I guess.

-

Actually I've only found one which provides an adapter for Mocha and +

Actually I've only found one which provides an adapter for Mocha and actually works…

-
-

Drinking game for web devs: +

+

Drinking game for web devs:
(1) Think of a noun
(2) Google "<noun>.js"
(3) If a library with that name exists - drink

— Shay Friedman (@ironshay) August 22, 2013 -
-

Blanket.js is an easy to install, easy to configure, +

+

Blanket.js is an easy to install, easy to configure, and easy to use JavaScript code coverage library that works both in-browser and -with nodejs. -

-

Its use is dead easy, adding Blanket support to your Mocha test suite +with nodejs.

+

Its use is dead easy, adding Blanket support to your Mocha test suite is just matter of adding this simple line to your HTML test file:

<script src="vendor/blanket.js"
         data-cover-adapter="vendor/mocha-blanket.js"></script>
 
- -

Source files: blanket.js, - mocha-blanket.js -

-

As an example, let's reuse the silly Cow example we used +

Source files: blanket.js, + mocha-blanket.js

+

As an example, let's reuse the silly Cow example we used in a previous episode:

// cow.js
 (function(exports) {
@@ -54,8 +49,7 @@ with nodejs.
   };
 })(this);
 
- -

And its test suite, powered by Mocha and Chai:

+

And its test suite, powered by Mocha and Chai:

var expect = chai.expect;
 
 describe("Cow", function() {
@@ -79,8 +73,7 @@ describe("Cow", function() {
   });
 });
 
- -

Let's create the HTML test file for it, featuring Blanket and its adapter +

Let's create the HTML test file for it, featuring Blanket and its adapter for Mocha:

<!DOCTYPE html>
 <html>
@@ -104,29 +97,24 @@ describe("Cow", function() {
 </body>
 </html>
 
- -

Notes:

-
    -
  • Notice the data-cover attribute we added to the script tag +

    Notes:

    +
      +
    • Notice the data-cover attribute we added to the script tag loading the source of our library;
    • -
    • The HTML test file must be served over HTTP for the adapter to +
    • The HTML test file must be served over HTTP for the adapter to be loaded.
    • -
    -

    Running the tests now gives us something like this:

    -

    - screenshot -

    -

    As you can see, the report at the bottom highlights that we haven't actually +

+

Running the tests now gives us something like this:

+

screenshot

+

As you can see, the report at the bottom highlights that we haven't actually tested the case where an error is raised in case a target name is missing. We've been informed of that, nothing more, nothing less. We simply know we're missing a test here. Isn't this cool? I think so!

-

Just remember that code coverage will only bring you numbers and +

Just remember that code coverage will only bring you numbers and raw information, not actual proofs that the whole of your code logic has been actually covered. If you ask me, the best inputs you can get about your code logic and implementation ever are the ones issued out of pair programming sessions and code reviews — but that's another story.

-

So is code coverage silver bullet? No. Is it useful? Definitely. Happy testing! -

-
+

So is code coverage silver bullet? No. Is it useful? Definitely. Happy testing!

diff --git a/resources/tests/readability/002/expected.html b/resources/tests/readability/002/expected.html
index af685ed..8cfeb6f 100644
--- a/resources/tests/readability/002/expected.html
+++ b/resources/tests/readability/002/expected.html
@@ -1,284 +1,144 @@
-
-
-

For more than a decade the Web has used XMLHttpRequest (XHR) to achieve +

For more than a decade the Web has used XMLHttpRequest (XHR) to achieve asynchronous requests in JavaScript. While very useful, XHR is not a very nice API. It suffers from lack of separation of concerns. The input, output and state are all managed by interacting with one object, and state is tracked using events. Also, the event-based model doesn’t play well with JavaScript’s recent focus on Promise- and generator-based asynchronous programming.

-

The Fetch API intends +

The Fetch API intends to fix most of these problems. It does this by introducing the same primitives to JS that are used in the HTTP protocol. In addition, it introduces a utility function fetch() that succinctly captures the intention of retrieving a resource from the network.

-

The Fetch specification, which +

The Fetch specification, which defines the API, nails down the semantics of a user agent fetching a resource. This, combined with ServiceWorkers, is an attempt to:

-
    -
  1. Improve the offline experience.
  2. -
  3. Expose the building blocks of the Web to the platform as part of the +
      +
    1. Improve the offline experience.
    2. +
    3. Expose the building blocks of the Web to the platform as part of the extensible web movement.
    4. -
    -

    As of this writing, the Fetch API is available in Firefox 39 (currently +

+

As of this writing, the Fetch API is available in Firefox 39 (currently Nightly) and Chrome 42 (currently dev). Github has a Fetch polyfill.

-

Feature detection

- -

Fetch API support can be detected by checking for Headers,Request, Response or fetch on +

Fetch API support can be detected by checking for Headers,Request, Response or fetch on the window or worker scope.
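
A minimal sketch of such a check follows. `supportsFetch` is a hypothetical helper name, and it tests for all four names at once, which is stricter than checking for any single one of them.

```javascript
// Returns true when every Fetch API entry point exists on the given scope.
// Pass `window` in a page or `self` in a worker.
function supportsFetch(scope) {
  return ["Headers", "Request", "Response", "fetch"].every(function (name) {
    return name in scope;
  });
}

// globalThis stands in for window/self so the snippet runs in either scope.
if (!supportsFetch(globalThis)) {
  console.log("Fetch API missing; load a polyfill or fall back to XHR.");
}
```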

-

Simple fetching

- -

The most useful, high-level part of the Fetch API is the fetch() function. +

The most useful, high-level part of the Fetch API is the fetch() function. In its simplest form it takes a URL and returns a promise that resolves to the response. The response is captured as a Response object.

-

- - - - - - -
-
fetch("/data.json").then(function(res) {
-  // res instanceof Response == true.
-  if (res.ok) {
-    res.json().then(function(data) {
-      console.log(data.entries);
-    });
-  } else {
-    console.log("Looks like the response wasn't perfect, got status", res.status);
-  }
-}, function(e) {
-  console.log("Fetch failed!", e);
-});
-
-

-

Submitting some parameters, it would look like this:

-

- - - - - - -
-
fetch("http://www.example.org/submit.php", {
-  method: "POST",
-  headers: {
-    "Content-Type": "application/x-www-form-urlencoded"
-  },
-  body: "firstName=Nikhil&favColor=blue&password=easytoguess"
-}).then(function(res) {
-  if (res.ok) {
-    alert("Perfect! Your settings are saved.");
-  } else if (res.status == 401) {
-    alert("Oops! You are not authorized.");
-  }
-}, function(e) {
-  alert("Error submitting form!");
-});
-
-

-

The fetch() function’s arguments are the same as those passed +

fetch("/data.json").then(function(res){// res instanceof Response == true.if(res.ok){
+    res.json().then(function(data){
+      console.log(data.entries);});}else{
+    console.log("Looks like the response wasn't perfect, got status", res.status);}},function(e){
+  console.log("Fetch failed!", e);});
+

Submitting some parameters, it would look like this:

+
fetch("http://www.example.org/submit.php",{
+  method:"POST",
+  headers:{"Content-Type":"application/x-www-form-urlencoded"},
+  body:"firstName=Nikhil&favColor=blue&password=easytoguess"}).then(function(res){if(res.ok){
+    alert("Perfect! Your settings are saved.");}elseif(res.status==401){
+    alert("Oops! You are not authorized.");}},function(e){
+  alert("Error submitting form!");});
+

The fetch() function’s arguments are the same as those passed to the -
-Request() constructor, so you may directly pass arbitrarily +
Request() constructor, so you may directly pass arbitrarily complex requests to fetch() as discussed below.

-

Headers

- -

Fetch introduces 3 interfaces. These are Headers, Request and -
-Response. They map directly to the underlying HTTP concepts, +

Fetch introduces 3 interfaces. These are Headers, Request and +
Response. They map directly to the underlying HTTP concepts, but have
certain visibility filters in place for privacy and security reasons, such as
supporting CORS rules and ensuring cookies aren’t readable by third parties.

-

The Headers interface is +

The Headers interface is a simple multi-map of names to values:

-

- - - -
-
var content = "Hello World";
-var reqHeaders = new Headers();
-reqHeaders.append("Content-Type", "text/plain"
+
- - -
var content ="Hello World";var reqHeaders =new Headers();
+reqHeaders.append("Content-Type","text/plain"
 reqHeaders.append("Content-Length", content.length.toString());
-reqHeaders.append("X-Custom-Header", "ProcessThisImmediately");
-
-

-

The same can be achieved by passing an array of arrays or a JS object +reqHeaders.append("X-Custom-Header","ProcessThisImmediately");

+

The same can be achieved by passing an array of arrays or a JS object literal
to the constructor:

-

- - - - - - -
-
reqHeaders = new Headers({
-  "Content-Type": "text/plain",
-  "Content-Length": content.length.toString(),
-  "X-Custom-Header": "ProcessThisImmediately",
-});
-
-

-

The contents can be queried and retrieved:

-

- - - -
-
console.log(reqHeaders.has("Content-Type")); // true
-console.log(reqHeaders.has("Set-Cookie")); // false
-reqHeaders.set("Content-Type", "text/html");
-reqHeaders.append("X-Custom-Header", "AnotherValue");
+
reqHeaders =new Headers({"Content-Type":"text/plain","Content-Length": content.length.toString(),"X-Custom-Header":"ProcessThisImmediately",});
+

The contents can be queried and retrieved:

+
- - -
console.log(reqHeaders.has("Content-Type"));// true
+console.log(reqHeaders.has("Set-Cookie"));// false
+reqHeaders.set("Content-Type","text/html");
+reqHeaders.append("X-Custom-Header","AnotherValue");
  
-console.log(reqHeaders.get("Content-Length")); // 11
-console.log(reqHeaders.getAll("X-Custom-Header")); // ["ProcessThisImmediately", "AnotherValue"]
+console.log(reqHeaders.get("Content-Length"));// 11
+console.log(reqHeaders.getAll("X-Custom-Header"));// ["ProcessThisImmediately", "AnotherValue"]
  
 reqHeaders.delete("X-Custom-Header");
-console.log(reqHeaders.getAll("X-Custom-Header")); // []
-
-

-

Some of these operations are only useful in ServiceWorkers, but they provide +console.log(reqHeaders.getAll("X-Custom-Header"));// []

+

Some of these operations are only useful in ServiceWorkers, but they provide
a much nicer API to Headers.

-

Since Headers can be sent in requests, or received in responses, and have +

Since Headers can be sent in requests, or received in responses, and have various limitations about what information can and should be mutable, Headers objects have a guard property. This is not exposed to the Web, but it affects which mutation operations are allowed on the Headers object.
Possible values are:

- +

The details of how each guard affects the behaviors of the Headers object are
in the specification. For example, you may not append or set a “request” guarded Headers’ “Content-Length” header. Similarly, inserting “Set-Cookie” into a Response header is not allowed so that ServiceWorkers may not set cookies via synthesized Responses.

-

All of the Headers methods throw TypeError if name is not a +

All of the Headers methods throw TypeError if name is not a valid HTTP Header name. The mutation operations will throw TypeError if there is an immutable guard. Otherwise they fail silently. For example:

-

- - - - - - -
-
var res = Response.error();
-try {
-  res.headers.set("Origin", "http://mybank.com");
-} catch(e) {
-  console.log("Cannot pretend to be a bank!");
-}
-
-

- +
var res = Response.error();try{
+  res.headers.set("Origin","http://mybank.com");}catch(e){
+  console.log("Cannot pretend to be a bank!");}
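
The name check mentioned above can be approximated in plain JavaScript. This sketch assumes a valid header name is an HTTP token as defined in RFC 7230; `isValidHeaderName` is a hypothetical helper, and browsers perform the real check internally.

```javascript
// HTTP token characters (RFC 7230): letters, digits and a fixed set of symbols.
var HEADER_NAME = /^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/;

function isValidHeaderName(name) {
  return HEADER_NAME.test(String(name));
}

console.log(isValidHeaderName("X-Custom-Header")); // true
console.log(isValidHeaderName("Bad Header"));      // false (contains a space)
console.log(isValidHeaderName(""));                // false (empty)
```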

Request

- -

The Request interface defines a request to fetch a resource over HTTP. +

The Request interface defines a request to fetch a resource over HTTP. URL, method and headers are expected, but the Request also allows specifying a body, a request mode, credentials and cache hints.

-

The simplest Request is of course, just a URL, as you may do to GET a +

The simplest Request is of course, just a URL, as you may do to GET a resource.

-

- - - - - - -
-
var req = new Request("/index.html");
-console.log(req.method); // "GET"
-console.log(req.url); // "http://example.com/index.html"
-
-

-

You may also pass a Request to the Request() constructor to +

var req =new Request("/index.html");
+console.log(req.method);// "GET"
+console.log(req.url);// "http://example.com/index.html"
+

You may also pass a Request to the Request() constructor to create a copy.
(This is not the same as calling the clone() method, which is covered in
the “Reading bodies” section.).

-

- - - - - - -
-
var copy = new Request(req);
-console.log(copy.method); // "GET"
-console.log(copy.url); // "http://example.com/index.html"
-
-

-

Again, this form is probably only useful in ServiceWorkers.

-

The non-URL attributes of the Request can only be set by passing +

var copy =new Request(req);
+console.log(copy.method);// "GET"
+console.log(copy.url);// "http://example.com/index.html"
+

Again, this form is probably only useful in ServiceWorkers.

+

The non-URL attributes of the Request can only be set by passing initial
values as a second argument to the constructor. This argument is a dictionary.

-

- - - - - - -
-
var uploadReq = new Request("/uploadImage", {
-  method: "POST",
-  headers: {
-    "Content-Type": "image/png",
-  },
-  body: "image data"
-});
-
-

-

The Request’s mode is used to determine if cross-origin requests lead +

var uploadReq =new Request("/uploadImage",{
+  method:"POST",
+  headers:{"Content-Type":"image/png",},
+  body:"image data"});
+

The Request’s mode is used to determine if cross-origin requests lead to valid responses, and which properties on the response are readable. Legal mode values are "same-origin", "no-cors" (default) and "cors".

-

The "same-origin" mode is simple, if a request is made to another +

The "same-origin" mode is simple, if a request is made to another origin with this mode set, the result is simply an error. You could use this to ensure that
a request is always being made to your origin.

-

- - - - - - -
-
var arbitraryUrl = document.getElementById("url-input").value;
-fetch(arbitraryUrl, { mode: "same-origin" }).then(function(res) {
-  console.log("Response succeeded?", res.ok);
-}, function(e) {
-  console.log("Please enter a same-origin URL!");
-});
-
-

-

The "no-cors" mode captures what the web platform does by default +

var arbitraryUrl = document.getElementById("url-input").value;
+fetch(arbitraryUrl,{ mode:"same-origin"}).then(function(res){
+  console.log("Response succeeded?", res.ok);},function(e){
+  console.log("Please enter a same-origin URL!");});
+

The "no-cors" mode captures what the web platform does by default for scripts you import from CDNs, images hosted on other domains, and so on. First, it prevents the method from being anything other than “HEAD”, “GET” or “POST”. Second, if any ServiceWorkers intercept these requests,
@@ -287,7 +147,7 @@ fetch(arbitraryUrl, { mode:
This ensures that ServiceWorkers do not affect the semantics of the Web and prevents security and privacy issues that could arise from leaking data across domains.

-

"cors" mode is what you’ll usually use to make known cross-origin +

"cors" mode is what you’ll usually use to make known cross-origin requests to access various APIs offered by other vendors. These are expected to adhere to
the CORS protocol.
@@ -295,300 +155,175 @@ fetch(arbitraryUrl, { mode:
headers is exposed in the Response, but the body is readable. For example, you could get a list of Flickr’s most interesting photos today like this:

-

- - - -
-
var u = new URLSearchParams();
-u.append('method', 'flickr.interestingness.getList');
-u.append('api_key', '<insert api key here>');
-u.append('format', 'json');
-u.append('nojsoncallback', '1');
+
- - -
var u =new URLSearchParams();
+u.append('method','flickr.interestingness.getList');
+u.append('api_key','<insert api key here>');
+u.append('format','json');
+u.append('nojsoncallback','1');var apiCall = fetch('https://api.flickr.com/services/rest?'+ u);
  
-var apiCall = fetch('https://api.flickr.com/services/rest?' + u);
- 
-apiCall.then(function(response) {
-  return response.json().then(function(json) {
-    // photo is a list of photos.
-    return json.photos.photo;
-  });
-}).then(function(photos) {
-  photos.forEach(function(photo) {
-    console.log(photo.title);
-  });
-});
-
-

-

You may not read out the “Date” header since Flickr does not allow it +apiCall.then(function(response){return response.json().then(function(json){// photo is a list of photos.return json.photos.photo;});}).then(function(photos){ + photos.forEach(function(photo){ + console.log(photo.title);});});

+

You may not read out the “Date” header since Flickr does not allow it via -
-Access-Control-Expose-Headers.

-

- - - - - - -
-
response.headers.get("Date"); // null
-
-

-

The credentials enumeration determines if cookies for the other +
Access-Control-Expose-Headers.

+
response.headers.get("Date");// null
+

The credentials enumeration determines if cookies for the other domain are -
sent to cross-origin requests. This is similar to XHR’s withCredentials -
flag, but tri-valued as "omit" (default), "same-origin" and "include".

-

The Request object will also give the ability to offer caching hints to +
sent to cross-origin requests. This is similar to XHR’s withCredentials
flag, but tri-valued as "omit" (default), "same-origin" and "include".

+

The Request object will also give the ability to offer caching hints to the user-agent. This is currently undergoing some security review. Firefox exposes the attribute, but it has no effect.

-

Requests have two read-only attributes that are relevant to ServiceWorkers +

Requests have two read-only attributes that are relevant to ServiceWorkers
intercepting them. There is the string referrer, which is set by the UA to be
the referrer of the Request. This may be an empty string. The other is -
-context which is a rather large enumeration defining +
context which is a rather large enumeration defining what sort of resource is being fetched. This could be “image” if the request is from an <img>tag in the controlled document, “worker” if it is an attempt to load a worker script, and so on. When used with the fetch() function, it is “fetch”.

-

Response

- -

Response instances are returned by calls to fetch(). +

Response instances are returned by calls to fetch(). They can also be created by JS, but this is only useful in ServiceWorkers.

-

We have already seen some attributes of Response when we looked at fetch(). +

We have already seen some attributes of Response when we looked at fetch(). The most obvious candidates are status, an integer (default value 200) and statusText (default value “OK”), which correspond to the HTTP status code and reason. The ok attribute is just a shorthand for checking that status is in the range 200-299 inclusive.
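
The range check behind `ok` can be written out as a one-line sketch. `isOk` is a hypothetical stand-in used only to make the rule concrete; on a real Response you would read `res.ok` directly.

```javascript
// Mirrors the rule stated above: ok is true only for status codes 200-299.
function isOk(status) {
  return status >= 200 && status <= 299;
}

console.log(isOk(200)); // true
console.log(isOk(204)); // true
console.log(isOk(304)); // false
console.log(isOk(401)); // false
```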

-

headers is the Response’s Headers object, with guard “response”. +

headers is the Response’s Headers object, with guard “response”. The url attribute reflects the URL of the corresponding request.

-

Response also has a type, which is “basic”, “cors”, “default”, +

Response also has a type, which is “basic”, “cors”, “default”, “error” or
“opaque”.

- +

The “error” type results in the fetch() Promise rejecting with TypeError.

-

There are certain attributes that are useful only in a ServiceWorker scope. +

There are certain attributes that are useful only in a ServiceWorker scope. The
idiomatic way to return a Response to an intercepted request in ServiceWorkers is:

-

- - - - - - -
-
addEventListener('fetch', function(event) {
-  event.respondWith(new Response("Response body", {
-    headers: { "Content-Type" : "text/plain" }
-  });
-});
-
-

-

As you can see, Response has a two argument constructor, where both arguments +

addEventListener('fetch',function(event){
+  event.respondWith(new Response("Response body",{
+    headers:{"Content-Type":"text/plain"}});});
+

As you can see, Response has a two argument constructor, where both arguments are optional. The first argument is a body initializer, and the second is a dictionary to set the status, statusText and headers.

-

The static method Response.error() simply returns an error +

The static method Response.error() simply returns an error response. Similarly, Response.redirect(url, status) returns a Response resulting in
a redirect to url.

-

Dealing with bodies

- -

Both Requests and Responses may contain body data. We’ve been glossing +

Both Requests and Responses may contain body data. We’ve been glossing over it because of the various data types body may contain, but we will cover it in detail now.

-

A body is an instance of any of the following types.

- +

In addition, Request and Response both offer the following methods to extract their body. These all return a Promise that is eventually resolved with the actual content.

- -

This is a significant improvement over XHR in terms of ease of use of +

+

This is a significant improvement over XHR in terms of ease of use of non-text data!

-

Request bodies can be set by passing body parameters:

-

- - - -
-
var form = new FormData(document.getElementById('login-form'));
-fetch("/login", {
-  method: "POST",
+

Request bodies can be set by passing body parameters:

+
- - -
var form =new FormData(document.getElementById('login-form'));
+fetch("/login",{
+  method:"POST",
   body: form
-})
-
-

-

Responses take the first argument as the body.

-

- - - - - - -
-
var res = new Response(new File(["chunk", "chunk"], "archive.zip",
-                       { type: "application/zip" }));
-
-

-

Both Request and Response (and by extension the fetch() function), +})

+

Responses take the first argument as the body.

+
var res =new Response(new File(["chunk","chunk"],"archive.zip",{ type:"application/zip"}));
+

Both Request and Response (and by extension the fetch() function), will try to intelligently determine the content type. Request will also automatically set a “Content-Type” header if none is set in the dictionary.

-

Streams and cloning

- -

It is important to realise that Request and Response bodies can only be +

It is important to realise that Request and Response bodies can only be read once! Both interfaces have a boolean attribute bodyUsed to determine if it is safe to read or not.

-

- - - -
-
var res = new Response("one time use");
-console.log(res.bodyUsed); // false
-res.text().then(function(v) {
-  console.log(res.bodyUsed); // true
-});
-console.log(res.bodyUsed); // true
+
- - -
var res =new Response("one time use");
+console.log(res.bodyUsed);// false
+res.text().then(function(v){
+  console.log(res.bodyUsed);// true});
+console.log(res.bodyUsed);// true
  
-res.text().catch(function(e) {
-  console.log("Tried to read already consumed Response");
-});
-
-

-

This decision allows easing the transition to an eventual stream-based Fetch +res.text().catch(function(e){ + console.log("Tried to read already consumed Response");});

+

This decision allows easing the transition to an eventual stream-based Fetch API. The intention is to let applications consume data as it arrives, allowing for JavaScript to deal with larger files like videos, and perform things like compression and editing on the fly.

-

Often, you’ll want access to the body multiple times. For example, you +

Often, you’ll want access to the body multiple times. For example, you can use the upcoming Cache API to store Requests and Responses for offline use, and Cache requires bodies to be available for reading.

-

So how do you read out the body multiple times within such constraints? +

So how do you read out the body multiple times within such constraints? The API provides a clone() method on the two interfaces. This will return a clone of the object, with a ‘new’ body. clone() MUST be called before the body of the corresponding object has been used. That is, clone() first, read later.

-

- - - -
-
addEventListener('fetch', function(evt) {
-  var sheep = new Response("Dolly");
-  console.log(sheep.bodyUsed); // false
-  var clone = sheep.clone();
-  console.log(clone.bodyUsed); // false
+
- - -
addEventListener('fetch',function(evt){var sheep =new Response("Dolly");
+  console.log(sheep.bodyUsed);// falsevar clone = sheep.clone();
+  console.log(clone.bodyUsed);// false
  
   clone.text();
-  console.log(sheep.bodyUsed); // false
-  console.log(clone.bodyUsed); // true
+  console.log(sheep.bodyUsed);// false
+  console.log(clone.bodyUsed);// true
  
-  evt.respondWith(cache.add(sheep.clone()).then(function(e) {
-    return sheep;
-  });
-});
-
-

- + evt.respondWith(cache.add(sheep.clone()).then(function(e){return sheep;});});

Future improvements

- -

Along with the transition to streams, Fetch will eventually have the ability +

Along with the transition to streams, Fetch will eventually have the ability to abort running fetch()es and some way to report the progress of a fetch. These are provided by XHR, but are a little tricky to fit in the Promise-based nature of the Fetch API.

-

You can contribute to the evolution of this API by participating in discussions +

You can contribute to the evolution of this API by participating in discussions on the WHATWG mailing list and in the issues in the Fetch and ServiceWorkerspecifications.

-

For a better web!

-

The author would like to thank Andrea Marchesini, Anne van Kesteren and Ben
-Kelly for helping with the specification and implementation.
-

-
- - - +

diff --git a/resources/tests/readability/webmd-1/expected.html b/resources/tests/readability/webmd-1/expected.html
new file mode 100644
index 0000000..e0b4cb0
--- /dev/null
+++ b/resources/tests/readability/webmd-1/expected.html
@@ -0,0 +1,68 @@
+
+

+
+

+

+

Feb. 23, 2015 -- Life-threatening peanut allergies have mysteriously + been + on the rise in the past decade, with little hope for a cure.

+

But a groundbreaking new + study may offer a way to stem that rise, while + another may offer some hope for those who are already allergic.

+

Parents have been told for years to avoid giving foods containing + peanuts + to babies for fear of triggering an allergy. Now research shows the + opposite + is true: Feeding babies snacks made with peanuts before their first + birthday + appears to prevent that from happening.

+

The study is published in the New England Journal of Medicine, + and + it was presented at the annual meeting of the American Academy of + Allergy, + Asthma and Immunology in Houston. It found that among children at + high + risk for getting peanut allergies, eating peanut snacks by 11 months + of + age and continuing to eat them at least three times a week until age + 5 + cut their chances of becoming allergic by more than 80% compared to + kids + who avoided peanuts. Those at high risk were already allergic to + egg, they + had the skin condition eczema, or + both.

+

Overall, about 3% of kids who ate peanut butter or peanut snacks + before + their first birthday got an allergy, compared to about 17% of kids + who + didn’t eat them.

+

“I think this study is an astounding and groundbreaking study, + really,” + says Katie Allen, MD, PhD. She's the director of the Center for Food + and + Allergy Research at the Murdoch Children’s Research Institute in + Melbourne, + Australia. Allen was not involved in the research.

+

Experts say the research should shift thinking about how kids develop + food + allergies, and it should change the guidance doctors give to + parents. +

+

Meanwhile, for children and adults who are already allergic to peanuts, + another study presented at the same meeting held out hope of a + treatment.

+

A new skin patch called Viaskin allowed people with peanut allergies + to + eat tiny amounts of peanuts after they wore it for a year.

+

+

A Change in Guidelines?

+

Allergies to peanuts and other foods are on the rise. In the U.S., + more + than 2% of people react to peanuts, a 400% increase since 1997. And + reactions + to peanuts and other tree nuts can be especially severe. Nuts are + the main + reason people get a life-threatening problem called anaphylaxis.

+
+
diff --git a/resources/tests/readability/webmd-1/source.html b/resources/tests/readability/webmd-1/source.html new file mode 100644 index 0000000..29850a9 --- /dev/null +++ b/resources/tests/readability/webmd-1/source.html @@ -0,0 +1,2948 @@ + + + + + Babies Who Eat Peanuts Early May Avoid Allergy + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+ + + + + +
+
+
+
+
+
+ +
+
+
+
+
+
Skip to + content + + + + +
+
+ +
+
+
+
+ +
+
+
+
+
+

Allergies Health Center

+ +
+
+ +
+
+
+
+
+
+
+ + + +
+
+ +
+ +
+
Font Size
+
+
A + +
+
A + +
+
A + +
+
+
+
+ +
+ + + +
+

Babies Who Eat Peanuts Early May Avoid Allergy

+ +
+
By +
WebMD Health News +
+ +
+

+ +

+

Feb. 23, 2015 -- Life-threatening peanut allergies have mysteriously + been + on the rise in the past decade, with little hope for a cure.

+

But a groundbreaking new + study may offer a way to stem that rise, while + another may offer some hope for those who are already allergic.

+

Parents have been told for years to avoid giving foods containing + peanuts + to babies for fear of triggering an allergy. Now research shows the + opposite + is true: Feeding babies snacks made with peanuts before their first + birthday + appears to prevent that from happening.

+

The study is published in the New England Journal of Medicine, + and + it was presented at the annual meeting of the American Academy of + Allergy, + Asthma and Immunology in Houston. It found that among children at + high + risk for getting peanut allergies, eating peanut snacks by 11 months + of + age and continuing to eat them at least three times a week until age + 5 + cut their chances of becoming allergic by more than 80% compared to + kids + who avoided peanuts. Those at high risk were already allergic to + egg, they + had the skin condition eczema, or + both.

+

Overall, about 3% of kids who ate peanut butter or peanut snacks + before + their first birthday got an allergy, compared to about 17% of kids + who + didn’t eat them.

+

“I think this study is an astounding and groundbreaking study, + really,” + says Katie Allen, MD, PhD. She's the director of the Center for Food + and + Allergy Research at the Murdoch Children’s Research Institute in + Melbourne, + Australia. Allen was not involved in the research.

+

Experts say the research should shift thinking about how kids develop + food + allergies, and it should change the guidance doctors give to + parents. +

+

Meanwhile, for children and adults who are already allergic to peanuts, another study presented at the same meeting held out hope of a treatment.

A new skin patch called Viaskin allowed people with peanut allergies to eat tiny amounts of peanuts after they wore it for a year.

A Change in Guidelines?

Allergies to peanuts and other foods are on the rise. In the U.S., more than 2% of people react to peanuts, a 400% increase since 1997. And reactions to peanuts and other tree nuts can be especially severe. Nuts are the main reason people get a life-threatening problem called anaphylaxis.

\ No newline at end of file
diff --git a/src/full_text_parser/mod.rs b/src/full_text_parser/mod.rs
index 7703182..381d1c1 100644
--- a/src/full_text_parser/mod.rs
+++ b/src/full_text_parser/mod.rs
@@ -40,7 +40,7 @@ impl FullTextParser {
         url: &url::Url,
         client: &Client,
     ) -> Result {
-        libxml::tree::node::set_node_rc_guard(3);
+        libxml::tree::node::set_node_rc_guard(4);
 
         info!("Scraping article: '{}'", url.as_str());
 
diff --git a/src/full_text_parser/readability/mod.rs b/src/full_text_parser/readability/mod.rs
index 3ea3ddc..4ee2264 100644
--- a/src/full_text_parser/readability/mod.rs
+++ b/src/full_text_parser/readability/mod.rs
@@ -32,6 +32,12 @@ impl Readability {
 
             while let Some(node_ref) = node.as_mut() {
                 let tag_name = node_ref.get_name().to_uppercase();
+
+                if tag_name == "TEXT" && node_ref.get_content().trim().is_empty() {
+                    node = Util::remove_and_next(node_ref);
+                    continue;
+                }
+
                 let match_string = node_ref
                     .get_class_names()
                     .iter()
@@ -107,16 +113,12 @@ impl Readability {
                     for mut child_node in node_ref.get_child_nodes().into_iter() {
                         if Self::is_phrasing_content(&child_node) {
                             if let Some(p) = p.as_mut() {
+                                child_node.unlink();
                                 let _ = p.add_child(&mut child_node);
                             } else if !Util::is_whitespace(&child_node) {
+                                child_node.unlink();
                                 let mut new_node = Node::new("p", None, &document)
                                     .map_err(|()| FullTextParserError::Readability)?;
-                                node_ref
-                                    .replace_child_node(new_node.clone(), child_node.clone())
-                                    .map_err(|error| {
-                                        log::error!("{error}");
-                                        FullTextParserError::Readability
-                                    })?;
                                 new_node.add_child(&mut child_node).map_err(|error| {
                                     log::error!("{error}");
                                     FullTextParserError::Readability
@@ -247,6 +249,9 @@ impl Readability {
         });
 
         let top_candidates = candidates.into_iter().take(5).collect::>();
+        // for candidate in top_candidates.iter() {
+        //     println!("candidate: {} {:?}", candidate.get_name(), candidate.get_attributes());
+        // }
         let mut needed_to_create_top_candidate = false;
         let mut top_candidate = top_candidates.first().cloned().unwrap_or_else(|| {
             // If we still have no top candidate, just use the body as a last resort.
@@ -619,12 +624,8 @@ impl Readability {
 
         is_text_node
             || constants::PHRASING_ELEMS.contains(&tag_name.as_str())
-            || (tag_name == "A" || tag_name == "DEL" || tag_name == "INS")
-                && node
-                    .get_child_nodes()
-                    .iter()
-                    .map(Self::is_phrasing_content)
-                    .all(|val| val)
+            || ((tag_name == "A" || tag_name == "DEL" || tag_name == "INS")
+                && node.get_child_nodes().iter().all(Self::is_phrasing_content))
     }
 
     // Initialize a node with the readability object. Also checks the
diff --git a/src/full_text_parser/readability/tests.rs b/src/full_text_parser/readability/tests.rs
index 199e276..4671ae5 100644
--- a/src/full_text_parser/readability/tests.rs
+++ b/src/full_text_parser/readability/tests.rs
@@ -7,7 +7,7 @@ use crate::{
 };
 
 async fn run_test(name: &str) {
-    libxml::tree::node::set_node_rc_guard(3);
+    libxml::tree::node::set_node_rc_guard(4);
     let _ = env_logger::builder().is_test(true).try_init();
 
     let empty_config = ConfigEntry::default();
@@ -43,22 +43,27 @@ async fn run_test(name: &str) {
     article.document = Some(article_document);
     let html = article.get_content().unwrap();
 
+    //std::fs::write("expected.html", &html).unwrap();
+
     let expected = std::fs::read_to_string(format!(
         "./resources/tests/readability/{name}/expected.html"
     ))
     .expect("Failed to read expected HTML");
 
-    //std::fs::write("expected.html", &html).unwrap();
-
     assert_eq!(expected, html);
 }
 
-#[tokio::test(flavor = "current_thread")]
+#[tokio::test]
 async fn test_001() {
     run_test("001").await
 }
 
-#[tokio::test(flavor = "current_thread")]
+#[tokio::test]
 async fn test_002() {
     run_test("002").await
 }
+
+#[tokio::test]
+async fn webmd_1() {
+    run_test("webmd-1").await
+}
diff --git a/src/util.rs b/src/util.rs
index 9defaa2..117a55b 100644
--- a/src/util.rs
+++ b/src/util.rs
@@ -360,11 +360,11 @@ impl Util {
 
     pub fn has_single_tag_inside_element(node: &Node, tag: &str) -> bool {
         // There should be exactly 1 element child with given tag
-        if node.get_child_nodes().len() == 1
+        if node.get_child_nodes().len() != 1
             || node
                 .get_child_nodes()
                 .first()
-                .map(|n| n.get_name().to_uppercase() == tag)
+                .map(|n| n.get_name().to_uppercase() != tag)
                 .unwrap_or(false)
         {
             return false;
@@ -438,8 +438,8 @@ impl Util {
 
     // Determine whether element has any children block level elements.
     pub fn has_child_block_element(node: &Node) -> bool {
-        node.get_child_elements().iter().any(|node| {
-            constants::DIV_TO_P_ELEMS.contains(node.get_name().as_str())
+        node.get_child_nodes().iter().any(|node| {
+            constants::DIV_TO_P_ELEMS.contains(node.get_name().to_uppercase().as_str())
                 || Self::has_child_block_element(node)
         })
     }
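
Review note on the `has_single_tag_inside_element` hunk in `src/util.rs`: the two flipped comparisons are easy to misread, so here is a minimal, self-contained sketch of the guard before and after the change. It is an illustration under stated assumptions, not the crate's code: `ToyNode`, `has_single_tag_inside`, and `old_guard_rejects` are hypothetical names, and a plain struct stands in for the libxml `Node` type. Whatever the real function checks after the guard is not visible in the hunk, so the sketch simply returns `true` there.

```rust
// Toy stand-in for a DOM node. This is NOT the libxml `Node` type from the
// diff; the struct and its fields are hypothetical, for illustration only.
struct ToyNode {
    name: String,
    children: Vec<ToyNode>,
}

impl ToyNode {
    fn new(name: &str, children: Vec<ToyNode>) -> Self {
        ToyNode {
            name: name.to_string(),
            children,
        }
    }
}

// Same guard shape as the patched function: return false unless there is
// exactly one child and its upper-cased name equals `tag`.
// Assumption: the rest of the real function is not shown in the hunk, so
// this sketch returns true once the guard has passed.
fn has_single_tag_inside(node: &ToyNode, tag: &str) -> bool {
    if node.children.len() != 1
        || node
            .children
            .first()
            .map(|n| n.name.to_uppercase() != tag)
            .unwrap_or(false)
    {
        return false;
    }
    true
}

// The pre-patch guard condition, kept for comparison. It is true (meaning
// "bail out") when the node has exactly one child, or when the first child
// matches the tag, which is the inverse of what the comment describes.
fn old_guard_rejects(node: &ToyNode, tag: &str) -> bool {
    node.children.len() == 1
        || node
            .children
            .first()
            .map(|n| n.name.to_uppercase() == tag)
            .unwrap_or(false)
}

fn main() {
    let wrapper = ToyNode::new("div", vec![ToyNode::new("p", vec![])]);
    let wrong_tag = ToyNode::new("div", vec![ToyNode::new("span", vec![])]);
    let empty = ToyNode::new("div", vec![]);

    println!("single <p> accepted: {}", has_single_tag_inside(&wrapper, "P"));
    println!("single <span> accepted: {}", has_single_tag_inside(&wrong_tag, "P"));
    println!("no children accepted: {}", has_single_tag_inside(&empty, "P"));
    println!("old guard rejected single <p>: {}", old_guard_rejects(&wrapper, "P"));
}
```

With the old condition, the one case the function is meant to accept (a wrapper holding a single matching child) was the case that triggered the early `return false`; the patched condition rejects everything except that case.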