diff --git a/resources/tests/readability/ehow-1/expected.html b/resources/tests/readability/ehow-1/expected.html index 3bea2b2..d02bb05 100644 --- a/resources/tests/readability/ehow-1/expected.html +++ b/resources/tests/readability/ehow-1/expected.html @@ -1,7 +1,6 @@
-

-How to Build a Terrarium

+
diff --git a/resources/tests/readability/embedded-videos/expected.html b/resources/tests/readability/embedded-videos/expected.html index 89513a2..33e6650 100644 --- a/resources/tests/readability/embedded-videos/expected.html +++ b/resources/tests/readability/embedded-videos/expected.html @@ -6,7 +6,7 @@ consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

-

Videos

+

At root

diff --git a/resources/tests/readability/keep-images/expected.html b/resources/tests/readability/keep-images/expected.html index 233d7b2..1bc6901 100644 --- a/resources/tests/readability/keep-images/expected.html +++ b/resources/tests/readability/keep-images/expected.html @@ -9,7 +9,7 @@
-

Inside the Deep Web Drug Lab

+

Welcome to DoctorX’s Barcelona lab, where the drugs you bought online are tested for safety and purity. No questions asked.

diff --git a/resources/tests/readability/links-in-tables/expected.html b/resources/tests/readability/links-in-tables/expected.html new file mode 100644 index 0000000..7b80bfd --- /dev/null +++ b/resources/tests/readability/links-in-tables/expected.html @@ -0,0 +1,257 @@ +
+

+Posted by Andrew Hayden, Software Engineer on Google Play +

+

+Android users are downloading tens of billions of apps and games on Google Play. + We're also seeing developers update their apps frequently in order to provide +users with great content, improve security, and enhance the overall user +experience. It takes a lot of data to download these updates and we know users +care about how much data their devices are using. Earlier this year, we +announced that we started using the +bsdiff algorithm (by +Colin Percival). Using bsdiff, we were able to reduce the size of app +updates on average by 47% compared to the full APK size. +

+

+Today, we're excited to share a new approach that goes further — File-by-File +patching. App Updates using File-by-File patching are, on average, +65% smaller than the full app, and in some cases more than 90% +smaller. +

+

+The savings, compared to our previous approach, add up to 6 petabytes of user +data saved per day! +

+

+In order to get the new version of the app, Google Play sends your device a +patch that describes the differences between the old and new versions +of the app. +

+

+Imagine you are an author of a book about to be published, and wish to change a +single sentence - it's much easier to tell the editor which sentence to change +and what to change, rather than send an entirely new book. In the same way, +patches are much smaller and much faster to download than the entire APK. +

+

+Techniques used in File-by-File +patching +

+

+Android apps are packaged as APKs, which are ZIP files with special conventions. +Most of the content within the ZIP files (and APKs) is compressed using a +technology called Deflate. +Deflate is really good at compressing data but it has a drawback: it makes +identifying changes in the original (uncompressed) content really hard. Even a +tiny change to the original content (like changing one word in a book) can make +the compressed output of deflate look completely different. Describing +the differences between the original content is easy, but describing +the differences between the compressed content is so hard that it leads +to inefficient patches. +
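
To make the effect described above concrete, here is a minimal Python sketch using the standard `zlib` module. The sample sentence and the helper name `differing_bytes` are assumptions made for illustration; they do not come from the post or from Archive Patcher. The sketch changes one letter in the input and counts how many bytes differ before and after compression.

```python
import zlib


def differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions where two byte strings differ, plus any length gap."""
    mismatched = sum(x != y for x, y in zip(a, b))
    return mismatched + abs(len(a) - len(b))


# Illustrative sample text (an assumption, not taken from the post).
old_text = b"The quick brown fox jumps over the lazy dog. " * 20
# Change exactly one letter: the lowercase "q" at index 4 becomes "Q".
new_text = old_text[:4] + b"Q" + old_text[5:]

old_compressed = zlib.compress(old_text, 6)
new_compressed = zlib.compress(new_text, 6)

print("uncompressed bytes that differ:", differing_bytes(old_text, new_text))
print("compressed bytes that differ:  ", differing_bytes(old_compressed, new_compressed))
```

How far the second count exceeds the first depends on the input and on the zlib build, so the sketch prints both numbers rather than assuming a particular result.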

+

+Watch how much the compressed text on the right side changes from a one-letter +change in the uncompressed text on the left: +

+

+

+File-by-File therefore is based on detecting changes in the uncompressed data. +To generate a patch, we first decompress both old and new files before computing +the delta (we still use bsdiff here). Then to apply the patch, we decompress the +old file, apply the delta to the uncompressed content and then recompress the +new file. In doing so, we need to make sure that the APK on your device is a +perfect match, byte for byte, to the one on the Play Store (see APK Signature +Schema v2 for why). +

+

+When recompressing the new file, we hit two complications. First, Deflate has a +number of settings that affect output; and we don't know which settings were +used in the first place. Second, many versions of deflate exist and we need to +know whether the version on your device is suitable. +

+

+Fortunately, after analysis of the apps on the Play Store, we've discovered that +recent and compatible versions of deflate based on zlib (the most popular +deflate library) account for almost all deflated content in the Play Store. In +addition, the default settings (level=6) and maximum compression settings +(level=9) are the only settings we encountered in practice. +

+

+Knowing this, we can detect and reproduce the original deflate settings. This +makes it possible to uncompress the data, apply a patch, and then recompress the +data back to exactly the same bytes as originally uploaded. +
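
The detection step can be sketched roughly as follows, assuming the entry is a raw Deflate stream (as stored inside a ZIP) and that the local zlib build matches the one that produced it. The function names `raw_deflate` and `detect_level` are hypothetical and are not the Archive Patcher API. The idea is to decompress the stream, recompress it at each candidate level, and keep the level that reproduces the original bytes exactly.

```python
import zlib

# The two levels the post reports encountering in practice.
CANDIDATE_LEVELS = (6, 9)


def raw_deflate(data: bytes, level: int) -> bytes:
    """Compress to a raw Deflate stream (wbits=-15 means no zlib header or trailer)."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def detect_level(compressed: bytes):
    """Return a level that reproduces `compressed` byte for byte, or None."""
    original = zlib.decompress(compressed, -15)
    for level in CANDIDATE_LEVELS:
        if raw_deflate(original, level) == compressed:
            return level
    return None


# Usage: compress some sample data, then recover a level that reproduces it.
sample = b"File-by-File patching " * 500
stream = raw_deflate(sample, 9)
print(detect_level(stream))
```

For some inputs levels 6 and 9 produce identical output, in which case the sketch returns the first match; either level then reproduces the stream, which is all the patching step needs. A `None` result means none of the candidate levels reproduced the stream in this sketch.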

+

+However, there is one trade-off: extra processing power is needed on the device. +On modern devices (e.g. from 2015), recompression can take a little over a +second per megabyte and on older or less powerful devices it can be longer. +Analysis so far shows that, on average, if the patch size is halved then the +time spent applying the patch (which for File-by-File includes recompression) is +doubled. +

+

+For now, we are limiting the use of this new patching technology to auto-updates +only, i.e. the updates that take place in the background, usually at night when +your phone is plugged into power and you're not likely to be using it. This +ensures that users won't have to wait any longer than usual for an update to +finish when manually updating an app. +

+

+How effective is File-by-File +Patching? +

+

+Here are examples of app updates already using File-by-File Patching: +

+
+ ++++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+

Application

+
+

Original Size

+
+

Previous (BSDiff) Patch Size

+

(% vs original)

+
+

File-by-File Patch Size (% vs original)

+
+ + +

71.1 MB

+
+

13.4 MB (-81%)

+
+

8.0 MB (-89%)

+
+ + +

32.7 MB

+
+

17.5 MB (-46%)

+
+

9.6 MB (-71%)

+
+
+

Gmail

+
+
+

17.8 MB

+
+

7.6 MB (-57%)

+
+

7.3 MB (-59%)

+
+ + +

18.9 MB

+
+

17.2 MB (-9%)

+
+

13.1 MB (-31%)

+
+
+

Kindle

+
+
+

52.4 MB

+
+

19.1 MB (-64%)

+
+

8.4 MB (-84%)

+
+ + +

16.2 MB

+
+

7.7 MB (-52%)

+
+

1.2 MB (-92%)

+
+
+

Disclaimer: if you see different patch sizes when you press "update" +manually, that is because we are not currently using File-by-file for +interactive updates, only those done in the background.
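
The percentages in the table express how much smaller each patch is than the original download. A small Python sketch of that arithmetic follows; the function name `reduction_percent` is an assumption for illustration.

```python
def reduction_percent(original_mb: float, patch_mb: float) -> int:
    """Return how much smaller the patch is than the original, as a whole percent."""
    return round((1 - patch_mb / original_mb) * 100)


# First row of the table: a 71.1 MB original.
print(reduction_percent(71.1, 13.4))  # bsdiff patch       -> 81
print(reduction_percent(71.1, 8.0))   # File-by-File patch -> 89
```

Because the sizes in the table are themselves rounded, recomputing from them can land a point away from the published figure: the last row (16.2 MB original, 1.2 MB patch) comes out at 93 rather than 92.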

+

+Saving data and making our +users (& developers!) happy +

+

+These changes are designed to ensure our community of over a billion Android +users use as little data as possible for regular app updates. The best thing is +that as a developer you don't need to do anything. You get these reductions to +your update size for free! +

+ +

+If you'd like to know more about File-by-File patching, including the technical +details, head over to the Archive Patcher GitHub +project where you can find information, including the source code. Yes, +File-by-File patching is completely open-source! +

+

+As a developer if you're interested in reducing your APK size still further, +here are some general +tips on reducing APK size. +

+

+ +
diff --git a/resources/tests/readability/links-in-tables/source.html b/resources/tests/readability/links-in-tables/source.html new file mode 100644 index 0000000..b9ca816 --- /dev/null +++ b/resources/tests/readability/links-in-tables/source.html @@ -0,0 +1,3165 @@ + + + + + + + + + + + + + + + + + + + + +Saving Data: Reducing the size of App Updates by 65% | Android Developers Blog + + + + + + + + + + + + + +

06 December 2016

+ +
+ +
+
+ +

+Saving Data: Reducing the size of App Updates by 65% +

+
+
+
+
+

+Posted by Andrew Hayden, Software Engineer on Google Play +

+

+Android users are downloading tens of billions of apps and games on Google Play. + We're also seeing developers update their apps frequently in order to provide +users with great content, improve security, and enhance the overall user +experience. It takes a lot of data to download these updates and we know users +care about how much data their devices are using. Earlier this year, we +announced that we started using the +bsdiff algorithm (by +Colin Percival). Using bsdiff, we were able to reduce the size of app +updates on average by 47% compared to the full APK size. +

+

+Today, we're excited to share a new approach that goes further — File-by-File +patching. App Updates using File-by-File patching are, on average, +65% smaller than the full app, and in some cases more than 90% +smaller. +

+

+The savings, compared to our previous approach, add up to 6 petabytes of user +data saved per day! +

+

+In order to get the new version of the app, Google Play sends your device a +patch that describes the differences between the old and new versions +of the app. +

+

+Imagine you are an author of a book about to be published, and wish to change a +single sentence - it's much easier to tell the editor which sentence to change +and what to change, rather than send an entirely new book. In the same way, +patches are much smaller and much faster to download than the entire APK. +

+

+Techniques used in File-by-File +patching +

+

+Android apps are packaged as APKs, which are ZIP files with special conventions. +Most of the content within the ZIP files (and APKs) is compressed using a +technology called Deflate. +Deflate is really good at compressing data but it has a drawback: it makes +identifying changes in the original (uncompressed) content really hard. Even a +tiny change to the original content (like changing one word in a book) can make +the compressed output of deflate look completely different. Describing +the differences between the original content is easy, but describing +the differences between the compressed content is so hard that it leads +to inefficient patches. +

+

+Watch how much the compressed text on the right side changes from a one-letter +change in the uncompressed text on the left: +

+
+

+File-by-File therefore is based on detecting changes in the uncompressed data. +To generate a patch, we first decompress both old and new files before computing +the delta (we still use bsdiff here). Then to apply the patch, we decompress the +old file, apply the delta to the uncompressed content and then recompress the +new file. In doing so, we need to make sure that the APK on your device is a +perfect match, byte for byte, to the one on the Play Store (see APK Signature +Schema v2 for why). +

+

+When recompressing the new file, we hit two complications. First, Deflate has a +number of settings that affect output; and we don't know which settings were +used in the first place. Second, many versions of deflate exist and we need to +know whether the version on your device is suitable. +

+

+Fortunately, after analysis of the apps on the Play Store, we've discovered that +recent and compatible versions of deflate based on zlib (the most popular +deflate library) account for almost all deflated content in the Play Store. In +addition, the default settings (level=6) and maximum compression settings +(level=9) are the only settings we encountered in practice. +

+

+Knowing this, we can detect and reproduce the original deflate settings. This +makes it possible to uncompress the data, apply a patch, and then recompress the +data back to exactly the same bytes as originally uploaded. +

+

+However, there is one trade-off: extra processing power is needed on the device. +On modern devices (e.g. from 2015), recompression can take a little over a +second per megabyte and on older or less powerful devices it can be longer. +Analysis so far shows that, on average, if the patch size is halved then the +time spent applying the patch (which for File-by-File includes recompression) is +doubled. +

+

+For now, we are limiting the use of this new patching technology to auto-updates +only, i.e. the updates that take place in the background, usually at night when +your phone is plugged into power and you're not likely to be using it. This +ensures that users won't have to wait any longer than usual for an update to +finish when manually updating an app. +

+

+How effective is File-by-File +Patching? +

+

+Here are examples of app updates already using File-by-File Patching: +

+
+
+
+
+ + + + + + + + +
+Application
+
+Original Size
+
+Previous (BSDiff) Patch Size
+
+(% vs original)
+
+File-by-File Patch Size (% vs original)
+
+
+71.1 MB
+
+13.4 MB (-81%)
+
+8.0 MB (-89%)
+
+
+32.7 MB
+
+17.5 MB (-46%)
+
+9.6 MB (-71%)
+
+
+17.8 MB
+
+7.6 MB (-57%)
+
+7.3 MB (-59%)
+
+
+18.9 MB
+
+17.2 MB (-9%)
+
+13.1 MB (-31%)
+
+
+52.4 MB
+
+19.1 MB (-64%)
+
+8.4 MB (-84%)
+
+
+16.2 MB
+
+7.7 MB (-52%)
+
+1.2 MB (-92%)
+
+
+
+
+
+
+Disclaimer: if you see different patch sizes when you press "update" +manually, that is because we are not currently using File-by-file for +interactive updates, only those done in the background. +

+Saving data and making our +users (& developers!) happy +

+

+These changes are designed to ensure our community of over a billion Android +users use as little data as possible for regular app updates. The best thing is +that as a developer you don't need to do anything. You get these reductions to +your update size for free! +

+ +

+If you'd like to know more about File-by-File patching, including the technical +details, head over to the Archive Patcher GitHub +project where you can find information, including the source code. Yes, +File-by-File patching is completely open-source! +

+

+As a developer if you're interested in reducing your APK size still further, +here are some general +tips on reducing APK size. +

+
+ + + + + + + + + + + + + + + + diff --git a/resources/tests/readability/lwn-1/expected.html b/resources/tests/readability/lwn-1/expected.html new file mode 100644 index 0000000..a1b0e35 --- /dev/null +++ b/resources/tests/readability/lwn-1/expected.html @@ -0,0 +1,662 @@ +
+ + +
+

A trademark battle in the Arduino community

+ +

The Arduino has been one of the biggest success stories of the open-hardware movement, but that success does not protect it from internal conflict. In recent months, two of the project's founders have come into conflict about the direction of future efforts—and that conflict has turned into a legal dispute about who owns the rights to the Arduino trademark.

+

The current fight is a battle between two companies that both bear the Arduino name: Arduino LLC and Arduino SRL. The disagreements that led to the present state of affairs go back a bit further.

+

The Arduino project grew out of 2005-era course work taught at the Interaction Design Institute Ivrea (IDII) in Ivrea, Italy (using Processing, Wiring, and pre-existing microcontroller hardware). After the IDII program was discontinued, the open-hardware Arduino project as we know it was launched by Massimo Banzi, David Cuartielles, and David Mellis (who had worked together at IDII), with co-founders Tom Igoe and Gianluca Martino joining shortly afterward. The project released open hardware designs (including full schematics and design files) as well as the microcontroller software to run on the boards and the desktop IDE needed to program it.

+

Arduino LLC was incorporated in 2008 by Banzi, Cuartielles, Mellis, Igoe, and Martino. The company is registered in the United States, and it has continued to design the Arduino product line, develop the software, and run the Arduino community site. The hardware devices themselves, however, were manufactured by a separate company, "Smart Projects SRL," that was founded by Martino. "SRL" is essentially the Italian equivalent of "LLC"—Smart Projects was incorporated in Italy.

+

This division of responsibilities—with the main Arduino project handling everything except for board manufacturing—may seem like an odd one, but it is consistent with Arduino's marketing story. From its earliest days, the designs for the hardware have been freely available, and outside companies were allowed to make Arduino-compatible devices. The project has long run a certification +program for third-party manufacturers interested in using the "Arduino" branding, but allows (and arguably even encourages) informal software and firmware compatibility.

+

The Arduino branding was not formally registered as a trademark in the early days, however. Arduino LLC filed to register the US trademark in April 2009, and it was granted in 2011.

+

At this point, the exact events begin to be harder to verify, but the original group of founders reportedly had a difference of opinion about how to license out hardware production rights to other companies. Wired Italy reports that Martino and Smart Projects resisted the other four founders' plans to "internationalize" production—although it is not clear if that meant that Smart Projects disapproved of licensing out any official hardware manufacturing to other companies, or had some other concern. Heise Online adds that the conflict seemed to be about moving some production to China.

+

What is clear is that Smart Projects filed a petition with the US Patent and Trademark Office (USPTO) in October 2014 asking the USPTO to cancel Arduino LLC's trademark on "Arduino." Then, in November 2014, Smart Projects changed its company's name to Arduino SRL. Somewhere around that time, Martino sold off his ownership stake in Smart Projects SRL and new owner Federico Musto was named CEO.

+

Unsurprisingly, Arduino LLC did not care for the petition to the USPTO and, in January 2015, the company filed a trademark-infringement lawsuit against Arduino SRL. Confusing matters further, the re-branded Arduino SRL has set up its own web site using the domain name arduino.org, which duplicates most of the site features found on the original Arduino site (arduino.cc). That includes both a hardware store and software downloads.

+

Musto, the new CEO of the company now called Arduino SRL, has a bit of a history with Arduino as well. His other manufacturing business had collaborated with Arduino LLC on the design and production of the Arduino Yún, which has received some criticism for including proprietary components.

+

Hackaday has run a two-part series (in February and March) digging into the ins and outs of the dispute, including the suggestion that Arduino LLC's recent release of version 1.6.0 of the Arduino IDE was a move intended to block Arduino SRL from hijacking IDE development. Commenter Paul Stoffregen (who was the author of the Heise story above) noted that Arduino SRL recently created a fork of the Arduino IDE on GitHub.

+

Most recently, Banzi broke his silence about the dispute in a story published at MAKEzine. There, Banzi claims that Martino secretly filed a trademark application on "Arduino" in Italy in 2008 and told none of the other Arduino founders. He also details a series of unpleasant negotiations between the companies, including Smart Projects stopping the royalty payments it had long sent to Arduino LLC for manufacturing devices and re-branding its boards with the Arduino.org URL.

+

Users appear to be stuck in the middle. Banzi says that several retail outlets that claim to be selling "official" Arduino boards are actually paying Arduino SRL, not Arduino LLC, but it is quite difficult to determine which retailers are lined up on which side, since there are (typically) several levels of supplier involved. The two Arduino companies' web sites also disagree about the available hardware, with Arduino.org offering the new Arduino Zero model for sale today and Arduino.cc listing it as "Coming soon."

+

Furthermore, as Hackaday's March story explains, the recently-released Arduino.cc IDE now reports that boards manufactured by Arduino SRL are "uncertified." That warning does not prevent users from programming the other company's hardware, but it will no doubt confuse quite a few users who believe they possess genuine Arduino-manufactured devices.

+

The USPTO page for Arduino SRL's petition notes pre-trial disclosure dates have been set for August and October of 2015 (for Arduino SRL and Arduino LLC, respectively), which suggests that this debate is far from over. Of course, it is always disappointing to observe a falling out between project founders, particularly when the project in question has had such an impact on open-source software and open hardware.

+

One could argue that disputes of this sort are proof that even small projects started among friends need to take legal and intellectual-property issues (such as trademarks) seriously from the very beginning—perhaps Arduino and Smart Projects thought that an informal agreement was all that was necessary in the early days, after all.

+

But, perhaps, once a project becomes profitable, there is simply no way to predict what might happen. Arduino LLC would seem to have a strong case for continual and rigorous use of the "Arduino" trademark, which is the salient point in US trademark law. It could still be a while before the courts rule on either side of that question, however.

+


+ +

Mapping and data mining with QGIS 2.8

+

By Nathan Willis +
March 25, 2015

+

QGIS is a free-software geographic information system (GIS) tool; it provides a unified interface in which users can import, edit, and analyze geographic-oriented information, and it can produce output as varied as printable maps or map-based web services. The project recently made its first update to be designated a long-term release (LTR), and that release is both poised for high-end usage and friendly to newcomers.

+

The new release is version 2.8, which was unveiled on March 2. An official change +log is available on the QGIS site, while the release itself was announced primarily through blog posts (such as this +post by Anita Graser of the project's steering committee). Downloads are available for a variety of platforms, including packages for Ubuntu, Debian, Fedora, openSUSE, and several other distributions.

+

[QGIS main interface]

+

As the name might suggest, QGIS is a Qt application; the latest release will, in fact, build on both Qt4 and Qt5, although the binaries released by the project come only in Qt4 form at present. 2.8 has been labeled a long-term release (LTR)—which, in this case, means that the project has committed to providing backported bug fixes for one full calendar year, and that the 2.8.x series is in permanent feature freeze. The goal, according to the change log, is to provide a stable version suitable for businesses and deployments in other large organizations. The change log itself points out that the development of quite a few new features was underwritten by various GIS companies or university groups, which suggests that taking care of these organizations' needs is reaping dividends for the project.

+

For those new to QGIS (or GIS in general), there is a detailed new-user tutorial that provides a thorough walk-through of the data-manipulation, mapping, and analysis functions. Being a new user, I went through the tutorial; although there are a handful of minor differences between QGIS 2.8 and the version used in the text (primarily whether specific features were accessed through a toolbar or right-click menu), on the whole it is well worth the time.

+

QGIS is designed to make short work of importing spatially oriented data sets, mining information from them, and turning the results into a meaningful visualization. Technically speaking, the visualization output is optional: one could simply extract the needed statistics and results and use them to answer some question or, perhaps, publish the massaged data set as a database for others to use.

+

But well-made maps are often the easiest way to illuminate facts about populations, political regions, geography, and many other topics when human comprehension is the goal. QGIS makes importing data from databases, web-mapping services (WMS), and even unwieldy flat-file data dumps a painless experience. It handles converting between a variety of map-referencing systems more or less automatically, and allows the user to focus on finding the useful attributes of the data sets and rendering them on screen.

+

Here be data

+

The significant changes in QGIS 2.8 fall into several categories. There are updates to how QGIS handles the mathematical expressions and queries users can use to filter information out of a data set, improvements to the tools used to explore the on-screen map canvas, and enhancements to the "map composer" used to produce visual output. This is on top of plenty of other under-the-hood improvements, naturally.

+

[QGIS query builder]

+

In the first category are several updates to the filtering tools used to mine a data set. Generally speaking, each independent data set is added to a QGIS project as its own layer, then transformed with filters to focus in on a specific portion of the original data. For instance, the land-usage statistics for a region might be one layer, while roads and buildings for the same region from OpenStreetMap might be two additional layers. Such filters can be created in several ways: there is a "query builder" that lets the user construct and test expressions on a data layer, then save the results, an SQL console for performing similar queries on a database, and spreadsheet-like editing tools for working directly on data tables.

+

All three have been improved in this release. New are support for if(condition, true, false) conditional statements, a set of operations for geometry primitives (e.g., to test whether regions overlap or lines intersect), and an "integer divide" operation. Users can also add comments to their queries to annotate their code, and there is a new custom +function editor for writing Python functions that can be called in mathematical expressions within the query builder.

+

It is also now possible to select only some rows in a table, then perform calculations just on the selection—previously, users would have to extract the rows of interest into a new table first. Similarly, in the SQL editor, the user can highlight a subset of the SQL query and execute it separately, which is no doubt helpful for debugging.

+

There have also been several improvements to the Python and Processing plugins. Users can now drag-and-drop Python scripts onto QGIS and they will be run automatically. Several new analysis algorithms are now available through the Processing interface that were previously Python-only; they include algorithms for generating grids of points or vectors within a region, splitting layers and lines, generating hypsometric +curves, refactoring data sets, and more.

+

Maps in, maps out

+

[QGIS simplify tool]

+

The process of working with on-screen map data picked up some improvements in the new release as well. Perhaps the most fundamental is that each map layer added to the canvas is now handled in its own thread, so fewer hangs in the user interface are experienced when re-rendering a layer (as happens whenever the user changes the look of points or shapes in a layer). Since remote databases can also be layers, this multi-threaded approach is more resilient against connectivity problems, too. The interface also now supports temporary "scratch" layers that can be used to merge, filter, or simply experiment with a data set, but are not saved when the current project is saved.

+

For working on the canvas itself, polygonal regions can now use raster images (tiled, if necessary) as fill colors, the map itself can be rotated arbitrarily, and objects can be "snapped" to align with items on any layer (not just the current layer). For working with raster image layers (e.g., aerial photographs) or simply creating new geometric shapes by hand, there is a new digitizing tool that can offer assistance by locking lines to specific angles, automatically keeping borders parallel, and other niceties.

+

There is a completely overhauled "simplify" tool that is used to reduce the number of extraneous vertices of a vector layer (thus reducing its size). The old simplify tool provided only a relative "tolerance" setting that did not correspond directly to any units. With the new tool, users can set a simplification threshold in terms of the underlying map units, layer-specific units, pixels, and more—and, in addition, the tool reports how much the simplify operation has reduced the size of the data.

+

[QGIS style editing]

+

There has also been an effort to present a uniform interface to one of the most important features of the map canvas: the ability to change the symbology used for an item based on some data attribute. The simplest example might be to change the line color of a road based on whether its road-type attribute is "highway," "service road," "residential," or so on. But the same feature is used to automatically highlight layer information based on the filtering and querying functionality discussed above. The new release allows many more map attributes to be controlled by these "data definition" settings, and provides a hard-to-miss button next to each attribute, through which a custom data definition can be set.

+

QGIS's composer module is the tool used to take project data and generate a map that can be used outside of the application (in print, as a static image, or as a layer for MapServer or some other software tool, for example). Consequently, it is not a simple select-and-click-export tool; composing the output can involve a lot of choices about which data to make visible, how (and where) to label it, and how to make it generally accessible.

+

The updated composer in 2.8 now has a full-screen mode and sports several new options for configuring output. For instance, the user now has full control over how map axes are labeled. In previous releases, the grid coordinates of the map could be turned on or off, but the only options were all or nothing. Now, the user can individually choose whether coordinates are displayed on all four sides, and can even choose in which direction vertical text labels will run (so that they can be correctly justified to the edge of the map, for example).

+

There are, as usual, many more changes than there is room to discuss. Some particularly noteworthy improvements include the ability to save and load bookmarks for frequently used data sources (perhaps most useful for databases, web services, and other non-local data) and improvements to QGIS's server module. This module allows one QGIS instance to serve up data accessible to other QGIS applications (for example, to simplify team projects). The server can now be extended with Python plugins and the data layers that it serves can be styled with style rules like those used in the desktop interface.

+

QGIS is one of those rare free-software applications that is both powerful enough for high-end work and yet also straightforward to use for the simple tasks that might attract a newcomer to GIS in the first place. The 2.8 release, particularly with its project-wide commitment to long-term support, appears to be an update well worth checking out, whether one needs to create a simple, custom map or to mine a database for obscure geo-referenced meaning.

+

Comments (3 posted)

+ +

Development activity in LibreOffice and OpenOffice

+

By Jonathan Corbet +
March 25, 2015

+

The LibreOffice project was announced with great fanfare in September 2010. Nearly one year later, the OpenOffice.org project (from which LibreOffice was forked) was cut loose from Oracle and found a new home as an Apache project. It is fair to say that the rivalry between the two projects in the time since then has been strong. Predictions that one project or the other would fail have not been borne out, but that does not mean that the two projects are equally successful. A look at the two projects' development communities reveals some interesting differences. +

+

Release histories

+

Apache OpenOffice has made two releases in the past year: 4.1 in April 2014 and 4.1.1 (described as "a micro update" in the release announcement) in August. The main feature added during that time would appear to be significantly improved accessibility support.

+

The release history for LibreOffice tells a slightly different story:

+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Release      Date
4.2.3        April 2014
4.1.6        April 2014
4.2.4        May 2014
4.2.5        June 2014
4.3          July 2014
4.2.6        August 2014
4.3.1        August 2014
4.3.2        September 2014
4.2.7/4.3.3  October 2014
4.3.4        November 2014
4.2.8        December 2014
4.3.5        December 2014
4.4          January 2015
4.3.6        February 2015
4.4.1        February 2015
+
+

It seems clear that LibreOffice has maintained a rather more frenetic release cadence, generally putting out at least one release per month. The project typically keeps at least two major versions alive at any one time. Most of the releases are of the minor, bug-fix variety, but there have been two major releases in the last year as well.

+ +

Development statistics

+

In the one-year period since late March 2014, there have been 381 changesets committed to the OpenOffice Subversion repository. The most active committers are:

+ +
+ + + + + + + + + + +
Most active OpenOffice developers
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
By changesets
Herbert Dürr             63  16.6%
Jürgen Schmidt           56  14.7%
Armin Le Grand           56  14.7%
Oliver-Rainer Wittmann   46  12.1%
Tsutomu Uchino           33   8.7%
Kay Schenk               27   7.1%
Pedro Giffuni            23   6.1%
Ariel Constenla-Haile    22   5.8%
Andrea Pescetti          14   3.7%
Steve Yin                11   2.9%
Andre Fischer            10   2.6%
Yuri Dario                7   1.8%
Regina Henschel           6   1.6%
Juan C. Sanz              2   0.5%
Clarence Guo              2   0.5%
Tal Daniel                2   0.5%
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
By changed lines
Jürgen Schmidt          455499  88.1%
Andre Fischer            26148   3.8%
Pedro Giffuni            23183   3.4%
Armin Le Grand           11018   1.6%
Juan C. Sanz              4582   0.7%
Oliver-Rainer Wittmann    4309   0.6%
Andrea Pescetti           3908   0.6%
Herbert Dürr              2811   0.4%
Tsutomu Uchino            1991   0.3%
Ariel Constenla-Haile     1258   0.2%
Steve Yin                 1010   0.1%
Kay Schenk                 616   0.1%
Regina Henschel            417   0.1%
Yuri Dario                 268   0.0%
tal                         16   0.0%
Clarence Guo                11   0.0%
+
+
+

In truth, the above list is not just the most active OpenOffice developers — it is all of them; a total of 16 developers have committed changes to OpenOffice in the last year. Those developers changed 528,000 lines of code, but, as can be seen above, Jürgen Schmidt accounted for the bulk of those changes, which were mostly updates to translation files.

+

The top four developers in the "by changesets" column all work for IBM, so IBM is responsible for a minimum of about 60% of the changes to OpenOffice in the last year.

+

The picture for LibreOffice is just a little bit different; in the same one-year period, the project has committed 22,134 changesets from 268 developers. The most active of these developers were:

+ +
+ + + + + + + + + + +
Most active LibreOffice developers
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
By changesets
Caolán McNamara            4307  19.5%
Stephan Bergmann           2351  10.6%
Miklos Vajna               1449   6.5%
Tor Lillqvist              1159   5.2%
Noel Grandin               1064   4.8%
Markus Mohrhard             935   4.2%
Michael Stahl               915   4.1%
Kohei Yoshida               755   3.4%
Tomaž Vajngerl              658   3.0%
Thomas Arnhold              619   2.8%
Jan Holesovsky              466   2.1%
Eike Rathke                 457   2.1%
Matteo Casalin              442   2.0%
Bjoern Michaelsen           421   1.9%
Chris Sherlock              396   1.8%
David Tardon                386   1.7%
Julien Nabet                362   1.6%
Zolnai Tamás                338   1.5%
Matúš Kukan                 256   1.2%
Robert Antoni Buj Gelonch   231   1.0%
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
By changed lines
Lionel Elie Mamane         244062  12.5%
Noel Grandin               238711  12.2%
Stephan Bergmann           161220   8.3%
Miklos Vajna               129325   6.6%
Caolán McNamara             97544   5.0%
Tomaž Vajngerl              69404   3.6%
Tor Lillqvist               59498   3.1%
Laurent Balland-Poirier     52802   2.7%
Markus Mohrhard             50509   2.6%
Kohei Yoshida               45514   2.3%
Chris Sherlock              36788   1.9%
Peter Foley                 34305   1.8%
Christian Lohmaier          33787   1.7%
Thomas Arnhold              32722   1.7%
David Tardon                21681   1.1%
David Ostrovsky             21620   1.1%
Jan Holesovsky              20792   1.1%
Valentin Kettner            20526   1.1%
Robert Antoni Buj Gelonch   20447   1.0%
Michael Stahl               18216   0.9%
+
+
+

To a first approximation, the top ten companies supporting LibreOffice in the last year are:

+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Companies supporting LibreOffice development
(by changesets)
Red Hat               8417  38.0%
Collabora Multimedia  6531  29.5%
(Unknown)             5126  23.2%
(None)                1490   6.7%
Canonical              422   1.9%
Igalia S.L.             80   0.4%
Ericsson                21   0.1%
Yandex                  18   0.1%
FastMail.FM             17   0.1%
SUSE                     7   0.0%
+
+

Development work on LibreOffice is thus concentrated in a small number of companies, though it is rather more spread out than OpenOffice development. It is worth noting that the LibreOffice developers with unknown affiliation, who contributed 23% of the changes, make up 82% of the developer base, so there would appear to be a substantial community of developers contributing from outside the above-listed companies.

+ +

Some conclusions

+

Last October, some concerns were raised on the OpenOffice list about the health of that project's community. At the time, Rob Weir shrugged them off as the result of a marketing effort by the LibreOffice crowd. There can be no doubt that the war of words between these two projects has gotten tiresome at times, but, looking at the above numbers, it is hard not to conclude that there is an issue that goes beyond marketing hype here.

+

In the 4½ years since its founding, the LibreOffice project has put together a community with over 250 active developers. There is support from multiple companies and an impressive rate of patches going into the project's repository. The project's ability to sustain nearly monthly releases on two branches is a direct result of that community's work. Swearing at LibreOffice is one of your editor's favorite pastimes, but it seems clear that the project is on a solid footing with a healthy community.

+

OpenOffice, instead, is driven by four developers from a single company, one that appears to have been deemphasizing OpenOffice work for some time. As a result, the project's commit rate is a fraction of what LibreOffice is able to sustain and releases are relatively rare. As of this writing, the OpenOffice blog shows no posts in 2015. In the October discussion, Rob said that "the dogs may bark but the caravan moves on." That may be true, but, in this case, the caravan does not appear to be moving with any great speed.

+

Anything can happen in the free-software development world; it is entirely possible that a reinvigorated OpenOffice.org may yet give LibreOffice a run for its money. But something will clearly have to change to bring that future around. As things stand now, it is hard not to conclude that LibreOffice has won the battle for developer participation.

+

Comments (74 posted)

+ +

Page editor: Jonathan Corbet +

+

Inside this week's LWN.net Weekly Edition

+
    +
  • Security: Toward secure package downloads; New vulnerabilities in drupal, mozilla, openssl, python-django ...
  • +
  • Kernel: LSFMM coverage: NFS, defragmentation, epoll(), copy offload, and more.
  • +
  • Distributions: A look at Debian's 2015 DPL candidates; Debian, Fedora, ...
  • +
  • Development: A look at GlusterFS; LibreOffice Online; Open sourcing existing code; Secure Boot in Windows 10; ...
  • +
  • Announcements: A Turing award for Michael Stonebraker, Sébastien Jodogne, ReGlue are Free Software Award winners, Kat Walsh joins FSF board of directors, Cyanogen, ...
  • +

Next page: Security>> +

+
+ + + + + + +
diff --git a/resources/tests/readability/lwn-1/source.html b/resources/tests/readability/lwn-1/source.html new file mode 100644 index 0000000..85eea11 --- /dev/null +++ b/resources/tests/readability/lwn-1/source.html @@ -0,0 +1,820 @@ + + + + + LWN.net Weekly Edition for March 26, 2015 [LWN.net] + + + + + + + + + + + + + + + + + + +
+
+ LWN.net Logo +
+

+ + +

+

+ + + + +
+ + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + + + +
+
+
+
+
+

LWN.net Weekly Edition for March 26, 2015

+
+

A trademark battle in the Arduino community

+
By Nathan Willis +
March 25, 2015
+

The Arduino has been one of the biggest success stories of the open-hardware movement, but that success does not protect it from internal conflict. In recent months, two of the project's founders have come into conflict about the direction of future efforts—and that conflict has turned into a legal dispute about who owns the rights to the Arduino trademark.

+

The current fight is a battle between two companies that both bear the Arduino name: Arduino LLC and Arduino SRL. The disagreements that led to the present state of affairs go back a bit further.

+

The Arduino project grew out of 2005-era course work taught at the Interaction Design Institute Ivrea (IDII) in Ivrea, Italy (using Processing, Wiring, and pre-existing microcontroller hardware). After the IDII program was discontinued, the open-hardware Arduino project as we know it was launched by Massimo Banzi, David Cuartielles, and David Mellis (who had worked together at IDII), with co-founders Tom Igoe and Gianluca Martino joining shortly afterward. The project released open hardware designs (including full schematics and design files) as well as the microcontroller software to run on the boards and the desktop IDE needed to program it.

+

Arduino LLC was incorporated in 2008 by Banzi, Cuartielles, Mellis, Igoe, and Martino. The company is registered in the United States, and it has continued to design the Arduino product line, develop the software, and run the Arduino community site. The hardware devices themselves, however, were manufactured by a separate company, "Smart Projects SRL," that was founded by Martino. "SRL" is essentially the Italian equivalent of "LLC"—Smart Projects was incorporated in Italy.

+

This division of responsibilities, with the main Arduino project handling everything except for board manufacturing, may seem like an odd one, but it is consistent with Arduino's marketing story. From its earliest days, the designs for the hardware have been freely available, and outside companies were allowed to make Arduino-compatible devices. The project has long run a certification program for third-party manufacturers interested in using the "Arduino" branding, but allows (and arguably even encourages) informal software and firmware compatibility.

+

The Arduino branding was not formally registered as a trademark in the early days, however. Arduino LLC filed to register the US trademark in April 2009, and it was granted in 2011.

+

At this point, the exact events begin to be harder to verify, but the original group of founders reportedly had a difference of opinion about how to license out hardware production rights to other companies. Wired Italy reports that Martino and Smart Projects resisted the other four founders' plans to "internationalize" production—although it is not clear if that meant that Smart Projects disapproved of licensing out any official hardware manufacturing to other companies, or had some other concern. Heise Online adds that the conflict seemed to be about moving some production to China.

+

What is clear is that Smart Projects filed a petition with the US Patent and Trademark Office (USPTO) in October 2014 asking the USPTO to cancel Arduino LLC's trademark on "Arduino." Then, in November 2014, Smart Projects changed its company's name to Arduino SRL. Somewhere around that time, Martino sold off his ownership stake in Smart Projects SRL and new owner Federico Musto was named CEO.

+

Unsurprisingly, Arduino LLC did not care for the petition to the USPTO and, in January 2015, the company filed a trademark-infringement lawsuit against Arduino SRL. Confusing matters further, the re-branded Arduino SRL has set up its own web site using the domain name arduino.org, which duplicates most of the site features found on the original Arduino site (arduino.cc). That includes both a hardware store and software downloads.

+

Musto, the new CEO of the company now called Arduino SRL, has a bit of a history with Arduino as well. His other manufacturing business had collaborated with Arduino LLC on the design and production of the Arduino Yún, which has received some criticism for including proprietary components.

+

Hackaday has run a two-part series (in February and March) digging into the ins and outs of the dispute, including the suggestion that Arduino LLC's recent release of version 1.6.0 of the Arduino IDE was a move intended to block Arduino SRL from hijacking IDE development. Commenter Paul Stoffregen (who was the author of the Heise story above) noted that Arduino SRL recently created a fork of the Arduino IDE on GitHub.

+

Most recently, Banzi broke his silence about the dispute in a story published at MAKEzine. There, Banzi claims that Martino secretly filed a trademark application on "Arduino" in Italy in 2008 and told none of the other Arduino founders. He also details a series of unpleasant negotiations between the companies, including Smart Projects stopping the royalty payments it had long sent to Arduino LLC for manufacturing devices and re-branding its boards with the Arduino.org URL.

+

Users appear to be stuck in the middle. Banzi says that several retail outlets that claim to be selling "official" Arduino boards are actually paying Arduino SRL, not Arduino LLC, but it is quite difficult to determine which retailers are lined up on which side, since there are (typically) several levels of supplier involved. The two Arduino companies' web sites also disagree about the available hardware, with Arduino.org offering the new Arduino Zero model for sale today and Arduino.cc listing it as "Coming soon."

+

Furthermore, as Hackaday's March story explains, the recently-released Arduino.cc IDE now reports that boards manufactured by Arduino SRL are "uncertified." That warning does not prevent users from programming the other company's hardware, but it will no doubt confuse quite a few users who believe they possess genuine Arduino-manufactured devices.

+

The USPTO page for Arduino SRL's petition notes pre-trial disclosure dates have been set for August and October of 2015 (for Arduino SRL and Arduino LLC, respectively), which suggests that this debate is far from over. Of course, it is always disappointing to observe a falling out between project founders, particularly when the project in question has had such an impact on open-source software and open hardware.

+

One could argue that disputes of this sort are proof that even small projects started among friends need to take legal and intellectual-property issues (such as trademarks) seriously from the very beginning—perhaps Arduino and Smart Projects thought that an informal agreement was all that was necessary in the early days, after all.

+

But, perhaps, once a project becomes profitable, there is simply no way to predict what might happen. Arduino LLC would seem to have a strong case for continual and rigorous use of the "Arduino" trademark, which is the salient point in US trademark law. It could still be a while before the courts rule on either side of that question, however.

+

Comments (5 posted)

+

+

Mapping and data mining with QGIS 2.8

+
By Nathan Willis +
March 25, 2015
+

QGIS is a free-software geographic information system (GIS) tool; it provides a unified interface in which users can import, edit, and analyze geographic-oriented information, and it can produce output as varied as printable maps or map-based web services. The project recently made its first update to be designated a long-term release (LTR), and that release is both poised for high-end usage and friendly to newcomers.

+

The new release is version 2.8, which was unveiled on March 2. An official change log is available on the QGIS site, while the release itself was announced primarily through blog posts (such as this post by Anita Graser of the project's steering committee). Downloads are available for a variety of platforms, including packages for Ubuntu, Debian, Fedora, openSUSE, and several other distributions.

+ [QGIS main interface] +

As the name might suggest, QGIS is a Qt application; the latest release will, in fact, build on both Qt4 and Qt5, although the binaries released by the project come only in Qt4 form at present. 2.8 has been labeled a long-term release (LTR)—which, in this case, means that the project has committed to providing backported bug fixes for one full calendar year, and that the 2.8.x series is in permanent feature freeze. The goal, according to the change log, is to provide a stable version suitable for businesses and deployments in other large organizations. The change log itself points out that the development of quite a few new features was underwritten by various GIS companies or university groups, which suggests that taking care of these organizations' needs is reaping dividends for the project.

+

For those new to QGIS (or GIS in general), there is a detailed new-user tutorial that provides a thorough walk-through of the data-manipulation, mapping, and analysis functions. Being a new user, I went through the tutorial; although there are a handful of minor differences between QGIS 2.8 and the version used in the text (primarily whether specific features were accessed through a toolbar or right-click menu), on the whole it is well worth the time.

+

QGIS is designed to make short work of importing spatially oriented data sets, mining information from them, and turning the results into a meaningful visualization. Technically speaking, the visualization output is optional: one could simply extract the needed statistics and results and use them to answer some question or, perhaps, publish the massaged data set as a database for others to use.

+

But well-made maps are often the easiest way to illuminate facts about populations, political regions, geography, and many other topics when human comprehension is the goal. QGIS makes importing data from databases, web-mapping services (WMS), and even unwieldy flat-file data dumps a painless experience. It handles converting between a variety of map-referencing systems more or less automatically, and allows the user to focus on finding the useful attributes of the data sets and rendering them on screen.

+

Here be data

+

The significant changes in QGIS 2.8 fall into several categories. There are updates to how QGIS handles the mathematical expressions and queries users can use to filter information out of a data set, improvements to the tools used to explore the on-screen map canvas, and enhancements to the "map composer" used to produce visual output. This is on top of plenty of other under-the-hood improvements, naturally.

+ [QGIS query builder] +

In the first category are several updates to the filtering tools used to mine a data set. Generally speaking, each independent data set is added to a QGIS project as its own layer, then transformed with filters to focus in on a specific portion of the original data. For instance, the land-usage statistics for a region might be one layer, while roads and buildings for the same region from OpenStreetMap might be two additional layers. Such filters can be created in several ways: there is a "query builder" that lets the user construct and test expressions on a data layer, then save the results, an SQL console for performing similar queries on a database, and spreadsheet-like editing tools for working directly on data tables.

+

All three have been improved in this release. New are support for if(condition, true, false) conditional statements, a set of operations for geometry primitives (e.g., to test whether regions overlap or lines intersect), and an "integer divide" operation. Users can also add comments to their queries to annotate their code, and there is a new custom function editor for writing Python functions that can be called in mathematical expressions within the query builder.

+

It is also now possible to select only some rows in a table, then perform calculations just on the selection—previously, users would have to extract the rows of interest into a new table first. Similarly, in the SQL editor, the user can highlight a subset of the SQL query and execute it separately, which is no doubt helpful for debugging.

+

There have also been several improvements to the Python and Processing plugins. Users can now drag-and-drop Python scripts onto QGIS and they will be run automatically. Several new analysis algorithms are now available through the Processing interface that were previously Python-only; they include algorithms for generating grids of points or vectors within a region, splitting layers and lines, generating hypsometric curves, refactoring data sets, and more.

+

Maps in, maps out

+ [QGIS simplify tool] +

The process of working with on-screen map data picked up some improvements in the new release as well. Perhaps the most fundamental is that each map layer added to the canvas is now handled in its own thread, so fewer hangs in the user interface are experienced when re-rendering a layer (as happens whenever the user changes the look of points or shapes in a layer). Since remote databases can also be layers, this multi-threaded approach is more resilient against connectivity problems, too. The interface also now supports temporary "scratch" layers that can be used to merge, filter, or simply experiment with a data set, but are not saved when the current project is saved.

+

For working on the canvas itself, polygonal regions can now use raster images (tiled, if necessary) as fill colors, the map itself can be rotated arbitrarily, and objects can be "snapped" to align with items on any layer (not just the current layer). For working with raster image layers (e.g., aerial photographs) or simply creating new geometric shapes by hand, there is a new digitizing tool that can offer assistance by locking lines to specific angles, automatically keeping borders parallel, and other niceties.

+

There is a completely overhauled "simplify" tool that is used to reduce the number of extraneous vertices of a vector layer (thus reducing its size). The old simplify tool provided only a relative "tolerance" setting that did not correspond directly to any units. With the new tool, users can set a simplification threshold in terms of the underlying map units, layer-specific units, pixels, and more—and, in addition, the tool reports how much the simplify operation has reduced the size of the data.

+ [QGIS style editing] +

There has also been an effort to present a uniform interface to one of the most important features of the map canvas: the ability to change the symbology used for an item based on some data attribute. The simplest example might be to change the line color of a road based on whether its road-type attribute is "highway," "service road," "residential," or so on. But the same feature is used to automatically highlight layer information based on the filtering and querying functionality discussed above. The new release allows many more map attributes to be controlled by these "data definition" settings, and provides a hard-to-miss button next to each attribute, through which a custom data definition can be set.

+

QGIS's composer module is the tool used to take project data and generate a map that can be used outside of the application (in print, as a static image, or as a layer for MapServer or some other software tool, for example). Consequently, it is not a simple select-and-click-export tool; composing the output can involve a lot of choices about which data to make visible, how (and where) to label it, and how to make it generally accessible.

+

The updated composer in 2.8 now has a full-screen mode and sports several new options for configuring output. For instance, the user now has full control over how map axes are labeled. In previous releases, the grid coordinates of the map could be turned on or off, but the only options were all or nothing. Now, the user can individually choose whether coordinates are displayed on all four sides, and can even choose in which direction vertical text labels will run (so that they can be correctly justified to the edge of the map, for example).

+

There are, as usual, many more changes than there is room to discuss. Some particularly noteworthy improvements include the ability to save and load bookmarks for frequently used data sources (perhaps most useful for databases, web services, and other non-local data) and improvements to QGIS's server module. This module allows one QGIS instance to serve up data accessible to other QGIS applications (for example, to simplify team projects). The server can now be extended with Python plugins and the data layers that it serves can be styled with style rules like those used in the desktop interface.

+

QGIS is one of those rare free-software applications that is both powerful enough for high-end work and yet also straightforward to use for the simple tasks that might attract a newcomer to GIS in the first place. The 2.8 release, particularly with its project-wide commitment to long-term support, appears to be an update well worth checking out, whether one needs to create a simple, custom map or to mine a database for obscure geo-referenced meaning.

+

Comments (3 posted)

+

+

Development activity in LibreOffice and OpenOffice

+
By Jonathan Corbet +
March 25, 2015
The LibreOffice project was announced with great fanfare in September 2010. Nearly one year later, the OpenOffice.org project (from which LibreOffice was forked) was cut loose from Oracle and found a new home as an Apache project. It is fair to say that the rivalry between the two projects in the time since then has been strong. Predictions that one project or the other would fail have not been borne out, but that does not mean that the two projects are equally successful. A look at the two projects' development communities reveals some interesting differences. +

+

Release histories

+

Apache OpenOffice has made two releases in the past year: 4.1 in April 2014 and 4.1.1 (described as "a micro update" in the release announcement) in August. The main feature added during that time would appear to be significantly improved accessibility support.

+

The release history for LibreOffice tells a slightly different story:

+

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Release      Date
4.2.3        April 2014
4.1.6        April 2014
4.2.4        May 2014
4.2.5        June 2014
4.3          July 2014
4.2.6        August 2014
4.3.1        August 2014
4.3.2        September 2014
4.2.7/4.3.3  October 2014
4.3.4        November 2014
4.2.8        December 2014
4.3.5        December 2014
4.4          January 2015
4.3.6        February 2015
4.4.1        February 2015
+
+

It seems clear that LibreOffice has maintained a rather more frenetic release cadence, generally putting out at least one release per month. The project typically keeps at least two major versions alive at any one time. Most of the releases are of the minor, bug-fix variety, but there have been two major releases in the last year as well.

+

+

Development statistics

+

In the one-year period since late March 2014, there have been 381 changesets committed to the OpenOffice Subversion repository. The most active committers are:

+

+
+ + + + + + + + + + +
Most active OpenOffice developers
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
By changesets
Herbert Dürr             63  16.6%
Jürgen Schmidt           56  14.7%
Armin Le Grand           56  14.7%
Oliver-Rainer Wittmann   46  12.1%
Tsutomu Uchino           33   8.7%
Kay Schenk               27   7.1%
Pedro Giffuni            23   6.1%
Ariel Constenla-Haile    22   5.8%
Andrea Pescetti          14   3.7%
Steve Yin                11   2.9%
Andre Fischer            10   2.6%
Yuri Dario                7   1.8%
Regina Henschel           6   1.6%
Juan C. Sanz              2   0.5%
Clarence Guo              2   0.5%
Tal Daniel                2   0.5%
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
By changed lines
Jürgen Schmidt          455499  88.1%
Andre Fischer            26148   3.8%
Pedro Giffuni            23183   3.4%
Armin Le Grand           11018   1.6%
Juan C. Sanz              4582   0.7%
Oliver-Rainer Wittmann    4309   0.6%
Andrea Pescetti           3908   0.6%
Herbert Dürr              2811   0.4%
Tsutomu Uchino            1991   0.3%
Ariel Constenla-Haile     1258   0.2%
Steve Yin                 1010   0.1%
Kay Schenk                 616   0.1%
Regina Henschel            417   0.1%
Yuri Dario                 268   0.0%
tal                         16   0.0%
Clarence Guo                11   0.0%
+
+
+

In truth, the above list is not just the most active OpenOffice developers — it is all of them; a total of 16 developers have committed changes to OpenOffice in the last year. Those developers changed 528,000 lines of code, but, as can be seen above, Jürgen Schmidt accounted for the bulk of those changes, which were mostly updates to translation files.

+

The top four developers in the "by changesets" column all work for IBM, so IBM is responsible for a minimum of about 60% of the changes to OpenOffice in the last year.

+

The picture for LibreOffice is just a little bit different; in the same one-year period, the project has committed 22,134 changesets from 268 developers. The most active of these developers were:

+

+
+ + + + + + + + + + +
Most active LibreOffice developers
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
By changesets
Caolán McNamara            4307  19.5%
Stephan Bergmann           2351  10.6%
Miklos Vajna               1449   6.5%
Tor Lillqvist              1159   5.2%
Noel Grandin               1064   4.8%
Markus Mohrhard             935   4.2%
Michael Stahl               915   4.1%
Kohei Yoshida               755   3.4%
Tomaž Vajngerl              658   3.0%
Thomas Arnhold              619   2.8%
Jan Holesovsky              466   2.1%
Eike Rathke                 457   2.1%
Matteo Casalin              442   2.0%
Bjoern Michaelsen           421   1.9%
Chris Sherlock              396   1.8%
David Tardon                386   1.7%
Julien Nabet                362   1.6%
Zolnai Tamás                338   1.5%
Matúš Kukan                 256   1.2%
Robert Antoni Buj Gelonch   231   1.0%
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
By changed lines
Lionel Elie Mamane         244062  12.5%
Noel Grandin               238711  12.2%
Stephan Bergmann           161220   8.3%
Miklos Vajna               129325   6.6%
Caolán McNamara             97544   5.0%
Tomaž Vajngerl              69404   3.6%
Tor Lillqvist               59498   3.1%
Laurent Balland-Poirier     52802   2.7%
Markus Mohrhard             50509   2.6%
Kohei Yoshida               45514   2.3%
Chris Sherlock              36788   1.9%
Peter Foley                 34305   1.8%
Christian Lohmaier          33787   1.7%
Thomas Arnhold              32722   1.7%
David Tardon                21681   1.1%
David Ostrovsky             21620   1.1%
Jan Holesovsky              20792   1.1%
Valentin Kettner            20526   1.1%
Robert Antoni Buj Gelonch   20447   1.0%
Michael Stahl               18216   0.9%
+
+
+

To a first approximation, the top ten companies supporting LibreOffice in the last year are:

+

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Companies supporting LibreOffice development
(by changesets)
Red Hat 8417 38.0%
Collabora Multimedia 6531 29.5%
(Unknown) 5126 23.2%
(None) 1490 6.7%
Canonical 422 1.9%
Igalia S.L. 80 0.4%
Ericsson 21 0.1%
Yandex 18 0.1%
FastMail.FM 17 0.1%
SUSE 7 0.0%
+
+

Development work on LibreOffice is thus concentrated in a small number of companies, though it is rather more spread out than OpenOffice development. It is worth noting that the LibreOffice developers with unknown affiliation, who contributed 23% of the changes, make up 82% of the developer base, so there would appear to be a substantial community of developers contributing from outside the above-listed companies.

+

+

Some conclusions

+

Last October, some concerns were raised on the OpenOffice list about the health of that project's community. At the time, Rob Weir shrugged them off as the result of a marketing effort by the LibreOffice crowd. There can be no doubt that the war of words between these two projects has gotten tiresome at times, but, looking at the above numbers, it is hard not to conclude that there is an issue that goes beyond marketing hype here.

+

In the 4½ years since its founding, the LibreOffice project has put together a community with over 250 active developers. There is support from multiple companies and an impressive rate of patches going into the project's repository. The project's ability to sustain nearly monthly releases on two branches is a direct result of that community's work. Swearing at LibreOffice is one of your editor's favorite pastimes, but it seems clear that the project is on a solid footing with a healthy community.

+

OpenOffice, instead, is driven by four developers from a single company — a company that appears to have been deemphasizing OpenOffice work for some time. As a result, the project's commit rate is a fraction of what LibreOffice is able to sustain and releases are relatively rare. As of this writing, the OpenOffice +blog shows no posts in 2015. In the October discussion, Rob said that "the dogs may +bark but the caravan moves on." That may be true, but, in this case, the caravan does not appear to be moving with any great speed.

+

Anything can happen in the free-software development world; it is entirely possible that a reinvigorated OpenOffice.org may yet give LibreOffice a run for its money. But something will clearly have to change to bring that future around. As things stand now, it is hard not to conclude that LibreOffice has won the battle for developer participation.

+

Comments (74 posted)

+

+

Page editor: Jonathan Corbet +

+

Inside this week's LWN.net Weekly Edition

+
    +
  • Security: Toward secure package downloads; New vulnerabilities in drupal, mozilla, openssl, python-django ...
  • +
  • Kernel: LSFMM coverage: NFS, defragmentation, epoll(), copy offload, and more.
  • +
  • Distributions: A look at Debian's 2015 DPL candidates; Debian, Fedora, ...
  • +
  • Development: A look at GlusterFS; LibreOffice Online; Open sourcing existing code; Secure Boot in Windows 10; ...
  • +
  • Announcements: A Turing award for Michael Stonebraker, Sébastien Jodogne, ReGlue are Free Software Award winners, Kat Walsh joins FSF board of directors, Cyanogen, ...
  • +
Next page: Security>> +
+ +
+
+
+
+ +
+
+

+ Copyright © 2015, Eklektix, Inc.
+ + Comments and public postings are copyrighted by their creators.
+ Linux is a registered trademark of Linus Torvalds
+

+
+ + + \ No newline at end of file diff --git a/resources/tests/readability/medium-1/expected.html b/resources/tests/readability/medium-1/expected.html new file mode 100644 index 0000000..7331fd8 --- /dev/null +++ b/resources/tests/readability/medium-1/expected.html @@ -0,0 +1,344 @@ +
+
+ + +

Better Student Journalism

+ + + +

We pushed out the first version of the Open Journalism site in January. Our goal is for the + site to be a place to teach students what they should know about journalism + on the web. It should be fun too.

+

Topics like mapping, security, command + line tools, and open source are + all concepts that should be made more accessible, and should be easily + understood at a basic level by all journalists. We’re focusing on students + because we know student journalism well, and we believe that teaching maturing + journalists about the web will provide them with an important lens to view + the world with. This is how we got to where we are now.

+

Circa 2011

+

In late 2011 I sat in the design room of our university’s student newsroom + with some of the other editors: Kate Hudson, Brent Rose, and Nicholas Maronese. + I was working as the photo editor then—something I loved doing. I was very + happy travelling and photographing people while listening to their stories.

+

Photography was my lucky way of experiencing the many types of people + my generation seemed to avoid, as well as many the public spends too much + time discussing. One of my habits as a photographer was scouring sites + like Flickr to see how others could frame the world in ways I hadn’t previously + considered.

+
+
+ +

+

+
+
topleftpixel.com
+
+

I started discovering beautiful things the web could do with images: + things not possible with print. Just as every generation revolts against + walking in the previous generation’s shoes, I found myself questioning the + expectations that I came up against as a photo editor. In our newsroom + the expectations were built from an outdated information world. We were + expected to fill old shoes.

+

So we sat in our student newsroom—not very happy with what we were doing. + Our weekly newspaper had remained essentially unchanged for 40+ years. + Each editorial position had the same requirement every year. The big change + happened in the 80s when the paper started using colour. We’d also stumbled + into having a website, but it was updated just once a week with the release + of the newspaper.

+

Information had changed form, but the student newsroom hadn’t, and it + was becoming harder to romanticize the dusty newsprint smell coming from + the shoes we were handed down from previous generations of editors. It + was, we were told, all part of “becoming a journalist.”

+
+
+ +

+

+
+
+

We don’t know what we don’t know

+

We spent much of the rest of the school year asking “what should we be + doing in the newsroom?”, which mainly led us to ask “how do we use the + web to tell stories?” It was a straightforward question that led to many + more questions about the web: something we knew little about. Out in the + real world, traditional journalists were struggling to keep their jobs + in a dying print world. They wore the same design of shoes that we were + supposed to fill. Being pushed to repeat old, failing strategies and blocked + from trying something new scared us.

+

We had questions, so we started doing some research. We talked with student + newsrooms in Canada and the United States, and filled too many Google Doc + files with notes. Looking at the notes now, they scream of fear. We annotated + our notes with naive solutions, often involving scrambled and immature + odysseys into the future of online journalism.

+

There was a lot we didn’t know. We didn’t know how to build a mobile app. + We didn’t know if we should build a mobile app. + We didn’t know how to run a server. + We didn’t know where to go to find a server. + We didn’t know how the web worked. + We didn’t know how people used the web to read news. + We didn’t know what news should be on the web. + If news is just information, what does that even look like?

+

We asked these questions to many students at other papers to get a consensus + of what had worked and what hadn’t. They reported similar questions and + fears about the web but followed with “print advertising is keeping us + afloat so we can’t abandon it”.

+

In other words, we knew that we should be building a newer pair of shoes, + but we didn’t know what the function of the shoes should be.

+

Common problems in student newsrooms (2011)

+

Our questioning of other student journalists in 15 student newsrooms brought + up a few repeating issues.

+
    +
  • Lack of mentorship
  • +
  • A news process that lacked consideration of the web
  • +
  • No editor/position specific to the web
  • +
  • Little exposure to many of the cool projects being put together by professional + newsrooms
  • +
  • Lack of diverse skills within the newsroom. Writers made up 95% of the + personnel. Students with other skills were not sought because journalism + was seen as “a career with words.” The other 5% were designers, designing + words on computers, for print.
  • +
  • Not enough discussion between the business side and web efforts
  • +
+
+
+ +

+

+
+
From our 2011 research
+
+

Common problems in student newsrooms (2013)

+

Two years later, we went back and looked at what had changed. We talked + to a dozen more newsrooms and weren’t surprised by our findings.

+
    +
  • Still no mentorship or link to professional newsrooms building stories + for the web
  • +
  • Very little control of website and technology
  • +
  • The lack of exposure that student journalists have to interactive storytelling. + While some newsrooms are in touch with what’s happening with the web and + journalism, there still exists a huge gap between the student newsroom + and its professional counterpart
  • +
  • No time in the current news development cycle for student newsrooms to + experiment with the web
  • +
  • Lack of skill diversity (specifically coding, interaction design, and + statistics)
  • +
  • Overly restricted access to student website technology. Changes are primarily + visual rather than functional.
  • +
  • Significantly reduced print production of many papers
  • +
  • Computers aren’t set up for experimenting with software and code, and + often locked down
  • +
+

Newsrooms have traditionally been covered in copies of The New York Times + or Globe and Mail. Instead, newsrooms should try to spend at least 20 minutes each + week going over the coolest/weirdest online storytelling in an effort to + expose each other to what is possible. “Hey, what has the New York Times R&D lab been up to this week?”

+

Instead of having computers that are locked down, try setting aside a + few office computers that allow students to play and “break”, or encourage + editors to buy their own Macbooks so they’re always able to practice with + code and new tools on their own.

+

From all this we realized that changing a student newsroom is difficult. + It takes patience. It requires that the business and editorial departments + of the student newsroom be on the same (web)page. The shoes of the future + must be different from the shoes we were given.

+

We need to rethink how long the new shoe design will be valid. It’s more + important that we focus on the process behind making footwear than on actually + creating a specific shoe. We shouldn’t be building a shoe to last 40 years. + Our footwear design process will allow us to change and adapt as technology + evolves. The media landscape will change, so having a newsroom that can + change with it will be critical.

+

We are building a shoe machine, not a shoe. +

+ +

A train or light at the end of the tunnel: are student newsrooms changing for the better?

+ +

In our 2013 research we found that almost 50% of student newsrooms had + created roles specifically for the web. This sounds great, but is still problematic in its current state. +

+
+
+ +

+

+
+
We designed many of these slides to help explain to ourselves what we were doing +
+
+

When a newsroom decides to create a position for the web, it’s often with + the intent of having content flow steadily from writers onto the web. This + is a big improvement from just uploading stories to the web whenever there + is a print issue. However… +

+
    +
  1. +The handoff +
    Problems arise because web editors are given roles that absolve the rest + of the editors from thinking about the web. All editors should be involved + in the process of story development for the web. While it’s a good idea + to have one specific editor manage the website, contributors and editors + should all play with and learn about the web. Instead of “can you make + a computer do XYZ for me?”, we should be saying “can you show me how to + make a computer do XYZ?”
  2. +
  3. +Not just social media
    A + web editor could do much more than simply being in charge of the social + media accounts for the student paper. Their responsibility could include + teaching all other editors to be listening to what’s happening online. + The web editor can take advantage of live information to change how the + student newsroom reports news in real time.
  4. +
  5. +Web (interactive) editor
    The + goal of having a web editor should be for someone to build and tell stories + that take full advantage of the web as their medium. Too often the web’s + interactivity is not considered when developing the story. The web then + ends up as a resting place for print words.
  6. +
+

Editors at newsrooms are still figuring out how to convince writers of + the benefit to having their content online. There’s still a stronger draw + to writers seeing their name in print than on the web. Showing writers + that their stories can be told in new ways to larger audiences is a convincing + argument that the web is a starting point for telling a story, not its + graveyard.

+

When everyone in the newsroom approaches their website with the intention + of using it to explore the web as a medium, they all start to ask “what + is possible?” and “what can be done?” You can’t expect students to think + in terms of the web if it’s treated as a place for print words to hang + out on a web page.

+

We’re OK with this problem, if we see newsrooms continue to take small + steps towards having all their editors involved in the stories for the + web.

+
+
+ +

+

+
+
The current Open Journalism site was a few years in the making. This was + an original launch page we used in 2012
+
+

What we know

+
    +
  • +New process +
    Our rough research has told us newsrooms need to be reorganized. This + includes every part of the newsroom’s workflow: from where a story and + its information comes from, to thinking of every word, pixel, and interaction + the reader will have with your stories. If I were a photo editor who wanted + to re-think my process with digital tools in mind, I’d start by asking + “how are photo assignments processed and sent out?”, “how do we receive + images?”, “what formats do images need to be exported in?”, “what type + of screens will the images be viewed on?”, and “how are the designers getting + these images?” Making a student newsroom digital isn’t about producing + “digital manifestos”; it’s about being curious enough that you’ll want + to continue experimenting with your process until you’ve found one that + fits your newsroom’s needs.
  • +
  • +More (remote) mentorship +
    Lack of mentorship is still a big problem. Google’s fellowship program is great. The fact that it + only caters to United States students isn’t. There are only a handful of + internships in Canada where students interested in journalism can get experience + writing code and building interactive stories. We’re OK with this for now, + as we expect internships and mentorship over the next 5 years between professional + newsrooms and student newsrooms will only increase. It’s worth noting that + some of that mentorship will likely be done remotely.
  • +
  • +Changing a newsroom culture +
    Skill diversity needs to change. We encourage every student newsroom we + talk to, to start building a partnership with their school’s Computer Science + department. It will take some work, but you’ll find there are many CS undergrads + that love playing with web technologies, and using data to tell stories. + Changing who is in the newsroom should be one of the first steps newsrooms + take to changing how they tell stories. The same goes for getting designers + who understand the wonderful interactive elements of the web and students + who love statistics and exploring data. Getting students who are amazing + at design, data, code, words, and images into one room is one of the coolest + experiences I’ve had. Everyone benefits from a more diverse newsroom.
  • +
+

What we don’t know

+
    +
  • +Sharing curiosity for the web +
    We don’t know how to best teach students about the web. It’s not efficient + for us to teach coding classes. We do go into newsrooms and get them running + their first code exercises, but if someone wants to learn to program, we + can only provide the initial push and curiosity. We will be trying out + “labs” with a few schools next school year to hopefully get a better idea + of how to teach students about the web.
  • +
  • +Business +
    We don’t know how to convince the business side of student papers that + they should invest in the web. At the very least we’re able to explain + that having students graduate with their current skill set is painful in + the current job market.
  • +
  • +The future +
    We don’t know what journalism or the web will be like in 10 years, but + we can start encouraging students to keep an open mind about the skills + they’ll need. We’re less interested in preparing students for the current + newsroom climate, than we are in teaching students to have the ability + to learn new tools quickly as they come and go.
  • +
+
+
+

What we’re trying to share with others

+
    +
  • +A concise guide to building stories for the web +
    There are too many options to get started. We hope to provide an opinionated + guide that draws on our experiences, research, and observations from + trying to teach our peers.
  • +
+

Student newsrooms don’t have investors to please. Student newsrooms can + change their website every week if they want to try a new design or interaction. + As long as students start treating the web as a different medium, and start + building stories around that idea, then we’ll know we’re moving forward.

+

A note to professional news orgs

+

We’re also asking professional newsrooms to be more open about their process + of developing stories for the web. You play a big part in this. This means + writing about it, and sharing code. We need to start building a bridge + between student journalism and professional newsrooms.

+
+
+ +

+

+
+
2012
+
+

This is a start

+

We’re going to continue slowly growing the content on Open Journalism. We still consider this the beta version, + but expect to polish it and beef up the content for a real launch at the + beginning of the summer.

+

We expect to have more original tutorials as well as the beginnings of + what a curriculum may look like that a student newsroom can adopt to start + guiding their transition to become a web-first newsroom. We’re also going + to be working with the Queen’s Journal and + The Ubyssey next school year to better understand how to make the student + newsroom a place for experimenting with telling stories on the web. If + this sounds like a good idea in your newsroom, we’re still looking to add + 1 more school.

+

We’re trying out some new shoes. And while they’re not self-lacing, and + smell a bit different, we feel lacing up a new pair of kicks can change + a lot.

+
+
+ +

+

+
+
+ +

Let’s talk. Let’s listen. +

+

We’re still in the early stages of what this project will look like, so if you want to help or have thoughts, let’s talk. +

+

pippin@pippinlee.com +

+ + +

This isn’t supposed to be a + manifesto™© + we just think it’s pretty cool to share what we’ve learned so far, and hope you’ll do the same. We’re all in this together. +

+
+
diff --git a/resources/tests/readability/medium-1/source.html b/resources/tests/readability/medium-1/source.html new file mode 100644 index 0000000..63be920 --- /dev/null +++ b/resources/tests/readability/medium-1/source.html @@ -0,0 +1,705 @@ + + + + + + + The Open Journalism Project: Better Student Journalism — Medium + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+ +
+
+
+
+
Ready to publish?
+
Change the story’s title, subtitle, and visibility as needed
+
+
+
+
+
+
+
+ +
+
+
+
+
+
+
+ + + + +
+
+
+
+
+ + + +
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ +
+
+
+
+

Open Journalism Project:

+

+
+

+

Better Student Journalism

+

+
+

+


+

+
+

+

We pushed out the first version of the Open Journalism site in January. Our goal is for the + site to be a place to teach students what they should know about journalism + on the web. It should be fun too.

+

Topics like mapping, security, command + line tools, and open source are + all concepts that should be made more accessible, and should be easily + understood at a basic level by all journalists. We’re focusing on students + because we know student journalism well, and we believe that teaching maturing + journalists about the web will provide them with an important lens to view + the world with. This is how we got to where we are now.

+

Circa 2011

+

In late 2011 I sat in the design room of our university’s student newsroom + with some of the other editors: Kate Hudson, Brent Rose, and Nicholas Maronese. + I was working as the photo editor then—something I loved doing. I was very + happy travelling and photographing people while listening to their stories.

+

Photography was my lucky way of experiencing the many types of people + my generation seemed to avoid, as well as many the public spends too much + time discussing. One of my habits as a photographer was scouring sites + like Flickr to see how others could frame the world in ways I hadn’t previously + considered.

+
+
+
+ +
+
topleftpixel.com
+
+

I started discovering beautiful things the web could do with images: + things not possible with print. Just as every generation revolts against + walking in the previous generation’s shoes, I found myself questioning the + expectations that I came up against as a photo editor. In our newsroom + the expectations were built from an outdated information world. We were + expected to fill old shoes.

+

So we sat in our student newsroom—not very happy with what we were doing. + Our weekly newspaper had remained essentially unchanged for 40+ years. + Each editorial position had the same requirement every year. The big change + happened in the 80s when the paper started using colour. We’d also stumbled + into having a website, but it was updated just once a week with the release + of the newspaper.

+

Information had changed form, but the student newsroom hadn’t, and it + was becoming harder to romanticize the dusty newsprint smell coming from + the shoes we were handed down from previous generations of editors. It + was, we were told, all part of “becoming a journalist.”

+
+
+
+ +
+
+

We don’t know what we don’t know

+

We spent much of the rest of the school year asking “what should we be + doing in the newsroom?”, which mainly led us to ask “how do we use the + web to tell stories?” It was a straightforward question that led to many + more questions about the web: something we knew little about. Out in the + real world, traditional journalists were struggling to keep their jobs + in a dying print world. They wore the same design of shoes that we were + supposed to fill. Being pushed to repeat old, failing strategies and blocked + from trying something new scared us.

+

We had questions, so we started doing some research. We talked with student + newsrooms in Canada and the United States, and filled too many Google Doc + files with notes. Looking at the notes now, they scream of fear. We annotated + our notes with naive solutions, often involving scrambled and immature + odysseys into the future of online journalism.

+

There was a lot we didn’t know. We didn’t know how to build a mobile app. + We didn’t know if we should build a mobile app. + We didn’t know how to run a server. + We didn’t know where to go to find a server. + We didn’t know how the web worked. + We didn’t know how people used the web to read news. + We didn’t know what news should be on the web. + If news is just information, what does that even look like?

+

We asked these questions to many students at other papers to get a consensus + of what had worked and what hadn’t. They reported similar questions and + fears about the web but followed with “print advertising is keeping us + afloat so we can’t abandon it”.

+

In other words, we knew that we should be building a newer pair of shoes, + but we didn’t know what the function of the shoes should be.

+

Common problems in student newsrooms (2011)

+

Our questioning of other student journalists in 15 student newsrooms brought + up a few repeating issues.

+
    +
  • Lack of mentorship
  • +
  • A news process that lacked consideration of the web
  • +
  • No editor/position specific to the web
  • +
  • Little exposure to many of the cool projects being put together by professional + newsrooms
  • +
  • Lack of diverse skills within the newsroom. Writers made up 95% of the + personnel. Students with other skills were not sought because journalism + was seen as “a career with words.” The other 5% were designers, designing + words on computers, for print.
  • +
  • Not enough discussion between the business side and web efforts
  • +
+
+
+
+ +
+
From our 2011 research
+
+

Common problems in student newsrooms (2013)

+

Two years later, we went back and looked at what had changed. We talked + to a dozen more newsrooms and weren’t surprised by our findings.

+
    +
  • Still no mentorship or link to professional newsrooms building stories + for the web
  • +
  • Very little control of website and technology
  • +
  • The lack of exposure that student journalists have to interactive storytelling. + While some newsrooms are in touch with what’s happening with the web and + journalism, there still exists a huge gap between the student newsroom + and its professional counterpart
  • +
  • No time in the current news development cycle for student newsrooms to + experiment with the web
  • +
  • Lack of skill diversity (specifically coding, interaction design, and + statistics)
  • +
  • Overly restricted access to student website technology. Changes are primarily + visual rather than functional.
  • +
  • Significantly reduced print production of many papers
  • +
  • Computers aren’t set up for experimenting with software and code, and + often locked down
  • +
+

Newsrooms have traditionally been covered in copies of The New York Times + or Globe and Mail. Instead, newsrooms should try to spend at least 20 minutes each + week going over the coolest/weirdest online storytelling in an effort to + expose each other to what is possible. “Hey, what has the New York Times R&D lab been up to this week?”

+

Instead of having computers that are locked down, try setting aside a + few office computers that allow students to play and “break”, or encourage + editors to buy their own Macbooks so they’re always able to practice with + code and new tools on their own.

+

From all this we realized that changing a student newsroom is difficult. + It takes patience. It requires that the business and editorial departments + of the student newsroom be on the same (web)page. The shoes of the future + must be different from the shoes we were given.

+

We need to rethink how long the new shoe design will be valid. It’s more + important that we focus on the process behind making footwear than on actually + creating a specific shoe. We shouldn’t be building a shoe to last 40 years. + Our footwear design process will allow us to change and adapt as technology + evolves. The media landscape will change, so having a newsroom that can + change with it will be critical.

+

We are building a shoe machine, not a shoe. +

+

+
+

+

A train or light at the end of the tunnel: are student newsrooms changing for the better?

+

+
+

+

In our 2013 research we found that almost 50% of student newsrooms had + created roles specifically for the web. This sounds great, but is still problematic in its current state. +

+
+
+
+ +
+
We designed many of these slides to help explain to ourselves what we were doing +
+
+

When a newsroom decides to create a position for the web, it’s often with + the intent of having content flow steadily from writers onto the web. This + is a big improvement from just uploading stories to the web whenever there + is a print issue. However… +

+
    +
  1. The handoff +
    Problems arise because web editors are given roles that absolve the rest + of the editors from thinking about the web. All editors should be involved + in the process of story development for the web. While it’s a good idea + to have one specific editor manage the website, contributors and editors + should all play with and learn about the web. Instead of “can you make + a computer do XYZ for me?”, we should be saying “can you show me how to + make a computer do XYZ?”
  2. +
  3. Not just social media
    A + web editor could do much more than simply being in charge of the social + media accounts for the student paper. Their responsibility could include + teaching all other editors to be listening to what’s happening online. + The web editor can take advantage of live information to change how the + student newsroom reports news in real time.
  4. +
  5. Web (interactive) editor
    The + goal of having a web editor should be for someone to build and tell stories + that take full advantage of the web as their medium. Too often the web’s + interactivity is not considered when developing the story. The web then + ends up as a resting place for print words.
  6. +
+

Editors at newsrooms are still figuring out how to convince writers of + the benefit to having their content online. There’s still a stronger draw + to writers seeing their name in print than on the web. Showing writers + that their stories can be told in new ways to larger audiences is a convincing + argument that the web is a starting point for telling a story, not its + graveyard.

+

When everyone in the newsroom approaches their website with the intention + of using it to explore the web as a medium, they all start to ask “what + is possible?” and “what can be done?” You can’t expect students to think + in terms of the web if it’s treated as a place for print words to hang + out on a web page.

+

We’re OK with this problem, if we see newsrooms continue to take small + steps towards having all their editors involved in the stories for the + web.

+
+
+
+ +
+
The current Open Journalism site was a few years in the making. This was + an original launch page we used in 2012
+
+

What we know

+
    +
  • New process +
    Our rough research has told us newsrooms need to be reorganized. This + includes every part of the newsroom’s workflow: from where a story and + its information comes from, to thinking of every word, pixel, and interaction + the reader will have with your stories. If I were a photo editor who wanted + to re-think my process with digital tools in mind, I’d start by asking + “how are photo assignments processed and sent out?”, “how do we receive + images?”, “what formats do images need to be exported in?”, “what type + of screens will the images be viewed on?”, and “how are the designers getting + these images?” Making a student newsroom digital isn’t about producing + “digital manifestos”; it’s about being curious enough that you’ll want + to continue experimenting with your process until you’ve found one that + fits your newsroom’s needs.
  • +
  • More (remote) mentorship +
    Lack of mentorship is still a big problem. Google’s fellowship program is great. The fact that it + only caters to United States students isn’t. There are only a handful of + internships in Canada where students interested in journalism can get experience + writing code and building interactive stories. We’re OK with this for now, + as we expect internships and mentorship over the next 5 years between professional + newsrooms and student newsrooms will only increase. It’s worth noting that + some of that mentorship will likely be done remotely.
  • +
  • Changing a newsroom culture +
    Skill diversity needs to change. We encourage every student newsroom we + talk to, to start building a partnership with their school’s Computer Science + department. It will take some work, but you’ll find there are many CS undergrads + that love playing with web technologies, and using data to tell stories. + Changing who is in the newsroom should be one of the first steps newsrooms + take to changing how they tell stories. The same goes with getting designers + who understand the wonderful interactive elements of the web and students + who love statistics and exploring data. Getting students who are amazing + at design, data, code, words, and images into one room is one of the coolest + experiences I’ve had. Everyone benefits from a more diverse newsroom.
  • +
+

What we don’t know

+
    +
  • Sharing curiosity for the web +
    We don’t know how to best teach students about the web. It’s not efficient + for us to teach coding classes. We do go into newsrooms and get them running + their first code exercises, but if someone wants to learn to program, we + can only provide the initial push and curiosity. We will be trying out + “labs” with a few schools next school year to hopefully get a better idea + of how to teach students about the web.
  • +
  • Business +
    We don’t know how to convince the business side of student papers that + they should invest in the web. At the very least we’re able to explain + that having students graduate with their current skill set is painful in + the current job market.
  • +
  • The future +
    We don’t know what journalism or the web will be like in 10 years, but + we can start encouraging students to keep an open mind about the skills + they’ll need. We’re less interested in preparing students for the current + newsroom climate, than we are in teaching students to have the ability + to learn new tools quickly as they come and go.
  • +
+
+
+
+
+
+ +
+
Another slide from 2012 website
+
+
+
+

What we’re trying to share with others

+
    +
  • A concise guide to building stories for the web +
    There are too many options to get started. We hope to provide an opinionated + guide that follows both our experiences, research, and observations from + trying to teach our peers.
  • +
+

Student newsrooms don’t have investors to please. Student newsrooms can + change their website every week if they want to try a new design or interaction. + As long as students start treating the web as a different medium, and start + building stories around that idea, then we’ll know we’re moving forward.

+

A note to professional news orgs

+

We’re also asking professional newsrooms to be more open about their process + of developing stories for the web. You play a big part in this. This means + writing about it, and sharing code. We need to start building a bridge + between student journalism and professional newsrooms.

+
+
+
+ +
+
2012
+
+

This is a start

+

We’re going to continue slowly growing the content on Open Journalism. We still consider this the beta version, + but expect to polish it, and beef up the content for a real launch at the + beginning of the summer.

+

We expect to have more original tutorials as well as the beginnings of + what a curriculum may look like that a student newsroom can adopt to start + guiding their transition to become a web first newsroom. We’re also going + to be working with the Queen’s Journal and + The Ubyssey next school year to better understand how to make the student + newsroom a place for experimenting with telling stories on the web. If + this sounds like a good idea in your newsroom, we’re still looking to add + 1 more school.

+

We’re trying out some new shoes. And while they’re not self-lacing, and + smell a bit different, we feel lacing up a new pair of kicks can change + a lot.

+
+
+
+ +
+
+

+
+

+

Let’s talk. Let’s listen. +

+

We’re still in the early stages of what this project will look like, so if you want to help or have thoughts, let’s talk. +

+

pippin@pippinlee.com +

+

+
+

+

+
+

+

This isn’t supposed to be a + manifesto™© + we just think it’s pretty cool to share what we’ve learned so far, and hope you’ll do the same. We’re all in this together. +

+
+
+
+
+
+ +
+
+
+
+
+
+
+
+
+
+
+ + + + + diff --git a/resources/tests/readability/medium-2/expected.html b/resources/tests/readability/medium-2/expected.html new file mode 100644 index 0000000..413bc78 --- /dev/null +++ b/resources/tests/readability/medium-2/expected.html @@ -0,0 +1,25 @@ +
+

+
Words need defenders.

On Behalf of “Literally”

+

You either are a “literally” abuser or know of one. If you’re anything like me, hearing the word “literally” used incorrectly causes a little piece of your soul to whither and die. Of course I do not mean that literally, I mean that figuratively. An abuser would have said: “Every time a person uses that word, a piece of my soul literally withers and dies.” Which is terribly, horribly wrong.

+

For whatever bizarre reason, people feel the need to use literally as a sort of verbal crutch. They use it to emphasize a point, which is silly because they’re already using an analogy or a metaphor to illustrate said point. For example: “Ugh, I literally tore the house apart looking for my remote control!” No, you literally did not tear apart your house, because it’s still standing. If you’d just told me you “tore your house apart” searching for your remote, I would’ve understood what you meant. No need to add “literally” to the sentence.

+

Maybe I should define literally.

+
Literally means actually. When you say something literally happened, you’re describing the scene or situation as it actually happened.
+

So you should only use literally when you mean it. It should not be used in hyperbole. Example: “That was so funny I literally cried.” Which is possible. Some things are funny enough to elicit tears. Note the example stops with “literally cried.” You cannot literally cry your eyes out. The joke wasn’t so funny your eyes popped out of their sockets.

+

When in Doubt, Leave it Out

+

“I’m so hungry I could eat a horse,” means you’re hungry. You don’t need to say “I’m so hungry I could literally eat a horse.” Because you can’t do that in one sitting, I don’t care how big your stomach is.

+

“That play was so funny I laughed my head off,” illustrates the play was amusing. You don’t need to say you literally laughed your head off, because then your head would be on the ground and you wouldn’t be able to speak, much less laugh.

+

“I drove so fast my car was flying,” we get your point: you were speeding. But your car is never going fast enough to fly, so don’t say your car was literally flying.

+

Insecurities?

+

Maybe no one believed a story you told as a child, and you felt the need to prove that it actually happened. No really, mom, I literally climbed the tree. In efforts to prove truth, you used literally to describe something real, however outlandish it seemed. Whatever the reason, now your overuse of literally has become a habit.

+

Hard Habit to Break?

+

Abusing literally isn’t as bad a smoking, but it’s still an unhealthy habit (I mean that figuratively). Help is required in order to break it.

+

This is my version of an intervention for literally abusers. I’m not sure how else to do it other than in writing. I know this makes me sound like a know-it-all, and I accept that. But there’s no excuse other than blatant ignorance to misuse the word “literally.” So just stop it.

+

Don’t say “Courtney, this post is so snobbish it literally burned up my computer.” Because nothing is that snobbish that it causes computers to combust. Or: “Courtney, your head is so big it literally cannot get through the door.” Because it can, unless it’s one of those tiny doors from Alice in Wonderland and I need to eat a mushroom to make my whole body smaller.

+

No One’s Perfect

+

And I’m not saying I am. I’m trying to restore meaning to a word that’s lost meaning. I’m standing up for literally. It’s a good word when used correctly. People are butchering it and destroying it every day (figuratively speaking) and the massacre needs to stop. Just as there’s a coalition of people against the use of certain fonts (like Comic Sans and Papyrus), so should there be a coalition of people against the abuse of literally.

+

Saying it to Irritate?

+

Do you misuse the word “literally” just to annoy your know-it-all or grammar police friends/acquaintances/total strangers? If so, why? Doing so would be like me going outside when it’s freezing, wearing nothing but a pair of shorts and t-shirt in hopes of making you cold by just looking at me. Who suffers more?

+

Graphical Representation

+

Matthew Inman of “The Oatmeal” wrote a comic about literally. Abusers and defenders alike should check it out. It’s clear this whole craze about literally is driving a lot of us nuts. You literally abusers are killing off pieces of our souls. You must be stopped, or the world will be lost to meaninglessness forever. Figuratively speaking.

+
diff --git a/resources/tests/readability/medium-2/source.html b/resources/tests/readability/medium-2/source.html new file mode 100644 index 0000000..dcec2b1 --- /dev/null +++ b/resources/tests/readability/medium-2/source.html @@ -0,0 +1,14 @@ +On Behalf of “Literally” — Medium + +
+ + sidebar-open-28px + + + + + +
+
Ready to publish?
Change the story’s title, subtitle, and visibility as needed

Words need defenders.

On Behalf of “Literally”

You either are a “literally” abuser or know of one. If you’re anything like me, hearing the word “literally” used incorrectly causes a little piece of your soul to whither and die. Of course I do not mean that literally, I mean that figuratively. An abuser would have said: “Every time a person uses that word, a piece of my soul literally withers and dies.” Which is terribly, horribly wrong.

For whatever bizarre reason, people feel the need to use literally as a sort of verbal crutch. They use it to emphasize a point, which is silly because they’re already using an analogy or a metaphor to illustrate said point. For example: “Ugh, I literally tore the house apart looking for my remote control!” No, you literally did not tear apart your house, because it’s still standing. If you’d just told me you “tore your house apart” searching for your remote, I would’ve understood what you meant. No need to add “literally” to the sentence.

Maybe I should define literally.

Literally means actually. When you say something literally happened, you’re describing the scene or situation as it actually happened.

So you should only use literally when you mean it. It should not be used in hyperbole. Example: “That was so funny I literally cried.” Which is possible. Some things are funny enough to elicit tears. Note the example stops with “literally cried.” You cannot literally cry your eyes out. The joke wasn’t so funny your eyes popped out of their sockets.

When in Doubt, Leave it Out

“I’m so hungry I could eat a horse,” means you’re hungry. You don’t need to say “I’m so hungry I could literally eat a horse.” Because you can’t do that in one sitting, I don’t care how big your stomach is.

“That play was so funny I laughed my head off,” illustrates the play was amusing. You don’t need to say you literally laughed your head off, because then your head would be on the ground and you wouldn’t be able to speak, much less laugh.

“I drove so fast my car was flying,” we get your point: you were speeding. But your car is never going fast enough to fly, so don’t say your car was literally flying.

Insecurities?

Maybe no one believed a story you told as a child, and you felt the need to prove that it actually happened. No really, mom, I literally climbed the tree. In efforts to prove truth, you used literally to describe something real, however outlandish it seemed. Whatever the reason, now your overuse of literally has become a habit.

Hard Habit to Break?

Abusing literally isn’t as bad a smoking, but it’s still an unhealthy habit (I mean that figuratively). Help is required in order to break it.

This is my version of an intervention for literally abusers. I’m not sure how else to do it other than in writing. I know this makes me sound like a know-it-all, and I accept that. But there’s no excuse other than blatant ignorance to misuse the word “literally.” So just stop it.

Don’t say “Courtney, this post is so snobbish it literally burned up my computer.” Because nothing is that snobbish that it causes computers to combust. Or: “Courtney, your head is so big it literally cannot get through the door.” Because it can, unless it’s one of those tiny doors from Alice in Wonderland and I need to eat a mushroom to make my whole body smaller.

No One’s Perfect

And I’m not saying I am. I’m trying to restore meaning to a word that’s lost meaning. I’m standing up for literally. It’s a good word when used correctly. People are butchering it and destroying it every day (figuratively speaking) and the massacre needs to stop. Just as there’s a coalition of people against the use of certain fonts (like Comic Sans and Papyrus), so should there be a coalition of people against the abuse of literally.

Saying it to Irritate?

Do you misuse the word “literally” just to annoy your know-it-all or grammar police friends/acquaintances/total strangers? If so, why? Doing so would be like me going outside when it’s freezing, wearing nothing but a pair of shorts and t-shirt in hopes of making you cold by just looking at me. Who suffers more?

Graphical Representation

Matthew Inman of “The Oatmeal” wrote a comic about literally. Abusers and defenders alike should check it out. It’s clear this whole craze about literally is driving a lot of us nuts. You literally abusers are killing off pieces of our souls. You must be stopped, or the world will be lost to meaninglessness forever. Figuratively speaking.


Originally published at www.courtneykirchoff.com on November 18, 2011. Sadly this solo post did not stop the abuse of literally. Help the word out. Recommend this article to your literally abusers in your life.

diff --git a/resources/tests/readability/medium-3/expected.html b/resources/tests/readability/medium-3/expected.html new file mode 100644 index 0000000..7b336d9 --- /dev/null +++ b/resources/tests/readability/medium-3/expected.html @@ -0,0 +1,694 @@ +
+
+
+

John C. Welch +

+ +
+

+ How to get shanked doing what people say they want +

+
+

+ don’t preach to me
+ Mr. integrity +

+
+

+ (EDIT: removed the link to Samantha’s post, because the arments and the grubers and the rest of The Deck Clique got what they wanted: a non-proper person driven off the internet lightly capped with a dusting of transphobia along the way, all totally okay because the ends justify the means, and it’s okay when “good” people do it.) +

+

+ First, I need to say something about this article: the reason I’m writing it infuriates me. Worse than installing CS 3 or Acrobat 7 ever did, and the former inspired comparisons to fecophile porn. I’m actually too mad to cuss. Well, not completely, but in this case, I don’t think the people I’m mad at are worth the creativity I try to put into profanity. This is about a brownfield of hypocrisy and viciously deliberate mischaracterization that “shame” cannot even come close to the shame those behind it should feel. +

+

+ Now, read this post by Samantha Bielefeld: The Elephant in the Room. First, it is a well-written critical piece that raises a few points in a calm, rational, nonconfrontational fashion, exactly the kind of things the pushers of The Great Big Lie say we need more of, as opposed to the screaming that is the norm in such cases. +

+

+ …sorry, I should explain “The Great Big Lie”. There are several, but in this case, our specific instance of “The Great Big Lie” is about criticism. Over and over, you hear from the very people I am not going to be nice to in this that we need “better” criticism. Instead of rage and anger, volume and vitriol, we need in-depth rational criticism, that isn’t personal or ad hominem. That it should focus on points, not people. +

+

+ That, readers, is “The Big Lie”. It is a lie so big that if one ponders the reality of it, as I am going to, one wonders why anyone would believe it. It is a lie and it is one we should stop telling. +

+
+
+

+ Samantha’s points (I assume you read it, for you are smart people who know the importance of such things) are fairly clear: +

+
    +
  1. With the release of Overcast 2.0, a product Samantha actually likes, Marco Arment moved to a patronage model that will probably be successful for him. +
  2. +
  3. Arment’s insistence that “anyone can do this” while technically true, (anyone can in fact, implement this pricing model), also implies that “anyone” can have the kind of success that a developer with Marco’s history, financial status, and deep ties to the Apple News Web is expected to have. This is silly. +
  4. +
  5. Marco Arment occupies a fairly unique position in the Apple universe, (gained by hard work and no small talent), and because of that, benefits from a set of privileges that a new developer or even one that has been around for a long time, but isn’t, well, Marco, not only don’t have, but have little chance of attaining anytime soon. +
  6. +
  7. Marco has earned his success and is entitled to the benefits and privileges it brings, but he seems rather blind to all of that, and seems to still imagine himself as “two guys in a garage”. This is just not correct. +
  8. +
  9. In addition, the benefits and privileges of the above ensure that by releasing Overcast 2 as a free app, with patronage pricing, he has, if not gutted, severely hurt the ability of folks actually selling their apps for an up-front price of not free to continue doing so. This has the effect of accelerating the “race to the bottom” in the podcast listening app segment, which hurts devs who cannot afford to work on a “I don’t really need this money, so whatever you feel like sending is okay” model. +
  10. +
+

+ None of this is incorrect. None of this is an ad hominem attack in any way. It is just pointing out that a developer of Arment’s stature and status lives in a very different world than someone in East Frog Balls, Arkansas trying to make a living off of App sales. Our dev in EFB doesn’t have the main sites on the Apple web falling all over themselves to review their app the way that Arment does. They’re not friends with the people behind The Loop, Daring Fireball, SixColors, iMore, The Mac Observer, etc., yadda. +

+

+ So, our hero, in a fit of well-meaning ignorance writes this piece (posted this morning, 14 Oct. 15) and of course, the response and any criticisms are just as reasonable and thoughtful. +

+

+ If you really believe that, you are the most preciously ignorant person in the world, and can I have your seriously charmed life. +

+
+
+

+ The response, from all quarters, including Marco, someone who is so sensitive to criticism that the word “useless” is enough to shut him down, who blocked a friend of mine for the high crime of pointing out that his review of podcasting mics centered around higher priced gear and ignored folks without the scratch, who might not be ready for such things, is, in a single word, disgusting. Vomitous even. +

+

+ It’s an hours-long dogpile that beggars even my imagination, and I can imagine almost anything. Seriously, it’s all there in Samantha’s Twitter Feed. From what I can tell, she’s understandably shocked over it. I however was not. This one comment in her feed made me smile (warning, this wanders a bit…er…LOT. Twitter timelines are not easy to put together): +

+
+

+ I can see why you have some reservations about publishing it, but my gut feeling is that he would take it better than Nilay. +

+
+

+ Oh honey, bless your sweet, ignorant heart. Marco is one of the biggest pushers of The Big Lie, and one of the reasons it is such a lie. +

+

+ But it gets better. First, you have the “hey, Marco earned his status!” lot. A valid point, and one Bielefeld explicitly acknowledges, here: +

+
+

+ From his ground floor involvement in Tumblr (for which he is now a millionaire), to the creation and sale of a wildly successful app called Instapaper, he has become a household name in technology minded circles. It is this extensive time spent in the spotlight, the huge following on Twitter, and dedicated listeners of his weekly aired Accidental Tech Podcast, that has granted him the freedom to break from seeking revenue in more traditional manners. +

+
+

+ and here: +

+
+

+ I’m not knocking his success, he has put effort into his line of work, and has built his own life. +

+
+

+ and here: +

+
+

+ He has earned his time in the spotlight, and it’s only natural for him to take advantage of it. +

+
+

+ But still, you get the people telling her something she already acknowledged: +

+
+

+ I don’t think he’s blind. he’s worked to where he has gotten and has had failures like everyone else. +

+
+

+ Thank you for restating something in the article. To the person who wrote it. +

+

+ In the original article, Samantha talked about the money Marco makes from his podcast. She based that on the numbers provided by ATP in terms of sponsorship rates and the number of current sponsors the podcast has. Is this going to yield perfect numbers? No. But the numbers you get from it will at least be reasonable, or should be unless the published sponsorship rates are just fantasy, and you’re stupid for taking them seriously. +

+

+ At first, she went with a simple formula: +

+
+

+ $4K x 3 per episode = $12K x 52 weeks / 3 hosts splitting it. +

+
+

+ That’s not someone making shit up, right? Rather quickly, someone pointed out that she’d made an error in how she calculated it: +

+
+

+ That’s $4k per ad, no? So more like $12–16k per episode. +

+
+

+ She’d already realized her mistake and fixed it. +

+
+

+ which is actually wrong, and I’m correcting now. $4,000 per sponsor, per episode! So, $210,000 per year. +

+
+

+ Again, this is based on publicly available data, the only kind someone not part of ATP or a close friend of Arment has access to. So while her numbers may be wrong, if they are, there’s no way for her to know that. She’s basing her opinion on actual available data. Which is sadly rare. +

+

+ This becomes a huge flashpoint. You name a reason to attack her over this, people do. No really. For example, she’s not calculating his income taxes correctly: +

+
+

+ especially since it isn’t his only source of income thus, not an indicator of his marginal inc. tax bracket. +

+

+ thus, guessing net income is more haphazard than stating approx. gross income. +

+
+

+ Ye Gods. She’s not doing his taxes for him, her point is invalid? +

+

+ Then there’s the people who seem to have not read anything past what other people are telling them: +

+
+

+ Not sure what to make of your Marco piece, to be honest. You mention his fame, whatever, but what’s the main idea here? +

+
+

+ Just how spoon-fed do you have to be? Have you no teeth? +

+

+ Of course, Marco jumps in, and predictably, he’s snippy: +

+
+

+ If you’re going to speak in precise absolutes, it’s best to first ensure that you’re correct. +

+
+

+ If you’re going to be like that, it’s best to provide better data. Don’t get snippy when someone is going off the only data available, and is clearly open to revising based on better data. +

+

+ Then Marco’s friends/fans get into it: +

+
+

+ I really don’t understand why it’s anyone’s business +

+
+

+ Samantha is trying to qualify for sainthood at this point: +

+
+

+ It isn’t really, it was a way of putting his income in context in regards to his ability to gamble with Overcast. +

+
+

+ Again, she’s trying to drag people back to her actual point, but no one is going to play. The storm has begun. Then we get people who are just spouting nonsense: +

+
+

+ Why is that only relevant for him? It’s a pretty weird metric,especially since his apps aren’t free. +

+
+

+ Wha?? Overcast 2 is absolutely free. Samantha points this out: +

+
+

+ His app is free, that’s what sparked the article to begin with. +

+
+

+ The response is literally a parallel to “How can there be global warming if it snowed today in my town?” +

+
+

+ If it’s free, how have I paid for it? Twice? +

+
+

+ She is still trying: +

+
+

+ You paid $4.99 to unlock functionality in Overcast 1.0 and you chose to support him with no additional functionality in 2.0 +

+
+

+ He is having none of it. IT SNOWED! SNOWWWWWWW! +

+
+

+ Yes. That’s not free. Free is when you choose not to make money. And that can be weaponized. But that’s not what Overcast does. +

+
+

+ She however, is relentless: +

+
+

+ No, it’s still free. You can choose to support it, you are required to pay $4.99 for Pocket Casts. Totally different model. +

+
+

+ Dude seems to give up. (Note: allllll the people bagging on her are men. All of them. Mansplaining like hell. And I’d bet every one of them considers themselves a feminist.) +

+

+ We get another guy trying to push the narrative she’s punishing him for his success, which is just…it’s stupid, okay? Stupid. +

+
+

+ It also wasn’t my point in writing my piece today, but it seems to be everyone’s focus. +

+
+

+ (UNDERSTATEMENT OF THE YEAR) +

+
+

+ I think the focus should be more on that fact that while it’s difficult, Marco spent years building his audience. +

+

+ It doesn’t matter what he makes it how he charges. If the audience be earned is willing to pay for it, awesome. +

+
+

+ She tries, oh lord, she tries: +

+
+

+ To assert that he isn’t doing anything any other dev couldn’t, is wrong. It’s successful because it’s Marco. +

+
+

+ But no, HE KNOWS HER POINT BETTER THAN SHE DOES: +

+
+

+ No, it’s successful because he busted his ass to make it so. It’s like any other business. He grew it. +

+
+

+ Christ. This is like a field of strawmen. Stupid ones. Very stupid ones. +

+

+ One guy tries to blame it all on Apple, another in a string of Wha??? moments: +

+
+

+ the appropriate context is Apple’s App Store policies. Other devs aren’t Marco’s responsibility +

+
+

+ Seriously? Dude, are you even trying to talk about what Samantha actually wrote? At this point, Samantha is clearly mystified at the entire thing: +

+
+

+ Why has the conversation suddenly turned to focus on nothing more than ATP sponsorship income? +

+
+

+ Because it’s a nit they can pick and allows them to ignore everything you wrote. That’s the only reason. +

+

+ One guy is “confused”: +

+
+

+ I see. He does have clout, so are you saying he’s too modest in how he sees himself as a dev? +

+

+ Yes. He can’t be equated to the vast majority of other developers. Like calling Gruber, “just another blogger”. +

+

+ Alright, that’s fair. I was just confused by the $ and fame angle at first. +

+
+

+ Samantha’s point centers on the benefits Marco gains via his fame and background. HOW DO YOU NOT MENTION THAT? HOW IS THAT CONFUSING? +

+

+ People of course are telling her it’s her fault for mentioning a salient fact at all: +

+
+

+ Why has the conversation suddenly turned to focus on nothing more than ATP sponsorship income? +

+

+ Maybe because you went there with your article? +

+

+ As a way of rationalizing his ability to gamble with the potential for Overcast to generate income…not the norm at all. +

+
+

+ Of course, had she not brought up those important points, she’d have been bagged on for “not providing proof”. Lose some, lose more. By now, she’s had enough and she just deletes all mention of it. Understandable, but sad she was bullied into doing that. +

+

+ Yes, bullied. That’s all this is. Bullying. She didn’t lie, cheat, or exaggerate. If her numbers were wrong, they weren’t wrong in a way she had any ability to do anything about. But there’s blood in the water, and the comments and attacks get worse: +

+
+

+ Because you decided to start a conversation about someone else’s personal shit. You started this war. +

+
+

+ War. THIS. IS. WAR. +

+

+ This is a bunch of nerds attacking someone for reasoned, calm, polite criticism of their friend/idol. Samantha is politely pushing back a bit: +

+
+

+ That doesn’t explain why every other part of my article is being pushed aside. +

+
+

+ She’s right. This is all nonsense. This is people ignoring her article completely, just looking for things to attack so it can be dismissed. It’s tribalism at its purest. +

+

+ Then some of the other anointed get into it, including Jason Snell in one of the most spectacular displays of “I have special knowledge you can’t be expected to have, therefore you are totally off base and wrong, even though there’s no way for you to know this” I’ve seen in a while. Jason: +

+
+

+ You should never use an ad rate card to estimate ad revenue from any media product ever. +

+

+ I learned this when I started working for a magazine — rate cards are mostly fiction, like prices on new cars +

+
+

+ How…exactly…in the name of whatever deity Jason may believe in…is Samantha or anyone not “in the biz” supposed to know this? Also, what exactly does a magazine on paper like Macworld have to do with sponsorships for a podcast? I have done podcasts that were sponsored, and I can retaliate with “we charged what the rate card said we did. Checkmate Elitists!” +

+

+ Samantha basically abases herself at his feet: +

+
+

+ I understand my mistake, and it’s unfortunate that it has completely diluted the point of my article. +

+
+

+ I think she should have told him where and how to stuff that nonsense, but she’s a nicer person than I am. Also, it’s appropriate that Jason’s twitter avatar has its nose in the air. This is some rank snobbery. It’s disgusting and if anyone pulled that on him, Jason would be very upset. But hey, one cannot criticize The Marco without getting pushback. By “pushback”, I mean “an unrelenting fecal flood”. +

+

+ Her only mistake was criticizing one of the Kool Kids. Folks, if you criticize anyone in The Deck Clique, or their friends, expect the same thing, regardless of tone or point. +

+

+ Another App Dev, seemingly unable to parse Samantha’s words, needs more explanation: +

+
+

+ so just looking over your mentions, I’m curious what exactly was your main point? Ignoring the podcast income bits. +

+
+

+ Oh wait, he didn’t even read the article. Good on you, Dev Guy, good. on. you. Still, she plays nice with someone who didn’t even read her article: +

+
+

+ That a typical unknown developer can’t depend on patronage to generate revenue, and charging for apps will become a negative. +

+
+

+ Marco comes back of course, and now basically accuses her of lying about other devs talking to her and supporting her point: +

+
+

+ How many actual developers did you hear from, really? Funny how almost nobody wants to give a (real) name on these accusations. +

+
+

+ Really? You’re going to do that? “There’s no name, so I don’t think it’s a real person.” Just…what’s the Joe Welch quote from the McCarthy hearings? +

+
+

+ Let us not assassinate this lad further, Senator. You’ve done enough. Have you no sense of decency, sir? At long last, have you left no sense of decency? +

+
+

+ That is what this is at this point: character assassination because she said something critical of A Popular Person. It’s disgusting. Depressing and disgusting. No one, none of these people have seriously discussed her point, heck, it looks like they barely bothered to read it, if they did at all. +

+

+ Marco starts getting really petty with her (no big shock) and Samantha finally starts pushing back: +

+
+

+ Glad to see you be the bigger person and ignore the mindset of so many developers not relating to you, good for you! +

+
+

+ That of course, is what caused Marco to question the validity, if not the existence of her sources. (Funny how anonymous sources are totes okay when they convenience Marco et al, and work for oh, Apple, but when they are inconvenient? Ha! PROVIDE ME PROOF YOU INTEMPERATE WOMAN!) +

+

+ Make no mistake, there’s some sexist shit going on here. Every tweet I’ve quoted was authored by a guy. +

+

+ Of course, Marco has to play the “I’ve been around longer than you” card with this bon mot: +

+
+

+ Yup, before you existed! +

+
+

+ Really dude? I mean, I’m sorry about the penis, but really? +

+

+ Mind you, when the criticism isn’t just bizarrely stupid, Samantha reacts the way Marco and his ilk claim they would (if they ever got any valid criticism. Which clearly is impossible): +

+
+

+ Not to get into the middle of this, but “income” is not the term you’re looking for. “Revenue” is. +

+

+ lol. Noted. +

+

+ And I wasn’t intending to be a dick, just a lot of people hear/say “income” when they intend “revenue”, and then discussion … +

+

+ … gets derailed by a jedi handwave of “Expenses”. But outside of charitable donation, it is all directly related. +

+

+ haha. Thank you for the clarification. +

+
+

+ Note to Marco and the other…whatever they are…that is how one reacts to that kind of criticism. With a bit of humor and self-deprecation. You should try it sometime. For real, not just in your heads or conversations in Irish Pubs in S.F. +

+

+ But now, the door has been cracked, and the cheap shots come out: +

+
+

+ @testflight_app: Don’t worry guys, we process @marcoarment’s apps in direct proportion to his megabucks earnings. #fairelephant +

+
+

+ (Note: testflight_app is a parody account. Please do not mess with the actual testflight folks. They are still cool.) +

+

+ Or this…conversation: +

+
+
+

Image for post +

+
+
+

+ Good job guys. Good job. Defend the tribe. Attack the other. Frederico attempts to recover from his stunning display of demeaning douchery: ‏@viticci: @s_bielefeld I don’t know if it’s an Italian thing, but counting other people’s money is especially weird for me. IMO, bad move in the post. +

+

+ Samantha is clearly sick of his crap: ‏@s_bielefeld: @viticci That’s what I’m referring to, the mistake of ever having mentioned it. So, now, Marco can ignore the bigger issue and go on living. +

+

+ Good for her. There’s being patient and being roadkill. +

+

+ Samantha does put the call out for her sources to maybe let her use their names: +

+
+

+ From all of you I heard from earlier, anyone care to go on record? +

+
+

+ My good friend, The Angry Drunk points out the obvious problem: +

+
+

+ Nobody’s going to go on record when they count on Marco’s friends for their PR. +

+
+

+ This is true. Again, the sites that are Friends of Marco: +

+

+ Daring Fireball +

+

+ The Loop +

+

+ SixColors +

+

+ iMore +

+

+ MacStories +

+

+ A few others, but I want this post to end one day. +

+

+ You piss that crew off, and given how petty rather a few of them have demonstrated they are, good luck on getting any kind of notice from them. +

+

+ Of course, the idea this could happen is just craycray: +

+
+

+ @KevinColeman .@Angry_Drunk @s_bielefeld @marcoarment Wow, you guys are veering right into crazy conspiracy theory territory. #JetFuelCantMeltSteelBeams +

+
+

+ Yeah. Because a mature person like Marco would never do anything like that. +

+

+ Of course, the real point on this is starting to happen: +

+
+

+ you’re getting a lot of heat now but happy you are writing things that stir up the community. Hope you continue to be a voice! +

+

+ I doubt I will. +

+
+

+ See, they’ve done their job. Mess with the bull, you get the horns. Maybe you should find another thing to write about, this isn’t a good place for you. Great job y’all. +

+

+ Some people aren’t even pretending. They’re just in full strawman mode: +

+
+

+ @timkeller: Unfair to begrudge a person for leveraging past success, especially when that success is earned. No ‘luck’ involved. +

+

+ @s_bielefeld: @timkeller I plainly stated that I don’t hold his doing this against him. Way to twist words. +

+
+

+ I think she’s earned her anger at this point. +

+

+ Don’t worry, Marco knows what the real problem is: most devs just suck — +

+
+
+

Image for post +

+
+
+

+ I have a saying that applies in this case: don’t place your head so far up your nethers that you go full Klein Bottle. Marco has gone full Klein Bottle. (To be correct, he went FKB some years ago.) +

+

+ There are some bright spots. My favorite is when Building Twenty points out the real elephant in the room: +

+
+

+ @BuildingTwenty: Both @s_bielefeld & I wrote similar critiques of @marcoarment’s pricing model yet the Internet pilloried only the woman. Who’d have guessed? +

+
+

+ Yup. +

+

+ Another bright spot are these comments from Ian Betteridge, who has been doing this even longer than Marco: +

+
+

+ You know, any writer who has never made a single factual error in a piece hasn’t ever written anything worth reading. +

+

+ I learned my job with the support of people who helped me. Had I suffered an Internet pile on for every error I wouldn’t have bothered. +

+
+

+ To which Samantha understandably replies: +

+
+

+ and it’s honestly something I’m contemplating right now, whether to continue… +

+
+

+ Gee, I can’t imagine why. Why with comments like this from Chris Breen that completely misrepresent Samantha’s point, (who until today, I would have absolutely defended as being better than this, something I am genuinely saddened to be wrong about), why wouldn’t she want to continue doing this? +

+
+

+ If I have this right, some people are outraged that a creator has decided to give away his work. +

+
+

+ No Chris, you don’t have this right. But hey, who has time to find out the real issue and read an article. I’m sure your friends told you everything you need to know. +

+

+ Noted Feminist Glenn Fleishman gets a piece of the action too: +

+
+
+

Image for post +

+
+
+

+ I’m not actually surprised here. I watched Fleishman berate a friend of mine who has been an engineer for…heck, waaaaay too long on major software products in the most condescending way because she tried to point out that as a very technical woman, “The Magazine” literally had nothing to say to her and maybe he should fix that. “Impertinent” was I believe what he called her, but I may have the specific word wrong. Not the attitude mind you. Great Feminists like Glenn do not like uppity women criticizing Great Feminists who are their Great Allies. +

+

+ Great Feminists are often tools. +

+
+
+

+ Luckily, I hope, the people who get Samantha’s point also started chiming in (and you get 100% of the women commenting here that I’ve seen): +

+
+

+ I don’t think he’s wrong for doing it, he just discusses it as if the market’s a level playing field — it isn’t +

+

+ This is a great article with lots of great points about the sustainability of iOS development. Thank you for publishing it. +

+

+ Regardless of the numbers and your view of MA, fair points here about confirmation bias in app marketing feasibility http://samanthabielefeld.com/the-elephant-in-the-room … +

+

+ thank you for posting this, it covers a lot of things people don’t like to talk about. +

+

+ I’m sure you have caught untold amounts of flak over posting this because Marco is blind to his privilege as a developer. +

+

+ Catching up on the debate, and agreeing with Harry’s remark. (Enjoyed your article, Samantha, and ‘got’ your point.) +

+
+
+
+

+ I would like to say I’m surprised at the reaction to Samantha’s article, but I’m not. In spite of his loud declarations of support for The Big Lie, Marco Arment is as bad at any form of criticism that he hasn’t already approved as a very insecure tween. An example from 2011: http://www.businessinsider.com/marco-arment-2011-9 +

+

+ Marco is great with criticism as long as it never actually criticizes him. If it does, be prepared for a flood of petty, petulant whining that a room full of bored preschoolers on a hot day would be hard-pressed to match. +

+

+ Today has been…well, it sucks. It sucks because someone doing what all the Arments of the world claim to want was naive enough to believe what they were told, and found out the hard way just how big a lie The Big Lie is, and how vicious people are when you’re silly enough to believe anything they say about criticism. +

+

+ And note again, every single condescending crack, misrepresentation, and strawman had an exclusively male source. Most of them have, at one point or another, loudly trumpeted themselves as Feminist Allies, as a friend to women struggling with the sexism and misogyny in tech. Congratulations y’all on being just as bad as the people you claim to oppose. +

+

+ Samantha has handled this better than anyone else could have. My respect for her as a person and a writer is off the charts. If she chooses to walk away from blogging in the Apple space, believe me I understand. As bad as today was for her, I’ve seen worse. Much worse. +

+

+ But I hope she doesn’t. I hope she stays, because she is Doing This Right, and in a corner of the internet that has become naught but an endless circle jerk, a cliquish collection, a churlish, childish cohort interested not in writing or the truth, but in making sure The Right People are elevated, and The Others put down, she is someone worth reading and listening to. The number of people who owe her apologies goes around the block, and I don’t think she’ll ever see a one. I’m sure as heck not apologizing for them, I’ll not make their lives easier in the least. +

+

+ All of you, all. of. you…Marco, Breen, Snell, Viticci, had a chance to live by your words. You were faced with reasoned, polite, respectful criticism and instead of what you should have done, you all dropped trou and sprayed an epic diarrheal discharge all over someone who had done nothing to deserve it. Me, I earned most of my aggro, Samantha did not earn any of the idiocy I’ve seen today. I hope you’re all proud of yourselves. Someone should be, it won’t be me. Ever. +

+

+ So I hope she stays, but if she goes, I understand. For what it’s worth, I don’t think she’s wrong either way. +

+
+
diff --git a/resources/tests/readability/medium-3/source.html b/resources/tests/readability/medium-3/source.html new file mode 100644 index 0000000..1bd8d82 --- /dev/null +++ b/resources/tests/readability/medium-3/source.html @@ -0,0 +1,1645 @@ + + + + + + + + Samantha and The Great Big Lie. How to get shanked doing what people… | by John C. Welch | Medium + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+ +
+ + +
+
+ + +
+
+
+
+
+
+
+
+
+

+ Samantha and The Great Big Lie +

+
+
+
+
+ John C. Welch +
+
+
+
+
+ John C. Welch +
+ +
+
+
+
+
+ Oct 15, 2015 · 18 min read +
+
+
+
+
+
+ +
+
+ +
+
+ +
+
+
+ + +
+
+
+
+
+
+
+
+

+ How to get shanked doing what people say they want +

+
+

+ don’t preach to me
+ Mr. integrity +

+
+

+ (EDIT: removed the link to Samantha’s post, because the arments and the grubers and the rest of The Deck Clique got what they wanted: a non-proper person driven off the internet lightly capped with a dusting of transphobia along the way, all totally okay because the ends justify the means, and it’s okay when “good” people do it.) +

+

+ First, I need to say something about this article: the reason I’m writing it infuriates me. Worse than installing CS 3 or Acrobat 7 ever did, and the former inspired comparisons to fecophile porn. I’m actually too mad to cuss. Well, not completely, but in this case, I don’t think the people I’m mad at are worth the creativity I try to put into profanity. This is about a brownfield of hypocrisy and viciously deliberate mischaracterization that “shame” cannot even come close to the shame those behind it should feel. +

+

+ Now, read this post by Samantha Bielefeld: The Elephant in the Room. First, it is a well-written critical piece that raises a few points in a calm, rational, nonconfrontational fashion, exactly the kind of things the pushers of The Great Big Lie say we need more of, as opposed to the screaming that is the norm in such cases. +

+

+ …sorry, I should explain “The Great Big Lie”. There are several, but in this case, our specific instance of “The Great Big Lie” is about criticism. Over and over, you hear from the very people I am not going to be nice to in this that we need “better” criticism. Instead of rage and anger, volume and vitriol, we need in-depth rational criticism, that isn’t personal or ad hominem. That it should focus on points, not people. +

+

+ That, readers, is “The Big Lie”. It is a lie so big that if one ponders the reality of it, as I am going to, one wonders why anyone would believe it. It is a lie and it is one we should stop telling. +

+
+
+
+
+
+
+
+

+ Samantha’s points (I assume you read it, for you are smart people who know the importance of such things) are fairly clear: +

+
    +
  1. With the release of Overcast 2.0, a product Samantha actually likes, Marco Arment moved to a patronage model that will probably be successful for him. +
  2. +
  3. Arment’s insistence that “anyone can do this” while technically true, (anyone can in fact, implement this pricing model), also implies that “anyone” can have the kind of success that a developer with Marco’s history, financial status, and deep ties to the Apple News Web is expected to have. This is silly. +
  4. +
  5. Marco Arment occupies a fairly unique position in the Apple universe, (gained by hard work and no small talent), and because of that, benefits from a set of privileges that a new developer or even one that has been around for a long time, but isn’t, well, Marco, not only don’t have, but have little chance of attaining anytime soon. +
  6. +
  7. Marco has earned his success and is entitled to the benefits and privileges it brings, but he seems rather blind to all of that, and seems to still imagine himself as “two guys in a garage”. This is just not correct. +
  8. +
  9. In addition, the benefits and privileges of the above ensure that by releasing Overcast 2 as a free app, with patronage pricing, he has, if not gutted, severely hurt the ability of folks actually selling their apps for an up-front price of not free to continue doing so. This has the effect of accelerating the “race to the bottom” in the podcast listening app segment, which hurts devs who cannot afford to work on a “I don’t really need this money, so whatever you feel like sending is okay” model. +
  10. +
+

+ None of this is incorrect. None of this is an ad hominem attack in any way. It is just pointing out that a developer of Arment’s stature and status lives in a very different world than someone in East Frog Balls, Arkansas trying to make a living off of App sales. Our dev in EFB doesn’t have the main sites on the Apple web falling all over themselves to review their app the way that Arment does. They’re not friends with the people behind The Loop, Daring Fireball, SixColors, iMore, The Mac Observer, etc., yadda. +

+

+ So, our hero, in a fit of well-meaning ignorance writes this piece (posted this morning, 14 Oct. 15) and of course, the response and any criticisms are just as reasonable and thoughtful. +

+

+ If you really believe that, you are the most preciously ignorant person in the world, and can I have your seriously charmed life. +

+
+
+
+
+
+
+
+

+ The response, from all quarters, including Marco, someone who is so sensitive to criticism that the word “useless” is enough to shut him down, who blocked a friend of mine for the high crime of pointing out that his review of podcasting mics centered around higher priced gear and ignored folks without the scratch, who might not be ready for such things, is, in a single word, disgusting. Vomitous even. +

+

+ It’s an hours-long dogpile that beggars even my imagination, and I can imagine almost anything. Seriously, it’s all there in Samantha’s Twitter Feed. From what I can tell, she’s understandably shocked over it. I however was not. This one comment in her feed made me smile (warning, this wanders a bit…er…LOT. Twitter timelines are not easy to put together): +

+
+

+ I can see why you have some reservations about publishing it, but my gut feeling is that he would take it better than Nilay. +

+
+

+ Oh honey, bless your sweet, ignorant heart. Marco is one of the biggest pushers of The Big Lie, and one of the reasons it is such a lie. +

+

+ But it gets better. First, you have the “hey, Marco earned his status!” lot. A valid point, and one Bielefeld explicitly acknowledges, here: +

+
+

+ From his ground floor involvement in Tumblr (for which he is now a millionaire), to the creation and sale of a wildly successful app called Instapaper, he has become a household name in technology minded circles. It is this extensive time spent in the spotlight, the huge following on Twitter, and dedicated listeners of his weekly aired Accidental Tech Podcast, that has granted him the freedom to break from seeking revenue in more traditional manners. +

+
+

+ and here: +

+
+

+ I’m not knocking his success, he has put effort into his line of work, and has built his own life. +

+
+

+ and here: +

+
+

+ He has earned his time in the spotlight, and it’s only natural for him to take advantage of it. +

+
+

+ But still, you get the people telling her something she already acknowledged: +

+
+

+ I don’t think he’s blind. he’s worked to where he has gotten and has had failures like everyone else. +

+
+

+ Thank you for restating something in the article. To the person who wrote it. +

+

+ In the original article, Samantha talked about the money Marco makes from his podcast. She based that on the numbers provided by ATP in terms of sponsorship rates and the number of current sponsors the podcast has. Is this going to yield perfect numbers? No. But the numbers you get from it will at least be reasonable, or should be unless the published sponsorship rates are just fantasy, and you’re stupid for taking them seriously. +

+

+ At first, she went with a simple formula: +

+
+

+ $4K x 3 per episode = $12K x 52 weeks / 3 hosts splitting it. +

+
+

+ That’s not someone making shit up, right? Rather quickly, someone pointed out that she’d made an error in how she calculated it: +

+
+

+ That’s $4k per ad, no? So more like $12–16k per episode. +

+
+

+ She’d already realized her mistake and fixed it. +

+
+

+ which is actually wrong, and I’m correcting now. $4,000 per sponsor, per episode! So, $210,000 per year. +

+
+

+ Again, this is based on publicly available data, the only kind someone not part of ATP or a close friend of Arment has access to. So while her numbers may be wrong, if they are, there’s no way for her to know that. She’s basing her opinion on actual available data. Which is sadly rare. +

+

+ This becomes a huge flashpoint. You name a reason to attack her over this, people do. No really. For example, she’s not calculating his income taxes correctly: +

+
+

+ especially since it isn’t his only source of income thus, not an indicator of his marginal inc. tax bracket. +

+

+ thus, guessing net income is more haphazard than stating approx. gross income. +

+
+

+ Ye Gods. She’s not doing his taxes for him, her point is invalid? +

+

+ Then there’s the people who seem to have not read anything past what other people are telling them: +

+
+

+ Not sure what to make of your Marco piece, to be honest. You mention his fame, whatever, but what’s the main idea here? +

+
+

+ Just how spoon-fed do you have to be? Have you no teeth? +

+

+ Of course, Marco jumps in, and predictably, he’s snippy: +

+
+

+ If you’re going to speak in precise absolutes, it’s best to first ensure that you’re correct. +

+
+

+ If you’re going to be like that, it’s best to provide better data. Don’t get snippy when someone is going off the only data available, and is clearly open to revising based on better data. +

+

+ Then Marco’s friends/fans get into it: +

+
+

+ I really don’t understand why it’s anyone’s business +

+
+

+ Samantha is trying to qualify for sainthood at this point: +

+
+

+ It isn’t really, it was a way of putting his income in context in regards to his ability to gamble with Overcast. +

+
+

+ Again, she’s trying to drag people back to her actual point, but no one is going to play. The storm has begun. Then we get people who are just spouting nonsense: +

+
+

+ Why is that only relevant for him? It’s a pretty weird metric,especially since his apps aren’t free. +

+
+

+ Wha?? Overcast 2 is absolutely free. Samantha points this out: +

+
+

+ His app is free, that’s what sparked the article to begin with. +

+
+

+ The response is literally a parallel to “How can there be global warming if it snowed today in my town?” +

+
+

+ If it’s free, how have I paid for it? Twice? +

+
+

+ She is still trying: +

+
+

+ You paid $4.99 to unlock functionality in Overcast 1.0 and you chose to support him with no additional functionality in 2.0 +

+
+

+ He is having none of it. IT SNOWED! SNOWWWWWWW! +

+
+

+ Yes. That’s not free. Free is when you choose not to make money. And that can be weaponized. But that’s not what Overcast does. +

+
+

+ She however, is relentless: +

+
+

+ No, it’s still free. You can choose to support it, you are required to pay $4.99 for Pocket Casts. Totally different model. +

+
+

+ Dude seems to give up. (Note: allllll the people bagging on her are men. All of them. Mansplaining like hell. And I’d bet every one of them considers themselves a feminist.) +

+

+ We get another guy trying to push the narrative she’s punishing him for his success, which is just…it’s stupid, okay? Stupid. +

+
+

+ It also wasn’t my point in writing my piece today, but it seems to be everyone’s focus. +

+
+

+ (UNDERSTATEMENT OF THE YEAR) +

+
+

+ I think the focus should be more on that fact that while it’s difficult, Marco spent years building his audience. +

+

+ It doesn’t matter what he makes it how he charges. If the audience be earned is willing to pay for it, awesome. +

+
+

+ She tries, oh lord, she tries: +

+
+

+ To assert that he isn’t doing anything any other dev couldn’t, is wrong. It’s successful because it’s Marco. +

+
+

+ But no, HE KNOWS HER POINT BETTER THAN SHE DOES: +

+
+

+ No, it’s successful because he busted his ass to make it so. It’s like any other business. He grew it. +

+
+

+ Christ. This is like a field of strawmen. Stupid ones. Very stupid ones. +

+

+ One guy tries to blame it all on Apple, another in a string of Wha??? moments: +

+
+

+ the appropriate context is Apple’s App Store policies. Other devs aren’t Marco’s responsibility +

+
+

+ Seriously? Dude, are you even trying to talk about what Samantha actually wrote? At this point, Samantha is clearly mystified at the entire thing: +

+
+

+ Why has the conversation suddenly turned to focus on nothing more than ATP sponsorship income? +

+
+

+ Because it’s a nit they can pick and allows them to ignore everything you wrote. That’s the only reason. +

+

+ One guy is “confused”: +

+
+

+ I see. He does have clout, so are you saying he’s too modest in how he sees himself as a dev? +

+

+ Yes. He can’t be equated to the vast majority of other developers. Like calling Gruber, “just another blogger”. +

+

+ Alright, that’s fair. I was just confused by the $ and fame angle at first. +

+
+

+ Samantha’s point centers on the benefits Marco gains via his fame and background. HOW DO YOU NOT MENTION THAT? HOW IS THAT CONFUSING? +

+

+ People of course are telling her it’s her fault for mentioning a salient fact at all: +

+
+

+ Why has the conversation suddenly turned to focus on nothing more than ATP sponsorship income? +

+

+ Maybe because you went there with your article? +

+

+ As a way of rationalizing his ability to gamble with the potential for Overcast to generate income…not the norm at all. +

+
+

+ Of course, had she not brought up those important points, she’d have been bagged on for “not providing proof”. Lose some, lose more. By now, she’s had enough and she just deletes all mention of it. Understandable, but sad she was bullied into doing that. +

+

+ Yes, bullied. That’s all this is. Bullying. She didn’t lie, cheat, or exaggerate. If her numbers were wrong, they weren’t wrong in a way she had any ability to do anything about. But there’s blood in the water, and the comments and attacks get worse: +

+
+

+ Because you decided to start a conversation about someone else’s personal shit. You started this war. +

+
+

+ War. THIS. IS. WAR. +

+

+ This is a bunch of nerds attacking someone for reasoned, calm, polite criticism of their friend/idol. Samantha is politely pushing back a bit: +

+
+

+ That doesn’t explain why every other part of my article is being pushed aside. +

+
+

+ She’s right. This is all nonsense. This is people ignoring her article completely, just looking for things to attack so it can be dismissed. It’s tribalism at its purest. +

+

+ Then some of the other anointed get into it, including Jason Snell in one of the most spectacular displays of “I have special knowledge you can’t be expected to have, therefore you are totally off base and wrong, even though there’s no way for you to know this” I’ve seen in a while. Jason: +

+
+

+ You should never use an ad rate card to estimate ad revenue from any media product ever. +

+

+ I learned this when I started working for a magazine — rate cards are mostly fiction, like prices on new cars +

+
+

+ How…exactly…in the name of whatever deity Jason may believe in…is Samantha or anyone not “in the biz” supposed to know this? Also, what exactly does a magazine on paper like Macworld have to do with sponsorships for a podcast? I have done podcasts that were sponsored, and I can retaliate with “we charged what the rate card said we did. Checkmate Elitists!” +

+

+ Samantha basically abases herself at his feet: +

+
+

+ I understand my mistake, and it’s unfortunate that it has completely diluted the point of my article. +

+
+

+ I think she should have told him where and how to stuff that nonsense, but she’s a nicer person than I am. Also, it’s appropriate that Jason’s twitter avatar has its nose in the air. This is some rank snobbery. It’s disgusting and if anyone pulled that on him, Jason would be very upset. But hey, one cannot criticize The Marco without getting pushback. By “pushback”, I mean “an unrelenting fecal flood”. +

+

+ Her only mistake was criticizing one of the Kool Kids. Folks, if you criticize anyone in The Deck Clique, or their friends, expect the same thing, regardless of tone or point. +

+

+ Another App Dev, seemingly unable to parse Samantha’s words, needs more explanation: +

+
+

+ so just looking over your mentions, I’m curious what exactly was your main point? Ignoring the podcast income bits. +

+
+

+ Oh wait, he didn’t even read the article. Good on you, Dev Guy, good. on. you. Still, she plays nice with someone who didn’t even read her article: +

+
+

+ That a typical unknown developer can’t depend on patronage to generate revenue, and charging for apps will become a negative. +

+
+

+ Marco comes back of course, and now basically accuses her of lying about other devs talking to her and supporting her point: +

+
+

+ How many actual developers did you hear from, really? Funny how almost nobody wants to give a (real) name on these accusations. +

+
+

+ Really? You’re going to do that? “There’s no name, so I don’t think it’s a real person.” Just…what’s the Joe Welch quote from the McCarthy hearings? +

+
+

+ Let us not assassinate this lad further, Senator. You’ve done enough. Have you no sense of decency, sir? At long last, have you left no sense of decency? +

+
+

+ That is what this is at this point: character assassination because she said something critical of A Popular Person. It’s disgusting. Depressing and disgusting. No one, none of these people have seriously discussed her point, heck, it looks like they barely bothered to read it, if they did at all. +

+

+ Marco starts getting really petty with her (no big shock) and Samantha finally starts pushing back: +

+
+

+ Glad to see you be the bigger person and ignore the mindset of so many developers not relating to you, good for you! +

+
+

+ That of course, is what caused Marco to question the validity, if not the existence of her sources. (Funny how anonymous sources are totes okay when they convenience Marco et al, and work for oh, Apple, but when they are inconvenient? Ha! PROVIDE ME PROOF YOU INTEMPERATE WOMAN!) +

+

+ Make no mistake, there’s some sexist shit going on here. Every tweet I’ve quoted was authored by a guy. +

+

+ Of course, Marco has to play the “I’ve been around longer than you” card with this bon mot: +

+
+

+ Yup, before you existed! +

+
+

+ Really dude? I mean, I’m sorry about the penis, but really? +

+

+ Mind you, when the criticism isn’t just bizarrely stupid, Samantha reacts the way Marco and his ilk claim they would (if they ever got any valid criticism. Which clearly is impossible): +

+
+

+ Not to get into the middle of this, but “income” is not the term you’re looking for. “Revenue” is. +

+

+ lol. Noted. +

+

+ And I wasn’t intending to be a dick, just a lot of people hear/say “income” when they intend “revenue”, and then discussion … +

+

+ … gets derailed by a jedi handwave of “Expenses”. But outside of charitable donation, it is all directly related. +

+

+ haha. Thank you for the clarification. +

+
+

+ Note to Marco and the other…whatever they are…that is how one reacts to that kind of criticism. With a bit of humor and self-deprecation. You should try it sometime. For real, not just in your heads or conversations in Irish Pubs in S.F. +

+

+ But now, the door has been cracked, and the cheap shots come out: +

+
+

+ @testflight_app: Don’t worry guys, we process @marcoarment’s apps in direct proportion to his megabucks earnings. #fairelephant +

+
+

+ (Note: testflight_app is a parody account. Please do not mess with the actual testflight folks. They are still cool.) +

+

+ Or this…conversation: +

+
+
+
+
+
+ +
Image for post +
+
+
+
+

+ Good job guys. Good job. Defend the tribe. Attack the other. Frederico attempts to recover from his stunning display of demeaning douchery: ‏@viticci: @s_bielefeld I don’t know if it’s an Italian thing, but counting other people’s money is especially weird for me. IMO, bad move in the post. +

+

+ Samantha is clearly sick of his crap: ‏@s_bielefeld: @viticci That’s what I’m referring to, the mistake of ever having mentioned it. So, now, Marco can ignore the bigger issue and go on living. +

+

+ Good for her. There’s being patient and being roadkill. +

+

+ Samantha does put the call out for her sources to maybe let her use their names: +

+
+

+ From all of you I heard from earlier, anyone care to go on record? +

+
+

+ My good friend, The Angry Drunk points out the obvious problem: +

+
+

+ Nobody’s going to go on record when they count on Marco’s friends for their PR. +

+
+

+ This is true. Again, the sites that are Friends of Marco: +

+

+ Daring Fireball +

+

+ The Loop +

+

+ SixColors +

+

+ iMore +

+

+ MacStories +

+

+ A few others, but I want this post to end one day. +

+

+ You piss that crew off, and given how petty quite a few of them have shown themselves to be, good luck getting any kind of notice from them. +

+

+ Of course, the idea this could happen is just craycray: +

+
+

+ @KevinColeman .@Angry_Drunk @s_bielefeld @marcoarment Wow, you guys are veering right into crazy conspiracy theory territory. #JetFuelCantMeltSteelBeams +

+
+

+ Yeah. Because a mature person like Marco would never do anything like that. +

+

+ Of course, the real point on this is starting to happen: +

+
+

+ you’re getting a lot of heat now but happy you are writing things that stir up the community. Hope you continue to be a voice! +

+

+ I doubt I will. +

+
+

+ See, they’ve done their job. Mess with the bull, you get the horns. Maybe you should find another thing to write about, this isn’t a good place for you. Great job y’all. +

+

+ Some people aren’t even pretending. They’re just in full strawman mode: +

+
+

+ @timkeller: Unfair to begrudge a person for leveraging past success, especially when that success is earned. No ‘luck’ involved. +

+

+ @s_bielefeld: @timkeller I plainly stated that I don’t hold his doing this against him. Way to twist words. +

+
+

+ I think she’s earned her anger at this point. +

+

+ Don’t worry, Marco knows what the real problem is: most devs just suck — +

+
+
+
+
+
+ +
Image for post +
+
+
+
+

+ I have a saying that applies in this case: don’t place your head so far up your nethers that you go full Klein Bottle. Marco has gone full Klein Bottle. (To be correct, he went FKB some years ago.) +

+

+ There are some bright spots. My favorite is when Building Twenty points out the real elephant in the room: +

+
+

+ @BuildingTwenty: Both @s_bielefeld & I wrote similar critiques of @marcoarment’s pricing model yet the Internet pilloried only the woman. Who’d have guessed? +

+
+

+ Yup. +

+

+ Another bright spot is these comments from Ian Betteridge, who has been doing this even longer than Marco: +

+
+

+ You know, any writer who has never made a single factual error in a piece hasn’t ever written anything worth reading. +

+

+ I learned my job with the support of people who helped me. Had I suffered an Internet pile on for every error I wouldn’t have bothered. +

+
+

+ To which Samantha understandably replies: +

+
+

+ and it’s honestly something I’m contemplating right now, whether to continue… +

+
+

+ Gee, I can’t imagine why. With comments like this from Chris Breen (who, until today, I would have absolutely defended as being better than this, something I am genuinely saddened to be wrong about) that completely misrepresent Samantha’s point, why wouldn’t she want to continue doing this? +

+
+

+ If I have this right, some people are outraged that a creator has decided to give away his work. +

+
+

+ No Chris, you don’t have this right. But hey, who has time to find out the real issue and read an article. I’m sure your friends told you everything you need to know. +

+

+ Noted Feminist Glenn Fleishman gets a piece of the action too: +

+
+
+
+
+
+ +
Image for post +
+
+
+
+

+ I’m not actually surprised here. I watched Fleishman berate a friend of mine who has been an engineer for…heck, waaaaay too long on major software products in the most condescending way because she tried to point out that as a very technical woman, “The Magazine” literally had nothing to say to her and maybe he should fix that. “Impertinent” was I believe what he called her, but I may have the specific word wrong. Not the attitude mind you. Great Feminists like Glenn do not like uppity women criticizing Great Feminists who are their Great Allies. +

+

+ Great Feminists are often tools. +

+
+
+
+
+
+
+
+

+ Luckily, I hope, the people who get Samantha’s point also started chiming in (and you get 100% of the women commenting here that I’ve seen): +

+
+

+ I don’t think he’s wrong for doing it, he just discusses it as if the market’s a level playing field — it isn’t +

+

+ This is a great article with lots of great points about the sustainability of iOS development. Thank you for publishing it. +

+

+ Regardless of the numbers and your view of MA, fair points here about confirmation bias in app marketing feasibility http://samanthabielefeld.com/the-elephant-in-the-room … +

+

+ thank you for posting this, it covers a lot of things people don’t like to talk about. +

+

+ I’m sure you have caught untold amounts of flak over posting this because Marco is blind to his privilege as a developer. +

+

+ Catching up on the debate, and agreeing with Harry’s remark. (Enjoyed your article, Samantha, and ‘got’ your point.) +

+
+
+
+
+
+
+
+
+

+ I would like to say I’m surprised at the reaction to Samantha’s article, but I’m not. In spite of his loud declarations of support for The Big Lie, Marco Arment is as bad at handling any form of criticism he hasn’t already approved as a very insecure tween. An example from 2011: http://www.businessinsider.com/marco-arment-2011-9 +

+

+ Marco is great with criticism as long as it never actually criticizes him. If it does, be prepared for a flood of petty, petulant whining that a room full of bored preschoolers on a hot day would be hard-pressed to match. +

+

+ Today has been…well, it sucks. It sucks because someone doing what all the Arments of the world claim to want was naive enough to believe what they were told, and found out the hard way just how big a lie The Big Lie is, and how vicious people are when you’re silly enough to believe anything they say about criticism. +

+

+ And note again, every single condescending crack, misrepresentation, and strawman had an exclusively male source. Most of them have, at one point or another, loudly trumpeted themselves as Feminist Allies, as a friend to women struggling with the sexism and misogyny in tech. Congratulations y’all on being just as bad as the people you claim to oppose. +

+

+ Samantha has handled this better than anyone else could have. My respect for her as a person and a writer is off the charts. If she chooses to walk away from blogging in the Apple space, believe me I understand. As bad as today was for her, I’ve seen worse. Much worse. +

+

+ But I hope she doesn’t. I hope she stays, because she is Doing This Right, and in a corner of the internet that has become naught but an endless circle jerk, a cliquish collection, a churlish, childish cohort interested not in writing or the truth, but in making sure The Right People are elevated, and The Others put down, she is someone worth reading and listening to. The number of people who owe her apologies goes around the block, and I don’t think she’ll ever see a one. I’m sure as heck not apologizing for them, I’ll not make their lives easier in the least. +

+

+ All of you, all. of. you…Marco, Breen, Snell, Viticci, had a chance to live by your words. You were faced with reasoned, polite, respectful criticism and instead of what you should have done, you all dropped trou and sprayed an epic diarrheal discharge all over someone who had done nothing to deserve it. Me, I earned most of my aggro, Samantha did not earn any of the idiocy I’ve seen today. I hope you’re all proud of yourselves. Someone should be, it won’t be me. Ever. +

+

+ So I hope she stays, but if she goes, I understand. For what it’s worth, I don’t think she’s wrong either way. +

+
+
+
+
+
+
+
+
+
+
+
+
+
+ +
+
+

+ +

+
+
+
+
+
+
+ +
+

+ +

+
+
+
+
+ + +
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ +
+
+

+ +

+
+
+
+
+ +
+
+

+ +

+
+
+
+
+
+
+ +
+

+ +

+
+
+
+
+
+ +
+
+ +
+
+ +
+
+
+ + +
+
+
+
+
+
+
+
+
+
+
+
+
+ John C. Welch +
+
+

+ Written by +

+
+
+
+

+ John C. Welch +

+
+ +
+
+
+
+
+

+
+
+ +
+
+
+
+
+
+
+
+ John C. Welch +
+
+

+ Written by +

+
+

+ John C. Welch +

+
+ +
+
+
+

+
+
+
+
+
+
+
+
+
+
+
+
+
+
+

+ More From Medium +

+
+
+
+
+ +
+
+
+ +
+
+
+
+
+
+ +
+
+
+
+ +
+
+
+
+
+ +
+
+
+
+ +
+ +
+ + +
+
+
+
+
+ +
+
+ +
+
+
+ +
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ +

+ Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch +

+
+
+ +

+ Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore +

+
+
+ +

+ Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade +

+
+
+
+
+
+ + +
+

+
About +

+ Help +

+

+ Legal +

+
+
+
+

+ Get the Medium app +

+
+
+
+ A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store +
+
+ A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store +
+
+
+
+
+
+
+ + + + + + + + + + + + + + + + + + diff --git a/src/constants.rs b/src/constants.rs index ae10743..42a2c5d 100644 --- a/src/constants.rs +++ b/src/constants.rs @@ -62,9 +62,12 @@ pub static NEGATIVE: Lazy = Lazy::new(|| { }); pub static TITLE_SEPARATOR: Lazy = - Lazy::new(|| Regex::new(r#"[-|\\/>»]"#).expect("TITLE_SEPARATOR regex")); -pub static TITLE_CUT_END: Lazy = - Lazy::new(|| Regex::new(r#"(.*)[-|\\/>»] .*"#).expect("TITLE_CUT_END regex")); + Lazy::new(|| Regex::new(r#" [-|—\\/>»] "#).expect("TITLE_SEPARATOR regex")); +pub static TITLE_CUT_END: Lazy = Lazy::new(|| + RegexBuilder::new(r#"(.*)[-|—\\/>»] .*"#) + .case_insensitive(true) + .build() + .expect("TITLE_CUT_END regex")); pub static WORD_COUNT: Lazy = Lazy::new(|| Regex::new(r#"\s+"#).expect("WORD_COUNT regex")); pub static TITLE_CUT_FRONT: Lazy = Lazy::new(|| { RegexBuilder::new(r#"[^-|\\/>»]*[-|\\/>»](.*)"#) diff --git a/src/full_text_parser/readability/mod.rs b/src/full_text_parser/readability/mod.rs index cb0eaca..0b8007f 100644 --- a/src/full_text_parser/readability/mod.rs +++ b/src/full_text_parser/readability/mod.rs @@ -683,7 +683,7 @@ impl Readability { let heading = Util::get_inner_text(node, false); if let Some(title) = title { - Util::text_similarity(&heading, title) > 0.75 + Util::text_similarity(title, &heading) > 0.75 } else { false } diff --git a/src/full_text_parser/readability/tests.rs b/src/full_text_parser/readability/tests.rs index 56f6adc..53b4e54 100644 --- a/src/full_text_parser/readability/tests.rs +++ b/src/full_text_parser/readability/tests.rs @@ -307,6 +307,36 @@ async fn lifehacker_working() { run_test("lifehacker-working").await } +#[tokio::test] +async fn links_in_tables() { + run_test("links-in-tables").await +} + +#[tokio::test] +async fn lwn_1() { + run_test("lwn-1").await +} + +// #[tokio::test] +// async fn medicalnewstoday() { +// run_test("medicalnewstoday").await +// } + +#[tokio::test] +async fn medium_1() { + run_test("medium-1").await +} + +#[tokio::test] 
+async fn medium_2() { + run_test("medium-2").await +} + +#[tokio::test] +async fn medium_3() { + run_test("medium-3").await +} + #[tokio::test] async fn webmd_1() { run_test("webmd-1").await diff --git a/src/util.rs b/src/util.rs index a6736e9..e561e73 100644 --- a/src/util.rs +++ b/src/util.rs @@ -317,8 +317,8 @@ impl Util { pub fn text_similarity(a: &str, b: &str) -> f64 { let a = a.to_lowercase(); let b = b.to_lowercase(); - let tokens_a = constants::TOKENIZE.split(&a).collect::>(); - let tokens_b = constants::TOKENIZE.split(&b).collect::>(); + let tokens_a = constants::TOKENIZE.split(&a).filter(|token| !token.is_empty()).collect::>(); + let tokens_b = constants::TOKENIZE.split(&b).filter(|token| !token.is_empty()).collect::>(); if tokens_a.is_empty() || tokens_b.is_empty() { return 0.0; }