Aggregation has long been seen as both a blessing and a curse for media publishers the world over.

New media powerhouses like The Huffington Post have mastered and fine-tuned not just aggregation, but also social media, reader comments and, most of all, a sense of what their audiences want. In the process, The Huffington Post has helped media companies, new and old, understand the appeal of aggregation: its ability to give prominence to otherwise unheard voices and to bring together and serve intensely engaged audiences, as well as its minimal costs compared to those incurred in the traditionally laborious task of gathering original content.

Huffington often says that aggregation benefits original-content producers as much as it does the aggregators. When sites promote each other’s content, they create more engaged audiences through additional page views and commentary.

News organisations have always blended material from a variety of sources by combining editorial content from staff, news services, and freelancers; adding advertising; and then distributing the package to consumers. In the digital world, news aggregation is not so different.

There are two basic models for aggregation. The cheapest way to aggregate news is through code and algorithms, with little or no human intervention – think Google News or Yahoo News. On the other end of the spectrum, publications like The Huffington Post start with algorithmic selections but put them into the hands of human editors, who set priorities for sections and then condense, rewrite or combine several organisations’ versions of the same story.
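To make the purely algorithmic model a little more concrete, here is a toy sketch of how an aggregator might group articles covering the same story and rank stories by breadth of coverage. The clustering heuristic is my own invention for illustration – real aggregators like Google News use far richer signals:

```python
from collections import defaultdict

def aggregate(articles):
    """Toy news aggregator: cluster articles whose headlines share
    enough significant words, then rank clusters by how many outlets
    covered the story."""
    clusters = defaultdict(list)
    for article in articles:
        # Crude story fingerprint: headline words longer than 3 letters.
        key = frozenset(w.lower() for w in article["headline"].split() if len(w) > 3)
        # Merge into an existing cluster if at least half the words overlap.
        for existing in list(clusters):
            overlap = len(key & existing)
            if overlap > 0 and overlap >= len(key) // 2:
                clusters[existing].append(article)
                break
        else:
            clusters[key].append(article)
    # Stories covered by more outlets get more front-page prominence.
    return sorted(clusters.values(), key=len, reverse=True)
```

Two wire reports on the same game cluster together and outrank a one-off local story; the human-edited model simply adds an editor on top of a ranking like this.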

But what happens when aggregation turns into automation? A third model is emerging where the content itself is created with software, then aggregated and published.

Take a moment to read Robbie Allen’s “How I automated my writing career”.

In November of 2010, his company, Automated Insights, launched the StatSheet Network, a collection of 345 websites (one for every Division-I NCAA Basketball team) that are fully automated.

It’s easy to dismiss this example as ‘that’s just regurgitation of stats; of course that can be automated’, and you’d be right – to a point.

Here’s an excerpt from a fully automated article on one of their sites:

“The Tar Heels got to the NCAA Tournament as an at-large team after falling to Duke, 75-58, in the ACC tournament. In making the Elite Eight, North Carolina defeated 15th-seeded Long Island, 102-87 in the second round, seventh-seeded Washington, 86-83 in the third round, and then 11th-seeded Marquette, 81-63 in the Sweet Sixteen.

North Carolina was led by Tyler Zeller, who had 21 points on 75% shooting. The Tar Heels also got 18 points from Harrison Barnes, 11 from Dexter Strickland, and seven from Kendall Marshall.

Kentucky was on fire from beyond the arc, scoring 36 points in three-pointers to get an edge.

The Wildcats got their top scoring out of a game high from Brandon Knight with 22 points and seven rebounds. DeAndre Liggins (12), Josh Harrellson (12), Terrence Jones (11), and Darius Miller (11) all hit double figures.”
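To demystify the excerpt above a little, here is a minimal sketch of template-driven, data-to-text generation. The template and data layout are my own toy example, not Automated Insights’ actual system:

```python
def scoring_recap(team, box_score):
    """Render a box score into a recap sentence -- exactly the kind of
    quantitative, fill-in-the-blanks prose that automates well."""
    # Rank players by points so the template can single out the leader.
    players = sorted(box_score, key=lambda p: p["points"], reverse=True)
    leader, rest = players[0], players[1:]
    others = ", ".join(f'{p["points"]} from {p["name"]}' for p in rest)
    return (f'{team} was led by {leader["name"]}, who had '
            f'{leader["points"]} points. The team also got {others}.')
```

Feed it a box score and it emits a sentence indistinguishable in structure from the excerpt; the production systems just have many more templates and vary the phrasing.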

If that isn’t enough to send a shiver down your spine, consider this: machine-created content will never be worse, or more expensive to produce, than it is today. It will only get better, cheaper and more accessible – to legitimate publishers attempting to make their workflows more efficient, to content farms that can finally do away with the human element entirely, and to spammers.

And it’s the latter who are most likely to push the boundaries faster than anyone. In the never-ending war to fool a tiny percentage of Internet users into helping out a poor Nigerian lottery winner, the algorithms used by spammers need to become more complex and more “human”.

As Allen helpfully points out:

  • Software doesn’t get writer’s block, and it can work around the clock.
  • Software can’t unionize or file class-action lawsuits because we don’t pay enough (like many of the content farms have had to deal with).
  • Software doesn’t get bored and start wondering how to automate itself.
  • Software can be reprogrammed, refactored and improved — continuously.
  • Software can benefit from the input of multiple people. This is unlike traditional writing, which tends to be a solitary event (+1 if you count the editor).
  • Perhaps most importantly, software can access and analyze significantly more data than what a single person (or even a group of people) can do on their own.


The slide down this slippery slope has already begun – and not just with websites.

Check out this video about Soylent, a plug-in for Microsoft Word. It can shorten text to fit a specified length, offering a few alternatives; it can proofread and correct grammatical errors; and it lets the author make wholesale changes to a document, such as “change this to past tense”.
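For a sense of how even the shortening feature could eventually run without a human in the loop, here is a crude, purely algorithmic sketch. This is not how Soylent itself works – it is a toy heuristic that drops whole sentences from the end until the text fits:

```python
def shorten(text, max_chars):
    """Crudely shorten text to at most max_chars by dropping whole
    sentences from the end. Real tools edit within sentences; this
    only illustrates the length-targeting idea."""
    sentences = text.split(". ")
    # Keep dropping the last sentence while the text is still too long
    # (but never drop the final remaining sentence).
    while len(". ".join(sentences)) > max_chars and len(sentences) > 1:
        sentences.pop()
    result = ". ".join(sentences)
    if not result.endswith("."):
        result += "."
    return result
```

It is blunt – a sufficiently long single sentence defeats it – but blunt-and-cheap is exactly where automated writing tools start.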

There’s no doubt the current technology can be a valuable tool for media outlets – after all, it’s only suited to quantitative, data-driven work, which frees journalists to focus on (cough) qualitative commentary. But this is the crucial point.

Journalists must establish their personal, human stamp on the work they produce. Regurgitating press releases and re-hashing statistics won’t cut it. Anything suitable for automation – and that’s a lot – will be picked up by newsrooms the world over as managers and publishers scramble to reduce overheads.

And then that boundary will shift. And shift again, until the room of writers becomes a room of servers, with a couple of database admins and one or two sub-editors checking through a cursory selection of articles.

Writers whose unique style engages readers and builds a dedicated following are the ones best placed to fight off this new threat to the established traditions of news media.

For the record, in the interest of self-preservation, I for one welcome our new robot overlords.