While Internet trolls and members of Congress wage war over edits on Wikipedia, Swedish university administrator Sverker Johansson has spent the last seven years becoming the site’s most prolific author, by a long shot. In fact, he’s responsible for over 2.7 million articles, or 8.5% of all the articles on Wikipedia, according to The Wall Street Journal.
And it’s all thanks to a program called Lsjbot.
Johansson’s software collects information on a particular topic from databases, then packages it into articles in Swedish and two dialects of Filipino (his wife’s native tongue). Many of the posts focus on innocuous subjects such as animal species or town profiles. Yet the sheer volume, up to 10,000 entries a day, has vaulted Johansson and his bot to the top of the leaderboard and, hence, into the spotlight.
The bot’s automatically generated entries are not the beautifully constructed entries one would find in the pages of the Encyclopedia Britannica, for example. Many are simply stubs, short fragments that need editing or additional information, because the bot depends on what’s readily available on the web. Because this is Wikipedia, nothing stops someone from refining the stubs and editing them into the beautiful prose that would make any human proud.
Whether Wikipedia purists approve of Lsjbot or not, data-scraping software that can mass-produce articles is on the rise.
Just last month, the Associated Press announced that it would use software called Wordsmith, created by startup Automated Insights, to produce stories on the quarterly earnings of US companies. Since October 2011, Narrative Science has been automatically generating sports and finance stories on Forbes without much fanfare.
It isn’t just companies getting into the automated content game. Recently, an LA journalist used a bot to post a report just three minutes after an earthquake. Another academic, Philip Parker, has created over 100,000 ebooks on Amazon with similar software.
Much of this software uses fairly simple search functions to capture data and reformat it into articles; in other words, it involves very little artificial intelligence. Yet growing interest in machine learning and natural language processing means the quality of bot-generated content will inevitably improve.
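To make the "search and reformat" idea concrete, here is a minimal Python sketch of how a stub generator could work. It is not Lsjbot's actual code, nor that of any product named above. The template wording, the field names, the helpers `find_record` and `render_stub`, and the hand-written sample record are all assumptions made for illustration.

```python
from string import Template

# Hypothetical stub template; a real bot would use its own wording and wiki markup.
STUB_TEMPLATE = Template(
    "$name is a species of $group in the family $family. "
    "It is found in $region."
)

# Fields the template needs; a record missing any of them is skipped.
REQUIRED_FIELDS = ("name", "group", "family", "region")

# Hand-written sample standing in for rows pulled from a database.
RECORDS = [
    {
        "name": "Panthera leo",
        "group": "mammal",
        "family": "Felidae",
        "region": "Africa and India",
    },
]


def find_record(records, name):
    """Simple search: return the first record whose name matches, ignoring case."""
    wanted = name.strip().lower()
    for record in records:
        if record.get("name", "").lower() == wanted:
            return record
    return None


def render_stub(record):
    """Fill the template from one record, or return None if data is missing."""
    if any(not record.get(field) for field in REQUIRED_FIELDS):
        return None
    return STUB_TEMPLATE.substitute(record)


if __name__ == "__main__":
    match = find_record(RECORDS, "panthera leo")
    if match is not None:
        print(render_stub(match))
```

Nothing here understands the text it produces. The output is only as good as the fields in the record, which is one reason such entries tend to stay stubs until a human edits them.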
In the very near future, software-created articles will be indistinguishable from a vast amount of human-produced content. Whether that’s a good or bad thing, you can be sure the Wikipedia article on the subject will be furiously edited over time.
[Photo credit: STML/Flickr]