Two new research papers, written by Leskovec and Stanford PhD candidate Jaewon Yang, reveal patterns in the way news stories are shared online, which offer a way to predict early on how a story's popularity will rise and fall.
Predicting how widely a news story, or any other piece of information, will travel could help websites position their content and advertising more effectively, Leskovec says. It could also help determine the influence of a writer or blogger, by showing how his or her content is shared. Combined with other work, it could help provide a better picture of how information travels online generally.
The researchers analyzed 170 million news articles and blog posts over the course of a year, and 580 million Twitter posts over eight months. They measured the attention each piece of content received by tracing how many times it was mentioned in other blog posts, news stories, and tweets. They did this not by looking at links, but by tracking the appearance of distinctive phrases, such as "lipstick on a pig," in blog posts and articles. They used this data to create a graph that revealed six distinct patterns. Some stories, for example, spiked rapidly and then fell away, making a sharp, pointed shape. Others had more staying power, rising and falling more gently.
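The phrase-tracking idea can be illustrated with a minimal sketch. It assumes each document is available as a (timestamp, text) pair and uses plain substring matching; the function names `mention_series` and `normalize_by_peak` are hypothetical, and the researchers' actual pipeline is not described in this article.

```python
from collections import Counter
from datetime import datetime


def mention_series(docs, phrase, bucket_hours=1):
    """Count documents mentioning `phrase` in each time bucket.

    docs: iterable of (datetime, text) pairs (an assumed input format).
    Returns a list of counts, one per bucket, starting at the first mention.
    """
    phrase = phrase.lower()
    hits = [ts for ts, text in docs if phrase in text.lower()]
    if not hits:
        return []
    start = min(hits)
    bucket_seconds = bucket_hours * 3600
    counts = Counter(
        int((ts - start).total_seconds() // bucket_seconds) for ts in hits
    )
    return [counts.get(i, 0) for i in range(max(counts) + 1)]


def normalize_by_peak(series):
    """Scale a series so its highest bucket equals 1.0, leaving only the shape."""
    peak = max(series, default=0)
    return [c / peak for c in series] if peak else list(series)
```

Scaling by the peak removes the overall volume, so a small story and a large one that rise and fall in the same way produce the same curve.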
"By looking at when particular types of media get involved, you can see different patterns arise," Leskovec says. For example, if a blog breaks a story, the pattern tends to be different than when a story is broken by a traditional news outlet. The point at which blogs get involved in a story, Leskovec says, is a major factor in determining its longevity. For example, even if traditional media focus on a story for a brief time, blog discussion can keep it in the public eye longer.
The early response to a new piece of content allowed the researchers to predict, with 75 percent accuracy, the shape of that item's popularity over a longer period.
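One simple way to picture this kind of prediction is to compare the first few hours of a story's mention counts against the opening of each known shape and pick the closest. The sketch below is an illustration only, not the researchers' method: the function `classify_early_shape`, the template curves, and the use of Euclidean distance are all assumptions.

```python
import math


def _rescale(values):
    """Scale values so the largest is 1.0 (returns a copy if all are zero)."""
    peak = max(values, default=0)
    return [v / peak for v in values] if peak else list(values)


def classify_early_shape(early_counts, templates):
    """Return the name of the template whose opening best matches `early_counts`.

    early_counts: mention counts for the first few time buckets of a story.
    templates: dict mapping a shape name to a full curve (hypothetical shapes).
    Both the early counts and each template prefix are rescaled to a peak of
    1.0 before comparison, because the eventual peak is not yet known.
    """
    n = len(early_counts)
    early = _rescale(early_counts)

    def distance(name):
        prefix = _rescale(templates[name][:n])
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(early, prefix)))

    return min(templates, key=distance)
```

For example, with one template that spikes in its second hour and another that climbs steadily, early counts of `[5, 30, 8]` would match the spike and `[3, 5, 11]` the steady climb.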
Leskovec says that these results are particularly powerful when combined with tools that can predict the volume of attention that a story will get, rather than just the pattern by which it will spread. To predict volume, the researchers look at where an item is published, its subject area, and other factors.
The research could be used to help sites manage their content, Leskovec says. For example, a large news site might use the approach to decide how long to give a story a prominent place on its front page.
Ilya Grigorik, CTO and cofounder of PostRank, a company that performs real-time analysis of topics and trends online, says the researchers' findings agree with the data his company has collected. In particular, he notes that stories are most talked about within the first 24 hours. PostRank has observed that 50 percent or more of the attention a story gets happens within the first hour, and 80 percent or more happens within the first 24 hours. Grigorik says those numbers have been consistent over the past three years.
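Figures like these are cumulative shares of attention over time. As a small sketch, assuming attention is recorded as a list of hourly mention counts, the hypothetical helper `attention_share` below computes the fraction that falls within the first N hours; the sample counts are invented for illustration and are not PostRank data.

```python
def attention_share(hourly_counts, within_hours):
    """Fraction of all attention that arrives in the first `within_hours` hours.

    hourly_counts: list of counts, one per hour since publication (assumed format).
    """
    total = sum(hourly_counts)
    if total == 0:
        return 0.0
    return sum(hourly_counts[:within_hours]) / total


# Invented example: 60 mentions in hour one, 20 more by hour 24, 20 afterward.
example_counts = [60, 10, 5, 5] + [0] * 20 + [20]
first_hour = attention_share(example_counts, 1)
first_day = attention_share(example_counts, 24)
```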
Grigorik thinks that more fine-tuning would be needed to make the work useful in practice. In particular, he thinks the shapes the researchers identified need more characterization, so that people can grasp what following a particular shape says about a story.
News-aggregation sites might use a tool based on the research to predict how well posts will do, Grigorik says, although it's unclear how much more effective that would be than using editorial judgment.
Jon Kleinberg, a professor of computer science at
Leskovec plans to do more research on how information spreads on the Internet. He and his colleagues are also looking into how information changes as it travels, which could yield insight into how rumors and inaccuracies are introduced.