How Twitter processes tons of mobile application data each day

It’s only been seven months since Twitter released its Answers tool, which was designed to provide users with mobile application analytics. In that time, the service has grown to roughly five billion daily sessions, in which “hundreds of millions of devices send millions of events every second to the Answers endpoint,” the company explained in a blog post on Tuesday. Clearly, that’s a lot of data to process, and in the blog post Twitter detailed how it configured its architecture to handle the task.

The backbone of Answers was created to handle four things: receiving the mobile application data, archiving it, processing it in real time, and processing it in chunks (otherwise known as batch processing).

Each time an organization uses the Answers tool to learn more about how its mobile app is functioning, Twitter logs and compresses all of that data (which gets sent in batches) in order to conserve the device’s battery power while also not putting too much unnecessary strain on the network that routes the data from the device to Twitter’s servers.
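Twitter didn’t publish the SDK code behind that step, but the client-side batching it describes might look something like the Python sketch below, which buffers events in memory and ships them as a single gzip-compressed request. The endpoint URL, batch size and event fields here are invented for illustration.

```python
import gzip
import json
import time
import urllib.request

# Hypothetical endpoint; the real Answers ingestion URL is not public.
ANSWERS_ENDPOINT = "https://example.com/answers/events"
BATCH_SIZE = 100  # flush after this many events (illustrative value)

_buffer = []

def log_event(event_type, attributes):
    """Queue an analytics event instead of sending it immediately."""
    _buffer.append({
        "type": event_type,
        "attributes": attributes,
        "timestamp": time.time(),
    })
    if len(_buffer) >= BATCH_SIZE:
        flush()

def flush():
    """Send all buffered events as one gzip-compressed request,
    trading a little latency for far fewer radio wake-ups."""
    if not _buffer:
        return
    payload = gzip.compress(json.dumps(_buffer).encode("utf-8"))
    request = urllib.request.Request(
        ANSWERS_ENDPOINT,
        data=payload,
        headers={"Content-Encoding": "gzip",
                 "Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)  # retries and error handling omitted for brevity
    _buffer.clear()
```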

The information flows into a Kafka queue, which Twitter said serves as a temporary place to store data. The data then gets passed into Amazon Simple Storage Service (Amazon S3), where Twitter retains it in a more permanent location than Kafka. Twitter uses Storm both to process the data that flows into Kafka and to write the information stored in Kafka to Amazon S3.
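That archival step runs inside a Storm topology, which Twitter didn’t share; the sketch below only illustrates the same Kafka-to-S3 flow in plain Python using the kafka-python and boto3 libraries, with made-up topic and bucket names.

```python
import gzip
import time
import uuid

import boto3                      # AWS SDK for Python
from kafka import KafkaConsumer   # kafka-python client

# Hypothetical names; Twitter's actual topics and buckets aren't public.
TOPIC = "answers-events"
BUCKET = "answers-archive"
FLUSH_EVERY = 10_000  # messages per archived object (illustrative)

def archive_forever():
    """Drain the Kafka topic and write compressed blobs to S3.
    Twitter does this with a Storm topology; this is just the same
    idea expressed as a single Python loop."""
    consumer = KafkaConsumer(TOPIC,
                             bootstrap_servers=["localhost:9092"],
                             group_id="s3-archiver")
    s3 = boto3.client("s3")
    batch = []
    for message in consumer:
        batch.append(message.value)
        if len(batch) >= FLUSH_EVERY:
            key = "events/%d/%s.gz" % (int(time.time()), uuid.uuid4())
            s3.put_object(Bucket=BUCKET,
                          Key=key,
                          Body=gzip.compress(b"\n".join(batch)))
            batch = []

if __name__ == "__main__":
    archive_forever()
```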

Data pipeline

With the data stored in Amazon S3, Twitter then uses Amazon Elastic MapReduce for batch processing.

From the blog post:

We write our MapReduce in Cascading and run them via Amazon EMR. Amazon EMR reads the data that we’ve archived in Amazon S3 as input and writes the results back out to Amazon S3 once processing is complete. We detect the jobs’ completion via a scheduler topology running in Storm and pump the output from Amazon S3 into a Cassandra cluster in order to make it available for sub-second API querying.

Twitter
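Those jobs are written in Cascading, a Java framework, and Twitter hasn’t released them; the Python sketch below is only a conceptual stand-in that follows the contract described in the quote: read a day of archived events from Amazon S3, aggregate them, and write the result back to S3. The bucket layout and event format are assumptions.

```python
import gzip
import json
from collections import Counter

import boto3  # AWS SDK for Python

# Hypothetical bucket/prefix layout; Twitter's real layout isn't public.
BUCKET = "answers-archive"
INPUT_PREFIX = "events/2015-02-17/"
OUTPUT_KEY = "daily-rollups/2015-02-17.json"

def daily_rollup():
    """Count sessions per app for one day of archived events.
    The real job is a Cascading MapReduce flow run on Amazon EMR; this
    single-process loop just illustrates the read-from-S3,
    write-back-to-S3 shape described in the quote above."""
    s3 = boto3.client("s3")
    sessions_per_app = Counter()

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=INPUT_PREFIX):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
            # Assumes one JSON event per line in each archived object.
            for line in gzip.decompress(body).splitlines():
                event = json.loads(line)
                if event.get("type") == "session_start":    # assumed event type
                    sessions_per_app[event["app_id"]] += 1  # assumed field name

    s3.put_object(Bucket=BUCKET,
                  Key=OUTPUT_KEY,
                  Body=json.dumps(sessions_per_app).encode("utf-8"))
```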

At the same time as this batch processing is going on, Twitter is also processing data in real time because “some computations run hourly, while others require a full day’s data as input,” it said. To address the computations that need to be performed more quickly and require less data than the bigger batch-processing jobs, Twitter uses an instance of Storm that processes the data sitting in Kafka; the results get funneled into an independent Cassandra cluster for real-time querying.

From the blog post:

To compensate for the fact that we have less time, and potentially fewer resources, in the speed layer than the batch, we use probabilistic algorithms like Bloom Filters and HyperLogLog (as well as a few home grown ones). These algorithms enable us to make order-of-magnitude gains in space and time complexity over their brute force alternatives, at the price of a negligible loss of accuracy.

Twitter
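Twitter didn’t detail those implementations, but a toy Bloom filter illustrates the trade-off the quote describes: membership checks against a fixed-size bit array rather than a full copy of every element, in exchange for a small, tunable false-positive rate.

```python
import hashlib
from math import ceil, log

class BloomFilter:
    """Toy Bloom filter: answers 'possibly seen' or 'definitely not seen'
    using a fixed bit array instead of storing every element."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing formulas for the bit count and number of hashes.
        self.size = ceil(-expected_items * log(false_positive_rate) / (log(2) ** 2))
        self.hash_count = max(1, round((self.size / expected_items) * log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.hash_count):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Example: track roughly a million device IDs without keeping the IDs themselves.
seen_devices = BloomFilter(expected_items=1_000_000)
seen_devices.add("device-abc123")
print("device-abc123" in seen_devices)  # True
print("device-xyz789" in seen_devices)  # almost certainly False
```

Sized for a million items at a one percent false-positive rate, the bit array above works out to roughly 1.2 megabytes, which is the kind of order-of-magnitude space saving the quote refers to.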

The complete data-processing system looks like this, and it’s tethered together with Twitter’s APIs:

Twitter Answers architecture

Because of the way the system is architected and the fact that the data that needs to be analyzed in real time is separated from the historical data, Twitter said that no data will be lost if something goes wrong during the real-time processing. All that data is stored where Twitter does its batch processing.

If there are problems affecting batch processing, Twitter said its APIs “will seamlessly query for more data from the speed layer” and can essentially configure the system to take in “two or three days of data” instead of just one day; this should give Twitter engineers enough time to look at what went wrong while still providing users with the type of analytics derived from batch processing.
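In code, that fallback might look something like the sketch below, where query_batch_layer and query_speed_layer are hypothetical stand-ins for reads against the two Cassandra clusters described earlier.

```python
from datetime import date, timedelta

def sessions_last_day(app_id, query_batch_layer, query_speed_layer):
    """Answer an API query by preferring batch results and falling back
    to the speed layer for any recent days the batch job hasn't produced
    yet. Both query_* callables are hypothetical stand-ins for reads
    against the batch and real-time Cassandra clusters."""
    today = date.today()
    batch_result = query_batch_layer(app_id, day=today - timedelta(days=1))

    if batch_result is not None:
        # Normal case: yesterday's rollup exists, so only today's partial
        # data has to come from the speed layer.
        return batch_result + query_speed_layer(app_id, since=today)

    # Batch layer is behind (failed or still running): widen the speed-layer
    # window to cover the missing days, e.g. two or three days instead of one.
    return query_speed_layer(app_id, since=today - timedelta(days=2))
```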

Image copyright Shutterstock / Anthony Correia.
