[Inside the Newsroom] How we discovered the duplicate precincts



It began when our managing editor, Miriam Grace Go (we fondly call her Gigi), asked a question in one of our group chats at 11:42 pm on Monday, May 12: “Guys, why is our Laoag count higher than the official canvass there?”

What followed was a flurry of chatter not just on a video call but also through various chat channels between the Rappler tech and data teams in Makati and our editorial team in Pasig.

It resulted in the headline we published in the early hours after election day: Around 5 million in vote discrepancy caught just in time.

Hello again! I am Gemma Mendoza, head of digital services at Rappler.

During elections, my usual post is whatever location is chosen by the Commission on Elections (Comelec) for its transparency server operations. On May 12, it was at the Circuit Corporate Center in Makati City.

The first time I had to contend with the complexities of a “quick count” in the context of an automated election system (AES) was 15 years ago, when the Comelec introduced the AES during the 2010 presidential elections. I was then editor-in-chief of the ABS-CBN News website.

I remember our astonishment when, between 10 and 11 pm on election day, it became clear who the winner for president was: Benigno Simeon Aquino III.

Since then, I have worked on Rappler’s real-time count for four automated elections: 2013, 2016, 2019, and 2022. Each of these elections came with its own stories and controversies.

Preparations happen months before election day. During these months, we apply for access, plan for resources needed, design and develop our election results sites, conduct accuracy tests, load test and tune our servers and databases, and anticipate every possible thing that could go wrong.

This obviously has to be done in close coordination with the Comelec.

Way ahead of election day, we attend briefings on what to expect and what updates have been made to the systems. We make sure we understand and are able to explain and communicate to our audience how the system works.

Before media groups are given access to results, the Comelec also releases sample files that media groups then use to program their automated tabulation systems. We use these files to test our systems.

We have come to realize that our role in the process is not just to make sure our own website is smoothly and accurately tabulating and rendering election results for the public to see. It is to watch the process and report on any issues we see inside the transmission room as well as in the data.

In previous elections, we’ve seen delays and failures in transmission, hash errors blown out of proportion, server glitches, and other mishaps related to the quick count. When controversies crop up on social media concerning the results, we would dig into the data and investigate dubious claims.

Yet, even then, the mishap that we witnessed at the Comelec’s Circuit data center was unprecedented. There was one word for it: chaos.

First, each newsroom was seeing a different hash. Prior to that, we were briefed that we had to compare our hash codes before transmitting. This delayed the transmission of results by media groups.

By around 7:50 pm on May 12, the Comelec’s IT lead confirmed that the hashes were no longer expected to match. We were allowed to transmit the next file, which came out shortly after 8 pm.
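For readers unfamiliar with hash comparison, here is a minimal sketch of the idea, not the Comelec's or Rappler's actual tooling. It assumes SHA-256 as the hash function (the algorithm used in the transmission room is not stated here), and the newsroom names and file contents are made up for illustration.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a results file's raw bytes.

    SHA-256 is an assumption; the actual algorithm is not named in this story.
    """
    return hashlib.sha256(data).hexdigest()


def digests_match(digests: dict[str, str]) -> bool:
    """True only when every newsroom reports the same digest."""
    return len(set(digests.values())) == 1


# Hypothetical copies of one results file as received by three newsrooms.
# The third copy carries an extra, repeated row.
copies = {
    "newsroom_a": b"precinct,contest,candidate,votes\nP001,MAYOR,Cand 1,10\n",
    "newsroom_b": b"precinct,contest,candidate,votes\nP001,MAYOR,Cand 1,10\n",
    "newsroom_c": b"precinct,contest,candidate,votes\nP001,MAYOR,Cand 1,10\n"
                  b"P001,MAYOR,Cand 1,10\n",
}

digests = {name: file_digest(data) for name, data in copies.items()}
all_match = digests_match(digests)
```

A single differing byte changes the digest entirely, which is why a mismatch tells you the files differ but not where or why.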

And then there was the question of file formats. These shifted more than once, which meant we had to test first before releasing results in case we interpreted the files wrong.

Finally, file sizes were growing, and then dropping. This was before we detected the extent of the duplicated results.

Results transmission operations are usually a flurry of activity. Each news website is in a race to get the data out fast, and accurately. To avoid missing anything, we had to make sure we were logging everything and taking down notes inside the transmission room.

To jog my memory, I scoured various chats, server logs, and notes made by Rappler’s Tech and Data teams to piece together the full timeline of how we discovered and corrected the duplicate precincts issue.

Gigi’s question prompted us to review how we were tabulating the results files. While probing this issue, one of our data scientists, Gilian Uy, discovered that the results for Laoag were indeed wrong. The sum of votes for both mayoral candidates was more than the number of people who voted, but less than the number of registered voters. Further probing revealed that this was because some precincts for Laoag were being duplicated.
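The kind of sanity check that exposes this problem can be sketched in a few lines. This is an illustration only, not the code our data team ran: the `ContestTally` structure, its field names, and all numbers below are hypothetical. It applies to single-seat contests such as mayor, where each voter can cast at most one vote.

```python
from dataclasses import dataclass


@dataclass
class ContestTally:
    """Hypothetical summary of one single-seat contest in one locality."""
    locality: str
    votes_by_candidate: dict[str, int]
    voters_who_voted: int
    registered_voters: int


def check_single_seat(tally: ContestTally) -> str:
    """Flag tallies whose vote total is impossible for a one-vote contest."""
    total = sum(tally.votes_by_candidate.values())
    if total > tally.registered_voters:
        return "exceeds registered voters"
    if total > tally.voters_who_voted:
        # Plausible at a glance, but impossible: more votes than voters.
        return "exceeds voters who actually voted"
    return "ok"


# Made-up numbers that mirror the pattern described above:
# more votes than actual voters, yet fewer than registered voters.
sample = ContestTally(
    locality="Sample City",
    votes_by_candidate={"Candidate 1": 30_000, "Candidate 2": 28_000},
    voters_who_voted=50_000,
    registered_voters=70_000,
)
status = check_single_seat(sample)
```

The second condition is the one that matters here. A total that stays below the number of registered voters does not look wrong until it is compared against actual turnout.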

This prompted Rappler to do a global check for other locations that may have been affected by the duplicate precincts issue. It was a painstaking process that took a while, from around 12 midnight through 1 am, because the results files were really big by then, consisting of tens of millions of rows.
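A global check of this kind could look roughly like the sketch below. It is a simplified illustration under stated assumptions: the column names (`precinct_id`, `locality`, `contest`, `candidate`, `votes`) and the sample rows are hypothetical, and the real results files used their own format. The approach is to read the file row by row, count how often each precinct, contest, and candidate combination appears, and collect the localities where any combination appears more than once.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in an assumed CSV layout; not the Comelec file format.
SAMPLE = """precinct_id,locality,contest,candidate,votes
P001,Town A,MAYOR,Cand 1,120
P001,Town A,MAYOR,Cand 2,95
P002,Town B,MAYOR,Cand 3,80
P001,Town A,MAYOR,Cand 1,120
P001,Town A,MAYOR,Cand 2,95
"""


def find_duplicate_localities(rows) -> set[str]:
    """Return localities where a (precinct, contest, candidate) key repeats."""
    seen = Counter()
    affected = set()
    for row in rows:  # streams one row at a time
        key = (row["precinct_id"], row["contest"], row["candidate"])
        seen[key] += 1
        if seen[key] > 1:
            affected.add(row["locality"])
    return affected


affected = find_duplicate_localities(csv.DictReader(io.StringIO(SAMPLE)))
```

Rows are read one at a time, but the counter still holds one entry per distinct key, so memory use and run time grow with file size.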

Meanwhile, in the transmission room — where media groups had workstations that were receiving files from the Comelec Media Server — the Comelec’s IT personnel called out two media groups (not Rappler) for “publishing inconsistent results.”

In the notebook where we were keeping tabs on the progress of the transmission, our technical team noted that this happened at 12:25 am.

At 12:48 am, one of the two newsrooms reported to the Comelec IT inside the server room that they saw hundreds of duplicated records. Some of the records were precincts, some were candidates.

By 1:12 am, Rappler’s data team realized that the locations and contests affected by the duplicate results were significant. We alerted the newsroom in Pasig that we may need to correct the results files.

When we checked the other media websites at that point, we confirmed that all of them were displaying similar numbers, meaning Rappler was not the only one that was unable to immediately spot the duplicate precincts.

We immediately proceeded to de-dupe the lists, that is, to remove the duplicate precincts. By 1:43 am, we were able to update the site with a file that ignored the duplicate results.
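In code, de-duping amounts to keeping only the first occurrence of each record before adding up votes. The sketch below is an assumption-laden illustration rather than our production pipeline: the row layout, field names, and vote counts are hypothetical.

```python
def dedupe_rows(rows: list[dict]) -> list[dict]:
    """Keep the first row for each (precinct, contest, candidate) key."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["precinct_id"], row["contest"], row["candidate"])
        if key in seen:
            continue  # a repeat of a record we already counted
        seen.add(key)
        unique.append(row)
    return unique


def totals_by_candidate(rows: list[dict]) -> dict[str, int]:
    """Sum votes per candidate across all rows."""
    totals: dict[str, int] = {}
    for row in rows:
        totals[row["candidate"]] = totals.get(row["candidate"], 0) + row["votes"]
    return totals


# Hypothetical rows in which precinct P001's result for Cand 1 appears twice.
rows = [
    {"precinct_id": "P001", "contest": "MAYOR", "candidate": "Cand 1", "votes": 120},
    {"precinct_id": "P001", "contest": "MAYOR", "candidate": "Cand 2", "votes": 95},
    {"precinct_id": "P001", "contest": "MAYOR", "candidate": "Cand 1", "votes": 120},
]

raw_totals = totals_by_candidate(rows)
clean_totals = totals_by_candidate(dedupe_rows(rows))
```

Keeping the first occurrence assumes that repeated records are identical copies. If duplicates could carry different vote counts, choosing which one to keep would be a decision for the election body, not the newsroom.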

As we were preparing to write a story to explain the issue, a staff member from one of the other newsrooms informed us that they also spotted the duplicate files error, but were told not to make changes to the data because “it should be the Comelec that corrects it.”

Together, the other newsroom’s representative and I immediately approached the Comelec IT to find out how the matter should be resolved. Comelec IT informed us that the issue had already been addressed and that a new file that already removed the duplicated precincts had been released.

The time stamp of the corrected Comelec results file was 1:46 am. Rappler successfully ingested and rendered the corrected file from the Comelec on our election site by 1:48 am. – Rappler.com
