Statistics for Dummies: How to Read (and Lie) with Statistics




Before you learn to lie, you must learn how to do statistics right. Then you will know when others are lying to you. You don't have to be good at maths; you just need common sense.




Here is the number of patients in two states: 100 in State A and 300 in State B. Which state is sicker? State B, right? Because State B has 3 times as many sick people as State A.

This is called an absolute number (the raw count).




Now let's also look at the population of each state. Which state is sicker?



To find out, we have to calculate the infection rate, meaning what percent of the population is infected.

How to calculate? Simple: Number of patients ÷ Population x 100. This gives us the percent (per 100).

So which state is sicker? State A has fewer patients (100, versus 300 in State B), but it is sicker because a larger percentage of its population is infected (0.1%, compared to 0.06% in State B).

See how different this is from just looking at absolute numbers? You should always look at the percent who are sick. Absolute numbers on their own don't mean anything.
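Here is a minimal Python sketch of the percent formula above (Number of patients ÷ Population x 100). The original table image is not available. The populations used here are therefore back-calculated from the 0.1% and 0.06% rates in the text: 100,000 for State A and 500,000 for State B. Treat them as assumptions, not as the article's data.

```python
def infection_rate_percent(patients: int, population: int) -> float:
    """Infection rate as a percent: patients / population x 100."""
    return patients / population * 100


# Populations are back-calculated from the 0.1% and 0.06% rates quoted in the
# text (the original table is not available), so treat them as assumptions.
states = {
    "State A": (100, 100_000),
    "State B": (300, 500_000),
}

for name, (patients, population) in states.items():
    rate = infection_rate_percent(patients, population)
    print(f"{name}: {patients} patients, {rate:.2f}% infected")
# State A: 100 patients, 0.10% infected
# State B: 300 patients, 0.06% infected
```

State B has three times as many patients, but once each count is divided by its own population, State A's rate comes out higher.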



What does it mean for 0.1% or 0.06% of people to be sick? All these zeroes are confusing, and it's hard to relate them to the real world.

So we move the decimal point by multiplying by 100,000 instead of 100, which gives us a whole number. The formula is otherwise the same: Number of patients ÷ Population x 100,000.

Now the numbers are easier to understand: there are 100 sick people per 100,000 people in State A, and 60 sick people per 100,000 people in State B. This is called a per capita rate (a rate per unit of population).

If the % has a lot of zeroes, like 0.0002%, use Number of patients ÷ Population x 1,000,000 (or x 10,000,000) instead to move the decimal point and get a whole number. For example, 0.0002% becomes 2 per 1,000,000 people. The rate is still the same, but the number is now easier to understand.
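The same formula works with any "per" base, and the sketch below shows this. `rate_per` and `percent_to_per` are hypothetical helper names introduced only for illustration. The State A and State B populations are the values back-calculated from the quoted rates, as above.

```python
def rate_per(patients: float, population: float, per: int = 100_000) -> float:
    """Patients / population x `per`: per=100 gives a %, per=100_000 gives 'per 100,000'."""
    return patients / population * per


def percent_to_per(percent: float, per: int) -> float:
    """Re-express a percentage as a count per `per` people (the rate itself is unchanged)."""
    return percent / 100 * per


print(round(rate_per(100, 100_000), 2))             # 100.0 sick per 100,000 (State A)
print(round(rate_per(300, 500_000), 2))             # 60.0 sick per 100,000 (State B)
print(round(percent_to_per(0.0002, 1_000_000), 2))  # 2.0 per 1,000,000 people
```

Only the base changes. The underlying fraction of sick people is identical whether it is written as 0.1%, 100 per 100,000, or 1,000 per million.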



OK, now let's look at real-world data. Which age group is the sickest?

These are absolute numbers. It looks like adults aged 18-59 are the sickest because they have the highest number of cases.

https://covidnow.moh.gov.my




But... what is their population? Let's look at the age groups:

Children, 0-11: covers 12 different years of age
Adolescents, 12-17: 6 years
Adults, 18-59: 42 years
Seniors, ≥60: open-ended, so we don't know how many years it covers

From here, we can see that the largest group is adults, because it covers an age range of 42 years. The population of children is roughly double that of adolescents. The children's group covers 12 years of age, whereas the adolescent group covers only 6 (assuming roughly the same number of people at each age).
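The age-span reasoning can be checked with a short Python sketch. It relies on the same rough, hypothetical simplification as the text: equal numbers of people at every age. Real population pyramids are not flat, so the ratio is only a ballpark figure.

```python
# Age bands from the chart (inclusive). Seniors (>=60) have no upper limit, so they are left out.
AGE_BANDS = {
    "Children": (0, 11),
    "Adolescents": (12, 17),
    "Adults": (18, 59),
}


def span_in_years(low, high):
    """Number of single years of age in an inclusive band, e.g. 0-11 -> 12."""
    return high - low + 1


spans = {name: span_in_years(low, high) for name, (low, high) in AGE_BANDS.items()}
print(spans)  # {'Children': 12, 'Adolescents': 6, 'Adults': 42}

# Only valid under the simplifying assumption of equal numbers of people at every age.
print(spans["Children"] / spans["Adolescents"])  # 2.0
```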





Now let's look at the per capita numbers. Here the rate is per 100 people in each group's population (%). So which group is the sickest? It's actually a tie between seniors (0.3%) and adults (0.3%). Even though the absolute number of sick seniors is low, their population is also small; the absolute number of sick adults is high only because their population is large. Only when we look at the percentage do we see that the sickest groups are adults and seniors, not just adults. We also see that children and adolescents have much lower infection rates than adults and seniors.





OK, here's a quiz to see what you've learned. What's happening in this graph?




First, are these absolute numbers or per capita? That's the most important question! Where do you look, and how do you know?

Look at the Y axis (left side). The numbers range from 0 to 12,000, and the axis doesn't say "Per 1,000", so these are absolute numbers. Absolute numbers are useless.

What do you see in this graph? Unvaccinated (grey line) and partially vaccinated patients (light green) are going down. Fully vaccinated patients (dark green) have not gone down much and are now the largest infected group. So is it time to facepalm?

Always remember to compare counts as a % of each group's population. What is the population of each group? The unvaccinated population is certainly shrinking while the fully vaccinated group is getting larger, so of course infection counts follow suit. (That's also because vaccines are not as effective as they claim to be.) Graphs with absolute numbers are useless because they don't tell us anything.
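To see how counts can flip just because a group grows, here is a Python sketch with hypothetical numbers only (not Malaysian data). It assumes a population of 1,000,000 in which 0.1% of each group is infected, whatever its vaccination status. The only thing that changes is the share of people who are vaccinated. `case_counts` is an illustrative helper, not anything from the article.

```python
def case_counts(total_population, vaccinated_share, rate_percent):
    """Return (vaccinated cases, unvaccinated cases) when BOTH groups have the same infection rate."""
    vaccinated = total_population * vaccinated_share
    unvaccinated = total_population - vaccinated
    return (
        round(vaccinated * rate_percent / 100),
        round(unvaccinated * rate_percent / 100),
    )


# Hypothetical numbers for illustration only: 1,000,000 people, 0.1% infected in each group.
for label, share in [("early rollout", 0.2), ("later", 0.8)]:
    vax, unvax = case_counts(1_000_000, share, 0.1)
    print(f"{label}: {vax} vaccinated cases, {unvax} unvaccinated cases")
# early rollout: 200 vaccinated cases, 800 unvaccinated cases
# later: 800 vaccinated cases, 200 unvaccinated cases
```

The rate is identical in every row, yet an absolute-number graph would show one line falling and the other rising.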





Now, what do you see in this chart? Always look for the most important thing first: is this an absolute number or per capita? It says "Per 100 people" and "Per 10 million people", so this is per capita. This graph is usable.

Here, we see in the first chart that the percentage of fully vaccinated people who got infected (0.2%) is double the rate for the unvaccinated (0.1%). The death rate also follows the same pattern. Now you can facepalm. Per capita graphs give us good information.





How to Lie with Statistics

Now that you know how to read statistics, let's see if you can spot the lies.




The article below says unvaccinated deaths are 11x the number of vaccinated deaths. Is it true or not?

Firstly, the most important question: are these absolute numbers or per capita? "10,211 unvaccinated people died." These are absolute numbers. What have you learned about absolute numbers? They are useless, because we don't know the population size of each group. If a group's population is large, then of course it will have many deaths too.

At the beginning of the vaccination campaign, nobody was vaccinated. Because vaccines were rolled out gradually, most patients were unvaccinated. So of course most people who died were also unvaccinated, and vaccinated deaths only rose as more people got vaccinated. If you count absolute numbers, the unvaccinated number will naturally be larger, simply because you are counting deaths from a time when vaccines were not available.

Do you see how they lied? They compared absolute numbers instead of percentages. We don't know the populations of the vaccinated and unvaccinated groups on these dates, and they included numbers from when most of the country was not vaccinated. (They probably counted deaths from before vaccination started, too.) Is that a fair comparison? If you didn't know statistics, you would surely believe their lie.
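Adding up deaths over a whole rollout period can exaggerate the gap, as this Python sketch shows with purely hypothetical numbers. There are two periods: one before the rollout (0% vaccinated) and one after (50% vaccinated). The death rate is the same 0.05% in both groups in every period. `cumulative_deaths` is an illustrative helper name.

```python
def cumulative_deaths(periods, population):
    """Sum deaths over periods; each period is (vaccinated share, death rate % applied to both groups)."""
    unvaccinated_total = 0.0
    vaccinated_total = 0.0
    for vaccinated_share, rate_percent in periods:
        vaccinated_total += population * vaccinated_share * rate_percent / 100
        unvaccinated_total += population * (1 - vaccinated_share) * rate_percent / 100
    return round(unvaccinated_total), round(vaccinated_total)


# Hypothetical: one period before the rollout (0% vaccinated) and one after (50% vaccinated),
# with the SAME death rate (0.05%) in both groups in every period.
periods = [(0.0, 0.05), (0.5, 0.05)]
unvax, vax = cumulative_deaths(periods, 1_000_000)
print(unvax, vax, unvax / vax)  # 750 250 3.0
```

The cumulative counts differ threefold even though neither group was ever more likely to die in any period. That is why the population of each group at each date matters.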

10 Sept 2021, https://www.malaysiakini.com/news/590779




How to Calculate the % of Reduction

Below is real-world data. In 2007, 25.7% of Malaysians were smokers. In 2018, 21.7% were. So the rate has gone down. The question is, by how much?

Most people would say it has gone down by 4%. This is called the Absolute Risk Reduction (ARR). You just subtract the two numbers: 25.7 - 21.7 = 4 (strictly speaking, 4 percentage points).

But if you want to make it look more dramatic, you can calculate the Relative Risk Reduction (RRR). Take the 4 and express it as a percentage of the starting number (25.7% in 2007): (4 ÷ 25.7) x 100 = 15.6%. Now the drop looks a lot bigger.
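Both calculations fit in a few lines of Python, using the smoking figures quoted above. The function names are illustrative only.

```python
def absolute_reduction(before_pct: float, after_pct: float) -> float:
    """ARR: plain difference between two rates, in percentage points."""
    return before_pct - after_pct


def relative_reduction(before_pct: float, after_pct: float) -> float:
    """RRR: the same difference expressed as a % of the starting rate."""
    return (before_pct - after_pct) / before_pct * 100


before, after = 25.7, 21.7  # % of Malaysians who smoked in 2007 and 2018
print(f"ARR: {absolute_reduction(before, after):.1f} percentage points")  # ARR: 4.0 percentage points
print(f"RRR: {relative_reduction(before, after):.1f}%")                   # RRR: 15.6%
```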

Both are valid ways to calculate. So, if you want to impress people, which method would you use? Certainly you would use RRR to show that you did a great job and reduced smokers by 15.6%.

When Pfizer said that their vaccine is 95% effective, which number do you think they used? Of course they used the RRR, because the ARR is just a pitiful 1%. If they used that number, who would buy their product?
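A sketch of how a 95% RRR and an ARR under 1% can come from the same trial. The inputs are hypothetical round numbers chosen for illustration, not Pfizer's actual trial data: 10,000 people per arm, 100 cases with placebo and 5 with the vaccine.

```python
def arr_and_rrr(cases_placebo, n_placebo, cases_vaccine, n_vaccine):
    """Return (ARR in percentage points, RRR in %) from case counts in two trial arms."""
    risk_placebo = cases_placebo / n_placebo
    risk_vaccine = cases_vaccine / n_vaccine
    arr = (risk_placebo - risk_vaccine) * 100
    rrr = (risk_placebo - risk_vaccine) / risk_placebo * 100
    return arr, rrr


# Hypothetical trial arms (not real data): 1% risk with placebo, 0.05% with the vaccine.
arr, rrr = arr_and_rrr(100, 10_000, 5, 10_000)
print(f"ARR = {arr:.2f} percentage points, RRR = {rrr:.0f}%")  # ARR = 0.95 percentage points, RRR = 95%
```

When the baseline risk is small, the RRR can be large while the ARR stays tiny. This is why it matters which of the two numbers is quoted.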

So you see, in statistics, you can use different methods of calculation to make the data look better than it really is.




Case Fatality Rate

Case Fatality Rate (CFR) means what % of patients die of a particular disease. For example, MERS (Middle East Respiratory Syndrome, a cousin of SARS) is 34% fatal (Source). That means about 1/3 of the people who got MERS died, and 2/3 recovered. (We can only count people whose cases are completed. Completed means you either recovered or died. Those who are still sick or in the hospital cannot be counted because we don't know how their story will end.) Chickenpox is 0.001% fatal (1 out of 100,000 patients dies). So MERS is a far deadlier disease than chickenpox.

How to calculate CFR? Simple: Number of deaths ÷ (Number of deaths + Number recovered) x 100.


If we want to look at the effectiveness rate of vaccines at preventing death for a particular disease, then we should use this formula.
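Here is a minimal Python sketch of CFR that follows the rule above: only completed cases (recovered or died) are counted, and patients still sick are left out. The counts are hypothetical, chosen only to reproduce the MERS and chickenpox percentages quoted earlier.

```python
def case_fatality_rate(deaths, recovered):
    """CFR (%) using only completed cases: deaths / (deaths + recovered) x 100.
    Patients still sick or in hospital are deliberately not passed in."""
    completed = deaths + recovered
    if completed == 0:
        raise ValueError("no completed cases yet, so CFR cannot be calculated")
    return deaths / completed * 100


# Hypothetical counts chosen to match the rates quoted above.
print(round(case_fatality_rate(34, 66), 3))     # 34.0  (MERS-like)
print(round(case_fatality_rate(1, 99_999), 3))  # 0.001 (chickenpox-like)
```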

How to lie? By dividing the number of people who died by the entire population of the country. Why is this wrong? Because you cannot die of a disease if you don't have it! What is your risk of dying of MERS if you don't have MERS? 0%. But some people who shall remain nameless invent their own formula and calculate it like this to make the % of deaths look smaller than it really is. The resulting numbers are what you always see in the news.
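Putting the two denominators side by side shows how much smaller the result gets. The numbers are hypothetical: a country of 10,000,000 people with 100 completed cases, of whom 34 died.

```python
def case_fatality_rate(deaths, recovered):
    """Deaths as a % of completed cases (people who actually had the disease)."""
    return deaths / (deaths + recovered) * 100


def deaths_per_population(deaths, population):
    """Deaths as a % of the whole population, most of whom never had the disease."""
    return deaths / population * 100


# Hypothetical country of 10,000,000 people with 100 completed cases of a disease.
deaths, recovered, population = 34, 66, 10_000_000
print(f"CFR: {case_fatality_rate(deaths, recovered):.1f}%")                       # CFR: 34.0%
print(f"Deaths / population: {deaths_per_population(deaths, population):.5f}%")  # Deaths / population: 0.00034%
```

The deaths are exactly the same in both lines; only the denominator changed.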




* * * * *

More graphs on the Homepage.

Follow me on:
Facebook ☆ Telegram

Comments

  1. I wonder if you saw this and could do your magic to debunk it … for fun. Thanks. If you do it, post in your Telegram ya.
    https://fortune.com/well/2022/12/13/covid-unvaccinated-greater-risk-car-crash-traffic-accident-new-study-says-canada-government-records-pfizer-moderna/
