Coronavirus - worldwide breaking news, discussion, stats, updates and more

Yes, I've not managed to find anywhere that will compare things on a graph at that level. The site I took those graphs from will break down by US state but not into regions of European countries.

I would be very interested to see things graphed against population density, in city-sized units, across a number of different countries.
Yeah, every country has its hotspots. Some of the countries are more hotspot-sized, so their figures can be alarming. New York state now has more than 500 deaths per million. If it were a country, it would top the charts atm.
 
Yep, Belgium's in a mess. If you're breaking things down to more like Belgium-sized chunks, it also makes more sense to view the states of New York and New Jersey as one unit. London on its own, or perhaps London plus its commuter belt (which would be about Belgium-sized), would also not look too pretty.
The only way to make anything close to meaningful comparisons between geographically separate areas is to factor in population density and to work at a sub-national granularity. Before processing the data in this way, you first have to figure out precisely what question you are trying to answer. I suspect that countries are too disparate (structurally, culturally, and in reporting mechanisms, healthcare setups, etc, even at various internal levels) to compare easily on a like-for-like basis.
 
I think the conspiracy theorists will be trying to use the fact that there was research into bat-derived SARS-like coronaviruses for a number of years to construct new paranoid narratives.
 
Need to be a little cautious, according to the Guardian (my emphasis),

========

Belgian authorities have announced 4,157 people have now died from coronavirus, making the country among the worst affected in Europe.

Deaths from coronavirus in Belgium exceeded the 4,000 mark with 262 new deaths confirmed in the last 24 hours, the National Crisis Centre reported in its daily briefing on Tuesday. Cases rose to 31,119, as testing was stepped up. The number of new patients admitted to hospital each day is continuing on a slow downward trend that has been evident since the end of March.

According to the European Centre for Disease Prevention and Control, the number of deaths in Belgium is behind only Italy, Spain, France and the UK, all significantly more populous countries. Belgium, which has a population of 11.5m, has recorded more deaths than its neighbour the Netherlands, which has a population of 17.1 million and 2,727 deaths at the latest ECDC count.

Comparisons are difficult as all countries are at different phases of the outbreak and calculate the death rate in varying ways. Belgium includes deaths in care homes, which so far account for 46% of all fatalities. Belgian authorities also include people suspected, but not confirmed, of having died from Covid 19.

The country’s national security council is due to meet on Wednesday to discuss when to end the lockdown, following a report from an expert group. The francophone state broadcaster RTBF reported that the lockdown was likely to be extended until 3 May.

The mayor of Brussels, Philippe Close, has called on the government to announce a date for ending the lockdown. “It is not enough to ask people to deprive themselves of their individual freedoms without telling them ‘we will come back to you and explain the next step,’” he said.

He also said people could forget about any mass events before the end of June.
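
For anyone wanting to put the raw counts above on a per-capita footing, here is a minimal Python sketch using only the figures quoted in the article (4,157 deaths and 11.5m people for Belgium; 2,727 deaths and 17.1m for the Netherlands). The function name is made up for illustration, and the article's caveats about care-home and suspected deaths still apply to whatever comes out.

```python
def deaths_per_million(deaths, population):
    """Crude cumulative death rate per million inhabitants."""
    return deaths / population * 1_000_000

# Figures as quoted in the Guardian piece above.
belgium = deaths_per_million(4157, 11_500_000)
netherlands = deaths_per_million(2727, 17_100_000)

print(f"Belgium:     {belgium:.0f} per million")
print(f"Netherlands: {netherlands:.0f} per million")
print(f"Ratio:       {belgium / netherlands:.1f}x")
```

On these numbers Belgium comes out at roughly 361 per million against roughly 159 for the Netherlands, a bit over double, but part of that gap is down to the difference in counting rules rather than in outcomes.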
 
Interesting, I was surprised population density had such a weak relationship. So, looks down to government policy, particularly how soon lockdown came - less of a surprise.
It doesn't really, at least not on that data. R^2 = 0.15 is not a strong correlation.
 
These things always mostly come down to what you’re specifically measuring and how you measure the way it changes. Picking the metric is 90% of the battle. I’m not convinced that a random journalist graphing the first data he comes across is really the best guide.
 
Thanks.
He's not really graphed against density though - number of large cities is not the same thing, and you'd need it broken into smaller geographical units to tell you anything useful.
Yeah, this second bit is important re comparisons. Good info from redsquirrel about the big discrepancies in reporting deaths - glad to hear that that is at least part of why Belgium looks so bad (although of course it also points at others actually being a lot worse than their figures might suggest). But another reason such figures can come up is due to the size of units being compared differing. US state of New York would have the highest deaths per capita in the world for a unit that kind of size if it were a country (San Marino may be ahead, but that's another example of the problem - San Marino is minute), and NY has a larger population than Belgium, but instead, it's lumped in with a Europe-sized unit when doing country comparisons. So there's a bit of a randomising factor here, and the smaller the unit being considered, the greater the chance of big anomalies either way - either from a region missing big hotspots or a region containing them - without necessarily reflecting much on that region's policies. And of course borders are a bit arbitrary - the New York and New Jersey outbreaks are essentially part of the same process given that Jersey City is effectively a suburb of NY City.
 
He’s not just a random journalist. Worth following the discussions on his graphs if data is something you’re interested in.
 
Yeah, I mean I think lockdown date probably does (or at least can) have a significant effect, it's just that from the data being used there is only a weak relationship.
 
The graph he's done showing deaths per capita vs country population, where I'm not exactly sure what he's trying to say (he seems to be countering people like me who want to see, generally, figures per capita rather than per country) -

It would be interesting to see what changes if you do the same but
(a) plot individual US states instead of the US as a whole, and/or
(b) plot the EU as a single entity, rather than as individual nations.
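
What (a) and (b) change is the size of the unit, and merging units is just a population-weighted pooling of their rates. A rough Python sketch, reusing the Belgium and Netherlands figures from the Guardian piece quoted earlier in the thread purely as stand-ins. The two-country "bloc" and the function names are hypothetical, not anything he actually plotted.

```python
def rate_per_million(deaths, population):
    return deaths / population * 1_000_000

def pooled_rate_per_million(units):
    """Rate for several units treated as one: pool deaths and population."""
    total_deaths = sum(deaths for deaths, _ in units.values())
    total_population = sum(pop for _, pop in units.values())
    return rate_per_million(total_deaths, total_population)

# (deaths, population) as quoted in the Guardian article earlier in the thread.
units = {
    "Belgium": (4157, 11_500_000),
    "Netherlands": (2727, 17_100_000),
}

separate = {name: rate_per_million(d, p) for name, (d, p) in units.items()}
pooled = pooled_rate_per_million(units)
unweighted_mean = sum(separate.values()) / len(separate)

for name, rate in separate.items():
    print(f"{name}: {rate:.0f} per million")
print(f"Pooled as one unit: {pooled:.0f} per million")
print(f"Naive average of the two rates: {unweighted_mean:.0f} per million")
```

The pooled figure sits between the two separate ones and closer to the larger unit, which is the same mechanism by which a hotspot the size of New York gets diluted once it is folded into the whole US.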
 
Also, there's a thing going round correlating women leaders with more successful management of the epidemic. Let's graph that too.
 
About the escaped bioweapon/germ warfare theory: if C19 had escaped from a Chinese research lab then it would seem likely that the authorities would have known and acted immediately. They may have even succeeded in suppressing both the epidemic and any knowledge of it, at least in the short term.

That's what happened in Russia. They invested a lot in germ warfare research during the Cold War, possibly as a result of having first used bioweapons (tularemia) during the defence of Stalingrad, for which some circumstantial evidence exists. In 1979 there was an anthrax outbreak in Sverdlovsk caused by an accident at a military lab. The Russian authorities claimed it was the result of contamination in a meat processing factory and this story stuck until 1992.
 
I've seen a similar thing somewhere as well.
And I'd like to see more about that too.
Germany's a significant one fair enough, but it must also influence that statistic significantly that New Zealand and Iceland are part of it ... (??)
 
The data he has chosen to graph has so many problems that if he’s claiming data expertise, that’s worrying.

Even if you cleaned up the data, though, the more fundamental problem is that it would only act to confirm the biases inherent in the way you cleaned it. We have a qualitative understanding of the way this virus spreads that is way richer than some blunt quantitative measures with problems in it. So we use our qualitative understanding to clean the quantitative measure and lo! We end up confirming what we already qualitatively understood. It’s begging the question.

If you wanted to do this properly, you’d need to standardise the data, because littlebabyjesus is right: smaller data sets are inherently more variable, so the smallest sets will always be top and bottom of a list that is derived from the same underlying distribution. So that means understanding the intrinsic correlation, that is, how well a dataset correlates with itself as it grows. And you’d need to allow for skewness, particularly because dependency is unlikely to follow a Gaussian copula (ie rank-normal correlation). And then you need to recognise the “hot spot” issue that teuchter is talking about.
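
The point about small units can be made concrete with a bit of arithmetic. This is a minimal Python sketch, assuming (unrealistically) that every death is an independent event with the same underlying probability everywhere, so that counts are binomial. Real outbreaks cluster, which makes the spread larger still. The 200-per-million rate, the 30,000-person unit and the 300m unit are invented for illustration; 11.5m is Belgium's population as given in the Guardian piece.

```python
import math

def rate_sd_per_million(true_rate_per_million, population):
    """Standard deviation of the observed rate per million under a
    binomial model (independent deaths, same probability for everyone)."""
    p = true_rate_per_million / 1_000_000
    return math.sqrt(p * (1 - p) / population) * 1_000_000

TRUE_RATE = 200  # hypothetical underlying deaths per million, same for every unit

for label, population in [
    ("tiny state (hypothetical 30,000)", 30_000),
    ("Belgium-sized (11.5m)", 11_500_000),
    ("large country (hypothetical 300m)", 300_000_000),
]:
    sd = rate_sd_per_million(TRUE_RATE, population)
    print(f"{label}: {TRUE_RATE} +/- {sd:.1f} per million")
```

Under this model the noise scales with one over the square root of population, so a unit 100 times smaller has a rate 10 times noisier even when nothing about its policies differs.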

In short, cleaning this data is such a big job that all it does is feed back to you the assumptions you made in cleaning it.

And yes, I’m interested in data. Professionally, not because it’s fun.
 
My biggest problem with all these graphs is that, for the most part, the raw data is obviously a load of old fucking bollocks.

There can be no way that countries like India and Brazil have the faintest idea of how many covid-19 related deaths they have even if they really wanted to be up front and honest about it. Any countries that have large slum dwelling and rural poor will simply have no means to count. That's before government starts cooking the books as they have been doing in pretty much every country.
 
About the escaped bioweapon/germ warfare theory: if C19 had escaped from a Chinese research lab then it would seem likely that the authorities would have known and acted immediately. They may have even succeeded in suppressing both the epidemic and any knowledge of it, at least in the short term.

How does that work? A major feature of accidental lab releases is that they are often not noticed until the broad health implications for the population start to show up in dramatic ways (in this case viral pneumonia and death in quantities that alerted local hospitals). And there is expected to be a big lag between these things, which means by the time the authorities notice, infection is widespread. And this sort of scenario is more likely in a disease with high transmissibility and with a very wide range of outcomes including plenty of mild or asymptomatic cases. All it takes is for someone who works at the lab to get infected and not realise, and they can easily spread it to the community in which they live.
 
Yeah good point about assumptions. However, you can still get somewhere with that by making predictions based on those assumptions, no?

For instance, in trying to work out where this is going (doing it for its own sake, not professionally), I've made a few basic and broad assumptions. First that lockdown works in reducing spread and differences in lockdown effectiveness due to differences in the details of implementation are relatively unimportant, so the main determining factor in a country's trajectory from lockdown onwards is the state it was in at lockdown (a state that we can only work out later). And second, adding on to that, that countries that test more will do better as they will have a better idea of where the problems are.

You can then make predictions about where things will go in a place by a) adding in further assumptions based on the various estimates of time-lag regarding incubation-time and getting very sick and dying time, and b) comparing its data on all the aspects relevant to these assumptions to how things have gone in other places.

It seems to be broadly working, despite all the problems in the data. :hmm:
 
I agree. You’re taking advantage of the fact that although the data has big problems, some of those problems (eg reporting lags) are fairly common across data sets. Inaccuracies are not a big problem when they are consistent across time and across datasets.

Trying to correlate is where it gets stupid, because all the problems come home to roost. You generally need 50 good quality data points to establish correlation even between two datasets that are each producing data that is consistent within the dataset. Literally none of that applies when correlating population density to deaths for a handful of countries.

The evidence that it is nonsense is the results he turns up. Do we really believe that the number of deaths (which is actually a dependent variable, not merely a correlate) is independent of the exposure size (ie population)? Or that the timing of the lockdown is essentially irrelevant (as pointed out by redsquirrel, because R-squared = 0.15)? Either our understanding of the real world is flawed or those graphs are.
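
As a rough check on how little a handful of points can pin down, here is a sketch using the standard Fisher z-transformation to put an approximate 95% interval around a correlation. It assumes independent, roughly bivariate-normal observations, which is generous given everything above. R-squared = 0.15 only gives the magnitude of r (about 0.39); the sample size of 15 is a guess, since the number of countries on the graph isn't stated here, and 50 is the rule-of-thumb figure.

```python
import math

def fisher_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation r
    estimated from n independent pairs, via the Fisher z-transformation."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

r = math.sqrt(0.15)  # magnitude only; the sign is not recoverable from R-squared

for n in (15, 50):   # 15 is a hypothetical "handful of countries"
    low, high = fisher_interval(r, n)
    print(f"n={n}: r={r:.2f}, 95% interval roughly ({low:.2f}, {high:.2f})")
```

With the hypothetical 15 points the interval straddles zero; at 50 it no longer does, though it is still wide, and that is before any of the data-quality problems are allowed for.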
 
Don’t disagree with any of that kabbes but this is an evolving situation and the modelling should be expected to develop as time goes by. This is what the experts are currently working on, to improve the modelling and therefore the understanding of the pandemic.

As acknowledged by the author there are problems with the data but it’s the best data we have at the moment. Understanding the problems with it hopefully allows better data sources to be developed.
 
Portugal's schools closed in the middle of last month, but the new term started nationally yesterday, online and via a channel that RTP (sort of their BBC) agreed with the govt could be used. Are we doing anything similar in the UK?
 
The graphs posted on Twitter aren’t in any sense “modelling”. He’s just bunged some uncleansed data through Excel to get a correlation factor.
 